Re: [xsl] What is Micro Pipelining: an attempt for a definition (was: Ca

I find this discussion very informative, and couldn't resisting to say
something.

There are essentially two aspects about the information being
processed by an instruction processor - throughput, and latency. The
dictionary defines these two terms as:
Throughput - the quantity or amount of raw material processed within a
given time, esp. the work done by an electronic computer in a given
period of time.
Latency - the period of inactivity between the time the stimulus is
presented and the moment a response occurs.

Let's say, an entire workload (W) is composed of 3 tasks (A, B & C).
Let's say, the instruction processor is presented to process 2-3 (or
in general multiple) such workloads (where each workload is composed
of some tasks - A, B & C in this case).

Pipelining is essentially providing the workloads to the processor in
an overlapped fashion (and not sequentially). What is the effect of
having a pipelined execution over a sequential execution? Pipelining
doesn't help the latency of a single task. But it helps throughput of
the entire workload.

There are other characteristics of pipelined processors
1) Pipeline rate is limited by slowest pipeline stage
2) Multiple tasks operating simultaneously using different resources
3) Potential Speedup = Number of pipe stages
4) Unbalanced lengths of pipe stages reduces speedup
5) Time to fill pipeline and time to drain it reduces speedup

The most well known example in computer hardware is, piplelined
instruction processors (which speedup instruction processing). Where
it takes less CPU cycles to process multiple instructions.

This is essentially the formula for time computation with using pipelines

Time (pipeline)  = Time (non-pipeline) / Pipe stages

(Assuming, that stages are perfectly balanced & there are ideal conditions)

But to realize the pipelinign benefits, there are certain costs:
1) Structural hazards
2) Control hazards
3) Data hazards

It's interesting to see [if we can ?] applying these concepts to XSLT.
I would like to present this example:

Let's say, we have 3 template rules as below.

<xsl:template match="rule1">
  <!-- processing -->
</xsl:template>

<xsl:template match="rule2">
  <!-- processing -->
</xsl:template>

<xsl:template match="rule3">
  <!-- processing -->
</xsl:template>

A given workload (by drawing analogy from the above descriptions) is
composed of 3 tasks (these 3 template rules). To execute the whole
workload, we call these 3 rules in sequence (like this):

<xsl:apply-templates select="rule1" />
<xsl:apply-templates select="rule2" />
<xsl:apply-templates select="rule3" />

Let's say, we need to execute 2-3 (or in general, multiple) instances
of the same workload (composed of the 3 tasks). Can we design a
processing mechanism in XSLT, which can take advantage of pipelined
processing.

Following is the depiction of non-pipelined vs. piplelined processing
(Wn is a workload, whereas, A, B, C etc. are the tasks in the
workload).

(W1)
A -> B -> C
              (W2)
              A -> B -> C
                            (W3)
                             A -> B-> C

(W1)
A -> B -> C
       (W2)
       A -> B -> C
              (W3)
              A -> B -> C

(It can be clearly seen, executing multiple workloads in pipelined
fashion takes less time)

I have a positive feeling, that this processing style *cannot* be
simulated with XSLT. Or, can it be? I think, we need to have a
multithreading or multiprocessing environment for this (for e.g., with
Java we can do this).

PS: Some of my ideas are borrowed from literature on internet.

On 9/1/07, Abel Braaksma <abel(_dot_)online(_at_)xs4all(_dot_)nl> wrote:

Hi Wendell,

Very valuable feedback. However, though I think I grasp your points,
after reading I still find a lot of "micro-pipelining" to be in a gray
area. Let me try to explain myself.

For me, micro-pipelining is defined as the process of pipelining a set
of data during a one pass through an XSLT processor. Normal pipelining
would be to re-apply the results of one pass to the next pass. I
understand that this doesn't tally with your definition below.

Another attempt for a definition (still my own, original, definition)
would be the following: in a case where you apply templates (or for-each
perhaps) to a set of data that is not part of the input data (input data
being any external source, including document() / collection() etc) is a
micro pipeline. Shorter: when you have a variable to contain data that
is already processed and you re-apply this variable, it is a micro pipeline.

Now I do understand that this is a bit of a too wide definition. But now
on to yours. You explain that for anything to be a micro pipeline it
must be applied (or expected to be applied) more than once. More
appropriately: when you can extract the pipeline variable into a global
variable, it is not a micro pipeline anymore (is it then a macro pipeline?).

But things are still not so trivial. It is tempting to say that the
following is only executed once, because there's only one root in a
document, and as a result, it is easy to extract the variable to the
global level:

  <xsl:template match="/">
      <xsl:variable name="micro">
           <xsl:apply-templates select="root/some/data" />
      </xsl:variable>
      <xsl:apply-templates select="$micro/*" />
  </xsl:template>

But often, simple examples, or inquiries on this list, are part of a
larger stylesheet or solution. Suppose I alter the above example to be
as follows:

  <xsl:param name="urls" select=" 'one.xml', 'two.xml', 'three.xml' " />

  <xsl:template name="main">
      <xsl:apply-templates select="document($list-of-documents)" />
  </xsl:template>

  <xsl:template match="/">
      <xsl:variable name="micro">
           <xsl:apply-templates select="root/some/data" />
      </xsl:variable>
      <xsl:apply-templates select="$micro/*" />
  </xsl:template>

is it now still not a micro pipeline? It is applied several times (three
times) and it is not possible anymore to make the variable global. Is it
really necessary to restrict a micro pipeline to be one only when it is
applied to a local level? Though I can follow your point in that it is
closer to a "large pipeline" than to a "micro pipeline".

Now let's try the other opposite. To define a term, you need to know
where its boundaries are. Suppose we have the following fictional
stylesheet, would the application of $micro be a micro pipeline?

 <xsl:variable name="micro">
     this must be tokenized
 <xsl:variable>

 <xsl:template name="main">
    <xsl:apply-templates select="my:tokenize($micro)" />
 </xsl:template>

 <xsl:template match="my:token">
     <xsl:copy-of select="." />
 </xsl:template>

 <xsl:function name="my:tokenize">
    <xsl:param name="tokens">
    <xsl:variable name="preproc-tokens">
        <xsl:for-each select="tokenize($tokens, ' ')">
            <my:token value="{.}" />
        </xsl:for-each>
    </xsl:variable>
    <xsl:apply-templates select="$preproc-tokens/*" mode="my:tokenize" />
 </xsl:function>

 <xsl:template match="my:token" mode="my:tokenize">
    <xsl:copy>
       <xsl:sequence select="replace('text(), [^A-z]', '')" />
    </xsl:copy>
 </xsl:template>


In this tokenize example we see two things. We see a global variable
holding data. This is processed using my:tokenize() and then the results
are re-applied. Even though we are talking about global data (holding a
document node with text), I would consider both phases a micro pipeline:
the apply-templates in the my:tokenize function starts a micro pipeline,
and the apply-templates in the main entry template starts a micro
pipeline. Or not?

I don't know the answers. I have seen the term micro pipeline come up
every now and then without a lot of explanation. I did a quick check on
the internet a couple of times, but a clear definition seems hard to
find. Even the XML Pipeline languages (still in Draft) at W3C do not
mention the distinction. Wikipedia has a small but effective line on the
subject though: "Some standards also categorize transformation as macro
(changes impacting an entire file) or micro (impacting only an element
or attribute)" (http://en.wikipedia.org/wiki/XML_pipeline).

But this simple-enough definition does not help XSLT: we have root
nodes, document nodes, files, non-xml data, result tree fragments,
sequences.... When is it micro and when is it macro? I do think your
definition comes quite close, but it has some rough edges. Would you
(and others) give it a try? (the definition, I mean)?

Cheers,

-- Abel Braaksma



Wendell Piez wrote:

Abel,

Thanks for the very nicely composed explanation for Simon. One of the
best things about this list is how explanations are provided that can
be followed by people who don't themselves have any need for a
particular solution, but who can still learn valuable lessons and
techniques from the sidelines. It's one of the best ways of learning.

I would, however, quarrel with one aspect of your description. You
have the code:

    <xsl:template match="/">
        <!-- mini pipeline: put points into a variable and process
them -->
        <xsl:variable name="points">
            <xsl:apply-templates select="root/set[1]/point[1]"
mode="aggregate">
                <xsl:with-param name="calc" tunnel="yes">
                    <for x="1" y2="2" y3="2"/>
                </xsl:with-param>
            </xsl:apply-templates>
        </xsl:variable>

        <!-- apply set with pre-processed points -->
        <xsl:apply-templates select="root/set">
            <xsl:with-param name="points" select="$points" />
        </xsl:apply-templates>
    </xsl:template>

... which you refer to as using a "micro-pipeline".

As I explained earlier this week, I don't believe this is a
micro-pipeline, since it operates globally. The declaration of $points
could appear outside the template and it would perform the same way. A
micro-pipeline would be if you bound a variable in a template you
expected to fire more than once, and then processed the results of
that variable. Admittedly there may be a grey zone between a pipeline
operating at the document (global) level, and one that operates at a
more local level; but I believe this falls fairly clearly into the
first category.

(Then too, as you also explain, you don't really run a pipeline here
at all, but use the results of pre-processing as a lookup table.)

The reason I stress this is because I'm afraid that if we start
calling anything a micro-pipeline just because it involves some
matching and applying of templates whose results won't appear in the
output, then we'll have to invent a new word for what we actually
invented the term for.

The more general technique, I'd say, is called "pipelining", or -- if
the results are themselves not processed directly (this includes
generating lookup tables) "pre-processing". Another interesting thing
to reflect on is that pipelining and pre-processing can be achieved in
XSLT 1.0 by passing the results of one stylesheet into another as its
source. This really isn't practical with micro-pipelining, which
happens only within the scope of a single branch of the tree at a time.

I apologize if this comes across as rude. Another valuable thing we do
on this list is guard one another's terminology. ("Don't call them
tags", etc.) This keeps the language strong because we have
unambiguous terms to refer to things, methods and techniques, which
can then be discussed and learned about.

Cheers,
Wendell



-- 
Regards,
Mukul Gandhi

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--

Re: [xsl] What is Micro Pipelining: an attempt for a definition (was: Calculating cumulative values - Abel's solution)