Re: grouping + global variable (?) (was re: regexs, grouping (?) and XSLT2?)

On Thu, 12 Aug 2004, Wendell Piez wrote:

1. Poor man's version: run two stylesheets in succession[...].
2. Extended XSLT 1.0[...].
3. XSLT 2.0 -- same as 2, except no extension function is needed:
it'll happen transparently.
4. External pipeline support: many environments, and some processors,
allow you to chain stylesheets directly, thereby eliminating the need
to write out the intermediate result as a file (as option 1 does).
This is the in-between solution. For example, Saxon has a
saxon:next-in-chain attribute you can use on xsl:output.


I'm trying to figure out a better way of slicing the problem I've been
working on. Essentially, the workflow is thus:

1) Author creates a text file. Said file has minimal markup such as
_underline_ and *boldface*, with a blank line between paragraphs. There's
some other specialized stuff, like pulling the author information and
byline.

While that's all fixed now, I'm working on something that'll allow some
substitution. For example, some markets want Times instead of Courier, so
I'm going to make that replaceable in the XSL stylesheet.

2) Currently, a perl script transforms said text file into XML. This is
the part I'd most prefer to change. :)

3) For a novel, there's also an additional step where each chapter's XML
file is included into a wrapper XML file that is transformed into one big
novel PDF.

4) Currently, an XSL stylesheet + XSL-FO transforms said XML file into
PDF. (for the curious: http://www.extraneous.org/wiki/ProseML)

That all works like a charm.
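
For anyone who wants to picture step 2, here is a minimal Python sketch of
that kind of text-to-XML conversion. It is not my actual perl script, and
the element names (chapter, para, underline, bold) are placeholders I made
up for illustration, not the real ProseML vocabulary. It also assumes the
markup never nests and never spans a paragraph break.

```python
import re
from xml.sax.saxutils import escape


def text_to_xml(text):
    """Convert minimally marked-up text into a small XML document.

    Element names here are hypothetical placeholders, not ProseML.
    """
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    out = ["<chapter>"]
    for para in paragraphs:
        # Join hard-wrapped lines, then escape &, < and > before adding tags.
        body = escape(" ".join(para.split()))
        # _underline_ and *boldface* become inline elements.
        body = re.sub(r"_(.+?)_", r"<underline>\1</underline>", body)
        body = re.sub(r"\*(.+?)\*", r"<bold>\1</bold>", body)
        out.append("  <para>" + body + "</para>")
    out.append("</chapter>")
    return "\n".join(out)
```

Calling text_to_xml() on the contents of a chapter file returns the XML as
a string. The special cases (author information, byline) are left out.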

But I've been thinking, based on the comments from the list, that a better
process might be eliminating the perl script entirely. I'm not sure I'd
want to eliminate the intermediate XML file, though.  There have been
times when I've needed to tweak it. For example, I have old files with
smart quotes not saved in UTF-8, and the perl script barfs on UTF-8 files,
so I do the XML conversion, open the file and re-save the XML as UTF-8.

Option 3 seems to be ruled out based on my current toolchain
(apache-FOP), which probably eliminates #2 as well. (I could easily be
wrong on this.)

Options 1 and 4 seem closest to the current process. Right now, a new XML
file is generated only if the existing one's timestamp is older than that
of the text file it's transformed from.
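
The timestamp test is the usual make-style check. A minimal Python sketch,
assuming plain file modification times are good enough (the function name
needs_rebuild is just for illustration):

```python
import os


def needs_rebuild(source_path, target_path):
    """Return True if the target is missing or older than the source."""
    if not os.path.exists(target_path):
        return True
    # Compare modification times: regenerate when the XML predates the text.
    return os.path.getmtime(target_path) < os.path.getmtime(source_path)
```

Here source_path would be the chapter's text file and target_path the XML
file generated from it.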

So, my question (you knew there was one): can someone give me a
description of how to accomplish #4, given the workflow I've got, using
something like Saxon? I see that it's an XSLT processor, but I don't get
the map of how all the pieces fit together. Right now, I know (after
having looked) that I'm using xalan for the simple reason that it came
with my apache-fop install.

I'd also eventually like to get a decent RTF output. Standard manuscript
prose is not terribly complex, so something that supported basic features
should suffice for that. Unfortunately, the commercial options are too
expensive for the intended audience. Is jfop likely to be my best
available option?

-- 
_Deirdre  web: http://deirdre.net        blog: http://deirdre.org/blog/
yarn: http://fuzzyorange.com    cat's blog: http://fuzzyorange.com/vsd/
"Memes are a hoax! Pass it on!"
