xsl-list

[xsl] troff to unicode conversion

2006-09-11 07:10:02
Dear Dimitre (cc xsl-list),

> If you can send me the actual troff file and
> the definition of the mappings I will be
> interested to look for a better solution.

Thank you for your willingness to look at this. Because the troff file is quite large (6.7MB), instead of sending it by mail I have uploaded it to:

http://clover.slavic.pitt.edu/~djb/troff-to-unicode.zip

troff-to-unicode.zip contains:

temp3.xml: xml file with troff character coding. Note that by this stage I have already converted the troff structural and procedural markup to xml; the only part of the conversion still to be done involves the character coding of the textual data.

pvl_mappings.xml: xml file with troff/unicode mapping pairs

pvl_regex_fix.xsl: xsl stylesheet that inserts extra backslashes into the mapping file so that replace() will work in the subsequent stylesheet. I built the mapping file in two stages this way because that makes it easier for me to read.
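As a rough illustration of why the extra backslashes are needed (this is a Python sketch, not one of the files in the zip): troff codes contain characters such as the backslash, parenthesis, and asterisk, which a regex engine treats as operators unless they are escaped. The code `\(*a` and its mapping to Greek alpha are assumptions chosen for the example; the actual pairs are in pvl_mappings.xml.

```python
import re

# Hypothetical troff code, used only for illustration.
# The real codes and their Unicode targets live in pvl_mappings.xml.
sample_code = r"\(*a"

# Unescaped, the regex engine would read this as "zero or more '(' then 'a'",
# so it would also match a bare "a". Escaping makes every character literal.
escaped = re.escape(sample_code)

# The escaped pattern now matches the troff code and nothing else.
converted = re.sub(escaped, "\u03b1", r"x\(*ay")
```

The same idea applies to the XPath replace() function, whose second argument is also a regular expression.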

pvl_mappingGenerator.xsl: operates on the output of pvl_regex_fix.xsl to produce a new stylesheet (which I call pvl_unicode.xsl), which can be used to convert the character coding in temp3.xml from troff to unicode. I don't include pvl_unicode.xsl in the zip file because it can be generated from the included files (see below).

To process:

saxon8 -o pvl_mappings1.xml pvl_mappings.xml pvl_regex_fix.xsl
saxon8 -o pvl_unicode.xsl pvl_mappings1.xml pvl_mappingGenerator.xsl
saxon8 -o temp4.xml temp3.xml pvl_unicode.xsl

Step 1 adds extra backslashes to the mapping file so that the regular expressions will work correctly. Step 2 reads the output of Step 1 and builds the stylesheet (which I call pvl_unicode.xsl) that will do the actual character conversion. Step 3 applies that stylesheet to temp3.xml, which is the troff-encoded input. temp4.xml is the final output. It has the same structure as temp3.xml, but the troff character coding in temp3.xml is replaced with unicode in temp4.xml.

The problem is the inefficiency of the actual character conversion (the application of pvl_unicode.xsl to temp3.xml to produce temp4.xml).
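To make the cost concrete, here is a minimal Python sketch (not the XSLT in the zip) contrasting two strategies. It assumes the generated stylesheet applies one replacement per mapping pair, so the text is rescanned once for every pair, and compares that with a single pass over the text using one combined pattern. The three mapping pairs and the function names are hypothetical.

```python
import re

# Hypothetical mapping pairs for illustration; the real ones are in pvl_mappings.xml.
MAPPINGS = {r"\(*a": "\u03b1", r"\(*b": "\u03b2", r"\(*g": "\u03b3"}

def convert_chained(text, mappings):
    """One full scan of the text per mapping pair (longest codes first)."""
    for code in sorted(mappings, key=len, reverse=True):
        text = text.replace(code, mappings[code])
    return text

def build_single_pass(mappings):
    """Compile all codes into one alternation so the text is scanned once."""
    # Longest first, so a short code cannot pre-empt a longer one it prefixes.
    codes = sorted(mappings, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(code) for code in codes))

    def convert(text):
        # Look up each matched code in the table as it is encountered.
        return pattern.sub(lambda match: mappings[match.group(0)], text)

    return convert

convert_single_pass = build_single_pass(MAPPINGS)
```

With many mapping pairs and a 6.7MB input, the chained version does as many scans as there are pairs, while the single-pass version does one; whether that accounts for the slowness observed here is an assumption that would need measuring.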

Thank you for any advice or suggestions.

> It seems to me that the str-map template of FXSL 1.x
> should be more efficient, as it only performs a single
> pass on the string and will do all the replacements.

I haven't had occasion to use FXSL in any projects yet (although I was very interested in and impressed by the demonstration at Extreme), so if that proves to be an effective solution, I'll look forward to learning more about it.

Best,

David
djbpitt+xml(_at_)pitt(_dot_)edu
