On 04/23/2011 10:27 PM, Birnbaum, David J wrote:
> My question, then, after this long-winded exposition, is: How should
> I have conceptualized this task? I broke it down into three types of
> replacements and adopted a different strategy for each, and I started
> with the easiest (the one-to-one replacements). I then realized that
> the problem was more general (there are other possible types of
> mappings), and also that there were multiple ways to deal with some
> of the types of mapping. Finally, the problem begins with a text()
> node, but once a replacement inserts some markup, it's no longer just
> a text() node, so a recursive strategy that requires a pristine
> text() node as input may become inapplicable as the replacements
> accrue.
>
> On the one hand, this is a one-off transformation for a particular
> project, and once it's done I'll never have to run it again, so
> efficiency of execution isn't a high priority. On the other hand,
> these kinds of gibberish-to-unicode remappings are very common in my
> world (legacy documents in unusual writing systems), and I really
> should think about the general problem type, instead of cobbling
> together a new ad hoc solution every time a new project crosses my
> desk. I'd be grateful for any advice.
The main thing that comes to mind is: Did this need to be done in XSLT?
While it’s certainly possible, this very much smells like a job for
Perl (or Python, if you prefer) to me. That makes the many-to-many case
easier, as well.
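
To make that concrete, here is a rough Python sketch of the table-driven
approach. The mapping entries are made up for illustration (your real
table depends on the legacy font), and remap() is just a name I picked.
The idea is to sort the keys longest-first and do all replacements in a
single pass, so one-to-one, one-to-many, many-to-one, and many-to-many
entries are all handled the same way and already-replaced output is
never rescanned.

```python
import re

# Hypothetical legacy-to-Unicode table; real entries depend on the font.
MAPPING = {
    "a": "\u0430",          # one-to-one
    "o": "\u043e",          # one-to-one
    "W": "\u0448\u0442",    # one-to-many
    "o_": "\u0461",         # many-to-one
    "^i": "\u0438\u0306",   # many-to-many
}

# Longest keys first, so "o_" is tried before the shorter key "o".
_PATTERN = re.compile(
    "|".join(re.escape(key) for key in sorted(MAPPING, key=len, reverse=True))
)


def remap(text: str) -> str:
    """Replace every legacy sequence in one left-to-right pass.

    Characters with no entry in MAPPING pass through unchanged.
    """
    return _PATTERN.sub(lambda match: MAPPING[match.group(0)], text)
```

You would call remap() on each text chunk as you read the file; any
markup you need to insert can be added to the replacement strings,
since the output is not fed back through the table.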
If you were to run into a particular (ab)use of encoding repeatedly, you
could even implement it as an encoding module in Perl, and then just
read the input as being in that encoding and re-write it in UTF-8.
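
I had Perl in mind there, but the same idea can be sketched in Python
with the standard codecs module. Treat this as an illustration under
assumptions: the codec name "legacy_font" and the three byte overrides
are invented, and every byte not listed simply falls back to its
Latin-1 value. It also only covers byte-to-character mappings; the
multi-character cases would still need a pass like the one above.

```python
import codecs

# Hypothetical overrides: legacy byte value -> intended Unicode character.
_OVERRIDES = {0x61: "\u0430", 0x62: "\u0431", 0x77: "\u0448"}

# 256-entry decoding table; bytes without an override keep their Latin-1 value.
_DECODING_TABLE = "".join(_OVERRIDES.get(b, chr(b)) for b in range(256))
_ENCODING_TABLE = codecs.charmap_build(_DECODING_TABLE)


def _decode(data, errors="strict"):
    return codecs.charmap_decode(data, errors, _DECODING_TABLE)


def _encode(text, errors="strict"):
    return codecs.charmap_encode(text, errors, _ENCODING_TABLE)


def _search(name):
    # Return codec info only for our made-up codec name.
    if name == "legacy_font":
        return codecs.CodecInfo(name="legacy_font", encode=_encode, decode=_decode)
    return None


codecs.register(_search)
```

After that, raw_bytes.decode("legacy_font").encode("utf-8") does the
conversion. Since this sketch registers no incremental or stream
classes, read the file in binary mode and decode the bytes yourself
rather than passing the name to open().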
That all said, I think your approach was sound, insofar as XSLT was the
tool to use.
~Chris
--
Chris Maden, text nerd <URL: http://crism.maden.org/ >
“Those in power write the history, while those who suffer
write the songs.” — Frank Harte
GnuPG Fingerprint: C6E4 E2A9 C9F8 71AC 9724 CAA3 19F8 6677 0077 C319