xsl-list

Re: [xsl] hyphenator in xsl implementing LIANG's algorithm

2009-08-27 03:23:14

Hi Michael,

The patterns are regexes indicating where a substring can be cut.
The digits between the letters are weights: only odd numbers (1 and 3)
mark valid hyphenation points, while even numbers forbid one.
If a regex matches a region of the word, that region receives the
digits of the pattern.
At the end, only the maximum number is kept for each position, and if
this number is odd, the position gets a hyphen.
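To make the notation concrete, here is a minimal Python sketch (not the XSLT from hyphenation.xsl) of how a single pattern can be read: letters alternate with optional digits, and a missing digit counts as 0. The names `parse_pattern` and `is_break` are hypothetical, chosen only for this illustration.

```python
def parse_pattern(pattern: str) -> tuple[str, list[int]]:
    """Split a Liang-style pattern such as "1c2h" into its letters ("ch")
    and the digit at each inter-letter position ([1, 2, 0]).

    values[i] is the digit written just before letters[i];
    values[len(letters)] is the digit after the last letter.
    A position with no digit gets 0.
    """
    letters: list[str] = []
    values: list[int] = [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)   # digit belongs to the current gap
        else:
            letters.append(ch)
            values.append(0)       # open the gap after this letter
    return "".join(letters), values


def is_break(value: int) -> bool:
    # Odd values allow a hyphen; even values (including 0) forbid one.
    return value % 2 == 1


print(parse_pattern("1c2h"))    # ('ch', [1, 2, 0])
print(parse_pattern("'a1b2r"))  # ("'abr", [0, 0, 1, 2, 0])
```

Non-digit characters such as the apostrophe in 'a1b2r are simply treated as letters here, which is an assumption about how the French patterns are meant to be read.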

For example, with the word "chance", the matching patterns are:
1c2h : 1c2h0a0n0c0e0
1ce :   c0h0a0n1c0e0
1ha :   c1h0a0n0c0e0
c4ha :  c4h0a0n0c0e0 (the true pattern is c2ha)

Now let's keep the max for each position:
1c4h0a0n1c0e0

Then replace odd numbers with hyphens, drop the even numbers, and
suppress the odd number before the first letter (a word cannot start
with a hyphen):
 c h a n-c e
chan-ce
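The whole procedure on "chance" can be reproduced with a short Python sketch. It assumes plain substring matching, no word-start or word-end anchors and no minimum fragment length, so it is an illustration of the max-then-odd rule rather than a port of hyphenation.xsl; the names `score` and `hyphenate` are hypothetical.

```python
def score(word: str, patterns: list[str]) -> list[int]:
    """Return the maximum pattern value at each inter-letter position.

    scores[0] is the position before the first letter and
    scores[len(word)] the position after the last one.
    """
    scores = [0] * (len(word) + 1)
    for pattern in patterns:
        letters = "".join(c for c in pattern if not c.isdigit())
        # values[k] is the digit written just before letters[k] (0 if none).
        values = [0] * (len(letters) + 1)
        k = 0
        for c in pattern:
            if c.isdigit():
                values[k] = int(c)
            else:
                k += 1
        # Apply the pattern at every offset where its letters occur,
        # keeping the maximum value seen at each position.
        start = word.find(letters)
        while start != -1:
            for j, v in enumerate(values):
                scores[start + j] = max(scores[start + j], v)
            start = word.find(letters, start + 1)
    return scores


def hyphenate(word: str, patterns: list[str]) -> str:
    scores = score(word, patterns)
    out = []
    for i, ch in enumerate(word):
        # An odd score allows a hyphen; position 0 is skipped because
        # a hyphen cannot come before the first letter.
        if i > 0 and scores[i] % 2 == 1:
            out.append("-")
        out.append(ch)
    return "".join(out)


patterns = ["1c2h", "1ce", "1ha", "c4ha"]
print(score("chance", patterns))      # [1, 4, 0, 0, 1, 0, 0]
print(hyphenate("chance", patterns))  # chan-ce
```

The score list is the digit sequence of 1c4h0a0n1c0e0. With the real pattern c2ha instead of c4ha, the value between c and h is still the even number 2, so the result is the same.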

There are still problems in my function with some word-initial patterns
that don't match (for example ^arg3ent), but this will be fixed in v2 in
a few days.
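The message does not say why ^arg3ent fails in the XSLT version. As a hedged illustration only, here is one way to honour a word-start anchor when patterns are treated as regular expressions: strip the digits, escape the letters, and keep the leading ^ so the match can only begin at offset 0. The function name `match_offsets` is hypothetical, and the zero-width lookahead is a choice made here so that overlapping occurrences are all found.

```python
import re


def match_offsets(pattern: str, word: str) -> list[int]:
    """Return every offset in `word` where the letters of a pattern occur.

    A leading "^" (as in "^arg3ent") restricts the match to the start
    of the word; the digits are ignored for matching purposes.
    """
    anchored = pattern.startswith("^")
    letters = "".join(c for c in pattern.lstrip("^") if not c.isdigit())
    # A zero-width lookahead lets finditer report overlapping matches.
    regex = ("^" if anchored else "") + "(?=" + re.escape(letters) + ")"
    return [m.start() for m in re.finditer(regex, word)]


print(match_offsets("^arg3ent", "argentin"))  # [0]
print(match_offsets("^arg3ent", "largent"))   # []
print(match_offsets("a1a", "aaa"))            # [0, 1]
```

The digits of the pattern would then be applied at each returned offset, exactly as in the max-per-position step.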

For more explanation, you may read this article in English (only the
first part concerns Liang's algorithm):
http://www.tug.org/TUGboat/Articles/tb27-1/tb86nemeth.pdf
Unfortunately, it's a PDF, but if anyone wants an HTML/TeX version, just ask.

Best regards,
Bruno


Michael Ludwig wrote:
Bruno Mascret wrote on 26.08.2009 at 22:27:30 (+0200):

Files can be found here:
https://svn.liris.cnrs.fr/nat/trunk/xsl/hyphenation.xsl
http://liris.cnrs.fr/~bmascret/nat/xsl/hyphens.xsl (French rules) (2)

Hi Bruno,

is there an explanation for how the patterns work?

'a1b2r,'a1g2n,'a1mi,'a1na,'a1po, ...


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--