xsl-list

Re: [xsl] recognize character entities

2006-08-30 02:38:06
Florent Georges wrote:
>     <xsl:variable name="entity.values"
>                   select="('&#65533;...', '&#65533;...', ...)"/>

Perhaps it is easier, if I may suggest so, to use regular expressions. I think they would take a lot less work to create, because the characters that MathML entities refer to often lie in contiguous Unicode ranges. Looking at the entity tables on http://www.w3.org/TR/2003/REC-MathML2-20031021/chapter6.html#chars.entity.tables, I found that most sets are more or less complete blocks of the Unicode 4.0 specification.

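For instance, almost all characters in the range U+2200 to U+22FF (the Mathematical Operators block in Unicode) are included. XPath 2.0 regular expressions have no \x escape, so in a stylesheet you would write this range as the character class [&#x2200;-&#x22FF;] (the XML parser turns the character references into the actual characters) or use the block escape \p{IsMathematicalOperators}. Mathematical symbols in general can be matched with the even simpler expression \p{Sm}, the Unicode "Symbol, math" category, which is part of the XML Schema regex syntax that XPath 2.0 builds on. Note the lowercase p: \P{Sm} matches every character that is NOT a math symbol.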
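To see how a block range and the Sm category differ, here is a small sketch in Python rather than XSLT. Python's re module has no \p{...} classes, so the category test uses unicodedata.category instead, and the range is written with Python's \u string escapes, which XPath 2.0 regexes do not accept. The function names are made up for the example.

```python
import re
import unicodedata

# Mathematical Operators block, U+2200..U+22FF, as a character class.
# Python string escapes are used here; an XSLT 2.0 stylesheet would need
# character references or \p{IsMathematicalOperators} instead.
MATH_OPERATORS_BLOCK = re.compile("[\u2200-\u22FF]")


def in_math_operators_block(ch: str) -> bool:
    """Return True if the single character ch lies in U+2200..U+22FF."""
    return MATH_OPERATORS_BLOCK.fullmatch(ch) is not None


def is_math_symbol(ch: str) -> bool:
    """Return True if ch has general category Sm ("Symbol, math"),
    the category selected by \\p{Sm} in XPath 2.0 regular expressions."""
    return unicodedata.category(ch) == "Sm"


# FOR ALL and INTEGRAL are in both; PLUS SIGN is Sm but outside the block.
for ch in ["\u2200", "\u222B", "+", "a"]:
    print(f"U+{ord(ch):04X}", in_math_operators_block(ch), is_math_symbol(ch))
```

The PLUS SIGN row shows that the category is wider than the block, so the choice between the two depends on how strict you need to be.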

Similar constructs are available for Greek and Cyrillic: \p{IsGreek} and \p{IsCyrillic}.
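A rough Python equivalent of combining those two block escapes, with the block ranges written out explicitly (Greek U+0370..U+03FF, Cyrillic U+0400..U+04FF) because Python's re has no Is... block names. The helper name is hypothetical.

```python
import re

# Explicit ranges for the Unicode blocks that \p{IsGreek} and
# \p{IsCyrillic} name in XPath 2.0 regular expressions.
GREEK = "\u0370-\u03FF"
CYRILLIC = "\u0400-\u04FF"
GREEK_OR_CYRILLIC = re.compile(f"[{GREEK}{CYRILLIC}]")


def greek_or_cyrillic_chars(text: str) -> list[str]:
    """Return every character of text that falls in either block."""
    return GREEK_OR_CYRILLIC.findall(text)


# Picks out GREEK SMALL LETTER ALPHA and CYRILLIC CAPITAL LETTER ZHE.
print(greek_or_cyrillic_chars("x = \u03B1 + \u0416 - y"))
```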

Some ranges may be too wide, but there is probably little chance that symbols available in Unicode but not used by MathML will appear in your input. Some characters are specified by MathML as a base character plus a combining diacritical mark; I think you will have to list those separately in your expression. The same is true for the "normal" Latin-1 characters that are part of MathML, like &amp;, &aacute;, &Acirc;, etc.
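One way to combine broad ranges with individually listed characters and base-plus-combining-mark sequences is an alternation, sketched here in Python. The specific entries (the three ranges, the Latin-1 characters, and the pair U+2266 followed by U+0338 COMBINING LONG SOLIDUS OVERLAY) are illustrative assumptions, not copied from the MathML entity tables. XPath 2.0 regexes also support | for alternation, so the same structure should carry over.

```python
import re

# Illustrative entries only; a real pattern would be built from the
# MathML entity tables.
RANGES = ["\u2200-\u22FF", "\u0370-\u03FF", "\u0400-\u04FF"]
LATIN1 = ["&", "\u00E1", "\u00C2"]  # the characters for &amp;, &aacute;, &Acirc;
# Two-code-point sequence: a base character plus a combining mark.
SEQUENCES = ["\u2266\u0338"]


def build_pattern() -> re.Pattern:
    """Combine multi-character sequences and a single character class."""
    char_class = "[" + "".join(RANGES) + "".join(re.escape(c) for c in LATIN1) + "]"
    # Sequences come first so the combining mark is consumed together
    # with its base character instead of the base matching on its own.
    alternatives = [re.escape(s) for s in SEQUENCES] + [char_class]
    return re.compile("|".join(alternatives))


PATTERN = build_pattern()
print(PATTERN.findall("a \u2266\u0338 b & \u00E1 \u2200"))
```

Putting the sequences before the character class matters: with the class first, the base character would match alone and the combining mark would be left unmatched.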

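Using this approach you do not have to wonder whether a character entity is written as a decimal character reference (&#8704;), a hexadecimal one (&#x2200;) or a named entity (&forall;): the XML parser resolves all three to the same character before the stylesheet ever sees it.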
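The parser behaviour can be checked with any XML parser; this Python sketch uses xml.etree.ElementTree. The internal DTD subset declaring forall is written by hand for the example (it reuses the MathML entity name for U+2200 FOR ALL but is not the MathML DTD).

```python
import xml.etree.ElementTree as ET

# The same character written three ways: decimal reference, hexadecimal
# reference, and a named entity declared in the internal DTD subset.
DOC = """<!DOCTYPE m [<!ENTITY forall "&#x2200;">]>
<m><a>&#8704;</a><b>&#x2200;</b><c>&forall;</c></m>"""

root = ET.fromstring(DOC)
values = [child.text for child in root]
# After parsing, all three children hold the identical one-character string.
print(values, len(set(values)))
```

Because the distinction is gone after parsing, a regex applied in the stylesheet only ever sees the character itself.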

Of course, it will take a few hours to construct your regex, but I think it will be much easier to maintain than a list of all entity values. And, I forgot to say, you can only use this with XSLT 2.0 processors, since XSLT 1.0 has no built-in regular-expression support.

Hope this helps,

Cheers,
Abel Braaksma
http://abelleba.metacarpus.com




XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list