Dan Kogai <dankogai(_at_)dan(_dot_)co(_dot_)jp> writes:
I don't like the <UNNNN+UMMMM> part; it will make the parsing messier.
The \xYY\xYY is of course what I meant ;-)
Not that much. It's just a regex after all.
For _perl_ it is, but if we are going to get IBM's ICU or others
to back-port it, then it is better to keep things clean.
So let us have a yacc-like grammar:
    from      : codepoint
              | from codepoint
              ;
    codepoint : '<' 'U' hexdigits '>'
              ;
    to        : octet
              | to octet
              ;
    octet     : '\\' 'x' hexdigits
              ;
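
For anyone who wants to try the grammar outside perl, here is a minimal sketch in Python. It is only an illustration, not what Encode or ICU actually does. The names CODEPOINT, OCTET, parse_run and parse_mapping are hypothetical, and it assumes that "from" and "to" sit on one line separated by whitespace, with any further fields ignored.

```python
import re

# Token shapes taken from the grammar above:
#   codepoint : '<' 'U' hexdigits '>'
#   octet     : '\\' 'x' hexdigits
CODEPOINT = re.compile(r"<U([0-9A-Fa-f]+)>")
OCTET = re.compile(r"\\x([0-9A-Fa-f]+)")


def parse_run(token_re, text):
    """Parse one or more adjacent tokens; reject anything left over."""
    values = []
    pos = 0
    while pos < len(text):
        m = token_re.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}: {text[pos:]!r}")
        values.append(int(m.group(1), 16))
        pos = m.end()
    if not values:
        raise ValueError("expected at least one token")
    return values


def parse_mapping(line):
    """Assumed layout: 'from' and 'to' are the first two whitespace-separated fields."""
    from_part, to_part = line.split()[:2]
    return parse_run(CODEPOINT, from_part), parse_run(OCTET, to_part)
```

With this sketch, parse_mapping(r"<U0041><U0300> \xC0\x80") returns the code points and the octets as two lists of integers, and anything that is not a plain run of <U...> or \x.. tokens raises ValueError.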
Let's TIMTOWTDI it.
<U...><U...> is already working. <U...+U...> is soon to come.
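
One way to support both spellings without complicating a strict parser is to rewrite the combined form into the plain one first. The sketch below is in Python and is only an illustration. It assumes that <UNNNN+UMMMM> means the same code point sequence as <UNNNN><UMMMM>; the names COMBINED and expand_combined are hypothetical.

```python
import re

# Matches <UNNNN> as well as <UNNNN+UMMMM+...>; group 1 is the part inside the brackets.
COMBINED = re.compile(r"<(U[0-9A-Fa-f]+(?:\+U[0-9A-Fa-f]+)*)>")


def expand_combined(text):
    """Rewrite <UNNNN+UMMMM> as <UNNNN><UMMMM>; single code points are left unchanged."""
    def repl(m):
        return "".join(f"<{part}>" for part in m.group(1).split("+"))
    return COMBINED.sub(repl, text)
```

After this pass, only the <U...><U...> form is left for a downstream parser to handle.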
Dan the Encode Maintainer
--
Nick Ing-Simmons
http://www.ni-s.u-net.com/