ietf-822

Re: rather than argue and bicker about who said what...

2003-01-17 19:03:35

Keith Moore writes:
existing expression matchers seem unlikely to do useful things with utf-8

You couldn't possibly be more wrong.

A byte-by-byte regexp matcher that doesn't know anything about UTF-8,
such as an ancient version of the UNIX grep program, nevertheless does
a perfect job of matching a UTF-8 regexp against a UTF-8 string.
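
As a rough illustration of the claim above, here is a minimal Python sketch that runs the standard `re` module on raw bytes, standing in for a byte-oriented matcher that knows nothing about UTF-8. The sample text, the patterns, and the helper name `byte_search` are made up for the example; they are not from the original discussion.

```python
import re


def byte_search(pattern: str, text: str):
    """Match a UTF-8 encoded pattern against UTF-8 encoded text, byte by byte.

    Returns the matched text decoded back to str, or None if nothing matched.
    """
    # Compiling a bytes pattern makes `re` work on bytes, not on characters.
    compiled = re.compile(pattern.encode("utf-8"))
    match = compiled.search(text.encode("utf-8"))
    if match is None:
        return None
    return match.group(0).decode("utf-8")


text = "the naïve café owner"  # hypothetical sample input

print(byte_search("ï.*é", text))   # ïve café
print(byte_search("caf", text))    # caf
print(byte_search("señor", text))  # None
```

The sketch sticks to literal characters plus ASCII operators applied to `.`. In a byte matcher, a quantifier placed directly after a multibyte character would apply to that character's last byte only, so such patterns are outside what this example shows.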

The relevant features of UTF-8 are that (1) it's compatible with ASCII,
so characters such as * are the same in ASCII and UTF-8; and (2) it's
self-synchronizing, so the bytes of a UTF-8 character cannot match inside
a UTF-8 string except at a character boundary.
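
The two properties above can be checked with a small Python sketch. It searches the raw bytes of a made-up sample string and verifies that every hit starts and ends on a character boundary. The helper names `is_char_boundary` and `find_all`, and the sample string itself, are assumptions for illustration only.

```python
def is_char_boundary(data: bytes, offset: int) -> bool:
    """True if offset is the start of a UTF-8 character, or the end of data."""
    if offset == len(data):
        return True
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return (data[offset] & 0xC0) != 0x80


def find_all(haystack: bytes, needle: bytes):
    """Return every byte offset at which needle occurs in haystack."""
    offsets = []
    start = haystack.find(needle)
    while start != -1:
        offsets.append(start)
        start = haystack.find(needle, start + 1)
    return offsets


haystack = "α*β€αβ*".encode("utf-8")  # hypothetical sample, 13 bytes long

# Property (1): ASCII bytes never occur inside a multibyte character,
# because every byte of a non-ASCII character has its high bit set.
assert all(b >= 0x80 for b in "αβ€".encode("utf-8"))
print(find_all(haystack, b"*"))  # [2, 12]

# Property (2): a plain byte search for an encoded character only
# finds it at character boundaries.
needle = "β".encode("utf-8")
hits = find_all(haystack, needle)
print(hits)  # [3, 10]
assert all(is_char_boundary(haystack, h) for h in hits)
assert all(is_char_boundary(haystack, h + len(needle)) for h in hits)
```

The boundary test relies only on the UTF-8 rule that continuation bytes look like 10xxxxxx, so no decoding is needed to confirm where a match landed.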

---D. J. Bernstein, Associate Professor, Department of Mathematics,
Statistics, and Computer Science, University of Illinois at Chicago