ietf-822

Re: rather than argue and bicker about who said what...

2003-01-17 22:56:44

"D. J. Bernstein" <djb(_at_)cr(_dot_)yp(_dot_)to> wrote:

> A byte-by-byte regexp matcher that doesn't know anything about UTF-8,
> such as an ancient version of the UNIX grep program, nevertheless does
> a perfect job of matching a UTF-8 regexp against a UTF-8 string.

I think it will never match something that shouldn't match, which is
indeed a pretty cool feature of UTF-8, but it will sometimes fail to
match something that should match.  For example, the regexp foo.bar will
fail to match foo±bar, because ± is encoded as two bytes in UTF-8
(0xC2 0xB1) and a byte-oriented "." consumes only one of them.
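
To make the foo.bar case concrete, here is a rough sketch in Python.  It
is not how any particular grep works; it just uses Python's re module on
bytes to stand in for a matcher that knows nothing about UTF-8.  The
helper names byte_match and char_match are made up for this example.

```python
import re

def byte_match(pattern: str, text: str) -> bool:
    """Stand-in for a UTF-8-unaware matcher: both the pattern and the
    text are encoded to UTF-8, so "." matches exactly one byte."""
    return re.search(pattern.encode("utf-8"), text.encode("utf-8")) is not None

def char_match(pattern: str, text: str) -> bool:
    """Character-aware matching: "." matches one code point."""
    return re.search(pattern, text) is not None

text = "foo\u00b1bar"  # "foo±bar"; ± is U+00B1, bytes C2 B1 in UTF-8

print(char_match("foo.bar", text))        # one character between foo and bar
print(byte_match("foo.bar", text))        # two bytes between foo and bar
print(byte_match("foo\u00b1bar", text))   # literal pattern, same bytes
```

The character-aware call finds a match, the byte-wise call with "." does
not, and the byte-wise call with the literal ± in the pattern does, since
the pattern and the string contain the same byte sequence.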

AMC