ietf-822

Re: rather than argue and bicker about who said what...

2003-01-17 18:42:57

Keith Moore <moore(_at_)cs(_dot_)utk(_dot_)edu> writes:

> seems like the code needs to be changed either way.  existing expression
> matchers seem unlikely to do useful things with utf-8 regardless of
> whether or not the utf-8 is encoded as ascii.  for instance, will the *
> character match a sequence of octets or a sequence of utf-8 characters?

It turns out that it's not actually that bad.  Yes, * will match a series
of octets rather than a sequence of characters, but because of the
(excellent) structure of UTF-8, it's rather difficult to construct
realistic cases where that doesn't end up amounting to the same thing.
The main damage is to the ? single-character match and to character
classes, but both are used so rarely in practice that character classes
likely won't even be standardized in the NNTP revision.
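
To make the * versus ? difference concrete, here is a rough sketch in
Python.  It is not taken from any real server: wildmatch() is a
hypothetical toy matcher that handles only * and ?, and the group name
fr.café.test is invented for illustration.  The same function runs on
str (character units) and on bytes (octet units), so the two behaviours
can be compared directly.

```python
def wildmatch(pattern, text):
    """Toy wildmat: supports only * (any run of units) and ? (one unit).

    Hypothetical helper for illustration. Pass two str values to match
    per character, or two bytes values to match per octet.
    """
    is_str = isinstance(pattern, str)
    star = "*" if is_str else b"*"
    any_one = "?" if is_str else b"?"

    p = t = 0          # current positions in pattern and text
    star_p = -1        # position of the most recent * in the pattern
    star_t = 0         # text position that * is currently anchored at

    while t < len(text):
        unit = pattern[p:p + 1]   # slicing keeps str/bytes types uniform
        if unit == star:
            # Remember the * and first try matching it against nothing.
            star_p, star_t = p, t
            p += 1
        elif unit and (unit == any_one or unit == text[t:t + 1]):
            p += 1
            t += 1
        elif star_p >= 0:
            # Backtrack: let the last * absorb one more unit.
            star_t += 1
            p, t = star_p + 1, star_t
        else:
            return False

    # Any trailing * in the pattern can match the empty remainder.
    while pattern[p:p + 1] == star:
        p += 1
    return p == len(pattern)


group = "fr.café.test"            # invented group name, é is two octets

# * gives the same answer per character and per octet.
print(wildmatch("fr.*.test", group))                                  # True
print(wildmatch("fr.*.test".encode("utf-8"), group.encode("utf-8")))  # True

# ? diverges: one character, but two octets.
print(wildmatch("fr.caf?.test", group))                                  # True
print(wildmatch("fr.caf?.test".encode("utf-8"), group.encode("utf-8")))  # False
```

The * case works out because UTF-8 lead octets and continuation octets
occupy disjoint ranges, so the literal parts of a valid UTF-8 pattern can
only line up with the text on character boundaries, and whatever * absorbs
in between is therefore a whole number of characters.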

In practice, nearly all of the wildmats one sees are either literal group
names or patterns using only *.  I expect that servers could go quite some
distance with UTF-8 groups without changing their wildmat engine at all.

-- 
Russ Allbery (rra(_at_)stanford(_dot_)edu)             
<http://www.eyrie.org/~eagle/>