ietf-822

Re: rather than argue and bicker about who said what...

2003-01-17 19:11:12

"Keith" == Keith Moore <moore(_at_)cs(_dot_)utk(_dot_)edu> writes:

Which is an argument in favor of using UTF-8 newsgroup names on
the wire between news servers, since then a UTF-8-aware wildmat
will work as one expects.  If newsgroup names are decoded into
UTF-8 before matching, wildmat matches will always work as
expected.

 Keith> seems like the code needs to be changed either way.

most server software (relay and serving agents in USEFOR-speak) does
not need to be changed in any way whatsoever to support raw UTF-8. Note
that many servers are _only_ relay agents and that these form a
fundamental part of the Usenet infrastructure.

The sole server change required is for injecting agents that wish to
support posting to non-ASCII moderated groups. All they require is a
bolt-on addition to the mail-to-moderator operation, which is normally
handled in an external program anyway and can therefore almost
certainly be dealt with without touching the server code itself.

 Keith> existing expression matchers seem unlikely to do useful things
 Keith> with utf-8 regardless of whether or not the utf-8 is encoded
 Keith> as ascii.  for instance, will the * character match a sequence
 Keith> of octets or a sequence of utf-8 characters?

for raw UTF-8 it doesn't matter; a valid UTF-8 wildmat using '*'
metacharacters (only) will match precisely the same set of UTF-8
strings regardless of whether the matching algorithm understands UTF-8
or not. (The behaviour can only differ if either the wildmat or the
string being matched against is not itself a valid UTF-8 string.)
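
As a rough illustration of the point above, here is a minimal sketch in
Python. It is not taken from any news server: star_match is a
hypothetical helper that supports '*' only, and the group names are made
up. The same function is run once over Unicode strings (the
"UTF-8-aware" view) and once over the raw UTF-8 octets (the "unaware"
view), and the two are expected to agree for valid UTF-8 input.

```python
def star_match(pattern, text, star):
    """Match text against a pattern in which `star` matches any run
    (possibly empty) of units. Works on str (characters) or bytes (octets)."""
    parts = pattern.split(star)
    if len(parts) == 1:
        # No '*' at all: the pattern must equal the text exactly.
        return pattern == text
    first, last = parts[0], parts[-1]
    if len(text) < len(first) + len(last):
        return False
    if not text.startswith(first) or not text.endswith(last):
        return False
    # Find the remaining literal pieces left to right, greedily,
    # inside the region between the fixed prefix and suffix.
    pos = len(first)
    end = len(text) - len(last)
    for mid in parts[1:-1]:
        idx = text.find(mid, pos, end)
        if idx < 0:
            return False
        pos = idx + len(mid)
    return True


# Hypothetical newsgroup names and wildmat-style patterns ('*' only).
names = ["fr.comp.développement", "de.comp.bücher", "comp.lang.c", "日本.テスト"]
patterns = ["*.comp.*", "*é*", "de.*ü*", "日本.*", "*c"]

for p in patterns:
    for n in names:
        by_chars = star_match(p, n, "*")
        by_octets = star_match(p.encode("utf-8"), n.encode("utf-8"), b"*")
        # Character-level and octet-level matching give the same answer.
        assert by_chars == by_octets, (p, n)
```

The agreement relies on a property of UTF-8 itself: a lead octet can
never be mistaken for a continuation octet, so a valid UTF-8 literal
found by an octet-wise search always sits on character boundaries. A
metacharacter such as '?' is where the two views would diverge, since
one octet is not one character.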

On the other hand, if the UTF-8 is encoded into some ASCII
representation, then this is no longer guaranteed.
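
To make that concrete, here is a small sketch using Python's built-in
"punycode" codec purely as a stand-in for "some ASCII representation".
That choice is an assumption for illustration only, not an encoding
proposed in this discussion, and the name and pattern are made up.
Encoding the literal part of a pattern and the name separately does not
preserve a simple prefix match:

```python
from fnmatch import fnmatchcase

name = "bücher"     # hypothetical name component
pattern = "bü*"     # prefix pattern, '*' only


def to_ascii(s):
    # Stand-in ASCII encoding (assumption): Python's punycode codec.
    return s.encode("punycode").decode("ascii")


# Matching on the decoded (Unicode) forms behaves as one expects.
matches_decoded = fnmatchcase(name, pattern)

# Naively encode the literal part of the pattern and the name
# separately, then match the ASCII forms against each other.
encoded_name = to_ascii(name)
encoded_pattern = to_ascii("bü") + "*"
matches_encoded = fnmatchcase(encoded_name, encoded_pattern)

assert matches_decoded
assert not matches_encoded
```

The encoded form of the prefix is not a prefix of the encoded form of
the whole name, so an unmodified matcher working on the ASCII forms
gives a different answer from one working on the decoded names.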

-- 
Andrew.