"Keith" == Keith Moore <moore(_at_)cs(_dot_)utk(_dot_)edu> writes:
Which is an argument in favor of using UTF-8 newsgroup names on
the wire between news servers, since then a UTF-8-aware wildmat
will work as one expects. If newsgroup names are decoded into
UTF-8 before matching, wildmat matches will always work as
expected.
Keith> seems like the code needs to be changed either way.
Most of the server (relay and serving agents, in USEFOR-speak) does not
need to be changed at all to support raw UTF-8. Note that many servers
are _only_ relay agents, and that these form a fundamental part of the
Usenet infrastructure.
The sole server change required is for injecting agents that wish to
support posting to non-ASCII moderated groups. All they require is a
bolt-on addition to the mail-to-moderator operation, which is normally
handled by an external program anyway, so it can almost certainly be
dealt with without touching the server code itself.
Keith> existing expression matchers seem unlikely to do useful things
Keith> with utf-8 regardless of whether or not the utf-8 is encoded
Keith> as ascii. for instance, will the * character match a sequence
Keith> of octets or a sequence of utf-8 characters?
For raw UTF-8 it doesn't matter: a valid UTF-8 wildmat whose only
metacharacters are '*' will match precisely the same set of UTF-8
strings whether or not the matching algorithm understands UTF-8.
(The behaviour can differ only if either the wildmat or the string
being matched against is not itself a valid UTF-8 string.)
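
To make that concrete, here is a minimal sketch in Python. It is not
taken from any news server: star_match() is a hypothetical '*'-only
matcher written for this illustration, and the group names are made up.
The same routine is run once over characters and once over raw UTF-8
octets. The results agree because a UTF-8 lead octet can never be
mistaken for a continuation octet, so a literal run of valid UTF-8 can
only be found on a character boundary.

```python
def star_match(pattern, text, star):
    """Match text against pattern; `star` matches any run of units, even empty.

    Works on any indexable sequence: str (units are characters) or
    bytes (units are octets, compared as ints).
    """
    p = t = 0
    star_p, star_t = -1, 0          # position of the last '*' seen, and where it began matching
    while t < len(text):
        if p < len(pattern) and pattern[p] == star:
            star_p, star_t = p, t   # remember the star, first try matching nothing
            p += 1
        elif p < len(pattern) and pattern[p] == text[t]:
            p += 1
            t += 1
        elif star_p >= 0:
            star_t += 1             # backtrack: let the last star absorb one more unit
            t = star_t
            p = star_p + 1
        else:
            return False
    while p < len(pattern) and pattern[p] == star:
        p += 1                      # trailing stars match the empty run
    return p == len(pattern)


def match_chars(pattern: str, name: str) -> bool:
    """UTF-8-aware matching: '*' spans characters."""
    return star_match(pattern, name, "*")


def match_octets(pattern: str, name: str) -> bool:
    """UTF-8-unaware matching: '*' spans raw octets."""
    return star_match(pattern.encode("utf-8"), name.encode("utf-8"), ord("*"))


# Made-up group names and patterns, for illustration only.
GROUPS = ["de.comp.größe", "de.comp.os", "fr.café", "日本.テスト"]
PATTERNS = ["de.*", "*.größe", "*ö*", "*é", "日本.*", "*.os", "*"]

for pat in PATTERNS:
    for group in GROUPS:
        assert match_chars(pat, group) == match_octets(pat, group)
```

This holds only for '*'. A single-unit metacharacter such as '?' would
match one octet in the octet-level matcher but one character in the
UTF-8-aware one, and there the two can disagree.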
On the other hand, if the UTF-8 is encoded into some ASCII
representation, then this is no longer guaranteed.
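
As one illustration of how this can go wrong, the sketch below uses
Python's built-in "punycode" codec as a stand-in ASCII representation.
That choice is purely an assumption made for the example, not a scheme
anyone has proposed here, and encode_pattern() is a hypothetical helper
that encodes each literal piece of the wildmat separately. Because the
encoded form of a character depends on what surrounds it, the encoded
pattern no longer matches the encoded name even though the raw UTF-8
pattern matches the raw name.

```python
from fnmatch import fnmatchcase   # stdlib glob matcher; only '*' is used below


def to_ascii(s: str) -> str:
    """Stand-in ASCII representation (assumption: Python's punycode codec)."""
    return s.encode("punycode").decode("ascii")


def encode_pattern(pattern: str) -> str:
    """Hypothetical helper: encode each literal piece, keep '*' as is."""
    return "*".join(to_ascii(piece) if piece else "" for piece in pattern.split("*"))


name = "fr.café"        # made-up group name
pattern = "*é"          # "any group whose name ends in é"

raw_match = fnmatchcase(name, pattern)
encoded_match = fnmatchcase(to_ascii(name), encode_pattern(pattern))

print("raw UTF-8 match:    ", raw_match)
print("ASCII-encoded match:", encoded_match)
```

Other ASCII representations will fail in different places, or not at
all for some patterns; the point is only that the equivalence is no
longer automatic.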
--
Andrew.