ietf-822

Re: preservation of installed base

2003-01-09 23:26:07

"Dave" == Dave Crocker <dcrocker(_at_)brandenburg(_dot_)com> writes:

To add to Ned's correction, news->email gatewaying has been
around for at least 20 years, as far as I can
recall. Organizations have often wanted to plug mailing lists
into newsgroups. To permit the newsgroup readers to participate
in the mailing list, a news->email gateway is required. In other
environments, the 2-way gatewaying is simply part of a model that
lets users decide how they want to receive and process their
group discussion messages.

 Charles> Yes indeed, but those are not GENERAL-PURPOSE gateways.

 Dave> I have no idea what technical differences you think exist
 Dave> between whatever it is you have in mind and the examples I
 Dave> described.

 Dave> So, please characterize those technical differences, in enough
 Dave> detail to evaluate their impact on the current work.

The most important technical difference (as I personally see it - my
opinions may differ from others involved in USEFOR) is that gateways
for individual groups or hierarchies do not have to contend with the
use of non-ASCII _newsgroup names_ unless they specifically choose to
do so. (In such cases there is generally no particular need to
preserve the Newsgroups header accurately in the case of crossposts to
groups not handled by the same gateway; furthermore, if the gateway is
implemented via the moderation mechanism, this becomes a non-issue
anyway). Furthermore, articles gatewayed from news to email via such
gateways are generally not expected to be gatewayed back into news _as
the same article_ (i.e. in the same group and with the same
message-id), and therefore the gateway does not need to satisfy the
requirements of a news transport. (If an article is to be gatewayed
news->email->news and arrive as the same article from the news
system's perspective, then the two gateways combined constitute a
relaying agent and must satisfy the requirements for such, which
include preserving all headers and body intact and unmolested except
for specific exemptions such as the Path header.)

The use of (unlabelled) 8-bit character sets in unstructured headers
like Subject and in phrases or comments in From headers is already
endemic in Usenet (and probably in email too, but it is harder to get
good statistics for that). Specifically, when I last measured,
something like 17% of all text posts (excluding detectable spam) to
generally-propagated Usenet groups contained octets >127 in either
Subject or From. That figure increased to 30% if one excluded the
primary English-language hierarchies. The usage of RFC2047 was
substantially less; 2.7% and 4.6% respectively. (The actual split
between unlabelled 8-bit and RFC2047 varies quite widely by hierarchy,
but only in the Japanese-language hierarchies does RFC2047 truly
dominate.)
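
For anyone wanting to take similar measurements, a minimal sketch of
how a raw Subject or From value might be sorted into the three
categories above. This is an illustration only, not the method used
for the figures quoted; the function name classify_header and the
category labels are hypothetical, and the encoded-word pattern is a
loose approximation of the RFC2047 syntax rather than a full parser.

```python
import re

# Loose approximation of an RFC2047 encoded-word: =?charset?B-or-Q?text?=
ENCODED_WORD = re.compile(rb"=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=")


def classify_header(raw: bytes) -> str:
    """Classify a raw header value (hypothetical category labels)."""
    # Any octet above 127 means raw 8-bit data with no charset label.
    if any(octet > 127 for octet in raw):
        return "unlabelled-8bit"
    # Pure ASCII that contains an encoded-word counts as RFC2047 usage.
    if ENCODED_WORD.search(raw):
        return "rfc2047"
    return "ascii"
```

A header containing both raw 8-bit octets and encoded-words is counted
as unlabelled 8-bit here; that tie-break is an assumption of the
sketch.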

(Note, however, that no significant amount of this existing usage of
unlabelled 8-bit currently involves the use of utf-8; usually it is a
popular local charset. Introducing more unlabelled utf-8 would not
make this situation significantly worse; the "if it's valid utf-8 then
assume that's what it is" heuristic works extremely well.)
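
The heuristic just mentioned can be sketched roughly as follows. The
function name decode_unlabelled and the latin-1 default are
assumptions made for illustration; a real reader would pick the
fallback from the hierarchy or the user's locale.

```python
from typing import Tuple


def decode_unlabelled(raw: bytes, fallback: str = "latin-1") -> Tuple[str, str]:
    """Decode header octets that carry no charset label.

    Returns (text, guessed_charset). The fallback charset is an
    assumption; it stands in for "a popular local charset".
    """
    # Pure 7-bit data needs no guessing.
    if all(octet < 0x80 for octet in raw):
        return raw.decode("ascii"), "us-ascii"
    try:
        # Strict decoding: text in a legacy 8-bit charset only rarely
        # happens to form a valid utf-8 sequence.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode(fallback), fallback
```

The heuristic relies on the utf-8 decoder being strict, so that octets
from a legacy charset fail to decode instead of being silently
accepted.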

My own view is that I would be happy to see USEFOR's use of utf-8
limited to newsgroup names, with RFC2047 specified for use in those
headers (shared with email) in which it is valid. However, I don't
expect this to have much effect on existing usage; thus, it would be
fundamentally dishonest of USEFOR to in any way suggest that
implementors can disregard the issue of octets >127; such an
implementation would be useless in the real world.

(The question of what to do with newsgroup names themselves has been
extensively thrashed out on USEFOR; I strongly recommend that people
not try and jump into that one without looking at the previous
debate.)

-- 
Andrew.