Re: Transformation of Non-ASCII headers


Charles Lindsey wrote:

Before utf-8 can be adopted, there needs to be a transition
period where there is a moratorium on *all* untagged 8-bit
header field content as a prerequisite to a state where
the only untagged 8-bit content is utf-8.  The current
Usefor draft lacks such a transition plan.



And that just shows how out of touch with the Real World (TM) you are.

How do you propose to introduce such a moratorium? (That behaviour is
_already_ non-compliant with the standards.) More importantly, how do you
propose to enforce it?


The standard can simply state the requirement, viz. no unencoded
8-bit content.  That is a necessary prerequisite for moving from the
current chaos to use of a single 8-bit untagged charset.  Enforcement
isn't necessary -- if there's a desire to move to utf-8, use of
untagged 8-bit content will have to cease first so that when
generation of untagged utf-8 is eventually permitted, one can
be absolutely assured that such untagged 8-bit content *is* utf-8
and not any of a hundred other charsets. That's the carrot.
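
To illustrate why untagged 8-bit content cannot be trusted today, here
is a minimal Python sketch.  The sample word and the three charsets are
my own assumptions, chosen only for illustration; nothing here comes
from the Usefor draft.

```python
# Illustration only; the sample word and charset choices are assumptions.
text = "Grüße"

latin1_bytes = text.encode("iso-8859-1")  # 5 bytes, two of them 8-bit
utf8_bytes = text.encode("utf-8")         # 7 bytes, four of them 8-bit

# A reader assuming KOI8-R decodes the Latin-1 bytes without any error,
# but gets different text.
as_koi8r = latin1_bytes.decode("koi8-r")

# A reader assuming Latin-1 decodes the UTF-8 bytes without any error too.
utf8_read_as_latin1 = utf8_bytes.decode("iso-8859-1")

# Of the three, only the UTF-8 decoder rejects the Latin-1 bytes outright.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

print(repr(as_koi8r), repr(utf8_read_as_latin1), latin1_is_valid_utf8)
```

Without a tag, a receiver has no way to tell which of these readings
the sender intended.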

I can tell you that it will _never_ happen, and so if you wait for it you
will wait for ever, and UTF-8 will never get introduced into the
standards.


That's the stick.

If it doesn't happen, at least the time-tested, standards-compliant,
backwards-compatible, widely-implemented MIME (RFC 2047/2231) methods
can be used, and untagged 8-bit content will still be officially
non-compliant.  And if it doesn't happen, that means there is no
real desire for it to happen.  If it does happen, there are still
issues that need to be addressed, but that's jumping ahead quite a
bit.
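
For comparison, a rough sketch of the tagged alternative, using the
RFC 2047 encoded-word support in Python's standard email.header
module.  The sample text and the choice of iso-8859-1 are assumptions
for illustration only.

```python
from email.header import Header, decode_header

# Illustration only; the sample text and charset are assumptions.
text = "Grüße"

# Encode as an RFC 2047 encoded-word: pure ASCII on the wire, with the
# charset named inside the header field itself.
encoded = Header(text, charset="iso-8859-1").encode()

# The receiver recovers both the raw bytes and the charset tag, so no
# guessing is needed.
raw_bytes, charset = decode_header(encoded)[0]
decoded = raw_bytes.decode(charset)

print(encoded)
print(charset, decoded)
```

The encoded form is 7-bit clean, so it also passes through SMTP and
other existing infrastructure unchanged.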

OTOH, if it is made clear from the outset that UTF-8 is the one and true
way, then at least some people will take heed, and the situation will get
better (still not perfect, but better).


No, the situation will remain as intolerably bad as it is now: a
proliferation of untagged 8-bit use, where it is not possible to
determine the charset in use. Or, in the words of RFC 2978:
   Use of a large number of charsets in a given protocol may hamper
   interoperability.  However, the use of a large number of undocumented
   and/or unlabeled charsets hampers interoperability even more.
And note that RFC 2277, BCP for internationalization, states:
   Negotiating a charset may be regarded as an interim mechanism that is
   to be supported until support for interchange of UTF-8 is prevalent;
   however, the timeframe of "interim" may be at least 50 years, so
   there is every reason to think of it as permanent in practice.
It is entirely up to Usenet users and software authors how long it
will take to reach such a condition; if use of multiple untagged
8-bit charsets ceases quickly, the transition to utf-8 can happen quickly.
If not, then not.  Requiring the cessation of untagged 8-bit content
is a necessary step in that process, not only so that it will
eventually be possible to have a single official untagged charset,
but also to provide for interoperability with existing news infrastructure,
including SMTP (used for moderated postings and with gateways) and
IMAP (used by news and mail UAs).  It also provides time for the
upgrading of that infrastructure.