Re: RFC 2047 and gatewaying


D. J. Bernstein wrote:
> Bruce Lilly writes:
>> and in fact Usenet abounds with untagged charsets
>
> Obviously we can't make all of them work simultaneously. The way out of
> this mess is for message readers to support UTF-8---as many implementors
> have already done---so that message writers can safely use UTF-8.

We can't make *any* untagged non-ascii charsets work simultaneously for
all users.  Message writers can safely use utf-8 now, as with any other
charset, by specifying "utf-8" as the charset with the mechanisms defined
in RFC 2047 as amended by RFC 2231 and errata.  Message readers ought to
support RFC 2047 / 2231 in any event, and are expected to support
properly-tagged utf-8 because they are used with email, which is a much
larger application base than Usenet.  Neither Usenet nor Usefor can hope
to force raw untagged utf-8 on that application base, which includes
gateways and combined mail/news user agents.  Drop the raw utf-8 nonsense
and move on to something that has a non-zero chance of being
(a) standardized and (b) implemented.
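A minimal sketch of what such tagging looks like in practice, written in Python purely for illustration (`encode_word` is a hypothetical name, not part of any library or of the Usefor draft). It builds a single RFC 2047 "Q" encoded-word, optionally with an RFC 2231 language tag after the charset, and conservatively encodes every byte outside a small safe set. It assumes the text fits in one encoded-word: rather than splitting long text across several words, it refuses anything longer than the 75-character limit.

```python
from typing import Optional

# Conservative set of bytes left unencoded (a subset of what RFC 2047 5(3) allows).
SAFE = frozenset(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!*+-/")


def encode_word(text: str, charset: str = "utf-8", lang: Optional[str] = None) -> str:
    """Build one 'Q' encoded-word, e.g. =?utf-8*de?q?...?= (hypothetical helper)."""
    raw = text.encode(charset)  # raises UnicodeEncodeError if charset can't represent text
    out = []
    for byte in raw:
        if byte == 0x20:
            out.append("_")  # underscore stands for space in the Q encoding
        elif byte in SAFE:
            out.append(chr(byte))
        else:
            out.append(f"={byte:02X}")  # everything else as =XX, uppercase hex
    # RFC 2231 places the language tag after the charset, separated by '*'.
    tag = f"{charset}*{lang}" if lang else charset
    word = f"=?{tag}?q?{''.join(out)}?="
    if len(word) > 75:
        raise ValueError("encoded-word exceeds 75 characters; split the text first")
    return word
```

For example, `encode_word("boot", "iso-8859-1", "en")` returns `=?iso-8859-1*en?q?boot?=`, and `encode_word("Grüße", lang="de")` returns `=?utf-8*de?q?Gr=C3=BC=C3=9Fe?=`.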

>> Who said anything about gateways having to use language information?
>
> You said that the user agent ``must preserve ... language information in
> some way that it can be used by gateways'' or must use RFC 2047.

... in order for gateways to be able to reconstruct the language information
for RFC 2047 encoding as required by the Usefor draft, where desired for
header information.  For example,
  Subject: =?iso-8859-1*en?q?boot?=
and
  Subject: =?iso-8859-1*de?q?boot?=
mean quite different things ("boot" is footwear in English, while the German
"Boot" is a boat).  They are pronounced differently, and that matters to
sight-impaired users who rely on screen reader software.

Clearly it is simpler, more practical, and more compatible to use proper
RFC 2047 / 2231 tagging, including language where applicable, than to
attempt the unworkable, circuitous, and complex method recommended by the
Usefor draft: dropping the charset and language information in the
generating UA for untagged utf-8, then attempting to recreate that
information out of thin air in a gateway, which has no access to the
charset and language information known to the generating UA and might not
have header field syntax information.  The gateway itself never has any
direct need for language information, and I never claimed otherwise.  The
language information should be preserved end-to-end from generating UA to
receiving UA.  Untagged utf-8, in compliance with Unicode standards and
across different Unicode versions, provides no way to do so; RFC
2047 / 2231 do provide such a mechanism.
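On the receiving side, a UA can recover both the charset and the language from each encoded-word. The following is a hypothetical sketch in Python (`decode_word` and `Decoded` are illustrative names, not an existing API). It handles a single encoded-word at a time, supports both the "Q" and "B" encodings, and assumes the charset label is one Python's codecs recognise; an unknown label raises `LookupError`.

```python
import base64
import binascii
import re
from typing import NamedTuple, Optional

# charset, optional RFC 2231 "*language", encoding letter, encoded text
ENCODED_WORD = re.compile(r"=\?([^?*]+)(?:\*([^?]+))?\?([QqBb])\?([^?]*)\?=")


class Decoded(NamedTuple):
    charset: str
    language: Optional[str]  # None when the word carries no language tag
    text: str


def decode_word(word: str) -> Decoded:
    """Split one encoded-word into charset, language, and decoded text."""
    m = ENCODED_WORD.fullmatch(word.strip())
    if m is None:
        raise ValueError(f"not an encoded-word: {word!r}")
    charset, lang, enc, payload = m.groups()
    if enc in "Qq":
        raw = binascii.a2b_qp(payload, header=True)  # header=True maps '_' to space
    else:
        raw = base64.b64decode(payload)
    return Decoded(charset.lower(), lang, raw.decode(charset))
```

Both Subject examples above decode to the same text, "boot"; only the `language` field differs ("en" versus "de"), and that field is what a screen reader would need in order to choose a pronunciation.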