Re: Transformation of Non-ASCII headers


Sam Roberts wrote:

Quoting blilly(_at_)erols(_dot_)com, on Tue, Feb 11, 2003 at 09:09:16AM -0500:

Sam Roberts wrote:

Quoting blilly(_at_)erols(_dot_)com, on Mon, Feb 10, 2003 at 05:39:26PM -0500:

The draft permits a UA to generate raw utf-8. That is then passed to
an injection agent, which determines that one or more newsgroups are
moderated.  Existing injection agents do not transform raw-utf-8,
and no existing or future injection agent can transform any untagged
8-bit content without charset and language information.



Why not?

The charset seems clearly to be utf-8!


No, in fact Usenet (and mail) abounds with a large variety
of untagged 8-bit charsets.



Sorry, none of that is validly encoded, and has specifically NO meaning,
unless assigned one.


Incorrectly tagging something other than utf-8 *as* utf-8
makes it worse; at least the untagged cruft is clearly
illegal -- incorrectly labelling it doesn't fix it, it
only compounds the error.
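
To illustrate the point about untagged 8-bit content, here is a minimal
Python sketch. It assumes ISO-8859-1 as one example of an untagged legacy
charset; the helper name looks_like_utf8 is hypothetical. It shows that
a strict utf-8 decode rejects some legacy bytes but accepts others, so a
successful decode does not prove that the sender meant utf-8.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict utf-8 (this is not proof of intent)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# ISO-8859-1 bytes for "cafe" with an acute accent: 0xE9 is not valid utf-8 here.
latin1_cafe = "caf\u00e9".encode("latin-1")

# ISO-8859-1 bytes for the two characters U+00C3 U+00A9: the same two bytes
# are also the valid utf-8 encoding of U+00E9, so the check cannot tell
# which charset the sender intended.
ambiguous = "\u00c3\u00a9".encode("latin-1")

print(looks_like_utf8(latin1_cafe))  # rejected as utf-8
print(looks_like_utf8(ambiguous))    # accepted, although it was written as ISO-8859-1
```

Labelling the second byte string as utf-8 would silently change its
meaning, which is the sense in which a wrong tag is worse than no tag.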

Backwards compatibility with standards-compliant messages, that I
understand, but backwards compatibility with invalidly encoded messages?


Before utf-8 can be adopted, there needs to be a transition
period where there is a moratorium on *all* untagged 8-bit
header field content as a prerequisite to a state where
the only untagged 8-bit content is utf-8.  The current
Usefor draft lacks such a transition plan.

And a language tag is only allowed for parameters, and even there is
optional, is it not?


No, language-tagging is provided by MIME for RFC 2047
encoded-words also.



And is still optional, so does nothing to explain why you would make the
statement "no existing or future injection agent can transform any
untagged 8-bit content without charset and language information".


Incorrect; it is "optional" only in the sense that the *user*
need not specify a language. If the user wishes to specify a
language, the protocols MUST provide for preservation of that
language information. See RFC 2277 section 4.

Apparently there are problems, but lack of charset information and
language tagging doesn't seem to be one of them.


There indeed are problems, and the lack of tagging is among them.
Lack of charset tagging could possibly be overcome by a suitable
transition plan, but language tagging provision in the protocol
which carries end-to-end is an absolute requirement which can
only be currently met in a compatible manner via RFC 2047/2231
methods.
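
For readers unfamiliar with the tagging referred to above, here is a
minimal Python sketch of an RFC 2047 encoded-word carrying the optional
RFC 2231 language suffix (charset*language). The helper names
encode_word and decode_word are hypothetical, only the "B" (base64)
encoding is handled, and the 75-character limit on encoded-words is
ignored, so treat it as an illustration rather than a conforming
implementation.

```python
import base64
import re

def encode_word(text, charset="utf-8", language=None):
    """Build one 'B' encoded-word, optionally tagged as charset*language."""
    tag = charset if language is None else f"{charset}*{language}"
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{tag}?B?{payload}?="

# charset, optional "*language", the B encoding marker, then the base64 payload
_ENCODED_WORD = re.compile(
    r"=\?([^?*]+)(?:\*([^?]+))?\?[Bb]\?([A-Za-z0-9+/=]*)\?="
)

def decode_word(word):
    """Return (text, charset, language); language is None when untagged."""
    match = _ENCODED_WORD.fullmatch(word)
    if match is None:
        raise ValueError("not a B-encoded encoded-word")
    charset, language, payload = match.groups()
    return base64.b64decode(payload).decode(charset), charset, language

# A German greeting tagged with both charset and language.
word = encode_word("Gr\u00fc\u00dfe", charset="utf-8", language="de")
print(word)
print(decode_word(word))
```

The encoded form is pure ASCII, and both the charset and the language
survive the round trip, which raw untagged 8-bit header content cannot
offer.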