Re: UTF-8 in headers

In <Pine(_dot_)SOL(_dot_)3(_dot_)95(_dot_)990127145619(_dot_)482B-100000(_at_)elwood(_dot_)innosoft(_dot_)com>
Chris Newman <Chris(_dot_)Newman(_at_)innosoft(_dot_)com> writes:

On Sun, 24 Jan 1999, Charles Lindsey wrote:

As regards language names, I would presume that for bodies this is a MIME
matter. Does there exist (or is there a proposal for) a Content-Language:
header, or an equivalent parameter for one of the other MIME headers?

Language in MIME headers is defined in RFC 2231 as an extension to RFC
2047.


Yes, but the question I had in mind was how to specify the language being
used in a body part. Suppose I am writing my body in French. Do I put in a
header like
        Content-Type: text/plain; charset=iso-8859-1; language=FR
I was not aware of such a language parameter in RFC-2046.

I have now looked at RFC 2231, and what a can of worms! Well, it does give
a way to specify charsets and languages in MIME parameters, and adds
languages to RFC 2047, though the syntactic sugar is not particularly
sweet :-( .
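
To make the RFC 2231 parameter syntax concrete, here is a minimal sketch
using Python's standard email.utils.encode_rfc2231. The filename and the
language tag are made-up values chosen only for illustration; nothing here
is prescribed by this thread.

```python
from email.utils import encode_rfc2231
from urllib.parse import unquote

# Hypothetical parameter value, chosen only for illustration.
filename = "résumé.txt"

# RFC 2231 extended value: charset'language'percent-encoded-octets
encoded = encode_rfc2231(filename, charset="utf-8", language="fr")

# The trailing "*" on the parameter name marks the value as extended.
header = f"Content-Disposition: attachment; filename*={encoded}"
print(header)

# Decoding reverses the steps: split on the two quotes, then unquote.
charset, language, octets = encoded.split("'", 2)
decoded = unquote(octets, encoding=charset)
```

The result is pure ASCII, which is exactly why it amounts to a second
8bit-to-7bit downgrading scheme alongside RFC 2047.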

However, my concern is that it seems to provide yet another way to
downgrade 8bits to 7bits. It seems that, for downgrading from 8bit to 7bit
when headers are written in UTF-8 (as now proposed for news, and soon to
be proposed for email) we have to distinguish 3 cases:

1. Comments (...), Phrases (e.g. as in "Charles H. Lindsey"
<chl(_at_)(_dot_)(_dot_)(_dot_)>), Unstructured text (as in Subject: headers),
"extension message header fields" (not quite sure what that means) and
all "X-" headers:
        downgrading is by RFC 2047

2. Parameters of MIME headers (e.g. Content-Disposition: attachment;
filename="some-name-written-in-funny-characters")
        downgrading is by RFC 2231

3. All other cases
        there is no downgrading mechanism specified yet

Now, within (3.) we can distinguish
        a) Protocol words. I would be happy to see these remain forever in
           ASCII.
        b) Parameters as in "keyword=parameter" that are part of headers
           that are not MIME headers, but have borrowed the MIME syntax.
        c) Other 'tokens' in assorted headers, including those not
           invented yet. Newsgroup-names in news is the obvious example
           here, but is fixable because we know about it. The worrying
           ones are the ones we do not know about.
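
The three cases above can be sketched as a small dispatcher. This is only
an illustration under stated assumptions: the function name downgrade() and
the labels "text", "mime-param" and "other" are hypothetical, and the
encoders are the ones in Python's standard email package, not a proposal
for how a gateway should be written.

```python
from email.header import Header
from email.utils import encode_rfc2231


def downgrade(kind: str, text: str) -> str:
    """Downgrade a UTF-8 header component to 7bit.

    `kind` is a hypothetical label for the three cases:
    "text" (case 1), "mime-param" (case 2), anything else (case 3).
    """
    if text.isascii():
        return text  # already 7bit, nothing to do
    if kind == "text":
        # Case 1: comments, phrases, unstructured text -> RFC 2047
        return Header(text, "utf-8").encode()
    if kind == "mime-param":
        # Case 2: MIME parameter values -> RFC 2231 (no language given)
        return encode_rfc2231(text, charset="utf-8")
    # Case 3: protocol words, borrowed parameters, other tokens
    raise ValueError("no downgrading mechanism specified for: " + kind)


print(downgrade("text", "Grüße"))
print(downgrade("mime-param", "naïve.txt"))
```

Note that the caller has to know which case applies before anything can be
encoded, and that case 3 simply fails.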

So I can see a great danger that the grand canonical method of downgrading
UTF-8 headers (which is the subject of this thread) is likely to degenerate
into a large collection of special cases, all done differently. And I do
not see any immediate clean answer to this :-( .

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Email:     chl(_at_)clw(_dot_)cs(_dot_)man(_dot_)ac(_dot_)uk  Web:   http://www.cs.man.ac.uk/~chl
Voice/Fax: +44 161 437 4506      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9     Fingerprint: 73 6D C2 51 93 A0 01 E7  65 E8 64 7E 14 A4 AB A5