Chris Newman <Chris(_dot_)Newman(_at_)innosoft(_dot_)com> writes:
On Sun, 24 Jan 1999, Charles Lindsey wrote:
As regards language names, I would presume that for bodies this is a MIME
matter. Does there exist (or is there a proposal for) a Content-Language:
header, or an equivalent parameter for one of the other MIME headers?
Language in MIME headers is defined in RFC 2231 as an extension to RFC
Yes, but the question I had in mind was how to specify the language being
used in a body part. Suppose I am writing my body in French. Do I put in a
Content-Type: text/plain; charset=iso-8859-1; language=FR
I was not aware of such a language parameter in RFC 2046.
This is done with a Content-Language field. See RFC 1766 for details.
Note, however, that this is only a solution for body information (and not
necessarily even text), not text in header fields.
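
To make the Content-Language suggestion concrete, here is a minimal sketch
using Python's standard email package. The choice of tool, the body text and
the tag "fr" are assumptions made for illustration only; RFC 1766 defines the
field, not this API.

```python
from email.message import EmailMessage

# Build a text/plain body in ISO-8859-1 (sample French text, invented here).
msg = EmailMessage()
msg.set_content("Bonjour à tous\n", charset="iso-8859-1")

# The language goes in its own field, not in a Content-Type parameter.
msg["Content-Language"] = "fr"

print(msg["Content-Type"])
print(msg["Content-Language"])
```

Note that the language is carried beside Content-Type rather than inside it,
so no "language=" parameter is needed on the Content-Type line.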
I have now looked at RFC 2231, and what a can of worms! Well, it does give
a way to specify charsets and languages in MIME parameters, and adds
languages to RFC 2047, though the syntactic sugar is not particularly
sweet :-( .
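
For anyone who has not seen that sugar, the parameter form can be sketched
with Python's email.utils helpers. This is only an illustration: the sample
value "Résumé.txt" and the language tag "fr" are invented, and the result
would normally sit after a starred parameter name such as "filename*=".

```python
from email.utils import encode_rfc2231, decode_rfc2231
from urllib.parse import unquote

# RFC 2231 extended parameter value: charset'language'percent-encoded-octets
encoded = encode_rfc2231("Résumé.txt", charset="utf-8", language="fr")
print(encoded)  # utf-8'fr'R%C3%A9sum%C3%A9.txt

# Split the three parts again, then undo the percent-encoding.
charset, language, quoted = decode_rfc2231(encoded)
original = unquote(quoted, encoding=charset)
print(charset, language, original)
```

The RFC 2047 extension works differently: RFC 2231 appends the language to
the charset inside the encoded-word, as in its own example
=?US-ASCII*EN?Q?Keith_Moore?= .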
However, my concern is that it seems to provide yet another way to
downgrade 8bits to 7bits. It seems that, for downgrading from 8bit to 7bit
when headers are written in UTF-8 (as now proposed for news, and soon to
be proposed for email) we have to distinguish 3 cases:
1. Comments (...), Phrases (e.g. as in "Charles H. Lindsey"
Unstructured text (as in Subject:s), "extension message header fields"
(not quite sure what that means) and all "X-" headers:
downgrading is by RFC 2047
2. Parameters of MIME headers (e.g. Content-Disposition: attachment;
downgrading is by RFC 2231
3. All other cases
there is no downgrading mechanism specified yet
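
The three cases above can be sketched as a small dispatch function. This is
a rough illustration under stated assumptions, not a proposed algorithm: the
function name downgrade and the case labels "phrase" and "mime-parameter"
are hypothetical, and Python's email package stands in for whatever a real
gateway would use.

```python
from email.header import Header, decode_header
from email.utils import encode_rfc2231


def downgrade(case, text, charset="utf-8"):
    """Return a 7bit form of text, chosen by where it sits in the header.

    The case labels are hypothetical names for the three cases in the list.
    """
    if case == "phrase":
        # Case 1: comments, phrases, unstructured text -> RFC 2047 encoded-word
        return Header(text, charset).encode()
    if case == "mime-parameter":
        # Case 2: MIME parameter values -> RFC 2231 extended value
        return encode_rfc2231(text, charset)
    # Case 3: everything else has no specified mechanism yet
    raise NotImplementedError("no downgrading mechanism specified for " + case)


word = downgrade("phrase", "Grüße")
param = downgrade("mime-parameter", "Grüße")
print(word)
print(param)
```

The same five characters come out in two unrelated 7bit spellings depending
on position, and anything in case 3 cannot be handled at all.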
Now, within (3.) we can distinguish
a) Protocol words. I would be happy to see these remain forever in
b) Parameters as in "keyword=parameter" that are part of headers
that are not MIME headers, but have borrowed the MIME syntax.
c) Other 'tokens' in assorted headers, including those not
invented yet. Newsgroup-names in news is the obvious example
here, but is fixable because we know about it. The worrying
ones are the ones we do not know about.
So I can see a great danger that the grand canonical method of downgrading
UTF-8 headers (which is the subject of this thread) is likely to degenerate
into a large collection of special cases, all done differently. And I do
not see any immediate clean answer to this :-( .
There is no clean answer short of writing out explicit rules, which is what
I've been saying from the start has to be done.