In
<Pine(_dot_)SOL(_dot_)3(_dot_)95(_dot_)990127145619(_dot_)482B-100000(_at_)elwood(_dot_)innosoft(_dot_)com>
Chris Newman <Chris(_dot_)Newman(_at_)innosoft(_dot_)com> writes:
> On Sun, 24 Jan 1999, Charles Lindsey wrote:
>> As regards language names, I would presume that for bodies this is a MIME
>> matter. Does there exist (or is there a proposal for) a Content-Language:
>> header, or an equivalent parameter for one of the other MIME headers?

> Language in MIME headers is defined in RFC 2231 as an extension to RFC
> 2047.

Yes, but the question I had in mind was how to specify the language being
used in a body part. Suppose I am writing my body in French. Do I put in a
header like

    Content-Type: text/plain; charset=iso-8859-1; language=FR

I was not aware of such a language parameter in RFC 2046.
I have now looked at RFC 2231, and what a can of worms! Well it does give
a way to specify charsets and languages in Mime parameters, and adds
languages to RFC 2047, though the syntactic sugar is not particularly
sweet :-( .
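
To make that syntax concrete, here is a minimal sketch of the RFC 2231 extended-parameter form (charset'language'percent-encoded-value) using Python's standard email.utils helpers. The parameter name "title" and the sample value are assumptions chosen purely for illustration; neither comes from this thread.

```python
from email.utils import encode_rfc2231, decode_rfc2231
from urllib.parse import unquote

# Hypothetical parameter value, used only for illustration.
value = "naïve café"

# RFC 2231 extended value: charset ' language ' percent-encoded octets.
encoded = encode_rfc2231(value, charset="utf-8", language="fr")

# The trailing "*" on the parameter name marks the value as extended.
# "title" is a made-up parameter name.
parameter = "title*=" + encoded
print(parameter)

# decode_rfc2231 only splits on the two apostrophes; the percent-decoding
# is done separately here with urllib.parse.unquote.
charset, language, quoted = decode_rfc2231(encoded)
decoded = unquote(quoted, encoding=charset)
print(charset, language, decoded)
```

Note that the language tag travels with the individual parameter value, so this says nothing about the language of the body itself.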
However, my concern is that it seems to provide yet another way to
downgrade 8bits to 7bits. It seems that, for downgrading from 8bit to 7bit
when headers are written in UTF-8 (as now proposed for news, and soon to
be proposed for email) we have to distinguish 3 cases:

1. Comments (...), Phrases (e.g. as in "Charles H. Lindsey"
   <chl(_at_)(_dot_)(_dot_)(_dot_)>), Unstructured text (as in Subject:s),
   "extension message header fields" (not quite sure what that means)
   and all "X-" headers:
      downgrading is by RFC 2047

2. Parameters of Mime headers (e.g. Content-Disposition: attachment;
   filename="some-name-written-in-funny-characters"):
      downgrading is by RFC 2231

3. All other cases:
      there is no downgrading mechanism specified yet
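
As a rough illustration of how differently cases 1 and 2 come out, the sketch below downgrades the same kind of non-ASCII text both ways with Python's standard email package. The helper names downgrade_unstructured and downgrade_mime_parameter, and the sample strings, are hypothetical and exist only for this example; case 3 has no counterpart because no mechanism is specified for it.

```python
from email.header import Header, decode_header, make_header
from email.message import Message


def downgrade_unstructured(text):
    """Case 1: RFC 2047 encoded-words, e.g. for a Subject: header."""
    return Header(text, charset="utf-8").encode()


def downgrade_mime_parameter(filename):
    """Case 2: RFC 2231 extended parameter on a Mime header."""
    msg = Message()
    # A (charset, language, value) tuple asks the library for RFC 2231
    # encoding; the language is left empty in this sketch.
    msg.add_header("Content-Disposition", "attachment",
                   filename=("utf-8", "", filename))
    return msg["Content-Disposition"]


# Hypothetical sample values.
subject = downgrade_unstructured("Déjà vu")
disposition = downgrade_mime_parameter("résumé.txt")

print("Subject:", subject)
print("Content-Disposition:", disposition)

# Round trip of the RFC 2047 form back to the original text.
restored = str(make_header(decode_header(subject)))
print(restored)
```

The two results share nothing syntactically: one is a "=?charset?encoding?text?=" word, the other a percent-encoded "name*=" parameter, so a downgrading gateway has to know which kind of header field it is looking at before it can pick either.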

Now, within (3.) we can distinguish

   a) Protocol words. I would be happy to see these remain forever in
      ASCII.
   b) Parameters as in "keyword=parameter" that are part of headers
      that are not Mime headers, but have borrowed the Mime syntax.
   c) Other 'tokens' in assorted headers, including those not
      invented yet. Newsgroup-names in news are the obvious example
      here, but that is fixable because we know about it. The worrying
      ones are the ones we do not know about.

So I can see a great danger that the grand canonical method of downgrading
UTF-8 headers (which is the subject of this thread) is likely to degenerate
into a large collection of special cases, all done differently. And I do
not see any immediate clean answer to this :-( .
--
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Email: chl(_at_)clw(_dot_)cs(_dot_)man(_dot_)ac(_dot_)uk Web: http://www.cs.man.ac.uk/~chl
Voice/Fax: +44 161 437 4506 Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9 Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5