Re: UTF-8 in headers

1999-01-22 10:23:58
What you have is mostly right, but it's only a start to the complete
problem.  You'll note in RFC 2277 there's a requirement for language
labelling of i18n text.  You'll probably have to use RFC 2482 and define
all the rules for default language and when a language reset happens.
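
As a rough illustration of what RFC 2482 tagging looks like in practice,
here is a minimal Python sketch.  It assumes the plane-14 scheme from that
RFC: U+E0001 LANGUAGE TAG followed by tag characters that mirror printable
ASCII at an offset of 0xE0000.  The function names are hypothetical, and
the default-language and reset rules mentioned above are not modelled.

```python
LANGUAGE_TAG = "\U000E0001"   # U+E0001 LANGUAGE TAG (RFC 2482)
TAG_OFFSET = 0xE0000          # tag characters mirror ASCII at this offset


def language_prefix(lang):
    """Return the plane-14 tag sequence for an ASCII language code such as 'en'."""
    if not lang or not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("language code must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_OFFSET + ord(c)) for c in lang)


def tag_text(lang, text):
    """Prefix text with a language tag (hypothetical helper)."""
    return language_prefix(lang) + text


def strip_tags(text):
    """Drop every character in the tag block U+E0000..U+E007F."""
    return "".join(c for c in text if not 0xE0000 <= ord(c) <= 0xE007F)


tagged = tag_text("en", "hi")
# Each tag character costs four bytes in UTF-8, so labelling is not free.
print(len("hi".encode("utf-8")), len(tagged.encode("utf-8")))
```

Even a two-letter label adds three four-byte characters, which is one
reason the default-language rules matter for short header fields.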

On Wed, 20 Jan 1999, Charles Lindsey wrote:
2. Header-names are strictly ascii (in fact, the only characters allowed
are ALPHA / DIGIT / "-", which is more restrictive than DRUMS).

I'd prefer if you followed the rules from the Message Format draft.
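
For reference, the stricter rule quoted above (ALPHA / DIGIT / "-" only)
could be checked roughly as follows.  This is only a sketch of that quoted
rule, not of the Message Format draft's rules, and the function name is
hypothetical.

```python
import re

# ALPHA / DIGIT / "-" only, per the quoted rule (ASCII letters and digits).
STRICT_HEADER_NAME = re.compile(r"[A-Za-z0-9-]+")


def is_strict_header_name(name):
    """True if name uses only ASCII letters, digits and '-'."""
    return STRICT_HEADER_NAME.fullmatch(name) is not None


for candidate in ("Newsgroups", "X-Trace-1", "X_Trace", "S\u00fcbject"):
    print(candidate, is_strict_header_name(candidate))
```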

7. Tokens can use full UTF-8 (but that probably needs reviewing).

Which tokens?

One of the problems with working with UTF-8 is that some of the character
sets make no distinction between upper- and lower-case letters, and some
have an extra title-case, whatever that might mean. But worse, there is no
algorithmic method of converting upper- to lower-case; it can only be
done, in general, by table lookup, and the table is 450kB in length. So
you cannot allow UTF-8 in any place where some token is said to be
'case-insensitive' and if it is then necessary for agents to be able to
detect and act on it.

In some of these cases, it may be acceptable to drop the
'case-insensitive' rules.  Your 450kB number is misleading.  The 450kB
table is a US-ASCII form of a fairly complete table of several character
attributes and character names for Unicode.  Compressed, it's only 70kB,
and for case conversion you probably only need a fraction of that.  So I
suspect the case-conversion table would be of negligible size in practice.
The hard part is not the size of the table, but the fact that it has to be
periodically updated as new characters are added to Unicode, so it's a
nasty maintenance issue.
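
To make the size argument concrete, here is a small Python sketch that
derives a lowercase-mapping table from whatever Unicode data the running
interpreter ships with.  It is an illustration only: it keeps simple
one-to-one mappings and skips characters whose lowercase form expands to
more than one character, and the function name is hypothetical.  The entry
count depends on the Unicode version, which is exactly the maintenance
issue.

```python
import sys
import unicodedata


def build_lowercase_table():
    """Map code point -> lowercase code point, one-to-one mappings only."""
    table = {}
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        lower = ch.lower()
        # Skip characters that are unchanged or that expand to several characters.
        if lower != ch and len(lower) == 1:
            table[cp] = ord(lower)
    return table


LOWER = build_lowercase_table()

# The count changes whenever the bundled Unicode data is updated.
print("Unicode data version:", unicodedata.unidata_version)
print("one-to-one lowercase mappings:", len(LOWER))
```

The table covers only a small fraction of the code space, but it has to be
regenerated every time the Unicode data changes.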

As regards RFC-2047, it is a SHOULD accept, but SHOULD NOT generate.

For news, this is a practical compromise.  I might add that for a news
client which is also an IMAP or POP client, RFC 2047 should be a MUST.

Indeed, in the Newsgroups header it is a definite MUST NOT be used. Granted
it may have to be used when downgrading to mail, but in that case it would
have to be restored on the upgrade.

Any news client which is also an SMTP client, MUST support RFC 2047
encoding if it uses non-ASCII characters in any headers sent over SMTP. 
If a "Newsgroups" header is included in an SMTP message, it MUST use RFC
2047 encoding for UTF-8 characters.
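
A minimal sketch of that downgrade and the matching upgrade, using
Python's standard email.header module.  The helper names and the
newsgroup name "de.prüfung" are made up for illustration, and line
folding of long header values is not handled here.

```python
from email.header import Header, decode_header, make_header


def downgrade(value):
    """Return value unchanged if ASCII, else as RFC 2047 encoded-words."""
    try:
        value.encode("ascii")
        return value
    except UnicodeEncodeError:
        return Header(value, "utf-8").encode()


def upgrade(value):
    """Decode any RFC 2047 encoded-words back to a Unicode string."""
    return str(make_header(decode_header(value)))


newsgroups = "de.pr\u00fcfung"       # hypothetical non-ASCII group name
on_the_wire = downgrade(newsgroups)  # what would go out over SMTP
print(on_the_wire)
print(upgrade(on_the_wire) == newsgroups)
```

The round trip is lossless for the header value itself, which is what
restoring the header on the upgrade requires.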

                - Chris

P.S. I think your client has a bug such that when you both post to the
local.mime newsgroup and email to the mailing list, your client omits the
"To" header from the email message -- thus the message appears to come
directly to me rather than to the list.  I thus had to manually add the
"ietf-822(_at_)imc(_dot_)org" address to the headers on reply.  If you're using a
news client and newsgroup, it'd be a bug in your news->email gateway.