
Re: UTF-8 in headers

1999-01-24 10:58:20
As to the language of headers, I can see two possibilities:

1. A Language header. That would be simplest, but not so suitable for the
man who wants to give his real name (in a From:) in Chinese, his Subject:
in Arabic, and his Keywords: in Hebrew. Should we care?

yes.  If I'm replying to a message with a subject in language A, but 
my name is in language B, we need separate language tags for each.

more generally, in an email message which can have names for
each from address and each recipient,  the names may all be
in different languages.  I realize we're just talking about
news, but eventually we will have to address these issues
for email, and we may as well get it right now.

2. Use RFC 2482 (language tags embedded in UTF-8 text). Extremely
flexible, but would undoubtedly raise howls of protest from users whose
existing agents saw them as a sequence of garbage characters (people who
read news can get exceedingly irate when shown such things - as witness
the railings against HTML in news, or even against any form of MIME).

essentially nobody's existing UA supports UTF-8, so if you're
using UTF-8 anyway, including language tags in UTF-8 doesn't make 
the situation much worse.
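
For reference, a minimal sketch of what RFC 2482 tagging looks like at the
byte level, assuming Python 3.7 or later. The helper names (language_tag,
tag_text, strip_tags) are made up for this illustration and do not come from
any news or mail library. A tag is U+E0001 followed by the ASCII language code
shifted into the range U+E0020..U+E007E; each of those characters takes four
bytes in UTF-8, which is what an agent unaware of them would show as garbage.

```python
LANGUAGE_TAG = "\U000E0001"   # U+E0001 LANGUAGE TAG introduces a tag
TAG_BASE = 0xE0000            # tag characters mirror ASCII at this offset


def language_tag(lang: str) -> str:
    """Build the Plane 14 tag sequence for an ASCII language code (hypothetical helper)."""
    if not lang or not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("language code must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_BASE + ord(c)) for c in lang)


def tag_text(lang: str, text: str) -> str:
    """Prefix text with a language tag, e.g. for one header field."""
    return language_tag(lang) + text


def strip_tags(text: str) -> str:
    """Drop every Plane 14 tag character, as a tag-unaware display might want to."""
    return "".join(c for c in text if not (0xE0000 <= ord(c) <= 0xE007F))


subject = tag_text("ar", "\u0645\u0631\u062d\u0628\u0627")  # Arabic subject text
wire_bytes = subject.encode("utf-8")                         # what goes in the header
```

Tagging each header value separately in this way would allow a From: name and
a Subject: to carry different languages within one message.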

AFAICS the only reason why a newsreader would care about knowing the
language would be in deciding whether to display the characters
left-to-right or right-to-left. 

that's not the reason.  the newsreader needs to use different
glyphs to display certain Unicode characters depending on the language 
used.  also, text-to-speech converters need language information to know 
how to pronounce the words (but even this doesn't work for all languages).

I imagine that lots of clients will fail to include the language
tags -- and that's probably better than insisting that the clients
include tags which they're likely to get wrong.  the main thing
is to make sure there's a provision for language tags.

Any news client which is also an SMTP client MUST support RFC 2047
encoding if it uses non-ASCII characters in any headers sent over SMTP. 
If a "Newsgroups" header is included in an SMTP message, it MUST use RFC
2047 encoding for UTF-8 characters.
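
As an illustration of the downgrade step, here is a rough Python sketch that
wraps a non-ASCII header value in a single RFC 2047 encoded-word with
charset=utf-8 and base64 ("B") encoding. The function name encode_word_utf8 is
hypothetical. The sketch ignores the 75-character limit on an encoded-word, so
a real agent would have to split long values into several words.

```python
import base64


def encode_word_utf8(text: str) -> str:
    """Return text unchanged if ASCII, else as one =?utf-8?B?...?= encoded-word."""
    if all(ord(c) < 128 for c in text):
        return text  # pure ASCII needs no encoding
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return "=?utf-8?B?" + payload + "?="


example = encode_word_utf8("h\u00e9llo")  # "héllo"
```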

I think that is a gatewaying issue. What I would like to see is that if
you downgrade to RFC2047-charset=utf-8, you MUST upgrade to UTF-8
if/when it comes back into the news system. In particular, you do not
downgrade to anything other than utf-8 (even if you think you know how)
and you do not attempt to upgrade anything other than utf-8 (even if you
think you know how - that is for user agents only when they are ready to
display). The point is that UTF-8 <-> RFC2047-charset=utf-8 is a simple
algorithm. Anything else may require knowledge which not all agents

this makes sense to me.  people might want to convert 2047 from
utf-8 to other charsets, but they should only do so for mail
to be delivered locally, not for messages that might be gatewayed
back into news.
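
To make the "upgrade only utf-8" rule concrete, here is a sketch of what a
gateway might do when a message comes back into news. It is an assumption
about how one could implement the rule, not a description of any existing
gateway; upgrade_header is a hypothetical name. Encoded-words labelled with
any other charset are passed through untouched, as are words whose payload
does not decode cleanly. The sketch does not remove the whitespace between
adjacent encoded-words, which RFC 2047 asks a full decoder to do.

```python
import base64
import binascii
import re

# =?charset?encoding?payload?=  (encoding is B or Q, case-insensitive)
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")


def upgrade_header(value: str) -> str:
    """Replace utf-8 encoded-words with raw UTF-8 text; leave everything else alone."""

    def replace(match: "re.Match[str]") -> str:
        charset, encoding, payload = match.group(1), match.group(2), match.group(3)
        if charset.lower() != "utf-8":
            return match.group(0)  # not ours to upgrade
        try:
            if encoding.upper() == "B":
                raw = base64.b64decode(payload, validate=True)
            else:
                # header=True makes "_" decode to a space, as RFC 2047 Q requires
                raw = binascii.a2b_qp(payload, header=True)
            return raw.decode("utf-8")
        except ValueError:
            return match.group(0)  # malformed payload: keep the original word

    return ENCODED_WORD.sub(replace, value)


upgraded = upgrade_header("Subject: =?utf-8?B?aMOpbGxv?=")
untouched = upgrade_header("Subject: =?iso-8859-1?Q?h=E9llo?=")
```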

(there really needs to be a standard for mail<>news gateways anyway,
to address lots of other issues besides format conversion)

