
Re: UTF-8 in headers

1999-02-02 20:05:28
On Sun, 24 Jan 1999, Charles Lindsey wrote:

So I can see a great danger that the grand canonical method of downgrading
UTF-8 headers (which is the subject of this thread) is likely to degenerate
into a large collection of special cases, all done differently. And I do
not see any immediate clean answer to this :-( .

There is no clean answer short of writing out explicit rules, which is what
I've been saying from the start has to be done.

I cannot believe that anyone could seriously suggest that the way out of
this problem is to invent a separate ad hoc solution for every situation.
That is just not how sensible systems are designed.

New headers for mail and news are being invented all the time. Often, they
are deployed and widely adopted before the RFC gets written (yes, that is
not supposed to happen, but it does). And now you are saying that everyone
who invents a new header also has to specify how it gets downgraded to

Perhaps we could establish something of a uniform syntax for all new 
headers, and along with it, uniform rules for converting to 7bit.
For example, declare that all human-readable strings in future fields 
are to be enclosed in single quotes.  Then it would be easy for a 
converter to know how to translate between UTF-8 and 7bit - convert 
the strings within quotes and leave everything else alone.  
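
To make the idea concrete, here is a rough sketch in Python of what such
a converter might look like. It is only an illustration under several
assumptions: the single-quote convention is the hypothetical one proposed
above, not anything in an existing RFC; the names downgrade() and
encode_word() and the X-Example field are made up; the RFC 2047 "B"
encoding is one possible choice of 7bit form; and the sketch ignores the
75-character limit on encoded-words, escaped quotes, and any non-ASCII
text outside the quotes.

```python
import base64
import re

# Hypothetical convention from the proposal above: human-readable text
# in a field body is enclosed in single quotes.
QUOTED = re.compile(r"'([^']*)'")


def encode_word(text):
    """Return text as an RFC 2047 'B' encoded-word, or unchanged if ASCII."""
    if text.isascii():
        return text
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return "=?UTF-8?B?" + payload + "?="


def downgrade(value):
    """Convert only the single-quoted strings; leave everything else alone."""
    return QUOTED.sub(lambda m: "'" + encode_word(m.group(1)) + "'", value)


if __name__ == "__main__":
    print(downgrade("X-Example: tag='café'; mode=fast"))
```

The point is that the converter needs no knowledge of the individual
field: anything outside the quotes is treated as protocol text and passed
through untouched.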

Surely, a more sensible approach would be to define more carefully where
RFC2047 could be used. For example, that no protocol word or protocol
character (i.e. anything specified by an explicit "..." in the syntax)
could be included within an encoded-word. OK, that is too simplistic as it
stands, but surely there MUST be some uniform principles that could be

RFC 2047 attempts to define this very carefully, but that hasn't stopped
people from trying to use encoded-words where they don't fit - such as 
within a quoted string.

