Re: UTF-8 in headers

1999-02-03 15:32:06
In <199902030308.WAA26932@spot.cs.utk.edu> Keith Moore
<moore@cs.utk.edu> writes:

> Perhaps we could establish something of a uniform syntax for all new
> headers, and along with it, uniform rules for converting to 7bit.
> For example, declare that all human-readable strings in future fields
> are to be enclosed in single quotes.  Then it would be easy for a
> converter to know how to translate between UTF-8 and 7bit - convert
> the strings within quotes and leave everything else alone.

Hmm! I doubt you will persuade the world to go along with that.

>> Surely, a more sensible approach would be to define more carefully where
>> RFC2047 could be used. For example, that no protocol word or protocol
>> character (i.e. anything specified by an explicit "..." in the syntax)
>> could be included within an encoded-word. OK, that is too simplistic as it
>> stands, but surely there MUST be some uniform principles that could be

> RFC 2047 attempts to define this very carefully, but that hasn't stopped
> people from trying to use encoded-words where they don't fit - such as
> within a quoted string.

I still find the RFC2047 rules bizarre.

1. It seems that in an <unstructured> an encoded-word must have whitespace in
front of it and behind it. So if I really really want my Subject: to be
"=?iso-8859-1?Q?my=20text?=" I can put "\=?iso-8859-1?Q?my=20text?=", though
it is not clear that reading agents are expected to hide the "\".

2. But in a comment this is not so. My encoded-word can have other
characters adjacent to it. But preceding it by a "\" will definitely stop
it being decoded, and it is more likely that the reading agent will hide
the "\".

3. Within a <phrase>, I find the syntax ambiguous. Clearly foobar is a
phrase, but is it one <word> or two (foo and bar are both <word>s, and the
syntax allows adjacent <word>s in a <phrase>)? So can I put
        foo=?iso-8859-1?Q?bar?=   ?

4. It seems the following are allowed in a <phrase>
        Charles Lindsey
        Charles "H." Lindsey
        "Charles H. Lindsey"
but not

5. It seems that the syntax was designed so that an encoded-word would
always be syntactically correct in the places where it was allowed even in
the absence of RFC2047. Which would seem to allow reading agents not to
decode them, for whatever reason. Presumably it was this policy that led
to RFC 2231.

6. The rule that an encoded-word cannot occur in a <quoted-string> I find
particularly odd. In fact that <phrase> I quoted above 
IS allowed. It fits the syntax of a <quoted-string>. But reading agents
would not be allowed to decode it. AFAICS, the only place where allowing
an encoded-word within a <quoted-string> would be an embarrassment would be
in the <local-part> of an <addr-spec> - a problem which could surely be
fixed in other ways. In fact, part of the problem appears to be that the
syntax of DRUMS is set out in such a way as not to facilitate dealing with
problems that do not arise within DRUMS itself (my foobar example above is
a case in point).

7. So can someone please explain to me why the use of encoded-words in a
<quoted-string> was outlawed, and what evils would ensue from letting them

8. Whilst one can see the thinking that led to RFC 2231, I must point out
that, in a parameter of a Mime header, the following is already legal:
So what would be wrong with letting it mean what it looks as though it

9. In any "X-" header, I can use RFC2047 stuff wherever I like, whether it
fits with any supposed syntax or not. But what about other headers, not
defined in RFC822 or DRUMS, but defined in extensions? That is unclear to

I am asking all these questions just to make sure I understand the present
situation correctly, because it would be useless to try to look for
solutions to the more general problem without doing that first.
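
To make the reading in points 1 and 3 concrete, here is a minimal Python
sketch of a reading agent that decodes an encoded-word in an <unstructured>
only when it stands alone between spaces. The names decode_encoded_word and
decode_unstructured are hypothetical, and the regular expression is a
simplification of the RFC 2047 grammar, not a complete implementation (for
instance, it does not drop the whitespace between two adjacent
encoded-words).

```python
import base64
import re

# Simplified pattern for one encoded-word: =?charset?encoding?payload?=
# (an assumption for illustration; the real grammar is stricter).
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([QqBb])\?([^?\s]*)\?=")


def decode_encoded_word(token):
    """Return the decoded text if token is exactly one encoded-word, else None."""
    m = ENCODED_WORD.fullmatch(token)
    if m is None:
        return None
    charset, encoding, payload = m.groups()
    if encoding.upper() == "B":
        raw = base64.b64decode(payload)
    else:
        # Q encoding: "_" stands for a space, "=XX" is one hex-encoded octet.
        raw = re.sub(
            rb"=([0-9A-Fa-f]{2})",
            lambda h: bytes([int(h.group(1), 16)]),
            payload.replace("_", " ").encode("ascii"),
        )
    return raw.decode(charset)


def decode_unstructured(value):
    """Decode only the space-delimited encoded-words of an unstructured field."""
    out = []
    for token in value.split(" "):
        decoded = decode_encoded_word(token)
        # Anything that is not a whole encoded-word is passed through untouched.
        out.append(token if decoded is None else decoded)
    return " ".join(out)
```

Under these assumptions, "=?iso-8859-1?Q?my=20text?=" decodes to "my text",
whereas the "\"-prefixed form from point 1 and the foo=?iso-8859-1?Q?bar?=
form from point 3 are left exactly as written, since neither token is a
whole encoded-word on its own.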

Charles H. Lindsey ---------At Home, doing my own thing------------------------
Email:     chl@clw.cs.man.ac.uk  Web:
Voice/Fax: +44 161 437 4506      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9     Fingerprint: 73 6D C2 51 93 A0 01 E7  65 E8 64 7E 14 A4 AB A5
