ietf-822

Re: UTF-8 over RFC 2047

2003-01-15 07:12:38

Russ Allbery wrote:
> Using RFC 2047 for the Newsgroups header is probably unworkable for a
> variety of reasons, most notably being that RFC 2047 is not a unique
> encoding format (there are multiple ways to encode the same word), and all
> existing Usenet software requires that a given newsgroup have one and only
> one textual representation.

That reason is not a show-stopper; Unicode (and therefore UTF-8) has the
same problem (multiple representations of the same word), and the issue
can easily be handled by specifying a restricted subset of the general
method. For example, a 2047-like encoding (since RFC 2047 is not directly
applicable for other reasons) that uses only =xx (the Q encoding) could
work.  Alternatively, a 2231-like (or a similar URL-like) encoding using
%xx could work.
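
To make that concrete, here is a rough sketch in Python. The names
canonical_encode and canonical_decode, the set of octets left unencoded,
and the choice of NFC normalization are all my own assumptions for
illustration; none of them comes from RFC 2047 or from any draft. The
first part shows the non-uniqueness in RFC 2047 itself, the second shows
a restricted =xx form that has exactly one spelling per name.

```python
import base64
import unicodedata
from email.header import decode_header

raw = "\u00e9t\u00e9".encode("utf-8")

# Two different RFC 2047 encoded-words that carry the same octets:
# one uses the Q encoding, the other the B (base64) encoding.
q_form = "=?utf-8?q?=C3=A9t=C3=A9?="
b_form = "=?utf-8?b?" + base64.b64encode(raw).decode("ascii") + "?="
same_octets = decode_header(q_form)[0][0] == decode_header(b_form)[0][0]

# Hypothetical restricted subset: these octets pass through unchanged,
# every other octet (including '=' itself) becomes =XX in uppercase hex.
SAFE = set(b"abcdefghijklmnopqrstuvwxyz0123456789.-+_")

def canonical_encode(name: str) -> str:
    # Normalize first so that precomposed and decomposed Unicode
    # spellings of the same word produce the same octets.
    octets = unicodedata.normalize("NFC", name).encode("utf-8")
    return "".join(chr(o) if o in SAFE else "=%02X" % o for o in octets)

def canonical_decode(text: str) -> str:
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "=":
            # '=' is always followed by exactly two hex digits.
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(text[i]))
            i += 1
    return out.decode("utf-8")

precomposed = canonical_encode("fr.soc.\u00e9t\u00e9")
decomposed = canonical_encode("fr.soc.e\u0301te\u0301")
```

Because there is no choice of charset, no choice between Q and B, and no
choice of hex case, both spellings of the name end up as the same ASCII
string, which is the property the existing software needs.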

Which raises an interesting question for the MIME authors: RFC 2231
section 4 states
   Percent signs
   ("%") are used as the encoding flag, which agrees with RFC 2047.
whereas RFC 2047 uses '=' for that purpose.  Presumably the flag character
was originally intended to be the same, but was changed without changing
the text. Is that correct?  (An erratum might be in order.)
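
For anyone who wants to see the two flag characters side by side, the
Python standard library will show them (a small illustration only; the
variable names are mine, and encode_rfc2231 is simply Python's helper for
the RFC 2231 extended-parameter form):

```python
import quopri
from email.utils import encode_rfc2231
from urllib.parse import quote

raw = "\u00e9".encode("utf-8")

# RFC 2047 "Q" style: each encoded octet is flagged with '='.
q_encoded = quopri.encodestring(raw, header=True).decode("ascii")

# URL style, as used by RFC 2231: each encoded octet is flagged with '%'.
pct_encoded = quote(raw, safe="")

# Full RFC 2231 extended value: charset'language'percent-encoded-text.
param_value = encode_rfc2231("\u00e9", charset="utf-8")
```

The hex digits are the same in both forms; only the flag character
differs, which is why the sentence in RFC 2231 looks like a leftover.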

The "one and only one textual representation" requirement favors having
the canonical form be the compatible (7-bit) form, as then there is no
need for multiple codec cycles, each of which has the possibility of
an error (e.g. failure to decode or encode, which is highly likely as
there is currently no encoding or decoding involved in gateways).  That
is in addition to the issues of
1. compatibility with the Internet text message format, which *never*
   permits 8-bit content in any header fields.
2. compatibility with existing protocols (IMAP, SMTP), which prohibit
   8-bit content in header fields.
3. compatibility with existing gateways, which do not modify Newsgroups,
   Followup-To, etc. header fields.
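
As a trivial illustration of the three points above (a sketch only; the
function name and the example group name are hypothetical), a field body
that is already in a 7-bit canonical form can be copied verbatim by a
gateway, whereas raw UTF-8 cannot:

```python
def is_seven_bit(field_body: str) -> bool:
    # True when every character fits in 7 bits, i.e. the field body is
    # plain US-ASCII and needs no transcoding on the way through.
    return all(ord(ch) < 128 for ch in field_body)

# Hypothetical newsgroup name, once as raw Unicode text and once in an
# =xx encoded 7-bit form.
raw_form = "fr.soc.\u00e9t\u00e9"
encoded_form = "fr.soc.=C3=A9t=C3=A9"

raw_ok = is_seven_bit(raw_form)
encoded_ok = is_seven_bit(encoded_form)
```

The encoded form passes the check and so survives existing gateways and
protocols without any codec step; the raw form does not.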

