Re: RFC 2047 and gatewaying


Charles Lindsey wrote:

In <20030113093504(_dot_)67cda815(_dot_)moore(_at_)cs(_dot_)utk(_dot_)edu>
Keith Moore <moore(_at_)cs(_dot_)utk(_dot_)edu> writes:

but it only matters for the purpose of display.  if you're using an encoded-word
for anything other than purely human-readable text, you're not using them properly.



Actually, it does matter for purposes other than display, because people
want to construct whitelists, blacklists, killfiles, etc so as to filter
or compartmentalise their mail and news, and they are certainly not going
to construct those lists using raw encoded-words.


Keith is still correct; the human-readable text isn't reliable for the purposes
cited, as it is easily changed / forged / removed.  Instead of a display name,
filtering on the (machine-readable) addr-spec is more reliable.  Subject is
another matter, but filtering on Subject content has never been reliable.
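
As a rough illustration of filtering on the addr-spec rather than the display
name, here is a minimal Python sketch. The blocklist contents and the helper
names (sender_addr_spec, is_blocked) are made up for the example, and
lower-casing the whole address is a simplification, since the local-part is
formally case-sensitive.

```python
from email.utils import parseaddr

# Hypothetical blocklist keyed on addr-spec, not on display name.
BLOCKLIST = {"spammer@example.com"}

def sender_addr_spec(from_value):
    """Return the addr-spec of a From field value, lower-cased.

    Lower-casing is a simplification for matching; strictly, only the
    domain part is case-insensitive.
    """
    _display_name, addr = parseaddr(from_value)
    return addr.lower()

def is_blocked(from_value):
    """True if the sender's addr-spec is on the blocklist."""
    return sender_addr_spec(from_value) in BLOCKLIST

# The display name can be changed freely (raw or encoded-word);
# the addr-spec match is unaffected.
print(is_blocked('"Trusted Friend" <spammer@example.com>'))
print(is_blocked('=?iso-8859-1?Q?J=F6rg?= <Spammer@Example.COM>'))
print(is_blocked('Somebody Else <someone@example.org>'))
```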

One *could* however filter using encoded-words; there is no technical reason
why that wouldn't work (regexp matching would have to take special characters
into account in any case). It is true that fully-general 2047 encoding does
provide for variation in encoding (Q vs. B, =20 vs. _, etc.), but that is
also true of any Unicode encoding including utf-8 (pre-composed accented
characters vs. non-spacing modifiers, alternative forms, etc.).  If this were
deemed a sufficiently interesting issue, Usefor *could* still use compatible
2047 representation along the same lines as RFC 1036; specify a more restrictive
subset of what is permitted in general Internet text messages (e.g. specify
Q encoding only).
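
To make the encoding-variation point concrete, here is a small Python sketch
(one possible approach, not anything specified by 2047 or the Usefor draft).
It decodes encoded-words before matching, so Q vs. B and "_" vs. "=20" all
compare equal, and it applies Unicode NFC normalization so that pre-composed
and combining-character forms also compare equal. The function name canonical
and the sample strings are invented for the example.

```python
import unicodedata
from email.header import Header, decode_header, make_header

def canonical(value):
    """Decode any RFC 2047 encoded-words, then normalize to NFC."""
    text = str(make_header(decode_header(value)))
    return unicodedata.normalize("NFC", text)

# The same text in three different 2047 spellings.
q_underscore = "=?iso-8859-1?Q?Caf=E9_au_lait?="      # Q, "_" for space
q_hex_space = "=?iso-8859-1?Q?Caf=E9=20au=20lait?="   # Q, "=20" for space
b_utf8 = "=?utf-8?B?Q2Fmw6kgYXUgbGFpdA==?="           # B, utf-8

# The same text again, using "e" + combining acute accent (U+0301)
# instead of the pre-composed character.
decomposed = Header("Cafe\u0301 au lait", "utf-8").encode()

forms = [q_underscore, q_hex_space, b_utf8, decomposed]
print({canonical(f) for f in forms})  # one element if all forms agree
```

A filter that compares canonical() output sees one string for all four
inputs; a filter that matches the raw header text would need a pattern per
variant.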

and outside of to/cc/from/bcc/reply-to, few structured fields contain human
readable text anyway.



And that I think is the real point that I want to make. So long as
gateways manage to encode those cases correctly (and that is a MUST in my
proposed wording) then the few cases in other obscure headers that slip
through will hardly be a problem in practice.


1. As those header fields are common to mail and news, they should always
   use 2047 for non-ASCII content and may never contain raw 8-bit content.
2. Existing gateways do not need to encode any headers now, because the
   current standard (1036) does not permit raw 8-bit content.  A change
   would break compatibility with those gateways (and with IMAP, etc.).
3. And what about the newly-proposed Mail-Copies-To field? Or Complaints-To?
   Existing gateways have no syntax information for those, so could not
   encode. Not to mention any header fields that might be added in future.
4. Only the generating UA has the charset and language information necessary
   for 2047 tagging; only the generating UA can do the job -- a gateway
   cannot.
5. There remains the issue of picking *one* way; instead of saying use 2047
   or use raw utf-8 (as the Usefor draft currently does), it is far better
   to simply say use 2047:
 a. one (raw utf-8) is not compatible
 b. when there are two ways of doing something, pick one
 c. one (raw utf-8) does not provide for i18n considerations (language-tagging)
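
To illustrate point 4, here is a minimal Python sketch of what the generating
UA can do because it knows the charset of the text it is sending: produce a
2047 encoded-word that is pure ASCII on the wire and decodes back to the
original text. The helper name encode_display_text and the sample text are
assumptions for the example; language tagging is not shown.

```python
from email.header import Header, decode_header, make_header

def encode_display_text(text, charset="utf-8"):
    """Encode human-readable header text as RFC 2047 encoded-word(s).

    The caller (the generating UA) supplies the charset; a gateway that
    only sees raw 8-bit bytes would have to guess it.
    """
    return Header(text, charset).encode()

original = "Gr\u00fc\u00dfe aus K\u00f6ln"
encoded = encode_display_text(original)

print(encoded)            # ASCII-only encoded-word form
print(encoded.isascii())  # no raw 8-bit content in the header

# A reader decodes it back to the original text.
decoded = str(make_header(decode_header(encoded)))
print(decoded == original)
```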