There has been a lot of discussion of RFC 2047 in recent days. Please
bear with me while I introduce a little more.
In Usefor, where headers in Netnews will be in UTF-8, we have to say how
to gateway into email. Whereas a random gateway set up for a specific
purpose can do pretty well whatever it likes (and whatever its customers
will put up with), gateways that generate mail that is to be readable anywhere
have to be more careful. This applies especially when articles are both
posted and emailed, and when mailing articles to moderators.
So here is the text that is currently proposed. I am showing it primarily
because I want the RFC 2047 experts here to check that what I say is
correct, or at least as correct as it is possible to be with RFC 2047.
Your comments would be appreciated.
8.8.1.1. Gatewaying into email
Although headers containing non-ASCII characters may well be conveyed
intact by many (if not most) current mail transport agents, that
ability is not a requirement of some transport protocols, notably of
SMTP [RFC 2821]. Likewise, although many mail user agents may
currently display (or be configurable to display) such headers
correctly, or at least adequately, messages containing such headers
are not compliant with the current Email standards, notably with [RFC
2822]. Note that non-ASCII body part headers [RFC 2046] (including
non-ASCII headers of a message/rfc822) are equally at variance with
the current Email standards.
If, at some future time, the Email standards should be updated so as
to allow such headers, it would then become possible to transport
Netnews articles containing them over Email without further ado.
Until such a time, however, if a Netnews article is to be gatewayed
into Email with the intention that it be received and accepted by any
arbitrarily chosen destination, and if it contains any UTF8-xtra-char
in any of its headers or body part headers, then it MUST first be
transformed so as to conform to [RFC 2822] and/or [RFC 2046]. In
particular, articles emailed to moderators (8.2.2) MUST be so
transformed.
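A minimal Python sketch of the test that triggers this requirement. It assumes, as a simplification, that a UTF8-xtra-char is any character outside US-ASCII, and it represents a header as a hypothetical list of (name, value) pairs, checked once for the article header and once for each body part header:

```python
def has_non_ascii(text: str) -> bool:
    """True if text contains any character outside US-ASCII (U+0000..U+007F)."""
    # Assumption: UTF8-xtra-char is treated here as "any non-ASCII character".
    return not text.isascii()


def needs_transformation(headers: list[tuple[str, str]]) -> bool:
    """True if any (name, value) pair in a hypothetical header list is non-ASCII."""
    return any(has_non_ascii(name) or has_non_ascii(value) for name, value in headers)


article_header = [("Newsgroups", "comp.example"), ("Subject", "Grüße aus Köln")]
body_part_header = [("Content-Type", "text/plain; charset=UTF-8")]

# The check applies to the article header and to every body part header.
print(any(needs_transformation(h) for h in (article_header, body_part_header)))
```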
NOTE: It is not precluded that a gatewayer who knows, or is able
to control, the capabilities of the particular sites for which
an article is destined and of the transport paths leading to
those sites, may choose to send the article without
transformation, or at least without transformation of any
contained body part headers.
The surest way to transport an article containing non-ASCII headers
through Email is by encapsulation as an application/news-transmission
(6.21.6.1). However, this method is not currently available for
sending to moderators, for the reasons explained in section 8.2.2 step 12.
Until this method is considered safe to use, therefore,
transformation of those headers will be necessary. This can be
accomplished in the following steps:
1. If the header is unstructured, or is an experimental header
(4.2.5.1), any word(s) containing non-ASCII characters and delimited
by FWS or by the start/end of the header-content are encoded
according to [RFC 2047].
2. If the header is structured, any word(s) containing non-ASCII
characters within a comment and delimited by FWS or by the "(" or
")" delimiting that comment are encoded according to [RFC 2047], and
likewise any such word(s) within a phrase and delimited by FWS or by
the start/end of the header-content.
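3. If the header contains a (MIME-style) parameter with a non-ASCII
value, the whole parameter is encoded according to [RFC 2231]; that
is, an asterisk is appended to the parameter name and the value is
given as charset'language' followed by %-encoded octets.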
4. If the header is a Newsgroups-header or a Followup-To-header (or
any other header that contains a newsgroup-name), each newsgroup-
name is encoded according to section 5.5.2. Even if it is not
decoded at the far end, it is preferable to display that encoded
form than to display nothing at all. Note, however, that such
encoded newsgroup-names MUST be restored to their canonical form
before reinjection into any Netnews system.
5. If the header is not one defined by this standard or by any Email
standard known to the gateway (so that it cannot be determined
whether it is unstructured, or otherwise where comments and
phrases occur within it), then it is not possible to encode it
according to a strict interpretation of [RFC 2047]. Nevertheless,
it is preferable to attempt an encoding than to discard that
header or to allow the gatewaying to fail. It is therefore
suggested that, outside of regions contained within properly
matched DQUOTEs, <...> or [...], any word(s) contained within
properly nested "(" and ")" be treated as being within a comment
and any other word(s) be treated as being within a phrase.
Likewise, following any ";", anything of the syntactic form of a
parameter should be treated as such.
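For steps 1 and 3, here is a minimal Python sketch, assuming the header-content has already been unfolded and that UTF-8 with the "B" encoding is used throughout; the function names are illustrative only. One point worth making explicit: RFC 2047 decoders ignore whitespace that separates two adjacent encoded-words, so a run of consecutive non-ASCII words is best encoded as a single encoded-word that includes the spaces between them. Encoding each word separately would make them run together on display. The 75-character limit on encoded-words is not enforced here.

```python
import base64
import re
from email.utils import encode_rfc2231


def encoded_word(text: str) -> str:
    """Wrap text in one RFC 2047 encoded-word (UTF-8, "B" encoding), no length check."""
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return f"=?UTF-8?B?{payload}?="


def encode_unstructured(content: str) -> str:
    """Step 1: encode runs of non-ASCII words in unfolded unstructured content."""
    tokens = re.split(r"(\s+)", content)  # words at even indices, whitespace at odd
    out: list[str] = []
    i = 0
    while i < len(tokens):
        j = i
        if tokens[i].isascii():
            out.append(tokens[i])  # plain ASCII word: left as it is
        else:
            # Extend the run while the next word also needs encoding, so the
            # whitespace between them ends up inside the encoded-word.
            while j + 2 < len(tokens) and not tokens[j + 2].isascii():
                j += 2
            out.append(encoded_word("".join(tokens[i:j + 1])))
        if j + 1 < len(tokens):
            out.append(tokens[j + 1])  # whitespace after the word or run
        i = j + 2
    return "".join(out)


def encode_parameter(name: str, value: str) -> str:
    """Step 3: render name=value, switching to RFC 2231 form for non-ASCII values."""
    if value.isascii():
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'{name}="{escaped}"'
    # encode_rfc2231 yields charset'language'%XX..., so the name gets a "*".
    return f"{name}*={encode_rfc2231(value, 'UTF-8')}"


print(encode_unstructured("Grüße aus Köln"))
print(encode_parameter("filename", "Grüße.txt"))  # filename*=UTF-8''Gr%C3%BC%C3%9Fe.txt
```

Long non-ASCII values would additionally need the RFC 2231 continuation form (name*0*=, name*1*=, ...), which this sketch leaves out.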
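The heuristic of step 5 amounts to a small scanner. The following Python sketch (function names hypothetical) splits header-content into regions labelled "protected" (matched DQUOTEs, <...> or [...]), "comment" (properly nested parentheses) or "phrase" (everything else). An unmatched delimiter is treated as ordinary phrase text, and backslash quoted-pairs are skipped. It does not attempt the ";" parameter detection; the words inside comment and phrase regions would then be handled as in step 2.

```python
def _find_close(s: str, start: int, close: str) -> int:
    """Index of the first unescaped `close` after s[start], or -1 if none."""
    i = start + 1
    while i < len(s):
        if s[i] == "\\":
            i += 2  # skip a quoted-pair
            continue
        if s[i] == close:
            return i
        i += 1
    return -1


def _find_comment_end(s: str, start: int) -> int:
    """Index of the ")" matching the "(" at s[start], allowing nesting, or -1."""
    depth = 0
    i = start
    while i < len(s):
        if s[i] == "\\":
            i += 2
            continue
        if s[i] == "(":
            depth += 1
        elif s[i] == ")":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return -1


def classify_regions(content: str) -> list[tuple[str, str]]:
    """Split header-content into ("protected" | "comment" | "phrase", text) regions."""
    closers = {'"': '"', "<": ">", "[": "]"}
    regions: list[tuple[str, str]] = []
    phrase = ""  # text accumulated for the current phrase region
    i = 0
    while i < len(content):
        ch = content[i]
        if ch in closers:
            kind, end = "protected", _find_close(content, i, closers[ch])
        elif ch == "(":
            kind, end = "comment", _find_comment_end(content, i)
        else:
            kind, end = "phrase", -1
        if end == -1:
            phrase += ch  # ordinary character, or a delimiter with no partner
            i += 1
            continue
        if phrase:
            regions.append(("phrase", phrase))
            phrase = ""
        regions.append((kind, content[i:end + 1]))
        i = end + 1
    if phrase:
        regions.append(("phrase", phrase))
    return regions


for kind, text in classify_regions("Jürgen Müller (Köln) <jm@example.org>"):
    print(f"{kind:9} {text!r}")
```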
In all cases, there are additional restrictions imposed by [RFC 2047]
regarding the size, placement and contents of encoded-words which
MUST be observed. Moreover, these transformations MUST be applied
both within the header of the article and within any body part
headers (including the headers of any message/rfc822). It is
generally preferable for encodings to use the charset UTF-8, although
it would be wise first to confirm that UTF-8 is indeed the charset
that was used (see 4.4.1).
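As an illustration of the size restriction: RFC 2047 limits an encoded-word to 75 characters and forbids splitting a multi-octet character across adjacent encoded-words. A minimal Python sketch, assuming UTF-8 and the "B" encoding, splits text greedily at character boundaries. It returns a list of encoded-words that the caller would separate with FWS (that whitespace is ignored when decoding), and it does not deal with the separate 76-character limit on header lines containing encoded-words. The names are hypothetical.

```python
import base64

PREFIX = "=?UTF-8?B?"
SUFFIX = "?="
MAX_ENCODED_WORD = 75  # RFC 2047, section 2: encoded-word length limit


def max_octets_per_word(limit: int = MAX_ENCODED_WORD) -> int:
    """Largest number of UTF-8 octets whose base64 form fits in one encoded-word."""
    room = limit - len(PREFIX) - len(SUFFIX)  # characters left for encoded-text
    return (room // 4) * 3  # base64 turns each 3 octets into 4 characters


def encode_words(text: str, limit: int = MAX_ENCODED_WORD) -> list[str]:
    """Split text into UTF-8 "B" encoded-words of at most `limit` characters.

    Splits happen only between characters, never inside a multi-octet
    UTF-8 sequence, as RFC 2047 section 5 requires.
    """
    max_octets = max_octets_per_word(limit)
    words: list[str] = []
    chunk = b""
    for ch in text:
        octets = ch.encode("utf-8")
        if chunk and len(chunk) + len(octets) > max_octets:
            words.append(PREFIX + base64.b64encode(chunk).decode("ascii") + SUFFIX)
            chunk = b""
        chunk += octets
    if chunk:
        words.append(PREFIX + base64.b64encode(chunk).decode("ascii") + SUFFIX)
    return words


print(max_octets_per_word())  # 45 octets -> 60 base64 characters -> 72-character word
subject = "Grüße aus Köln und München " * 4
for word in encode_words(subject):
    print(len(word), word)
```

Passing a smaller limit is one way to leave room for the header name and folding whitespace on the first line.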
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133 Web: http://www.cs.man.ac.uk/~chl
Email: chl(_at_)clw(_dot_)cs(_dot_)man(_dot_)ac(_dot_)uk Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9 Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5