Re: Questions about 1522 encoding scheme


RFC 1522 states in section 5 that:

       Ordinary ASCII text and encoded-words may appear together in the
       same header field.  However, an encoded-word that appears in a
       header field defined as "*text" MUST be separated from any
       adjacent encoded-word or "text" by linear-white-space.

and

       A "Q"-encoded encoded-word which appears in a comment MUST NOT
       contain the characters "(", ")" or " encoded-word that appears in
       a "comment" MUST be separated from any adjacent encoded-word or
       "ctext" by linear-white-space.


BTW, this paragraph contains a typo that was apparently caused by
pasting formatted text (from the internet-draft) into the nroff source
for the RFC.  The original nroff source for the internet-draft reads:

] A "Q"-encoded \%encoded-word which appears in a comment MUST NOT
] contain the characters "(", ")" or "\\".  In addition, an
] \%encoded-word that appears in a "comment" MUST be separated from any
] adjacent \%encoded-word or "ctext" by \%linear-white-space.

and in section 6.2

   When displaying a particular header field that contains multiple
   encoded-words, any linear-white-space that separates a pair of
   adjacent encoded-words is ignored.  (This is to allow the use of
   multiple encoded-words to represent long strings of unencoded text,
   without having to separate encoded-words where spaces occur in the
   unencoded text.)

So, as far as I can see, the line

      =?US-ASCII?Q?Keith_?=    =?US-ASCII?Q?Moore?=

should be presented as

      Keith Moore

...because the space between the encoded words should be deleted
when displaying the text. So, the space that must appear in the
displayed text has to be inside one of the encoded words.

But, what happens when you mix encoded words with text?
For example:

      =?US-ASCII?Q?Keith_?=   Moore

You must have space between the encoded word and the text, so
I presume that the spaces after the encoded word should be
deleted.


This is not correct.  Only linear-white-space between adjacent
encoded-words should be "deleted".  

(More precisely, the linear-white-space that separates adjacent
encoded-words should be "ignored" for the purposes of display.  1522
deliberately does not speak in terms of translating between 8-bit and
7-bit headers because there is no definition for the proper syntax of
8-bit headers.)
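
To make the display rule concrete, here is a minimal sketch in Python.
It is only an illustration under stated assumptions, not a conforming
implementation: the names ENCODED_WORD, decode_word and display are
made up for this example, the regular expression is a simplification
of the encoded-word grammar, and it does not check the length limit
or the context (text, comment, phrase) in which the word appears.

```python
import base64
import re

# Simplified pattern for one encoded-word: =?charset?encoding?encoded-text?=
# (hypothetical helper; the real grammar is stricter than this).
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([QqBb])\?([^?\s]*)\?=")


def decode_word(charset, encoding, text):
    """Decode the encoded-text of a single encoded-word to a string."""
    if encoding in "Bb":
        raw = base64.b64decode(text)
    else:
        # "Q" encoding: "_" stands for a space, "=XX" for one octet in hex.
        raw = re.sub(
            rb"=([0-9A-Fa-f]{2})",
            lambda m: bytes.fromhex(m.group(1).decode("ascii")),
            text.replace("_", " ").encode("ascii"),
        )
    return raw.decode(charset)


def display(field_body):
    """Return the text to display for a header field body.

    White space is dropped only when it is the sole content between
    two adjacent encoded-words; everything else is kept unchanged.
    """
    out = []
    pos = 0
    prev_was_encoded = False
    for m in ENCODED_WORD.finditer(field_body):
        gap = field_body[pos:m.start()]
        if not (prev_was_encoded and gap.isspace()):
            out.append(gap)
        out.append(decode_word(*m.groups()))
        pos = m.end()
        prev_was_encoded = True
    out.append(field_body[pos:])
    return "".join(out)


print(repr(display("=?US-ASCII?Q?Keith_?=    =?US-ASCII?Q?Moore?=")))
print(repr(display("=?US-ASCII?Q?Keith_?=   Moore")))
```

With this sketch the first call yields 'Keith Moore', because the
white space between the two encoded-words is ignored.  The second
call keeps the three spaces before the ordinary text, so the result
is 'Keith ' followed by '   Moore'.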

But the RFC says, as you can see, that "linear whitespace", not
_one_ space character, is to be deleted, so the encoded word:

      =?US-ASCII?Q?Keith?=         Moore

should be displayed as

      KeithMoore


That's correct.

So, now my question/statement which I must have confirmed:

   Any whitespace in a text that is to be encoded, or adjacent
   to text that is to be encoded, must also be encoded?


The short answer to your question is "no".  If you decide that you
must encode a portion of a header, only the white space *internal* to
that string must be explicitly encoded.  Leading and trailing white
space may remain outside of the encoded portion.
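
As a rough illustration of that answer, the following Python sketch
"Q"-encodes a string while leaving its leading and trailing white
space outside the encoded-word.  The function name q_encode_phrase
and its default charset are assumptions made for this example.  It
encodes conservatively (only ASCII letters and digits are left as
they are) and does not split long strings to respect the length limit
on encoded-words.

```python
def q_encode_phrase(phrase, charset="ISO-8859-1"):
    """Q-encode phrase, keeping leading/trailing white space unencoded."""
    core = phrase.strip(" \t")
    if not core:
        return phrase  # nothing to encode
    lead = phrase[: len(phrase) - len(phrase.lstrip(" \t"))]
    trail = phrase[len(phrase.rstrip(" \t")):]

    encoded = []
    for byte in core.encode(charset):
        ch = chr(byte)
        if ch == " ":
            encoded.append("_")           # internal space must be encoded
        elif ch.isascii() and ch.isalnum():
            encoded.append(ch)            # safe to leave as-is
        else:
            encoded.append("=%02X" % byte)  # everything else as =XX
    return "%s=?%s?Q?%s?=%s" % (lead, charset, "".join(encoded), trail)


print(repr(q_encode_phrase(" Keith Moore  ", "US-ASCII")))
```

The internal space becomes "_" inside the encoded-word, while the one
leading and two trailing spaces stay outside it.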

Note that in practice RFC 822 headers may be subject to a lot of
rearranging and/or rewrapping, so there is no guarantee that

header:  a
 b                         c

will not end up being displayed as 

header:  a b c

However, I would not expect such white space to be deleted entirely.
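
For illustration only, this is the kind of rewrapping I have in mind,
sketched in Python.  The function name rewrap is hypothetical, and
real mail software may rearrange headers differently; the point is
just that runs of white space can shrink to a single space without
disappearing.

```python
import re


def rewrap(field_body):
    """Unfold a header field body and collapse runs of white space."""
    # Unfolding: drop a line break that is followed by white space.
    unfolded = re.sub(r"\r?\n(?=[ \t])", "", field_body)
    # Collapse each run of spaces and tabs to a single space.
    return re.sub(r"[ \t]+", " ", unfolded)


print(repr(rewrap("a\n b                         c")))
```

Here the folded body is displayed as 'a b c'.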

Keith