Re: quoted-phrase in content-disposition header

+ They were also designed only for "free-form" text, not to
  transparently convey arbitrary data.  For instance, there aren't
  good rules for when an encoded value begins and ends.
     
+ There are complicated rules that say when certain characters can
  appear unencoded in certain contexts (in a phrase, in *text, or in
  a comment).  
     
+ The RFC on encoded-words is written in terms of how encoded-words
  are to be *displayed*, and not as a mapping between an unencoded
  string of octets and an encoded one.  So there is inherent ambiguity,
  for example, in how to treat white space between adjacent 
  encoded-words when they appear in a parameter.
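
To make the ambiguity concrete, here is a minimal Python sketch. It is
not taken from any RFC or existing library: the names ENCODED_WORD,
decode_payload, decode_value and the keep_gap flag are hypothetical,
and the charset label is ignored so that only octets are compared.
It decodes the same parameter value under two policies: the display
rule, which drops white space between adjacent encoded-words, and a
literal reading, which keeps it.

```python
import base64
import re

# Hypothetical, simplified pattern for =?charset?encoding?payload?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([QqBb])\?([^?\s]*)\?=")


def decode_payload(encoding, payload):
    """Return the octets carried by one encoded-word payload."""
    if encoding.upper() == "B":
        return base64.b64decode(payload)
    # "Q" encoding: "_" stands for a space, "=XX" for the octet 0xXX.
    raw = payload.replace("_", " ").encode("ascii")
    return re.sub(rb"=([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), raw)


def decode_value(value, keep_gap):
    """Decode every encoded-word in value.

    keep_gap=False applies the display rule (white space between two
    adjacent encoded-words is dropped); keep_gap=True treats that
    white space as part of the data.
    """
    out = bytearray()
    pos = 0
    prev_was_encoded = False
    for m in ENCODED_WORD.finditer(value):
        gap = value[pos:m.start()]
        if keep_gap or not (prev_was_encoded and gap.isspace()):
            out += gap.encode("ascii")
        out += decode_payload(m.group(2), m.group(3))
        pos = m.end()
        prev_was_encoded = True
    out += value[pos:].encode("ascii")
    return bytes(out)


value = "=?US-ASCII?Q?a?= =?US-ASCII?Q?b?="
print(decode_value(value, keep_gap=False))  # display-rule reading
print(decode_value(value, keep_gap=True))   # literal reading
```

The two calls yield different octet strings from the same input.  For
display text the difference is cosmetic; for a filename or any other
parameter that must round-trip exactly, it is data corruption.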



Greg writes:

] I see this as a bug in the encoded word spec.  It seems that a proper
] examination of that text can tighten up the rules for encoded words
] such that they would be precise enough for this usage.  I don't see
] free form text as an opportunity to be imprecise with the encoding.

It's not a bug, it's a feature!  I deliberately chose not to make
encoded-words suitable for encoding arbitrary kinds of binary data.
Here's why:

An encoding scheme that fulfilled the requirements for encoded-words
(compatible with RFC 822 atoms, yet distinguishable from them; ability
to mix encoded text with unencoded text; ability to sort-of read ASCII
portions of encoded text; ability to survive line rewrapping) would
not work well for arbitrary data.  In particular:

+ the format used by encoded-words is cumbersome (owing to the need to
  make them sufficiently ugly that they will never be confused with
  ordinary RFC 822 atoms), and

+ the "mix with plain text" rules basically prevent the use of
  encoded-words as a binary encoding in any environment where
  headers can be re-wrapped.

Likewise, a general encoding scheme for binary parameters needs to be
very simple to implement correctly, so that parameters will not be
corrupted by implementations.  Encoded-words do not meet this
requirement.

] In particular, we can specify what whitespace rules to use for
] separating encoded words.  We can also loosen some restrictions as to
] their length, making the encoded word length restriction only a
] function of its use in an RFC822 header.

You can do all of these things, but they would only serve to make RFC
1521 more complicated than it already is.  Trying to make encoded-words
more general *increases* the number of special cases, as well as the
opportunities for confusion between these cases in the minds of
implementors.  

Q1: Which characters can appear in "Q" encoding in an encoded-word
    within a comment within a Content-Disposition filename parameter?  

Q2: What is the chance that the guy implementing a MIME gateway for
    BooMail Corp's LanPigeon (tm) is going to get this stuff right?

] None of these changes is
] incompatible with current usage or the spec itself, but they do
] prepare the encoded word for use in the many places where it can be
] useful.

While this might seem to be desirable, I contend that it is not.

Without the requirement to be backward compatible with RFC 822 atoms
in phrases, the encoding can be much simpler.  For new applications,
it is easier to design an encoding scheme that is simple to decode
correctly than it is to extend an RFC 1522 encoder and decoder to
handle the extra special cases for new applications of encoded-words.
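
As a rough illustration of how small such a scheme can be, here is a
Python sketch of a hex-escape encoding for parameter values.  It is
purely hypothetical and is not a scheme proposed in this message or
defined by any RFC cited here: the SAFE set, the "%" escape character,
and the names encode_param and decode_param are all assumptions made
for the example.

```python
# Octets passed through unchanged; everything else becomes %XX.
SAFE = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-._"
)


def encode_param(data: bytes) -> str:
    """Map arbitrary octets to a string of safe ASCII characters."""
    return "".join(chr(b) if b in SAFE else "%%%02X" % b for b in data)


def decode_param(text: str) -> bytes:
    """Inverse of encode_param; raises ValueError on malformed input."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "%":
            if i + 3 > len(text):
                raise ValueError("truncated escape at offset %d" % i)
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)


encoded = encode_param(b"a b.txt")
print(encoded)
print(decode_param(encoded))
```

There is one rule for encoding and one for decoding, with no
dependence on context (phrase, text, or comment), and the encoded
form never contains white space.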

And if you make the new scheme look too much like encoded-words, it'll
just invite confusion between the two.  It certainly doesn't make it
easier for the implementor.

Keith