ietf-822

quoted-phrase in content-disposition header

1995-01-13 14:20:30

I'm a bit concerned about the use of 'quoted-phrase' in the
Content-Disposition draft.  If nothing else, encoded-words aren't
allowed within double quotes in other headers, and I would rather not
make an exception for Content-Disposition.

I have two alternative suggestions:

---------
Alternative 1:

I suggest using other delimiters than quotes, like '[' and ']'.  ("["
and "]" aren't in appendix B of 1521, but they *are* used in RFC 822,
so I think they're safe enough.)

I'd also suggest making the encoding mechanism be legal for other
content-disposition parameters, so there's no need to invent a new way
to encode non-ascii text every time someone adds a new parameter.  

So:

--

parameter = attribute [ "=" value ]

attribute = token

value = ( token / quoted-string / encoded-word-list )

encoded-word-list = "[" *( encoded-word / atom ) "]"

Encoded-words would have to be encoded as if they appeared in a
'phrase', that is, specials would have to be encoded if the "Q"
encoding were used.
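
A minimal Python sketch of how a receiver might split such a value
into its items.  The function name and regular expressions are
illustrative assumptions, not part of the proposal; the atom class
follows the RFC 822 definition (no specials, SPACE, or CTLs), and the
encoded-text class is the restricted set RFC 1522 allows inside a
'phrase'.  Folding and comments are not handled.

```python
import re

# Encoded-word restricted to the characters RFC 1522 permits in a 'phrase'
# (this set also covers the base64 alphabet used by the "B" encoding).
ENCODED_WORD = re.compile(r'=\?[^?\s]+\?[QqBb]\?[A-Za-z0-9!*+\-/=_]*\?=')
# RFC 822 atom: any characters except specials, SPACE, and CTLs.
ATOM = re.compile(r'[^\s()<>@,;:\\".\[\]\x00-\x1f\x7f]+')


def parse_encoded_word_list(value):
    """Split '[ item item ... ]' into (kind, text) pairs (hypothetical helper)."""
    value = value.strip()
    if not (value.startswith("[") and value.endswith("]")):
        raise ValueError("encoded-word-list must be delimited by [ and ]")
    items = []
    for item in value[1:-1].split():
        # Check encoded-word first: it would otherwise also match as an atom.
        if ENCODED_WORD.fullmatch(item):
            items.append(("encoded-word", item))
        elif ATOM.fullmatch(item):
            items.append(("atom", item))
        else:
            raise ValueError(f"not an encoded-word or atom: {item!r}")
    return items


print(parse_encoded_word_list("[=?ISO-8859-1?Q?r=E9sum=E9?= draft]"))
```

Under this reading an unencoded "foo.txt" is rejected, because "." is
an RFC 822 special and so cannot appear in an atom; it would have to
be carried inside an encoded-word.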

---------
Alternative 2:


Encoded-words were designed to solve a problem with a very narrow
solution space -- encoding of the human readable text portions of rfc
822 message headers.  They are okay for that purpose, but I hate to
see them crop up everywhere, for the following reasons:

+ They were also designed only for "free-form" text, not to
  transparently convey arbitrary data.  For instance, there aren't good
  rules for when an encoded value begins and ends.

+ There are complicated rules that say when certain characters can
  appear unencoded in certain contexts (in a phrase, in *text, or in a
  comment).  

+ The RFC on encoded-words is written in terms of how encoded-words
  are to be *displayed*, and not as a mapping between an unencoded
  string of octets and an encoded one.  So there is inherent ambiguity,
  for example, in how to treat white space between adjacent encoded-words
  when they appear in a parameter.
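
The white space ambiguity in the last point can be sketched in Python.
Both helpers below are hypothetical and deliberately minimal: the
decoder handles only the "Q" and "B" encodings, and the flag
drop_space_between_words selects between two plausible readings of
the same parameter value.

```python
import base64
import re

EW = re.compile(r"=\?([^?\s]+)\?([QqBb])\?([^?\s]*)\?=")


def decode_word(match):
    """Decode one encoded-word match into a string."""
    charset, encoding, text = match.group(1), match.group(2).upper(), match.group(3)
    if encoding == "B":
        raw = base64.b64decode(text)
    else:
        # "Q": underscore stands for SPACE, =XX is a hex-encoded octet.
        raw = re.sub(rb"=([0-9A-Fa-f]{2})",
                     lambda h: bytes([int(h.group(1), 16)]),
                     text.replace("_", " ").encode("ascii"))
    return raw.decode(charset)


def decode_value(value, drop_space_between_words):
    """Decode a value, optionally dropping white space between adjacent encoded-words."""
    out, pos, prev_was_word = [], 0, False
    for m in EW.finditer(value):
        gap = value[pos:m.start()]
        if not (drop_space_between_words and prev_was_word and gap.strip() == ""):
            out.append(gap)
        out.append(decode_word(m))
        pos, prev_was_word = m.end(), True
    out.append(value[pos:])
    return "".join(out)


value = "=?US-ASCII?Q?annual?= =?US-ASCII?Q?report?="
print(decode_value(value, drop_space_between_words=True))   # annualreport
print(decode_value(value, drop_space_between_words=False))  # annual report
```

Two implementations that disagree on this flag would derive two
different filenames from the same header.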

These rules have a large potential for being misunderstood or
mis-implemented.  If this only affects how message headers are
displayed, that's not a big deal -- e.g. if an extra space is
displayed after someone's name, it probably won't actually break
anything.  But if this mechanism is used for transmitting a filename,
and the filename gets corrupted because of incompatibility, it would
be considered a serious interoperability problem.

So I would prefer that the use of encoded-words be confined to
free-form text fields.

--

For parameters, I'd rather see an encoding scheme that allowed not
just character data, but also binary parameters encoded with base64.
Something like:

value = ( token / quoted-string / base64-chunk-list )

base64-chunk-list = [ charset ] "[" *base64-chunk "]"

base64-chunk = 1*17 ( 4*4 ( b64char ) )

b64char = Any of the ASCII characters: 
          "A"-"Z", "a"-"z", "0"-"9", "+", "=", "/"

A token or quoted-string implicitly has an ASCII value.  For a
base64-chunk-list, if charset is omitted, the value is either ASCII or
binary, as appropriate for that parameter; otherwise charset is the
name of a MIME charset.

Any amount of linear-white-space (including "CRLF SPACE") and/or
comments may appear between base64-chunks, but they are ignored when
decoding.
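
A rough Python sketch of a decoder for this grammar.  The function
name and regular expressions are illustrative assumptions.  It skips
white space between chunks but does not handle comments, and it
assumes each chunk can be decoded on its own and the results
concatenated, which the grammar suggests (chunks are multiples of four
characters) but does not state.

```python
import base64
import re

# [ charset ] "[" ... "]"
VALUE = re.compile(r"([^\s\[\]]*)\[([^\[\]]*)\]")
# base64-chunk = 1*17 ( 4*4 ( b64char ) )
CHUNK = re.compile(r"(?:[A-Za-z0-9+/=]{4}){1,17}")


def decode_chunk_list(value):
    """Return str if a charset is given, else the raw bytes (hypothetical helper)."""
    m = VALUE.fullmatch(value.strip())
    if m is None:
        raise ValueError("not a base64-chunk-list")
    charset, body = m.group(1) or None, m.group(2)
    chunks = body.split()  # linear white space (including CRLF SPACE) is ignored
    for chunk in chunks:
        if not CHUNK.fullmatch(chunk):
            raise ValueError(f"bad base64-chunk: {chunk!r}")
    raw = b"".join(base64.b64decode(c, validate=True) for c in chunks)
    # Without a charset the caller decides whether the octets are ASCII or binary.
    return raw.decode(charset) if charset else raw


print(decode_chunk_list("ISO-8859-1[culz \r\n dW3p]"))
print(decode_chunk_list("[AAEC]"))
```

Because the mapping is from one string of octets to another, folding
the value across lines cannot change the decoded result.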

--

This might seem less general than encoded-words, because it doesn't
let you mix character sets in a single parameter.  However, you can
still use MIME charsets that allow charset switching (e.g.
ISO-2022-JP).

The 1522 encoded-word mechanism needed to be able to use multiple
charsets per header field (at least) because a single To: header could
have many different kinds of names in it.  I don't see the same need
for content-disposition parameters.  In particular, the filename
parameter will probably use the same charset used by the sender's
locale.

(Gee, I'd hate to have to write code that tried to convert an
arbitrary list of (charset,string) pairs into something that would
look about the same on the local filesystem.  Though I'm sure Ned
already has something in place that does just that, it seems like a
lot of overhead just to implement content-disposition...)

Keith
