Re: PROBLEM: Newlines & Quoted-printable

Excerpts from mail: 25-Feb-92 Re: PROBLEM: Newlines & Qu.. Ned Freed(_at_)HMCVAX(_dot_)CLAREMO (10608)

This seems so easy to resolve I must be missing something. For starters,
the encoder reads the input material. It must know what constitutes a
"line break" in whatever the input material is. For text, this is going to
be whatever the local newline convention or conventions are. Most other
things by and large don't have line break conventions that make sense to
recognize, so there are no line breaks to deal with. (Certainly the two
examples I gave above don't have newline mechanisms that make any sense.)


I think you are indeed missing something.  Let me try to make this very
concrete.  I'm sending you mail from a UNIX system and it has two lines:

Me and My
Big Mouth

each of which is terminated, in UNIX convention, by LF.  Now, I want
(who knows why) to encode it in q-p.  Under the existing scheme, this
would make no change to this particular data, because the LF is the
local newline convention and can represent itself in q-p.  Under Alain's
proposal, however, I might choose to represent this as 

Me and My=0D=0ABig Mouth

OK, fine.  Now I pass this off to sendmail, a pre-MIME piece of software
which knows to convert LF to CRLF for SMTP, but doesn't know anything
about q-p.  It passes it off to another sendmail, and the mail ends up
on another UNIX system, unchanged.  There, a UA tries to show it to the
recipient.  It's a MIME-smart UA, so it knows how to decode
quoted-printable.  So what does it do with the "=0D=0A"?  Well, that
DEPENDS on whether it thinks of the data as line-oriented or binary.  If
it's line-oriented, it's going to change it to LF before displaying it. 
If it's binary, it's going to change it to CRLF.  But there's no way to
tell which is right!  The virtue of the old scheme is that, because it
never encodes line breaks as anything but line breaks, no existing
transport software needs to change.
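To make the receiving side concrete, here is a small Python sketch (Python and its standard quopri module are just my choice for illustration, not part of anyone's proposal).  The two functions are hypothetical stand-ins for a line-oriented UA and a binary-minded UA; both decode the same q-p correctly and still disagree about what the result is.

```python
import quopri

# Encoded body from the example: the UNIX newline between the two lines
# was written as an escaped CRLF rather than as a real line break.
encoded = b"Me and My=0D=0ABig Mouth"

# Decoding itself is mechanical: =0D=0A always yields the two bytes CR LF.
decoded = quopri.decodestring(encoded)

def show_as_text(data: bytes) -> bytes:
    """Hypothetical line-oriented UA on UNIX: CRLF is a line break,
    so it is turned into the local newline (LF)."""
    return data.replace(b"\r\n", b"\n")

def save_as_binary(data: bytes) -> bytes:
    """Hypothetical binary-minded UA: every byte is data and is kept."""
    return data

print(show_as_text(decoded))    # b'Me and My\nBig Mouth'
print(save_as_binary(decoded))  # b'Me and My\r\nBig Mouth'
```

Both UAs are "right" by their own lights; nothing in the encoded text says which one the sender meant.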

When a newline is encountered it is encoded as such. Both base64 and 
quoted-printable admit the possibility of encoding it as a 0D0A. 
Quoted-printable also admits the use of line break as the encoding for a
line break (note that these are NOT the same thing -- one is the encoding
for the other).


This is the heart of the matter.  The existing q-p definition doesn't
just "admit" the use of a line break for this purpose, it REQUIRES it. 
It makes it perfectly clear that if you have a line break, it represents
a line break, and if you have =0D=0A, it represents the two bytes CRLF. 
In the Fontaine proposal, the meaning of =0D=0A becomes ambiguous.  
Is this clearer now?
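For comparison, a sketch of the existing rule using Python's binascii.b2a_qp, whose istext flag happens to map onto exactly this distinction: with istext=True, line breaks are written as line breaks; with istext=False, CR and LF bytes are escaped.  The sample data is made up for illustration.

```python
import binascii

# Text from a UNIX system: two lines, each ending in the local newline (LF).
text = b"Me and My\nBig Mouth\n"
# Binary data that happens to contain the bytes CR LF.
binary = b"Me and My\r\nBig Mouth"

# Text: line breaks are encoded as line breaks (istext=True).
enc_text = binascii.b2a_qp(text, istext=True)
# Binary: there are no line breaks, so CR and LF are escaped (istext=False).
enc_binary = binascii.b2a_qp(binary, istext=False)

print(enc_text)    # real line breaks, no =0D or =0A escapes
print(enc_binary)  # the CR LF bytes appear as =0D=0A

# Each encoding decodes back to exactly what went in.
assert binascii.a2b_qp(enc_text) == text
assert binascii.a2b_qp(enc_binary) == binary
```

A transport that rewrites the LF line breaks in enc_text as CRLF (as sendmail does) changes nothing that matters, since they are line breaks either way, and it never touches the =0D=0A in enc_binary.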

Suppose I never use the line breaks in quoted-printable. I always explicitly
code the 0D0A or whatever into the stream. (This is, in fact, what I believe
is necessary to represent many things in quoted-printable.) Why is this
fragile in any way?


Because I now can't tell the difference between a line break and a CRLF
sequence in binary data.  And unless we're willing to say that
"quoted-printable data is always line-oriented, never binary" this is a
problem.  
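The same point from the encoding side, as a sketch: encode_always_escaped is a hypothetical encoder (my name, not from any draft) that never emits a real line break for data and instead canonicalizes text newlines to CRLF before escaping them.  A bit of UNIX text and a piece of binary data that contains CR LF then produce identical output.

```python
import binascii

def encode_always_escaped(data: bytes, is_text: bool,
                          local_newline: bytes = b"\n") -> bytes:
    """Hypothetical encoder in the proposed style: text newlines are
    canonicalized to CRLF, then every CR and LF is escaped as =0D / =0A.
    (b2a_qp would still add soft line breaks for lines over 76 characters.)"""
    if is_text:
        data = data.replace(local_newline, b"\r\n")
    return binascii.b2a_qp(data, istext=False)

unix_text = b"Me and My\nBig Mouth"     # two lines of text
binary = b"Me and My\r\nBig Mouth"      # binary data containing CR LF

enc_from_text = encode_always_escaped(unix_text, is_text=True)
enc_from_binary = encode_always_escaped(binary, is_text=False)

# True: the decoder has no way to tell which kind of input it received.
print(enc_from_text == enc_from_binary)
```

Nothing in the encoded form records which kind of input it came from, so that information has to come from somewhere else, such as a rule that q-p is never used for binary data.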

I see two possible solutions.  Leaving the text as-is is fine with me,
because I don't think there are any real problems with it.  Changing it
as Alain suggests, but adding the proviso that quoted-printable is NOT
to be used for binary data, is also fine with me, since I don't think
anyone in their right mind wants to use it that way anyway.  But I'm not
happy with making Alain's change without explicitly declaring q-p to be
unsuitable for binary data.  -- Nathaniel