ietf-822

Re: Is 8BIT ESMTP really needed

2001-05-14 13:56:50
> The mailbot has suddenly stopped accepting posts to the list. This is a
> repost.

> In <01K3E3D0XPV6002XZW(_at_)mauve(_dot_)mrochek(_dot_)com>
> ned(_dot_)freed(_at_)mrochek(_dot_)com writes:

>> What about them? AFAIK there has never been an application type
>> defined which specified that some sort of line ending canonicalization
>> is to be performed.
>>
>> Or not performed.

> Surely there are plenty of application types which simply say "this data
> consists of lines of text in the following format" with the expectation
> that machines will be able to act on it, and that they will know what the
> end of a line looks like when they see one (there is a general expectation
> in the Unix world that lines of text will be terminated by NL, and in the
> DOS world that they will be terminated by CRLF).

Of course there are, but this doesn't mean that canonicalization rules have
been defined for such types. Lacking such canonicalization rules (or
alternatively lacking knowledge of such rules), application content encoded as
quoted-printable or base64 has to be treated as binary; just because you might
be able to switch CR or LF or whatever doesn't mean you're allowed to.
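A minimal Python sketch of the distinction drawn above, assuming a hypothetical sender-side helper `to_canonical`: for text/* bodies it rewrites local line endings (bare LF, bare CR, or an existing CRLF) to CRLF, while any application/* body, for which no canonicalization rules are defined, is passed through byte for byte.

```python
import re

# CRLF is listed first so an existing CRLF is matched as one unit
# rather than as a CR followed by a separate LF.
_LINE_END = re.compile(rb"\r\n|\r|\n")


def to_canonical(body: bytes, media_type: str) -> bytes:
    """Hypothetical helper: canonicalize line endings for text types only.

    Application types are treated as opaque binary and returned unchanged,
    since no line-ending canonicalization rules are defined for them.
    """
    if not media_type.lower().startswith("text/"):
        return body
    return _LINE_END.sub(b"\r\n", body)


print(to_canonical(b"one\ntwo\rthree\r\n", "text/plain"))
# b'one\r\ntwo\r\nthree\r\n'
print(to_canonical(b"one\ntwo", "application/postscript"))
# b'one\ntwo'
```

The media-type check is the whole point: the same bytes are rewritten or left alone depending only on whether rules exist for the type.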

> Indeed, we defined three such application types in the USEFOR draft with
> the expectation that the "system" would ensure they went out on the wire
> with CRLF on them as part of its normal operation (the usual encoding
> being 8bit). Should we have said more?

No. When you send out something with an encoding of 8bit, you're asserting that
the canonical form of the material is lines terminated by CRLFs. That's all
that's needed -- the correct way to encode this into quoted-printable or base64
is obvious. Now, once you're there you're basically stuck with it when you're
an application type. As I said before, upconversion to 7bit or 8bit is more or
less restricted to text types. (And the concerns that drive the desire
to upconvert only apply to text anyway...)
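To make the "obvious" encoding concrete: if the 8bit form is CRLF-terminated lines, a base64 encoding simply carries those CRLFs as ordinary data, and decoding restores them exactly. A minimal sketch using Python's standard `base64` module; the helper name is hypothetical, and rejecting bare CR or LF is an assumption about how a sender might guard against non-canonical input.

```python
import base64


def encode_canonical_as_base64(canonical: bytes) -> bytes:
    """Hypothetical helper: base64-encode a body already in canonical form.

    The CRLF line ends are part of the encoded data, so decoding yields
    exactly the bytes an 8bit transfer would have carried.
    """
    stripped = canonical.replace(b"\r\n", b"")
    if b"\r" in stripped or b"\n" in stripped:
        raise ValueError("body has bare CR or LF; not in canonical CRLF form")
    # encodebytes wraps output at 76 characters using b"\n";
    # MIME requires CRLF between encoded lines on the wire.
    return base64.encodebytes(canonical).replace(b"\n", b"\r\n")


body = b"line one\r\nline two\r\n"
wire = encode_canonical_as_base64(body)
# b64decode (validate=False) discards the CRLFs between encoded lines.
assert base64.b64decode(wire) == body
```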

>> PDF is pure binary material. There are no line endings in it that
>> can be safely canonicalized. The same is true of PostScript -- if you
>> try to canonicalize the line endings in PostScript, as often as not
>> you'll break it.
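One concrete way canonicalization breaks a PDF: the file's cross-reference table records where each object starts as a byte offset from the beginning of the file, so inserting a CR before every LF moves every later object. A small Python illustration on a toy fragment (not a complete or valid PDF; the helper name is hypothetical):

```python
def naive_lf_to_crlf(data: bytes) -> bytes:
    # Normalize any existing CRLF to LF first, then expand every LF to CRLF.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


# Toy PDF-style fragment; a real file would also have an xref table
# listing the byte offset at which each object starts.
doc = b"%PDF-1.4\n1 0 obj\n<< >>\nendobj\n2 0 obj\n<< >>\nendobj\n"

offset_before = doc.index(b"2 0 obj")
offset_after = naive_lf_to_crlf(doc).index(b"2 0 obj")
print(offset_before, offset_after)  # 30 34: any recorded offset is now wrong
```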

> As a matter of interest, I checked what Ghostscript did, and it accepted
> a naked CR, a naked LF, or CRLF as the end of a line (which is consistent
> with the specification in the Red Book, the PostScript Language
> Reference Manual).
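The tolerant behaviour reported here amounts to a scanner that accepts any of the three conventions as a single end-of-line. A minimal Python sketch of such a splitter (a hypothetical helper, not Ghostscript's actual code):

```python
import re

# Try CRLF first so it counts as a single line end, not two.
_EOL = re.compile(rb"\r\n|\r|\n")


def split_lines(data: bytes) -> list[bytes]:
    """Split on CR, LF, or CRLF, treating each as one end-of-line."""
    return _EOL.split(data)


print(split_lines(b"%!PS\r/x 1 def\nx pop\r\nshowpage"))
# [b'%!PS', b'/x 1 def', b'x pop', b'showpage']
```

This covers ordinary token scanning only; it says nothing about operators that read raw bytes.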

You need to read the specifications more carefully before drawing such
conclusions. While it is true that trivial programs won't be sensitive to the
line terminator you use (in fact they won't care if you use no terminators at
all), there are operators that are sensitive to line terminators, either
because they involve character counts or else because they process binary
material. The readstring operator is one such, but there are others.
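A simplified model of why count-based reads are sensitive: an operator like readstring consumes a fixed number of bytes, whatever they are, so expanding LF to CRLF inside that data changes both what is read and where the program resumes. This Python sketch is hypothetical (the `read_fixed` helper and the byte layout are made up for illustration, and it ignores how the PostScript scanner handles the whitespace character after the operator):

```python
def read_fixed(stream: bytes, pos: int, count: int) -> tuple[bytes, int]:
    """Take exactly `count` bytes starting at `pos`, like a readstring-style
    read, and return (data, new_position)."""
    return stream[pos:pos + count], pos + count


# Six bytes of counted data (containing LFs), then more program text.
original = b"ab\ncd\n" + b"showpage\n"
converted = original.replace(b"\n", b"\r\n")  # naive canonicalization

data, pos = read_fixed(original, 0, 6)
print(data, original[pos:])
# b'ab\ncd\n' b'showpage\n'

data, pos = read_fixed(converted, 0, 6)
print(data, converted[pos:])
# b'ab\r\ncd' b'\r\nshowpage\r\n'
```

After conversion the data read back differs, and the leftover bytes spill into what the interpreter treats as program text.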

                                Ned

