Re: NULL

Masataka Ohta writes:

John G. Meyers writes:

...

There are also various and sundry things such as POP servers,
final-delivery agents (deliver, vacation), mail servers, and so on
which are likely to have problems with NUL.  Also, the existing News
transport cannot deal with NUL.


So, they are broken.

Let's motivate implementors to make the necessary changes so that
the transition to binary will be smooth, which is actually
happening on various transports.



There is one thing which troubles me with the "BINARY" transport,
and either you gentlemen (I mean the group, not anybody personally)
don't consider it an issue, or have missed it:


What happens when I want to process email with MIME structures
in it (with "--ContentBoundaryString"s in it), and there is a
body part with UNICODE 16-bit chars in it containing an explicit
16-bit CRLF:  000D 000A   ?

Now how is my scanner supposed to recognize:
        CRLF --ContentBoundaryString CRLF
so that it can continue processing on the other bodyparts.
(These are in "8-bit" US-ASCII byte sequences, after all..)

Are those boundary-related CRLFs always to be in 8-bit bytes ?
Isn't there some Unicode-encoded value  0D0A, which could cause
problems ?
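
To make the question concrete, here is a minimal sketch (Python) of a
byte-wise delimiter scan.  It assumes big-endian 16-bit encoding and
leaves out part headers; the function name delimiter_offsets is made up
for illustration.  It suggests that a 16-bit CRLF (00 0D 00 0A) never
contains the 8-bit pair 0D 0A, while the single 16-bit value 0D0A does.

```python
BOUNDARY = b"ContentBoundaryString"      # the example name used above
DELIMITER = b"\r\n--" + BOUNDARY         # plain 8-bit bytes


def delimiter_offsets(message: bytes) -> list[int]:
    """Return the byte offset of every delimiter found by a byte-wise scan."""
    offsets = []
    pos = message.find(DELIMITER)
    while pos != -1:
        offsets.append(pos)
        pos = message.find(DELIMITER, pos + 1)
    return offsets


# A 16-bit CRLF is the four bytes 00 0D 00 0A: no adjacent 0D 0A pair.
utf16_crlf = "\r\n".encode("utf-16-be")

# The single 16-bit value 0D0A, however, IS the byte pair 0D 0A.
value_0d0a = "\u0d0a".encode("utf-16-be")

# One 16-bit text part (headers omitted), then an ASCII part.
part = "first\r\nsecond\u0d0a".encode("utf-16-be")
message = part + DELIMITER + b"\r\n" + b"next part" + DELIMITER + b"--\r\n"

offsets = delimiter_offsets(message)
stray_line_breaks = part.count(b"\r\n")  # what a line-wise scanner would see
```

Under these assumptions the byte-wise scan still finds both delimiters,
but a line-wise scanner sees one stray "line end" inside the 16-bit part.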

Before there was MIME, I lived in the "just-send-8" universe, and did (once)
a BINARY transmission of a couple of MB of TeX dvi-file.   I was apparently
lucky, or else the UNIX->UNIX transport did the encoding and decoding of
LF -> CRLF -> LF on line ends exactly symmetrically (likely it did).
That file did get through, and after a cleanup (head & tail), it worked!


Now with MIME "BINARY" I understand that sometimes the UNIX-LF terminated
lines can become rather long, though for state-machine scanners such
matters aren't a problem..  (Coding line-wise scanners is a lot simpler,
though..)

However what WILL BE a problem is the treatment of the binary UNICODE CRLF.
When UNIX sends such, it conventionally assumes that any LF is a valid place
to convert to CRLF on the SMTP output (+- dot-insert/-removal).

If a similarly implemented UNIX SMTP is the receiver, such data will likely
be converted back to the original, but when the receiving SMTP is on any
system with a different end-of-line convention, it will get corrupted data!
                000A -> 00 0D 0A -> 00 0D 0A    (VMS, MSDOS)
                000A -> 00 0D 0A -> 00 0D       (Mac)
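
The two cases above can be reproduced with a minimal sketch (Python).
The three conversion functions are assumptions standing in for naive
SMTP implementations that rewrite every line end byte-wise; they are not
taken from any real MTA.

```python
def unix_to_wire(data: bytes) -> bytes:
    """Naive UNIX sender: every LF byte becomes CRLF on the wire."""
    return data.replace(b"\n", b"\r\n")


def wire_to_unix(data: bytes) -> bytes:
    """Naive UNIX receiver: every CRLF on the wire becomes LF."""
    return data.replace(b"\r\n", b"\n")


def wire_to_crlf_native(data: bytes) -> bytes:
    """Receiver whose local line end is CRLF (VMS, MSDOS): stored as is."""
    return data


def wire_to_mac(data: bytes) -> bytes:
    """Receiver whose local line end is CR (Mac): CRLF becomes CR."""
    return data.replace(b"\r\n", b"\r")


original = "A\n".encode("utf-16-be")     # 00 41 00 0A
wire = unix_to_wire(original)            # 00 41 00 0D 0A

back_on_unix = wire_to_unix(wire)        # 00 41 00 0A, the original
on_vms_or_dos = wire_to_crlf_native(wire)  # 00 41 00 0D 0A, odd length
on_mac = wire_to_mac(wire)               # 00 41 00 0D, LF became CR
```

Only the UNIX receiver restores the original bytes; on the CRLF system
the data is no longer a whole number of 16-bit units.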


Consider also, that UNIX-platforms are likely to use 16-bit LFs
instead of CRLFs!  Now should the scanner:
        1) recognize header reporting  CTE: BINARY,  with
           CT: text/xx; charset=UNICODE-x-x
           and treat its  LF (000A) so that a 16-bit CR is
           generated to accompany it, or
        2) handle things like presently (and likely break)

or should we never encourage the use of BINARY, and instead encourage BASE64 ?
or should we provide an additional document on its use, like:
        - how to do MIME-downgrade with CTE: BINARY
        - how to do SMTP with CTE: BINARY
        - create a new transport which has tagged data format (ASN.1 ?)
          with binary component length information ?  ( Sounds too much
          like X.400...  we could get it right, but hardly as an accepted
          standard... )

In a hypothetical folder format capable of storing binary material along
with its length tag, I would be glad to tag the part content as CTE: BINARY;
however, while living in the imperfect world with previous implementations
(*), I guess I have to store such material in BASE64, don't I ?

(*): The UNIX "mail folder"-format is an awful kludge, but it is one of
     the things with which we have to cope...
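
As a minimal sketch (Python) of the two storage options: the
length-tagged layout below (4-byte big-endian length, then the raw
bytes) is purely an assumption made up for illustration, not an existing
folder format; pack_parts and unpack_parts are hypothetical names.  The
BASE64 variant uses the standard library encoder.

```python
import base64
import struct


def pack_parts(parts: list[bytes]) -> bytes:
    """Store each part as a 4-byte big-endian length followed by raw bytes."""
    return b"".join(struct.pack(">I", len(p)) + p for p in parts)


def unpack_parts(blob: bytes) -> list[bytes]:
    """Read parts back by length alone, never looking at their content."""
    parts = []
    pos = 0
    while pos < len(blob):
        (length,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        parts.append(blob[pos:pos + length])
        pos += length
    return parts


# 16-bit text plus a stray NUL byte: hostile to line-oriented folders.
binary_part = "A\nFrom here\n".encode("utf-16-be") + b"\x00"

tagged = pack_parts([binary_part, b"second part"])
restored = unpack_parts(tagged)

# What a line-oriented folder can hold today: short lines, no NUL bytes.
folder_safe = base64.encodebytes(binary_part)
```

With the length tag no byte value needs escaping; BASE64 pays roughly a
third more space to stay inside what line-oriented folders tolerate.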


I hope you can provide an easy answer, like RFC NNNN  chapter xx.yy.zz.aa ...
(Somehow I doubt it..)

                                              Masataka Ohta


        /Matti Aarnio   <mea(_at_)utu(_dot_)fi> <mea(_at_)nic(_dot_)funet(_dot_)fi>

PS:     I find it quite plausible for the IBM mainframe users to use
        charset=EBCDIC-xxxx, but their MTAs had better be able to
        translate them to the relevant ISO charsets when talking to the
        outside of the EBCDIC universe...    There are other pitfalls
        in that area, which relate to the lack of a fully reversible,
        globally used ASCII-EBCDIC translation table, but I am afraid
        it is a lose/lose situation anyway..