ietf-822

Re: NULL

1994-10-22 10:59:24
Matti Aarnio <mea(_at_)nic(_dot_)funet(_dot_)fi> writes:
> What happens when I want to process email with MIME structures
> in it (with "--ContentBoundaryString"s in it), and there is a
> body part with 16-bit UNICODE characters in it containing an explicit
> 16-bit CRLF: 000D 000A?

> Now how is my scanner supposed to recognize
>         CRLF --ContentBoundaryString CRLF
> so that it can continue processing the other body parts?
> (These are in "8-bit" US-ASCII byte sequences, after all...)

It is supposed to look for the octet sequence:

0D 0A 2D 2D 43 6F 6E 74 65 6E 74 42 6F 75 6E 64 61 72 79 53 74 72
69 6E 67

followed by an optional 2D 2D sequence (marking the final delimiter),
followed by any number of 09 or 20 octets (tab or space), followed by
a 0D 0A sequence.

Just like the MIME grammar says.
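A minimal Python sketch of such an octet-level scanner, assuming the whole body is already in memory as bytes. The function name `find_delimiters` is hypothetical, and, like the description above, it requires a CRLF on both sides of the delimiter line, so it does not handle a delimiter at the very start of the body or a final delimiter with no trailing CRLF.

```python
import re
from typing import List, Tuple


def find_delimiters(body: bytes, boundary: bytes) -> List[Tuple[int, bool]]:
    """Return (offset, is_close) for every delimiter line in body.

    Matches, octet by octet: 0D 0A, 2D 2D, the boundary, an optional
    2D 2D (final delimiter), any run of 09 or 20 octets, then 0D 0A.
    """
    pattern = re.compile(
        rb"\r\n--" + re.escape(boundary) + rb"(--)?[\t ]*\r\n"
    )
    return [(m.start(), m.group(1) is not None) for m in pattern.finditer(body)]


body = (
    b"preamble\r\n--ContentBoundaryString\r\n"
    b"Content-Type: text/plain\r\n\r\nhello"
    b"\r\n--ContentBoundaryString--\r\n"
)
print(find_delimiters(body, b"ContentBoundaryString"))  # [(8, False), (68, True)]
```

Because the pattern is built from bytes, the scanner never decodes the body, so the character set used inside a body part plays no role in finding the boundaries.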

> Are the boundary-related CRLFs always to be in 8-bit bytes?

The MIME grammar is defined in terms of octets.

> Isn't there a Unicode-encoded value 0D0A that could cause
> problems?

It could cause problems only if the body part contained a delimiter
octet sequence, in violation of the MIME spec.
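To illustrate, a sketch under the same assumptions (`contains_delimiter` is a hypothetical helper, and UTF-16BE is assumed as the 16-bit encoding): a body part containing U+0D0A does contain the octet pair 0D 0A, and the 16-bit CRLF 000D 000A becomes 00 0D 00 0A, but neither forms the full delimiter sequence, so a byte-oriented scanner is not fooled.

```python
import re

BOUNDARY = b"ContentBoundaryString"
# Delimiter line: CRLF "--" boundary, optional "--", tabs/spaces, CRLF
DELIMITER = re.compile(rb"\r\n--" + re.escape(BOUNDARY) + rb"(--)?[\t ]*\r\n")


def contains_delimiter(body: bytes) -> bool:
    """Hypothetical helper: True if body holds a complete delimiter line."""
    return DELIMITER.search(body) is not None


# U+0D0A encodes to the octets 0D 0A; the 16-bit CRLF to 00 0D 00 0A.
part = "\u0d0a text \r\n more".encode("utf-16-be")
print(b"\r\n" in part)            # True: a bare 0D 0A octet pair is present
print(contains_delimiter(part))   # False: no full delimiter sequence

# Even the delimiter text itself, once encoded as UTF-16, no longer matches.
print(contains_delimiter("\r\n--ContentBoundaryString\r\n".encode("utf-16-be")))  # False
print(contains_delimiter(b"\r\n--ContentBoundaryString\r\n"))                     # True
```

A composer still has to pick a boundary whose delimiter octets do not occur anywhere in the encoded body part; that is the only case in which a scanner can be misled.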

> However, what WILL be a problem is the treatment of the binary UNICODE CRLF.
> When UNIX sends such data, it conventionally assumes that any LF is a valid place
> to convert to CRLF on SMTP output (plus or minus dot-insertion/removal).

Binary transport won't work in general if binary data (such as
UNICODE) goes through newline-convention conversion. Trying to
determine in a transport whether parts of a message should or should
not go through newline conversion is in general a futile task. I doubt
binary transport will work effectively unless people switch to local
mail formats that leave messages in network canonical format.
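A sketch of the damage described above, assuming a naive sender that inserts CR before every bare LF octet and a naive receiver that collapses every CRLF pair back to LF, with UTF-16BE as the 16-bit encoding. Both functions are hypothetical illustrations, not any particular mailer's code.

```python
def unix_to_smtp(data: bytes) -> bytes:
    """Hypothetical naive sender: insert CR before every bare LF octet."""
    out = bytearray()
    prev = None
    for octet in data:
        if octet == 0x0A and prev != 0x0D:
            out.append(0x0D)
        out.append(octet)
        prev = octet
    return bytes(out)


def smtp_to_unix(data: bytes) -> bytes:
    """Hypothetical naive receiver: collapse every CRLF octet pair to LF."""
    return data.replace(b"\r\n", b"\n")


sent = unix_to_smtp("A\nB".encode("utf-16-be"))   # input: 00 41 00 0A 00 42
print(sent.hex(" "))        # 00 41 00 0d 0a 00 42: odd length, misaligned
print(smtp_to_unix("\u0d0a".encode("utf-16-be")))  # b'\n': U+0D0A destroyed
```

Neither function can tell whether an 0A octet is a text newline or half of a 16-bit code unit, so the conversion corrupts the data in both directions.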

-- 
_.John G. Myers         Internet: jgm+(_at_)CMU(_dot_)EDU
                        LoseNet:  ...!seismo!ihnp4!wiscvm.wisc.edu!give!up
