Matti Aarnio <mea(_at_)nic(_dot_)funet(_dot_)fi> writes:
What happens when I want to process email with MIME-structures
in it (with "--ContentBoundaryString"s in it), and there is a
body-part with UNICODE 16-bit chars in it, containing an explicit
16-bit CRLF: 000D 000A ?
Now how is my scanner supposed to recognize:
CRLF --ContentBoundaryString CRLF
so that it can continue processing on the other bodyparts.
(These are in "8-bit" US-ASCII byte sequences, after all..)
It is supposed to look for the octet sequence:
0D 0A 2D 2D 43 6F 6E 74 65 6E 74 42 6F 75 6E 64 61 72 79 53 74 72
69 6E 67
Followed by an optional 2D 2D sequence, followed by any number of 09
or 20 octets, followed by a 0D 0A sequence.
Just like the MIME grammar says.
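A minimal sketch of such an octet-level scanner in Python, for
illustration only. The function name find_delimiters and the sample
message are made up for this example; the point is that the search runs
over raw octets, so a 16-bit CRLF (00 0D 00 0A) inside a body part never
matches.

```python
import re


def find_delimiters(message: bytes, boundary: bytes):
    """Return (offset, is_final) for every delimiter line in message.

    A delimiter is matched purely as an octet sequence:
    CRLF "--" boundary, an optional "--", any number of TAB/SPACE
    octets, then CRLF.
    """
    pattern = re.compile(
        rb"\r\n--" + re.escape(boundary) + rb"(--)?[ \t]*\r\n"
    )
    return [(m.start(), m.group(1) is not None)
            for m in pattern.finditer(message)]


# Hypothetical body part: 16-bit text with an embedded 16-bit CRLF.
body = "line one\r\nline two".encode("utf-16-be")

message = (
    b"preamble"
    b"\r\n--ContentBoundaryString\r\n"
    b"\r\n"
    + body
    + b"\r\n--ContentBoundaryString--\r\n"
)

delimiters = find_delimiters(message, b"ContentBoundaryString")
```

Only the two real delimiters are reported; the 00 0D 00 0A inside the
body part is not an 0D 0A octet pair, so the scanner passes over it.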
Are those boundary-related CRLF's to be always in 8-bit bytes ?
The MIME grammar is defined in terms of octets.
Isn't there some Unicode-encoded value 0D0A, which could cause
problems ?
It could cause problems only if the body part contained a delimiter
octet sequence, in violation of the MIME spec.
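To illustrate, here is a small Python sketch. It assumes big-endian
16-bit encoding, and the helper name contains_delimiter is made up for
this example. A single 16-bit character with value 0D0A does produce the
octets 0D 0A, but that alone is harmless: the body part only collides if
the complete delimiter octet sequence appears in it.

```python
def contains_delimiter(body: bytes, boundary: bytes) -> bool:
    """True if body contains the octets CRLF "--" boundary."""
    return b"\r\n--" + boundary in body


boundary = b"ContentBoundaryString"

# U+0D0A as a big-endian 16-bit value is exactly the octets 0D 0A.
body = "\u0d0a plain text".encode("utf-16-be")

has_crlf_octets = b"\r\n" in body                 # the octet pair is present
collides = contains_delimiter(body, boundary)     # but no delimiter is
```

A composing agent could run a check like this over every body part and
pick a different boundary string if it ever returns true.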
However what WILL BE a problem is the treatment of the binary UNICODE CRLF.
When UNIX sends such, it conventionally assumes that any LF is a valid place
to convert to CRLF on the SMTP output (+- dot-insert/-removal).
Binary transport won't work in general if binary data (such as
UNICODE) goes through newline convention conversion. Trying to
determine in a transport whether parts of a message should or should
not go through newline conversion is in general a futile task. I doubt
binary transport will work effectively unless people switch to local
mail formats which leave messages in network canonical format.
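A small Python sketch of the damage, assuming big-endian 16-bit text and
a naive converter that rewrites every LF octet as CRLF (the function
name naive_lf_to_crlf is made up for this example):

```python
def naive_lf_to_crlf(data: bytes) -> bytes:
    """Rewrite every LF octet as CR LF, as a naive converter would."""
    return data.replace(b"\n", b"\r\n")


# 16-bit text "a", LF, "b": octets 00 61 00 0A 00 62.
original = "a\nb".encode("utf-16-be")

# After conversion: 00 61 00 0D 0A 00 62.
converted = naive_lf_to_crlf(original)

# The inserted 0D octet shifts everything after it by one octet, so
# the data is no longer a whole number of 16-bit units.
misaligned = len(converted) % 2 == 1
```

The 16-bit LF (00 0A) has been turned into 00 0D 0A, and every 16-bit
unit after it is now split across the wrong octet pair.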
--
_.John G. Myers Internet: jgm+(_at_)CMU(_dot_)EDU
LoseNet: ...!seismo!ihnp4!wiscvm.wisc.edu!give!up