Fixing RFC 1641

I've read the latest MIME draft, and it makes it quite clear that the
unicode-1-1 (16 bit character) character set cannot be used with
text/plain, since it does not follow the CRLF line-break conventions.
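
To illustrate the conflict, here is a minimal sketch in Python. It assumes
Python's "utf-16-be" codec as a stand-in for the 16-bit unicode-1-1 form
(the helper name encode_16bit is hypothetical, not part of any
specification). It shows that an encoded line break is not the octet pair
0D 0A, and that the octet pair 0D 0A can appear where no line break is
meant.

```python
# Sketch only: "utf-16-be" stands in for the 16-bit unicode-1-1 encoding.
# encode_16bit is a hypothetical helper used for illustration.

CRLF = b"\r\n"  # the octet pair 0D 0A that text subtypes use for line breaks


def encode_16bit(text: str) -> bytes:
    """Encode text as big-endian 16-bit code units, with no byte order mark."""
    return text.encode("utf-16-be")


# A real line break becomes 00 0D 00 0A, so the adjacent pair 0D 0A is absent.
line_break = encode_16bit("a\r\nb")
contains_crlf = CRLF in line_break

# Conversely, the single character U+0D0A (MALAYALAM LETTER UU) encodes to
# exactly the octets 0D 0A, which a transport would mistake for a line break.
malayalam_uu = encode_16bit("\u0d0a")

print(line_break.hex(" "))    # octets of "a", CR, LF, "b"
print(contains_crlf)
print(malayalam_uu == CRLF)
```

A transport that canonicalizes or rewraps lines on 0D 0A would therefore
both miss the real line breaks and risk corrupting unrelated characters.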

I'd like some suggestions from this group on how to go about revising RFC
1641 and the unicode-1-1 character set accordingly (unicode-1-1-utf-7 and
the forthcoming unicode-1-1-utf-8 are unaffected, as they are ASCII-derived
and can follow the CRLF conventions).

I would have preferred a new subtype of text, something like text/wide or
text/binary or whatever (none of them good names), but the new draft says
*all* subtypes of text must follow the CRLF conventions.

I basically see two other options. One is to define a new subtype of
application, say application/binarytext or application/othertext or
something (if anyone can think of a name xxxtext where xxx is pithy and
means "not RFC 822 compatible" I'm all ears). This is probably the most
straightforward thing to do. I considered application/unicode, but this
type could be used for other character sets as well, such as EBCDIC. On the
other hand, there may not be enough non-ASCII character sets to bother
making it a universal type, so maybe application/unicode isn't so bad.

One disadvantage to this is that if variants are ever defined (analogous to
text/enriched, etc.), there is no straightforward mechanism to specify them
other than by concatenating them onto the base name, since there is only
one level of subtype. A more radical approach would be to define a new
content type (call it "newtext" for now) that would be like text, but not
subject to 822 backward compatibility issues, and not subject to the
constraint of interoperability with non-MIME systems (none of which will
know what to do with these kinds of character sets anyway). Then you would
have newtext/plain, and so on.

This is a little cleaner, and opens up possibilities for other enhancements
which might be more difficult in the context of the text content type, but
it is clearly a lot more work and more radical. Also, since only a handful
of text subtypes have been defined after a few years, the lack of a
mechanism for variants may not be a big issue.

I would like to fix this problem so that there will be a means of
transmitting Unicode directly, not encoded with UTF-7 or UTF-8, both of
which impose some overhead. Clearly this would not be for interoperability
with non-Unicode or non-MIME sites, but it would be convenient for
communication between sites using Unicode.
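
As a rough illustration of that overhead, the sketch below compares encoded
sizes for a short sample string. It assumes Python's built-in "utf-16-be",
"utf-7", and "utf-8" codecs as stand-ins for unicode-1-1,
unicode-1-1-utf-7, and unicode-1-1-utf-8. The function name encoded_sizes
and the sample text are hypothetical. The result depends heavily on the
text: for mostly-ASCII content the comparison goes the other way.

```python
# Sketch only: Python's codecs stand in for the three charsets discussed.
# encoded_sizes is a hypothetical helper used for illustration.


def encoded_sizes(text: str) -> dict:
    """Return the encoded length in octets of text under each encoding."""
    return {
        "16-bit": len(text.encode("utf-16-be")),  # two octets per character
        "utf-7": len(text.encode("utf-7")),       # base64 runs with shift chars
        "utf-8": len(text.encode("utf-8")),       # one to three octets here
    }


# Three CJK characters: the direct 16-bit form is the most compact.
cjk_sizes = encoded_sizes("日本語")

# Plain ASCII: the ASCII-derived encodings are smaller than the 16-bit form.
ascii_sizes = encoded_sizes("hello")

print(cjk_sizes)
print(ascii_sizes)
```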

Thanks in advance,

----------------------------
David Goldsmith
david_goldsmith(_at_)taligent(_dot_)com
Senior Scientist
Taligent, Inc.
10201 N. DeAnza Blvd.
Cupertino, CA  95014-2233