Re: UTF-8 and literal packets


On Wed, Mar 24, 2004 at 09:38:50AM -0800, Hal Finney wrote:


> Jon Callas writes:
>
> > We would like to have a modification to the literal packet, where there
> > is a type 'u' packet which is identical to the type 't' packet except
> > that in this packet the implementation is saying, "By gum, I *know*
> > this really, really contains UTF-8 in it. Trust me. Really."
>
> So how would we define appropriate behavior with respect to the
> 't' and 'u' packets, on receipt and on creation?  Are we deprecating
> 't' on creation, and we SHOULD use a 'u' (and UTF-8 of course)?
> And then on receipt, what should (or SHOULD) we do?


The way I look at it is that the rule for 't' is to take text and
canonicalize the line endings.  The rule for 'u' is to take text and
canonicalize the line endings and the encoding.  When receiving a
message of either sort, you decanonicalize, doing whatever is
appropriate for your platform.
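
To make that concrete, here is a rough sketch in Python.  The function
names and parameters are invented for illustration and are not taken
from any draft or implementation.  It also assumes the 't' input is in
an ASCII-compatible character set, so that CR and LF can be found at
the byte level.

```python
import os


def canonicalize_t(raw: bytes) -> bytes:
    """'t': canonicalize line endings to CRLF, leave the bytes alone otherwise."""
    # Fold CRLF and bare CR to LF first so that nothing is converted twice.
    unified = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", b"\r\n")


def canonicalize_u(raw: bytes, source_encoding: str) -> bytes:
    """'u': canonicalize the encoding to UTF-8, then the line endings."""
    utf8 = raw.decode(source_encoding).encode("utf-8")
    # UTF-8 never uses the CR or LF byte inside a multi-byte sequence,
    # so the byte-level line-ending pass is safe here.
    return canonicalize_t(utf8)


def decanonicalize(body: bytes, fmt: str, local_encoding: str,
                   local_newline: str = os.linesep) -> bytes:
    """On receipt: undo whatever the format promised, for the local platform."""
    if fmt == "u":
        # The encoding is known, so it can be converted to the local one.
        text = body.decode("utf-8").replace("\r\n", local_newline)
        return text.encode(local_encoding, errors="replace")
    # 't': only the line endings were canonical; the charset is unknown.
    return body.replace(b"\r\n", local_newline.encode("ascii"))
```

For example, canonicalize_u(b"caf\xe9\n", "latin-1") gives
b"caf\xc3\xa9\r\n", and decanonicalize() with fmt "u" turns that back
into local text.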

If possible, use 'u'.  If you can't, whether for lack of information
on the original character set, or even just a minimal implementation
that doesn't have character set recoding ability, then use 't'.  't'
should not be deprecated.
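
On the creation side, the choice could look something like the sketch
below.  Again the names are hypothetical, and falling back to 't' when
recoding fails is my own assumption, not something anyone has proposed
as a rule.

```python
from typing import Optional, Tuple


def to_crlf(raw: bytes) -> bytes:
    """Canonicalize line endings to CRLF."""
    unified = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", b"\r\n")


def choose_literal_format(raw: bytes,
                          source_encoding: Optional[str],
                          can_recode: bool = True) -> Tuple[str, bytes]:
    """Return (format, body): 'u' when recoding is possible, else 't'."""
    if source_encoding is not None and can_recode:
        try:
            utf8 = raw.decode(source_encoding).encode("utf-8")
        except (LookupError, UnicodeDecodeError):
            # Unknown charset name, or bytes that do not match the claimed
            # charset: we cannot honestly promise UTF-8, so fall through.
            pass
        else:
            return "u", to_crlf(utf8)
    # No charset information, or a minimal implementation without
    # recoding ability: canonicalize the line endings only.
    return "t", to_crlf(raw)
```

With a known charset this yields a 'u' body; with source_encoding=None
or can_recode=False it yields a 't' body with the original bytes.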

David