* David Shaw:
The last sentence of section 5.9 reads:
Text data is stored with <CR><LF> text endings (i.e. network-normal
line endings). These should be converted to native line endings by
the receiving software.
I suggest adding:
For the 'u' UTF8 literal packet, the minimal UTF8 encoding for the
<CR><LF> line endings SHOULD be used. That is, 0x0D 0x0A and not
0xC0 0x8D 0xC0 0x8A or other multibyte encodings.
The sequence 0xC0 0x8D 0xC0 0x8A isn't valid UTF-8: it is an overlong
encoding. A UTF-8 implementation MUST NOT decode these octets, but
MUST flag an error. The most recent UTF-8 RFC (RFC 3629) is quite
explicit in this regard.
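
To illustrate the point about rejecting overlong encodings, here is a
minimal Python sketch. The helper name decode_utf8_text is hypothetical
and not taken from any OpenPGP implementation; it relies only on
Python's built-in strict UTF-8 decoder, which refuses the 0xC0 lead
octet instead of mapping it to <CR> or <LF>.

```python
def decode_utf8_text(octets: bytes) -> str:
    """Strictly decode UTF-8; raise ValueError on ill-formed input.

    Hypothetical helper for illustration only.
    """
    try:
        # errors="strict" makes the decoder flag an error instead of
        # silently accepting or replacing invalid octets.
        return octets.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(f"invalid UTF-8 at byte offset {exc.start}") from exc


# The minimal encoding of <CR><LF> is accepted.
assert decode_utf8_text(b"line one\r\n") == "line one\r\n"

# The overlong form 0xC0 0x8D 0xC0 0x8A is rejected.
try:
    decode_utf8_text(b"line one\xc0\x8d\xc0\x8a")
except ValueError as err:
    print("rejected:", err)
```

With a decoder that behaves this way, the overlong form never reaches
the line-ending conversion step at all.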
The UTF-8 issue I mentioned previously arises because Unicode has
additional characters with line-ending semantics, such as U+0085
(NEL), U+2028 (LINE SEPARATOR), and U+2029 (PARAGRAPH SEPARATOR).
There used to be a Unicode Technical Report on this topic, but it has
been superseded by section 5.8 of Unicode 4.0:
<http://www.unicode.org/versions/Unicode4.0.0/ch05.pdf>
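
As a rough sketch of why these extra characters matter, the Python
snippet below maps several Unicode line-ending characters to <CR><LF>.
The function name and the set of characters it treats as line endings
are assumptions made for illustration; neither the draft nor this
thread specifies which ones an implementation should convert.

```python
import re

# Characters treated as line endings in this sketch (an assumption):
# CRLF, CR, LF, NEL (U+0085), LINE SEPARATOR (U+2028) and
# PARAGRAPH SEPARATOR (U+2029). CRLF is listed first so that it is
# matched as a single line ending rather than as CR followed by LF.
_LINE_END = re.compile("\r\n|[\r\n\u0085\u2028\u2029]")


def to_network_line_endings(text: str) -> str:
    """Hypothetical helper: rewrite every line ending as <CR><LF>."""
    return _LINE_END.sub("\r\n", text)


# These line endings are legitimately multibyte in UTF-8, unlike the
# overlong forms of <CR> and <LF>.
assert "\u0085".encode("utf-8") == b"\xc2\x85"
assert "\u2028".encode("utf-8") == b"\xe2\x80\xa8"

assert to_network_line_endings("a\nb\u2028c") == "a\r\nb\r\nc"
```

Whether a 'u' literal packet should normalize such characters, or pass
them through untouched, is a policy question that this sketch does not
settle.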