Re: [ISSUE] UTF-8 CRLF


* David Shaw:

The last sentence of section 5.9 reads:

  Text data is stored with <CR><LF> text endings (i.e. network-normal
  line endings).  These should be converted to native line endings by
  the receiving software.

Suggest to add:

  For the 'u' UTF8 literal packet, the minimal UTF8 encoding for the
  <CR><LF> line endings SHOULD be used.  That is, 0x0D 0x0A and not
  0xC0 0x8D 0xC0 0x8A or other multibyte encodings.


The overlong sequences 0xC0 0x8D and 0xC0 0x8A aren't valid UTF-8.  A
UTF-8 implementation MUST NOT decode these octets, but MUST flag an
error.  The most recent UTF-8 RFC (RFC 3629) is quite explicit in this
regard.
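A minimal Python sketch of the point above (not taken from the draft or any OpenPGP implementation): Python's built-in strict UTF-8 codec rejects the overlong forms, while a naive bit-masking decoder, the kind the RFC warns against, silently turns them back into CR and LF.  The helper names `is_valid_utf8` and `naive_two_octet` are hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True only if `data` is well-formed UTF-8 (overlong forms rejected)."""
    try:
        # Python's strict codec treats 0xC0 and 0xC1 as invalid lead octets.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def naive_two_octet(lead: int, cont: int) -> int:
    """Hypothetical naive decoder for one two-octet sequence: it only masks
    bits, so it never notices that the result fits in a single octet."""
    return ((lead & 0x1F) << 6) | (cont & 0x3F)


minimal = b"line one\r\nline two"               # CR LF as 0x0D 0x0A
overlong = b"line one\xc0\x8d\xc0\x8aline two"   # CR LF as overlong 0xC0 0x8D 0xC0 0x8A

print(is_valid_utf8(minimal))            # True
print(is_valid_utf8(overlong))           # False
print(hex(naive_two_octet(0xC0, 0x8D)))  # 0xd (a CR smuggled past the check)
```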

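The UTF-8 issue I mentioned previously arises because Unicode has
additional characters with line-ending semantics, such as NEL (U+0085),
LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029).  There used
to be a Unicode Technical Report on this topic (UTR #13, "Unicode
Newline Guidelines"), but it has been superseded by section 5.8 in
Unicode 4.0: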

  <http://www.unicode.org/versions/Unicode4.0.0/ch05.pdf>
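A Python sketch of why scanning for 0x0D 0x0A octets is not enough once the text is Unicode: NEL, LINE SEPARATOR and PARAGRAPH SEPARATOR encode as the multi-octet sequences 0xC2 0x85, 0xE2 0x80 0xA8 and 0xE2 0x80 0xA9.  The set of terminators mapped to CR LF here is an assumption for illustration, not something section 5.9 specifies, and `to_network_line_endings` and `canonical_utf8` are hypothetical names.

```python
import re

# Assumed (hypothetical) policy: CR LF, lone CR, lone LF, NEL, LS and PS
# all count as line endings.  CR LF is listed first so it matches as one unit.
_LINE_END = re.compile("\r\n|[\r\n\u0085\u2028\u2029]")


def to_network_line_endings(text: str) -> str:
    """Rewrite every recognized line terminator as CR LF."""
    return _LINE_END.sub("\r\n", text)


def canonical_utf8(text: str) -> bytes:
    # str.encode always produces the shortest (non-overlong) UTF-8 form.
    return to_network_line_endings(text).encode("utf-8")


sample = "a\nb\r\nc\u2028d\u0085e"
print(canonical_utf8(sample))  # b'a\r\nb\r\nc\r\nd\r\ne'
```

Python's own `str.splitlines()` recognizes a larger set (it also splits on VT, FF and the separators FS, GS and RS), which shows that the choice of line-ending characters is a policy decision the specification would have to make explicitly.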
