Re: Troubles with UTF-8

2005-12-30 03:17:19
Randy Presuhn wrote:

  [Tom Petch said:]
I was using the 'illegal syntax' to float an alternative
approach, like using %xC1 - which is illegal in UTF-8

Illegal today, but it wasn't for some time.  My UTF-8 "decoder"
script would return one SUB for a %xC1 plus the next octet.
%xFF and %xFE were always illegal; %xFD was the worst case,
for 5*6+1 = 31 bits, u+7FFFFFFF in UCS-4.
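
To make the bit counting concrete, here is a rough sketch of how the
payload bits add up in the original 1 to 6 octet UTF-8 scheme, and how
single lead octets fall into the classes mentioned above.  The function
names and labels are made up for illustration; the "RFC 3629" remarks
reflect my reading of the current rules, not anybody's decoder.

```python
def max_code_point(n_octets: int) -> int:
    """Largest value an n-octet sequence could carry in the
    original (1 to 6 octet) UTF-8 scheme."""
    if n_octets == 1:
        return 0x7F                       # 0xxxxxxx: 7 payload bits
    if not 2 <= n_octets <= 6:
        raise ValueError("original UTF-8 used 1 to 6 octets")
    lead_bits = 7 - n_octets              # payload bits in the lead octet
    total_bits = lead_bits + 6 * (n_octets - 1)
    return (1 << total_bits) - 1


def classify_octet(octet: int) -> str:
    """Hypothetical labels for a single octet seen in lead position."""
    if octet < 0x80:
        return "ascii"
    if octet < 0xC0:
        return "continuation"
    if octet in (0xC0, 0xC1):
        return "overlong lead"            # could only encode values below 0x80
    if octet < 0xF5:
        return "lead"
    if octet < 0xFE:
        return "obsolete lead"            # beyond u+10FFFF, or old 5/6 octet forms
    return "never legal"                  # %xFE and %xFF


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, hex(max_code_point(n)))
    print(hex(0xC1), classify_octet(0xC1))
    print(hex(0xFD), classify_octet(0xFD))
```

With six octets the lead octet %xFC or %xFD contributes one bit and the
five continuation octets contribute six bits each, hence 31 bits.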
  
that idea does not seem to have caught on within the IETF.

u+FFFF (UTF-8 %xEFBFBF) is guaranteed not to be a character; it
is AFAIK reserved for this purpose.  But not "on the wire".
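
For anyone who wants to check the octets, a small sketch of the three
octet UTF-8 form applied to u+FFFF.  The helper name is made up; the
comparison with Python's built-in encoder is only a sanity check.

```python
def encode_three_octets(code_point: int) -> bytes:
    """Encode a value in 0x0800..0xFFFF as 1110xxxx 10xxxxxx 10xxxxxx.
    Surrogates are not filtered here; this is only a bit-layout sketch."""
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("not in the three octet range")
    return bytes([
        0xE0 | (code_point >> 12),           # top 4 bits
        0x80 | ((code_point >> 6) & 0x3F),   # middle 6 bits
        0x80 | (code_point & 0x3F),          # low 6 bits
    ])


if __name__ == "__main__":
    octets = encode_three_octets(0xFFFF)
    print(octets.hex())
    print(octets == "\uffff".encode("utf-8"))
```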
 
I think the use of explicitly encoded length, rather than
special terminator or delimiter sequences, is simpler to
code and debug, as well as being more robust in avoiding
buffer overflow problems, etc.

Yes, abusing %xFF or similar tricks would be like a PDU with
an empty header and a constant trailer.  Your idea, "length in
the header" (and maybe a checksum as trailer?), is better.

If that hits the limit for encoded lengths, add a mechanism for
a "more" flag, or chunks with a "length = 0 is the end", etc.
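
A minimal sketch of the "chunks with length = 0 is the end" variant,
assuming a single length octet per chunk (so at most 255 octets each).
The names and the one octet header are my assumptions for illustration,
not any particular protocol.

```python
import io

MAX_CHUNK = 255  # largest value a single length octet can hold


def write_chunked(payload: bytes) -> bytes:
    """Frame payload as <length octet><data>... followed by a zero length."""
    out = bytearray()
    for start in range(0, len(payload), MAX_CHUNK):
        chunk = payload[start:start + MAX_CHUNK]
        out.append(len(chunk))
        out += chunk
    out.append(0)  # length = 0 is the end
    return bytes(out)


def read_chunked(stream) -> bytes:
    """Read chunks from a binary stream until the zero length marker."""
    result = bytearray()
    while True:
        header = stream.read(1)
        if len(header) != 1:
            raise ValueError("truncated: missing length octet")
        length = header[0]
        if length == 0:
            return bytes(result)
        chunk = stream.read(length)
        if len(chunk) != length:
            raise ValueError("truncated chunk")
        result += chunk


if __name__ == "__main__":
    data = b"x" * 600
    frame = write_chunked(data)
    print(len(frame))
    print(read_chunked(io.BytesIO(frame)) == data)
```

The reader never has to scan the data for a terminator, and it never
allocates more than one chunk beyond what has actually arrived.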

Reserving NUL as a special terminator is a C library-ism.

A leading length has its own drawbacks if you want a string
with more than 255 octets after one octet for the length. ;-)

history has shown that the use of this kind of mechanism,
rather than explicitly tracking the string's length, was a
mistake.

<CRLF> or whatever isn't too bad with a decent maximum line
length (like 1000).  If you want arbitrary encoded lengths, you
would need a delimiter to separate the length from the SDU, or
another trick for this effect.  Attackers could then try their
luck with huge encoded lengths.
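
As a sketch of the "delimiter between length and SDU" idea, here is a
parser for a made-up format "<decimal length>:<SDU>" that refuses huge
encoded lengths before looking at the data.  The colon, the 64 KiB cap,
and all names are assumptions for illustration only.

```python
from typing import Tuple

MAX_SDU = 64 * 1024              # hypothetical upper bound for one SDU
MAX_DIGITS = len(str(MAX_SDU))   # longest length field we are willing to read


def parse_frame(buf: bytes) -> Tuple[bytes, bytes]:
    """Split b'<decimal length>:<SDU>...' into (sdu, remaining octets)."""
    # Only look at a bounded prefix, so a long run of digits is rejected
    # without scanning the whole buffer.
    head = buf[:MAX_DIGITS + 1]
    colon = head.find(b":")
    if colon <= 0:
        raise ValueError("missing, empty, or oversized length field")
    digits = head[:colon]
    if not digits.isdigit():
        raise ValueError("length field is not decimal")
    length = int(digits.decode("ascii"))
    if length > MAX_SDU:
        raise ValueError("encoded length exceeds the limit")
    end = colon + 1 + length
    if len(buf) < end:
        raise ValueError("truncated SDU")
    return buf[colon + 1:end], buf[end:]


if __name__ == "__main__":
    print(parse_frame(b"5:hello!rest"))
    try:
        parse_frame(b"99999999:x")
    except ValueError as err:
        print("rejected:", err)
```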
                                Bye, Frank



_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/ietf
