Re: Troubles with UTF-8

2005-12-30 05:36:43
----- Original Message -----
From: "Randy Presuhn" <randy_presuhn(_at_)mindspring(_dot_)com>
To: "ietf" <ietf(_at_)ietf(_dot_)org>
Sent: Wednesday, December 28, 2005 9:46 PM
Subject: Re: Troubles with UTF-8

From: "Tom.Petch" <sisyphus(_at_)dial(_dot_)pipex(_dot_)com>
To: "Julian Reschke" <julian(_dot_)reschke(_at_)gmx(_dot_)de>
Cc: "ietf" <ietf(_at_)ietf(_dot_)org>
Sent: Wednesday, December 28, 2005 8:06 AM
Subject: Re: Troubles with UTF-8
...
I agree, for XML, but my main concern is with UTF-8 encoded strings, where
FormFeed is a legal character, encoded as it would be in ASCII.  I was using
the 'illegal syntax' to float an alternative approach, like using %xC1 - which
is illegal in UTF-8 - to delimit a UTF-8 string, but as I say, that idea does
not seem to have caught on within the IETF.
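
A minimal sketch of the %xC1 idea floated above, in Python. The helper names
(join_fields, split_fields) are hypothetical and do not come from any IETF
specification; the sketch only relies on the fact that the octet 0xC1 never
appears in well-formed UTF-8, so it cannot collide with string content such as
FormFeed.

```python
DELIM = b"\xc1"  # octet that never occurs in well-formed UTF-8

def join_fields(fields):
    """Encode each str as UTF-8 and join the results with the 0xC1 octet."""
    return DELIM.join(f.encode("utf-8") for f in fields)

def split_fields(data):
    """Split on 0xC1, then decode each piece as strict UTF-8."""
    return [part.decode("utf-8") for part in data.split(DELIM)]

# FormFeed (0x0C) and non-ASCII text pass through untouched.
fields = ["page one\x0cpage two", "na\u00efve", "\u65e5\u672c\u8a9e"]
wire = join_fields(fields)
assert split_fields(wire) == fields
```

A strict UTF-8 decoder rejects 0xC1 outright, which is why a receiver has to
split on the delimiter before decoding rather than after.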
...

I think the use of explicitly encoded length, rather than special terminator
or delimiter sequences, is simpler to code and debug, as well as
being more robust in avoiding buffer overflow problems, etc.  This
is especially true given the variable-length encoding inherent in UTF-8,
as well as the open-ended way that combining marks follow, rather than
precede, the characters to which they apply.  (I think this was the "state"
that Masataka Ohta was referring to.)

Reserving NUL as a special terminator is a C library-ism.  I think that
history has shown that the use of this kind of mechanism, rather than
explicitly tracking the string's length, was a mistake.
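
The contrast between an explicit length and a NUL terminator can be sketched
as follows. This is an illustration only: the 4-octet big-endian length field
and the function names pack_string and unpack_string are assumptions made for
the example, not taken from any protocol discussed in this thread.

```python
import struct

def pack_string(text):
    """Prefix the UTF-8 octets with their count as a 4-octet big-endian integer."""
    payload = text.encode("utf-8")
    return struct.pack("!I", len(payload)) + payload

def unpack_string(buf, offset=0):
    """Return (text, next_offset); raise ValueError if the buffer is too short."""
    if len(buf) - offset < 4:
        raise ValueError("truncated length field")
    (length,) = struct.unpack_from("!I", buf, offset)
    start = offset + 4
    end = start + length
    if end > len(buf):
        raise ValueError("declared length exceeds buffer")
    return buf[start:end].decode("utf-8"), end

# "e" followed by U+0301 COMBINING ACUTE ACCENT, then an embedded NUL.
text = "e\u0301\x00tail"
wire = pack_string(text)
decoded, next_offset = unpack_string(wire)
assert decoded == text and next_offset == len(wire)

# A NUL-terminated reader would stop at the embedded NUL and lose "tail".
assert wire[4:].split(b"\x00")[0].decode("utf-8") == "e\u0301"
```

With the length known up front, the reader checks bounds once and never scans
the payload, so a trailing combining mark or an embedded NUL needs no special
handling.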

Randy


I agree with you for 'binary' protocols, intended for machine consumption (OSPF,
SNMP), where the string is usually wrapped up in a binary-encoded TLV; but not
for character-based ones, intended for humans and written in more-or-less
printable characters, with positional or keyword parameters (LDAP[RFC2254],
SDP[RFC2327], SASL OTP[RFC2444] or MIME), where a numeric length, in ASCII
characters, would look a little odd to me.
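
For a sense of what a numeric length in ASCII characters looks like on the
wire, here is a rough sketch using the netstring layout (decimal octet count,
a colon, the data, a trailing comma). Netstrings are not mentioned in this
thread and are used purely as an assumed example format; the function names
are hypothetical.

```python
def to_netstring(text):
    """Frame UTF-8 text as b'<octet count>:<octets>,'."""
    payload = text.encode("utf-8")
    return str(len(payload)).encode("ascii") + b":" + payload + b","

def from_netstring(data):
    """Parse one netstring and return the decoded text."""
    head, sep, rest = data.partition(b":")
    if not sep or not head.isdigit():
        raise ValueError("missing or malformed length")
    n = int(head.decode("ascii"))
    if len(rest) < n + 1 or rest[n:n + 1] != b",":
        raise ValueError("length does not match payload")
    return rest[:n].decode("utf-8")

# The count is in octets, not characters: "naïve" has 5 characters
# but 6 UTF-8 octets, so the visible prefix is "6".
framed = to_netstring("na\u00efve")
assert framed == b"6:na\xc3\xafve,"
assert from_netstring(framed) == "na\u00efve"
```

The mismatch between the visible number and the character count a human
reader would expect is one concrete way such a length can look odd in a
text-oriented protocol.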

I always saw NUL as an **IX-ism more than a C-ism, and so of wider use, though
I could be wrong on that.

Tom Petch


