Re: Troubles with UTF-8

----- Original Message -----
From: "Julian Reschke" <julian(_dot_)reschke(_at_)gmx(_dot_)de>
To: "Tom.Petch" <sisyphus(_at_)dial(_dot_)pipex(_dot_)com>
Cc: "ietf" <ietf(_at_)ietf(_dot_)org>
Sent: Wednesday, December 28, 2005 4:16 PM
Subject: Re: Troubles with UTF-8

Tom.Petch wrote:

----- Original Message -----
From: "Harald Tveit Alvestrand" <harald(_at_)alvestrand(_dot_)no>
To: "Tom.Petch" <sisyphus(_at_)dial(_dot_)pipex(_dot_)com>; "Ned Freed" <ned(_dot_)freed(_at_)mrochek(_dot_)com>
Cc: "ietf" <ietf(_at_)ietf(_dot_)org>
Sent: Wednesday, December 28, 2005 1:30 PM
Subject: Re: Troubles with UTF-8

--On onsdag, desember 28, 2005 10:09:05 +0100 "Tom.Petch"
<sisyphus(_at_)dial(_dot_)pipex(_dot_)com> wrote:

The Unicode data I am thinking of may have come from an upper layer
protocol and needs to be passed transparently (as with an error or hello
message, identity even); it may or may not already be NUL-terminated
(ever had that security foul-up where some userid/password are
entered/stored NUL-terminated and some are not?) - hence I see the need
to terminate the string in some other way, or to escape or in some other
way transfer encode (parts of) the string.  I looked at existing RFCs,
found many different approaches, all viable but none that really said to
me 'this is good engineering, this is best practice'.  Hence, floating
the issue to see if there were any better ones out there. I think not,
which is of itself worth knowing.

There are many strong opinions around "proper" treatment of XML and of
text, and it would be a shame to ask for advice now, reach a seemingly
reasonable conclusion, and then encounter violent objections at IETF Last
Call.

The 'illegal syntax' is not yet an RFC but is in

draft-ietf-netconf-ssh-05.txt

which says
   "As the previous example illustrates, a special character sequence,
    ]]>]]>, MUST be sent by both the client and the server after each XML
    document in the NETCONF exchange.  This character sequence cannot
    legally appear in an XML document, so it can be unambigiously used to
    indentify the end of the current document, allowing resynchronization
    of the NETCONF exchange in the event of an XML syntax or parsing
    error."
For me, that is ok; the 'illegal syntax' is part of the transport syntax not
part of the XML syntax and so is not illegal, if you follow me:-)
...
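
For illustration only, the framing described in the quoted draft text might
be handled on the receiving side roughly as follows. This is a minimal
sketch, not code from the draft: the function name split_messages and the
idea of keeping an unterminated tail for the next read are assumptions made
for the example. Only the ]]>]]> sequence itself comes from the draft.

```python
# Delimiter quoted from draft-ietf-netconf-ssh-05; everything else here is
# a hypothetical illustration of transport-level framing.
END_OF_MESSAGE = b"]]>]]>"


def split_messages(buffer: bytes):
    """Split a receive buffer into complete documents and a remainder.

    Each complete document is the byte string that preceded one
    end-of-message marker. The remainder is whatever followed the last
    marker; it is not yet terminated and should be kept until more
    bytes arrive.
    """
    *documents, remainder = buffer.split(END_OF_MESSAGE)
    return documents, remainder


if __name__ == "__main__":
    docs, rest = split_messages(b"<hello/>]]>]]><rpc/>]]>]]><rpc-re")
    print(docs)  # the two terminated documents
    print(rest)  # the partial third document
```

Because the split happens before any XML parsing, a document that fails to
parse does not stop the receiver from finding where the next one starts.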

Why don't you use an illegal *character* instead, such as Formfeed?
That's certainly easier to parse...

Best regards, Julian


I agree, for XML, but my main concern is with UTF-8 encoded strings, where
FormFeed is a legal character, encoded as it would be in ASCII.  I was using
the 'illegal syntax' to float an alternative approach, like using %xC1 -
which is illegal in UTF-8 - to delimit a UTF-8 string, but as I say, that
idea does not seem to have caught on within the IETF.

Tom Petch
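
A minimal sketch of the %xC1 idea, for illustration only. The octet 0xC1
never occurs in well-formed UTF-8, so it can terminate a string that may
itself contain NUL or FormFeed. The names DELIMITER, frame and unframe are
hypothetical, and no RFC is being quoted here.

```python
# 0xC1 cannot appear in well-formed UTF-8, so it is safe as a terminator
# for any UTF-8 string. Function names are assumptions for this example.
DELIMITER = b"\xc1"


def frame(strings):
    """Encode each string as UTF-8 and terminate it with the 0xC1 octet."""
    out = bytearray()
    for s in strings:
        out += s.encode("utf-8") + DELIMITER
    return bytes(out)


def unframe(data: bytes):
    """Recover the strings; every string must end with the delimiter."""
    *fields, tail = data.split(DELIMITER)
    if tail:
        raise ValueError("trailing bytes after final delimiter")
    # decode() raises UnicodeDecodeError if a field is not valid UTF-8.
    return [field.decode("utf-8") for field in fields]


if __name__ == "__main__":
    # NUL and FormFeed are legal in UTF-8 and pass through untouched.
    originals = ["userid\x00", "password", "page\x0cbreak"]
    wire = frame(originals)
    print(unframe(wire) == originals)
```

The same reasoning would apply to any octet that well-formed UTF-8 never
produces; 0xC1 is used here only because it is the one mentioned above.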

