----- Original Message -----
From: "Julian Reschke" <julian(_dot_)reschke(_at_)gmx(_dot_)de>
To: "Tom.Petch" <sisyphus(_at_)dial(_dot_)pipex(_dot_)com>
Cc: "ietf" <ietf(_at_)ietf(_dot_)org>
Sent: Wednesday, December 28, 2005 4:16 PM
Subject: Re: Troubles with UTF-8
Tom.Petch wrote:
----- Original Message -----
From: "Harald Tveit Alvestrand" <harald(_at_)alvestrand(_dot_)no>
To: "Tom.Petch" <sisyphus(_at_)dial(_dot_)pipex(_dot_)com>; "Ned Freed"
<ned(_dot_)freed(_at_)mrochek(_dot_)com>
Cc: "ietf" <ietf(_at_)ietf(_dot_)org>
Sent: Wednesday, December 28, 2005 1:30 PM
Subject: Re: Troubles with UTF-8
--On Wednesday, December 28, 2005 10:09:05 +0100 "Tom.Petch"
<sisyphus(_at_)dial(_dot_)pipex(_dot_)com> wrote:
The Unicode data I am thinking of may have come from an upper layer
protocol and needs to be passed transparently (as with an error or hello
message, or even an identity); it may or may not already be NUL-terminated
(ever had that security foul-up where some userid/password pairs are
entered/stored NUL-terminated and some are not?). Hence I see the need
to terminate the string in some other way, or to escape or otherwise
transfer-encode (parts of) the string. I looked at existing RFCs and
found many different approaches, all viable but none that really said to
me 'this is good engineering, this is best practice'. Hence, floating
the issue to see if there were any better ones out there. I think not,
which is of itself worth knowing.
There are many strong opinions around "proper" treatment of XML and of
text, and it would be a shame to ask for advice now, reach a seemingly
reasonable conclusion, and then encounter violent objections at IETF Last
Call.
The 'illegal syntax' is not yet an RFC but is in
draft-ietf-netconf-ssh-05.txt
which says
"As the previous example illustrates, a special character sequence,
]]>]]>, MUST be sent by both the client and the server after each XML
document in the NETCONF exchange. This character sequence cannot
legally appear in an XML document, so it can be unambiguously used to
identify the end of the current document, allowing resynchronization
of the NETCONF exchange in the event of an XML syntax or parsing
error."
For me, that is ok; the 'illegal syntax' is part of the transport syntax not
part of the XML syntax and so is not illegal, if you follow me:-)
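
As a rough illustration of the framing quoted from the draft, the sketch
below splits a received byte buffer on the ]]>]]> sequence. The function
name split_documents and the idea of returning the unterminated tail are
assumptions made for this example; they are not taken from the draft.

```python
# Sketch only: end-of-document framing with the ]]>]]> sequence.
# split_documents is a hypothetical helper, not part of any NETCONF library.
DELIMITER = b"]]>]]>"


def split_documents(buffer: bytes):
    """Return (complete_documents, leftover).

    complete_documents holds every chunk that was followed by the
    delimiter; leftover is the trailing data still waiting for one.
    """
    parts = buffer.split(DELIMITER)
    return parts[:-1], parts[-1]


docs, leftover = split_documents(b"<hello/>]]>]]><rpc/>]]>]]><rp")
print(docs)      # the two terminated documents
print(leftover)  # the partial third document
```

Because the receiver looks for the delimiter before it parses anything, a
syntax error in one document does not stop it from finding where the next
one starts.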
...
Why don't you use an illegal *character* instead, such as Formfeed?
That's certainly easier to parse...
Best regards, Julian
I agree, for XML, but my main concern is with UTF-8 encoded strings, where
FormFeed is a legal character, encoded as it would be in ASCII. I was using
the 'illegal syntax' to float an alternative approach, like using %xC1 -
which is illegal in UTF-8 - to delimit a UTF-8 string, but as I say, that
idea does not seem to have caught on within the IETF.
Tom Petch
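
A minimal sketch of the %xC1 idea, assuming a single 0xC1 octet is appended
after each UTF-8 string. The names frame and unframe are hypothetical and
exist only for this illustration. The approach relies on the fact that the
octet 0xC1 never occurs in well-formed UTF-8, so the payload may contain
NUL or FormFeed without any escaping.

```python
# Sketch only: delimiting UTF-8 strings with an octet that valid UTF-8
# can never contain. frame/unframe are hypothetical names.
TERMINATOR = b"\xc1"


def frame(text: str) -> bytes:
    """Encode text as UTF-8 and append the 0xC1 terminator."""
    return text.encode("utf-8") + TERMINATOR


def unframe(buffer: bytes):
    """Return (first_string, remaining_bytes) from a framed buffer."""
    payload, sep, rest = buffer.partition(TERMINATOR)
    if not sep:
        raise ValueError("terminator 0xC1 not found")
    # decode() raises UnicodeDecodeError if the payload is not valid UTF-8.
    return payload.decode("utf-8"), rest


# NUL and FormFeed pass through untouched because neither is the delimiter.
wire = frame("user\x00id\x0c") + frame("pässword")
first, rest = unframe(wire)
second, rest = unframe(rest)
print(repr(first), repr(second), rest)
```

The price is that the framed stream as a whole is no longer valid UTF-8, so
a receiver has to strip the terminator before handing the data to anything
that expects plain UTF-8.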