ietf

Re: Troubles with UTF-8

2005-12-28 06:07:50
Tom.Petch wrote:

The Unicode data I am thinking of may have come from an upper layer protocol
and needs to be passed transparently (as with an error or hello message, or
even an identity); it may or may not already be NUL-terminated (ever had that
security foul-up where some userid/passwords are entered/stored NUL-terminated
and some are not?). Hence I see the need to terminate the string in some other
way, or to escape or otherwise transfer-encode (parts of) the string. I looked
at existing RFCs and found many different approaches, all viable, but none that
really said to me 'this is good engineering, this is best practice'. Hence,
floating the issue to see if there were any better ones out there. I think
not, which is of itself worth knowing.
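
For illustration only: a length prefix is one common way to delimit a string
without relying on a NUL terminator. The sketch below is in Python and assumes
a 2-byte big-endian length field counting UTF-8 octets; the names frame() and
unframe() are hypothetical and not taken from any RFC.

```python
import struct
from typing import Tuple


def frame(text: str) -> bytes:
    """Encode text as UTF-8 and prepend a 2-byte big-endian octet count."""
    payload = text.encode("utf-8")
    # struct.pack raises struct.error if the payload exceeds 65535 octets.
    return struct.pack("!H", len(payload)) + payload


def unframe(buf: bytes) -> Tuple[str, bytes]:
    """Decode one framed string; return it plus any unconsumed bytes."""
    (length,) = struct.unpack_from("!H", buf, 0)
    payload = buf[2:2 + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    return payload.decode("utf-8"), buf[2 + length:]
```

Because the receiver counts octets rather than scanning for a terminator, a
string that already contains U+0000 passes through unchanged. The cost is a
fixed upper bound on string length and the need to know the length before
sending.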

You can do nothing.

The problem is that Unicode is stateful, with complex and
indefinitely long-lived states, which is a lot worse than
a properly profiled ISO 2022 such as that of RFC1468, which
is the character encoding most widely used for Japanese.

Unicode is not even finite state, which means some pattern
matching and normalization problems are hard or unsolvable.
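
For illustration only: the sketch below shows one narrow aspect of the
matching problem, namely that canonically equivalent strings can differ
code point by code point, so a comparison has to normalize first. It uses
Python's standard unicodedata module; the helper name same_text() is
hypothetical, and the example does not attempt to demonstrate the
finite-state claim itself.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare two strings after canonical composition (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "\u00e9"        # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"     # "e" followed by COMBINING ACUTE ACCENT

# A naive code point comparison treats these as different strings.
naive_equal = composed == decomposed

# After normalization they compare equal.
normalized_equal = same_text(composed, decomposed)

# The order of combining marks from different classes can also vary
# without changing the canonical identity of the text.
reordered_equal = same_text("a\u0301\u0323", "a\u0323\u0301")
```

A base character may be followed by any number of combining marks, so a
matcher that works directly on the encoded stream cannot decide where a
character ends by looking at a fixed amount of input.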

OTOH, if you start from scratch, you can have an encoding with
finite and much shorter-lived states.

                                                Masataka Ohta



