nmh-workers

Re: Bug reported regarding Unicode handling in email address

2021-06-14 11:55:27
Sure, convert to Unicode, work in Unicode, convert back, that is
the way to go.

I know that this is application dependent, but what "work" do you
need to perform on the characters?

I realized back when I was originally looking at i18n issues in nmh that
we don't need to perform THAT much work on characters internally.  We DO
do some work when it comes to calculating character width in the format
engine, but that's all in the native character set.  So I realized that
at least for nmh, there's no advantage to converting to Unicode/UTF-8
internally, and a number of disadvantages; like you say, the xlocale
functions are non-portable and you can't really get there with the
existing POSIX APIs.  Converting internally to Unicode would force you
to depend on something like ICU.
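
For illustration, here is a rough sketch of that kind of width calculation
using only standard calls (mbrtowc() and wcwidth()) on text in the
locale's native character set.  The name display_width is made up for
this example and is not an nmh function; the real format engine does
more than this.

```c
#define _XOPEN_SOURCE 700       /* needed for wcwidth() on some systems */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper, not part of nmh: return the number of terminal
 * columns needed to display the multibyte string s in the current
 * locale's character set, or -1 if s contains an invalid sequence or
 * a non-printable character.
 */
int
display_width(const char *s)
{
    mbstate_t st;
    size_t left;
    int cols = 0;

    assert(s != NULL);
    memset(&st, 0, sizeof st);  /* start in the initial shift state */
    left = strlen(s);

    while (left > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, left, &st);
        int w;

        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;          /* invalid or truncated sequence */
        if (n == 0)
            break;              /* NUL reached; defensive only */

        w = wcwidth(wc);
        if (w < 0)
            return -1;          /* non-printable character */

        cols += w;
        s += n;
        left -= n;
    }
    return cols;
}
```

A caller would run setlocale(LC_ALL, "") once at startup so that
mbrtowc() decodes according to the user's locale.  Nothing here
converts the text to Unicode first.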


Really, the older I get the more I think that UTF-16 is not the
worst decision regarding Unicode.  Surrogate pairs have to be
handled, but for UTF-8 you always have to live with multibyte
anyway.

I guess I think out of all of the possible worlds, UTF-8 is probably
the best compromise.
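
To make the trade-off concrete, here is a small sketch of encoding one
code point each way.  Both encoders need a variable-length path: UTF-16
for surrogate pairs, UTF-8 for multibyte sequences.  The function names
utf8_encode and utf16_encode are invented for this example and do not
come from nmh or any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero if cp is a Unicode scalar value (not a surrogate,
 * not above U+10FFFF). */
static int
is_scalar(uint32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* Hypothetical helper: write cp as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes written, or 0 if cp is not encodable. */
size_t
utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);
    if (!is_scalar(cp))
        return 0;
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: write cp as UTF-16 code units into out (room
 * for 2 units).  Returns the number of units written, or 0 if cp is
 * not encodable. */
size_t
utf16_encode(uint32_t cp, uint16_t *out)
{
    assert(out != NULL);
    if (!is_scalar(cp))
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;          /* single code unit */
        return 1;
    }
    cp -= 0x10000;                      /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

For U+1F600 this gives the four bytes F0 9F 98 80 in UTF-8 and the
surrogate pair D83D DE00 in UTF-16.  For U+00E9 it gives C3 A9 in UTF-8
and the single unit 00E9 in UTF-16.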

--Ken
