Should nmh try to get out in front with email address
internationalization (EAI)? See resources below.
I've thought about what it would take.
From the MUA perspective, IIUC, it relies on native support on the
host to handle unencoded UTF-8 addresses. Would nmh support just be a
matter of 1) not encoding addresses (controlled by a switch) in
outgoing messages and 2) when showing a message, indicating that an
address couldn't be displayed?
I think it's slightly more complicated than that (see below).
Does anyone have experience using it? Gmail supports it, according
to the article below.
I think the lack of people with such an address means it's pretty uncommon
still, right?
Lyndon writes later:
Since we require a Posix environment, that means utf8 locale support must
be in place, thus all the OS bits are there waiting to be used.
But to do this properly we really need to overhaul the code base to
process everything internally as utf8. That's not a trivial task, but we
have to do it, sooner or later.
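
For reference, whether a UTF-8 locale is actually in effect at run time can be checked with the standard POSIX calls setlocale() and nl_langinfo(CODESET). The following is only a rough sketch: the helper names codeset_is_utf8() and locale_is_utf8() are made up for illustration and are not existing nmh functions, and the two spellings accepted for the codeset name are an assumption about what platforms report.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical helper: true if a codeset name denotes UTF-8.
 * Platforms spell the name differently ("UTF-8", "utf8", ...), so the
 * comparison ignores case and accepts both spellings (an assumption). */
bool
codeset_is_utf8(const char *codeset)
{
    return codeset != NULL
        && (strcasecmp(codeset, "UTF-8") == 0
            || strcasecmp(codeset, "UTF8") == 0);
}

/* Hypothetical helper: adopt the locale named by the environment and
 * report whether its character encoding is UTF-8. */
bool
locale_is_utf8(void)
{
    if (setlocale(LC_CTYPE, "") == NULL)
        return false;               /* environment names an unknown locale */
    return codeset_is_utf8(nl_langinfo(CODESET));
}
```

A caller could use locale_is_utf8() to decide whether an unencoded address can be shown as-is or needs some placeholder.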
Here are my unformed thoughts:
- It's not so easy to deal with characters that aren't in your native locale
using the POSIX API; xlocale makes this easier, but it's a pain.
- A super-brief scan suggests to me that SMTPUTF8 support is not widespread
at this point. But that will no doubt change.
- Right now our address parser will reject stuff that contains 8-bit
characters; we need to fix that. In fact, we need to throw out that
address parser and get a new one; I made some progress on that using
flex and bison.
- It's unclear to me how much UTF-8 verification a MUA is supposed to deal
with; are we, for example, supposed to check for overlong UTF-8 encodings?
Valid UTF-8 sequences?
- I do not believe we have to process everything internally as UTF-8, but I
could be persuaded I'm wrong. The real kicker is the format engine;
right now we sort-of cheat a lot. %(decode) basically does a one-stop
decoding and conversion to the native character set. This has a lot of
advantages, but also means we need to sit down and decide what the
format engine is really supposed to be working on; for example, is the
format engine supposed to be dealing with strings pre or post RFC-2047
decoding?
- SMTPUTF8 looks relatively straightforward to implement, at least.
- I would rather not make ICU or IDN a build requirement, but it may be
unavoidable.
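
On the verification question above, here is a minimal sketch of what strict checking would involve if we decided to do it. It follows the well-formedness rules in RFC 3629: no overlong encodings, no surrogate code points, nothing above U+10FFFF. The function name utf8_is_valid() is hypothetical, not something in the nmh tree, and this says nothing about whether an MUA is actually required to do this.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not an existing nmh function: returns true if the
 * n bytes starting at s are well-formed UTF-8 in the RFC 3629 sense. */
bool
utf8_is_valid(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n) {
        unsigned char c = s[i];
        size_t len;         /* total bytes in this sequence */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point legal for this length */

        if (c < 0x80) {                     /* plain ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            len = 2; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            len = 3; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            len = 4; cp = c & 0x07; min = 0x10000;
        } else {
            return false;   /* stray continuation byte or invalid lead byte */
        }

        if (n - i < len)
            return false;   /* sequence truncated at end of input */

        for (size_t k = 1; k < len; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80)
                return false;               /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min)
            return false;   /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false;   /* UTF-16 surrogate, not allowed in UTF-8 */
        if (cp > 0x10FFFF)
            return false;   /* beyond the Unicode range */

        i += len;
    }
    return true;
}
```

With this, the two-byte sequence C0 AF (an overlong encoding of "/") is rejected, while C3 A9 (U+00E9) is accepted.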
--Ken
_______________________________________________
Nmh-workers mailing list
Nmh-workers(_at_)nongnu(_dot_)org
https://lists.nongnu.org/mailman/listinfo/nmh-workers