
Re: [Nmh-workers] EAI?

2015-08-09 00:58:30
> Should nmh try to get out in front with email address
> internationalization (EAI)?  See resources below.

I've thought about what it would take.

> From the MUA perspective, IIUC, it relies on native support on the
> host to handle unencoded UTF-8 addresses.  Would nmh support just be a
> matter of 1) not encoding addresses (controlled by a switch) in
> outgoing messages and 2) when showing a message, indicating that an
> address couldn't be displayed?
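
To make point 2 concrete, here is a rough sketch (Python for brevity, not
nmh code) of flagging an address that the display character set cannot
represent.  The function name and the wording of the marker are
hypothetical; nmh has no such routine that I know of.

```python
def displayable_address(address: str, charset: str = "ascii") -> str:
    """Return the address unchanged if charset can represent it,
    otherwise an escaped form plus a visible marker (hypothetical wording)."""
    try:
        address.encode(charset)
    except UnicodeEncodeError:
        # Escape the unrepresentable characters so the reader can see
        # that something was there, then say why it looks odd.
        escaped = address.encode(charset, errors="backslashreplace").decode(charset)
        return f"{escaped} (address not displayable in {charset})"
    return address
```

In a UTF-8 locale the address comes back untouched; only a locale that
cannot represent it gets the marker.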

I think it's slightly more complicated than that (see below).

> Does anyone have experience using it?  Gmail supports it, according
> to the article below.

I think the lack of people with such an address means it's pretty uncommon
still, right?

Lyndon writes later:

> Since we require a Posix environment, that means utf8 locale support must
> be in place, thus all the OS bits are there waiting to be used.
>
> But to do this properly we really need to overhaul the code base to
> process everything internally as utf8.  That's not a trivial task, but we
> have to do it, sooner or later.

Here are my unformed thoughts:

- It's not so easy to deal with characters that aren't in your native locale
  using the POSIX API; xlocale makes this easier, but it's a pain.

- A super-brief scan suggests to me that SMTPUTF8 support is not widespread
  at this point.  But that will no doubt change.

- Right now our address parser will reject stuff that contains 8-bit
  characters; we need to fix that.  In fact, we need to throw out that
  address parser and get a new one; I made some progress on that using
  flex and bison.

- It's unclear to me how much UTF-8 verification a MUA is supposed to deal
  with; are we, for example, supposed to check for overlong UTF-8 encodings?
  Valid UTF-8 sequences?

- I do not believe we have to process everything internally as UTF-8, but I
  could be persuaded I'm wrong.  The real kicker is the format engine;
  right now we sort-of cheat a lot. %(decode) basically does a one-stop
  decoding and conversion to the native character set.  This has a lot of
  advantages, but also means we need to sit down and decide what the
  format engine is really supposed to be working on; for example, is the
  format engine supposed to be dealing with strings pre or post RFC-2047
  decoding?

- SMTPUTF8 looks relatively straightforward to implement, at least.

- I would rather not make ICU or IDN a build requirement, but it may be
  unavoidable.
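
On the UTF-8 verification question, this is roughly what a strict check
looks like if we do decide it is our job.  It is a sketch in Python
rather than C, following the byte ranges in the RFC 3629 syntax, so it
rejects overlong encodings, surrogates, and anything above U+10FFFF.  The
function name is made up for illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Strict UTF-8 check using the byte ranges from RFC 3629."""
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:                      # plain ASCII
            i += 1
            continue
        # need = number of continuation bytes; lo/hi bound the FIRST one,
        # which is where overlongs, surrogates and > U+10FFFF are caught.
        if 0xC2 <= b0 <= 0xDF:
            need, lo, hi = 1, 0x80, 0xBF   # 0xC0/0xC1 would be overlong
        elif b0 == 0xE0:
            need, lo, hi = 2, 0xA0, 0xBF   # below 0xA0 is overlong
        elif b0 == 0xED:
            need, lo, hi = 2, 0x80, 0x9F   # above 0x9F is a surrogate
        elif 0xE1 <= b0 <= 0xEF:
            need, lo, hi = 2, 0x80, 0xBF
        elif b0 == 0xF0:
            need, lo, hi = 3, 0x90, 0xBF   # below 0x90 is overlong
        elif 0xF1 <= b0 <= 0xF3:
            need, lo, hi = 3, 0x80, 0xBF
        elif b0 == 0xF4:
            need, lo, hi = 3, 0x80, 0x8F   # above 0x8F exceeds U+10FFFF
        else:
            return False                   # stray continuation or invalid lead
        if i + need >= n:
            return False                   # sequence truncated
        if not lo <= data[i + 1] <= hi:
            return False
        for j in range(i + 2, i + need + 1):
            if not 0x80 <= data[j] <= 0xBF:
                return False
        i += need + 1
    return True
```

So the check itself is cheap; the open question is still whether an MUA
should be doing it or leaving it to the transport.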
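
As for SMTPUTF8 being straightforward: per RFC 6531 the client looks for
the SMTPUTF8 keyword in the EHLO response and, if the message needs it,
adds an SMTPUTF8 parameter to MAIL FROM.  A minimal sketch in Python (the
function name is hypothetical, and it only looks at the envelope sender;
a real client would also have to consider recipients and UTF-8 headers):

```python
from typing import Iterable


def mail_from_command(sender: str, ehlo_keywords: Iterable[str]) -> str:
    """Build the MAIL FROM line, requesting SMTPUTF8 only when needed."""
    if sender.isascii():
        # Nothing internationalized in the sender; plain SMTP is enough.
        return f"MAIL FROM:<{sender}>"
    advertised = {keyword.upper() for keyword in ehlo_keywords}
    if "SMTPUTF8" not in advertised:
        # Without the extension there is no way to pass a raw UTF-8
        # address through this server.
        raise ValueError("server does not advertise SMTPUTF8")
    return f"MAIL FROM:<{sender}> SMTPUTF8"
```

The awkward part is the failure case: if the server does not offer the
extension, we have to refuse or bounce rather than silently downgrade.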

--Ken

_______________________________________________
Nmh-workers mailing list
Nmh-workers(_at_)nongnu(_dot_)org
https://lists.nongnu.org/mailman/listinfo/nmh-workers
