Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 21:24:14

On Oct 17, 2016, at 6:39 PM, Ken Hornstein <kenh(_at_)pobox(_dot_)com> wrote:

> What it refuses to do now is create improperly-formatted email messages
> when it cannot identify the character set.  Before it would happily
> send these messages out; THAT has been broken for twenty years and was
> only recently fixed.
>
> And if we're voting ... I would rather have only one additional way to
> specify a nmh-specific locale (well, I'd rather have ZERO additional
> ways, but I think more than one way is overkill).
>
> (And it occurs to me that even setting the locale properly probably
> will not fix your specific problem, as you have described it; forwarding
> messages using MIME will).

The underlying problem is that locales were built before anyone really 
understood the problem.  For one, they assume symmetry on input and output; 
there is no LC_CTYPE_INPUT and LC_CTYPE_OUTPUT.

This is why Plan 9 punted on the entire issue and said UTF-8 everywhere.  Do
what you want outside, but it's your job to convert to and from UTF-8 whenever
you talk to the tools.  And they provided a command line tool to do just that.
If you look at the Plan 9 mail system, it's all UTF-8 internally.  When mail
comes in over the wire, the appropriate MIME charset= parameters are used to
convert content to UTF-8 for display (upas/fs takes care of this).  By
definition, all input is UTF-8.
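
To make the "convert on the way in" step concrete, here is a rough sketch in
Python (not nmh code, and not how upas/fs is actually written).  It uses the
standard email package to read the charset= parameter and re-encode the first
text part as UTF-8.  The function name body_as_utf8 and the fallback for
unknown or mislabeled charsets are my own assumptions for illustration.

```python
from email import message_from_bytes


def body_as_utf8(raw_message: bytes) -> bytes:
    """Return the first text part of a message, re-encoded as UTF-8.

    Hypothetical helper: a real mail store would handle every part,
    not just the first text one.
    """
    msg = message_from_bytes(raw_message)
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        # charset= from Content-Type; MIME's default is us-ascii.
        charset = part.get_content_charset() or "us-ascii"
        # decode=True undoes base64 / quoted-printable and yields raw bytes.
        payload = part.get_payload(decode=True)
        try:
            text = payload.decode(charset)
        except (LookupError, UnicodeDecodeError):
            # Assumed fallback: unknown or wrong label, keep the ASCII
            # and replace everything else with U+FFFD.
            text = payload.decode("ascii", errors="replace")
        return text.encode("utf-8")
    return b""
```

Everything downstream of a function like this only ever sees UTF-8, which is
the whole point.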

If we were to use $LANG/$LC_CTYPE to convert incoming data to UTF-8 in the same
manner, and process (and store!) everything internally as UTF-8, all of this
nonsense would go away.  Similarly, we could convert from UTF-8 to
$LANG/$LC_CTYPE on the way out.  And we could ship everything off-site with one
of only two character sets: US-ASCII or UTF-8.
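
A minimal sketch of those three boundaries, again in Python and again purely
illustrative: the function names are made up, and the charset is passed in
explicitly here, where a real tool would derive it from $LANG/$LC_CTYPE (for
example via locale.getpreferredencoding()).

```python
def locale_to_utf8(data: bytes, locale_charset: str) -> bytes:
    """Inbound boundary: bytes in the user's locale charset -> UTF-8."""
    return data.decode(locale_charset).encode("utf-8")


def utf8_to_locale(data: bytes, locale_charset: str) -> bytes:
    """Outbound boundary: internal UTF-8 -> the user's locale charset.

    Characters the locale cannot represent are replaced with '?'
    (an assumed policy; failing loudly would be another option).
    """
    return data.decode("utf-8").encode(locale_charset, errors="replace")


def wire_charset(utf8_data: bytes) -> str:
    """Pick the charset label for mail leaving the system.

    Pure 7-bit content is labeled us-ascii; anything else is utf-8.
    """
    return "us-ascii" if utf8_data.isascii() else "utf-8"
```

Note that the asymmetry complained about above shows up only at the two locale
functions; nothing in between needs to know what the terminal speaks.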

Good grief, even Microsoft has figured this out :-P  Yes, someone has to write 
the code.  Let's ship 1.7 (if Ralph ever stops committing!), then do 1.8 (the 
SSL/TLS stuff).  And then let's branch for 2.0 and go for a top-to-bottom UTF-8 
runtime.  I've been pharting around with this for a couple of years now in my 
own private branch.  It's not trivial, but it's doable.  And maybe *mh should 
lead the way again, for the first time in a few decades.

--lyndon


