nmh-workers

Re: Bug reported regarding Unicode handling in email address

2021-06-11 13:07:40
That actually brings up one point I have wondered about, and which might
help here - my recollection (it has been a long time since I tested this,
so things might have changed) is that nmh doesn't like receiving drafts
with MIME fields in the header (including, particularly for right now, a
Content-Type field) - is that still true?   If so, does it need to be?

As I understand your question ... no, that is not true (with a few caveats).

We finally decided, I think around nmh 1.5 (or 1.6), to automatically run
"mhbuild" on all drafts because nmh users had the unpleasant habit of
doing things like sending out unencoded UTF-8 because that was very easy
to do unless you explicitly configured it otherwise.

Now PRIOR to that, if you had the AUTOMHNPROC environment variable set
(I think), send would also run mhbuild (back then, mhn).  _If_ you did
that AND your draft contained a MIME header like Content-Type, you'd get
an error.

What we did was add a new flag to mhbuild, -auto, and send would run
mhbuild on the draft with the -auto flag.  The -auto flag changes two things:
it disables mhbuild directives, AND if mhbuild sees a MIME header it silently
exits without error.  The assumption there is if the outgoing draft already
has MIME headers then either the user knew what he/she was doing and we
shouldn't mess with it, or you already ran mhbuild once on the draft
explicitly.  So if you provide send(1) with a draft with the proper
MIME headers then everything should work just fine.
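
To make the "sees a MIME header, exits quietly" rule concrete, here is a
rough sketch in Python.  It is NOT the mhbuild source: the function names,
the set of fields treated as "MIME headers", and the header/body separator
handling (a blank line or a line of dashes) are all assumptions made for
illustration.

```python
import re

# Assumption: these are the fields that count as "MIME headers" here.
# The real check inside mhbuild may look at a different set.
MIME_FIELDS = {"mime-version", "content-type", "content-transfer-encoding"}


def header_lines(draft: str) -> list[str]:
    """Return the header lines of a draft, stopping at the first blank
    line or line of dashes (assumed header/body separator)."""
    lines = []
    for line in draft.splitlines():
        if line == "" or re.fullmatch(r"-+", line):
            break
        lines.append(line)
    return lines


def has_mime_headers(draft: str) -> bool:
    """True if any header field name is one of MIME_FIELDS."""
    for line in header_lines(draft):
        if line[0] in " \t":
            continue  # continuation of the previous field
        name, sep, _ = line.partition(":")
        if sep and name.strip().lower() in MIME_FIELDS:
            return True
    return False


def should_auto_encode(draft: str) -> bool:
    """Mimic the -auto decision: leave already-MIME drafts alone."""
    return not has_mime_headers(draft)
```

With this sketch, a draft that already carries Content-Type is passed through
untouched, while a Content-Type line that only appears in the body does not
count.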

That in a sense raises another question, what do we do when replying to
a message which is in some (perhaps exotic, like TIS-674) charset, and
quoting parts of that message, when my locale is "C" (or something
else different)?  Clearly converting everything to UTF-8 would
allow it all to work, but whose responsibility is it to do that, and
when does it happen?

Sigh.  We haven't QUITE covered all of the combinations yet.  There is
some kind of add-on tooling that makes this easier but not perfect.
The short answer is that the general trend is to call iconv() to convert
the source characters to the native character set (based on the locale),
and then your favorite editor will understand the characters in the
reply message.  If iconv() fails on a character we insert a substitution
character.  This obviously works best if your local character set is
UTF-8.  I am aware that some people, for reasons I cannot comprehend,
want to run in the "C" locale but PRETEND that their character set
is UTF-8 and this approach does not work for them.  To these people I
can only say ¯\_(ツ)_/¯.
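
As a rough illustration of the convert-or-substitute idea, here is a Python
sketch.  It uses Python's codec machinery as a stand-in for iconv(); the
function name and the choice of "?" as the substitution character are
assumptions for the example, not what nmh actually does.

```python
def to_local_charset(raw: bytes, source_charset: str, local_charset: str) -> str:
    """Convert bytes in source_charset into text representable in
    local_charset, substituting characters that cannot be converted."""
    # Bytes that are invalid in the source charset become U+FFFD.
    text = raw.decode(source_charset, errors="replace")
    # Characters the local charset cannot represent become "?".
    return text.encode(local_charset, errors="replace").decode(local_charset)


quoted = b"caf\xe9"  # "café" encoded as ISO-8859-1

print(to_local_charset(quoted, "iso-8859-1", "utf-8"))  # café
print(to_local_charset(quoted, "iso-8859-1", "ascii"))  # caf?
```

The second call shows why a UTF-8 locale works best: with an ASCII-only local
character set, everything outside ASCII collapses into the substitution
character.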

--Ken

