nmh-workers

Re: Bug reported regarding Unicode handling in email address

2021-06-11 05:30:17
    Date:        Thu, 10 Jun 2021 18:16:42 -0400
    From:        Ken Hornstein <kenh(_at_)pobox(_dot_)com>
    Message-ID:  
<20210610221648(_dot_)A1CD0C9403(_at_)pb-smtp2(_dot_)pobox(_dot_)com>

  | I feel compelled to point out that when we find 8-bit characters we use
  | the user's locale to find the character set to construct the appropriate
  | MIME headers.

That's all fine; my previous message wasn't so much about nmh as about
the suggestion (which I have seen before) that nmh is, or could become,
some kind of stand-alone system (I was going to say closed, but I don't
mean it in the not-open-source sense) where it can control its environment.
It isn't, and we (or I at least) don't want it to be.

  | So if your 24 year old draft (really?)

Yes.

It is (was really, nothing will ever happen to that one, except perhaps rm)
a reply to an FTP Extensions IETF working group mailing list message, on
what must have been a fairly early draft of what became RFC3659.

The message I was replying to (according to a quote that is in my unsent
reply) contained text like "While nobody but a drug-crazed lunatic would
consider such an approach, ..." which might have been why I hesitated to send
the reply (I will leave it for you to imagine what the reply might have
contained) or it might have been that I simply paused to read later messages
on the list before sending the reply, and discovered that someone else had
already said everything I was planning on saying.   Or who knows ... it is
all far too long ago for me to remember anything at all about it!

  | was edited using ISO8859-1 because it pre-dates UTF-8

This was me... it would have been edited using ASCII - the issue isn't
anything related to my ancient message(s), just the assumption that we
can ever know anything about any files in the ${HOME}/$(mhparam path) tree,
aside from what we can deduce from the content of the files themselves.

[Aside: Occasionally when I have an unsent draft, particularly one intended
for a mailing list like this one was, and I end up deciding not to send it,
I will refile it to the mailing list's nmh folder - so one should also not
assume that messages that aren't drafts have ever been seen by any nmh process,
except refile, which is just "ln" (or "mv") on steroids (and sometimes, if
I can be bothered to work out what the message number should be, I
just use mv).]

That actually brings up one point I have wondered about, and which might
help here - my recollection (it has been a long time since I tested this,
so things might have changed) is that nmh doesn't like receiving drafts
with MIME fields in the header (including, particularly for right now, a
Content-Type field) - is that still true?   If so, does it need to be?

If the draft contained Content-Type, right from the beginning (either auto set
as part of repl or comp processing, or manually inserted), then we wouldn't
need to be guessing what charset it was using, would we?   It can be updated
when appropriate, either in an editor, to switch charset, or by nmh processing
when handling attachments, etc.
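
To make the idea concrete, here is a minimal sketch (in Python, not nmh code)
of preferring a charset declared in the draft over the locale. The function
name draft_charset is hypothetical, and the fallback to the locale is only an
assumption about what a tool might reasonably do when nothing is declared.

```python
import locale
from email import message_from_bytes


def draft_charset(raw: bytes) -> str:
    """Return the charset declared in the draft, else the locale's charset."""
    msg = message_from_bytes(raw)
    # get_content_charset() returns the lowercased charset parameter of
    # Content-Type, or None when the header or the parameter is missing.
    declared = msg.get_content_charset()
    if declared:
        return declared
    # Nothing declared: fall back to guessing from the current locale.
    return locale.getpreferredencoding(False)


draft = b"To: list@example.org\nContent-Type: text/plain; charset=ISO-8859-15\n\nbody\n"
print(draft_charset(draft))
```

With the header present there is nothing to guess; only a draft without it
ends up depending on whatever the locale happens to be.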

For a while in the intervening period (and so possibly for some of my hundred
or so intervening unsent drafts) I might have been using 8859-15 (I think 15,
but I might be confusing that with 10646-15 ... TIS-674 anyway) chars, as
I used to need to reply to messages sent that way (or whatever wintrash
calls its equivalent).

I certainly wouldn't expect nmh to guess that (how could it?) but it would
be nice if there was a convenient way to tell it, aside from what my current
locale happens to be (my current laptop is newer than my need to deal with
those old work related messages, so I have never bothered to set it up to
handle any of that properly).

That in a sense raises another question: what do we do when replying to
a message which is in some (perhaps exotic, like TIS-674) charset, and
quoting parts of that message, when my locale is "C" (or something
else different)?   Clearly converting everything to UTF-8 would
allow it all to work, but whose responsibility is it to do that, and
when does it happen?
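
The conversion itself is the easy part, as the sketch below suggests (Python,
not nmh code). It assumes the source charset is already known, for instance
from the original message's Content-Type; the function name quote_as_utf8 and
the "> " quote prefix are illustrative choices only. It says nothing about who
should do this, or when.

```python
def quote_as_utf8(body: bytes, source_charset: str) -> bytes:
    """Decode body from source_charset, quote each line, re-encode as UTF-8."""
    # errors="replace" keeps going on undecodable bytes instead of failing;
    # that is an assumption, a real tool might prefer to stop and complain.
    text = body.decode(source_charset, errors="replace")
    quoted = "".join("> " + line for line in text.splitlines(keepends=True))
    return quoted.encode("utf-8")


# 0xA4 is the euro sign in ISO-8859-15.
print(quote_as_utf8(b"price: \xa4 5\n", "iso-8859-15").decode("utf-8"))
```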


In a slightly later reply to my message, ralph(_at_)inputplus(_dot_)co(_dot_)uk 
said:

  | So my thinking is the spool-file's writer will either be something like
  | Postfix which declares support for SMTPUTF8, is handed UTF-8, and AFAICS
  | stores it verbatim,

On my system the spool file is written by /usr/libexec/mail.local (which would
be invoked by postfix if I used that), but it can also be run from anything.

While the most common practice is to get to it via the system's MTA, it
also gets invoked by other things, and will write whatever messages they
hand it (doing no processing other than inserting the mail spool "From ..."
separator line and '>' quoting a leading "From " on any other line).
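
As a rough illustration of that processing (a Python sketch, not mail.local's
actual code), the function below prepends a "From ..." separator line and
'>'-quotes any message line starting with "From ". The name spool_entry, the
exact layout of the separator line, and the trailing blank line are assumptions
based on common mbox practice. The message bytes are otherwise passed through
untouched, whatever charset they are in.

```python
import time
from typing import Optional


def spool_entry(message: bytes, sender: str, stamp: Optional[str] = None) -> bytes:
    """Build the bytes that would be appended to the spool for one message."""
    if stamp is None:
        stamp = time.asctime(time.gmtime())
    out = [b"From " + sender.encode("ascii") + b" " + stamp.encode("ascii") + b"\n"]
    for line in message.splitlines(keepends=True):
        # Quote a leading "From " so it cannot be mistaken for a separator.
        if line.startswith(b"From "):
            line = b">" + line
        out.append(line)
    if message and not message.endswith(b"\n"):
        out.append(b"\n")
    out.append(b"\n")  # assumed: blank line before the next separator
    return b"".join(out)


print(spool_entry(b"Subject: hi\n\nFrom here on\nbye\n", "kre").decode("ascii"))
```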

There is no expectation (by it) that messages are necessarily in RFC 822 (or
its successors) format, though at least some semblance of a relationship
to that is usually maintained.

mail.local's main purpose is locking the spool file so only one message
gets appended at a time...

kre

