
Re: [Nmh-workers] Non-ASCII Characters in bodies and subjects

2014-06-17 13:48:46
On Tue, Jun 17, 2014 at 1:23 PM, Ken Hornstein wrote:
So you are saying that "normal unix commands", such as grep, wc, tr
etc, do or someday the GNU versions will, know about UTF-8, at least
for file contents, if not for file names?
...
There's an implicit assumption in nmh that messages in the message store
are valid RFC 5322 messages and can always be treated as such (see
dist and forw, for starters).

Some anecdotal experience that may be of interest:

I've had to deal with messages that have non-ASCII characters in headers,
so they do occur in the wild.  They usually occur in non-English locales,
but can still occur in English locales where special characters (e.g.
the pound sterling or euro sign) are used.

In a program I developed that has to parse emails, I had to provide a
configuration option that instructed the program what the default
character encoding should be when parsing message headers because of
this.  The MIME RFCs say US-ASCII is the default, but the real world
indicates this is not always the case.  Not sure what nmh does when
encountering such data.
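
To illustrate the kind of option I mean, here is a minimal sketch in
Python.  It is not nmh code and not the program mentioned above; the
function name, the fallback_charset parameter, and the choice to try
UTF-8 before the configured fallback are all assumptions made for the
example.

```python
def decode_raw_header(raw: bytes, fallback_charset: str = "iso-8859-1") -> str:
    """Decode raw (unencoded) header bytes into text.

    Hypothetical helper: tries US-ASCII (the MIME default), then UTF-8,
    and finally the charset named by the configuration option.
    """
    for charset in ("us-ascii", "utf-8"):
        try:
            return raw.decode(charset)
        except UnicodeDecodeError:
            continue
    # The configured fallback may itself be wrong for a given message,
    # so substitute undecodable bytes instead of failing.
    return raw.decode(fallback_charset, errors="replace")
```

With the default fallback, the bytes b"Cost: \xa35" decode to
"Cost: £5"; with fallback_charset="cp1252", the byte 0x80 decodes to
the euro sign.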

As for message storage, nothing prohibits nmh from auto-converting (aka
normalizing) non-ASCII encoded data to UTF-8 when storing the message.
The underlying message parsing tools of nmh should not be affected (but
others would have to confirm this).  This would allow standard Unix
tools, or other tools like search indexing tools, to process the files
w/o having to do full MIME-aware parsing.  Also, it would avoid the
on-the-fly decoding of non-ASCII headers by nmh each time it reads a
message (for pick, show, scan, etc).
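
As a rough sketch of what normalizing a header value could look like,
the following uses Python's standard email.header module to decode
RFC 2047 encoded-words into text that can then be written out as UTF-8.
This is only an illustration under my own assumptions, not a proposal
for how nmh would implement it; normalize_header_value is a
hypothetical name.

```python
from email.header import decode_header, make_header


def normalize_header_value(value: str) -> str:
    """Decode any RFC 2047 encoded-words in a header value to Unicode text.

    Hypothetical helper: the caller would encode the result as UTF-8
    before writing it to the message store.
    """
    return str(make_header(decode_header(value)))
```

For example, normalize_header_value("=?ISO-8859-1?Q?caf=E9?=") yields
"café", and encoding that result with .encode("utf-8") gives the bytes
that grep and similar tools would see in a UTF-8 locale.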

Normalizing message headers may be a problem where headers are signed
(e.g. DKIM) and there is a desire to reverify such signatures later,
since rewriting a signed header invalidates the signature.  I am unsure
whether this is a real concern.  If normalization were ever supported in
nmh, it should be a configurable option so those concerned about such
scenarios are assured that the message data is left as-is.

--ewh

_______________________________________________
Nmh-workers mailing list
Nmh-workers(_at_)nongnu(_dot_)org
https://lists.nongnu.org/mailman/listinfo/nmh-workers