Re: [Nmh-workers] nmh internals: full MIME integration

2014-07-29 15:23:41
On Sat, Jul 26, 2014 at 6:53 PM, Ken Hornstein wrote:

I would really like to see this, too.  But, since there is no definition
of what a "canonicalized" header might look like, we would need to
proceed very carefully in defining the semantics around this.  Header
parsing is riddled with corner cases and rat holes ...

Well, since this is terra nullius I think we're free to do whatever we
want.  I think in the general sense of matching a generic header, the
semantics of header folding are clear.

I am somewhat following along, but thought I'd throw my $0.02 in, and I
apologize if my comments are not fully on topic:

Caution is required if nmh decides to "normalize" header data when it
stores a message.  Right now, I am in the camp of leaving things as nmh
gets it for the following reasons:

  - What "normalization" means can vary, and it can break things that
    depend on the data not being altered: for example, DKIM
    verification, or any other cryptographic verification system that
    works against mail headers.

    Yes, the mail RFC *22 (whatever number it is now) is fairly clear on
    folding and unfolding semantics, but normalization goes beyond that.
    It has been a long time since I followed DKIM, but normalization was
    a topic when it was being developed on signing and validating
    headers (in a past life, I even wrote a proprietary specification on
    header normalization that dealt with cryptographic signatures of
    mail data).

  - Unicode is not necessarily embraced by all.  In my experience
    dealing with locale support, not everyone speaks Unicode, and some
    do not want to (sometimes for political reasons, sometimes for
    technical ones).  I've encountered this when dealing with
    Asian-based locales.

    It has been a few years since I dealt with such issues, so I am not
    sure how valid it is today, and if the nmh project should concern
    itself with such matters.

Similar concerns also apply to message bodies.

If normalization is being seriously considered, make it configurable.  I
do believe there are benefits in doing so for many, where the concerns I
list above are not applicable.  However, the default configuration for
nmh should be conservative (i.e. play it safe): it does not alter the
data it stores, and translates/normalizes on demand.

In my project, I have parsing code that unfolds header fields and stores
them in a structure where each field name maps to an array of values
(since fields can be repeated, like Received).  However, after I parse a
message header, I also keep its raw, original form.
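
To make the idea above concrete, here is a rough sketch in Python (not
the code from my project, and not anything nmh does today).  The names
ParsedHeader and parse_header are made up for illustration, and
lowercasing the field names for lookup is my own assumption; the raw
bytes are kept exactly as given.

```python
from dataclasses import dataclass


@dataclass
class ParsedHeader:
    fields: dict   # lowercased field name -> list of unfolded values (bytes)
    raw: bytes     # the original bytes, exactly as received


def parse_header(raw: bytes) -> ParsedHeader:
    """Unfold header fields into name -> [values] and keep the raw form."""
    logical = []
    for line in raw.splitlines():
        if not line:
            break  # a blank line ends the header block
        if line[:1] in (b" ", b"\t") and logical:
            # Continuation line: unfolding only removes the line break.
            logical[-1] += line
        else:
            logical.append(line)

    fields = {}
    for line in logical:
        name, sep, value = line.partition(b":")
        if not sep:
            continue  # not a header field; skipped in this sketch
        key = name.strip().lower().decode("ascii", "replace")
        fields.setdefault(key, []).append(value.strip())
    return ParsedHeader(fields=fields, raw=raw)
```

Repeated fields such as Received end up as several entries in one list,
in the order they appeared, while `raw` is still available for anything
that has to see the untouched bytes.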

A well-designed API can hide the details of message storage, but an API
that accesses header data should support options to retrieve both the
raw values and the decoded values (decoding includes things like MIME
decoding of non-ASCII encoded strings).  Of course, any "raw" option
depends on the original mail data being stored untouched and not
normalized when received.
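
A minimal sketch of what such an accessor could look like, again in
Python.  HeaderStore and its `raw` flag are hypothetical names of mine,
not an existing nmh interface; the decoding relies on the standard
library's email.header.decode_header and make_header.

```python
from email.header import decode_header, make_header


class HeaderStore:
    """Holds field values untouched; decodes only when asked."""

    def __init__(self, fields):
        # fields: lowercased field name -> list of raw values (bytes)
        self._fields = fields

    def get(self, name, raw=False):
        values = self._fields.get(name.lower(), [])
        if raw:
            # Hand back the stored bytes unchanged.
            return list(values)
        # Decode RFC 2047 encoded-words on demand; storage is not modified.
        return [
            str(make_header(decode_header(v.decode("ascii", "replace"))))
            for v in values
        ]
```

For example, with a stored Subject value of `=?utf-8?q?caf=C3=A9?=`,
`get("Subject", raw=True)` returns those bytes as they are, and
`get("Subject")` returns the decoded text.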

--ewh

