On Sat, Jul 26, 2014 at 6:53 PM, Ken Hornstein wrote:
> I would really like to see this, too. But, since there is no definition
> of what a "canonicalized" header might look like, we would need to
> proceed very carefully in defining the semantics around this. Header
> parsing is riddled with corner cases and rat holes ...
> Well, since this is terra nullius I think we're free to do whatever we
> want. I think in the general sense of matching a generic header, the
> semantics of header folding are clear.
I am somewhat following along, but thought I'd throw my $0.02 in, and I
apologize if my comments are not fully on topic:
Caution is required if nmh decides to "normalize" header data when it
stores a message. Right now, I am in the camp of leaving data exactly as
nmh receives it, for the following reasons:
- "Normalization" can mean different things, and it could break things
that depend on the data not being altered: for example, DKIM
verification, or any other cryptographic verification system that
works against mail headers.
Yes, the mail RFC (currently RFC 5322) is fairly clear on folding and
unfolding semantics, but normalization goes beyond that.
It has been a long time since I followed DKIM, but normalization of
headers for signing and validation was a topic of discussion while DKIM
was being developed (in a past life, I even wrote a proprietary
specification on header normalization for cryptographic signatures of
mail data).
- Unicode is not necessarily embraced by all. In my experience with
locale support, not everyone speaks Unicode, and some do not want to
(sometimes for political reasons, other times for technical ones). I
encountered this when dealing with Asian locales.
It has been a few years since I dealt with such issues, so I am not
sure how valid this concern is today, or whether the nmh project should
concern itself with such matters.
Similar concerns apply to message bodies as well.
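To make the DKIM concern concrete, here is a minimal Python sketch, assuming header values are held as text with CRLF line endings. The function names unfold and relaxed_canon are hypothetical. unfold applies the RFC 5322 section 2.2.3 rule (remove any CRLF immediately followed by WSP); relaxed_canon follows the "relaxed" header canonicalization of RFC 6376 section 3.4.2. Neither transformation is reversible, so a header stored in either form can no longer be verified against a signature that used DKIM's "simple" header canonicalization, which covers the header exactly as sent.

```python
import re

# RFC 5322 section 2.2.3: unfolding removes any CRLF immediately followed by WSP.
_FOLD = re.compile(r"\r\n(?=[ \t])")

def unfold(raw_value: str) -> str:
    """Unfold a header field value; the WSP after each removed CRLF is kept."""
    return _FOLD.sub("", raw_value)

def relaxed_canon(name: str, raw_value: str) -> str:
    """Sketch of RFC 6376 section 3.4.2 'relaxed' header canonicalization.

    Returns "name:value" without the trailing CRLF that DKIM appends.
    """
    value = unfold(raw_value)
    value = re.sub(r"[ \t]+", " ", value)  # collapse each WSP run to one SP
    value = value.strip(" ")               # drop WSP after the colon and at the end
    return name.rstrip(" \t").lower() + ":" + value

raw_name, raw_value = "Subject", " Hello,\r\n\tWorld  "
print(repr(unfold(raw_value)))             # ' Hello,\tWorld  '
print(relaxed_canon(raw_name, raw_value))  # subject:Hello, World
```

Note that the relaxed form is the same whether or not the header was unfolded first, while the original folding cannot be recovered from either result.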
If normalization is being seriously considered, make it configurable. I
do believe there are benefits in doing so for the many users to whom the
concerns above do not apply. However, the default configuration for nmh
should be conservative (i.e., play it safe): it should not alter the
data it stores, and should translate/normalize on demand.
In my project, I have parsing code that unfolds header fields and stores
them in a structure that maps each field name to an array of values
(since fields such as Received can be repeated). However, after I parse
a message header, I also keep its raw, original form.
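A structure of the kind described above might look like the following Python sketch. It is a hypothetical illustration, not the code from the project mentioned; the class name ParsedHeader and its get_all method are invented here, and it assumes the header block has already been split from the body and uses CRLF line endings. Each lowercase field name maps to a list of unfolded values in message order, while raw keeps the header block untouched.

```python
import re
from collections import defaultdict

class ParsedHeader:
    """Hypothetical container: unfolded values per field name, plus the raw block."""

    # A new field starts at a CRLF that is NOT followed by WSP (RFC 5322 folding).
    _FIELD_BREAK = re.compile(r"\r\n(?![ \t])")
    _FOLD = re.compile(r"\r\n(?=[ \t])")

    def __init__(self, raw: str):
        self.raw = raw                    # original header text, never modified
        self.fields = defaultdict(list)   # lowercase name -> [unfolded values]
        for field in self._FIELD_BREAK.split(raw):
            if not field:
                continue                  # a trailing CRLF yields an empty piece
            name, sep, value = field.partition(":")
            if not sep:
                continue                  # malformed line; a real parser needs a policy
            # Unfold, then drop leading/trailing WSP around the value.
            unfolded = self._FOLD.sub("", value).strip(" \t")
            self.fields[name.strip(" \t").lower()].append(unfolded)

    def get_all(self, name: str):
        """All values of a (possibly repeated) field, in message order."""
        return list(self.fields.get(name.lower(), []))

raw = ("Received: from a.example\r\n\tby b.example\r\n"
       "Received: from c.example\r\n\tby d.example\r\n"
       "Subject: hi\r\n")
hdr = ParsedHeader(raw)
print(hdr.get_all("received"))  # ['from a.example\tby b.example', 'from c.example\tby d.example']
print(hdr.raw == raw)           # True
```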
A well-designed API can hide the details of message storage, but an API
that accesses header data should support options to retrieve both the
raw values and the decoded values (decoding includes things like MIME
decoding of non-ASCII encoded strings). Of course, any "raw" option
depends on the original mail data being stored untouched and not
normalized when received.
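An accessor offering both views might look like this Python sketch. The function name get_header, its decoded flag, and the storage format (a list of (name, raw_value) pairs kept exactly as received) are assumptions for illustration; the decoding step uses the standard library's email.header.decode_header and make_header to decode RFC 2047 encoded-words.

```python
import re
from email.header import decode_header, make_header

_FOLD = re.compile(r"\r\n(?=[ \t])")  # RFC 5322 folding: CRLF followed by WSP

def get_header(stored_fields, name, decoded=False):
    """Return every value of header `name`.

    stored_fields: hypothetical storage, a list of (field_name, raw_value)
    pairs kept exactly as received.
    decoded=False returns the raw values untouched; decoded=True unfolds
    them and decodes RFC 2047 encoded-words into Unicode text.
    """
    values = []
    for field_name, raw_value in stored_fields:
        if field_name.lower() != name.lower():
            continue
        if not decoded:
            values.append(raw_value)
        else:
            unfolded = _FOLD.sub("", raw_value).strip(" \t")
            values.append(str(make_header(decode_header(unfolded))))
    return values

stored = [("Subject", " =?utf-8?q?caf=C3=A9_au_lait?="),
          ("X-Note", " plain\r\n text")]
print(get_header(stored, "subject"))                # [' =?utf-8?q?caf=C3=A9_au_lait?=']
print(get_header(stored, "subject", decoded=True))  # ['café au lait']
print(get_header(stored, "x-note", decoded=True))   # ['plain text']
```

Because the stored pairs are never modified, the decoded view can always be regenerated from them; the reverse does not hold.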
--ewh
_______________________________________________
Nmh-workers mailing list
Nmh-workers@nongnu.org
https://lists.nongnu.org/mailman/listinfo/nmh-workers