Re: [Nmh-workers] nmh internals: full MIME integration

2014-07-27 05:08:51
Hi Ken,

Okay, I guess I could see that.  The normal case would be to
decode the contents completely

Yep, to UTF-8 single lines?

Well, to whatever the local character set is.

Ah, OK, my natural inclination is UTF-8 everywhere and convert on I/O,
but we've obviously got a backlog of code to consider.  If the new
header handler is "to local character set", e.g. US-ASCII, then how does
replying to an email with a =?utf-8? subject work?  Does it suffer
lossage as it's ASCII'd before an inferior version reaches the `Subject:
Re:' producer?

Well, you might be thinking the 2047-decoding might not make a lot
of difference, whereas I'm thinking a block can be read into a
page-aligned buffer that has an \n beyond it as a sentinel, then
check for /foo[ \t]*:/i, ignore any non-foo headers, hunt for the
next \n and repeat if it's not the sentinel, else read another block
and try again.  Stop if no more blocks or \n\n.  The detail's a bit
more complex but there's no allocation and copying for headers seen
along the way;  they'll be found when they're looked for in turn.
The file's blocks aren't being modified so no copy-on-write's
occurring.
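
To make the scan described above concrete, here is a minimal C sketch under
some loud assumptions: the whole header section already sits in one buffer
(the "read another block" refill step is left out), the caller has written
the \n sentinel at buf[len], lines end in a bare \n rather than \r\n, and
find_header() and match_name() are hypothetical names, not existing nmh
functions.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: case-insensitively match `name` at p, then skip
 * optional spaces/tabs and require a ':'.  Returns a pointer just past
 * the colon, or NULL.  Relies on the '\n' sentinel (and on `name` not
 * containing '\n') so it never reads past buf[len].
 */
static const char *match_name(const char *p, const char *name)
{
    for (; *name; p++, name++)
        if (tolower((unsigned char)*p) != tolower((unsigned char)*name))
            return NULL;
    while (*p == ' ' || *p == '\t')
        p++;
    return *p == ':' ? p + 1 : NULL;
}

/*
 * Scan buf[0..len) for the header `name` without copying anything.
 * Assumption: buf[len] is a '\n' sentinel written by the caller.
 * Returns a pointer to the text after the colon, or NULL if the
 * header is absent or the blank line ending the headers is reached.
 */
const char *find_header(const char *buf, size_t len, const char *name)
{
    const char *p = buf;
    const char *end = buf + len;

    while (p < end) {
        if (*p == '\n')              /* "\n\n": end of the header section */
            return NULL;
        const char *value = match_name(p, name);
        if (value)
            return value;
        while (*p != '\n')           /* hunt for the next \n; the sentinel */
            p++;                     /* guarantees this loop terminates    */
        p++;
    }
    return NULL;
}
```

Headers other than the one asked for are only stepped over, never copied or
allocated, which is the point being argued.  A real version would also have
to cope with a header straddling two blocks.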

Sigh.  I wasn't actually thinking of special-casing pick.

Neither was I.  :-)  Most programs that want headers don't want all
headers?  Some want relatively few out of the many that are stuffed in
there nowadays.  It's a bit hard to think of ones that do want them all
with the normal components file?

(As an aside, I see that pick does use ^foo[ \t]*: to match on a header,
but my reading of RFC 5322 is that spaces are not allowed between the
header name and the colon ... but I guess the old syntax did?)

I know other code makes allowances for them, e.g.
http://golang.org/src/pkg/net/textproto/reader.go?s=11934:11987#L475
http://cpansearch.perl.org/src/MARKOV/Mail-Box-2.115/lib/Mail/Box/Parser/Perl.pm
OTOH some does not, e.g.
http://hg.python.org/cpython/file/bffa0b8a16e8/Lib/email/feedparser.py#l33
http://cpansearch.perl.org/src/RJBS/Email-Simple-2.203/lib/Email/Simple/Header.pm
Perl straddles the fence.  :-)
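
For illustration, the difference between the two camps can be sketched in C
roughly as follows.  The field-name characters are taken from RFC 5322
(printable US-ASCII except the colon); the obsolete syntax in that RFC is
what permits white space before the colon.  classify_field() and the enum
names are made up for this sketch, and the line is assumed to be
NUL-terminated.

```c
#include <stddef.h>

enum field_kind {
    FIELD_INVALID,   /* no field name, or no colon after it        */
    FIELD_STRICT,    /* "Name:" with the colon directly after name */
    FIELD_OBS_WSP    /* "Name :" with spaces/tabs before the colon */
};

/* ftext per RFC 5322: printable US-ASCII (33..126) except ':' (58). */
static int is_ftext(unsigned char c)
{
    return c >= 33 && c <= 126 && c != ':';
}

/*
 * Hypothetical classifier for the start of a header line.
 * On success, stores the length of the field name in *name_len
 * (if name_len is not NULL).
 */
enum field_kind classify_field(const char *line, size_t *name_len)
{
    size_t n = 0, i;

    while (is_ftext((unsigned char)line[n]))
        n++;
    if (n == 0)
        return FIELD_INVALID;        /* empty name, e.g. a folded line */

    i = n;
    while (line[i] == ' ' || line[i] == '\t')
        i++;
    if (line[i] != ':')
        return FIELD_INVALID;        /* e.g. an mbox "From " line */

    if (name_len)
        *name_len = n;
    return i == n ? FIELD_STRICT : FIELD_OBS_WSP;
}
```

A strict parser would accept only FIELD_STRICT; a lenient one would also
take FIELD_OBS_WSP and use the first *name_len bytes as the name.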

I've just had a pooter through folders here looking for them with

    LC_ALL=C awk '/^$/ {nextfile} /^[!-9;-~]+[^!-~]/ {print FILENAME ":" $0}' [1-9]*

and all it turned up were 69 "From " lines at the start of some of the
older emails I have.  (This surprised me, but is a bit off-topic.)

So I vote to drop support for this kind of invalid header unless
anyone here has some that show they're common?

Cheers, Ralph.
