Just when you think that in a third of a century of doing e-mail
you've seen every possible way to screw things up, new ways get invented.
So I have this in my .procmailrc:
TMPFILE=`mktemp -p /home/valdis/tmp fixmsg.XXXXXXXXXX`
# Canonify to 8-bit UTF-8
:0 wf
*!^Content-type:.*multipart/signed
| tee $TMPFILE | mhfixmsg -noverbose -file - -outfile -
(The tee and the Content-type check are there because I didn't understand why
PGP signatures were sometimes going bad.)
Found one in my inbox today from the ACLU that ended:
(...)
Content-Transfer-Encoding: 7bit
Content-Type: multipart/alternative;
boundary="----=_NextPart_749_A80A_5A5AF88C.647A7A28"
MIME-Version: 1.0
Message-ID: <16822359(_dot_)478931(_at_)aclu(_dot_)org>
X-ReportingKey:
MJ4CBHM1EIHT38HVKK6B0_JJ3CFC-J7948DM54E1V::valdis(_at_)vt(_dot_)edu::1_478931
Subject: Sessions
Date: Fri, 18 Nov 2016 18:01:22 -0500
To: valdis(_at_)vt(_dot_)edu
Reply-To: aclu(_at_)aclu(_dot_)org
From: "Anthony D. Romero, ACLU Action" <aclu(_at_)aclu(_dot_)org>
X-Gm-Spam: 0
X-Gm-Phishy: 0
X-Gm-Spam: 0
X-Gm-Phishy: 0
------=_NextPart_749_A80A_5A5AF88C.647A7A28--
(one blank line after the separator).
Why? Because the *input* (from that tee, so as it came into procmail) had:
Subject: Sessions
Date: Fri, 18 Nov 2016 18:01:22 -0500
To: valdis(_at_)vt(_dot_)edu
Reply-To: aclu(_at_)aclu(_dot_)org
From: "Anthony D. Romero, ACLU Action" <aclu(_at_)aclu(_dot_)org>
X-Gm-Spam: 0
X-Gm-Phishy: 0
X-Gm-Spam: 0
X-Gm-Phishy: 0
------=_NextPart_749_A80A_5A5AF88C.647A7A28
Content-Type: text/plain;
charset="UTF-8"
Content-Disposition: inline
Content-Transfer-Encoding: 7bit
Hi Valdis -
and here's the next line, fed into od -cx:
0000000   P   r   e   s   i   d   e   n   t   -   e   l   e   c   t
           7250    7365    6469    6e65    2d74    6c65    6365    2074
0000020   D   o   n   a   l   d       T   r   u   m   p       j   u   s
           6f44    616e    646c    5420    7572    706d    6a20    7375
0000040   t       a   n   n   o   u   n   c   e   d       h   e   b  \0
           2074    6e61    6f6e    6e75    6563    2064    6568    0062
0000060 031   s       n   o   m   i   n   a   t   i   n   g       S   e
           7319    6e20    6d6f    6e69    7461    6e69    2067    6553
and mhfixmsg just went nuts when it hit that \0. The exact failure mode depends
on how far into the body part the \0 is: sometimes the message body goes
bye-bye, other times a chunk of text disappears and the next line is joined to
the front half of the previous line.
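For anyone who wants to hunt for these without eyeballing od output, here is a
minimal Python sketch that reports the offset of every \0 in a blob of bytes,
with a little context around it. The function name find_nuls and the sample
bytes (retyped from the dump above) are illustrative only; nothing here is part
of nmh or procmail.

```python
def find_nuls(data: bytes, context: int = 8):
    """Return (offset, surrounding bytes) for every NUL byte in data."""
    hits = []
    start = 0
    while True:
        pos = data.find(b"\x00", start)
        if pos == -1:
            break
        # Keep a few bytes on either side so the hit is recognisable.
        hits.append((pos, data[max(0, pos - context):pos + context + 1]))
        start = pos + 1
    return hits


# Sample bytes retyped from the od dump (hypothetical stand-in for the tee'd file).
sample = b"President-elect Donald Trump just announced heb\x00\x19s nominating Se"

for offset, ctx in find_nuls(sample):
    print(offset, ctx)
```

Against a real capture, the same function could be fed the contents of the
file that tee wrote, read in binary mode.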
And sure enough, all the messages that are getting mangled have:
----=Content_Boundary
Content-Type: text/plain; charset="utf-8"
Content-transfer-encoding: 7bit
or a perversion thereof, and then a \0 in the text.
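That pattern (a text part labelled 7bit that nonetheless contains a \0) can be
checked for mechanically. Below is a rough sketch using Python's standard
email package. The function name suspicious_parts and the tiny sample message
are made up for illustration, and treating a missing Content-Transfer-Encoding
header as 7bit is my assumption based on the MIME default.

```python
import email


def suspicious_parts(raw: bytes):
    """List (content type, charset, NUL offset) for 7bit text parts holding a NUL."""
    msg = email.message_from_bytes(raw)
    found = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        # Assumption: no CTE header means 7bit, the MIME default.
        cte = (part.get("Content-Transfer-Encoding") or "7bit").strip().lower()
        if cte != "7bit":
            continue
        body = part.get_payload(decode=True) or b""
        if b"\x00" in body:
            found.append((part.get_content_type(),
                          part.get_content_charset(),
                          body.index(b"\x00")))
    return found


# Hypothetical minimal message with the same shape as the mangled ones.
raw = (
    b"MIME-Version: 1.0\n"
    b'Content-Type: multipart/alternative; boundary="Content_Boundary"\n'
    b"\n"
    b"--Content_Boundary\n"
    b'Content-Type: text/plain; charset="utf-8"\n'
    b"Content-transfer-encoding: 7bit\n"
    b"\n"
    b"heb\x00\x19s nominating\n"
    b"--Content_Boundary--\n"
)

print(suspicious_parts(raw))
```

Something like this only flags candidates for a closer look; it says nothing
about what mhfixmsg will actually do with them.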
I admit being totally unclear as to where the \0's are coming from,
or what mhfixmsg should do when it sees one, or why any software or person
thinks that 7bit CTE is a sane way to send around utf-8 data.
This is probably going to be a *total* joy to debug.
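One hypothesis that fits the od dump, offered as a guess rather than a
diagnosis: the bytes b \0 \031 (0x62 0x00 0x19) are exactly what the UTF-8
encoding of U+2019 RIGHT SINGLE QUOTATION MARK (0xe2 0x80 0x99) turns into when
the high bit of every byte is cleared, which is what something enforcing 7bit
on the way might do. That the original text was "he's" with a curly apostrophe
is an assumption on my part, and strip_high_bit is just an illustrative name.
A quick Python check of the arithmetic:

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear bit 7 of every byte, as a 7-bit-only hop might."""
    return bytes(b & 0x7F for b in data)


# Assumed original text: "he" + U+2019 (curly apostrophe) + "s nominating".
original = "he\u2019s nominating".encode("utf-8")
mangled = strip_high_bit(original)

print(original[:6])  # the UTF-8 bytes, including e2 80 99
print(mangled[:6])   # the same bytes with the high bit cleared
```

If that is what is happening, the \0 is not being injected as such; it is the
remains of the 0x80 continuation byte.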
_______________________________________________
Nmh-workers mailing list
Nmh-workers(_at_)nongnu(_dot_)org
https://lists.nongnu.org/mailman/listinfo/nmh-workers