As Robert and Ken pointed out, one explanation could be that the
content is converted twice, the second time incorrectly.
I saw those replies, but I wasn't sure how to interpret them (as in, the
evidence is compelling, but I have no idea why that would be happening or
what to do about it).
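Here is what "converted twice, the second time incorrectly" does to text. This is a minimal sketch of an ISO-8859-1 to UTF-8 conversion, not mhfixmsg's code, and the function name `latin1_to_utf8` is hypothetical. Each ISO-8859-1 byte is the code point with the same value, so bytes from 0x80 up become two UTF-8 bytes. A second pass over output that is already UTF-8 treats each UTF-8 byte as a Latin-1 character.

```c
#include <stddef.h>

/* Hypothetical helper: convert ISO-8859-1 bytes to UTF-8.
 * Every ISO-8859-1 byte maps to the code point with the same value,
 * so bytes below 0x80 are copied and the rest become two UTF-8 bytes.
 * `out` must have room for 2 * len bytes; returns the output length. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = in[i];
        if (b < 0x80) {
            out[n++] = b;                                 /* ASCII: unchanged */
        } else {
            out[n++] = (unsigned char)(0xC0 | (b >> 6));  /* lead byte */
            out[n++] = (unsigned char)(0x80 | (b & 0x3F)); /* continuation */
        }
    }
    return n;
}
```

For example, ISO-8859-1 "é" (0xE9) becomes C3 A9 after one pass, which is correct UTF-8. A second pass turns that into C3 83 C2 A9, which displays as "Ã©".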
I don't see at this point how mhfixmsg could do that, but this needs more
investigation. We can continue this way, or if you want to send me a
sanitized excerpt of the message, I'd be glad to work with it.
I can't think of a reasonable way to sanitize it, but I'm willing to send
it to you privately. Should I use your <levinedl(_at_)acm(_dot_)org> address
for this purpose?
$ mhfixmsg -decodetext 8bit -decodetypes text -textcharset UTF-8 -reformat \
-fixcte -fixboundary -noreplacetextplain \
-fixtype application/octet-stream -verbose -file - \
-outfile $destination < $source
mhfixmsg: /home/smw/Mail/mhfixmsgnss3pI part 2, decode text/plain; charset=iso-8859-1
mhfixmsg: /home/smw/Mail/mhfixmsgnss3pI part 1, decode text/html; charset=iso-8859-1
mhfixmsg: /home/smw/Mail/mhfixmsgnss3pI part 2, convert UTF-8 to UTF-8
...which is interesting for more than one reason, including that there's
apparently no conversion of iso-8859-1 to UTF-8,
That's strange, unless $source had already been run through mhfixmsg.
It hadn't. In normal use my procmail-invoked shell script does run the
message through a program I wrote myself, which decodes RFC 2047-encoded
headers -- but it only affects the headers and passes the body through
unmodified; the relevant excerpt is:
[ loop that processes header lines elided]

        /** an empty input line means the end of the message headers: **/

        if (strlen(input_line) < 1) break;
    }


    /** read and write message body: **/

    while (getline(&input_line, &len, infile) >= 0)
    {
        fputs(input_line, outfile);
    }


    /** ...and we're done: **/

    return(0);

}
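A note on the body loop in the excerpt: fputs() stops at the first NUL byte, so that loop passes the body through unchanged only as long as the body contains no NULs. The sketch below is a byte-exact variant. It assumes POSIX getline(), and the function name `copy_body` is hypothetical. It writes exactly the number of bytes that getline() reports.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: copy everything remaining in `infile` to `outfile`
 * unchanged. getline() returns the number of bytes read, so fwrite() can
 * write exactly that many, including any embedded NUL bytes that fputs()
 * would stop at. Returns 0 on success, -1 on a write error. */
int copy_body(FILE *infile, FILE *outfile)
{
    char *line = NULL;
    size_t cap = 0;
    ssize_t n;
    int status = 0;

    while ((n = getline(&line, &cap, infile)) != -1) {
        if (fwrite(line, 1, (size_t)n, outfile) != (size_t)n) {
            status = -1;
            break;
        }
    }
    free(line);   /* getline() allocated (and may have grown) this buffer */
    return status;
}
```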
The only change my program produces in the problematic message is the following:
47,57c47,57
< X-SG-EID: =?us-ascii?Q?CePduXinO1TKWf=2FmbcRcIcb5o7KEfW6Q=2FLxIZrPrRA0dtxQ5evb2UIV0M0r6v6?=
< =?us-ascii?Q?DfqG=2FoldGlAr6l6p1riD1OEyVdX0=2F57dKo740dz?=
< =?us-ascii?Q?NZIhwlTw5J3KSyIU4H7pjfyfMBv0e9LGxKHVezS?=
< =?us-ascii?Q?FeSLaVJyOzyyK3LeB3eGx+QysKjtjkJzuVDXsW4?=
< =?us-ascii?Q?ZiePczPvW34XaHeheXAl2m0RGMRgZENpvRzzX2M?=
< =?us-ascii?Q?G6=2FuEHfZ5+X57rF1w=3D?=
< X-SG-ID: =?us-ascii?Q?N2C25iY2uzGMFz6rgvQsb8raWjw0ZPf1VmjsCkspi=2FKHgAsE=2FCUk5eZaRe5Ltr?=
< =?us-ascii?Q?cbw5EBe1xYnaBlEvYrWq76guWX6eVcLnBjZLZsv?=
< =?us-ascii?Q?fUgud7M9swcG4+O7RGb81dd6HibI6WdUCRYi2bx?=
< =?us-ascii?Q?T8y2GlCc1B+71TSgKjD9dEU2IqN30RZ1qRbAGlx?=
< =?us-ascii?Q?5EAyl462xuJc+?=
---
> X-SG-EID: CePduXinO1TKWf/mbcRcIcb5o7KEfW6Q/LxIZrPrRA0dtxQ5evb2UIV0M0r6v6
> DfqG/oldGlAr6l6p1riD1OEyVdX0/57dKo740dz
> NZIhwlTw5J3KSyIU4H7pjfyfMBv0e9LGxKHVezS
> FeSLaVJyOzyyK3LeB3eGx+QysKjtjkJzuVDXsW4
> ZiePczPvW34XaHeheXAl2m0RGMRgZENpvRzzX2M
> G6/uEHfZ5+X57rF1w=
> X-SG-ID: N2C25iY2uzGMFz6rgvQsb8raWjw0ZPf1VmjsCkspi/KHgAsE/CUk5eZaRe5Ltr
> cbw5EBe1xYnaBlEvYrWq76guWX6eVcLnBjZLZsv
> fUgud7M9swcG4+O7RGb81dd6HibI6WdUCRYi2bx
> T8y2GlCc1B+71TSgKjD9dEU2IqN30RZ1qRbAGlx
> 5EAyl462xuJc+
...but in my testing last night and just now, I see the same behavior
when I run mhfixmsg directly on the unmodified original file (my script
always saves an unmodified copy when it makes changes, in case something
goes wrong).
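The header change in that diff is ordinary RFC 2047 "Q" decoding: `=2F` becomes `/` and `=3D` becomes `=`. The sketch below decodes only the encoded-text portion, the part between `?Q?` and `?=`. The name `q_decode` is hypothetical. The sketch does not parse the `=?charset?Q?` wrapper or convert charsets, and it copies malformed `=` sequences literally.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if `c` is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: decode the encoded-text of an RFC 2047 "Q" word.
 * "_" means a space, "=XX" is a byte given in hex, and anything else is
 * copied as-is. `out` needs strlen(in) + 1 bytes; returns the number of
 * decoded bytes (a terminating NUL is appended but not counted). */
size_t q_decode(const char *in, char *out)
{
    size_t n = 0;
    while (*in) {
        if (*in == '_') {
            out[n++] = ' ';
            in++;
        } else if (*in == '=' && hex_value(in[1]) >= 0 && hex_value(in[2]) >= 0) {
            out[n++] = (char)(hex_value(in[1]) * 16 + hex_value(in[2]));
            in += 3;
        } else {
            out[n++] = *in++;   /* literal character, or a malformed "=" */
        }
    }
    out[n] = '\0';
    return n;
}
```

Applied to the last encoded-word of X-SG-EID, `G6=2FuEHfZ5+X57rF1w=3D`, this sketch yields `G6/uEHfZ5+X57rF1w=`, which matches the right-hand side of the diff.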
Conversion to the same charset is a no-op; I'll look into removing the
verbose output in that case.
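One way to make a same-charset conversion silent is to compare the charset names before converting. The sketch below uses the hypothetical name `needs_conversion`. MIME charset names are case-insensitive, so it compares them with POSIX strcasecmp(). It does not recognize aliases such as "utf8" or "latin1".

```c
#include <stdbool.h>
#include <strings.h>

/* Hypothetical helper: decide whether a text part needs charset
 * conversion at all. MIME charset names are case-insensitive, so
 * "utf-8" and "UTF-8" count as the same; aliases such as "utf8" or
 * "latin1" are NOT recognized by this simple comparison. */
bool needs_conversion(const char *from_charset, const char *to_charset)
{
    return strcasecmp(from_charset, to_charset) != 0;
}
```

When this returns false, a caller would skip both the conversion and the -verbose message.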
That's probably a helpful thing to do, but the question I was wondering
about wasn't why the UTF-8-to-UTF-8 conversion was reported, but rather why
the iso-8859-1-to-UTF-8 conversion wasn't reported.
and that in fact it's part 1 rather than part 2 that gets converted improperly
The part numbers are reversed because that's the order used for display.
Part 2 is the text/plain part; that's the one that got converted.
Thank you. That clears up part of my confusion.
- Steven
--
___________________________________________________________________________
Steven Winikoff | "The thing is, I mean, there's times when
Montreal, QC, Canada | you look at the universe and you think,
smw(_at_)smwonline(_dot_)ca | 'What about me?' and you can just hear
http://smwonline.ca | the universe replying, 'Well, what about
| you?'"
| - Terry Pratchett (Thief of Time)