nmh-workers

Re: mhfixmsg character set conversion

2022-02-04 16:13:57
Steven wrote:

So do I, which suggests that there's something in the content of the
specific message I'm working with.

As Robert and Ken pointed out, one explanation could be that the
content is converted twice, the second time incorrectly.  I don't
see at this point how mhfixmsg could do that but this needs more
investigation.  We can continue this way, or if you want to send me
a sanitized excerpt of the message, I'd be glad to work with it.
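
For reference, the symptom a double conversion would produce can be reproduced outside of nmh. The following is a minimal Python sketch, assuming the second pass wrongly treats the already-converted UTF-8 bytes as iso-8859-1. The sample word is made up, and this only illustrates the effect; it says nothing about how mhfixmsg works internally.

```python
# Illustration only: mimics a double charset conversion in plain Python.
# It is not how mhfixmsg is implemented.
original = "caf\u00e9"  # hypothetical sample text, "cafe" with e-acute

# Bytes as they would appear in an iso-8859-1 part.
latin1_bytes = original.encode("iso-8859-1")

# First conversion, done correctly: iso-8859-1 -> UTF-8.
utf8_bytes = latin1_bytes.decode("iso-8859-1").encode("utf-8")

# Second conversion, done incorrectly: the UTF-8 bytes are treated as
# iso-8859-1 and converted to UTF-8 again.
twice_bytes = utf8_bytes.decode("iso-8859-1").encode("utf-8")

print(utf8_bytes)   # e-acute is the two bytes C3 A9
print(twice_bytes)  # each of those two bytes is now its own 2-byte sequence
```

If the damaged part shows this pattern, with one accented character turned into two unrelated characters, a second conversion is a likely cause.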

$ mhfixmsg -decodetext 8bit -decodetypes text -textcharset UTF-8 -reformat \
           -fixcte -fixboundary -noreplacetextplain \
           -fixtype application/octet-stream -verbose -file - \
           -outfile $destination < $source
mhfixmsg: /home/smw/Mail/mhfixmsgnss3pI part 2, decode text/plain; charset=iso-8859-1
mhfixmsg: /home/smw/Mail/mhfixmsgnss3pI part 1, decode text/html; charset=iso-8859-1
mhfixmsg: /home/smw/Mail/mhfixmsgnss3pI part 2, convert UTF-8 to UTF-8

...which is interesting for more than one reason, including that there's
apparently no conversion of iso-8859-1 to UTF-8,

That's strange, unless $source had already been run through mhfixmsg.

Conversion to the same charset is a no-op.  I'll look into removing the
verbose output in that case.

and that in fact it's
part 1 rather than part 2 that gets converted improperly

The part numbers are reversed because that's the order used for display.
Part 2 is the text/plain part, that's the one that got converted.

; part 2 still has

   Content-Type: text/html; charset=iso-8859-1

Right, mhfixmsg didn't touch it.  The text parts are in a
multipart/alternative and that text/html part has a corresponding
text/plain part.  Even if it didn't, mhfixmsg wouldn't convert the
text/html part. It would insert a new text/plain part.
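
To check which charset each text part declares, independently of nmh, something like the following sketch could be used. It relies only on Python's standard email package. The sample message and the text_parts helper are hypothetical; the sample merely imitates the structure described above and is not the message from this thread.

```python
from email import message_from_string
from email.policy import default

# Hypothetical sample message, loosely modelled on the structure described
# above; it is not the message from this thread.
SAMPLE = """\
From: sender@example.com
To: recipient@example.com
Subject: sample
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="b1"

--b1
Content-Type: text/plain; charset=UTF-8

plain body
--b1
Content-Type: text/html; charset=iso-8859-1

<p>html body</p>
--b1--
"""


def text_parts(msg):
    """Return (content type, declared charset) for each text/* part."""
    return [
        (part.get_content_type(), part.get_content_charset())
        for part in msg.walk()
        if part.get_content_maintype() == "text"
    ]


msg = message_from_string(SAMPLE, policy=default)
for ctype, charset in text_parts(msg):
    print(ctype, charset)
```

With a real message, reading the file in binary mode and using email.message_from_binary_file would avoid guessing at the file's encoding.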

David
