Re: [Nmh-workers] mojibake in UTF-8 encoded quoted-printable messages

2013-10-24 07:52:56
Joel wrote:

I've noticed recently that I'm getting some mojibake in messages from
a few sources. Both examples I have handy have a quoted-printable UTF-8
encoded text/html part, and one also has a quoted-printable UTF-8
encoded text/plain part.

The one which is HTML only happens also to be in German, and what's
getting munged are the umlauted vowels: e.g., I'm seeing "für" as
"fÃ¼r" when I run show on the message. The other message has some curly
apostrophes in it, so I see "Iâ€™m" instead of "I'm".

I manually decoded the quoted-printable HTML for the message in German
and the quoted-printable text in the other message, and both appear to
be correct UTF-8. The locale for my terminal is a UTF-8 locale, and it
typically displays Unicode code points correctly. This makes it appear
that the problem is with nmh. (This is with nmh-1.5-3 on Fedora 18.)
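
A check along these lines can be scripted with Python's standard quopri module. This is only a sketch: the excerpt below is a hypothetical stand-in (the word "für" in quoted-printable form), not text copied from either problem message, and the helper name is made up for illustration.

```python
import quopri

def decode_qp_utf8(qp_bytes):
    """Decode quoted-printable bytes, then verify the result is valid UTF-8.

    Returns (raw_bytes, text). The .decode() call raises UnicodeDecodeError
    if the decoded bytes are not well-formed UTF-8.
    """
    raw = quopri.decodestring(qp_bytes)
    text = raw.decode("utf-8")
    return raw, text

# Hypothetical excerpt, assumed for illustration only.
qp_excerpt = b"f=C3=BCr"

raw, text = decode_qp_utf8(qp_excerpt)
print(" ".join(f"{b:02x}" for b in raw))  # 66 c3 bc 72
```

If the decode step succeeds and the bytes look right, the message body itself is fine and the damage is happening somewhere between decoding and display.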

Does anyone have an idea where the cause lies? I'd be happy to provide
the problematic messages, if that would help.

Not offhand.  There was a fix to the base64 decoder in June
2012, but it's in nmh 1.5, was for big endian, and shouldn't
affect quoted-printable.

The munged character in your first example looks like it's
supposed to be c3 bc c3, but instead is 83 c2 bc, if I did
that right.  Getting from one to the other takes more than one
step, such as losing bits and the wrong endianness?
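
For comparison, here is a small sketch of one other way bytes like these can arise: correct UTF-8 that gets misread as a single-byte charset and then encoded as UTF-8 a second time. That mechanism is an assumption, not something established in this thread, and the choice of cp1252 as the wrong charset is likewise a guess. The function names are hypothetical.

```python
def double_encode(text, wrong_charset="cp1252"):
    """Encode text as UTF-8, misread those bytes as wrong_charset,
    then encode the misread characters as UTF-8 again."""
    return text.encode("utf-8").decode(wrong_charset).encode("utf-8")

def hex_bytes(data):
    """Format bytes as space-separated lowercase hex pairs."""
    return " ".join(f"{b:02x}" for b in data)

correct = "f\u00fcr".encode("utf-8")   # "für" as plain UTF-8
garbled = double_encode("f\u00fcr")    # "für" after the assumed misreading

print(hex_bytes(correct))                      # 66 c3 bc 72
print(hex_bytes(garbled))                      # 66 c3 83 c2 bc 72
print(hex_bytes(double_encode("I\u2019m")))    # 49 c3 a2 e2 82 ac e2 84 a2 6d
```

Under that assumption the u-umlaut goes from c3 bc to c3 83 c2 bc, and the curly apostrophe from e2 80 99 to c3 a2 e2 82 ac e2 84 a2. Comparing those sequences against a hex dump of what show actually emits would confirm or rule this out.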

Maybe send a troublesome excerpt from the quoted-printable
example?

My first suggestion would be to try the nmh HEAD.  It builds
easily and quickly on Fedora.

And I'd try with a profile that has just a Path.

David

_______________________________________________
Nmh-workers mailing list
Nmh-workers(_at_)nongnu(_dot_)org
https://lists.nongnu.org/mailman/listinfo/nmh-workers