mhonarc-dev

[bug #20252] [gnu.org #336933] RFC2047 header encoding bug

2007-06-26 07:59:27

Follow-up Comment #3, bug #20252 (project mhonarc):

I understand what you're trying to say, but I'm not sure you're correct.

First, Apache is serving the pages labelled as UTF-8:

HEAD /archive/html/grub-devel/2007-06/msg00004.html HTTP/1.0
Host: lists.gnu.org

HTTP/1.1 200 OK
Date: Tue, 26 Jun 2007 14:49:29 GMT
Server: Apache/2.0.51 (Fedora)
Last-Modified: Fri, 01 Jun 2007 21:19:30 GMT
ETag: "e3105a-11a8-c5549c80"
Accept-Ranges: bytes
Content-Length: 4520
Connection: close
Content-Type: text/html; charset=UTF-8
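To check this mechanically, the charset can be read from the Content-Type header of a response like the one above. Here is a minimal sketch that assumes the raw header block has already been captured as a string, for example by pasting it from the HEAD exchange. It uses only the standard library's email.message.Message to parse the header parameters, and the helper name charset_from_head is hypothetical.

```python
from email.message import Message
from typing import Optional


def charset_from_head(raw_headers: str) -> Optional[str]:
    """Return the lowercased charset parameter of Content-Type, or None.

    Hypothetical helper: expects the raw text of an HTTP response head.
    """
    msg = Message()
    for line in raw_headers.splitlines():
        if ":" not in line:
            continue  # skip the status line ("HTTP/1.1 200 OK") and blank lines
        name, _, value = line.partition(":")  # split on the first colon only
        msg[name.strip()] = value.strip()
    # get_content_charset() lowercases the value and returns None if absent
    return msg.get_content_charset()


raw = "HTTP/1.1 200 OK\nContent-Length: 4520\nContent-Type: text/html; charset=UTF-8\n"
print(charset_from_head(raw))
```

Note that this only reports the charset the server declares. It says nothing about whether the bytes in the page body actually match that declaration.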

Second, the two pages use different numeric character references (entities)
for the same characters. In the first URL, msg00004.html, the special
characters are written as &#xC3; whereas in the second URL they are written
as &#xE4;.
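To compare the two pages side by side, one option is to pull out every hexadecimal character reference and list the code points it names. The following is a minimal sketch that assumes the HTML source of each page is already available as a string. The regular expression only matches references terminated by a semicolon, and the function name hex_char_refs is hypothetical.

```python
import re

# Matches hexadecimal numeric character references such as "&#xE4;".
# The trailing ";" is required so that letters following an unterminated
# reference are not swallowed as extra hex digits.
HEX_REF = re.compile(r"&#[xX]([0-9A-Fa-f]+);")


def hex_char_refs(html: str) -> list:
    """Return the code points named by hex character references, in order."""
    return [int(m.group(1), 16) for m in HEX_REF.finditer(html)]


# Hypothetical snippets standing in for the two archived pages.
print([hex(cp) for cp in hex_char_refs("J&#xC3;ger")])
print([hex(cp) for cp in hex_char_refs("J&#xE4;ger")])
```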

The correct character (ä) has the Unicode code point U+00E4, the ISO-8859-1
encoding 0xE4, and the UTF-8 encoding 0xC3 0xA4. Given that, my guess is that
in the first case the UTF-8 bytes are being treated as ISO-8859-1, a
single-byte (8-bit) encoding, so the character is written out as the first
byte of its UTF-8 encoding. In the second case, the text appears to be
properly transcoded from UTF-8 to Latin-1.
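The hypothesis can be reproduced outside MHonArc. This is a minimal sketch in Python, not a description of what MHonArc actually does internally. It takes the UTF-8 bytes for ä and decodes them twice: once correctly as UTF-8, and once wrongly as ISO-8859-1. It then escapes every non-ASCII character as a hex character reference. The helper name to_char_refs is hypothetical.

```python
def to_char_refs(data: bytes, assumed_charset: str) -> str:
    """Decode bytes with an assumed charset, then escape non-ASCII as &#x..;."""
    text = data.decode(assumed_charset)
    return "".join(c if ord(c) < 0x80 else f"&#x{ord(c):X};" for c in text)


a_umlaut = "\u00e4"                      # U+00E4, "ä"
utf8_bytes = a_umlaut.encode("utf-8")    # two bytes: 0xC3 0xA4
latin1_bytes = a_umlaut.encode("iso-8859-1")  # one byte: 0xE4

# Correct path: UTF-8 bytes decoded as UTF-8 give back U+00E4.
correct = to_char_refs(utf8_bytes, "utf-8")
# Misread path: each UTF-8 byte becomes its own Latin-1 character.
misread = to_char_refs(utf8_bytes, "iso-8859-1")
print(correct, misread)
```

If the Latin-1 misreading is what happened, the first byte 0xC3 turns into U+00C3, which matches the &#xC3; seen in msg00004.html. The second byte 0xA4 would then turn into U+00A4, so some trace of it would be expected right after the first reference.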

But I'm not very fluent in the internals of MHonArc. Thoughts?

-jag



    _______________________________________________________

Reply to this item at:

  <http://savannah.nongnu.org/bugs/?20252>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.nongnu.org/
