Follow-up Comment #3, bug #20252 (project mhonarc):
I understand what you're trying to say, but I'm not sure you're correct.
First, Apache is returning the pages UTF-8 encoded:

  HEAD /archive/html/grub-devel/2007-06/msg00004.html HTTP/1.0
  Host: lists.gnu.org

  HTTP/1.1 200 OK
  Date: Tue, 26 Jun 2007 14:49:29 GMT
  Server: Apache/2.0.51 (Fedora)
  Last-Modified: Fri, 01 Jun 2007 21:19:30 GMT
  ETag: "e3105a-11a8-c5549c80"
  Accept-Ranges: bytes
  Content-Length: 4520
  Connection: close
  Content-Type: text/html; charset=UTF-8
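
For anyone who wants to check the declared charset programmatically, here is a minimal Python sketch that pulls the charset parameter out of a Content-Type header value like the one above. The helper name declared_charset is made up for illustration; it only uses the standard library's email.message module and does not contact the server.

```python
from email.message import Message
from typing import Optional


def declared_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, lowercased."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_charset()


header_value = "text/html; charset=UTF-8"  # taken from the response above
charset = declared_charset(header_value)
```

Note that this only reports what the server claims; it says nothing about whether the bytes in the body are actually valid UTF-8.
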
Second, the two pages present the special characters as different
entities. In the first URL, msg00004.html, they are written as à,
whereas in the second URL they are written as ä.
The correct character has a Unicode code point of U+00E4, an ISO-8859-1
encoding of 0xE4, and a UTF-8 encoding of 0xC3 0xA4. Given that, my guess is
that in the first case the UTF-8 bytes are assumed to be ISO-8859-1, an 8-bit
character encoding, and only the first byte of the UTF-8 sequence is written
out; in the second case, I suppose it is properly transcoding from UTF-8 to
Latin-1.
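
To make the byte values concrete, here is a small Python sketch of the two cases I am guessing at. It is only an illustration of the encodings involved, not of what MHonArc actually does internally; all variable names are mine.

```python
ch = "\u00e4"  # the character in question (a with diaeresis)

code_point = ord(ch)              # Unicode code point
latin1 = ch.encode("iso-8859-1")  # single byte in ISO-8859-1
utf8 = ch.encode("utf-8")         # two bytes in UTF-8

# Case 1 (suspected bug): UTF-8 bytes are treated as ISO-8859-1,
# so each byte becomes its own character.
misread = utf8.decode("iso-8859-1")
# If only the first byte survives, the result is the single character U+00C3.
first_byte_only = utf8[:1].decode("iso-8859-1")

# Case 2 (correct): decode as UTF-8 first, then re-encode to Latin-1.
transcoded = utf8.decode("utf-8").encode("iso-8859-1")

print(hex(code_point), latin1.hex(), utf8.hex())
print(repr(misread), repr(first_byte_only), transcoded.hex())
```

The first case turns one character into one or two unrelated Latin-1 characters, while the second gives back the single byte 0xE4.
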
But I'm not very fluent in the internals of MHonArc. Thoughts?
-jag
_______________________________________________________
Reply to this item at:
<http://savannah.nongnu.org/bugs/?20252>