On March 10, 2004 at 17:02, Ulrich Mayring wrote:
In readmail.pl we had to fix a bug (it may be already fixed in cvs, as
it was reported earlier by my colleague). The line
&$encfunc(\$strtxt, $charset, $TextEncode);
has to be replaced with
&$encfunc(\$strtxt, $real_charset, $TextEncode);
I do not recall getting a bug report about this.
I committed a change into CVS (btw, patch diffs help me find the
exact line much quicker).
Also, in MAILdecode_1522_str we delete the 8th bit of all headers. This
is compliant with the MIME standard. This line was added after getting
the text encoder:
$str =~ tr [\200-\377] [\000-\177];
Actually, this can be done by registering a "plain" character set
converter via CHARSETCONVERTERS. The plain converter is called
on text that carries no non-ASCII encoding.
Stripping of 8-bit characters is not done by default, since
there is enough broken usage of this in various locales that it
would cause more problems than it solves.
If strict enforcement is required, then CHARSETCONVERTERS can
be used, $mhonarc::CBRawMessageBodyRead can be used to pre-process
the header data, or a pre-processor, like procmail, can be used
(see the appendix of the documentation about callback functions).
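
As a rough illustration of the converter approach, here is a minimal
sketch of a routine that could be registered for the "plain" character
set. The name strip8::to_ascii is hypothetical, and the calling
convention (text and charset name in, converted string out) is an
assumption; check the CHARSETCONVERTERS documentation before relying
on it. It applies the same tr as the line quoted above and then escapes
the HTML special characters.

```perl
use strict;
use warnings;

# Hypothetical converter package; not part of MHonArc.
package strip8;

# Assumed calling convention: (text, charset name) -> converted string.
sub to_ascii {
    my ($text, $charset) = @_;
    # Clear the 8th bit of every byte (same tr as quoted above).
    $text =~ tr[\200-\377][\000-\177];
    # Escape HTML specials so the result is safe to place in a page.
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

package main;

# \351 is e-acute in ISO-8859-1; with the 8th bit cleared it becomes "i".
print strip8::to_ascii("Caf\351 <x>", 'plain'), "\n";
```

Note that clearing the 8th bit does not transliterate: the accented
character above turns into an unrelated ASCII letter, which is one way
this can cause more problems than it solves.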
Also, in mhtxtplain.pl after ## Fixup any EOL mess:
## Fix invalid XML characters
$$data =~ s/\x0c//g;
This does not catch all invalid XML characters, only the one that killed
us in our mails.
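
The substitution above removes only form feeds. A broader sketch,
assuming the data is handled as bytes and that dropping the characters
outright is acceptable, removes every control character that XML 1.0
disallows. The helper name strip_invalid_xml_controls is hypothetical
and is not part of mhtxtplain.pl.

```perl
use strict;
use warnings;

# Hypothetical helper; takes a reference to the text, like $data above.
sub strip_invalid_xml_controls {
    my ($data) = @_;
    # Below \x20, XML 1.0 permits only TAB (\x09), LF (\x0A) and CR (\x0D).
    # Delete every other control character in that range.
    $$data =~ tr/\x00-\x08\x0B\x0C\x0E-\x1F//d;
    return;
}

my $text = "page one\x{0C}page two\x{01}\n";
strip_invalid_xml_controls(\$text);
print $text;
```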
You may want to look at the $mhonarc::CBMessageBodyRead callback.
Or, you may want to "chain" the filter call so any changes to mhtxtplain.pl
will automatically be inherited by you. See