Re: Archiving sent mail; Attachments with non-ascii names; Preserving charset of message

2002-12-16 19:49:26
On December 17, 2002 at 02:11, Tomasz Ostrowski wrote:

I needed to archive sent mail with MHonArc and I needed to put the
contents of the To: header into the message index. It was not possible
with MHonArc-2.5.13, so I wrote a small patch that adds an rc-variable $TO$.

The preferable method is to allow for arbitrary message header
variables instead of just To:.  Otherwise, you end up replicating
code when people want 'cc' or other fields.

Of course. But to do it right you need a lot of time and have to
know the language you work with rather well. Lacking both, I
chose the ugly way.

Side note: You should check out the mha-preview program in the
examples/ directory of the distribution as an example on how to
add in support for "To:" without making changes to the base MHonArc
code itself.

I just hope somebody will pick this up and do it the right way. I
just think this is better than nothing.

I'm the person to pick up most things.  I'll see about getting something
done for v2.6.0.  Since you are on the dev list, you can monitor
the CVS commit messages to see when it shows up.

Of course, contributions to the author always speed up the development
of certain features :-)

2. Attachments with non-ascii names

I had problems accessing attachments extracted with MHonArc from
Windows if they had non-ascii characters in their names or characters
forbidden in file names: \/:*?"<>| (when using m2h_external::filter;

It would probably be more efficient to just replace whitespace and
non-ascii characters in one tr/// operation:

  $fname =~ tr/\0-\40\t\n\r\177-\377/_/;

I think the characters mentioned earlier, which are not allowed in a
Windows environment, should also not be used in filenames. I don't know
if MHonArc allows '/' or '\' in filenames, but at least these could make
directory traversal vulnerabilities possible.

/ and \ are handled beforehand by the caller.
readmail::MAILget_content_disposition removes any pathname components
for security reasons.  You're right about the other ones (at least
for Windows).  It is easy to add them in the tr/// operator.
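
As a rough sketch of that suggestion, the Windows-forbidden characters
can go into a second tr/// pass. The helper name sanitize_fname is
hypothetical and not part of MHonArc, and the sketch assumes the name
is a byte string (code points above 255 would not be touched):

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; not MHonArc code.
sub sanitize_fname {
    my ($fname) = @_;
    # Control characters, space, DEL and all 8-bit bytes become "_".
    # (\t, \n and \r already fall inside the \0-\40 range.)
    $fname =~ tr/\0-\40\177-\377/_/;
    # Characters Windows does not allow in file names become "_".
    # The {} delimiters avoid having to escape "/".
    $fname =~ tr{\\/:*?"<>|}{_};
    return $fname;
}

print sanitize_fname(qq{my "report": v2?.txt}), "\n";
```

Both passes could be merged into one tr///; they are kept apart here
only so that the two character classes stay readable.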

As for the un-grepable utf-8, I think people will eventually have
to deal with it if they want to have archives that are

I think one iso character set chosen in the configuration, plus numeric
unicode entities for the characters it cannot display, would be perfect

This already exists in the snapshot builds.  MHonArc::CharEnt has
been enhanced to handle many more character sets (including multibyte
sets like Japanese and Chinese), and all SGML-based entity names
have been removed with the exception of those specified in the HTML
4.0 standard.

Unfortunately, HTML does not allow mixed character encodings in
the same document, making things problematic when trying to
convert MIME mail into HTML.

Again - numeric unicode entities. Also, some character sets are
convertible to others (for example, us-ascii to just about any).
UTF-8 would be the more proper solution; a chosen character set would
be more practical (for now).
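
To illustrate the "one chosen charset plus numeric entities" idea, here
is a minimal sketch. It uses Perl's core Encode module (Perl 5.8 or
later), not MHonArc::CharEnt, so it shows the technique rather than how
MHonArc implements it. The helper name to_charset_with_ncr is made up
for this example:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: encode $text into $charset and replace every
# character that the charset cannot represent with a decimal numeric
# character reference (&#NNNN;).
sub to_charset_with_ncr {
    my ($charset, $text) = @_;
    # $text is a private copy; Encode may modify its input when a
    # CHECK argument is given.
    return Encode::encode($charset, $text, Encode::FB_HTMLCREF);
}

# Polish letters exist in iso-8859-2; the two Japanese characters do not.
my $mixed = "Za\x{17C}\x{F3}\x{142}\x{107} \x{65E5}\x{672C}";
my $bytes = to_charset_with_ncr('iso-8859-2', $mixed);
print length($bytes), " bytes\n";
```

Text that is already plain us-ascii passes through unchanged, which is
the "convertible to just about any" case mentioned above.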

Try the snapshot build.
If iso-8859-2 is the default charset for your locale, and you do not
want 8-bit characters to be converted to entity references, but left
in raw form, try the following:

  <CHARSETCONVERTERS>
  iso-8859-2; -decode-
  </CHARSETCONVERTERS>

(btw, the above works for v2.5.13, but *other* charsets will still
 be converted to SGML-based entities.  the snapshot build fixes

Of course, you will need to change the *PGBEGIN resources to
set the content-type meta tag to denote iso-8859-2 for all your

The limitation is that only encoded message header fields and
text/plain messages are affected.  The other text-based filters
(like for text/html) do not use CHARSETCONVERTERS.
The higher-level text encoding processing I am currently working on
should address that.

