On October 24, 2000 at 11:19, Erik Rossen wrote:
OK, I guess I have a job to do. I'm kind of surprised that no one has ever
asked for this capability before, though. I guess everyone (except me) is
It has been brought up before. But maybe only once or twice.
smart enough to keep the original mbox around just in case. I hope that
the people over at www.mail-archive.com are doing that in case VA Linux
ever decides to pull the plug. They are managing over 5,000 mailing lists
There is a mailing list associated with the maintenance of
www.mail-archive.com: gossip(_at_)jab(_dot_)org. I have also met with the
maintainer of the site. MH (the MUA) is used as the core manager of
mail and a heuristic filter is used to determine which archive a
message goes to. I do believe original mail is stored since
archives have been regenerated.
As a start, look at mhmsgfile.pl that is part of the MHonArc distribution
(used by mha-dbrecover).
Thanks, that was a good starting point for extracting header info from the
messages. Is there also a subroutine for extracting bodies? If so, I
hardly need to do any work! :-)
All you need to know is the special comment declarations used to
delimit the message body data. In later versions of MHonArc,
more comment declarations were created to provide better granularity
on delimiting the types of data on a message page. Here is the
structure of the comments associated with the actual message data
on a message page:
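<!--X-Subject-Header-Begin-->
... expanded value of SUBJECTHEADER resource ...
<!--X-Subject-Header-End-->
<!--X-Head-of-Message-->
... converted message header fields ...
<!--X-Head-of-Message-End-->
<!--X-Head-Body-Sep-Begin-->
... expanded value of HEADBODYSEP resource ...
<!--X-Head-Body-Sep-End-->
<!--X-Body-of-Message-->
... converted message body ...
<!--X-Body-of-Message-End-->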
Messages converted with earlier versions of MHonArc will not have
all the comment declarations above. I'd have to review past releases
to verify what was generated.
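
Given comment declarations like those described above, pulling the
converted body out of a message page can be sketched in Perl roughly as
follows. This is only a sketch, not code from the MHonArc distribution.
It assumes the body is delimited by <!--X-Body-of-Message--> and
<!--X-Body-of-Message-End-->, each appearing once per page, and the
subroutine name extract_body is made up for illustration. Pages written
by earlier versions may lack these comments, in which case it returns
undef.

```perl
use strict;
use warnings;

# Assumed delimiters for the converted message body; check them against
# pages generated by your version of MHonArc.
my $BODY_BEGIN = '<!--X-Body-of-Message-->';
my $BODY_END   = '<!--X-Body-of-Message-End-->';

# extract_body (hypothetical helper): return the text between the two
# delimiters, or undef if either one is missing.
sub extract_body {
    my ($html) = @_;
    my $start = index($html, $BODY_BEGIN);
    return undef if $start < 0;
    $start += length($BODY_BEGIN);
    my $end = index($html, $BODY_END, $start);
    return undef if $end < 0;
    return substr($html, $start, $end - $start);
}

# Small made-up page fragment to exercise the helper.
my $page = join "\n",
    '<!--X-Head-of-Message-End-->',
    '<!--X-Body-of-Message-->',
    '<pre>Hello, world.</pre>',
    '<!--X-Body-of-Message-End-->',
    '';
my $body = extract_body($page);
print defined $body ? $body : "(no body delimiters found)\n";
```

Using index/substr rather than a regular expression keeps the sketch
free of quoting problems if the body itself contains odd characters.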
In glancing through the file mhtxtenrich.pl, I ran across the following
line of code (line 58 of mhtxtenrich.pl 2.3 99/06/25 14:18:01):
$data =~ s|<<|\&lt;|gi;
Didn't you mean
$data =~ s|<|\&lt;|gi;
No. text/enriched is syntactically similar to HTML, but there are
differences. To get a literal "<" to show up in the text, you use
"<<". Check the RFC for text/enriched (RFC 1896) for more details.
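
To illustrate the difference, here is a small Perl sketch. It is not the
code from mhtxtenrich.pl, and both subroutine names are invented for the
example. It shows why only the doubled "<<" is turned into an HTML
entity, leaving the single "<" that opens a formatting command alone for
later processing.

```perl
use strict;
use warnings;

# In text/enriched, "<<" stands for a literal "<", while a single "<"
# opens a formatting command such as <bold>.

# escape_literal_lt (hypothetical name): convert only the doubled form.
sub escape_literal_lt {
    my ($data) = @_;
    $data =~ s|<<|\&lt;|g;
    return $data;
}

# escape_every_lt (hypothetical name): the suggested alternative, which
# also mangles the formatting commands.
sub escape_every_lt {
    my ($data) = @_;
    $data =~ s|<|\&lt;|g;
    return $data;
}

my $sample = 'if a <<b then <bold>stop</bold>';
print escape_literal_lt($sample), "\n";
print escape_every_lt($sample), "\n";
```

The first subroutine leaves <bold> and </bold> intact for the converter
to handle; the second one rewrites their opening "<" as well, so the
commands could no longer be recognized.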
By the way, what revision control system are you using? I've only used
CVS before and I was wondering about the @(#) prefix in the file IDs.
I use SCCS (with a custom Perl front-end I wrote to handle multiple
directories): an older source code control system that exists on Unix
systems. SCCS is at the same level as RCS: it manages only
individual files. The "@(#)" is a marker that the `what' command uses for
extracting version information from programs. It is/was common in C
programs to do the following:
static const char sccs_id[] = "@(#) <some version info>";
for each source file. So a person could do "what
<program/library-filename>" to get the version information for all
source files associated with the program.
If using SCCS, the static declaration may look like:
static const char sccs_id[] = "%Z% %M% %I% %E% %U%";
Since many now use other types of source code management tools
(like RCS, CVS, etc), one typically does:
static const char sccs_id[] = "@(#) $Id:$";
or something similar. This is common for commercial programs
since commercial Unix OSes have SCCS as part of the base OS, including
the `what' command. Note that the `what' command can be roughly emulated with
strings <file> | grep "@(#)"
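
For systems that have Perl but no `what' command, the same idea can be
sketched in Perl. This is only an approximation, not how any particular
`what' implementation works, and the subroutine name what_ids is made
up. It assumes the identification string ends at the first double quote,
">", backslash, newline, or NUL byte, which is how what(1) is commonly
documented.

```perl
use strict;
use warnings;

# what_ids (hypothetical helper): return every string that follows an
# "@(#)" marker in $data, stopping at ", >, backslash, newline, or NUL.
sub what_ids {
    my ($data) = @_;
    my @ids;
    while ($data =~ /\@\(#\)([^">\\\n\0]*)/g) {
        push @ids, $1;
    }
    return @ids;
}

# Read each file named on the command line in binary mode and list its IDs.
for my $file (@ARGV) {
    open(my $fh, '<:raw', $file) or die "Cannot open $file: $!\n";
    my $data = do { local $/; <$fh> };
    close($fh);
    print "$file:\n";
    print "\t$_\n" for what_ids($data);
}
```

Saved under a name such as what.pl (a hypothetical file name), it would
be run as "perl what.pl <program/library-filename>".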
When I finally got my own PC and Linux installed, I was almost forced
to move to something like CVS/RCS since SCCS does not come with Linux
distributions. However, I was able to find a free implementation of
SCCS that I could build, so I avoided migrating my source to a new
system.
In the long term, moving to CVS will be better since it has
better management capabilities than my simple Perl front-end to
SCCS. I just have not bothered to do it yet.
BTW, if you are wondering, I used SCCS since I was not familiar
with CVS and, at the time, SCCS was the only common version control system
that I had access to. Much of MHonArc's development occurred on
commercial Unix OSes before I ever started using Linux.