Re: Does mhonarc2mbox exist?

On Sun, 12 Nov 2000, Earl Hood wrote:

On November 11, 2000 at 13:20, anthonyw wrote:

 For the msg*.html files that are derived from multipart messages:

1) Save the boundary markers of the message part


This is really not needed since the script can recreate new
boundaries.

2) Retain the Mime headers of the message parts - those which are not
   displayed inline - inside of some X- comment.


The only header of potential interest is the Content-Type header.
Other headers can be recreated by the script.  An external file's
content-type can be implied from the filename extension or by the use
of file(1) and /etc/magic, or a similar mechanism.
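
As an illustration of the filename-extension approach described above, here is a minimal Perl sketch. The table %type_for_ext and the function guess_content_type are hypothetical names chosen for this example and are not part of MHonArc; the table is deliberately small and would need extending for real archives.

```perl
use strict;
use warnings;

# Hypothetical extension-to-type table (an assumption for this sketch).
my %type_for_ext = (
    txt  => 'text/plain',
    html => 'text/html',
    gif  => 'image/gif',
    jpg  => 'image/jpeg',
    png  => 'image/png',
    pdf  => 'application/pdf',
);

# Guess a content-type from a filename; fall back to a generic
# binary type when the extension is missing or unknown.
sub guess_content_type {
    my ($filename) = @_;
    my ($ext) = $filename =~ /\.([^.\/]+)$/;
    return 'application/octet-stream' unless defined $ext;
    return $type_for_ext{lc $ext} || 'application/octet-stream';
}

print guess_content_type('pic.GIF'), "\n";
```

A file(1)-based lookup could replace the table where extensions are unreliable; the fallback type keeps the conversion going either way.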

Note, any back conversion will never be perfect, so it is best to
minimize the amount of work to whatever can give a passable solution.
I'd prefer to not over-pollute a page with a bunch of comment
declarations since it increases the byte size of the page with
questionable benefit.  Also, the comment declaration approach will not
work well with MHTML messages since some parts may be decoded but
referenced within the main HTML part.

Plus, since there appears to be a desire to back convert messages
from existing archives, a solution must exist that does not rely
on comment declarations that do not exist.

BTW, the list of external files is given at the top of each message
page so the body's URLs can be scanned to see which ones match
against the list.  One could probably just have all messages
denoted as multipart in the main content-type comment declaration
converted into MHTML messages.  All basic content-types should be
translatable back to something close to the original message
(wrt the message body).


I was just being lazy about parsing :-). The current script has a section
which reads:

if ($isinbody =~ /true/ )
{

# Strip the anchor markup, keeping only the link text
 s/<a href="[^"]*">(.*?)<\/a>/$1/g;

}

and I was too lazy to think of ways to see if the extracted url referred
to a derived file. I wanted to do the following (which is lots more work):

  test if comment says bodypart-begin
  set isinbodypart=true
  skip this line
  if isinbodypart=true
  then
    extract the url knowing that it refers to a local file
    dosomething ...
  endif

Now that I have looked at this some more I will go the following route:

if ($isinbody =~ /true/ )
{
  if (/<a href="([^"]*)">/)
  {
    # Extract the URL before the anchor markup is stripped
    $url = $1;
    s/<a href="[^"]*">(.*?)<\/a>/$1/g;

    # and for each element in some "xderived" array,
    # compare $url with that element. If there is a match
    # then build the content-type info etc. Otherwise push(@body,$_);

  }
}
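
The comparison step left as comments above might look roughly like the following sketch. The contents of @xderived, the sample line, and the helper derived_file_for are made up for illustration; how the list is actually read from the message page is not shown.

```perl
use strict;
use warnings;

# Hypothetical list of derived files (assumed to have been collected
# from the top of the message page).
my @xderived = ('bin00001.bin', 'gif00002.gif');

# Return the matching derived filename for a URL, or undef if none.
sub derived_file_for {
    my ($url, @derived) = @_;
    for my $file (@derived) {
        return $file if $url eq $file;
    }
    return;
}

my @body;
my $line = 'See <a href="gif00002.gif">gif00002.gif</a>';
if ($line =~ /<a href="([^"]*)">/) {
    my $url = $1;
    if (defined(my $file = derived_file_for($url, @xderived))) {
        # A real script would build the content-type info here.
        print "derived part: $file\n";
    } else {
        push @body, $line;
    }
}
```

An exact string comparison is used here; if the page URLs carry a directory prefix, the comparison would need to strip it first.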

--ewh


Regards, 

AnthonyW