procmail

ideas on splitting mail saved by Outlook to IMAP folder

2006-05-28 16:58:20

Background:
1. Outlook 2000 mail client
2. IMAP mail delivery (dovecot)
3. Outlook mail rule enabled that saves all outgoing messages
   into a Sent-mail folder on the IMAP server (in mbox format)
4. A script that runs once a day, which trims mailboxes that
   exceed $maxmsgs back to $minmsgs, and archives the excess
   into archive/<mailbox>
5. Formail version: v3.22 2001/09/10.

The archiving step looks like this, in csh syntax:

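    # Count messages (one From_ line per message).
    set nmsgs = `formail -X 'From ' -s < $f | wc -l`
    if ($nmsgs > $maxmsgs) then
        set arc = archive/${f}
        # Separate the new batch from anything already archived.
        if (-e $arc) then
          echo "" >> $arc
        endif
        # Archive the first $cnt messages; keep the remaining $minmsgs.
        @ cnt = $nmsgs - $minmsgs
        formail -$cnt -s < $f >> $arc
        formail +$cnt -s < $f > ${f}.tmp
        # Give the new file the old mailbox's timestamp, then replace it.
        touch -r ${f} ${f}.tmp
        mv -f ${f}.tmp ${f}
        chmod 600 ${f} $arc
    endif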
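
The trimming rule in the script reduces to a little arithmetic. The Python
sketch below models it on message counts only; `plan_trim` and `trim` are
hypothetical names, and the thresholds in the usage lines are made-up values.
Like the script, the model is only as good as the message boundaries it is
given: all three formail invocations have to agree on where each message
starts.

```python
def plan_trim(nmsgs, maxmsgs, minmsgs):
    """Return (archived, kept) message counts under the script's rule."""
    if nmsgs <= maxmsgs:      # csh: if ($nmsgs > $maxmsgs) then ... endif
        return 0, nmsgs
    cnt = nmsgs - minmsgs     # csh: @ cnt = $nmsgs - $minmsgs
    return cnt, minmsgs


def trim(messages, maxmsgs, minmsgs):
    """Split an already-parsed list of messages into (to_archive, to_keep).

    The first cnt messages go to archive/<mailbox> (formail -$cnt -s) and
    the rest stay in the mailbox (formail +$cnt -s).
    """
    cnt, _ = plan_trim(len(messages), maxmsgs, minmsgs)
    return messages[:cnt], messages[cnt:]


print(plan_trim(1200, 1000, 800))  # (400, 800)
print(plan_trim(900, 1000, 800))   # (0, 900)
```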

Problem: both the archived mailbox and the truncated mailbox end up with
extraneous '>' characters in front of valid From_ lines.  This happens
because Outlook, in its wisdom, doesn't add an empty line after the end of
each message that it writes into Sent-mail.  Thus, the Sent-mail file might
contain lines like these (note that From_ has been changed to Zrom so that
it won't be escaped and thus compound the confusion; read Zrom as From_):

Zrom gary(_at_)example(_dot_)com  Sun May 28 13:44:07 2006
To: "Fred" <fred(_at_)example(_dot_)com>
Subject: bbq
Date: Sun, 28 May 2006 13:44:07 -0700
Message-ID: <002601c68297$76e305a0$6401a8c0(_at_)EXAMPLE>
Importance: Normal
X-OlkEid: 360420A2F0AD4B7D0B6C3B4C8957146374797E9A
X-UID: 23065
Status:
X-Keywords:
Content-Length: 42

You bring the beer, we'll bring the brots?
Zrom gary(_at_)example(_dot_)com  Sun May 28 13:44:07 2006
To: "George" <george(_at_)example(_dot_)com>
Subject: bbq
Date: Sun, 28 May 2006 13:44:07 -0700
Message-ID: <002601c68297$76e305a0$6401a8c0(_at_)EXAMPLE>
MIME-Version: 1.0
X-UID: 23065
X-OlkEid: 362420A2CC21FFA10546584288B5DBBCCBE739C1
Status:
X-Keywords:
Content-Length: 15

Tomorrow at 1pm
<End of File>

As it turns out, "formail -s" places a '>' in front of the second From_,
apparently because there is no intervening empty line to terminate the first
message body.  However, the Content-Length of 42 on the first message appears
to be correct: it counts the 42 characters of the message body, not including
the final newline.
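
The Content-Length arithmetic is easy to check.  This Python snippet copies
the two bodies from the sample above (assuming each ends in a newline in the
file, and that the text is ASCII so characters equal bytes) and compares each
declared Content-Length with the body length with and without that final
newline:

```python
# Bodies copied from the sample mailbox, paired with their declared
# Content-Length values.
samples = [
    ("You bring the beer, we'll bring the brots?\n", 42),
    ("Tomorrow at 1pm\n", 15),
]

for body, declared in samples:
    with_nl = len(body.encode("ascii"))
    without_nl = len(body.rstrip("\n").encode("ascii"))
    print(f"declared={declared} with_newline={with_nl} without_newline={without_nl}")
# declared=42 with_newline=43 without_newline=42
# declared=15 with_newline=16 without_newline=15
```

Both declared values match the length without the final newline.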

-------------

A few questions:
1. Is the initial message RFC compliant?  Is the Content-Length correct in
   not including the final newline of the message body?  Should the message
   body always be terminated with an empty line (which it is not in this
   example)?

2. If the answer to 1. above is "yes", then shouldn't formail have honored
   the Content-Length field and noticed that the second From_ begins a new
   message?

The formail man page says the following:

       If  a  Content-Length:  field  is found in a header, formail will
       copy the number of specified bytes in the  body  verbatim  before
       resuming the regular scanning for message boundaries (except when
       splitting digests or Berkeley mailbox format is assumed).

(Note: if you change the Zroms above back into From_ lines and run the
result through "formail -s", you should be able to reproduce the scenario
described above.)

Given that the behavior of Outlook is immutable, what's the best corrective
course of action?  Note that -ds works no better than -s in this example,
even though it should ignore the Content-Length field and be able to find
enough header lines to convince itself that a new message has started.  When
the first message is terminated with an empty line, however, all is well.
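
One possible workaround, suggested by the observation that all is well once
the first message ends with an empty line, is to restore the missing empty
lines before formail sees the file.  The Python sketch below does that under
stated assumptions: `add_missing_blank_lines` and `FROM_LINE` are
hypothetical names, the pattern assumes ctime-style From_ lines like the
ones in the sample, and any "From " line inside a message body is assumed to
be escaped already as ">From " (otherwise it would be treated as a separator
too).  It works on bytes so 8-bit mail passes through untouched.

```python
import re

# Hypothetical pattern for a ctime-style From_ separator, e.g.
#   b"From gary@example.com  Sun May 28 13:44:07 2006"
FROM_LINE = re.compile(
    rb"^From \S+ +[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$"
)


def add_missing_blank_lines(lines):
    """Yield mailbox lines (bytes), inserting an empty line before any From_
    separator whose preceding line is not already empty."""
    prev = None  # previous line without its line ending; None at start of file
    for line in lines:
        text = line.rstrip(b"\r\n")
        if prev not in (None, b"") and FROM_LINE.match(text):
            yield b"\n"
        yield line
        prev = text
```

As a filter it could be driven with
sys.stdout.buffer.writelines(add_missing_blank_lines(sys.stdin.buffer)),
with the repaired output fed to the existing formail steps.  The inserted
empty line falls after the body bytes that Content-Length counts, so the
declared lengths still match, and running the filter twice changes nothing.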





____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail