procmail

Re: formail -ds or -s

1996-08-03 20:52:30
I had written:

| >-ds does not require "From " because it is designed to separate
| >digest articles from one another as well as full-fledged messages from one
| >another; -ds will also split wherever it finds a clump of RFC822 headers
| >embedded in the body of a message.

Stan Ryckman asked,

| How big is a "clump?"

Good question.  I think the main trigger is a From: line plus at least one
other header, which can precede the From: line.  Stephen?

| A sort-of-related question.  My feeling is that the answer is "no".
| But anyway, for a particular digest, my .procmailrc contains:
|       # File the wxobs digests, split up
|       :0 i:
|       * ^From:.*wxobs-sne-digest
|       * ^Subject:.*wxobs-sne-digest
|       | formail +1 -ds >>split-digest

Why did you find a need for the `i' flag there?  I'd think it would be risky.

| Recently, I found "two" messages, which *after* said processing,
| ended up in my split-digest mailbox as follows (the lines
|       "Status: RO"
| may have been added by elm when I deleted other messages):

The Status: headers were added by Elm when you exited or resynchronized that
folder.  I'll delete them from your example because they were not there when
formail got the text nor inserted by formail.

| : From 71435(_dot_)211(_at_)CompuServe(_dot_)COM  Thu Aug  1 22:49:41 1996
| : From: Munley <71435(_dot_)211(_at_)CompuServe(_dot_)COM>
| : Date: 01 Aug 96 21:45:20 EDT
| : Subject: Copy of: 5-Day Forecast August 2-6
| : 
| : - ---------- Forwarded Message ----------
| : 
| : From Munley  Thu Aug  1 22:49:41 1996
| : From:       Munley, 71435,211
| : TO: Wxob MA, Internet:wxobs-mda(_at_)greatbasin(_dot_)com
| :     Wxobs-sne-digest, Internet:Wxobs-sne-digest(_at_)shore(_dot_)net
| : DATE:       8/1/96 8:45 PM
| : 
| : RE: Copy of: 5-Day Forecast August 2-6
| [text snipped due to irrelevancy here]
| 
| My question is, is there a way to *not* split the forwarded
| message?  That included "From " is a killer, I think.

The problem is not an included unquoted From_; there probably was none going
in.  formail -ds put it there because it saw From:, To:, and Date:.  The
problem is the included unquoted From:.
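
To make the ambiguity concrete, here is a rough Python sketch of the kind of
test described above: a run of flush-left header-like lines that includes
From: plus at least one other field.  It is only an approximation of the
behavior discussed in this thread, not code taken from formail, and the names
`header_clump` and `would_split` are made up for illustration.

```python
import re

# Hypothetical approximation of the trigger discussed in this thread;
# it is NOT derived from formail's source code.
# An RFC 822 field name is printable ASCII without space or colon,
# and it must start in column one to count as a header here.
HEADER_RE = re.compile(r"^([!-9;-~]+):")


def header_clump(lines, start=0):
    """Return the lowercased field names in the header-like run at lines[start:]."""
    names = []
    for line in lines[start:]:
        if names and line[:1] in (" ", "\t"):
            continue  # folded continuation of the previous header
        match = HEADER_RE.match(line)
        if not match:
            break  # blank line or ordinary text ends the clump
        names.append(match.group(1).lower())
    return names


def would_split(lines, start=0):
    """Guess whether a splitter using the From:-plus-one rule would split here."""
    names = header_clump(lines, start)
    return "from" in names and len(names) >= 2
```

Fed the flush-left From:/TO:/DATE: lines of the forwarded message, this
reports a split point; fed the same lines indented or prefixed with "> ", it
does not, because no field name then starts in column one.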

You could use formail -fds to prevent adding the From_ line, but then it
won't add one where a real article begins either.  Essentially there is no
weapon in formail's arsenal that I know of that can tell the difference.

| I don't think digests come with built-in "Content-Length:" headers.  (This
| is a majordomo list, if it matters).

Not only do digest articles usually not have their own individual
Content-Length: information, but formail would have to ignore them if -d is
in effect.  After all, it had to ignore any Content-Length: figure in the
headers of the message that delivered the digest; otherwise it would be
unable to burst the articles apart because all articles would begin before
the number of bytes in the Content-Length: header ran out.

| It seems to me that any mail "forwarding" software that sends
| a From_ header in the body unescaped and without a Content-Length:
| header must be broken.

I am pretty sure it didn't.  formail -ds added the From_ because it saw a
From:.  And that's the point: an unquoted From: shouldn't have been there
either.  That's yet another reason that mailers that live in their own worlds
and disobey Internet conventions are pains in the butt; headers of a
forwarded letter being rewrapped should be indented with citation characters
or at least with spaces, not jammed flush left.

The other solution is to use a digest burster that looks for the traditional
separators: at least three hyphens between articles and at least three
asterisks at the end of an issue, always flush left.  Note how the "Forwarded
Message" banner began "- ----" instead of "------"; Majordomo changed the
second hyphen to a space so that there wouldn't be a line in the middle of
an article that began with three hyphens.  On the other hand, such bursters
can be fooled as well: there are digest-creating programs (including those in
some old versions of Majordomo) that accept lines of hyphens or asterisks
that appear flush left inside articles and pass them without alteration.
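
A burster of that kind can be sketched in a few lines of Python.  This is
only an illustration of the separator rule as stated above (three or more
hyphens flush left between articles, three or more asterisks flush left at
the end of the issue); the function name `burst` is hypothetical, and real
bursters may well apply stricter tests.

```python
def burst(digest_lines):
    """Split a digest body into chunks at flush-left separator lines.

    The first chunk is whatever precedes the first separator (usually
    the digest preamble) and the last chunk is whatever follows the
    final separator (usually the trailer text).
    """
    chunks = []
    current = []
    for line in digest_lines:
        if line.startswith("***"):
            break  # end-of-issue marker: stop reading
        if line.startswith("---"):
            chunks.append(current)  # separator closes the current chunk
            current = []
            continue
        current.append(line)
    chunks.append(current)
    return chunks
```

A banner that has been altered to begin "- ----" does not start with three
hyphens, so it stays inside its article.  A digest whose creator passed a
flush-left line of hyphens through unaltered would fool this sketch in
exactly the way described above.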

Unless certain text is somehow legal inside email bodies (article separators
and inner article headers are, after all, part of the digest's body) but
illegal inside an article in a digested mailing list, we won't have any
guaranteed way for software to tell the difference.
