Volker Kuhlmann wrote:
[...] I'm not too clear about the method you want to use for finding
the distinctive headers for each mailing list in a foolproof way.
This is working well:
:0
* -2^0
* 9876543210^0 ^Delivered-To: mailing list \/.*@yahoogroups\.com
* 9876543210^0 ^List-Id: +\/.*
* 9876543210^0 ^List-Unsubscribe: \/.*@
* 2^0 ^Precedence: (bulk|list)
* 1^0 ^From: +\/.*
Any 'obvious' list header is tried first, since these both flag the
message as list mail AND provide a unique identifier for the list.
Failing a match on any of those, a check is made for bulk or list
precedence, and the From: is used in that case.
The score on From: seems pointless (every message should have one) but
it didn't work well without it in my quick tests. That logic can
probably use some cleaning up.
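
The arithmetic behind the From: score can be worked through in a small
simulation. This is only a sketch in Python, not procmail itself: it
assumes that a recipe matches when the final score is positive, and that
scoring stops once 2147483647 is reached (so 9876543210 acts as an
immediate hit). The function name score_headers and the example
addresses are made up for illustration.

```python
import re

# Assumption: procmail treats 2147483647 as "plus infinity" and stops scoring
# once it is reached; 9876543210 is simply a number larger than that.
SUPREMUM = 2147483647

# (weight, pattern) pairs mirroring the recipe. The capture group stands in
# for procmail's \/ match operator; these are Python regexes, not procmail's.
CONDITIONS = [
    (9876543210, r"^Delivered-To: mailing list (.*@yahoogroups\.com)"),
    (9876543210, r"^List-Id: +(.*)"),
    (9876543210, r"^List-Unsubscribe: (.*@)"),
    (2, r"^Precedence: (?:bulk|list)"),
    (1, r"^From: +(.*)"),
]


def score_headers(headers, conditions=CONDITIONS, base=-2):
    """Return (score, match) for a header block, in a simplified model."""
    score = base
    match = None
    for weight, pattern in conditions:
        found = re.search(pattern, headers, re.MULTILINE | re.IGNORECASE)
        if found is None:
            continue
        score += weight
        if found.groups():
            match = found.group(1)
        if score >= SUPREMUM:
            break  # later conditions are skipped, so the first hit wins
    return score, match


bulk_only = "From: Bob <bob@example.com>\nPrecedence: bulk\n"
print(score_headers(bulk_only))                   # score 1: recipe matches
print(score_headers(bulk_only, CONDITIONS[:-1]))  # score 0: not positive
```

Under those assumptions, dropping the From: condition leaves a
Precedence-only message at -2 + 2 = 0, which is not positive. That may
be why the recipe misbehaved without it.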
I'm sure there are other headers I'll want to add, but after a few days,
these seem to hit all the lists I'm on. Most seem to be using the
RFC 2919 / RFC 2369 headers (which are simple to match -- all I've seen
so far have at least one of List-Id: and List-Unsubscribe:). Some
googling indicates there are others that are common:
Mailing-List: list
Sender: owner-
X-BeenThere:
X-Mailing-List:
X-Loop:
X-List:
X-ML-Name:
But none of these are used on any of my currently subscribed lists.
X-Loop: strikes me as vague, but the others should be used only on lists
(I think). I've searched through the archives, and didn't spot any
others that are unique to mailing lists.
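
If those extra headers ever become necessary, the fallback check could
look roughly like the sketch below. It is Python rather than procmail
conditions, and the value formats are assumptions (for example, that
Mailing-List: carries an address followed by ';'). X-Loop: is left out
because it is too vague. The name find_list_id and the sample addresses
are hypothetical.

```python
import re

# Candidate headers from the list above, tried in order. Each capture group
# is a guess at which part of the value identifies the list.
FALLBACK_PATTERNS = [
    r"^Mailing-List: list +([^;\s]+)",
    r"^Sender: +(owner-\S+)",
    r"^X-BeenThere: +(\S+)",
    r"^X-Mailing-List: +(.+)",
    r"^X-List: +(.+)",
    r"^X-ML-Name: +(.+)",
]


def find_list_id(headers, patterns=FALLBACK_PATTERNS):
    """Return the first list identifier found in the headers, or None."""
    for pattern in patterns:
        found = re.search(pattern, headers, re.MULTILINE | re.IGNORECASE)
        if found:
            return found.group(1).strip()
    return None


sample = "From: x@example.com\nSender: owner-procmail@example.org\n"
print(find_list_id(sample))
```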
So now I've got a string that uniquely identifies most (any?) lists.
Unfortunately, it's often a long and cumbersome string -- certainly not
one suited for a folder name, even after carefully dissecting it. And I
do want multiple lists going to some folders. So I still want to map the
string to an easier-to-handle value -- and a checksum/hash seems perfect.
I may change my recipe to use sum instead of md5sum, which will
presumably be faster, and will also reduce the hash to 5 decimal digits
instead of 32 hex digits. That makes me think that maintaining a short
list of matches using Ruud's 'procmail only' lookup approach would work
equally well (though I'd still need an external call to sum + cut) for
up to about 180 lists without worrying about LINEBUF.
I still favor keeping the data (listhash->foldername matches) separate
for the reasons mentioned before.
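
A rough sketch of the hash-to-folder idea, in Python rather than as a
procmail recipe: the checksum is modelled on the BSD-style 16-bit
rotating sum, and whether it matches your system's sum output digit for
digit is not verified here. The table format (hash, whitespace, folder
name), the function names and the folder names are all assumptions made
for illustration.

```python
def bsd_sum(data: bytes) -> int:
    """16-bit rotating checksum in the style of BSD sum."""
    checksum = 0
    for byte in data:
        # Rotate right by one bit within 16 bits, then add the next byte.
        checksum = (checksum >> 1) + ((checksum & 1) << 15)
        checksum = (checksum + byte) & 0xFFFF
    return checksum


def list_hash(list_id: str) -> str:
    """Reduce a list identifier to a zero-padded 5-digit decimal string."""
    return f"{bsd_sum(list_id.encode('utf-8')):05d}"


def parse_table(text: str) -> dict:
    """Parse 'hash folder' lines; '#' starts a comment, blanks are skipped."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        key, folder = line.split(None, 1)
        table[key] = folder
    return table


def folder_for(list_id: str, table: dict, default: str = "lists-unsorted") -> str:
    """Look up the folder for a list, falling back to a default folder."""
    return table.get(list_hash(list_id), default)


# The data lives apart from the logic, so several lists can share a folder.
table = parse_table(
    f"{list_hash('<procmail.lists.example.org>')} mail-tools\n"
    f"{list_hash('<fetchmail.lists.example.org>')} mail-tools\n"
)
print(folder_for("<procmail.lists.example.org>", table))
```

One trade-off to keep in mind: with only 16 bits, two different lists
can land on the same hash, so a collision check when adding a new entry
to the table is probably worthwhile.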
- Bob
____________________________________________________________
procmail mailing list Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail