procmail

Matching unique mailing list identifier strings (was Re: Sorting several lists into a common folder -- with a nice pretty name (was Re: Recipe problem))

2004-08-28 21:56:21
Volker Kuhlmann wrote:

 [...] I'm not too clear about the method you want to use for finding
 the distinctive headers for each mailing list in a foolproof way.

This is working well:

:0
* -2^0
* 9876543210^0 ^Delivered-To: mailing list \/.*@yahoogroups\.com
* 9876543210^0 ^List-Id: +\/.*
* 9876543210^0 ^List-Unsubscribe: \/.*@
* 2^0 ^Precedence: (bulk|list)
* 1^0 ^From: +\/.*

Any 'obvious' bulk list header is used first, since these both flag the message as list mail AND provide a unique identifier for the list. Failing any match on those, a check is made for bulk or list precedence, and the From: is used in that case.

The score on From: seems pointless (every message should have one) but it didn't work well without it in my quick tests. That logic can probably use some cleaning up.
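For what it's worth, the arithmetic can be traced outside procmail. The Python sketch below is only a model of the recipe, not procmail itself. It assumes, going by procmailsc(5), that a score saturates at 2147483647, that later weighted conditions are then skipped, and that a recipe succeeds only when the final score is above zero. The name score_headers and the capture groups standing in for \/ are made up for illustration.

```python
import re

INFINITY = 2147483647  # assumed saturation point for procmail scores

# (weight, pattern) pairs mirroring the recipe's conditions.
# A capture group stands in for procmail's \/ match marker.
CONDITIONS = [
    (9876543210, r"^Delivered-To: mailing list (.*@yahoogroups\.com)"),
    (9876543210, r"^List-Id: +(.*)"),
    (9876543210, r"^List-Unsubscribe: (.*@)"),
    (2, r"^Precedence: (?:bulk|list)"),
    (1, r"^From: +(.*)"),
]


def score_headers(headers, conditions=CONDITIONS, start=-2):
    """Return (matched, score, match_text) for a block of header lines."""
    score = start
    match_text = None
    for weight, pattern in conditions:
        if score >= INFINITY:
            break  # saturated: remaining weighted conditions are skipped
        found = re.search(pattern, headers, re.IGNORECASE | re.MULTILINE)
        if found:
            # x = 0 in w^0, so only the first match adds its weight
            score = min(score + weight, INFINITY)
            if found.groups():
                match_text = found.group(1)
    return score > 0, score, match_text
```

With only Precedence: bulk and a From:, the total is -2 + 2 + 1 = 1, so the model matches and the From: text is what gets captured. Drop the From: condition and the total stops at 0, which does not count as a match. If procmail behaves the way the model assumes, that would account for the From: score being needed.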

I'm sure there are other headers I'll want to add, but after a few days, these seem to hit all the lists I'm on. Most seem to be using the RFC 2919 / RFC 2369 headers (which are simple to match -- all I've seen so far have at least one of List-Id: and List-Unsubscribe:). Some googling indicates there are others that are common:

Mailing-List: list
Sender: owner-
X-BeenThere:
X-Mailing-List:
X-Loop:
X-List:
X-ML-Name:

But none of these are used on any of my currently subscribed lists. X-Loop: strikes me as vague, but the others should be used only on lists (I think). I've searched through the archives, and didn't spot any others that are unique to mailing lists.

So now I've got a string that uniquely identifies most (any?) list. Unfortunately, it's often a long and cumbersome string -- certainly not one suited for a folder name, even after carefully dissecting it. And I do want multiple lists going to some folders. So I still want to map the string to an easier-to-handle value -- and a checksum/hash seems perfect.

I may change my recipe to use sum instead of md5sum, which will presumably be faster and will also reduce the hash to 5 decimal digits instead of 32 hex digits. That makes me think that maintaining a short list of matches using Ruud's 'procmail only' lookup approach would work equally well (though I'd still need an external call to sum + cut) for up to about 180 lists without worrying about LINEBUF.
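To make the size difference concrete, here is a rough Python sketch of the two hashes. It assumes the BSD-style 16-bit rotating checksum, which is what GNU sum uses by default; a System V style sum would give different numbers. The names bsd_sum, short_hash and long_hash are invented for illustration, and the input is hashed without a trailing newline, so the result will only agree with piping echo output into sum if a newline is appended first.

```python
import hashlib


def bsd_sum(data: bytes) -> int:
    """16-bit rotating checksum (the BSD-style sum algorithm)."""
    checksum = 0
    for byte in data:
        # rotate right by one bit within 16 bits, then add the next byte
        checksum = (checksum >> 1) | ((checksum & 1) << 15)
        checksum = (checksum + byte) & 0xFFFF
    return checksum


def short_hash(list_id: str) -> str:
    """Zero-padded 5-digit decimal checksum of the list identifier."""
    return "%05d" % bsd_sum(list_id.encode("utf-8"))


def long_hash(list_id: str) -> str:
    """32-character hex MD5 digest of the same identifier, for comparison."""
    return hashlib.md5(list_id.encode("utf-8")).hexdigest()
```

A 16-bit checksum has only 65,536 possible values, so two different list identifiers can land on the same number; whether that matters depends on how many lists are involved.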

I still favor keeping the data (listhash->foldername matches) separate for the reasons mentioned before.
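As a sketch of what keeping the map apart from the recipes might look like, the Python below reads a small table and looks up a folder by hash. Everything specific here is an assumption made for illustration: the file layout (hash, whitespace, folder name, # comments), the names load_folder_map and folder_for, the sample hashes, and the fallback folder.

```python
# Hypothetical table: several hashes may point at the same folder.
SAMPLE_MAP = """\
# hash   folder
32914    procmail
00097    procmail
41077    hobby
"""


def load_folder_map(text):
    """Parse 'hash folder' lines into a dict, ignoring comments and blanks."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip malformed lines rather than guessing
        list_hash, folder = parts
        mapping[list_hash] = folder
    return mapping


def folder_for(list_hash, folder_map, default="lists-unsorted"):
    """Return the folder for a hash, or a catch-all for unknown lists."""
    return folder_map.get(list_hash, default)
```

Adding a list or renaming a folder then means editing one line of the table, with no change to the recipe that computes the hash.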

- Bob

____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
