Matching unique mailing list identifier strings (was Re: Sorting several

Volker Kuhlmann wrote:

 [...] I'm not too clear about the method you want to use for finding
 the distinctive headers for each mailing list in a foolproof way.


This is working well:

:0
* -2^0
* 9876543210^0 ^Delivered-To: mailing list \/.*@yahoogroups\.com
* 9876543210^0 ^List-Id: +\/.*
* 9876543210^0 ^List-Unsubscribe: \/.*@
* 2^0 ^Precedence: (bulk|list)
* 1^0 ^From: +\/.*

Any 'obvious' bulk list header is checked first, since these both flag the message as list mail AND provide a unique identifier for the list. Failing any match on those, a check is made for bulk or list precedence, and the From: header is used in that case.

The score on From: seems pointless (every message should have one), but it didn't work well without it in my quick tests. That logic can probably use some cleaning up.
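One possible reading of why the From: score matters, sketched in Python rather than procmail so the arithmetic can be run and checked. It assumes the scoring rules as I understand them from procmailsc(5): a recipe matches only if the final score is positive, and once the score hits the maximum (2147483647) the remaining weighted conditions are skipped. The `(?P<match>...)` group stands in for procmail's `\/`, and the function name and sample headers are made up for illustration.

```python
import re

SUPREMUM = 2147483647  # procmail's maximum score (assumed from procmailsc(5))

# (weight, pattern) pairs mirroring the recipe; the named group "match"
# stands in for procmail's \/ capture into $MATCH.
CONDITIONS = [
    (9876543210, r"^Delivered-To: mailing list (?P<match>.*@yahoogroups\.com)"),
    (9876543210, r"^List-Id: +(?P<match>.*)"),
    (9876543210, r"^List-Unsubscribe: (?P<match>.*@)"),
    (2, r"^Precedence: (bulk|list)"),
    (1, r"^From: +(?P<match>.*)"),
]


def score_headers(headers, conditions=CONDITIONS, initial=-2):
    """Return (score, match); match is None when the recipe would not fire."""
    score = initial
    match = None
    for weight, pattern in conditions:
        if score >= SUPREMUM:
            break  # maximum reached: later conditions are skipped
        found = re.search(pattern, headers, re.MULTILINE | re.IGNORECASE)
        if found is None:
            continue
        score = min(score + weight, SUPREMUM)
        captured = found.groupdict().get("match")
        if captured is not None:
            match = captured
    if score > 0:
        return score, match
    return score, None


list_mail = "From: Bob <bob@example.com>\nList-Id: <procmail.lists.example.org>\n"
bulk_mail = "From: Bob <bob@example.com>\nPrecedence: bulk\n"
plain_mail = "From: Bob <bob@example.com>\n"

print(score_headers(list_mail))
print(score_headers(bulk_mail))
print(score_headers(plain_mail))
# Dropping the From: condition leaves bulk mail at -2 + 2 = 0, which is not positive.
print(score_headers(bulk_mail, conditions=CONDITIONS[:-1]))
```

Under these assumptions, the From: condition does two jobs: it supplies the identifier when only Precedence: matched, and it lifts the score from 0 to 1 so the recipe fires at all. Ordinary mail without a Precedence: header ends at -1 and is left alone.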

I'm sure there are other headers I'll want to add, but after a few days, these seem to hit all the lists I'm on. Most seem to be using the RFC 2919 (List-Id:) or RFC 2369 (List-Unsubscribe:) headers, which are simple to match -- all the lists I've seen so far have at least one of the two. Some googling indicates there are others that are common:


Mailing-List: list
Sender: owner-
X-BeenThere:
X-Mailing-List:
X-Loop:
X-List:
X-ML-Name:

But none of these are used on any of my currently subscribed lists. X-Loop: strikes me as vague, but the others should be used only on lists (I think). I've searched through the archives, and didn't spot any others that are unique to mailing lists.

So now I've got a string that uniquely identifies most (any?) lists. Unfortunately, it's often a long and cumbersome string -- certainly not one suited for a folder name, even after carefully dissecting it. And I do want multiple lists going to some folders. So I still want to map the string to an easier-to-handle value -- and a checksum/hash seems perfect.

I may change my recipe to use sum instead of md5sum, which will presumably be faster, and also reduce the hash to 5 decimal characters instead of 32 hex. That makes me think that maintaining a short list of matches using Ruud's 'procmail only' lookup approach would work equally well (though I'd still need an external call to sum + cut) for up to about 180 lists without worrying about LINEBUF.
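To make the size difference between the two hashes concrete, here is a rough Python sketch of both, plus a lookup from hash to folder name. It assumes sum uses the BSD 16-bit rotating checksum (the default in GNU coreutils, as far as I know) and that the identifier is piped in with a trailing newline, as echo would do. The folder names, the table, and the function names are hypothetical.

```python
import hashlib


def bsd_sum(data: bytes) -> str:
    """16-bit BSD checksum as a zero-padded 5-digit decimal string."""
    checksum = 0
    for byte in data:
        # rotate right by one bit within 16 bits, then add the next byte
        checksum = (checksum >> 1) + ((checksum & 1) << 15)
        checksum = (checksum + byte) & 0xFFFF
    return f"{checksum:05d}"


def md5_hex(data: bytes) -> str:
    """32 hex characters, the same digest md5sum prints."""
    return hashlib.md5(data).hexdigest()


def folder_for(list_id: str, table: dict, default: str = "lists-unsorted") -> str:
    """Map an identifier string to a folder name via its short hash."""
    key = bsd_sum(list_id.encode() + b"\n")  # newline mimics echo's output
    return table.get(key, default)


# Hypothetical listhash -> foldername table, kept apart from the matching logic.
list_id = "<procmail.lists.example.org>"
table = {bsd_sum(list_id.encode() + b"\n"): "procmail"}

print(len(md5_hex(list_id.encode())), len(bsd_sum(list_id.encode())))
print(folder_for(list_id, table))
print(folder_for("<some.other.list.example.org>", table))
```

One caveat with the shorter hash: a 16-bit checksum has only 65536 possible values, so two lists could in principle collide. With a few dozen lists that is unlikely, but it is worth a check whenever a new entry is added to the table.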

I still favor keeping the data (listhash->foldername matches) separate for the reasons mentioned before.


- Bob

____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
