|Mon 1999-01-11 "Joey Smith" <joey(_at_)samaritan(_dot_)com> list.procmail
|
| grab my Digest mailing lists, pull out the individual messages,
| and sort them into folders according to which list they came from.
There is a ready-made procmail module for detecting your mailing lists. Check it out
in pm-code.shar (see the file server mentioned in X-info). Docs below.
jari
----------------------------------------------------------------------
Pm-jalist.rc -- Subroutine to detect mailing LIST from message.
File id
.Copyright (C) 1998 Jari Aalto
<jari(_dot_)aalto(_at_)poboxes(_dot_)com>
.Contactid: <jari(_dot_)aalto(_at_)poboxes(_dot_)com>
.Created: 1998-06
.Keywords: procmail subroutine list detect
This code is free software under the terms of the GNU General Public License v2 or later.
You can get the newest version by sending email to the maintainer with the
subject "send <FILENAME>".
Description
This subroutine tries to detect and derive the mailing list name as
it appears in some of the known methods that ezmlm, SmartList,
listserv, majordomo etc. normally use. After this subroutine has
been applied to a message, the variable `LIST' contains the mailing
list name. The subroutine adaptively finds new mailing lists
from the messages.
Quick start
If you just want to jump in and use this module and you
see that some list isn't trapped, please set
o JA_LIST_HEADER_REGEXP to match the From: field site regexp.
If you want to make some list name more unique, for example if the name "Alert"
was detected as the list name, please set
o JA_LIST_MAKE_UNIQUE to match the list name, like "Alert",
and the list name will be converted to HOST-LIST format.
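As a minimal sketch of the two settings above: the assignments are assumed to go before the line that includes the subroutine. The domain `example.com' is hypothetical, "Alert" is the list name used in the text above, and RC_LIST is assumed to already point at pm-jalist.rc.

```procmail
# Hypothetical site whose list mail was not trapped by the other tests
JA_LIST_HEADER_REGEXP = "(@example\.com)"

# Turn an ambiguous grabbed name like "Alert" into HOST-LIST form
JA_LIST_MAKE_UNIQUE = "Alert"

# Run the detection with the settings above in effect
INCLUDERC = $RC_LIST
```

Note that assigning JA_LIST_HEADER_REGEXP replaces its default value, so keep any default domains you still need in the alternation.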
Sendmail plus method for list subscription
If you can use sendmail's PLUS addressing capabilities, you may not
be interested in this module, because you have an alternative way
to handle mailing list messages. Let's suppose you want to
subscribe to the procmail mailing list and want to save all messages
to the folder list.procmail; then you'd subscribe with the address:
login+list(_dot_)procmail(_at_)site(_dot_)com
The extra information after "+" is available to your procmail
scripts via the $ARG pseudo variable when procmail is the LDA. If you
are not fortunate enough to have a recent sendmail, you usually subscribe to mailing
lists with your regular email address:
login(_at_)site(_dot_)com
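For comparison, the plus-addressed subscription needs one fixed recipe per list. This is only a sketch, assuming your sendmail hands the text after "+" to procmail with the -a option (a site configuration matter that this module does not set up). Procmail makes that argument available as $1, which is copied to an ordinary variable here so that it can be tested in a condition.

```procmail
# Copy the -a argument (the text after "+") to a regular variable
ARG = $1

# Subscribed as login+list.procmail: file the message in that folder
:0 :
* ARG ?? ^^list\.procmail^^
list.procmail
```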
How do you detect the arriving mailing list messages?
Traditionally, you would add a recipe to .procmailrc to
catch each list, but that's manual work every time. When you use
this subroutine, you no longer need to write a separate mailing list
recipe in your .procmailrc every time you subscribe to a new
mailing list. The detection of a new list happens in this
subroutine for you.
What you need to know before using this module
There are a lot of heuristics going on in this module, and there is one thing
that you must take care of if you're a member of tech support or if you
get cron messages from your server. The rule is:
If TO domain is same as FROM/SENDER/REPLY-TO domain
then it is considered a mailing list message.
This causes certain messages to land in the LIST category automatically.
This module can't possibly know that the following is not from a
mailing list, because it doesn't know "what a mailing list is", only
"what one probably looks like". This message is definitely categorized as a
mailing list message, because `From' and even `Reply-to' have the same
domain `foo.bar.net' as `To'.
To: support(_at_)foo(_dot_)bar(_dot_)net
From: messagepad(_at_)foo(_dot_)bar(_dot_)net
Reply-to: support(_at_)foo(_dot_)bar(_dot_)net
Subject: Vmail See message to Eric
You must prevent checking messages like this by wrapping the
include of the RC in a condition:
# Do not check these messages
noList = "From.*(foo.bar.net|support.my.com)"
:0
*$ ! $noList
{
    INCLUDERC = $RC_LIST
}
Ask for help
If you find mailing lists that this subroutine does not detect, but
which could have been detected by looking at the headers in a standard
way, please send an email to the maintainer. There may be cases where it
is impossible to detect the mailing list, and in those cases you
just have to carve a new entry into your procmailrc.
When you keep your procmail log running, you may see the message
*** potential list ***
which is an indication that some new recipe could be added
to this subroutine to detect that mailing list. If the message
you received WAS from a mailing list, please send all the headers
to the maintainer so that support can be added.
You can search for mailing lists that interest you at:
http://www.lsoft.com/lists/listref.html
http://www.netmeg.net/faq/internet/mail/mailing-lists/
Code notes
Bill Houle sent me interesting headers which caused me to add
a more heuristic approach than I would have originally wanted.
From these headers it really is impossible to derive the
original list name. So, I rolled my own and derived the name
by combining the Reply-To LOGIN with the first server name of the
Errors-To field:
Reply-To: news(_at_)doodle(_dot_)foo(_dot_)net
Errors-To: bounced(_at_)doodle(_dot_)foo(_dot_)net
The list name formed was "news-doodle". So, if you happen to see
an odd name like this which doesn't resemble your original list
name, it may be due to poor headers that have no clue about
the real name. No problem, check below how you would convert
this name to a better mailbox name.
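For this particular case, a one-entry conversion table would be enough. The sketch below uses the entry format described under JA_LIST_CONVERSION (grabbed name, spaces, new name, terminating comma); the target name `doodle.news' is a hypothetical choice made only for illustration.

```procmail
# "news-doodle" is the name derived from the headers above;
# "doodle.news" is a hypothetical mailbox name
JA_LIST_CONVERSION = "\
news-doodle doodle.news,\
"
```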
Required settings
PMSRC must point to the source directory of the procmail code. This subroutine
will include pm-javar.rc from there.
o pm-javar.rc is needed and must reside in $PMSRC
Variable JA_LIST_KILL_POSTFIX
If the grabbed `LIST' matches this regexp at the end of the list name, then
the postfix match will be removed. It is traditional that many
lists name themselves list1-info, list2-beta, list3-l and you
would prefer shorter names (for mbox): list1, list2 and list3. The
default value will ditch "-(info|beta|l)".
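If your lists use other endings, the variable can be widened before the subroutine is included. This is a sketch only: it assumes the variable takes a regexp of the same form as the default quoted above, and the endings "announce" and "digest" are hypothetical additions.

```procmail
# Default per the text above: "-(info|beta|l)"
# "announce" and "digest" are hypothetical extra endings
JA_LIST_KILL_POSTFIX = "-(info|beta|l|announce|digest)"

INCLUDERC = $RC_LIST
```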
Variable JA_LIST_KILL_PREFIX
Just like the postfix variable. If this string is matched at the
beginning of the LIST, it is removed.
Variable JA_LIST_HEADER_REGEXP
This is an *optional* variable, which you can set to a regexp matching
the mailing list domain address if the list slipped through the tests
in this module. There are some lists that send messages that don't
carry enough information in the headers to determine their list status.
If you narrow the group by setting JA_LIST_HEADER_REGEXP, then
lists like this one, which identify themselves only through
two headers, can be found:
Reply-To: dispatch-faq(_at_)cnet(_dot_)com
From: CNET Digital Dispatch <dispatch(_at_)cnet(_dot_)com>
For that list you would set
JA_LIST_HEADER_REGEXP = "(@cnet\.com)"
Don't worry. All the other list detection recipes have already
been tried, so this is the last test that is carried out, and the variable
JA_LIST_HEADER_REGEXP helps eliminate possible mishits.
You don't need to set this variable to include all mailing list
domains, only those that were not trapped. The default
value for this is:
"(amazon\.com|bookpool\.com)"
Variable JA_LIST_MAKE_UNIQUE
If you're subscribed to many mailing lists that simply say that
they are *news* or *newsletter*, it will be impossible to
differentiate A *news* from B *news*. This variable holds a regular
expression that, if matched, prepends the first hostname to the
beginning of the list name, thus making the list unique:
news(_at_)some(_dot_)com --> some-news
news(_at_)here(_dot_)com --> here-news
The default value matches lists that contain the word *news*, but you
may need to set this to match more names.
Variable JA_LIST_CONVERSION
Many times the grabbed `LIST' name is not what you would like to
use for your mailbox name. Perhaps you want to make the name
shorter or more descriptive, or categorize the messages according
to a hierarchy. Let's say that you have subscribed to the following mailing
lists:
LIST            LIST name    Description of mailing list
(as grabbed)    you want
jde             java.jde     Java Development Env
java            java.prog    Java programming
FLAMENCO        flamenco     Flamenco music
tango-l         tango        Argentine Tango dancing
tm-en-help      tm-en        Emacs TM mime package mailing list
w3-beta         w3           Emacs WWW mailing list
First, remember that the variable `JA_LIST_KILL_POSTFIX' is applied,
so the actual `LIST' appears as follows:
jde, java, FLAMENCO, tango, tm-en, w3
OK, now we apply the conversion table by defining it as follows,
where the grabbed LIST is first, then come space(s), the new name
_and_ a terminating comma. Repeat this for each list you want to
convert.
LIST CONVERSION,LIST CONVERSION,
This gives us the table below. Notice that the entries tango-l and w3-beta
were not included, because `JA_LIST_KILL_POSTFIX' already got
rid of the postfixes. Also note how the uppercase match FLAMENCO is
converted to a more suitable lowercase mailbox name. After you have
set up this variable you can start saving messages to folders.
JA_LIST_CONVERSION = "\
jde java.jde,\
java java.prog,\
FLAMENCO flamenco,\
"
The list conversion is done with pure procmail means, so it is very
fast. It also means that the conversion is limited to FROM-STRING
TO-STRING syntax. No wildcards or regular expressions are allowed.
If you consider using an external process, like `sed' or `perl',
to convert the grabbed list name to something else (when the
`JA_LIST_CONVERSION' method was not enough), think again. For each
incoming mailing list message you launch an external process. I get
700 messages from various mailing lists a day, so you can imagine how
much load any external process would cause. Just use the grabbed
mailing list name and the `JA_LIST_CONVERSION' table if you care
about system load.
If you have many mailing lists that use uppercase names, it may be
tedious to add each mailing list name to `JA_LIST_CONVERSION'.
A possible alternative is to add a conversion recipe: `tr' is the most
efficient way here to convert the name to lowercase. Again, think twice:
the extra process could be avoided if you use `JA_LIST_CONVERSION'.
:0                          # only when a list name was grabbed
* ! LIST ?? ^^^^
{
    :0 D                    # still uppercase list name?
    * LIST ?? [A-Z]
    {
        LIST = `echo "$LIST" | tr A-Z a-z`
    }
    :0 :
    list.$LIST
}
Example: basic installation
Here is a recipe to save all your mailing lists to separate folders.
If you subscribe to new lists or unsubscribe from lists, you don't
need to change anything.
RC_LIST = $PMSRC/pm-jalist.rc # name the subroutine
...
# Handle all mailing lists with one subroutine and recipe
# following it
INCLUDERC = $RC_LIST
:0 # if list name was grabbed
* LIST ?? [a-z]
{
    dummy = "Saving mailing list: $LIST"
    :0 :
    list.$LIST
}