At 11:48 2009-10-29 -0500, John Simpson wrote:
> On 10/29/09, Professional Software Engineering
> <PSE-L(_at_)mail(_dot_)professional(_dot_)org> wrote:
> > At 10:11 2009-10-29 -0500, Harry Putnam wrote:
> > > Subject:
> > > =?utf-8?B?UmV0cm9zcGVjdCBub3RpZmljYXRpb24gZnJvbSBCSlAgKDEwLzI3LzIwMDk
> >
> > Are you new to MIME encoding, or just unaware that it can be used to
> > encode subjects (and name text even) in the header?
>
> If " * ^Subject:.*Retrospect " is not correct, then what should the
> recipe be?
printf '%s' "Retrospect" | mimencode
will give you the base-64 string. (Use printf rather than echo: echo appends a
newline, and the encoded newline changes the last characters of the result.)
I can't say I'd want to do this from within procmail each time, so if you only
have a handful of things to match, you might try:
# If you change the match string, get its base-64 version with something like:
#   printf '%s' "Retrospect" | mimencode
# Base-64 turns each 3 bytes into 4 characters, so only the leading characters
# that don't depend on what follows the word are safe to match. For the 10
# bytes of "Retrospect" that is the first 13 characters, UmV0cm9zcGVjd. Punch
# those in here, prefixed by "=\?utf-8\?B\?" (the ? characters must be
# escaped, since ? is a regex operator).
# This will match an original plaintext or base-64 encoded subject. Since
# this is a notification from a program, you shouldn't expect Re: or Fwd:
# prefixes, so the whitespace preceding the subject keyword should be it. If
# there WERE a reply prefix, this would be more complicated, because that
# would be part of the encoded subject (which offsets the base-64 coding).
:0:
* ^Message-Id:.*@reader\.local\.lan
* ^Subject:[ ]*(Retrospect|=\?utf-8\?B\?UmV0cm9zcGVjd)
retrospect.in
BTW, note that we're also using a local lockfile (the second colon in ":0:"),
which was omitted from the originally posted recipe.
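To double-check a match string before punching it into a recipe, here is a small Python sketch (not procmail; `stable_b64_prefix` and `subject_condition` are made-up helper names). It computes the leading base-64 characters of a keyword that don't depend on the bytes that follow it, shows what an echo-appended newline does to the encoding, and applies a Python regex that approximates a Subject condition of the form `^Subject:[ ]*(word|=\?utf-8\?B\?prefix)`, case-insensitive as procmail is by default.

```python
import base64
import re


def stable_b64_prefix(keyword: str, charset: str = "utf-8") -> str:
    """Return the leading base-64 characters of `keyword` that stay the same
    whatever bytes follow it inside the encoded subject (hypothetical helper)."""
    raw = keyword.encode(charset)        # encode exactly, no trailing newline
    whole = len(raw) // 3 * 3            # bytes filling complete 3-byte groups
    prefix = base64.b64encode(raw[:whole]).decode("ascii")
    leftover = len(raw) - whole          # 0, 1 or 2 bytes in a partial group
    if leftover:
        # One leftover byte fixes one more character (its top 6 bits);
        # two leftover bytes fix two more (their top 12 bits).
        prefix += base64.b64encode(raw[whole:]).decode("ascii")[:leftover]
    return prefix


def subject_condition(keyword: str, charset: str = "utf-8") -> "re.Pattern[str]":
    """Approximate the procmail condition ^Subject:[ ]*(keyword|=?charset?B?prefix)
    with the ? characters taken literally (hypothetical helper).
    Case-insensitive, as procmail matches by default."""
    encoded = "=?" + charset + "?B?" + stable_b64_prefix(keyword, charset)
    return re.compile(
        r"^Subject:[ \t]*(" + re.escape(keyword) + "|" + re.escape(encoded) + ")",
        re.IGNORECASE | re.MULTILINE,
    )


# The encoded Subject from the original post (truncated as quoted there).
header = "Subject: =?utf-8?B?UmV0cm9zcGVjdCBub3RpZmljYXRpb24gZnJvbSBCSlAgKDEwLzI3LzIwMDk"

print(stable_b64_prefix("Retrospect"))                       # UmV0cm9zcGVjd
print(base64.b64encode(b"Retrospect\n").decode())            # UmV0cm9zcGVjdAo=
print(bool(subject_condition("Retrospect").search(header)))  # True
print("UmV0cm9zcGVjdAo=" in header)                          # False
```

The echo-derived string ends in `Ao=`, which encodes the newline, so it cannot match a subject in which "Retrospect" is followed by a space; the shorter prefix avoids that.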
The alternative is to identify and decode encoded subjects once, ideally in a
central place in your procmailrc:
# extract the subject and decode as appropriate.
:0
* ^Subject:[ ]*\/[^ ].*
{
  SUBJECT=$MATCH
  ORIGSUBJ=$SUBJECT

  # is this a mime-encoded subject line?
  # match for a number of common character sets
  # expand as desired - this is NOT comprehensive
  # (only the first base-64 "B" encoded word is handled, not "Q" encoding)
  :0
  * SUBJECT ?? ^^=\?\/(utf-8|Windows-1251|koi8-r)\?B\?
  * MATCH ?? \/[^\?]+
  {
    SUBJENCODING=$MATCH

    # now, decode the subject: take the base-64 text only up to the
    # closing "?=", so the delimiter isn't fed to the decoder
    :0
    * $ SUBJECT ?? ^^=\?${SUBJENCODING}\?B\?\/[^\?]+
    {
      SUBJECT=`echo "$MATCH" | mimencode -u`
    }
  }
}
Then, anywhere you might normally refer to the subject:
* ^Subject: expression
You would instead:
* SUBJECT ?? expression
Specifically, the original recipe becomes:
:0:
* ^Message-Id:.*@reader\.local\.lan
* SUBJECT ?? ^Retrospect
retrospect.in
Very readable.
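For experimenting outside procmail, the extraction recipes and the `* SUBJECT ?? ^Retrospect` match can be approximated in Python. This is a sketch, not the recipes themselves: `decode_subject`, `CHARSETS` and `ENCODED_WORD` are hypothetical names, it decodes the bytes into text with the named charset (mimencode -u just emits raw bytes), and like the recipes it only looks at a single base-64 encoded word at the very start of the subject.

```python
import base64
import binascii
import re

# Character sets the recipe looks for; expand as desired (NOT comprehensive).
CHARSETS = ("utf-8", "windows-1251", "koi8-r")

# Like the nested conditions: an encoded word at the very start of the subject,
# base-64 ("B") only, capturing the charset and the text up to the next "?".
ENCODED_WORD = re.compile(
    r"^=\?(" + "|".join(map(re.escape, CHARSETS)) + r")\?B\?([^?]+)",
    re.IGNORECASE,  # procmail matches case-insensitively by default
)


def decode_subject(header_value: str):
    """Return (SUBJECT, SUBJENCODING) roughly as the recipes would set them.

    SUBJENCODING is None when the subject is not a recognised encoded word;
    if decoding fails, the subject is returned undecoded.
    """
    subject = header_value.lstrip(" \t")  # the extraction drops leading blanks
    m = ENCODED_WORD.match(subject)
    if m is None:
        return subject, None
    charset, data = m.group(1), m.group(2)
    try:
        return base64.b64decode(data).decode(charset), charset
    except (binascii.Error, UnicodeDecodeError, LookupError):
        return subject, charset


subject, encoding = decode_subject("=?utf-8?B?UmV0cm9zcGVjdCBub3RpZmljYXRpb24=?=")
print(subject, encoding)  # Retrospect notification utf-8
# Equivalent of  * SUBJECT ?? ^Retrospect
print(bool(re.match("Retrospect", subject, re.IGNORECASE)))  # True
```

Decoding into text with the declared charset is a deliberate difference from the recipes: it makes charset problems show up as decode failures instead of as odd bytes in SUBJECT.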
Where necessary, you can check SUBJENCODING to see what the character set
encoding is. Because several non-Western languages use multibyte character
encodings, expect a few errors during decoding, due to nulls (NUL bytes) in
the output string. I'd take the above recipes and stuff them into a
sandbox, then throw a large corpus of saved emails at them.
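To get a first picture of such a corpus, a Python harness can tally which subjects are encoded, which character sets appear, and how many fail to decode. This is a hedged sketch: `survey_subjects` and `survey_mbox` are hypothetical names, it assumes the saved mail is an mbox file readable by Python's `mailbox` module, and it uses Python's own RFC 2047 decoder in `email.header`, so its error counts are a reference point for the recipes rather than a reproduction of what mimencode will do.

```python
import email
import email.errors
import email.header
import mailbox
from collections import Counter


def survey_subjects(messages) -> Counter:
    """Tally how the Subject headers of `messages` decode (hypothetical helper).

    `messages` is any iterable of email.message.Message objects, for example
    a mailbox.mbox of saved mail.
    """
    stats = Counter()
    for msg in messages:
        raw = msg.get("Subject")
        if raw is None:
            stats["no subject"] += 1
            continue
        try:
            parts = email.header.decode_header(raw)
            charsets = {cs for _, cs in parts if cs}
            if not charsets:
                stats["plain"] += 1
                continue
            str(email.header.make_header(parts))  # strict decode of every part
        except (email.errors.HeaderParseError, UnicodeDecodeError, LookupError):
            stats["decode error"] += 1
            continue
        stats["decoded"] += 1
        for cs in sorted(charsets):
            stats["charset " + cs] += 1
    return stats


def survey_mbox(path: str) -> Counter:
    """Survey a saved mbox file; `path` is whatever corpus you point it at."""
    return survey_subjects(mailbox.mbox(path, create=False))

# Example (hypothetical file name):
# print(survey_mbox("saved-mail.mbox").most_common())
```

Subjects that land under "decode error", or under a charset the recipe doesn't list, are the ones worth feeding through the procmail sandbox by hand.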
Note that in my extraction above, the subject has been stripped of leading
whitespace (because, for my own purposes, this is desirable). Modify the
extraction or your individual references to it accordingly.
FTR, in my own experience, encoded subjects are employed in SPAM more often
than not. Not that your particular automated message is spam, but for a great
many messages, it is.
---
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.
____________________________________________________________
procmail mailing list Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)de
http://mailman.rwth-aachen.de/mailman/listinfo/procmail