Re: Some strange is done to subject line

At 11:48 2009-10-29 -0500, John Simpson wrote:

On 10/29/09, Professional Software Engineering
<PSE-L(_at_)mail(_dot_)professional(_dot_)org> wrote:
> At 10:11 2009-10-29 -0500, Harry Putnam wrote:
> > Subject:
> > =?utf-8?B?UmV0cm9zcGVjdCBub3RpZmljYXRpb24gZnJvbSBCSlAgKDEwLzI3LzIwMDk
>
> Are you new to MIME encoding, or just unaware that it can be used to encode
> subjects (and name text even) in the header?

If " * ^Subject:.*Retrospect " is not correct, then what should the
recipe be ?


echo "Retrospect" | mimencode

will give you a string. I can't say I'd want to do this from within procmail each time, so if you only have a handful of things to match, you might try:


# If you change the match string, to get the base-64 version, do something like:
# printf '%s' "Retrospect" | mimencode
# (echo would append a newline, and that newline gets encoded too).  Keep only
# the part that encodes whole 3-byte groups (4 output characters each) and
# punch it in here, prefixed by "=\?utf-8\?B\?" (the ?s must be escaped).
# This will match an original plaintext or base-64 encoded subject.  Since
# this is a notification from a program, you shouldn't expect Re: or Fwd:
# prefixes, so the whitespace preceding the subject keyword should be it.  If
# there WERE a reply prefix, this would be more complicated, because that
# would be part of the encoded subject (which offsets the BASE64 coding).
# "UmV0cm9zcGVj" is "Retrospec"; the final "t" shares a group with whatever
# follows it, so it is left out of the encoded alternative.
  :0:
  * ^Message-Id:.*@reader\.local\.lan
  * ^Subject:[  ]*(Retrospect|=\?utf-8\?B\?UmV0cm9zcGVj)
  retrospect.in

BTW note that we're also using the LOCKING flag, which was omitted on the originally posted recipe.
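
To see why the encoded half of that condition is fiddly, here is a small shell sketch. It assumes the coreutils base64 tool as a stand-in for mimencode, and b64_stable_prefix is a made-up helper name, not something that ships with procmail or metamail. It encodes only the leading whole 3-byte groups of a word, since those are the only part whose base-64 form does not depend on the text that follows. ASCII input is assumed.

```shell
#!/bin/sh
# Hypothetical helper (not part of procmail or metamail): print the base-64
# text that every subject starting with the ASCII word $1 must begin with.
# Assumes a base64 command (coreutils) in place of mimencode.
b64_stable_prefix() {
    word=$1
    # base-64 turns each 3-byte group into 4 characters; a trailing partial
    # group changes with whatever follows it, so drop it.
    keep=$(( ${#word} / 3 * 3 ))
    [ "$keep" -gt 0 ] || return 0
    printf "%.${keep}s" "$word" | base64 | tr -d '\n'
    echo
}

b64_stable_prefix "Retrospect"
```

For "Retrospect" this prints UmV0cm9zcGVj, which is also how the encoded subject quoted at the top of this message begins. Feeding the word through echo instead ends the result in dAo=, which only fits a subject that is exactly that word plus a newline.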

The alternative is to (ideally, in a central place in the procmailrc) identify and extract encoded subjects:


        # extract the subject and decode as appropriate.
        :0
        * ^Subject:[    ]*\/[^  ].*
        {
                SUBJECT=$MATCH
                ORIGSUBJ=$SUBJECT

                # is this a mime-encoded subject line?
                # match for a number of common character sets
                # expand as desired - this is NOT comprehensive
                :0
                * SUBJECT ?? ^^=\?\/(utf-8|Windows-1251|koi8-r)\?B\?
                * MATCH ?? \/[^\?]+
                {
                        SUBJENCODING=$MATCH

                        # now, decode the subject
                        :0
                        * $ SUBJECT ?? ^^=\?${SUBJENCODING}\?B\?\/.*
                        {
                                SUBJECT=`echo "$MATCH" | mimencode -u`
                        }
                }
        }

Then, anywhere you might normally refer to the subject:

* ^Subject: expression

You would instead use:

* SUBJECT ?? expression

Specifically, the original recipe becomes:


  :0:
  * ^Message-Id:.*@reader\.local\.lan
  * SUBJECT ?? ^Retrospect
  retrospect.in

Very readable.

Where necessary, you can check SUBJENCODING to see what the character set encoding is. Because of multibyte character encoding for several non-western languages, expect a few errors during decode, due to nulls in the output string. I'd take the above recipes and stuff them into a sandbox, then throw a large corpus of saved emails at them.
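
For a quick look at what the decode step yields before involving procmail at all, something like the following shell sketch may help. decode_b_subject is a hypothetical name, GNU "base64 -d" is assumed in place of "mimencode -u", and only a single B-encoded word spanning the whole subject is handled (no Q encoding, no folded multi-part subjects). It also strips the trailing ?= before decoding, which the extraction recipe above does not do.

```shell
#!/bin/sh
# Hypothetical helper: decode a subject that is one "=?charset?B?...?="
# encoded word; anything else is passed through unchanged.
# Assumes GNU coreutils base64 (-d to decode) as a stand-in for mimencode -u.
decode_b_subject() {
    case $1 in
        "=?"*"?"[Bb]"?"*"?=")
            # Drop the shortest leading "=?charset?B?" and the trailing "?=".
            payload=${1#=\?*\?[Bb]\?}
            payload=${payload%\?=}
            printf '%s' "$payload" | base64 -d
            echo
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

decode_b_subject '=?utf-8?B?UmV0cm9zcGVjdA==?='
decode_b_subject 'Retrospect notification'
```

Running a handful of saved Subject: lines through this by hand is a cheap way to spot the character sets that produce odd output before trusting the recipes with live mail.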

Note that in my extraction above, the subject has been stripped of leading whitespace (because for my own purposes, this is desirable). Modify the extraction or your individual references to it accordingly.

FTR, in my own experience, the encoded subject more often than not is employed in SPAM - not that your particular automated message is, but for an abundance of messages, it is.



---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.

____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)de
http://mailman.rwth-aachen.de/mailman/listinfo/procmail