Quoting John Levine <johnl-procmail(_at_)iecc(_dot_)com>:
> As I expect everyone is aware, you can use MIME hackery in mail
> headers to include non-ASCII text in just about everything other than
> the actual e-mail address. Writing patterns to match MIME stuff is
> painful, since there are about a dozen character sets with somewhat
> different encodings, and if they use base64 encoding, the coded values
> change depending on where in the string the characters are.
>
> So it would be handy if there were an option to interpret all the MIME
> and turn all the text into UTF-8, which (assuming they are 8-bit clean)
> existing regex code should handle.
>
> Anyone done that? Thought about it?

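To illustrate the base64 point in the quoted message: base64 packs every 3 input bytes into 4 output characters, so the encoded form of a word depends on its byte offset modulo 3 within the encoded-word. The following minimal Python sketch (Python is our substitute here; the thread itself uses Perl) encodes "Welcome" after 0, 1 and 2 leading bytes. It shows why no single pattern on the raw, encoded header can reliably match the word.

```python
import base64


def b64(text: str) -> str:
    """Base64-encode a string's UTF-8 bytes, as a MIME "B" encoded-word does."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


# The same word shifted by 0, 1 and 2 leading bytes. Because base64 works in
# 3-byte groups, each shift produces a completely different encoded string.
for prefix in ["", "A", "AB"]:
    word = prefix + "Welcome"
    print(f"{word:<10} -> {b64(word)}")
```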
Well, after having my email address used by a spammer to sign up on
thousands of forums in multiple countries, I found this worked pretty
well to standardize the Subject line:

SUBJECT=`formail -xSubject: | /usr/local/bin/perl -MEncode -ne 'print encode("utf8", decode("MIME-Header", $_))'`

That let me delete anything with 'Welcome To' in the subject. I
assume this is what you're after.
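For anyone not using Perl, the same normalization can be sketched with Python's standard `email.header` module. This is a swapped-in equivalent, not what the one-liner above runs. The sample subjects and the `is_forum_welcome` helper are hypothetical. `decode_header()` splits a header value into chunks tagged with their charset, and `make_header()` reassembles them into one Unicode string, so a single plain pattern can then be applied.

```python
import re
from email.header import Header, decode_header, make_header


def normalize_header(raw: str) -> str:
    """Decode any RFC 2047 encoded-words in a header value to a Unicode str.

    decode_header() yields (bytes-or-str, charset) chunks; make_header()
    decodes each chunk with its own charset and joins them back together.
    """
    return str(make_header(decode_header(raw)))


def is_forum_welcome(raw_subject: str) -> bool:
    """Hypothetical filter: look for 'Welcome To' in the *decoded* subject."""
    return re.search(r"Welcome To", normalize_header(raw_subject)) is not None


# Made-up subjects: one base64 ("B") in UTF-8, one quoted-printable ("Q") in Latin-1.
b_encoded = Header("Welcome To the forum", "utf-8").encode()
q_encoded = "=?iso-8859-1?q?Willkommen_bei_M=FCller?="

for raw in (b_encoded, q_encoded, "Re: lunch"):
    print(raw, "->", normalize_header(raw), is_forum_welcome(raw))
```

The Perl version additionally re-encodes the result to UTF-8 bytes. In Python you would call `.encode("utf-8")` on the returned string if a byte-oriented tool needs it.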

The problem is that a user-friendly filtering interface generally isn't
set up to match against variables; it just does header matches. So some
rewriting would be needed there.

As an English speaker, I'm also not sure what would happen with other
languages when that conversion is done. And if that user-friendly
interface were modified to do variable matches instead of header
matches, would it be more or less reliable? What happens to
umlauts and such?
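On the umlaut question, here is a minimal Python sketch. Python stands in for the Perl above, and the two sample subjects are hypothetical. It suggests two things. First, Latin-1 and UTF-8 encodings of the same word normalize to identical text, which is the point of the exercise. Second, once the result is re-encoded to UTF-8, a letter such as "ü" occupies two bytes. A byte-oriented matcher (we assume procmail's regex engine works on bytes) therefore needs its patterns written in UTF-8, and a single `.` will not match one umlaut.

```python
import re
from email.header import decode_header, make_header


def normalize_header(raw: str) -> str:
    """Decode RFC 2047 encoded-words in a header value to a Unicode str."""
    return str(make_header(decode_header(raw)))


# The same surname as two hypothetical mailers might send it.
latin1_subject = "=?iso-8859-1?q?M=FCller?="   # "ü" is the single byte 0xFC
utf8_subject = "=?utf-8?q?M=C3=BCller?="       # "ü" is the two bytes 0xC3 0xBC

name = normalize_header(latin1_subject)
assert name == normalize_header(utf8_subject)  # both decode to the same text

# Re-encoding to UTF-8 (as the Perl one-liner does) makes "ü" two bytes,
# so byte-wise "." matches only half of it.
utf8_bytes = name.encode("utf-8")
print(re.fullmatch(rb"M.ller", utf8_bytes))    # byte-wise, one dot: None
print(re.fullmatch(rb"M..ller", utf8_bytes))   # byte-wise, two dots: matches
print(re.fullmatch(r"M.ller", name))           # character-wise: matches
```

In short, decoding should make non-English filters more reliable, provided the patterns themselves are saved in UTF-8 and don't rely on `.` spanning exactly one non-ASCII letter.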

That's about as far as I've thought about it. I have had user
complaints about filter accuracy due to the UTF-8 issue; I'm just not
sure what to do about it... :)
Rick
____________________________________________________________
procmail mailing list Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)de
http://mailman.rwth-aachen.de/mailman/listinfo/procmail