procmail

Re: non-Western chars

2000-09-24 01:16:34
From: "David W. Tamkin" <dattier(_at_)ripco(_dot_)com>

Dallman asked,

| But I specifically want to identify the non-Western chars in
| the From: line.

 # brackets enclose caret, tab, space, hyphen, tilde; put tab before space!
 * ^From:.*[^	 -~]

will catch any From: header that has a character that isn't
printable ASCII.  Continental European characters may fit your
definition of "Western," but the problem there is that there are so
many varying character assignments for values above 127 that what is
European in one set can be Far Eastern in another, and vice versa.

This works like a dream; thanks.  It took me a few minutes of thinking
to figure out *why* it works, though, and why the caveat about order
matters.  (It also prompted me to reorder my existing "$WHITESPACE"
global variable.)  Very slick, indeed.
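
For anyone else puzzling over the order caveat, here is a minimal Python
sketch of the same bracket expression.  It is an illustration only, not
procmail itself: the names NON_PRINTABLE, WRONG_ORDER and has_non_ascii
are made up for this example, and Python's "\t" stands in for the
literal tab that the recipe needs.

```python
import re

# Tab first, then the range space..tilde (0x20-0x7E): the negated class
# matches any character that is neither a tab nor printable ASCII.
NON_PRINTABLE = re.compile(r"[^\t -~]")

# Space first: the hyphen now forms the range tab..tilde (0x09-0x7E),
# so control characters such as ESC (0x1B) are no longer flagged.
WRONG_ORDER = re.compile(r"[^ \t-~]")


def has_non_ascii(header: str) -> bool:
    """Return True if the header holds anything besides tab or printable ASCII."""
    return NON_PRINTABLE.search(header) is not None


if __name__ == "__main__":
    print(has_non_ascii("From: Dallman Ross <dman@example.com>"))
    print(has_non_ascii("From: \u4f60\u597d <someone@example.com>"))
    print(WRONG_ORDER.search("\x1b") is None)
```

With the tab listed first, the hyphen sits between space and tilde, so
the range is exactly the printable ASCII block.  Swap the two and the
range starts at tab instead, which quietly admits the control
characters between tab and space.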

(I'll cross the Western-European bridge when I come to it - and I
will, as I get some mail in German.)

I had been trying the following:

 nonWEST="\\*=\\?gb2312\\?[bq]\\?"  # useful for character-set testing

But I couldn't get that to work.
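
For comparison, the idea behind that variable (spotting a declared
character set in an encoded header word of the form
=?charset?encoding?text?=) can be sketched in Python.  This is only an
illustration of the pattern, not a diagnosis of why the procmail
variable failed; ENCODED_WORD and declared_charsets are hypothetical
names, and the sample header is invented.

```python
import re

# Matches encoded words such as "=?gb2312?B?...?=" and captures the
# charset label.  Charset and encoding letter are matched without
# regard to case.
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?[bq]\?[^?\s]*\?=", re.IGNORECASE)


def declared_charsets(header: str) -> list:
    """Return the lower-cased charset labels of all encoded words in a header."""
    return [label.lower() for label in ENCODED_WORD.findall(header)]


if __name__ == "__main__":
    sample = "Subject: =?GB2312?B?xOO6ww==?= hello"
    print(declared_charsets(sample))
    print("gb2312" in declared_charsets(sample))
```

Note that this only sees character sets that the sender declared;
raw 8-bit characters in a header carry no such label, which is where
the bracket expression above is the better test.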

Here's what I've got now.  The variables should be self-evident
from their names:


     :0  # find non-Western character sets; finesse OR conditions via scoring
     * $ $INFINITY^0 ^From:.*[^$WHITESPACE-~]
     * $ $INFINITY^0 SUBJECT ?? ()[^$WHITESPACE-~]
     { RECIPE = "${RECIPE:+$RECIPE }UBE_D-01-ES" }
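
The scoring trick can be modelled roughly in Python.  This is a
deliberately simplified sketch, not procmail's actual scoring code: it
assumes $INFINITY holds 2147483647, it only handles weights with a ^0
exponent (each matching condition counts once), and recipe_score,
recipe_fires and or_conditions are names invented for the example.

```python
import re

INFINITY = 2147483647          # assumed value of $INFINITY
NON_PRINTABLE = r"[^\t -~]"    # tab, then the range space..tilde


def recipe_score(conditions):
    """Sum the weights of all matching (weight, pattern, text) conditions."""
    total = 0
    for weight, pattern, text in conditions:
        if re.search(pattern, text, re.MULTILINE):
            total = min(INFINITY, total + weight)  # never exceed the maximum
    return total


def recipe_fires(conditions):
    """In this model a recipe succeeds when its final score is positive."""
    return recipe_score(conditions) > 0


def or_conditions(from_header, subject):
    """Model of the two conditions in the recipe above."""
    return [
        (INFINITY, r"^From:.*" + NON_PRINTABLE, from_header),
        (INFINITY, NON_PRINTABLE, subject),
    ]


if __name__ == "__main__":
    print(recipe_fires(or_conditions("From: a@example.com", "hello")))
    print(recipe_fires(or_conditions("From: a@example.com", "\u4f60\u597d")))
```

Ordinary conditions are ANDed, so a single miss sinks the recipe.
With weights, a miss merely adds nothing, and any one hit leaves a
positive score; that is what turns the two conditions into an OR.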

-- 
Note: my 8.5-year-old netcom.com address will soon disappear
when Netcom closes its doors.  Please update your records to
reflect my new address as: Dallman Ross <dman(_at_)NotNetcom(_dot_)com>

Ex-Netcom users: get free email forwarding at NotNetcom.com!
NOT affiliated with EarthLink, Inc.'s Netcom brand-identity.

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
