procmail

Re: Embedded comments

2003-05-26 18:32:48
R A Chatwin wrote:
On 26 May 2003 at 11:08, Daryle A. Tilroe wrote:

R A Chatwin wrote:

The matching snippet is: ${a}<(!-- ?)?${w}${w}+( *--)?>$a
where the PERL-style variables are (inherited from JARI AALTO :-)
http://info.ccone.at/INFO/Tips+Co/pm-tips.html
        a       = "[a-zA-Z]"            # word, only letters
        w       = "[0-9a-z_A-Z]"        # word
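
To make the expansion concrete, here is a minimal sketch in Python rather than procmail. It assumes Python's `re` module treats this simple syntax the same way procmail's egrep-style matcher does, and the sample strings are invented for illustration, not taken from real mail.

```python
import re

# Character classes from the post (procmail variables in the original).
a = "[a-zA-Z]"      # a single letter
w = "[0-9a-z_A-Z]"  # a single word character

# ${a}<(!-- ?)?${w}${w}+( *--)?>$a with the variables substituted.
embedded = re.compile(f"{a}<(!-- ?)?{w}{w}+( *--)?>{a}")

# Hypothetical samples mapped to whether a match is expected.
samples = {
    "vi<!-- xyz -->agra": True,     # comment wedged inside a word
    "vi<xyz>agra": True,            # bogus tag wedged inside a word
    "see <b>bold</b> text": False,  # tag not glued to a preceding letter
    "a<b>c": False,                 # only one word character inside <>
}

results = {text: bool(embedded.search(text)) for text in samples}
for text, matched in results.items():
    print(f"{matched!s:5}  {text}")
```

The last sample fails because `${w}${w}+` requires at least two word characters between the angle brackets.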

What would the drawback be of simply using?:

        [a-zA-Z]<.*>[a-zA-Z]

I.E. why all the fancy stuff in the middle if one is just trying
to catch/weight comments not surrounded by white space?



Why? To keep up with the payments on my daughter's new car ;-)

Our business is technical translations & copy editing...
Incoming unknown mail (copy and pasted technical stuff):
"... p<0.05 for a>b, and ... [lots more of the same]"
Oops! A match. Gotcha!
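
The false positive can be reproduced with a short sketch, again in Python and again assuming its `re` module behaves like procmail's matcher for these patterns. The test string is the fragment quoted above; the variable names are only for illustration.

```python
import re

# The simple pattern proposed in the question.
simple = re.compile(r"[a-zA-Z]<.*>[a-zA-Z]")

# The original, more specific pattern with its variables expanded.
specific = re.compile(
    r"[a-zA-Z]<(!-- ?)?[0-9a-z_A-Z][0-9a-z_A-Z]+( *--)?>[a-zA-Z]"
)

# Fragment of pasted technical text, as quoted in the post.
text = "... p<0.05 for a>b, and ..."

simple_hit = simple.search(text)
specific_hit = specific.search(text)

print("simple:  ", simple_hit.group(0) if simple_hit else None)
print("specific:", specific_hit.group(0) if specific_hit else None)
```

The simple pattern swallows everything from `p<` through `>b`, while the specific one stops at the `.` in `0.05` and finds nothing.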

Actually it seems that the only thing that keeps that from being
an arbitrary potential obfuscating comment is that it does not
start with '!' or a letter.  I suppose one could use:

        [a-zA-Z]<(!|[a-zA-Z]).*>[a-zA-Z]

to narrow it down a bit.  Also won't your original regexp miss all
the obfuscating comments with spaces?
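
Both points can be checked with a rough sketch in Python, assuming its `re` module matches these patterns the way procmail would. The `spaced` and `inequality` strings are hypothetical examples, not real mail.

```python
import re

# Narrowed pattern: the text after '<' must start with '!' or a letter.
narrowed = re.compile(r"[a-zA-Z]<(!|[a-zA-Z]).*>[a-zA-Z]")

# The original pattern with its variables expanded.
original = re.compile(
    r"[a-zA-Z]<(!-- ?)?[0-9a-z_A-Z][0-9a-z_A-Z]+( *--)?>[a-zA-Z]"
)

technical = "... p<0.05 for a>b, and ..."      # fragment quoted earlier
spaced = "vi<!-- some filler words -->agra"    # hypothetical comment with spaces
inequality = "if x<y and y>z then"             # hypothetical pasted formula

for name, text in [("technical", technical),
                   ("spaced", spaced),
                   ("inequality", inequality)]:
    print(f"{name:10} narrowed={bool(narrowed.search(text))!s:5} "
          f"original={bool(original.search(text))}")
```

Under these assumptions the narrowed pattern skips the `p<0.05` fragment and catches the spaced comment that the original misses, but it still fires on an inequality between single-letter names such as `x<y and y>z`.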

One quick question on your original regexp:  I am probably being
dense, but what is the purpose of the two "${w}" unless you are
trying to restrict it to two or more 'word characters'?

More important is the general idea of keeping the matches for bad
guys very specific. A false positive (good guy goes to jail) is a
*bug*. A false negative (bad guy gets through) is just an annoyance.
The specific matches can always be gently generalized if false
negatives start to occur. The other way round is not really an option.

Depending on how you handle the positives, it is an option, and possibly
a preferable one.  For me they go into quarantine and are quickly deleted
once a day.  Note that the volume is manageable since the blacklists and
a few other sure kills get most of the crud (as an aside, once I feel the
filters are 99% false positive free I will implement an autoresponder).
The rare false positive is forwarded appropriately.  OTOH the false
negatives are never seen by me, unless of course they are ones
addressed to me or I get a complaint.

--
Daryle A. Tilroe


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
