procmail

Re: | vs. ? And another basic question

2003-01-07 22:58:02
On  7 Jan, Jefferis Peterson wrote:
| Question 1:
| 
| In this rc, how does ? differ in function from the |:
| * $ $OR 
| ^Received:.*[[(]209\.236\.([0-9]|[1-5][0-9]|6[0-3])\.[0-9][0-9]?[0-9]?[])]
| 
| I presume that the "|" is the standard 'or', so that the third set of
| numbers could be a single character, or 2, but how does that differ from the
| 4th set of numbers, which are separated by the ? instead of the |?

(this|that), an alternation,  matches "this" or "that".

[a-m0-5], a character class, matches any single character between "a" and
"m", or between 0 (zero) and 5.  What falls between is determined by the
ASCII table (man ascii).
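To see the difference concretely, here is a small sketch using Python's re module as a stand-in for procmail's regex engine. That is an assumption: the two agree on these basic constructs but differ in other areas. fullmatch() tests the whole string, whereas a procmail condition matches anywhere in the text; the helper name whole() is made up for illustration.

```python
import re

# An alternation matches one whole alternative: "this" or "that".
alternation = re.compile(r"(this|that)")
# A character class matches exactly ONE character from the set.
char_class = re.compile(r"[a-m0-5]")

def whole(pattern, s):
    """Hypothetical helper: True if pattern matches the entire string s."""
    return pattern.fullmatch(s) is not None

print(whole(alternation, "this"), whole(alternation, "that"), whole(alternation, "th"))
# True True False

# Which of these single characters fall inside a-m or 0-5?
print([c for c in "akmnz0569" if whole(char_class, c)])
# ['a', 'k', 'm', '0', '5']

print(whole(char_class, "ab"))  # False: a class consumes a single character

# Ranges follow character codes (ASCII here): 'a'..'m' is 97..109.
print(ord("a"), ord("m"))  # 97 109
```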

? modifies the preceding character, character class, or parentheses
enclosed group to be optional (i.e. match 0 or 1 time).

* modifies the preceding character, character class, or parentheses
enclosed group to match 0 or more (unlimited) times.

+ modifies the preceding character, character class, or parentheses
enclosed group to match 1 or more times.
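A quick way to see the three quantifiers side by side, again assuming Python's re behaves like procmail for these operators: apply each one to the letter x and list which repetition counts (0 to 4) it accepts. accepted_counts() is a hypothetical helper, not part of any library.

```python
import re

def accepted_counts(quantifier, max_n=4):
    """Hypothetical helper: which repetition counts of 'x' (0..max_n)
    does 'x' followed by the given quantifier match in full?"""
    pattern = re.compile("x" + quantifier)
    return [n for n in range(max_n + 1) if pattern.fullmatch("x" * n)]

print("?", accepted_counts("?"))  # ? [0, 1]
print("*", accepted_counts("*"))  # * [0, 1, 2, 3, 4]
print("+", accepted_counts("+"))  # + [1, 2, 3, 4]

# The quantifiers apply to a class or a parenthesized group just the same.
print(bool(re.fullmatch(r"(ab)+", "ababab")))  # True
print(bool(re.fullmatch(r"[0-9]?", "")))       # True
```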

Examples (each pattern matched against the whole string):

(abc|def)*(1a|2b) matches abcabcdef2b, 1a, but NOT abd1a
(abc|def)?[xyz] matches abcx, y, defz, but NOT abcdefx
[abc]+[123]  matches a2, aa1, abcb2, abc3, but NOT 1, 2, or 3
([abc]|[123]) matches a, b, c, 1, 2, or 3
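The examples can be checked mechanically. This Python sketch (re as a stand-in for procmail) uses fullmatch(), so the whole string must match. A procmail condition is an unanchored search, though, so the last line shows that "abd1a" does contain a match for the first pattern.

```python
import re

# (pattern, strings it should match in full, strings it should not)
examples = [
    (r"(abc|def)*(1a|2b)", ["abcabcdef2b", "1a"], ["abd1a"]),
    (r"(abc|def)?[xyz]", ["abcx", "y", "defz"], ["abcdefx"]),
    (r"[abc]+[123]", ["a2", "aa1", "abcb2", "abc3"], ["1", "2", "3"]),
    (r"([abc]|[123])", list("abc123"), []),
]

for pattern, good, bad in examples:
    for s in good:
        assert re.fullmatch(pattern, s), (pattern, s)
    for s in bad:
        assert not re.fullmatch(pattern, s), (pattern, s)
print("all examples behave as described")

# An unanchored search, as procmail does, DOES find "1a" inside "abd1a":
print(re.search(r"(abc|def)*(1a|2b)", "abd1a").group())  # 1a
```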

In your IP regexp, [0-9][0-9]?[0-9]? matches any sequence of 1-3 digits.
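A quick check of that claim, in Python as before. Note that it counts digits only, so an out-of-range value such as 999 gets through too.

```python
import re

octet_digits = re.compile(r"[0-9][0-9]?[0-9]?")

for s in ["7", "42", "255", "999", "", "1234"]:
    print(repr(s), bool(octet_digits.fullmatch(s)))
# '7', '42', '255' and '999' print True; '' and '1234' print False
```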

| Question 2:
| Can you shorten the IP address to cover 0-255
| I was wondering if you need to include the last digits:
| Received:.*[[(]212\.154\.3[2-6]?[])]
| I've identified a spammer who owns 212.154.32 to .36  /0 -255 in those
| ranges. 
| 
| I was wondering if the following recipe covers all those ip's or do you need
| to add in factors for the last 3 digits?
| * ^Received:.*[[(]212\.154\.3[2-6]?[])]

You can do what you want, but not that way.  First off, all these
regexps are hosed because the [ and ] characters are special: they denote
character classes.  If you want to match a special character literally,
it needs to be backwhacked (backslash-escaped), e.g. \[.  Next, the ?
after [2-6] makes that digit optional, so 212.154.3 itself would
qualify.  Finally, you don't care what the last octet is, but it will be
there, so you have to allow for it.
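Here is the asker's condition run against two made-up header lines, in Python as before. Python's re warns about a bare [ as the first member of a character class, so this sketch backslash-escapes the brackets inside the two classes; the character sets themselves are unchanged. The host names and addresses are invented for illustration.

```python
import re

# The asker's condition; brackets inside the classes escaped for Python.
asker = re.compile(r"^Received:.*[\[(]212\.154\.3[2-6]?[\])]")

# Hypothetical header lines (made-up hosts and addresses).
spam_line = "Received: from mail.example (unknown [212.154.34.17]) by mx.example"
short_line = "Received: from mail.example (unknown [212.154.3]) by mx.example"

# "34" is followed by ".17", not by "]" or ")", so the last octet kills it.
print(bool(asker.search(spam_line)))   # False
# The "?" makes the digit after "3" optional, so "212.154.3]" is accepted.
print(bool(asker.search(short_line)))  # True
```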

Try:

  * ^Received:.*\[212\.154\.3[2-6]\...?.?]

The . (dot) matches any single character, so \...?.? (a literal dot
followed by one to three arbitrary characters) will match any 1-3 digit
octet after 212.154.3[2-6].  Note, it will also match ANY 1-3 character
string, say "a", "=@", or "2b%", but that's probably ok in this case.
You do have to be careful matching Received: headers because different
MTAs write all kinds of different things there.  But this particular
"looseness" doesn't seem likely to generate false positives.  If that's
a concern, use:

"\[212\.154\.3[2-6]\.[12]?[0-9]?[0-9]]".

That's still not perfect (it accepts 256-299, for instance), but it's
better.  If you want to narrow it down to virtually no possibility of
mismatch, use:

"\[212\.154\.3[2-6]\.(0|[1-9][0-9]?|1[0-9][0-9]|2([0-4][0-9]|5[0-5]))]".

Note, I make no representation that that's the most efficient regular
expression, but it will match any legal octet (0 through 255) and
nothing else, not even a leading-zero form like 07.
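To compare the three suggestions, this Python sketch (re as a stand-in for procmail, patterns copied verbatim) embeds each candidate last octet in a made-up Received: line and reports which pattern fires. hits() is a hypothetical helper; the assertions check that the last pattern accepts exactly the legal octets and that none of the patterns strays outside 212.154.32-36.

```python
import re

# The three bracketed-address patterns from the reply, unchanged.
loose = re.compile(r"\[212\.154\.3[2-6]\...?.?]")
tighter = re.compile(r"\[212\.154\.3[2-6]\.[12]?[0-9]?[0-9]]")
strict = re.compile(
    r"\[212\.154\.3[2-6]\.(0|[1-9][0-9]?|1[0-9][0-9]|2([0-4][0-9]|5[0-5]))]"
)

def hits(pattern, last_octet, third="34"):
    """Hypothetical helper: does pattern find [212.154.<third>.<last_octet>]
    somewhere in a made-up Received: line?"""
    line = f"Received: from x (unknown [212.154.{third}.{last_octet}]) by y"
    return pattern.search(line) is not None

# Columns: octet, loose, tighter, strict
for octet in ["0", "17", "255", "256", "299", "2b%"]:
    print(octet, hits(loose, octet), hits(tighter, octet), hits(strict, octet))
# 0 True True True
# 17 True True True
# 255 True True True
# 256 True True False
# 299 True True False
# 2b% True False False

# Every legal octet 0..255 is accepted by the strict pattern ...
assert all(hits(strict, str(n)) for n in range(256))
# ... nothing from 256 to 999 is, and leading zeros are rejected.
assert not any(hits(strict, str(n)) for n in range(256, 1000))
assert not hits(strict, "07")
# None of the patterns match outside the 32..36 third-octet range.
assert not any(hits(p, "1", third="37") for p in (loose, tighter, strict))
```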

-- 
Email address in From: header is valid  * but only for a couple of days *
This is my reluctant response to spammers' unrelenting address harvesting



_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail