procmail

Re: Defining comments

1999-11-26 02:07:53
On Fri, 26 Nov 1999 08:22:46 +0100, Rejo Zenger
<subs(_at_)sisterray(_dot_)xs4all(_dot_)nl> wrote:
     CHAR        =  <any ASCII character>        ; (  0-177,  0.-127.)
  char         = "[-~]+"

(This is how this came through. Doesn't look right, does it? Perhaps
you have a real NUL and a real DEL there, though. Or NUL to ~ (126))

     ctext       =  <any CHAR excluding "(",     ; => may be folded
                     ")", "\" & CR, & including linear-white-space> 
  ctext        = "([-'*-[]-~])+"

This doesn't look right, either. In regular expressions in general,
any ] is the closing bracket unless it's the first character in the
class (after any ^ modifier and possibly -) but frankly, I'm not sure
Procmail follows tradition here 100%. Anyway, I think your regex makes
sense intuitively, but I wouldn't be too sure grep and friends would
agree.

I've been too lazy to ever test this part of Procmail fully, but here
are some things to look at:

 $ cat <<'HERE'>scratch/regex.rc
SHELL=/bin/sh # always remember this
DEFAULT=/dev/null
VERBOSE=yes

foo='abalaba[]()[]'

:0
* foo ?? ()\/[^[]+
* foo ?? ()\/[[].*
* foo ?? ()\/[]].*
* foo ?? ()\/[^]]+
* foo ?? ()\/[[-]]
* foo ?? ()\/[[-]].*
* foo ?? ()\/[^-][]+
* foo ?? ()\/[^-[]]+
{ LOG="oh well
" }

:0
* foo ?? ()\/[-'*-[]-~]
{ LOG="no problem
" }
HERE

 $ procmail -m scratch/regex.rc </dev/null
 procmail: [9977] Fri Nov 26 10:52:35 1999
 procmail: Assigning "foo=abalaba[]()[]"
 procmail: Assigning "MATCH="
 procmail: Matched "abalaba"
 procmail: Match on "()\/[^[]+"
 procmail: Matched "[]()[]"
 procmail: Match on "()\/[[].*"
 procmail: Matched "]()[]"
 procmail: Match on "()\/[]].*"
 procmail: Matched "abalaba["
 procmail: Match on "()\/[^]]+"
 procmail: Matched "[]"
 procmail: Match on "()\/[[-]]"
 procmail: Matched "[]()[]"
 procmail: Match on "()\/[[-]].*"
 procmail: Matched "[]"
 procmail: Match on "()\/[^-][]+"
 procmail: No match on "()\/[^-[]]+"
 procmail: No match on "()\/[-'*-[]-~]"
 procmail: Assigning "LASTFOLDER=/dev/null"
 procmail: Opening "/dev/null"
   Folder: /dev/null                                                          0

The "Matched" log entry comes before the corresponding "Match" but
here's a deciphered version:

  [^[]+   matched abalaba
  [[].*   matched []()[]
  []].*   matched ]()[]
  [^]]+   matched abalaba[
  [[-]]   matched []
  [[-]].* matched []()[]

These are exactly the way you would expect, given general regex
principles of longest-leftmost matching and the rules for how these
special cases of classes should be interpreted.
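
For comparison, here is a minimal sketch that checks the same bracket
convention in a different engine, Python's re module. It is not
Procmail, and it says nothing about how Procmail parses classes. Python
uses a backtracking matcher rather than POSIX longest-leftmost, but for
these three patterns the result should be the same. The names foo,
patterns and results are just for illustration.

```python
import re

foo = "abalaba[]()[]"  # same test string as in the rc file

# In Python's re, a ']' placed first in a class (after an optional '^')
# is a literal member of the class, not the closing bracket.
patterns = [r"[^[]+", r"[]].*", r"[^]]+"]

# Map each pattern to the text of its first (leftmost) match.
results = {p: re.search(p, foo).group() for p in patterns}

for pattern, matched in results.items():
    print(f"{pattern:8} matched {matched}")
```

The patterns that start a class with a literal [ (such as [[].*) are
left out on purpose, because Python warns about a possible nested set
there unless the [ is escaped.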

  [^-][]+ matched []

Huh? I would have expected this to match abalaba (or maybe ()) but
certainly not this.

  [^-[]]+ didn't match anything
  [-'*-[]-~] didn't match anything

Perhaps there should be a syntax error or at least a warning from
Procmail. These are illegal or at least weird regex syntax. (But
perhaps it should cope with the latter, in fact.)

1. I think I cannot have $comment be completely correct because of this
   call to itself. This will cause a problem with nested "(" and ")", I
   guess. Can I escape these problems by adding those parentheses to
   both $ctext and $quoted_pair?

Basic language theory. Regular expressions are not a powerful enough
formalism to deal with nested parentheses. Briefly, a regular
expression is computationally equivalent to a finite automaton where
each input character causes a transition from one state to another.
The automaton has only a fixed number of states and nothing like a
counter, so it can't know how many opening parens there have been.
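
What the automaton lacks is a counter. As a rough sketch of the
difference, assuming you are willing to step outside regular
expressions altogether, a few lines of Python can track nesting depth
explicitly. The function name comment_end is made up for this example,
and it only handles parentheses and backslash quoted-pairs, not the
rest of the header grammar.

```python
def comment_end(text, start=0):
    """Return the index just past the comment opening at text[start].

    Returns -1 if text[start] is not '(' or the comment never closes.
    """
    if start >= len(text) or text[start] != "(":
        return -1
    depth = 0  # the counter a finite automaton cannot have
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            # quoted-pair: skip the backslash and the character it escapes
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return -1  # ran out of input with parens still open


header = "(a (nested) comment) rest"
end = comment_end(header)
print(header[:end])
```

A regex with a fixed number of alternations can be written for any
fixed maximum depth, but not for arbitrary depth.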

2. How to exclude CR from ctext? I just don't see how I could specify
   this here.

Hey, I had never thought of that. How +would+ one do that? Philip?

/* era */

-- 
 Too much to say to fit into this .signature anyway: <http://www.iki.fi/era/>
  Fight spam in Europe: <http://www.euro.cauce.org/> * Sign the EU petition
