On Fri, 26 Nov 1999 08:22:46 +0100, Rejo Zenger
<subs(_at_)sisterray(_dot_)xs4all(_dot_)nl> wrote:
CHAR = <any ASCII character> ; ( 0-177, 0.-127.)
char = "[-~]+"
(This is how this came through. Doesn't look right, does it? Perhaps
you have a real NUL and a real DEL there, though. Or NUL to ~ (126))
ctext = <any CHAR excluding "(", ; => may be folded
")", "\" & CR, & including linear-white-space>
ctext = "([-'*-[]-~])+"
This doesn't look right, either. In regular expressions in general,
any ] is the closing bracket unless it's the first character in the
class (after any ^ modifier and possibly -) but frankly, I'm not sure
Procmail follows tradition here 100%. Anyway, I think your regex makes
sense intuitively, but I wouldn't be too sure grep and friends would
agree.
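For what it's worth, here is a small sketch of that traditional rule using Python's re module. Python is not Procmail, and its engine is not grep's either, so this only illustrates the convention and says nothing about what Procmail itself does. The test string and the reordered class are my own assumptions; in particular the leading space in the last range is a guess, since the low end of that range didn't survive in the posted regex.

```python
import re

# The class as posted. By the traditional rule the first "]" that is not
# the first character of the class closes it, so this parses as the class
# [-'*-[] followed by the three literal characters "-~]".
posted = re.compile(r"[-'*-[]-~]")

# Reordered so that "]" comes first and is therefore taken literally.
# ASSUMPTION: the space that starts the " -'" range is a guess.
reordered = re.compile(r"[]-~*-[ -']+")

foo = "abalaba[]()[]"

print(posted.search(foo))               # None
print(posted.search("A-~]").group())    # A-~]
print(reordered.search(foo).group())    # abalaba[]
```

The reordered class stops at the first "(", which is what ctext is supposed to do.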
I've been too lazy to ever test this part of Procmail fully, but here
are some things to look at:
$ cat <<'HERE'>scratch/regex.rc
SHELL=/bin/sh # always remember this
DEFAULT=/dev/null
VERBOSE=yes
foo='abalaba[]()[]'
:0
* foo ?? ()\/[^[]+
* foo ?? ()\/[[].*
* foo ?? ()\/[]].*
* foo ?? ()\/[^]]+
* foo ?? ()\/[[-]]
* foo ?? ()\/[[-]].*
* foo ?? ()\/[^-][]+
* foo ?? ()\/[^-[]]+
{ LOG="oh well
" }
:0
* foo ?? ()\/[-'*-[]-~]
{ LOG="no problem
" }
HERE
$ procmail -m scratch/regex.rc </dev/null
procmail: [9977] Fri Nov 26 10:52:35 1999
procmail: Assigning "foo=abalaba[]()[]"
procmail: Assigning "MATCH="
procmail: Matched "abalaba"
procmail: Match on "()\/[^[]+"
procmail: Matched "[]()[]"
procmail: Match on "()\/[[].*"
procmail: Matched "]()[]"
procmail: Match on "()\/[]].*"
procmail: Matched "abalaba["
procmail: Match on "()\/[^]]+"
procmail: Matched "[]"
procmail: Match on "()\/[[-]]"
procmail: Matched "[]()[]"
procmail: Match on "()\/[[-]].*"
procmail: Matched "[]"
procmail: Match on "()\/[^-][]+"
procmail: No match on "()\/[^-[]]+"
procmail: No match on "()\/[-'*-[]-~]"
procmail: Assigning "LASTFOLDER=/dev/null"
procmail: Opening "/dev/null"
Folder: /dev/null 0
Each "Matched" log entry comes before its corresponding "Match on"
entry, but here's a deciphered version:
[^[]+ matched abalaba
[[].* matched []()[]
[]].* matched ]()[]
[^]]+ matched abalaba[
[[-]].* matched []()[]
These are exactly the way you would expect, given general regex
principles of longest-leftmost matching and the rules for how these
special cases of classes should be interpreted.
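To make "leftmost, then as long as possible" concrete, here is a minimal sketch in Python (again, not Procmail). Python's re is a backtracking engine rather than a POSIX longest-match one, but for a single class followed by + the two agree, so I think it is a fair illustration of one of the cases above.

```python
import re

foo = "abalaba[]()[]"

# Every maximal run of non-"]" characters in foo, with its start offset.
runs = [(m.start(), m.group()) for m in re.finditer(r"[^]]+", foo)]
print(runs)                                # [(0, 'abalaba['), (9, '()[')]

# search() reports only the leftmost run, extended as far as it will go.
print(re.search(r"[^]]+", foo).group())    # abalaba[
```

There are two candidate runs, and the one starting at offset 0 wins simply because it starts first.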
[^-][]+ matched []
Huh? I would have expected this to match abalaba (or maybe ()) but
certainly not this.
[^-[]]+ didn't match anything
[-'*-[]-~] didn't match anything
Perhaps there should be a syntax error or at least a warning from
Procmail. These are illegal, or at least weird, regex syntax. (But
perhaps it should in fact cope with the latter.)
1. I think I cannot make $comment completely correct because of this
call to itself. This will cause a problem with nested "(" and ")", I
guess. Can I escape these problems by adding those parentheses to
both $ctext and $quoted_pair?
Basic language theory. Regular expressions are not a powerful enough
formalism to deal with nested parentheses. Briefly, a regular
expression is computationally equivalent to a finite automaton, where
each input character causes a transition from one state to another.
The automaton has only a fixed number of states and no counter, so it
can't keep track of how many opening parens are still unclosed.
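What a regex lacks here is a counter. Outside of a regex the job is easy; the following is a minimal sketch in Python, not anything Procmail offers. The function name comment_end is hypothetical, and the handling of backslash as a quoted-pair is my reading of the grammar quoted above.

```python
def comment_end(text, start=0):
    """Return the index just past the comment opening at text[start], or -1."""
    if start >= len(text) or text[start] != "(":
        return -1
    depth = 0  # number of "(" seen that are still unclosed
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # quoted-pair: skip the backslash and the character it quotes
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return -1  # ran out of input with parentheses still open


header = "(a (nested) comment) tail"
print(header[:comment_end(header)])    # (a (nested) comment)
```

The depth variable is exactly the piece of memory a finite automaton doesn't have: it can grow without bound, while the automaton's set of states is fixed in advance.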
2. How to exclude CR from ctext? I just don't see how I could specify
this here.
Hey, I had never thought of that. How +would+ one do that? Philip?
/* era */