procmail

Re: Defining comments

1999-11-28 09:24:45
[Attempting to move this to procmail-dev. Note Reply-To:]

On Sat, 27 Nov 1999 20:46:07 -0600, Philip Guenther
<guenther(_at_)gac(_dot_)edu> wrote on the Procmail mailing list:
(This is how this came through. Doesn't look right, does it? Perhaps
you have a real NUL and a real DEL there, though. Or NUL to ~ (126))
To match the range NUL to ~ you have to reverse it and match everything
not in the range ?-ÿ (200-377):
     char =  '[^?-ÿ]'
This is because it is impossible to include a literal NUL in a
procmail regexp: it is treated as the end of the string instead.
E-mail messages may contain NULs, but procmail variables and
rcfiles may not meaningfully do so.

Argl. Yes you can "meaningfully" want to include a NUL in a regular
expression. I was always under the impression that this would not be a
problem (because Procmail is "8-bit clean", although that technically
doesn't have anything to do with this, but you tend to see 7-bitness
and over-sensitivity to NULs being fixed at the same time). This
shouldn't even be very hard to fix, should it?

Bare SMTP doesn't cope well with raw NULs, of course, but you could
have something like a base64 decoder putting NULs in the body and you
could meaningfully want to process that just like any other message. I
can see how NULs can be slightly tricky to handle with stock C library
functions but that's no real excuse for not Doing It Right.
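As an illustration of the difference (a sketch in Python, not procmail
code), the snippet below contrasts a C-style view of a buffer, which
stops at the first NUL, with a length-aware regex engine that treats
NUL as just another byte. It also shows the "reversed range" trick from
the quoted text, written with hex escapes that Python's re module
supports but procmail does not. The sample data is made up.

```python
import ctypes
import re

data = b"ab\x00cd"  # made-up sample with an embedded NUL

# C-style view: .value stops at the first NUL, the way strlen()-based
# code would see the buffer.
buf = ctypes.create_string_buffer(data)
c_view = buf.value

# Length-aware view: the bytes after the NUL are still there.
full_view = buf.raw[:len(data)]

# A length-aware regex engine can match a NUL directly.
nul_match = re.search(rb"\x00", data)

# The "reversed range" trick: everything NOT in 0x80-0xff (octal
# 200-377), i.e. bytes 0x00-0x7f, NUL included.
seven_bit = re.compile(rb"[^\x80-\xff]+")
seven_bit_run = seven_bit.match(data).group()
```

Here c_view holds only the two bytes before the NUL, while the regex
sees all five bytes.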

  ctext        = "([-'*-[]-~])+"
This doesn't look right, either. In regular expressions in general,
any ] is the closing bracket unless it's the first character in the
class (after any ^ modifier and possibly -) but frankly, I'm not sure
Procmail follows tradition here 100%. Anyway, I think your regex makes
sense intuitively, but I wouldn't be too sure grep and friends would
agree.
The only place inside a character class where a close bracket is not
treated as the end of the class is as the first characters in the class,

I think it would make sense to allow for [*-]] -- like I wrote above,
I think it makes sense intuitively, although, like you note, it's not
commonly done that way.

[^-][]+ matched []
Huh? I would have expected this to match abalaba (or maybe ()) but
certainly not this.
As with close bracket, a minus sign is not special as the first
character of a character class, skipping an optional negation.

Many other regex implementations interpret [^-]] as "any character
except minus or close bracket". If you don't allow for that, there is
no way really to define that particular character class (with
variations). I think Procmail should be changed to behave.
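For comparison, here is how one other implementation, Python's re
module, treats these placements. This is only a sketch of that engine's
behavior; it says nothing about what procmail does or should do, and
other tools may differ.

```python
import re

# "]" placed first (after an optional "^") is a literal member of the
# class, and a trailing "-" is literal too: any character except "]"
# and "-".
not_bracket_or_minus = re.compile(r"[^]-]")

# With "-" first, the next "]" closes the class, so this pattern means
# "one character that is not '-'" followed by a literal "]".
minus_first = re.compile(r"[^-]]")

# A class containing only the close bracket.
only_close_bracket = re.compile(r"[]]")
```

In this engine the class "anything except minus or close bracket" is
spelled with the bracket first, [^]-], rather than [^-]].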

(The previous sentence was longer originally but it's good enough like
this :-)

Furthermore, if a regexp ends during the parsing of a character
class, procmail will close the class internally. So, the regexp
     [^-][]+
is parsed as:
     character class matching everything but - and NL
     character class matching ] and +
Now does the result make sense?

Yes and no. I think some of the rules here are wrong. And I think
there should at least be a warning if you hit end of regex while the
parser is in the middle of a character class.
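To show what complaining about an unfinished class can look like,
Python's re module refuses to compile the same regexp instead of
closing the class silently. The helper name compile_or_report is
invented for this sketch.

```python
import re

def compile_or_report(pattern):
    """Return (compiled, None) on success or (None, message) on failure."""
    try:
        return re.compile(pattern), None
    except re.error as exc:
        return None, str(exc)

# The regexp from the thread: the second class is never closed.
compiled, problem = compile_or_report(r"[^-][]+")

# With the class closed explicitly, the second class holds "]" and "+".
closed, closed_problem = compile_or_report(r"[^-][]+]")
```

The first call returns no compiled pattern and a message describing the
unterminated class; the second compiles and matches, for example, "a+".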

2. How to exclude CR from ctext? I just don't see how I could specify
   this here.
Hey, I had never thought of that. How +would+ one do that? Philip?
CR = control-M is not special to procmail.  Just put one in your rcfile.

I meant line feed (ctrl-J) here of course. I guess probably that's
what Rejo meant, too.

Actually it would be very nice if there was a \xFF or \0377 type hex or
octal escape facility, too, but I guess it's one of those things that should
have been designed in from the beginning, or not at all. (Generalized
repeat counts is another but I got the feeling at one point it wasn't
completely out of the question that this would be implemented one day.)
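To make the wish concrete, this is roughly what such escapes and repeat
counts look like in an engine that has them (Python's re module; none
of this is procmail syntax). The names ctext_like and high_byte and the
sample strings are made up, and ctext_like is only an approximation
that excludes line feed, parentheses and backslash.

```python
import re

# Hex escape for line feed: a ctext-like run that excludes LF,
# parentheses and backslash.
ctext_like = re.compile(rb"[^\x0a()\\]+")

# The same byte can be written in hex or in octal.
high_byte = re.compile(rb"[\xff]")
high_byte_octal = re.compile(rb"[\377]")

# Generalized repeat count: two or three digits.
two_or_three_digits = re.compile(rb"[0-9]{2,3}")

first_line = ctext_like.match(b"ab\ncd").group()
```

With this pattern, first_line stops just before the line feed.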

/* era */

-- 
 Too much to say to fit into this .signature anyway: <http://www.iki.fi/era/>
  Fight spam in Europe: <http://www.euro.cauce.org/> * Sign the EU petition
