
Re: Who is the procmail maintainer? (revisited 2005)

2005-11-05 20:50:10
Bart Schaefer:
Ruud H.G. van Tol:

I don't follow this. How is using a :0p (to switch over to PCRE for
that recipe) reducing efficiency?

It is neither the switching nor the mechanism for switching that's the
issue; it's the implementation of the regex matcher itself.

I don't agree. PCRE can be dynamically loaded on first request, triggered
for example by a command-line parameter, or by having a ~/.procmailrc4
instead of a ~/.procmailrc, or by a special procmail_PCRE binary, etc.
The default procmail basically should not have to change.


My original point was that simply linking procmail against a
freely-available PCRE library is not an ideal solution, because it
is likely to result in a less efficient scanner.

But that was AFAIK not really the issue. Nobody expressed wanting to
lose anything.


The library regex
engine is almost certainly optimized for different purposes than the
one for which procmail recipes would most frequently use it.

With that I do agree, and we should not have to lose that at all.


BTW, the PCRE is suggested as an enhancement, not as a replacement.

But unless it either replaces or is in some way combined with the
existing procmail regex engine, you end up with a binary containing
two complete and distinct regex engines.  That's just silly; procmail
is intended to be a relatively small and lightweight program.

PCRE itself contains two distinct regex engines. :)
Procmail could be a daemon too.


Could you illustrate your point with some examples?

Not entirely, because to do so would mean comparing regex
implementations at the source code level.  However, your own example
is somewhat illustrative:

procmail: * ^Subject:(.*\<)?\/.*
PCRE:     * ^Subject:(?.*?\W)?(.*)

Just look how far out of your way you had to go to coerce PCRE into
emulating procmail's semantics.

That is what I use a lot with perl, so nothing "far out of my way" is
involved.
For '\<' one can use (the zero-width) '\W'; a non-greedy '*' is followed
by a '?'; a clustering non-capturing group starts with '(?:', a MATCH is
between (). Oops, I forgot to type the ':' of '(?:'.

Retry:

   procmail: * ^Subject:.*\/[^  ].*
   PCRE:     * ^Subject:.*?(\S.*)
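
To make the two Perl-style patterns above concrete, here is a minimal sketch
that uses Python's re module as a stand-in for PCRE (an assumption: for these
particular constructs the two behave alike). The helper name extract() and the
sample headers are hypothetical; the capture group plays the role that
procmail's \/ plays for MATCH.

```python
import re

# First attempt with the forgotten ':' restored: an optional non-capturing
# group that lazily skips up to a non-word character, then a capture.
FIRST = re.compile(r"^Subject:(?:.*?\W)?(.*)")

# The retry: lazily skip anything, then capture from the first
# non-whitespace character to the end of the line.
RETRY = re.compile(r"^Subject:.*?(\S.*)")

def extract(pattern, header):
    """Return the captured text (the MATCH equivalent), or None if no match."""
    m = pattern.search(header)
    return m.group(1) if m else None

print(extract(RETRY, "Subject: Re: hello world"))  # Re: hello world
print(extract(FIRST, "Subject:hello"))             # hello
print(extract(RETRY, "From: someone"))             # None
```

Whether procmail's own matcher yields exactly the same MATCH for every input
is not something this sketch tries to show; it only demonstrates what the
Perl-style patterns capture.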


Do you think most users would bother?
Do you think PCRE has been optimized for that usage?  Do you even
think that the majority of procmailrc conditions in the world require
the \/ operator?

That's also why I asked you for examples. My examples will of course be
biased towards 'more complex' usage. You were saying 'In most cases
procmail rules are merely testing for *any* match of the regular
expression, not for the "canonical" match.'.

Recipes like:

  :0HB
  * fiagra
  /dev/null

should just keep working as they do now.



Incidentally, the problem with a PCRE=on type switching mechanism
rather than :0p is illustrated by:

PCRE=on
INCLUDERC=some_old_procmail.rc

Yes, I certainly prefer :0p (and maybe :0m for added sub-MATCH-es).
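
A small sketch of why a global switch is risky for included old rc files,
again with Python's re module standing in for PCRE (an assumption). The
function name perl_style_match() is hypothetical. The point is that a
condition written for procmail's matcher changes meaning when read with Perl
semantics: \/ is no longer the MATCH marker but an escaped literal slash.

```python
import re

# A condition as it might appear in some_old_procmail.rc, written for
# procmail's own matcher, where \/ marks the start of the text kept in MATCH.
# The bracket expression excludes space and tab.
OLD_CONDITION = r"^Subject:.*\/[^ \t].*"

def perl_style_match(condition, header):
    """Interpret the condition with Perl-style semantics (via Python's re)."""
    return re.search(condition, header) is not None

# Under Perl-style semantics the pattern now demands a literal '/'.
print(perl_style_match(OLD_CONDITION, "Subject: hello"))  # False
print(perl_style_match(OLD_CONDITION, "Subject: a/b"))    # True
```

A per-recipe flag such as :0p avoids this, because recipes without the flag
would keep being read by the existing matcher.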

BTW, procmail is about 70K, libpcre.a is about 200K.

-- 
Grtz, Ruud


____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
