procmail

Re: non-INCLUDERC-ing string-splitter

2005-01-12 06:23:31
When we tickled Bart Schaefer, this came out:
Ruud H.G. van Tol:

Bart: please try it. (it = v1.01)

I had a look at split.102.inc ...

That is in development; better stick to v1.01 for now.


interesting approach, though once it
gets up to needing $reA or so I'd begin to worry about
PROCMAIL_OVERFLOW, especially with the newline-matching variant of
$re0.

That isn't a real problem, since one can treat a buffer line-by-line.

But still, with a LINEBUF of 21000 it can do a split (with newlines)
at 4000 (because of the 5-character-regex it needs a LINEBUF that is
at least 5 * the split-position, plus some), and that is already $reB.

So with a LINEBUF of 65K it can do a split (of a buffer with embedded
newlines) at position 13K. More than enough for parsing e-mail addresses.
:)

Also check whether your procmail supports "[.$NL]" (mine didn't).
With a 4-character regex, a split at 16K should be possible.
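
The sizing rule, worked out as an rc-file fragment. The figures are the
ones given in this message; the size of the "plus some" overhead is not
specified, so the 1000 used here is only an assumption:

  # 5-character regex: LINEBUF >= 5 * split position, plus some
  #    5 *  4000 = 20000  -> a LINEBUF of 21000 was enough
  #    5 * 13000 = 65000  -> needs a LINEBUF of about 65K
  # 4-character regex (only if "[.$NL]" works in your procmail):
  #    4 * 16000 = 64000  -> also fits in about 65K
  LINEBUF = 66000          # 65000 + 1000 overhead (the 1000 is a guess)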


I'd also say that, if there were any danger of someone beginning to
maintain procmail again, the use of a leading $ in a variable's value
to re-expand the rest of the value is exactly the sort of thing that
I'd expect to be treated as a bug and therefore to stop working at
some point.  Consider:

LOOP='$ $LOOP'
:0
* $ $LOOP
{ }

Is that 'bug' exploitable by cleverly constructing an e-mail message? No.
It can be used by a user, so let's tell bugtraq about it. ;)

A certain limit on reparsing, of let's say 256 levels, would be a good
thing. Maybe a new variable PARSEMAX (a bit like LINEBUF) that you can
set to any value between 0 and 1023, and that defaults to 256.

And if you check split.inc: I am not using anything like that.
I do use a variable that holds exactly the number of reparsings
that are needed, and no more.
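
To illustrate the difference, a minimal sketch (this is not the code
from split.inc; the names INNER, OUTER and FOUND are made up for this
example). Each leading $ should cost exactly one reparse, so a chain
without self-reference ends after a fixed number of steps:

  INNER = '^Subject:.*test'
  OUTER = '$ $INNER'       # one extra reparse, then a plain regex

  :0
  * $ $OUTER
  { FOUND = 'yes' }

Here "$ $OUTER" expands to "$ $INNER", which is reparsed and expands to
the regex itself; nothing refers back to its own name, so it stops.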


I can't believe that behavior is intentional, although your
exploitation of it is quite clever.


The way I use it is AFAIK intentional behavior. I have seen procmail
code from long ago that uses the same kind of reparsing (though less
extreme).

The variant with the LOOP = '$ $LOOP' can't be intentional behavior, but
that is not what I use.


A suggested enhancement:  split the input message if split_Str is not
set, perhaps with an option to split the header, body, or the whole
thing.

No, that is not modular.


It is easy enough to put the headers or body in split_Str yourself:

Alternative 1:

  :0
  * ^^\/.*($.*)*
  { H = "$MATCH" }

  split_Str = "$H"


Alternative 2 (with v1.02):

  :0
  * ^^\/.*($.*)*
  { H = "$MATCH" }

  split_Reparse = 'T'
  split_Str     = '$H'     # 'smart' pointer


Alternative 3 (with v1.02):

  split_Reparse = 'T'
  split_Str     = 'H'      # uses procmail's special H
                           # (which saves memory)


But don't start using v1.02 yet ...
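
For the body, Alternative 1 should carry over by adding the B flag to
the recipe. A sketch, assuming v1.01 and a LINEBUF large enough to hold
the match (BODYTEXT is just a name picked for this example):

  :0 B
  * ^^\/.*($.*)*
  { BODYTEXT = "$MATCH" }

  split_Str = "$BODYTEXT"

A body that does not fit in LINEBUF will not fit in $MATCH either, so
this only suits messages of modest size.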

-- 
Grtz, Ruud


____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
