
Re: Regex matching complete headers.

2001-01-03 08:41:06
2 points and some suggestions about the Sieve Home page


In the regex draft we have in the example:

            # or the subject is all uppercase (no lowercase)
            header :regex :comparator "i;octet" "subject"
              "^[^:lower:]*$" ) {

What if the Subject is multiline and one of the lines contains uppercase
letters, while the other contains only lowercase letters?  i.e.

Subject: this is the first line that contains only lowercase
  this is a continuation of the Subject header but it contains UPPERCASE

This is not an example of multiple subject lines, it is an example of a single
folded subject line. Folding is removed prior to comparison; it is arbitrary
and can change during message transit. Unfortunately, the sieve specification
doesn't make this clear, and it should. (The reason it doesn't is probably that
the handling of folding is considered to be such a fundamental characteristic
of email headers that nobody thought to point it out.)

The regular expression "^[^:lower:]*$" is going to match the first line, and
therefore give us a match, but this isn't what we intended.

No, it will apply to the entire unfolded line, so it won't match.
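
To illustrate the point about unfolding, here is a rough Python sketch (not
Sieve, and not taken from any implementation). The unfold() helper and the
variable names are made up for this example, and since Python's re module has
no POSIX bracket classes, [a-z] stands in for the lowercase class (ASCII only).

```python
import re


def unfold(raw_value: str) -> str:
    """Remove folding: drop every line break that is followed by whitespace."""
    return re.sub(r"\r?\n(?=[ \t])", "", raw_value)


# Stand-in for the draft's "no lowercase letters" pattern. Python's re has no
# POSIX classes, so [a-z] is an ASCII-only assumption made for this sketch.
NO_LOWERCASE = re.compile(r"^[^a-z]*$")

# The folded Subject value from the question, as it might appear on the wire.
folded_subject = (
    "this is the first line that contains only lowercase\r\n"
    "  this is a continuation of the Subject header but it contains UPPERCASE"
)

# The comparison sees one logical line, not two physical ones.
subject = unfold(folded_subject)
all_uppercase = NO_LOWERCASE.search(subject) is not None
print(all_uppercase)  # False: the unfolded value contains lowercase letters
```

Because the pattern is anchored at both ends of the unfolded value, a single
lowercase letter anywhere in either physical line prevents a match.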

I have a customer who quite sensibly would like to filter all messages that
have either no from header, or an empty from header.  Our exists test will
succeed if the header exists but is empty, so we need a regex test too that
checks whether the header is completely empty.  Initial thoughts would be
something along the lines of "^[[:space:]]*$" but then this will match the
following from


as the very first line contains nothing but spaces.

Again, this is incorrect. There is no empty From: field here.
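
The same reasoning can be sketched in Python. This is only an illustration
under stated assumptions: unfold() and is_missing_or_empty() are hypothetical
helpers, the address is a placeholder, and [ \t] stands in for the POSIX space
class because Python's re module has no bracket classes.

```python
import re
from typing import Optional


def unfold(raw_value: str) -> str:
    """Remove folding: drop every line break that is followed by whitespace."""
    return re.sub(r"\r?\n(?=[ \t])", "", raw_value)


def is_missing_or_empty(raw_value: Optional[str]) -> bool:
    """True if the field is absent (None) or holds only whitespace once unfolded."""
    if raw_value is None:
        return True
    return re.fullmatch(r"[ \t]*", unfold(raw_value)) is not None


# Hypothetical From value whose first physical line is blank and whose
# address sits on the continuation line.
folded_from = " \r\n  user@example.com"

print(is_missing_or_empty(folded_from))  # False: unfolded value has an address
print(is_missing_or_empty("   "))        # True: field present but blank
print(is_missing_or_empty(None))         # True: field absent
```

The emptiness check only ever sees the unfolded value, so a blank first
physical line does not make the field empty.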

Can we do this with Sieve at present?

Yes you can.

If not perhaps we need to be
able to use :regex with :is or :contains?  I see it says in the Sieve draft
that we can't give more than one match type, but then perhaps this is a
suitable occasion?


Could we also allow:
  a. \w in place of [:word:]
  b. \s in place of [:space:]
  c. \d in place of [:digit:]
  d. \l in place of [:lower:]
  e. \u in place of [:upper:]

I find them really quite useful.  Or are we trying to stick to POSIX rigidly?
(Still haven't got a copy of this.  Bloomin have to bloomin pay for it. Grrr)

Personally, I think there are real advantages to sticking to the POSIX
specification and not introducing shortcuts. For example, it enables use of an
OOTB POSIX regex facility.

Finally, whoever is responsible for could they
please update the broken link that points to
instead of

Additionally should there not be a link to the regex work at it
seems relevant in the Internet Drafts section

I suggest you send mail to the site maintainer and, if that's not obvious,
to the postmaster. This isn't appropriate for this mailing list.

