
Re: 'header' test and whitespace

2005-11-26 01:26:47

On Fri, 2005-11-25 at 15:53 -0800, Ned Freed wrote:
the exact method of RFC 2047 encoding used doesn't have any semantics,

In the abstract, maybe. But in practice it does. For example, it happens to
make perfect sense for me to filter out all messages that use the koi8-r 
charset. I receive maybe 300 such messages every day, and they are without
exception spam.
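
As an illustration of the kind of test being described, here is a minimal Python sketch that lists the charsets named by RFC 2047 encoded-words in a raw (undecoded) header value. The function names and the regular expression are assumptions made for this example; they are not taken from any Sieve implementation.

```python
import re

# RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
# An optional "*language" suffix after the charset (RFC 2231) is skipped.
ENCODED_WORD = re.compile(r"=\?([^?*\s]+)(?:\*[^?\s]*)?\?[bBqQ]\?[^?\s]*\?=")


def charsets_in_header(raw_value):
    """Return the lower-cased charsets named by encoded-words in raw_value."""
    return {m.group(1).lower() for m in ENCODED_WORD.finditer(raw_value)}


def uses_charset(raw_value, charset):
    """True if any encoded-word in raw_value names the given charset."""
    return charset.lower() in charsets_in_header(raw_value)
```

A check such as uses_charset(subject, "koi8-r") only works on the raw header. Once the encoded-words have been decoded, the charset label is gone.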

that will work until the spammers are savvy enough to switch to UTF-8 to
avoid such filtering.  the correct test would check for code points, not

There are also cases where language can be inferred with reasonable 
from charset, and this can be very useful in some applications. (There often
isn't enough text in a header to analyze to determine the language being 

yes, although I have received messages in Norwegian using GB2312
encoding and messages in English using KOI8-R encoding, it's a safe bet
that the senders of these messages understand Chinese and Russian
respectively since they've set up their e-mail program to use these

a discussion thread might have a Subject which switches back
and forth between Q and B encoding, or in the length of encoded-words
and hence the number of lines.  this is very apparent for us
non-USans :-)

The encoding probably doesn't matter, especially since it sometimes changes in
transit. Charset changes are much rarer.

not in this neck of the woods.  different users will prefer UTF-8, ISO
8859-1 and ISO 8859-15, and the Subject will switch back and forth.

so I don't think it's appropriate to make :raw be :halfraw.  the CR LF
is actually a part of the raw header.

I disagree. Folding points change all the time, they are specifically defined
not to have semantics, and I don't believe I've ever seen a case where I 
a test that is sensitive to CRLFs. The same cannot be said for encoded words
and trailing spaces.
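
To make the point about folding concrete, here is a minimal Python sketch of RFC 2822 unfolding: each CRLF that is immediately followed by a space or tab is removed, and everything else, including trailing spaces and encoded-words, is left untouched. The helper names are hypothetical.

```python
import re


def unfold(raw_value):
    """Drop every CRLF that is immediately followed by a space or tab."""
    return re.sub(r"\r\n(?=[ \t])", "", raw_value)


def same_ignoring_folds(a, b):
    """Compare two raw header values while ignoring where they were folded."""
    return unfold(a) == unfold(b)
```

Two copies of the same Subject folded at different points compare equal under same_ignoring_folds, which is the behaviour a fold-insensitive :raw would give.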

okay, fine.  I was thinking about cases where you want to repeat it back
at the sender, e.g.

  vacation :subject text:
Auto: I'm away from office
 (was: ${subject} )

where the fact that the CRLFs are included in ${subject} makes it
unnecessary to worry about folding of overlong subjects.  the vacation
draft should perhaps be explicit in that Subject must be folded

while we're at it, what do we do about headers which can have multiple
values, e.g. "Cc"?  (multiple headers are deprecated in 2822, but must be
supported.)  I don't have a good suggestion.  the naive approach is to
concatenate them as if they were simply separated by CR LF, but for "Cc"
you would really want to include a comma in the delimiter as well.  the
other option is to say that the first matching header is used.  this
sits well with short-circuiting logic, but means it's impossible to
capture the complete value of the header.

Simple: You test all the values as separate fields. Sieve tests already allow 
a list of fields but there's an implicit list even when there's only a single
field name specified.
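
A minimal Python sketch of that "implicit list" behaviour, assuming the standard library email parser and a case-insensitive substring match: every occurrence of every named field is tested separately, and the test is true as soon as one value matches. The function name and sample message are made up for illustration.

```python
import email


def header_contains(raw_message, field_names, keys):
    """True if any occurrence of any named field contains any of the keys."""
    msg = email.message_from_string(raw_message)
    for name in field_names:
        # get_all returns every occurrence of the field, not just the first
        for value in msg.get_all(name, []):
            if any(key.lower() in value.lower() for key in keys):
                return True
    return False


SAMPLE = (
    "To: alice@example.org\r\n"
    "Cc: first@example.net\r\n"
    "Cc: me@example.com\r\n"
    "Subject: test\r\n"
    "\r\n"
    "body\r\n"
)
```

With SAMPLE, a test on ["To", "Cc"] for "me@example.com" succeeds because of the second Cc field, even though the first Cc field does not match.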

yes, "but [this] means it's impossible to capture the complete value of
the header": "*" will only fetch the contents of the first one.  perhaps
a loop construct for the address test would be a more appropriate
solution to that problem?

  for.every.address ["To", "Cc"] { block }

this can of course wait until we see the need.  for the base
specification, I would like a tiny change to make this a little more

   For instance, the test `header :contains ["To", "Cc"]
-  ["me@example.com", "me00@landru.example.edu"]' is true if either the
+  ["me@example.com", "me00@landru.example.edu"]' is true if either a
   To header or Cc header of the input message contains either of the
   email addresses "me@example.com" or 

Yes, I'm afraid so. Perhaps something like rawheader? (I believe the 
test is the only one for which :raw makes sense.)

I'd like it for address, too.  this is the difference between
"" and "sø".  okay, so this isn't
in the spec today, but we might as well be ready for it.

what happens if you add a :raw argument and upload to today's
implementations?  will they reject during upload?  will they ignore it
during runtime?  will they bomb during runtime?

Our implementation will return a runtime error. I believe this is the correct
behavior. We don't check during upload - our sieves are typically provisioned
via LDAP and we have no control over what tools are used to insert them into
the directory.

And yes, this makes error reporting a real challenge.

IMHO it's okay as long as it doesn't cause a runtime error.  (Cyrus 2.2
will reject upload, I haven't checked others.)

This is just another extension, so I don't see why causing a runtime
error is a problem.

right, that was my point.  if you didn't raise a runtime error, we might
get away with not declaring it an extension.  but you do, so we don't.

Kjetil T.
