ietf-mta-filters

Re: Filtering on raw (non-decoded) headers?

2008-02-14 21:39:58

On Thu, 14 Feb 2008, Ned Freed wrote:
        If somebody does not want to accept headers with Cyrillic characters,
s/he has to list all character sets allowing Cyrillic characters,
including UTF-8.

OK, I seem to be missing something basic here, but given that a properly
functioning Sieve implementation will decode all encoded words using
well-known charsets to UTF-8, why would you need to list anything other than
the Cyrillic characters as they appear in UTF-8?

Using :regex with a bracket expression covering the entire Cyrillic block in
Unicode (U+0400-U+04FF) strikes me as the obvious way to do this.

It would arguably be better to use a regexp of "\p{Cyrillic}", as there are apparently a couple of Cyrillic characters outside the Cyrillic block now. To quote http://www.unicode.org/Public/UNIDATA/Scripts.txt:
1D2B          ; Cyrillic # L&       CYRILLIC LETTER SMALL CAPITAL EL
1D78          ; Cyrillic # Lm       MODIFIER LETTER CYRILLIC EN

...but those aren't in the koi8-r tables I've seen, so they may be irrelevant to the goal. As always, working out what the actual goal is is the key to finding the optimal implementation.
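
To make the decode-then-match idea concrete, here is a rough sketch in Python rather than Sieve (an assumption made purely so the example can be run; it is not how any particular Sieve implementation does it). Python's standard re module has no \p{Cyrillic}, so the sketch falls back to a bracket expression covering U+0400-U+04FF plus the two out-of-block characters quoted above. The names CYRILLIC and header_has_cyrillic are made up for illustration.

```python
import re
from email.header import decode_header, make_header

# Approximation of \p{Cyrillic}: the Cyrillic block plus the two
# out-of-block characters quoted from Scripts.txt (U+1D2B, U+1D78).
# This is NOT the full Unicode script property, only what this thread names.
CYRILLIC = re.compile("[\u0400-\u04FF\u1D2B\u1D78]")


def header_has_cyrillic(raw_value: str) -> bool:
    """Decode RFC 2047 encoded-words to Unicode, then test the decoded text.

    Because matching happens after decoding, the charset the sender used
    (koi8-r, utf-8, ...) does not need to be listed anywhere.
    """
    decoded = str(make_header(decode_header(raw_value)))
    return CYRILLIC.search(decoded) is not None
```

With this, a subject such as "=?koi8-r?b?8NLJ18XU?=" and the same word sent as "=?utf-8?b?0J/RgNC40LLQtdGC?=" go through the same single test, which is the point being made above: once the encoded words are decoded, only the characters matter, not the charset they arrived in.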


Philip Guenther