Re: Filtering on raw (non-decoded) headers?


On Thu, 14 Feb 2008, Ned Freed wrote:

>> If somebody does not want to accept headers with Cyrillic characters,
>> s/he has to list all character sets allowing Cyrillic characters,
>> including UTF-8.
>
> OK, I seem to be missing something basic here, but given that a properly
> functioning Sieve implementation will decode all encoded words using
> well-known charsets to UTF-8, why would you need to list anything other than
> the Cyrillic characters as they appear in UTF-8?
>
> Using :regex with a bracket expression covering the entire Cyrillic range in
> Unicode (0400-04FF) strikes me as the obvious way to do this.
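The decode-then-match idea above can be sketched in Python. This is only an illustration of the principle, not Sieve, and the helper names `encoded_word`, `decoded_header` and `has_cyrillic` are made up for the example. Once RFC 2047 encoded-words are decoded to Unicode, one character range matches the same word whether it arrived as KOI8-R, windows-1251 or UTF-8, so no per-charset list is needed.

```python
import base64
import re
from email.header import decode_header, make_header

# Bracket expression covering the Unicode Cyrillic block, U+0400..U+04FF.
CYRILLIC_BLOCK = re.compile("[\u0400-\u04FF]")


def encoded_word(text: str, charset: str) -> str:
    """Build a B-encoded RFC 2047 encoded-word (hypothetical test helper)."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{payload}?="


def decoded_header(raw: str) -> str:
    """Decode any encoded-words in a raw header value to a Unicode string."""
    return str(make_header(decode_header(raw)))


def has_cyrillic(raw: str) -> bool:
    """True if the decoded header contains a character from U+0400..U+04FF."""
    return CYRILLIC_BLOCK.search(decoded_header(raw)) is not None


# The same word in three charsets: the raw headers differ, the decoded text does not.
for cs in ("koi8-r", "windows-1251", "utf-8"):
    raw = encoded_word("Привет", cs)
    print(raw, has_cyrillic(raw))

print(has_cyrillic("Hello, world"))
```

In Sieve itself the decoding step is the implementation's job, as noted above, and the exact regex syntax depends on the :regex support in use.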

It would arguably be better to use a regexp of "\p{Cyrillic}", as there are apparently a couple of Cyrillic characters outside the Cyrillic block now. To quote http://www.unicode.org/Public/UNIDATA/Scripts.txt:

1D2B          ; Cyrillic # L&       CYRILLIC LETTER SMALL CAPITAL EL
1D78          ; Cyrillic # Lm       MODIFIER LETTER CYRILLIC EN

...but those aren't in the koi8-r tables I've seen, so they may be irrelevant to the goal. As always, figuring out what the actual goal is is key to finding the optimal implementation.
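A small Python sketch can make the block-versus-script distinction concrete. Python's built-in `re` has no `\p{...}` classes (the third-party `regex` package does), so `named_cyrillic` below is a hypothetical stand-in that only checks whether the Unicode character name contains "CYRILLIC"; that approximates, but is not the same as, the Script=Cyrillic property in Scripts.txt. The sketch checks that U+1D2B and U+1D78 fall outside 0400-04FF, and that every Cyrillic character KOI8-R can represent lies inside that block.

```python
import unicodedata


def in_cyrillic_block(ch: str) -> bool:
    """True if ch lies in the Unicode Cyrillic block, U+0400..U+04FF."""
    return 0x0400 <= ord(ch) <= 0x04FF


def named_cyrillic(ch: str) -> bool:
    """Hypothetical approximation of a Cyrillic-script test via the
    character name; NOT the same as the Unicode Script property."""
    return "CYRILLIC" in unicodedata.name(ch, "")


def koi8r_can_encode(ch: str) -> bool:
    """True if Python's koi8-r codec can represent ch."""
    try:
        ch.encode("koi8-r")
    except UnicodeEncodeError:
        return False
    return True


# The two Scripts.txt entries quoted above.
for ch in ("\u1d2b", "\u1d78"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"in block={in_cyrillic_block(ch)}, in KOI8-R={koi8r_can_encode(ch)}")

# KOI8-R assigns a character to all 256 byte values; collect the Cyrillic ones.
koi8r_cyrillic = [c for c in bytes(range(256)).decode("koi8-r") if named_cyrillic(c)]
print(len(koi8r_cyrillic), all(in_cyrillic_block(c) for c in koi8r_cyrillic))
```

For headers that were originally KOI8-R, the plain 0400-04FF range and a script-based test should therefore behave the same; they can differ only for text in charsets such as UTF-8 that can carry the extra code points.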



Philip Guenther
