On Thu, 14 Feb 2008, Ned Freed wrote:
>> If somebody does not want to accept headers with Cyrillic characters,
>> s/he has to list all character sets allowing Cyrillic characters,
>> including UTF-8.
>
> OK, I seem to be missing something basic here, but given that a properly
> functioning Sieve implementation will decode all encoded words using
> well-known charsets to UTF-8, why would you need to list anything other than
> the Cyrillic characters as they appear in UTF-8?
>
> Using :regex with a bracket expression covering the entire Cyrillic range in
> Unicode (0400-04FF) strikes me as the obvious way to do this.

It would arguably be better to use a regexp of "\p{Cyrillic}", as there
are apparently a couple of Cyrillic characters outside the Cyrillic block now. To
quote http://www.unicode.org/Public/UNIDATA/Scripts.txt:
  1D2B ; Cyrillic # L& CYRILLIC LETTER SMALL CAPITAL EL
  1D78 ; Cyrillic # Lm MODIFIER LETTER CYRILLIC EN
...but those aren't in the koi8-r tables I've seen, so they may be
irrelevant to the goal. As always, figuring out what the actual goal is
is key to finding the optimal implementation.
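
To make the decode-then-match idea concrete, here is a rough sketch in
Python rather than Sieve, purely as an illustration and not as anything a
Sieve implementation is required to do. Python's standard re module has no
\p{...} script classes, so the sketch falls back to a bracket expression
covering 0400-04FF plus the two code points quoted above; it does not claim
to cover every character that Scripts.txt assigns to Cyrillic. The helper
names (decoded, has_cyrillic, encoded_word) are made up for this example.

```python
import base64
import re
from email.header import decode_header, make_header

# Cyrillic block (U+0400-U+04FF) plus the two out-of-block code points
# quoted from Scripts.txt. An approximation of \p{Cyrillic}, not a full one.
CYRILLIC = re.compile("[\u0400-\u04FF\u1D2B\u1D78]")


def decoded(header_value):
    """Decode any RFC 2047 encoded words in a header value to a Unicode string."""
    return str(make_header(decode_header(header_value)))


def has_cyrillic(header_value):
    """True if the decoded header contains a character matched by CYRILLIC."""
    return CYRILLIC.search(decoded(header_value)) is not None


def encoded_word(text, charset):
    """Build a base64 encoded word for demonstration (hypothetical helper)."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{payload}?="


if __name__ == "__main__":
    # The same pattern matches whichever charset the sender used.
    for charset in ("koi8-r", "windows-1251", "utf-8"):
        print(charset, has_cyrillic(encoded_word("Привет", charset)))
    print("ascii", has_cyrillic("Hello"))
```

The point of the sketch is only that the charset list disappears once
decoding happens first: a single pattern is applied to the decoded text, no
matter how the header was encoded on the wire.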
Philip Guenther