How about the following for the second paragraph of 2.7.2:
Comparisons are performed in Unicode. Implementations convert
text from header fields in all charsets [HEADER-CHARSET] to
Unicode as input to the comparator (see 2.7.3). Implementations
MUST be capable of converting US-ASCII, ISO-8859-1, the US-ASCII
subset of ISO-8859-* character sets, and UTF-8. Text that the
implementation cannot convert to Unicode for any reason MAY be
omitted, treated as plain US-ASCII (including any [HEADER-CHARSET]
syntax), or processed according to local conventions.
Thought I sent a note about this, but cannot find it... Anyway, I think this
is fine, although I'd be tempted to put "treated as plain US-ASCII" first on the
list.
Definitely first on the list. I actually think this *is* best practice,
despite Ned's having said he doesn't think there is a best practice on
this, and, in particular, I'd like to eliminate "omitted". Consider:
Subject: =?bogus-charset?Q?Buy Viagra now!?=
Do you really want my rule that says
    if (header :contains ["subject"] ["viagra"]) {
        discard;
        stop;
    }
to be ignored because the spammer put in a bogus character set name
(perhaps purposefully, to screw up Sieve scripts)?
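To make the difference concrete, here is a rough Python sketch of the "treat as plain US-ASCII" fallback. It is only an illustration, not how any particular Sieve implementation works: the names comparator_input and contains are made up for this example, the regular expression is a simplified stand-in for the [HEADER-CHARSET] syntax, and a case-insensitive substring test stands in for :contains.

```python
import base64
import codecs
import quopri
import re

# Simplified pattern for an encoded-word: =?charset?encoding?text?=
# (an assumption for illustration; real parsers are stricter)
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([QqBb])\?([^?]*)\?=")


def comparator_input(header_value: str) -> str:
    """Return the text handed to the comparator (hypothetical helper).

    Encoded-words in a charset we can convert are decoded to Unicode.
    Anything we cannot convert is left untouched, i.e. treated as plain
    US-ASCII, encoded-word syntax included.
    """
    def decode(match: re.Match) -> str:
        charset, encoding, payload = match.groups()
        try:
            codecs.lookup(charset)  # raises LookupError for unknown charsets
            if encoding.upper() == "B":
                raw = base64.b64decode(payload)
            else:
                # header=True makes "_" decode to a space, as in "Q" encoding
                raw = quopri.decodestring(payload.encode("ascii"), header=True)
            return raw.decode(charset)
        except (LookupError, ValueError, UnicodeError):
            return match.group(0)  # fallback: keep the raw text

    return ENCODED_WORD.sub(decode, header_value)


def contains(header_value: str, key: str) -> bool:
    """Case-insensitive substring test, standing in for :contains."""
    return key.lower() in comparator_input(header_value).lower()


subject = "=?bogus-charset?Q?Buy Viagra now!?="
print(contains(subject, "viagra"))
```

With this fallback the rule above still fires on the bogus-charset subject. If the unconvertible text were omitted instead, the comparator would see an empty string and the test would fail.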
Barry
--
Barry Leiba, Pervasive Computing Technology
(leiba(_at_)watson(_dot_)ibm(_dot_)com)
http://www.research.ibm.com/people/l/leiba
http://www.research.ibm.com/spam