On Thu, Oct 20, 2005 at 07:44:05PM +0100, Dave Cridland wrote:
> We need to restrict this discussion to just the one mailing list,
> really, but I've posted a message saying that actually, the reverse
> is true - comparators match on octet strings, and happen to have a
> decode built in - hence i;octet doesn't decode, and i;ascii-* both
> decode using ASCII.
What exactly do you mean by "decode"? Removing the MIME encoding or
converting the character set?
> The notion that comparators work on character strings is a notion
> that comes pre-flawed - ACAP does not operate on character strings,
> but octet strings, which might on a good day happen to be UTF-8
> encoded text, but might be anything.
That explains why we have this mess. Over here, users certainly expect
"en;ascii-casemap" to match characters, and will be confused if the first
test is true and the second is not, and even more so if both are false:
Subject: =?utf8?q?A=c3=a4?=
:comparator "en;ascii-casemap" :matches "a?"
:comparator "i;octet" :matches "A?"
If "i;octet" operates on octets, we can't talk about Unicode at all, but
need to talk about UTF-8 for comparisons, and users will instantly ask:
How can I match characters case-sensitively? The base spec makes me think
"i;octet" does just that, operating on characters despite its name.
Section 2.7.1, Match Type, does not mention octets anywhere.
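To make the two readings of the example concrete, here is a minimal
Python sketch. It is not taken from any Sieve implementation: the names
glob_match and ascii_upper are hypothetical, the matcher only knows "*"
and "?" (no backslash escaping), and the subject is assumed to have
already been MIME-decoded to the octets 41 c3 a4. Passing str makes "?"
match one character; passing bytes makes it match one octet.

```python
import re
import string

# Hypothetical helper: fold only US-ASCII letters, leave everything else alone.
_ASCII_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def ascii_upper(s):
    if isinstance(s, bytes):
        return s.upper()  # bytes.upper() only maps ASCII a-z
    return s.translate(_ASCII_UPPER)


def glob_match(pattern, value, casemap=False):
    """Match '*' and '?' wildcards over characters (str) or octets (bytes).

    pattern and value must have the same type.
    """
    if casemap:
        pattern, value = ascii_upper(pattern), ascii_upper(value)
    if isinstance(value, bytes):
        units = [pattern[i:i + 1] for i in range(len(pattern))]
        one, run, empty, q, star = b".", b".*", b"", b"?", b"*"
    else:
        units = list(pattern)
        one, run, empty, q, star = ".", ".*", "", "?", "*"
    regex = empty.join(
        one if u == q else run if u == star else re.escape(u) for u in units
    )
    return re.fullmatch(regex, value, re.DOTALL) is not None


subject_octets = b"A\xc3\xa4"                   # decoded payload of the Subject
subject_chars = subject_octets.decode("utf-8")  # two characters: "A", "ä"

# Character reading: "?" consumes the single character "ä".
print(glob_match("a?", subject_chars, casemap=True))    # True
print(glob_match("A?", subject_chars))                  # True

# Octet reading: "?" consumes one octet, but "ä" is two octets in UTF-8.
print(glob_match(b"a?", subject_octets, casemap=True))  # False
print(glob_match(b"A?", subject_octets))                # False
```

Under the octet reading both tests fail, and a user would have to write
"A??" to match, which only works if they know the encoded length of the
character.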
> I've also suggested that where all the protocol has is a character
> string, then the semantics of a comparator must behave as though the
> string were encoded using UTF-8 (possibly by actually doing so).
Are you saying that even using "en;ascii-casemap", the wildcard "?"
does not match a single character outside US-ASCII?
Alexey: No matter how this turns out, could we add the above example
including the result to the base spec?
Michael