Re: status of 3028bis

2005-10-20 15:25:22

On Thu, Oct 20, 2005 at 07:44:05PM +0100, Dave Cridland wrote:
> We need to restrict this discussion to just the one mailing list,
> really, but I've posted a message saying that actually, the reverse
> is true - comparators match on octet strings, and happen to have a
> decode built in - hence i;octet doesn't decode, and i;ascii-* both
> decode using ASCII.

What exactly do you mean by "decode"? Removing the MIME encoding or
converting the character set?

> The notion that comparators work on character strings is a notion
> that comes pre-flawed - ACAP does not operate on character strings,
> but octet strings, which might on a good day happen to be UTF-8
> encoded text, but might be anything.

That explains why we have that mess. Over here, users certainly expect
"en;ascii-casemap" to match characters, and will be confused if the first
test below is true and the second is not, and even more so if both are false:

Subject: =?utf8?q?A=c3=a4?=

:comparator "en;ascii-casemap" :matches "a?"
:comparator "i;octet" :matches "A?"

If "i;octet" operates on octets, we can't talk of Unicode, but need
to talk about UTF-8 for comparisons, and users will instantly ask:
How can I match characters case-sensitively? The base spec makes me think
"i;octet" is just that, and operates on characters, despite the name.
Section 2.7.1, Match Type, does not mention octets anywhere.
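
To make the two readings concrete, here is a minimal Python sketch of the
example above. It is not taken from the Sieve or ACAP specifications; the
helper names (glob_to_regex, match_chars, match_octets) are made up for
illustration, and ASCII-only case folding is assumed for the casemap case.
It decodes the Subject header, then evaluates "?" once as "any single
character" and once as "any single octet" of the UTF-8 encoding.

```python
import re
from email.header import decode_header


def glob_to_regex(pattern):
    """Translate a :matches-style glob ("?" and "*") into regex source."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def match_chars(pattern, text, casemap=False):
    """Reading 1: "?" matches one character of the decoded string."""
    flags = re.DOTALL
    if casemap:
        # re.ASCII restricts case folding to US-ASCII letters (assumption:
        # this is what an ascii-casemap comparator is meant to do).
        flags |= re.IGNORECASE | re.ASCII
    return re.fullmatch(glob_to_regex(pattern), text, flags) is not None


def match_octets(pattern, text, casemap=False):
    """Reading 2: "?" matches one octet of the UTF-8 encoded string."""
    flags = re.DOTALL
    if casemap:
        flags |= re.IGNORECASE  # already ASCII-only for bytes patterns
    regex = glob_to_regex(pattern).encode("utf-8")
    return re.fullmatch(regex, text.encode("utf-8"), flags) is not None


# Remove the MIME encoding and convert to a character string.
raw, charset = decode_header("=?utf8?q?A=c3=a4?=")[0]
subject = raw.decode(charset)  # two characters, three UTF-8 octets

print(match_chars("a?", subject, casemap=True))   # True
print(match_chars("A?", subject))                 # True
print(match_octets("a?", subject, casemap=True))  # False
print(match_octets("A?", subject))                # False
```

Under the character reading both tests are true; under the octet reading
both are false, because the one non-ASCII character occupies two octets
and a single "?" cannot cover it.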

> I've also suggested that where all the protocol has is a character
> string, then the semantics of a comparator must behave as though the
> string were encoded using UTF-8 (possibly by actually doing so).

Are you saying that even using "en;ascii-casemap", the wildcard "?"
does not match a single character outside US-ASCII?

Alexey: No matter how this turns out, could we add the above example
including the result to the base spec?

