ietf-mta-filters

Re: Matching NUL characters

2003-04-04 03:42:39

    Subject: =?iso-8859-1?q?abc=00def?=
  
  and the tests:
  
    header :contains ["Subject"] ["abc"]
    header :contains ["Subject"] ["def"]
    header :matches ["Subject"] ["abc?def"]
  
  An implementation that evaluates the second or third test as false
  is broken, isn't it?

Granted.  Good example.

Well, that means the implementation must already be able to deal with
strings containing NUL characters somewhere.
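
As a rough illustration of the example above, here is a minimal Python
sketch.  It assumes the header is decoded with the standard library's
email.header.decode_header; sieve_matches is a hypothetical helper that
handles only the "?" and "*" wildcards (no backslash escapes, no
comparator case folding), and c_string_view imitates a hypothetical
implementation that stores the value as a NUL-terminated C string.

```python
import re
from email.header import decode_header


def decode_subject(raw: str) -> str:
    """Decode RFC 2047 encoded words in a header value to a Python str."""
    parts = []
    for word, charset in decode_header(raw):
        if isinstance(word, bytes):
            # Unencoded runs come back with charset None; treat them as ASCII.
            word = word.decode(charset or "ascii")
        parts.append(word)
    return "".join(parts)


def sieve_matches(pattern: str, value: str) -> bool:
    """Hypothetical helper: '?' matches one character, '*' any run."""
    regex = "".join(
        "." if ch == "?" else ".*" if ch == "*" else re.escape(ch)
        for ch in pattern
    )
    # DOTALL so that '.' matches every character, including newlines.
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


subject = decode_subject("=?iso-8859-1?q?abc=00def?=")

# An implementation that keeps the length alongside the data sees 7 characters.
results = [
    "abc" in subject,                   # header :contains ["Subject"] ["abc"]
    "def" in subject,                   # header :contains ["Subject"] ["def"]
    sieve_matches("abc?def", subject),  # header :matches ["Subject"] ["abc?def"]
]

# Assumption: a NUL-terminated view of the same value stops at the NUL.
c_string_view = subject.split("\x00", 1)[0]
broken_results = [
    "abc" in c_string_view,
    "def" in c_string_view,
    sieve_matches("abc?def", c_string_view),
]
```

With a length-aware string all three tests succeed; with the truncated
view only the first one does, which is the broken behaviour described
above.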

I don't believe this is true.  Every implementation must understand
that the sequence of N octets making up one UTF-8 character is _one_
character, not N.

Where would it make a difference, if the implementation could not decode
headers to UTF-8, which is allowed? Quoting section 2.7.2:

----------
   Implementations decode header charsets to UTF-8.  Two strings are
   considered equal if their UTF-8 representations are identical.
   Implementations should decode charsets represented in the forms
   specified by [MIME] for both message headers and bodies.
   Implementations must be capable of decoding US-ASCII, ISO-8859-1,
   the ASCII subset of ISO-8859-* character sets, and UTF-8.

   If implementations fail to support the above behavior, they MUST
   conform to the following:

   No two strings can be considered equal if one contains octets
   greater than 127.
----------

To me, that means an implementation could forget about UTF-8 entirely.
If someone used a script that contains UTF-8 characters, it would make
no difference except for comparisons, and those are always false if
the string contains the UTF-8 encoding of Unicode characters above 127.
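
A minimal sketch of that fallback rule as I read it, in Python.  The
function name fallback_is is made up for illustration, and the sketch
ignores comparator case folding; it only shows that any octet above 127
on either side forces the comparison to be false.

```python
def fallback_is(a: bytes, b: bytes) -> bool:
    """Equality for an implementation that does not decode to UTF-8.

    Assumption: strings are handled as raw octets, and the rule
    "no two strings can be considered equal if one contains octets
    greater than 127" is applied before the octets are compared.
    """
    if any(octet > 127 for octet in a + b):
        return False
    return a == b


plain = fallback_is(b"abc", b"abc")
# Byte-for-byte identical, but both contain octets above 127.
accented = fallback_is("caf\u00e9".encode("utf-8"), "caf\u00e9".encode("utf-8"))
```

Under this reading even two byte-for-byte identical strings compare
unequal once they leave the ASCII range, so such an implementation never
has to know where a UTF-8 character starts or ends.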

Personally, I am surprised that UTF-8-aware string comparisons do
not require an extension, since RFC-conforming Sieve implementations
are not strictly required to support them.  Depending on the (conforming)
implementation, the following test may be true or false:

   Subject: =?iso-8859-1?q?abc=80def?=

   header :contains ["Subject"] ["abc"]

"Fail to support the above behaviour" means "fail to decode MIME words"
(no MIME support) or "fail to decode MIME words to UTF-8" (MIME support,
but no character set translation).  If the intention was different,
the specification should be, too.
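
To make the divergence concrete, here is a Python sketch of three
hypothetical implementations evaluating the =80 test above.  The function
names are invented, and applying the "octets greater than 127" rule to a
substring test is my interpretation of the quoted text, not something it
spells out.

```python
from email.header import decode_header

RAW = "=?iso-8859-1?q?abc=80def?="
KEY = b"abc"


def has_high_octet(data: bytes) -> bool:
    return any(octet > 127 for octet in data)


def contains_utf8(raw: str) -> bool:
    """Full support: decode the MIME word and translate it to UTF-8."""
    word, charset = decode_header(raw)[0]
    value = word.decode(charset).encode("utf-8")
    return KEY in value


def contains_no_mime(raw: str) -> bool:
    """No MIME support: the test sees the undecoded header text."""
    value = raw.encode("ascii")
    if has_high_octet(value) or has_high_octet(KEY):
        return False
    return KEY in value


def contains_no_charset(raw: str) -> bool:
    """MIME words decoded to octets, but no character set translation."""
    word, _charset = decode_header(raw)[0]
    if has_high_octet(word) or has_high_octet(KEY):
        return False  # fallback rule: octets above 127 never match
    return KEY in word


outcomes = (
    contains_utf8(RAW),
    contains_no_mime(RAW),
    contains_no_charset(RAW),
)
```

The first two variants find "abc"; the third, having decoded the word to
an octet string that contains 0x80, has to answer false.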

Michael
