3028bis open issue #3: require 2047 decoding?



Currently, 3028bis, section 2.7.2 is unchanged from RFC 3028:

----
2.7.2.   Comparisons Across Character Sets

   All Sieve scripts are represented in UTF-8, but messages may involve
   a number of character sets.  In order for comparisons to work across
   character sets, implementations SHOULD implement the following
   behavior:

      Implementations decode header charsets to UTF-8.  Two strings are
      considered equal if their UTF-8 representations are identical.
      Implementations should decode charsets represented in the forms
      specified by [MIME] for both message headers and bodies.
      Implementations must be capable of decoding US-ASCII, ISO-8859-1,
      the ASCII subset of ISO-8859-* character sets, and UTF-8.

   If implementations fail to support the above behavior, they MUST
   conform to the following:

      No two strings can be considered equal if one contains octets
      greater than 127.
----

That is, support for RFC 2047 is only a SHOULD, not a MUST.  Do
we want to leave that as is, or should it be made stricter: MUST
support RFC 2047, MUST support conversion of US-ASCII and UTF-8,
and SHOULD support conversion of ISO-8859-1 and the US-ASCII subset
of ISO-8859-*?
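
To make the two behaviors in section 2.7.2 concrete, here is a rough
Python sketch.  It is illustrative only and not taken from any
implementation: the function names are hypothetical, the comparison is
a plain octet comparison (no i;ascii-casemap), and I am assuming that
a header whose charset cannot be decoded falls back to the "octets
greater than 127" rule.

```python
from email.header import decode_header, make_header


def decode_2047(raw):
    """Decode RFC 2047 encoded-words in a header value.

    Returns None if a charset is unknown or the octets are not valid
    in the labelled charset.
    """
    try:
        return str(make_header(decode_header(raw)))
    except (LookupError, UnicodeError):
        return None


def equal_without_decoding(a, b):
    """Fallback rule: no two strings are equal if either one contains
    an octet greater than 127."""
    if any(octet > 127 for octet in a + b):
        return False
    return a == b


def equal_with_decoding(header_value, key):
    """SHOULD behavior: decode to UTF-8, then compare the UTF-8 forms."""
    decoded = decode_2047(header_value)
    if decoded is None:
        # Assumption: an undecodable header is compared undecoded,
        # under the fallback rule.
        return equal_without_decoding(header_value.encode("utf-8"),
                                      key.encode("utf-8"))
    return decoded.encode("utf-8") == key.encode("utf-8")
```

With this sketch, equal_with_decoding("=?ISO-8859-1?Q?caf=E9?=", "café")
is true, while equal_without_decoding() on two identical UTF-8 byte
strings containing "é" is false.  That gap is what the SHOULD
currently permits.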



This question arose from a query from Bob Johannessen during the
WGLC of the 'body' extension on that extension's requirements for
charset conversion.  Those requirements currently read:

   MIME parts identified as using charsets other than UTF-8 as
   defined in [UTF-8] SHOULD be converted to UTF-8 prior to the match.
   A conversion from US-ASCII to UTF-8 MUST be supported.
   If an implementation does not support conversion of a given
   charset to UTF-8, it MAY compare against the US-ASCII subset
   of the transfer-decoded character data instead.  Characters from
   documents tagged with charsets that the local implementation
   cannot convert to UTF-8 and text from mistagged documents MAY
   be omitted or processed according to local conventions.

Now, I had always read that paragraph as requiring support for
matching UTF-8 against text parts labelled as being in UTF-8 or
US-ASCII; treating a UTF-8 part as if it were US-ASCII with some
unmatching octets was banned.  On reflection, I now think that
paragraph should be read as giving practically the same choice
as section 2.7.2 in the base-spec and that it should therefore
simply be replaced with a reference to that section in the base-spec,
along the lines of:

        Implementations MUST use the same rules for comparisons
        against body parts in charsets other than UTF-8 as they use
        for comparisons against header fields in such charsets (cf.
        [SIEVE] section 2.7.2).
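
For comparison, the current 'body' wording quoted earlier could be
sketched roughly as follows.  Again this is only an illustration: the
function name is hypothetical, and I am reading "compare against the
US-ASCII subset" as dropping every octet greater than 127 from the
transfer-decoded data, which is one possible interpretation.

```python
def body_text_for_match(data, charset):
    """Return the text of a transfer-decoded MIME part for matching.

    SHOULD: convert the labelled charset to UTF-8 (represented here
    as a Python str).  If the charset is unknown, or the part is
    mistagged so that decoding fails, fall back to the US-ASCII
    subset of the data, as the MAY clause permits.
    """
    try:
        return data.decode(charset)
    except (LookupError, UnicodeDecodeError):
        ascii_only = bytes(octet for octet in data if octet < 128)
        return ascii_only.decode("ascii")
```

Under this reading, a part containing the octets for "caf" followed by
0xE9 matches "café" when labelled ISO-8859-1, but only "caf" when the
label is unknown or when it is mistagged as UTF-8.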


What are people's opinions on the base-spec and body-extension
requirements in this area?


Philip Guenther