Re: 3028bis open issue #3: require 2047 decoding?

2005-06-30 02:34:22

That is, support for RFC 2047 is only a SHOULD and not a MUST.  Do
we want to leave that as is or should it be made stricter, with a
MUST support RFC 2047, MUST support conversion of US-ASCII and UTF-8
and SHOULD support conversion of ISO-8859-1 and the US-ASCII subset
of ISO-8859-*?
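As a sketch of what "MUST support RFC 2047" would mean in practice, here is a minimal decoder using Python's standard `email.header` module. The fallback behavior for unknown charsets is my own assumption, not something the draft mandates:

```python
from email.header import decode_header

def decode_2047(raw):
    """Decode an RFC 2047 header value into a single UTF-8 string."""
    parts = []
    for text, charset in decode_header(raw):
        if isinstance(text, bytes):
            try:
                # Convert from the declared charset (default US-ASCII).
                parts.append(text.decode(charset or "us-ascii"))
            except (LookupError, UnicodeDecodeError):
                # Assumed fallback for unknown/mistagged charsets:
                # decode as US-ASCII with replacement characters.
                parts.append(text.decode("us-ascii", "replace"))
        else:
            parts.append(text)
    return "".join(parts)

decode_2047("=?ISO-8859-1?Q?caf=E9?=")  # → café
```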

I would prefer that, but:

   If implementations fail to support the above behavior, they MUST
   conform to the following:

      No two strings can be considered equal if one contains octets
      greater than 127.

To me, that states that if an implementation fails to convert a character
set to UTF-8, two strings cannot be equal if either contains octets greater
than 127.  Assuming that all unknown character sets are one-byte character
sets with the lower 128 octets being US-ASCII is not sound.  But perhaps
that's not what was meant.
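Read literally, the quoted rule makes even byte-identical non-ASCII strings unequal. A sketch of that reading (my interpretation, not text from the draft):

```python
def raw_equal(a: bytes, b: bytes) -> bool:
    """Compare two unconverted strings under the quoted rule:
    no match is possible if either side contains an octet > 127."""
    if any(o > 127 for o in a) or any(o > 127 for o in b):
        return False
    return a == b

raw_equal(b"hello", b"hello")      # → True
raw_equal(b"caf\xe9", b"caf\xe9")  # → False, despite identical bytes
```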

   MIME parts identified as using charsets other than UTF-8 as
   defined in [UTF-8] SHOULD be converted to UTF-8 prior to the match.

Shouldn't the implementation be free to choose what to convert them to?  It
may choose a different Unicode representation.  Do we need to enforce UTF-8?

   If an implementation does not support conversion of a given
   charset to UTF-8, it MAY compare against the US-ASCII subset
   of the transfer-decoded character data instead.  Characters from
   documents tagged with charsets that the local implementation
   cannot convert to UTF-8 and text from mistagged documents MAY
   be omitted or processed according to local conventions.
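One way to read "compare against the US-ASCII subset" is to omit the non-ASCII octets before matching; that interpretation is an assumption on my part, since the draft also allows processing per local conventions:

```python
def ascii_subset(data: bytes) -> bytes:
    """Keep only the US-ASCII octets, omitting anything above 127,
    as one reading of the fallback for unconvertible charsets."""
    return bytes(o for o in data if o < 128)

def match_ascii_subset(data: bytes, key: str) -> bool:
    """Compare the ASCII subset of transfer-decoded data to a key."""
    return ascii_subset(data).decode("us-ascii") == key

match_ascii_subset(b"caf\xe9", "caf")  # → True once \xe9 is omitted
```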

That sounds more useful than RFC 3028 to me, but I slightly prefer to
match the raw transfer-encoded data.  Why bother decoding it, if you
cannot be sure what it is anyway?

Whatever the result is, I agree that comparisons for header and body
should be the same.
