Re: 3028bis open issue #3: require 2047 decoding?


Michael Haardt <michael(_at_)freenet-ag(_dot_)de> writes:

On Thu, Jun 30, 2005 at 05:43:53PM -0700, Philip Guenther wrote:

     Comparisons are performed in Unicode.  Implementations convert
     text from header fields in all charsets [HEADER-CHARSET] to
     Unicode as input to the comparator (see 2.7.3).  Implementations
     must be capable of decoding US-ASCII, ISO-8859-1, the US-ASCII
     subset of ISO-8859-* character sets, and UTF-8.


That sounds good.

Hmm, I think the paragraph needs to also specify that text in unknown
charsets never matches, no?


Either that or not decoding it, if it cannot be converted.


So, if the implementation didn't understand the charset, it would
instead convert the raw ASCII (e.g., "=?charset?Q?blah?=") to Unicode
(an identity function, yes) and feed that to the comparator?  I guess I
can see _a_ use to that (scripts could match "=?charset?" to check for
use of particular charsets that the implementation doesn't support).
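
For concreteness, here is a rough sketch of the two behaviours being
discussed: pass the raw encoded-word through when the charset is not
understood, or treat the text as never matching.  This is only an
illustration under stated assumptions, not text proposed for the draft:
the names to_unicode, header_contains, UnknownCharset and on_unknown are
hypothetical, Python's codec registry stands in for "charsets the
implementation understands", and a substring test stands in for a real
comparator.  It also ignores the RFC 2047 rule about whitespace between
adjacent encoded-words and skips handling of malformed payloads.

```python
import base64
import codecs
import quopri
import re

# RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")


class UnknownCharset(Exception):
    """Raised when an encoded-word names a charset we cannot decode."""


def to_unicode(value, on_unknown="raw"):
    """Hypothetical helper: decode the encoded-words in a header value.

    on_unknown="raw":     leave an undecodable encoded-word as it is
    on_unknown="nomatch": raise UnknownCharset instead
    """
    def repl(m):
        charset, enc, payload = m.group(1), m.group(2).upper(), m.group(3)
        try:
            codecs.lookup(charset)      # do we know this charset?
        except LookupError:
            if on_unknown == "raw":
                return m.group(0)       # identity: keep "=?charset?Q?...?="
            raise UnknownCharset(charset)
        if enc == "B":
            raw = base64.b64decode(payload)
        else:
            # header=True makes "_" decode to a space, as in the Q encoding
            raw = quopri.decodestring(payload.encode("ascii"), header=True)
        return raw.decode(charset, errors="replace")

    return ENCODED_WORD.sub(repl, value)


def header_contains(value, needle, on_unknown="raw"):
    """Stand-in for a ":contains" test on the decoded header text."""
    try:
        return needle in to_unicode(value, on_unknown)
    except UnknownCharset:
        return False                    # unknown charset never matches
```

With on_unknown="raw", a script can still test for "=?x-unknown?" in the
header, which is the one use mentioned above; with on_unknown="nomatch"
the same test fails no matter what the needle is.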

Hmm, I don't see any guidance in RFC 2047 or 2048 that could apply to
this.  Oh well...

But now that you cited the scope of 3028bis, can we do that?


Good question.  RFC 3028 did not specify how an implementation should
handle charsets that it doesn't understand, effectively leaving it
implementation defined.  Is that causing interoperability problems?  If
so, then my understanding is that fixing it is in scope.  If not, then
we shouldn't add new constraints on implementations.  Is there consensus
that it's an interoperability problem _and_ that we can obtain
consensus on how to resolve it?

Meanwhile, perhaps one of our chairs could speak on the procedural
point?


Philip Guenther