Re: 3028bis open issue #3: require 2047 decoding?

2005-07-04 03:53:04

Philip Guenther wrote:

Tim Showalter <tjs(_at_)psaux(_dot_)com> writes:
Michael Haardt wrote:
Shouldn't the implementation be free to choose what to convert them to? It may
choose a different Unicode representation.  Do we need to enforce UTF-8?

I realize the language in 3028 clearly implies implementations are required to convert to UTF-8, but if an implementation wants to use UCS-4 or UCS-2 or UTF-7 internally, that must be allowed. The specification has no power to specify behavior that can't be externally observed, and the text you cited is just wrong.

For 3028bis, 2.7.2, paragraph 2, how about:

        Comparisons are performed in Unicode.  Implementations convert
        text from header fields in all charsets [HEADER-CHARSET] to
        Unicode as input to the comparator (see 2.7.3).  Implementations
        must be capable of decoding US-ASCII, ISO-8859-1, the US-ASCII
Shouldn't this be MUST?

        subset of ISO-8859-* character sets, and UTF-8.

with the new normative reference:

[HEADER-CHARSET]        Moore, K., "MIME (Multipurpose Internet Mail
                        Extensions) Part Three: Message Header
                        Extensions for Non-ASCII Text", RFC 2047,
                        November 1996
Sounds good to me.

Hmm, I think the paragraph needs to also specify that text in unknown
charsets never matches, no?

(You can't just map them to U+FFFD (replacement character) because you
don't know how many characters are encoded!)
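
To make the "unknown charsets never match" idea concrete, here is a rough Python sketch, not text proposed for the draft. It uses the standard library's email.header.decode_header for the RFC 2047 step; the helper names header_to_unicode and header_matches are hypothetical, and treating an undecodable header as "no match" is the assumption under discussion, not settled behavior. The comparison is a plain equality test standing in for whatever comparator is actually in use.

```python
from email.header import decode_header
from typing import Optional


def header_to_unicode(raw: str) -> Optional[str]:
    """Decode RFC 2047 encoded-words in a header value to a Unicode string.

    Returns None when any encoded-word names a charset we cannot decode,
    so the caller can treat the whole value as never matching.
    """
    parts = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, str):
            # No encoded-words were present; the value is already text.
            parts.append(chunk)
            continue
        try:
            # Assumption: unencoded runs (charset is None) are US-ASCII.
            parts.append(chunk.decode(charset or "ascii"))
        except (LookupError, UnicodeDecodeError):
            # Unknown charset or bytes invalid in it: we cannot even tell
            # how many characters there are, so give up on the value.
            return None
    return "".join(parts)


def header_matches(raw: str, key: str) -> bool:
    """Stand-in comparator: undecodable header text never matches."""
    text = header_to_unicode(raw)
    return text is not None and text == key
```

With this sketch, header_matches("=?x-no-such-charset?q?abc?=", "abc") is false even though the payload bytes happen to spell "abc", which is the behavior the question above is asking the paragraph to pin down.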