Philip Guenther wrote:
Tim Showalter <tjs(_at_)psaux(_dot_)com> writes:
Michael Haardt wrote:
Shouldn't the implementation be free to choose what to convert them to?
It may choose a different Unicode representation. Do we need to enforce UTF-8?
I realize the language in 3028 clearly implies implementations are
required to convert to UTF-8 but if an implementation wants to use UCS-4
or UCS-2 or UTF-7 internally, that must be allowed. The specification
has no power to specify behavior that can't be externally observed, and
the text you cited is just wrong.
...
For 3028bis, 2.7.2, paragraph 2, how about:
Comparisons are performed in Unicode. Implementations convert
text from header fields in all charsets [HEADER-CHARSET] to
Unicode as input to the comparator (see 2.7.3). Implementations
must be capable of decoding US-ASCII, ISO-8859-1, the US-ASCII

Shouldn't this be MUST?

subset of ISO-8859-* character sets, and UTF-8.
with the new normative reference:
[HEADER-CHARSET] Moore, K., "MIME (Multipurpose Internet Mail
Extensions) Part Three: Message Header
Extensions for Non-ASCII Text", RFC 2047,
November 1996
Sounds good to me.
Hmm, I think the paragraph needs to also specify that text in unknown
charsets never matches, no?
(You can't just map them to U+FFFD (replacement character) because you
don't know how many characters are encoded!)
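To make the "never matches" rule concrete, here is a minimal sketch in Python. It is only an illustration, not proposed spec text. The function names header_to_unicode and header_is are hypothetical, the comparison is plain string equality standing in for a real comparator, and RFC 2047 decoding is delegated to Python's standard email.header.decode_header.

```python
from email.header import decode_header


def header_to_unicode(raw):
    """Return the header value as a Unicode str, or None if any part
    is in a charset we cannot decode (hypothetical helper)."""
    parts = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, str):
            # No encoded-words were present; the value is already text.
            parts.append(chunk)
            continue
        try:
            # Unlabelled byte runs are treated as US-ASCII (an assumption).
            parts.append(chunk.decode(charset or "us-ascii"))
        except (LookupError, UnicodeDecodeError):
            # Unknown charset or undecodable bytes: no Unicode form exists.
            return None
    return "".join(parts)


def header_is(raw, key):
    """Equality test on the Unicode form; text that cannot be converted
    never matches (hypothetical stand-in for a comparator)."""
    text = header_to_unicode(raw)
    if text is None:
        return False
    return text == key
```

With this sketch, header_is("=?x-no-such-charset?q?abc?=", "abc") is false even though the encoded bytes happen to spell "abc". The U+FFFD point shows up in the same way: the bytes E4 B8 AD are one character if the charset is UTF-8 but three if it is ISO-8859-1, so without knowing the charset there is no way to say how many replacement characters to emit.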
Alexey