ietf-822

MIME charset (not charset in general) and its properties

1993-11-11 21:16:55
As the CJK disambiguation is necessary word-by-word (don't forget that
Harald proposes to handle multi-lingual documents) and in the header part,
and the disambiguation is necessary only for one specific character
set, ISO10646/UNICODE, a language tag is not a good mechanism for the
disambiguation. It's better to use ISO10646/UNICODE with the
charset names "iso-10646-<language tag>" for single language only.
                                                 ^^^^^^^^^^^^^^^^^^^^

I fail to see why

   Content-Type: Text/Plain; charset=iso-10646-chinese

would solve the problem of word-by-word distinction between
Chinese and Japanese in a multi-lingual text any better than

No, of course. I wrote "single language only".

But the inability is specific to ISO 10646. It is not an inability
of other encoding systems such as full ISO 2022.

Also, ISO 8859 for Arabic/Hebrew is ambiguous about directionality, so
additional information is necessary.

Thus, using some encoding systems requires profiling.

The problem should, I think, be solved by registering charset names that
carry such profiling information.
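
To illustrate what a charset name with profiling information could look
like to a receiving agent, here is a minimal sketch. It assumes the
"iso-10646-<language tag>" naming from the proposal above; the function
name split_profiled_charset is hypothetical, and no such charset names
are assumed to be registered.

```python
def split_profiled_charset(name):
    """Split a charset name into (base charset, profile or None).

    Assumes the hypothetical naming scheme "iso-10646-<language tag>".
    Any other charset name is returned unchanged with no profile.
    """
    base = "iso-10646"
    lowered = name.lower()  # charset names are case-insensitive
    if lowered == base:
        return base, None
    prefix = base + "-"
    if lowered.startswith(prefix) and len(lowered) > len(prefix):
        return base, lowered[len(prefix):]
    return lowered, None


if __name__ == "__main__":
    for name in ("iso-10646-chinese", "ISO-10646", "iso-8859-1"):
        print(name, "->", split_profiled_charset(name))
```

With this scheme the profile applies to the whole body part, which is why
it fits single-language text only.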

   Content-Type: Text/Plain; charset=iso-10646
   Content-Language: zh (Chinese)

Neither of them does, I think, and the latter approach seems
cleaner to me, as it doesn't confuse language with coded
character set.

Don't confuse the language of the content with the language of the script.

I can write Japanese with ASCII characters.

"Watashiha ASCII mojide nihongo wo kakemasu" is the Japanese translation
of the sentence above.

Thus, your suggestion should have been:

   Content-Type: Text/Plain; charset=iso-10646
   Content-Script-Language: zh (Chinese)

or

   Content-Type: Text/Plain; charset=iso-10646; charset-language=zh
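
As a rough sketch of how a receiver might read the parameter form, the
following uses Python's standard email package. It assumes the two
parameters are separated by a semicolon, as MIME parameter syntax
requires, and "charset-language" is only the hypothetical parameter name
from the example above, not a registered one.

```python
from email import message_from_string

# A message using the hypothetical "charset-language" parameter.
raw = (
    "Content-Type: Text/Plain; charset=iso-10646; charset-language=zh\n"
    "\n"
    "body text\n"
)

msg = message_from_string(raw)

# The coded character set, as declared by the standard charset parameter.
charset = msg.get_content_charset()

# The script language, read from the hypothetical extra parameter.
script_language = msg.get_param("charset-language")

print(charset, script_language)
```

The parameter is available only after the Content-Type field itself has
been parsed, so it says nothing about text in other header fields.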

But then, how can you indicate the script language of text that appears
in the header itself?

                                                Masataka Ohta