Re: TEXT version of Draft RFC

Greg writes:

Third, none of the content-types are for 8 bit text.  Only ASCII is
specified as a defined content type.  I realize that this is not yet
settled on the mailing list, but it would be nice to have at least a
strawman available for other content-types as well as examples.
Possible examples include the 8859-n family, the 2022 family, 646
national variant family, and the 10646 set.

It is not necessary to pick a #standard# way of encoding character
sets, but it would be very useful to at least demonstrate the various
encodings.


I have earlier provided an extensive list of character sets
and a way of encoding them, which has also been put forward
in ISO as a proposal. Almost all of the ECMA registry is
covered, along with some 40 vendor-defined character sets, and
there are C routines to handle conversions between them.
The code and data are essentially free, including for commercial use.

Now, I have one very strong feeling in terms of selecting a character
set.  One of the hallmarks of Internet protocols is that they are
implementable as written.  Profiling documents should not be required
for one implementation to work with another.  In the case of this
document, this is no longer true.  I can implement this RFCXXXX, and
not be able to send mail to another person implementing this document,
even if we are using the same language, unless we have a prior
agreement about what character set we are using.  I can write French
in Latin 1, a 2022 variant, or in Unicode, and unless all
implementations have support for all possible character sets, I cannot
count on interoperating.
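
The problem described above can be shown with a minimal sketch in
Python. It is not part of the draft: the sample word is made up, and
Python's "utf-16-be" codec is used only as an assumed stand-in for a
16-bit Unicode form. The same French word becomes different octets
under each character set, and a receiver that guesses wrong does not
get the original text back.

```python
# Illustrative sketch only: sample text and codec choices are assumptions.
text = "été"  # French word with accented letters

latin1_bytes = text.encode("latin-1")    # one octet per character
ucs2_bytes = text.encode("utf-16-be")    # two octets per character here


def decode_as(data: bytes, charset: str) -> str:
    """Decode octets under an assumed charset, as an unlabeled receiver must."""
    return data.decode(charset, errors="replace")


print(latin1_bytes.hex())
print(ucs2_bytes.hex())
# Round trip works only when sender and receiver agree on the charset.
print(decode_as(latin1_bytes, "latin-1") == text)
print(decode_as(ucs2_bytes, "latin-1") == text)
```

Without a label or a prior agreement, the receiver has no reliable way
to tell which of the two octet strings it has been handed.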


With the above-mentioned code and data, true interoperability
across all these character sets could be achieved.

On the other hand, I would recommend that only a selected list of
character sets be generally accepted. NETF and EUnet have decided on
two such universally accepted character sets, namely ASCII and 10646
in compaction method 5, level 2. If this list is to be extended, I
would recommend the 8859 series and nothing more. Well, Japanese,
Chinese ...

That said, I'd like to see a "common" character set defined in this
document.  At this point, that seems to point to either 10646, or
Unicode.  Both have their disadvantages, but they are implementable.
Use of other codes is also acceptable.


True, both can do the job as a "common" character set.
Only one, 10646, has status as a (nearly completed) de jure standard.
NETF found that 10646 had some distinct advantages, such as
interoperability with ASCII and an economical compaction method
that adds essentially no extra transmission volume for most
European work. NETF/NORDUnet actually ruled out UNICODE unanimously.

Another option is to specify character sets for specific domains of
use, i.e., Romance languages use Latin-1.  I think this gets very messy
very fast.


I think one should not bind languages and character sets together.
Papers I see written here often contain Greek or mathematical
special symbols. Latin1 cannot accommodate these, although the
languages preferred here are Danish and English.
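
A minimal sketch in Python of the point about Latin1, under stated
assumptions: the sample sentence is made up, and Python's "utf-16-be"
codec stands in for a 16-bit universal character set. It checks which
character sets can represent a Danish sentence containing one Greek
letter.

```python
# Illustrative sketch only: the sample sentence is an assumption.
sample = "Vinklen α måles i grader"  # Danish text with a Greek letter


def fits(text: str, charset: str) -> bool:
    """Return True if every character of text exists in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Latin-1 lacks the Greek letter, ISO 8859-7 (Greek) lacks the Danish "å",
# and only the universal set covers both.
for charset in ("latin-1", "iso8859_7", "utf-16-be"):
    print(charset, fits(sample, charset))
```

Neither single-language 8859 part covers the whole sentence, which is
why tying a character set to a language breaks down for mixed text.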

Keld