ietf-822

Re: character sets

1991-04-30 13:50:05
1) I'd like to see a description of some of these character sets that
are being proposed like ISO-2022. Are they ftpable?

ISO 2022 is not a character set, but a way to combine a set of 
character sets. You may call 2022 an encoding scheme.

I agree with Erik: I do not think the text of ISO 2022 is available
electronically (as RFCs are). ISO holds the copyright to its
standards, and the national member bodies of ISO actually fund some
of their activity by selling copies of the standards (so their steep
document prices go to an honourable activity).

Actually, in the case of character sets it is also not easy to
have all these weird characters represented in an ftp-able form.
(Oh, we need some RFCs to be able to mail such data, folks :-)

But quite a few of the character sets are tabulated in files
available by ftp from dkuug.dk:pub/ch.shar* - I did most of the work
producing these. Something for use with ISO 2022 is also tabulated
there, namely the designator byte of each ECMA registration. The
latter work is in an intermediate state, though.

The method I have used to represent a character (any weird one, that is...)
is to use the unique long descriptive name that ISO/IEC JTC1/SC2
has produced and published in DIS 10646, instead of the actual
graphical character itself. These long names are in the style of
"GREEK CAPITAL LETTER ALPHA WITH ACCENT". To tabulate the
character sets conveniently I invented a short name for each
of the characters, mostly 2 characters long. With these short names
it was possible to tabulate an average 8-bit character set in just
under 16 lines. The tables contain almost all of the ECMA registry
(about 60 character sets) and some 40 vendor-defined character sets.
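
As a rough illustration of the short-name idea, here is a small Python
sketch. Everything in it is assumed for the example: the short names,
the "??" placeholder for unassigned positions, the made-up character
set, and the layout of 16 positions per line. The real tables in
ch.shar* may well differ.

```python
# Illustrative short names only; not copied from the actual tables.
LONG_NAMES = {
    "A*": "GREEK CAPITAL LETTER ALPHA",
    "B*": "GREEK CAPITAL LETTER BETA",
    "a:": "LATIN SMALL LETTER A WITH DIAERESIS",
    "o/": "LATIN SMALL LETTER O WITH STROKE",
    "??": "UNASSIGNED POSITION",  # hypothetical placeholder
}


def format_table(short_names, per_line=16):
    """Lay out one short name per code position, per_line names per row."""
    lines = []
    for start in range(0, len(short_names), per_line):
        row = short_names[start:start + per_line]
        lines.append(" ".join(name.ljust(2) for name in row))
    return lines


def describe(short_names, code):
    """Look up the long descriptive name for a code position."""
    return LONG_NAMES[short_names[code]]


# A made-up 8-bit set: everything unassigned except a few positions.
charset = ["??"] * 256
charset[0xC1] = "A*"
charset[0xC2] = "B*"
charset[0xE4] = "a:"

table = format_table(charset)
for line in table:
    print(line)
print(describe(charset, 0xC1))
```

With 16 two-character names per line, all 256 positions fit in 16 lines
of plain ASCII; a table that leaves out some positions would come in
under that.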

2) The internet tradition is of freely and conveniently available
documents. [And indeed many think the ISO documentation system is
suicidal]. I hope we all agree that any description of character
sets in the core RFC must be freely available. In fact if they are
not, and we still want to use them then they should be republished
as RFCs. And if that is not possible for copyright reasons we should
define our own standard. 

RFCs describing character sets might have to be postscript only.

Or use the method above, which can be done in plain ASCII, like all
other RFCs.

keld
