Peter Vanderbilt writes, in part...
First, IA5, what X.400 uses, is based on ISO-IR-2.
"Based on"? No. No more than ASCII is "based on" registration 6. They are, by
no coincidence, identical in the graphics characters but certainly
ASCII, and, if I recall, IA5, are "character set first, registration of
graphical repertoire later".
Second: As I understand it, each registration number refers to only a
part of a typical character set.
In general, for what we mean by "character set", one needs at least
two--one for graphics and one for controls. For many purposes, of
course, one does not care about the controls. But, in a document that
references things like CR and LF, one must be careful.
For example, ISO 8859/1 (Latin-1) has characters from registrations 6
and 100 -- the 6 refers to ASCII and the 100 to the right hand part of
8859/1. There are additional registration numbers for the control
characters. Calling 8859/1 "ISO-IR-100" would be inexact, at best.
Yes. And for those who believe that "ASCII" is the only source of
terrible confusion around here, note that ISO 8859/1 (both graphic sets and
both control sets) is called "Latin-1" and that registration 100 is
called--you guessed it--"Latin-1".
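
To make the graphics-versus-controls split concrete, here is a minimal
Python sketch that sorts each octet of an 8859/1-encoded string into the
part of the code table it falls in. The function names and region labels
are mine, purely for illustration. Only the two graphic registrations
(6 and 100) come from the discussion above; the control ranges are
labeled generically because their registration numbers are not given
here.

```python
# Illustrative sketch: names and labels are assumptions, not standard terms.
def region(byte: int) -> str:
    """Name the part of an 8-bit code table that one octet falls in."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not an 8-bit code")
    if byte <= 0x1F:
        return "C0 controls"
    if byte == 0x20:
        return "SPACE"
    if byte <= 0x7E:
        return "left-hand graphics (registration 6)"
    if byte == 0x7F:
        return "DELETE"
    if byte <= 0x9F:
        return "C1 controls"
    return "right-hand graphics (registration 100)"


def parts_used(data: bytes) -> list[str]:
    """List the distinct regions an octet string touches, in first-seen order."""
    return list(dict.fromkeys(region(b) for b in data))


if __name__ == "__main__":
    # A short line of text ending in CR LF touches three separate parts.
    for part in parts_used("café\r\n".encode("latin-1")):
        print(part)
```

Even that short line ending in CR LF needs both graphic sets and a
control set, which is why a single registration number cannot stand in
for the whole of 8859/1.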
The differences between
versions and vintages are fairly minor and are probably not adhered to
by real systems anyway.
Before our European colleagues wake up and have to generate flames
early in the morning...
There is an ISO Standard, 646 (note low number), which started with
ASCII as a departure point. Traditionally, 646 has specified two
"versions". One of those, the "international reference version" is
identical to ASCII with the substitution of "universal currency symbol"
for "dollar sign". The other, however, is something called the "basic
version". It reserves about a dozen character positions that ASCII
uses for special characters for "national use" characters, leading to
roughly one national variation per country. And "real systems" pay
attention: if nothing else, these national characters show up on
keyboard keytops, printers, and usually screens.
Does everybody reading this mail see
"$@[]\^`{}|~" as dollar sign, at sign, square brackets, back slash,
caret, back quote, curly brackets, vertical bar and tilde (hope I've
got the names right!)?
In a word, no. Letters with umlauts, and cedillas, and grave and
acute accents, and slashes, and circles over letters, and question marks
with the little curvy part at the bottom and the dot at the top, and...
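
For the skeptical, a small Python sketch of what a German reader might
see. It assumes the German national variant of 646 (DIN 66003) and a
replacement table written from memory, so treat the table and the
function name as illustrative assumptions rather than as a reference.

```python
# Assumed table: ASCII characters at the positions that the German
# variant of ISO 646 (DIN 66003) reassigns, and what is shown instead.
DIN_66003 = {
    "@": "§",
    "[": "Ä",
    "\\": "Ö",
    "]": "Ü",
    "{": "ä",
    "|": "ö",
    "}": "ü",
    "~": "ß",
}


def as_seen_in_german_variant(text: str) -> str:
    """Show how ASCII-coded text would display on German 646 equipment."""
    return text.translate(str.maketrans(DIN_66003))


if __name__ == "__main__":
    print(as_seen_in_german_variant("$@[]\\^`{}|~"))
```

Under those assumptions only the dollar sign, caret, and back quote
survive; the at sign becomes a section sign, the brackets, braces, back
slash, and vertical bar become umlauted letters, and the tilde becomes a
sharp s.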
For the future we should nail down the
character set as exactly as possible, including whether regionally
varying renditions are allowed.
Yeah. And we need to be clear about whether we are specifying
(nailing down) graphic coding only (e.g., ISO-IR-6) or both graphics and
controls (e.g., ASCII).
Sorry, Stef, it really isn't going to be easy :-)
--john
-------