ietf-822

Re: SIGH! Re: text --> IA5 ?

1991-04-18 23:07:49
Peter Vanderbilt writes, in part...

First, IA5, what X.400 uses, is based on ISO-IR-2.
   "Based on"?  No.  No more than ASCII is "based on" registration 6.  
They are, by no coincidence, identical in the graphics characters, but 
certainly ASCII and, if I recall, IA5 are "character set first, 
registration of graphical repertoire later".

Second: As I understand it, each registration number refers to only a
part of a typical character set.
   In general, for what we mean by "character set", one needs at least 
two--one for graphics and one for controls.  For many purposes, of 
course, one does not care about the controls.  But, in a document that 
references things like CR and LF, one must be careful.

For example, ISO 8859/1 (Latin-1) has characters from registrations 6
and 100 -- the 6 refers to ASCII and the 100 to the right hand part of
8859/1.  There are additional registration numbers for the control
characters.  Calling 8859/1 "ISO-IR-100" would be inexact, at best.
  Yes.  And for those who believe that "ASCII" is the only source of 
terrible confusion around here, note that ISO 8859-1 (both graphic sets 
and both control sets) is called "Latin-1" and that registration 100 is 
called--you guessed it--"Latin-1".
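
To make the "one registration covers only part of the code table" point concrete, here is a minimal Python sketch that sorts an 8-bit value into the region of the table it falls in. The function name `classify` and the label strings are illustrative, not taken from any standard; the only registration numbers used are 6 and 100, as named above. The range boundaries assume the usual 8-bit layout, with registration 6 as a 94-character set (SPACE and DELETE outside it) and registration 100 as a 96-character set.

```python
def classify(byte: int) -> str:
    """Name the region of an 8-bit code table that `byte` falls in.

    Labels are illustrative only. Registration numbers are given just
    for the two graphic sets mentioned in the message (6 and 100).
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single 8-bit value")
    if byte <= 0x1F:
        return "C0 controls"
    if byte == 0x20:
        return "SPACE"
    if byte <= 0x7E:
        return "G0 graphics (registration 6)"
    if byte == 0x7F:
        return "DELETE"
    if byte <= 0x9F:
        return "C1 controls"
    return "G1 graphics (registration 100)"


if __name__ == "__main__":
    # "cafe" with an acute accent on the e, followed by CR LF.
    for b in b"caf\xe9\r\n":
        print(f"0x{b:02X}  {classify(b)}")
```

A line of text ending in CR LF therefore draws on at least two separately specified pieces, which is why naming only a graphics registration leaves the controls undefined.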

The differences between
versions and vintages are fairly minor and are probably not adhered to
by real systems anyway.  
   Before our European colleagues wake up and have to generate flames 
early in the morning...
   There is an ISO Standard, 646 (note low number), which started with 
ASCII as a departure point.  Traditionally, 646 has specified two 
"versions".  One of those, the "international reference version" is 
identical to ASCII with the substitution of "universal currency symbol" 
for "dollar sign".  The other, however, is something called the "basic 
version".  It reserves about a half-dozen character positions, which 
ASCII uses for special characters, for "national use" characters, 
leading to roughly one national variation per country.  And "real 
systems" pay 
attention: if nothing else, these national characters show up on 
keyboard keytops, printers, and usually screens.

Does everybody reading this mail see
"$@[]\^`{}|~" as dollar sign, at sign, square brackets, back slash,
caret, back quote, curly brackets, vertical bar and tilde (hope I've
got the names right!)?
  In a word, no.  Letters with umlauts, and cedillas, and grave and 
acute accents, and slashes, and circles over letters, and question marks
with the little curvy part at the bottom and the dot at the top, and... 
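
As a rough illustration of why the answer is "no", the following Python sketch renders the same code positions under two 646 variants. The `IRV` table follows the description above (currency symbol in place of dollar sign). The `GERMAN` table is an assumption written from memory of the German national variant and should be checked against the actual registration before anyone relies on it. The names `IRV`, `GERMAN`, `SAMPLE`, and `render` are made up for this example.

```python
# Replacements relative to ASCII, keyed by the ASCII character that
# occupies the same code position.

# International reference version as described in the message:
# universal currency symbol where ASCII has the dollar sign.
IRV = {"$": "\u00a4"}

# Assumed German national variant (from memory, not from the register).
GERMAN = {
    "@": "\u00a7",   # section sign
    "[": "\u00c4",   # A with umlaut
    "\\": "\u00d6",  # O with umlaut
    "]": "\u00dc",   # U with umlaut
    "{": "\u00e4",   # a with umlaut
    "|": "\u00f6",   # o with umlaut
    "}": "\u00fc",   # u with umlaut
    "~": "\u00df",   # sharp s
}

SAMPLE = "$@[]\\^`{}|~"


def render(text: str, variant: dict) -> str:
    """Show how ASCII-coded text would display under a 646 variant."""
    return text.translate(str.maketrans(variant))


if __name__ == "__main__":
    print("ASCII :", SAMPLE)
    print("IRV   :", render(SAMPLE, IRV))
    print("German:", render(SAMPLE, GERMAN))
```

Under these assumed tables, eight of the eleven sample characters display as something else on a German terminal, and no byte in the message has changed.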

 For the future we should nail down the
character set as exactly as possible, including whether regionally
varying renditions are allowed.
    Yeah.  And we need to be clear about whether we are specifying 
(nailing down) graphic coding only (e.g., ISO-IR-6) or both graphics and
controls (e.g., ASCII).

Sorry, Stef, it really isn't going to be easy :-)
    --john
-------
