ietf-822

Re: restrictions when defining charsets

1993-02-07 04:58:13
henry(_at_)zoo(_dot_)toronto(_dot_)edu writes:

Ahh.. but that's the *current* interpretation.  I've used a lot
of keyboards that were convinced that a 41 octal was a vertical
bar...

I have checked ANSI's official registration of ASCII in the ECMA
registry under ISO 2375 (registration number 6, dated 1975/12/01).
It says on page 3.10:

2/1   !  Exclamation mark
5/14  ^  Upward arrow head, circumflex accent

No "vertical bar", no "not sign" there. 
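For readers unfamiliar with the column/row notation in the registry extract: position "c/r" is the 7-bit code value c * 16 + r, so 2/1 is the same code as the "41 octal" in the keyboard remark above. A minimal Python sketch of that arithmetic (the function name is just for illustration):

```python
def column_row_to_int(pos: str) -> int:
    """Convert ISO column/row notation such as "5/14" to a 7-bit code value."""
    column, row = (int(part) for part in pos.split("/"))
    if not (0 <= column <= 7 and 0 <= row <= 15):
        raise ValueError(f"not a 7-bit code position: {pos!r}")
    return column * 16 + row

for pos in ("2/1", "5/14"):
    code = column_row_to_int(pos)
    # Show the same code as column/row, hex, octal and the ASCII character.
    print(f"{pos:>5} = {code:#04x} = octal {code:03o} -> {chr(code)!r}")
# prints:
#   2/1 = 0x21 = octal 041 -> '!'
#  5/14 = 0x5e = octal 136 -> '^'
```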

Keld, you've missed the most crucial point.  From the very start, those
characters have had those names... but ANSI X3.4 has also said from
the very beginning that those names are not intended to constrain how
the characters *look*.

I think, then, that ASCII contradicts the ISO standards, which all bind
a character's name to the range of glyphs that can be displayed for
that character. You cannot display a glyph for <broken bar> when the
character is <circumflex>. Some exceptions exist, like the Latin, Greek
and Cyrillic <A>, which share one glyph. I have no reference for this
requirement, so it may be an implicit rule that follows from the
definition of "character": if you are allowed to display any other
character's glyph for a given character, the concept of character has
no meaning.

Any definition of "character set" that is to include ASCII (ANSI X3.4)
can't be too nitpicking and pedantic.  In particular, I very much doubt
that one can devise a wording that includes ASCII but excludes Unicode,
as some seem to want.  Also in particular, I suspect attempts to write
a fairly narrow definition are going to end up being futile, and the
time is best spent on more substantive issues.

Well, that puts a new light on what I have been practising for 10 years
- it means that I can use my national letters and still call it
ASCII! I am just using curly braces that happen to look nice to me.
It would also mean that the Greeks could use the term ASCII and have
<a>, <b>, etc. displayed as their Greek variants; the same goes
for Cyrillic, Arabic, etc. ASCII, with such a loose relation between
the character name and the glyph, seems to me to be a non-definition.
You cannot then use ASCII as a base for anything!

If the IETF is to use ASCII in our specifications, then we should
be more precise about the concept of "character" and also
have a precise definition of ASCII for IETF use.
Maybe "NETASCII" was not such a bad name after all.

For a precise definition of "ASCII" or "USASCII" we could use
the definition in RFC1345.

The definition of "character" could include a description like:

"A character has a set of glyphs which is said to denote
the character. A glyph which is not in the set cannot denote
the character."
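The proposed wording can be read as a simple data model: each character carries a set of glyphs, and a glyph denotes the character only if it is in that set. This Python sketch is hypothetical throughout: the glyph identifiers are invented labels, and it assumes each glyph set is finite, which may not hold in general. It also shows the Latin/Greek exception, where two characters share one glyph.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Character:
    """A character identified by its name, with the set of glyphs that denote it."""
    name: str
    glyphs: frozenset  # hypothetical glyph labels; assumed finite here

def denotes(glyph: str, char: Character) -> bool:
    # Proposed rule: a glyph outside the set cannot denote the character.
    return glyph in char.glyphs

EXCLAMATION = Character("exclamation mark", frozenset({"excl-plain", "excl-serif"}))
LATIN_A = Character("Latin capital letter A", frozenset({"A-shape"}))
GREEK_ALPHA = Character("Greek capital letter Alpha", frozenset({"A-shape"}))

print(denotes("broken-bar", EXCLAMATION))  # False: not in the set for <exclamation mark>
print(denotes("A-shape", GREEK_ALPHA))     # True: shared with Latin <A>
```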

Some problems with the above:

I do not know of an official mapping between characters and glyphs.
The set of glyphs for a character may not be finite.
"Denote" is meant to indicate that you can say: "that glyph
is a <species> character". Characters can also be presented by
other means, e.g. by a fallback representation. RFC1345 defines some,
and you may also consider the <broken bar> mentioned above, used as
a replacement for <exclamation mark>, to be a fallback representation.
So maybe you need a definition of fallback too:

"A fallback representation of a character is a representation,
which may be ambiguous or unambiguous, using glyphs of characters
in the available set of glyphs, usually different from the
glyphs of the character itself."
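A rough Python sketch of how a renderer might apply such a definition. The fallback table is hypothetical: it uses the <broken bar> for <exclamation mark> as discussed above, and "ae" for the letter <ae> as an example of an ambiguous fallback (the output cannot be told apart from a genuine "ae"). This is not the RFC1345 fallback scheme, just an illustration.

```python
# Hypothetical fallback table: character -> replacement built from other characters.
FALLBACKS = {
    "!": "\u00a6",   # <exclamation mark> shown with the <broken bar> glyph
    "\u00e6": "ae",  # <ae> spelled with two Latin letters (ambiguous)
}

def render(text: str, available: set) -> str:
    """Render text using only glyphs in `available`, falling back where needed."""
    out = []
    for ch in text:
        if ch in available:
            out.append(ch)  # the character's own glyph is available
        elif ch in FALLBACKS and all(c in available for c in FALLBACKS[ch]):
            out.append(FALLBACKS[ch])  # use the fallback representation
        else:
            raise ValueError(f"no glyph or fallback for {ch!r}")
    return "".join(out)

# A device with lower-case Latin letters, space and the broken bar, but no '!' or 'ae' letter.
device = set("abcdefghijklmnopqrstuvwxyz ") | {"\u00a6"}
print(render("hello!", device) == "hello\u00a6")  # True
print(render("\u00e6ble", device))                # aeble
```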

Keld