ietf-822

Re: restrictions when defining charsets

1993-02-07 09:34:52
Keld, Henry, and others...

I think that some of us are splitting hairs here and trying to assume
that the hairs have tree-like thickness.  They don't.  Let me try to 
review the real situation, and then let's try to go on to something else.

(1) There are few ISO standards in the IT area that are really completely
precise about everything they appear to cover; they tend to be somewhat
informal descriptions that rely on a certain amount of general
understanding, good sense, and good will to operate.  The same statement
could be made--at least as strongly--about IETF standards: if anything,
I'd suggest that ISO applications standards are a little more precise on
average than their IETF counterparts.

(2) The (unwritten, before anyone goes looking for them) ANSI (and ASC
X3) rules for defining a character set standard under which X3.4 was
produced are inevitably different from the registration rules of ISO
2375 and are yet again different from the rules that define what can be
considered as an ISO 8859-n character set standard.  These things may
very well map the same "characters" to the same code positions and
would, indeed, break down if they didn't.  But that is as far as it
goes.  ISO 2375, for example, creates a registration vocabulary which
may be inconsistent with the "what things mean" rules of a national
character set standard.

(3) The basis for the "stylization" rules is to avoid getting the
coded character set standard mired in arguments about fonts and
typographic styles (see B2.4, B2.5, and B7 of X3.4-1986).  It is very
hard to write a formal definition that avoids these things--I spent a
lot of time at an institution whose name is spelled out in granite as
MASSACHVSETTS INSTITVTE...

(4) On the other hand, X3.4-1986 (R1992) -- the first version of ASCII
to be explicitly aligned with ISO 646 -- contains some explicit
conformance language that I don't believe is present in ISO 646.  This
language is also different from what appeared in the 1977 and 1968
versions.  That material defines a "Conforming Receiving Imaging Device"
in section 2.1.2 and goes on to say, in 2.1.2(3):
  (3) Shall image all 94 graphic characters such that each character is
   recognizable as being associated with one of its names and such that
   each character is distinguishable from the other graphic characters.  No
   other graphic characters may be substituted for any of the graphic
   characters in the set.
Now that is pretty clear, at least in intent.  It doesn't prevent me
from printing in Old English, but it is intended to prevent my creating
nonsense and claiming authorization from the Standard.
   I could probably stylize "Exclamation point" (2/1) into "broken
vertical bar", but I'd be on thin ice.  I cannot stylize it into
"vertical line" because that name is explicitly assigned to 7/12.
   There is also some text that is explicitly not part of the Standard
(appendix B, sections B1 and B2) about the occasional necessity of making
"character substitutions".  "Secular sets" (term used in the Standard) are
explicitly non-conforming but may be "consonant with it" if guidelines
in B2 are followed.  B2 also expands on the meaning of 2.1.2, but of
course really doesn't change anything.

(5) The other major thing that happened in the trip to "alignment with
ISO 646" is that a number of the names of characters were recast into
the same character names used in 646.  There used to be a few more
differences, some of which caused confusion when the change was made,
even in the US.

(6) The original rules in ANSI X3.4-1968 created three "dualities" which
have been part of the recent confusion.  These permitted OR in 2/1, NOT
in 5/14, and "pound sterling symbol" in 2/3.   All three were eliminated
in ASCII with ANSI X3.4-1977 (which was, folks, a long time ago).  At
the same time, 7/12 (our friend the "real" vertical bar--an ISO646
national use position) had its printed representation changed to
eliminate the break in the middle that appeared in X3.4-1968.  RFC821
and 822 reference X3.4-1968 (but shouldn't have); MIME doesn't, and I
hope no one is going to argue for a change toward confusion now (if
nothing else, ANSI doesn't have copies of the 1968 document to sell, so
they are not available except from confirmed packrats).

(7) Some dualities (or worse) are left.  For example, 2/7 in X3.4-1986
is denoted "Apostrophe, right single quotation mark, acute accent".  In
8859-1, there are a couple of different code points mapped to these
names.  On the other hand, the ISO 646-1983 IRV permits "overline" as
well as "tilde" for 7/14; ASCII (X3.4-1986) permits only "tilde".
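To make the column/row notation and the idea of a "duality" concrete, here is a minimal Python sketch. It assumes the usual reading of the notation (code value = column * 16 + row) and uses only the X3.4-1986 names quoted in this message; the helper names `position_to_code` and `dualities` are made up for illustration and are not taken from any standard.

```python
def position_to_code(pos: str) -> int:
    """Convert column/row notation such as '7/12' to an integer code value."""
    col, row = (int(part) for part in pos.split("/"))
    if not (0 <= col <= 7 and 0 <= row <= 15):
        raise ValueError(f"not a 7-bit position: {pos}")
    return col * 16 + row


# Names as quoted in this message for X3.4-1986; only the positions
# discussed here are listed, not the full 94-character set.
NAMES_1986 = {
    "2/1": ["exclamation point"],
    "2/7": ["apostrophe", "right single quotation mark", "acute accent"],
    "7/12": ["vertical line"],
    "7/14": ["tilde"],
}


def dualities(names: dict) -> dict:
    """Return the positions that carry more than one name."""
    return {pos: n for pos, n in names.items() if len(n) > 1}


for pos, names in NAMES_1986.items():
    code = position_to_code(pos)
    print(f"{pos:>5}  0x{code:02X}  {chr(code)!r}  {', '.join(names)}")

print("positions with more than one name:", list(dualities(NAMES_1986)))
```

The table deliberately stops at names: nothing in it says what the glyph for a position must look like.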

-----------------------
Bottom line:  ASCII permits some fairly extreme stylizations when going
from character names to glyphs.  It doesn't permit deliberate creation
of confusion.  It maps code points onto descriptive names; it doesn't
map those names onto specific glyph images.  The number of code points
associated with multiple names (and, hence, potentially radically
different glyph-representations) is on the decline, but is still not
zero.   Unless IETF is willing to specify a code point -> glyph mapping,
not a code point -> name one, this is about as far as we are going to
get with ASCII (although we could specify exact names where dualities
still exist--note that registration 6, as quoted by Keld, still includes
a duality at 5/14, since, in any clever font, I'd print "upward arrow
head" and "circumflex accent" differently).
  And the problem with 10646 in this context is that it sometimes
specifies a reversible code point to name mapping (most alphabetic
characters), sometimes specifies a reversible code point to near-glyph
mapping (most ideographic characters, on a "looks alike, same origins,
must be the same character" basis), and sometimes specifies a
non-reversible (or non-uniquely-reversible) multiple code point to glyph
or character name mapping.
   --john
