
Re: RFC 1345 mnemonics table not consistent with Unicode 3.2.0

2007-08-25 19:12:54
[People, please don't send me copies of list messages by mail. I'm
subscribed to the list and read it via a non-mail interface.]

"Doug Ewell" <dewell(_at_)roadrunner(_dot_)com> writes:

Ben Finney <ben plus ietf at benfinney dot id dot au> wrote:

The issue remains that the informational RFC presents useful
mnemonics for many characters, and there doesn't appear to be such
a thing from Unicode or ISO. That's the point of an update to RFC
1345: it serves a purpose that I can't see served comparably well
elsewhere.

You might not find much enthusiasm in the character-encoding community
for the mnemonics published in RFC 1345, and later as the so-called
"repertoiremap" in ISO/IEC TR 14652.  These have been widely
criticized for their incompleteness, (real or perceived)
arbitrariness, and lack of extensibility to scripts not already
covered.

Thanks for this. I agree that, for *encoding* and *naming*, the
mnemonics aren't much use anymore; we have superior encodings and
Unicode names, and the shortcomings you (correctly) ascribe to the
mnemonics in RFC 1345 leave them of little use for those purposes.

The "repertoiremap" of ISO/IEC TR 14652 is apparently meant to be for
character transmission and translation only. It seems more extensible
for that purpose than the mnemonic approach in RFC 1345.

There is one specific application of the RFC 1345 mnemonics for which
I've not seen a superior reference: direct character *input* at a
keyboard using an input method program. There are numerous programs
(e.g. Emacs, SCIM) that support the RFC 1345 character mnemonic table
as an input method for typing key sequences to input the corresponding
characters.
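
To make the input use case concrete, here is a minimal sketch of what
such an input method does. It is my own illustration, not code from
Emacs or SCIM: the six table entries are quoted from memory of the RFC
1345 table and should be checked against the RFC, and the "&" lead-in
key is an assumption modelled on the way Emacs's rfc1345 input method
is usually driven.

```python
# Hypothetical excerpt of a mnemonic table. The entries are quoted from
# memory of RFC 1345 and should be verified against the RFC before use.
MNEMONICS = {
    "a'": "\u00e1",  # LATIN SMALL LETTER A WITH ACUTE
    "c,": "\u00e7",  # LATIN SMALL LETTER C WITH CEDILLA
    "a*": "\u03b1",  # GREEK SMALL LETTER ALPHA
    "a=": "\u0430",  # CYRILLIC SMALL LETTER A
    "A+": "\u05d0",  # HEBREW LETTER ALEF
    "a+": "\u0627",  # ARABIC LETTER ALEF
}

PREFIX = "&"  # assumed lead-in key that announces a mnemonic


def expand(typed: str) -> str:
    """Replace each PREFIX + two-character mnemonic with its character.

    Anything that is not a known mnemonic is passed through unchanged.
    """
    out = []
    i = 0
    while i < len(typed):
        key = typed[i + 1:i + 3]
        if typed[i] == PREFIX and key in MNEMONICS:
            out.append(MNEMONICS[key])
            i += 3  # skip the prefix and both mnemonic characters
        else:
            out.append(typed[i])
            i += 1
    return "".join(out)
```

With this table, expand("fa&c,ade") yields "façade", while a stray
"&" that is not followed by a known mnemonic (as in "R&D") is left
alone. A real input method works incrementally on key events and
handles longer mnemonics; this only shows the table lookup.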

Most people will agree that "a plus apostrophe" makes a handy
mnemonic for "a with acute," and "c plus comma" works well for "c
with cedilla," but the system tends to break down rather quickly
after that, with Greek letters identified by an asterisk, Cyrillic
by an equal sign, Hebrew by a capital letter and plus sign, Arabic
by a small letter and plus sign, etc.

So long as the table follows some kind of system (and the definition
of the RFC 1345 character mnemonic table does at least explain the
scheme it uses for those character sets), it is still useful as a
means of remembering short, discrete mnemonics for a large set of
characters.

There are numerous exceptions to these guidelines, especially when
the letters in question don't map cleanly to Basic Latin, and a
large number of non-ideographic characters have no mnemonic at all,
even some that were defined in ISO 10646 at the time RFC 1345 was
published.

Yes, the system does have its limits; a mnemonic table cannot
reasonably expect to map mnemonic pure-ASCII keyboard characters to
*every* set of characters in ISO 10646. But with those limits
acknowledged, the mnemonic system can be useful for those character
sets where there *is* a reasonable expectation of such a mapping.

That is why you are unlikely to find an update to RFC 1345 that
brings the mnemonics up to date with 10646/Unicode: the task is
almost impossible, given the limitations of the system.

Indeed. My initial comment was merely that even the characters that
*are* covered by the mnemonic table are not in accord with the current
Unicode data. To the extent that the character mnemonic table is
useful, it is surely undermined if the data are wrong.
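
The kind of check I have in mind can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
rows below are made up for the example (the last one is deliberately
misnamed to show a mismatch, and is not a claim about what RFC 1345
actually contains), and the function name is my own.

```python
import unicodedata

# (mnemonic, code point, character name as listed in the table).
# Illustrative rows only; the last is deliberately wrong for the demo.
TABLE = [
    ("a'", 0x00E1, "LATIN SMALL LETTER A WITH ACUTE"),
    ("a*", 0x03B1, "GREEK SMALL LETTER ALPHA"),
    ("c,", 0x00E7, "LATIN SMALL LETTER C CEDILLA"),
]


def mismatches(table, ucd=unicodedata):
    """Return the rows whose listed name differs from the Unicode data.

    `ucd` is any object offering a `name(char, default)` lookup; by
    default it is the database bundled with the Python interpreter.
    """
    bad = []
    for mnemonic, codepoint, listed in table:
        actual = ucd.name(chr(codepoint), "<no name>")
        if actual != listed:
            bad.append((mnemonic, codepoint, listed, actual))
    return bad
```

Running mismatches(TABLE) reports only the "c," row, whose listed
name lacks the word "WITH". Python also ships the Unicode 3.2.0 data
as unicodedata.ucd_3_2_0, which can be passed as the second argument
to compare against that specific version rather than whichever one
the interpreter bundles.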

The motivation for inventing these mnemonics seems to have been to
specify characters "in a coded character set independent way," which
was perhaps a sensible goal in 1992 when the Universal Character Set
was quite a bit less universal.

I'm beginning to see the gap in understanding here; I've been
approaching this discussion caring *only* about the character mnemonic
table in RFC 1345, whereas others have (reasonably) approached the
discussion in the context of the entire RFC document and its apparent
purpose.

Today, however, virtually all non-10646 character sets are mapped to
10646 code points, not to alphabetic mnemonics.

This is true for the purpose of *encoding*, but for the purpose of
*input* at a non-remapped largely-ASCII keyboard, input method
programs certainly do map ASCII mnemonic sequences to non-ASCII
characters.

Almost any character that can be found in a national or industry
charset can be found in 10646.  The need for a notation independent
of 10646 has passed.

I think it's clear that keyboard character input needs brief mnemonic
ASCII sequences, not numeric ordinals or descriptive character names,
to map to the desired characters.


Thanks very much for the discussion; it's becoming clearer now. Two
further questions:

I'd like to discuss this with the people who made the original RFC
1345 character mnemonic table. How would I get in touch with the
authors of RFC 1345?

It wasn't my intention to write a new discussion draft, but since my
purpose is significantly different to the broad purpose of RFC 1345,
it seems that a new draft aimed at the purpose I have in mind may be
warranted. What should I read (URLs please) before doing so?

-- 
 \         "If we don't believe in freedom of expression for people we |
  `\         despise, we don't believe in it at all."  -- Noam Chomsky |
_o__)                                                                  |
Ben Finney


_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/ietf
