Re: RFC 1345 mnemonics table not consistent with Unicode 3.2.0

2007-08-25 16:42:43
Ben Finney <ben plus ietf at benfinney dot id dot au> wrote:

The issue remains that the informational RFC presents useful mnemonics for many characters, and there doesn't appear to be such a thing from Unicode or ISO. That's the point of an update to RFC 1345: it serves a purpose that I can't see served comparably well elsewhere.

You might not find much enthusiasm in the character-encoding community for the mnemonics published in RFC 1345, and later as the so-called "repertoiremap" in ISO/IEC TR 14652. These have been widely criticized for their incompleteness, (real or perceived) arbitrariness, and lack of extensibility to scripts not already covered.

Most people will agree that "a plus apostrophe" makes a handy mnemonic for "a with acute," and "c plus comma" works well for "c with cedilla," but the system tends to break down rather quickly after that, with Greek letters identified by an asterisk, Cyrillic by an equal sign, Hebrew by a capital letter and plus sign, Arabic by a small letter and plus sign, etc. There are numerous exceptions to these guidelines, especially when the letters in question don't map cleanly to Basic Latin, and a large number of non-ideographic characters have no mnemonic at all, even some that were defined in ISO 10646 at the time RFC 1345 was published.
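To make the prefix conventions concrete, here is a minimal Python sketch. It holds a handful of two-character mnemonics that follow the patterns described above, as best I recall them from the RFC 1345 table; please check them against the RFC before relying on them. The names MNEMONICS and describe are hypothetical, chosen only for this illustration.

```python
import unicodedata

# Illustrative subset only, NOT the full RFC 1345 table.
# Entries are assumed from the RFC's conventions and should be verified.
MNEMONICS = {
    "a'": "\u00E1",  # letter + apostrophe: acute accent
    "c,": "\u00E7",  # letter + comma: cedilla
    "a*": "\u03B1",  # letter + asterisk: Greek
    "a=": "\u0430",  # letter + equal sign: Cyrillic
    "A+": "\u05D0",  # capital letter + plus sign: Hebrew
    "a+": "\u0627",  # small letter + plus sign: Arabic
}

def describe(mnemonic):
    """Return 'U+XXXX NAME' for a known mnemonic, or None if it is not listed."""
    ch = MNEMONICS.get(mnemonic)
    if ch is None:
        return None
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))

for m in MNEMONICS:
    print(m, "->", describe(m))
```

Note that the same base letter "a" yields six unrelated characters depending on the second symbol, and that nothing in the mnemonic itself tells a reader which script an asterisk or an equal sign stands for.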

That is why you are unlikely to find an update to RFC 1345 that brings the mnemonics up to date with 10646/Unicode: the task is almost impossible, given the limitations of the system.

The motivation for inventing these mnemonics seems to have been to specify characters "in a coded character set independent way," which was perhaps a sensible goal in 1992 when the Universal Character Set was quite a bit less universal. Today, however, virtually all non-10646 character sets are mapped to 10646 code points, not to alphabetic mnemonics. Almost any character that can be found in a national or industry charset can be found in 10646. The need for a notation independent of 10646 has passed.
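As a rough illustration of that point, the sketch below uses the codecs that ship with Python to map single bytes from three legacy charsets to 10646/Unicode code points. The choice of charsets and byte values, and the helper name to_code_point, are mine and are only meant as an example.

```python
import unicodedata

def to_code_point(charset, raw):
    """Decode one legacy-encoded byte and report its 10646/Unicode code point."""
    ch = raw.decode(charset)
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))

# The same byte value means different characters in different charsets,
# but each one lands on a well-defined 10646 code point.
SAMPLES = [
    ("latin-1", b"\xe1"),
    ("iso-8859-7", b"\xe1"),
    ("koi8-r", b"\xc1"),
]

for charset, raw in SAMPLES:
    print(charset, raw, "->", to_code_point(charset, raw))
```

The code point, rather than a mnemonic, serves as the charset-independent identifier here.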

Most modern operating systems allow the user to change the keyboard layout (or define one's own) to gain access to frequently used characters, and many applications and operating systems define a special keystroke (such as Ctrl+Q) that allows entry of any arbitrary character by Unicode/10646 code point. You might consider one or both of these approaches as an alternative to using RFC 1345 mnemonics for data entry. Or you can go ahead and use the mnemonics as they are, but resign yourself to the fact that they will probably never be updated.

Speaking only for myself, as always.

--
Doug Ewell · Fullerton, California, USA · RFC 4645 · UTN #14
http://users.adelphia.net/~dewell/
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages

