
Re: RFC 1345 mnemonics table not consistent with Unicode 3.2.0

2007-08-30 23:13:23
John C Klensin wrote:
--On Friday, 31 August, 2007 01:00 +0200 Harald Alvestrand
<harald(_at_)alvestrand(_dot_)no> wrote:
Harald, Ben has pointed out one important use for something like
1345, which involves references to characters in programming
languages and command interfaces.  The Unicode names are bad
news for that; I certainly don't want
        characterNamed(SLOBBOVIAN LOWER CASE COMBINATION
        LEFT-HANDED SPANNER)

in those contexts, and that is what Unicode would give me.  Our
current solution to that problem seems to be U+[N[N]]NNNN, which
is pretty unattractive (except when compared to all of the other
alternatives).  On the other hand, one could argue that 1345
inadvertently proves that no shorter set of mnemonics is going
to work across all of Unicode without becoming pretty arbitrary
and discriminatory against scripts not familiar to the creator
as well as difficult to extend.
Two different threads here: one about the idea of mnemonics, the other about this specific document's implementation of it...

Actually I used 1345 mnemonics in a fairly hefty piece of work back in 1995 (draft-alvestrand-lang-char, I think the latest published version was -03). Ten years later, I'm unable to figure out what characters I was trying to point to in some cases; somehow, characters snuck in where "it's obvious that the mnemonic for X has to be *X", but 1345 doesn't provide a definition for "*X". For cases where the correct mnemonic was "+X" and the draft specifies "*X", it's impossible to tell that I goofed by anything short of character-by-character lookup.

Based on that experience in working with 1345, I claim that any set of "mnemonics" larger than what one can memorize in an hour or two, used for handling data in a wider character set than the one you're writing in, is a Bad Idea. Tried it, didn't work.

In programming language constructs intended to be read and maintained by people who aren't familiar with the script they're maintaining and aren't willing to bother looking up the code every time they use it, "characterNamed(SLOBBOVIAN LOWER CASE COMBINATION LEFT-HANDED SPANNER)" is exactly the right construct, in my opinion; if people can read the script, a UTF-8 environment is a far cleaner solution than any possible mnemonic set.
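
As an illustration only (not something from 1345 or from this thread), Python's standard unicodedata module offers one existing analogue of the hypothetical characterNamed(): lookup by full Unicode name, next to the U+NNNN style and a literal in UTF-8 source. The choice of GREEK SMALL LETTER ALPHA and the variable names are arbitrary assumptions for the sketch.

```python
import unicodedata

# Four ways to refer to the same character, U+03B1.
by_name = unicodedata.lookup("GREEK SMALL LETTER ALPHA")  # lookup by full Unicode name
by_escape = "\N{GREEK SMALL LETTER ALPHA}"                # named escape in a string literal
by_code_point = "\u03b1"                                  # the U+NNNN style
literal = "α"                                             # literal character in UTF-8 source

assert by_name == by_escape == by_code_point == literal

# Going the other way recovers the name for a maintainer who cannot read the script.
print(unicodedata.name(literal))

# A misspelled name is rejected instead of silently naming some other character.
try:
    unicodedata.lookup("GREEK SMALL LETTER ALFA")
except KeyError as exc:
    print("unknown character name:", exc)
```

Unlike a "*X" that should have been "+X", a wrong long name fails at lookup time, so the goof does not need character-by-character review to find.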

The second part of my criticism involves the tables in 1345 that claim to show existing character sets and what characters they contain. These tables are defined inconsistently with their base specifications (for instance, ISO 646 IRV-NO is a 94-character ISO 2022-based character set, but 1345 presents it as if it were a 128-character one, without explaining which control character set it is matched with to create that set), and, as Ned says, contain errors.
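
For readers who don't have the 94 versus 128 distinction at hand, a rough sketch of the arithmetic, assuming the usual ISO 2022 7-bit code table layout (the constant names are made up for the illustration):

```python
# Positions in a 7-bit code table under the usual ISO 2022 layout.
C0_CONTROLS = range(0x00, 0x20)  # 32 control positions, filled by a separately chosen control set
SPACE = 0x20                     # fixed position, not part of the 94-character set
GRAPHIC_94 = range(0x21, 0x7F)   # 0x21..0x7E, the positions a 94-character set defines
DELETE = 0x7F                    # fixed position, not part of the 94-character set

graphic_count = len(GRAPHIC_94)
table_size = len(C0_CONTROLS) + 1 + graphic_count + 1  # controls + SPACE + graphics + DELETE

print(graphic_count, table_size)
```

The 94-character set only says what goes in the graphic positions; a 128-entry table also has to say which control set fills the first 32 positions, and that is the part left unexplained.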

Both are good reasons to ignore 1345 as it currently stands, in my opinion.

If anyone wants to resolve the second by creating a revision, feel free. But I don't see how it can help with the first one.

                          Harald

