Re: new RFC-CHAR draft

Erik M. van der Poel writes:

>> Once I have my code updated we'll see -- computers
>> are relentlessly literal and can find problems I'd never see myself.
>
> Ah, so then the source code of your program becomes The Standard
> around which everything else does (or does not) pivot. Come on, Ned,
> you know better 'n that.


Huh? I have no intention of making my source code a standard of anything. In
any case, it is not "my" code: (1) I am not the only one working on it, (2) it
is the property of Innosoft and I get paid to write it, and (3) it will
probably end up in a commercial product and is unlikely ever to be made
generally available. Finally, this code is in Pascal, not C, so I very much
doubt that many other people would be interested in it.

The last thing the Internet needs is another pile of code like the BIND
nameserver, thank you very much.

My intent here was something entirely different. In case you haven't bothered
to look, RFC-CHAR contains a substantial number of tables of information. These
tables have to interwork in a fairly complex way in order for RFC-CHAR to be an
effective document. Examples:

(0) All commands used in the charset tables must be within the set of
    predefined commands.

(1) All descriptions in the table of available mnemonics must be unique.

(2) All mnemonics used in the charset tables must be defined in the mnemonic
    tables.

(3) All charset names should be unique.

(4) Any mnemonic that appears in two places in the same character set must
    be flagged. This is not exactly an error, but when it happens it has to
    be noted so that appropriate action can be taken to deal with it.

(5) The tables of combining mnemonics that Keld has added to the latest
    draft must apply to mnemonics that are actually in the associated
    character set.
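The cross-table checks listed above lend themselves to mechanical verification.
What follows is a minimal Python sketch, not the Innosoft Pascal code, assuming
the tables have already been parsed into a hypothetical in-memory form: a set of
predefined command names, a list of (mnemonic, description) pairs, and one
`Charset` record per charset table holding the commands it uses, a map from code
position to mnemonic, and its combining mnemonics. Parsing the actual RFC-CHAR
syntax is not shown, and all names here are illustrative. Each problem found is
tagged with the number of the check it violates.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Charset:
    """Hypothetical parsed form of one charset table (not RFC-CHAR syntax)."""
    name: str
    commands: list[str]                 # command names used in the table
    positions: dict[int, str]           # code position -> mnemonic
    combining: list[str] = field(default_factory=list)  # combining mnemonics


def check_tables(predefined_commands, mnemonics, charsets):
    """Return a list of (check number, message) tuples.

    predefined_commands: set of allowed command names
    mnemonics: list of (mnemonic, description) pairs from the mnemonic table
    charsets: list of Charset records
    """
    problems = []
    defined = {m for m, _ in mnemonics}

    # (1) Descriptions in the mnemonic table must be unique.
    for desc, n in Counter(d for _, d in mnemonics).items():
        if n > 1:
            problems.append((1, f"description {desc!r} used {n} times"))

    # (3) Charset names should be unique.
    for name, n in Counter(cs.name for cs in charsets).items():
        if n > 1:
            problems.append((3, f"charset name {name!r} used {n} times"))

    for cs in charsets:
        # (0) Every command must be one of the predefined commands.
        for cmd in cs.commands:
            if cmd not in predefined_commands:
                problems.append((0, f"{cs.name}: unknown command {cmd!r}"))

        used = list(cs.positions.values())
        # (2) Every mnemonic used must be defined in the mnemonic table.
        for m in sorted(set(used) - defined):
            problems.append((2, f"{cs.name}: undefined mnemonic {m!r}"))

        # (4) Flag a mnemonic appearing more than once in the same charset.
        for m, n in Counter(used).items():
            if n > 1:
                problems.append((4, f"{cs.name}: mnemonic {m!r} appears {n} times"))

        # (5) Combining mnemonics must actually occur in the charset.
        present = set(used)
        for m in cs.combining:
            if m not in present:
                problems.append((5, f"{cs.name}: combining mnemonic {m!r} not in charset"))

    return problems


if __name__ == "__main__":
    # Illustrative sample data only, not taken from RFC-CHAR.
    table = [("A", "LATIN CAPITAL LETTER A"), ("a", "LATIN SMALL LETTER A")]
    sets = [Charset("sample", ["code"], {0x41: "A", 0x61: "a", 0x62: "b"})]
    for number, message in check_tables({"code"}, table, sets):
        print(f"check ({number}): {message}")
```

Check (4) is reported alongside the others even though it is not strictly an
error; a caller could filter on the check number to treat it as a warning.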

Checking these things by hand is simply impossible. It has to be done by a
program. I have elected to use RFC-CHAR as a source of the tables to drive my
translation routines. In the process of doing this I have also implemented many
checks for mistakes in the tables.
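Using the tables to drive translation can be sketched in the same spirit.
Assuming, hypothetically, that each charset table has been reduced to a map from
code position to mnemonic, a conversion table between two charsets simply pairs
up code positions that share a mnemonic. The function names, the `?` fallback
for unmappable characters, and the sample data are illustrative assumptions, not
a description of the actual translation routines.

```python
def build_translation(src: dict[int, str], dst: dict[int, str]) -> dict[int, int]:
    """Map code positions in src to code positions in dst via shared mnemonics.

    src, dst: hypothetical code-position -> mnemonic maps for two charsets.
    Characters whose mnemonic has no counterpart in dst are left out.
    """
    # Invert dst to mnemonic -> code position; if a mnemonic appears twice,
    # the lowest code position wins (the situation check (4) flags).
    by_mnemonic: dict[str, int] = {}
    for code, mnemonic in sorted(dst.items()):
        by_mnemonic.setdefault(mnemonic, code)
    return {code: by_mnemonic[m] for code, m in src.items() if m in by_mnemonic}


def translate(data: bytes, table: dict[int, int], fallback: int = ord("?")) -> bytes:
    """Convert single-byte data using a table from build_translation."""
    return bytes(table.get(b, fallback) for b in data)


if __name__ == "__main__":
    # Illustrative sample data only, not copied from RFC-CHAR.
    src = {0x41: "A", 0xE9: "e'"}
    dst = {0x41: "A", 0x82: "e'"}
    table = build_translation(src, dst)
    print(table)                                         # {65: 65, 233: 130}
    print(translate(bytes([0x41, 0xE9, 0x42]), table))   # b'A\x82?'
```

When a mnemonic appears twice in the target charset, this sketch quietly keeps
the lowest code position. That arbitrary choice is exactly why duplicate
mnemonics need to be noted rather than silently accepted.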

There is also a set of checks that has to be done manually. There is no other
way to make sure that Keld's tables accurately reflect the character sets they
reference. Errors can creep in. (Keld has to type this stuff in from printed
pages. I shudder to think of the time he has spent on this. In any case, the
fact that he has to do that should not be held against him; instead it should
be held against the people who produce standards only in printed form.)

Currently my code checks the tables for (1), (3), and (4), and it has already
flushed out a number of mistakes in RFC-CHAR that I never would have caught.
I happened to notice a violation of (2) by eye -- pure luck and nothing
more. I fully intend to check all of these things, and many more besides. So
far I have simply implemented the minimal set of checks needed to get the
conversions I actually need up and running.

Now, there is a problem here, in that I have attempted to test mechanically
only the potential problem areas that concern me as an implementor. An
implementor using RFC-CHAR for something else might have an entirely different
set of concerns. And this is a problem -- a big one, in fact. This is one of
the reasons I'm concerned about the lack of public comment on RFC-CHAR.

                                        Ned