restrictions when defining charsets

As far as I remember the discussions at Santa Fe, the group wanted to
have its own definition of character set.
It was never spelled out in the minutes or anything like that, but
went something like:

A set of rules for the interpretation of an octet stream, such that:
- The interpretation of each byte cannot be questioned
- The number of representable characters is limited
- No further parameters need to be parsed to get the complete
  identity of the character set



"Never spelled out in the minutes or anything like that"?  Doesn't
seem like a good state of affairs to me.  If the above is indeed the
intention, it should be spelled out in MIME itself, perhaps in the
part that shows you how to register a new charset.

I should note that ALL THREE of the above rules came as a surprise to
me today (I didn't attend Santa Fe).  The last two seem reasonable,
and I'm willing to agree to them, but the first one is problematic.
An example of a charset where it is problematic is iso-2022-jp: you
can't tell whether a particular byte is the 1st or 2nd byte of a
Japanese character unless you either backtrack to the beginning of
the byte stream or keep track of the state from the beginning in the
first place.  (That is, assuming I haven't misunderstood the 1st
rule.)
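
To make the iso-2022-jp point concrete, here is a minimal sketch in
Python, assuming its standard "iso2022_jp" codec.  The helper name
interpret() and the sample bytes are only illustrative and are not
taken from any MIME document.  It shows the same four 7-bit bytes
decoding to different text depending on which escape sequence came
earlier in the stream:

```python
# Sketch: in iso-2022-jp, the meaning of a byte depends on the escape
# sequences seen earlier in the stream (the "shift state").
ESC_JIS = b"\x1b$B"    # switch to JIS X 0208: two bytes per character
ESC_ASCII = b"\x1b(B"  # switch back to ASCII: one byte per character


def interpret(payload: bytes, in_kanji_mode: bool) -> str:
    """Decode payload as if the stream were in the given shift state.

    Hypothetical helper for illustration only.
    """
    if in_kanji_mode:
        # Wrap the payload in the escapes a real stream would carry.
        return (ESC_JIS + payload + ESC_ASCII).decode("iso2022_jp")
    # The initial state of an iso-2022-jp stream is ASCII.
    return payload.decode("iso2022_jp")


payload = b"F|K\\"  # four bytes, all printable ASCII on their own

as_ascii = interpret(payload, in_kanji_mode=False)  # four characters
as_kanji = interpret(payload, in_kanji_mode=True)   # two characters

print(repr(as_ascii), repr(as_kanji))
```

A reader handed only the four payload bytes, without the preceding
escape sequence, has no way to choose between the two readings.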

What, exactly, does the first rule mean?  Yes, let's open this can of
worms too.  Why are there so many cans of worms on ietf-822 these
days?  (Coz it's Draft Standard time?  I guess so.)


Thanks in advance for any reply,
Erik
