ietf-822

Re: comments on latest MIME drafts

1995-05-24 15:22:30
Ned Freed writes:

   The ISO definition of the term "coded character set" is as follows:
   "A set of unambiguous rules that establishes a character set and the
   one-to-one relationship between the characters of the set and their
   coded representation." and this definition may be subject to
   different interpretations.

Not only does this RFC1345 definition disagree with the MIME definition of
character set, it also disagrees with the definition of "coded character set"
in "Character Sets Considered Harmful" (draft-ietf-html-charset-harmful-00.txt,
or simply CCH for the rest of this message).

What was stated above was a definition taken from ISO standards; it was
not RFC1345's own definition of the term "charset", which is given
further on in that paragraph.

CCH's definition of "coded character set" is as follows:

coded character set
     A function whose domain is a subset of the integers, and whose
     range is a set of characters.
 
It should be obvious that this is a completely different beast from Keld's, but
in case it isn't, the key difference is Keld's definition calls for a 1:1
mapping from characters to a coded representation.

First of all: the definition paragraph in RFC1345 is not mine; I think it
was written by Olle Jaernefors. Secondly, that paragraph goes on to modify
the original ISO definition so that it does not require a 1:1 mapping. This
was done in one of the sentences that Keith removed in citing me:

   "A coded character set is a set of rules that unambiguously and
   completely determines which sequence of characters, if any, is
   represented by each possible sequence of n-bit bytes for a certain
   value of n." This implies that e.g. a coded character set extended
   with one or more other coded character sets by means of the extension
   techniques of ISO 2022 constitutes a coded character set in its own
   right.  In this memo the term "charset" is used to refer to the above
   interpretation of the ISO term "coded character set".

This doesn't allow for characters with more than one coded representation. The
CCH definition operates in the other direction, saying that each integer in the
set must map into a character.

The refined definition of "charset" allows more than one coded
representation. I think the CCH definition is essentially saying the same
thing as the above definition from RFC1345.
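
To make the difference between the definitions concrete, here is a minimal
sketch in Python. The tables and code values are invented toy data, not taken
from any standard, and the names cch_ccs, iso_ccs, is_one_to_one and decode
are hypothetical. It only illustrates that a function from integers to
characters may give one character several coded representations, while a
one-to-one relationship may not.

```python
# Toy tables for illustration only; the code values are assumptions,
# not taken from ISO, RFC1345 or CCH.

# CCH style: a function from a subset of the integers to characters.
# Nothing stops two integers from mapping to the same character.
cch_ccs = {0x41: "A", 0x42: "B", 0xC1: "A"}

# ISO style: a one-to-one relationship between characters and their
# coded representations.
iso_ccs = {0x41: "A", 0x42: "B"}


def is_one_to_one(ccs):
    """Return True if no character has more than one coded representation."""
    return len(set(ccs.values())) == len(ccs)


def decode(ccs, codes):
    """Return the characters represented by a sequence of codes,
    or None if some code represents no character ("if any")."""
    try:
        return "".join(ccs[code] for code in codes)
    except KeyError:
        return None
```

With these tables, is_one_to_one(iso_ccs) holds and is_one_to_one(cch_ccs)
does not, yet decode() is well defined for both, which is all that the refined
"charset" wording asks for.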

CCH is an attempt to arrive at consistent terminology for future IETF use. This
document's terminology is derived from a variety of sources, including ISO
specifications. (I don't know specifically where Keld's definition comes from.

I think it came from ISO 8859-1:1987, but I do not have it here to check.
I looked at some other ISO standards, like 10367 and 8859-10, and they
had similar but slightly different wording.

It is quite possible that it originates in the ISO as well, since ISO
terminology is known to be inconsistent.)

What I have seen of this term in ISO is not inconsistency, but successive
refinements of the same term. I think that IETF terminology goes through
the same process of refining the wording of terms.

According to CCH, MIME's "character sets" should have been called "character
encoding schemes" or, more simply, "character encodings". We didn't do this and
it is now too late to change in MIME. There is already a note about this
specific discrepancy in the MIME definition of a character set.

I agree that ISO terminology is not precise, and that "character encoding"
is quite a good term for what we define as a "charset".

My point was that we could stay compatible with ISO terminology, and
not pick a term which in ISO has a distinctly different meaning.

Keld
