This is in response to the notes from Vaudreuil, Borenstein, Klensin and
Simonsen who were, in turn, responding to my original note. I've tried
to summarize their set of reactions, but generally without attributing
specifics.
1. Challenge to the timing of this note & claim of no earlier comments
from me:
This was not the first time I've raised this concern. I have been
raising this concern for some months, both publicly and privately. I
decided to raise it rather more forcefully, now, because we are at a
point-of-no-return and this is the one item that I see as having the
potential of seriously injuring the success of RFC XXXX. I believe that
a lack of understanding of character set issues will render
uses of XXXX non-interoperable in cases that should interoperate.
While it probably won't make anyone feel any better, I really am not
happy about raising the issue and suspect that I know just the kind of
frustration this sort of note can engender. And I am hoping that we can
understand the basis for the concern and find a workable resolution to it.
2. Only US ASCII is needed, or we should wait to standardize any other
character sets:
These were remarkable mis-interpretations of my note, since I said
exactly the opposite. I think that the character set efforts should
proceed aggressively and I think that the Internet should be rather
embarrassed that we have not attended to this topic sooner. However,
the complexity of the topic dictates that it be treated separately from
an email format standard, though I believe that it is essential that the
email standard contain the appropriate hook for accessing this other
work. (And I believe charset= is adequate to that end.) This does not
have to introduce any delay at all.
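To make concrete how little the hook itself requires, here is a minimal sketch, assuming Python's standard email package. The sample message and the helper name declared_charset are hypothetical, chosen only for illustration; nothing here is taken from XXXX.

```python
from email import message_from_string

# Hypothetical message; the header values are illustrative only.
RAW = (
    "From: sender@example.org\n"
    "Content-Type: text/plain; charset=ISO-8859-1\n"
    "\n"
    "body\n"
)

def declared_charset(raw_message, default="us-ascii"):
    """Return the lowercased charset= label, or the default if none is given."""
    msg = message_from_string(raw_message)
    # get_content_charset() reads the charset parameter of Content-Type.
    return msg.get_content_charset(failobj=default)

label = declared_charset(RAW)
```

The format standard only has to carry the label; what the label means can live in whatever document the label is registered against.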
3. The topic IS messy and should be left to the experts:
Well, I agree completely. In fact, that is the major reason I think it
should be handled separately from an email format standard. I am trying
to get an email spec, namely XXXX, out of the business of discussing
character set detail, unless it wishes to provide discussion about
translating between character sets, which it currently does not do.
4. The topic is quite stable and well-established standards already
exist:
This item conflicts somewhat with the previous item, which is exactly
the conflict that I read in the set of notes from Vaudreuil, et al. They
seem to have some disparity among themselves, about the breadth and
depth of existing experience. However, there seems to be some common
thread, through their notes, which suggests that a subset of the
documents cited in RFC XXXX really are well-established, have
significant field experience, and are well understood. Assuming
this is true, that is great. It may even make them appropriate to cite
within RFC XXXX, though I claim it isn't necessary. Registering these
character sets with IANA is all that is truly required.
From the collection of responses, it does appear, however, that XXXX has
some citations specified incompletely and/or is citing some
specifications which are quite unstable. At the very minimum, I believe
that all such citations should be removed, since they only serve to pass
their instability on to XXXX.
From the collection of notes, it sounds as if:
8859 has exactly one very-well established part (part 1); does this
overlap with ASCII? If so, how are users of each to interoperate?
Klensin indicates that translation behavior is well-established, but I
see no indication of any such documentation in XXXX, to give guidance to
implementors. How are implementors to know what to do with mail that is
in a different character set than they display (but which they could
translate from, if only they knew how)? It also sounds as if the
citation for 8859 may need tightening. Some notes thought that I was
claiming that 8859 had an uncertain status or that I was otherwise
misrepresenting 8859; I was merely noting the lack of that information
in XXXX.
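On the narrow overlap question, a quick check is possible, assuming Python's built-in "ascii" and "iso-8859-1" codecs are faithful to the two code tables. This is only a sketch of the code-point overlap; it says nothing about the translation guidance that is missing from XXXX.

```python
# Every 7-bit value decodes to the same character under both labels.
seven_bit = bytes(range(128))
overlap = seven_bit.decode("ascii") == seven_bit.decode("iso-8859-1")

# An 8-bit octet is valid in ISO 8859-1 but not in US-ASCII.
octet = b"\xe9"  # LATIN SMALL LETTER E WITH ACUTE in ISO 8859-1
as_latin1 = octet.decode("iso-8859-1")
try:
    octet.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```

So a reader limited to US-ASCII can display 8859-1 mail unchanged only when no octet has the high bit set; anything else requires a documented translation or fallback rule.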
2022jp is claimed to have a solid user base. That is fine, but I believe
that documenting the details of 2022jp within XXXX is entirely
inappropriate. Worse, it sounds as if getting those details correct is
difficult. I do suspect that having a Japanese-only version of the spec
is workable, assuming that appropriately knowledgeable persons can speak
to the IETF/IESG/IAB and convince us of the specification's stability
and experience. Even so, I'm unclear why we would want to specify a
regional convention within XXXX, rather than merely citing it via IANA.
We don't have that kind of detail about ulaw encoding of audio in the
spec.
10646 apparently is every bit as unstable as I had thought.
MNEMONIC is acknowledged to be new and, therefore, untested, as is
RFC-CHAR. In the responses to me, there was some suggestion that I was critical
of them. I am not. Actually, I think they are quite good efforts,
within the limits of my ability to judge this topic. Rather, my point
is that they are working within an area that clearly is taking a long
time to settle down and, therefore, I think that XXXX should detach
itself from the details of that entire realm, except for the charset=
hook, and a pointer to IANA registration of specs.
Some of the references to multiple, interoperable implementations
surprised me, since I don't recall having seen email about email-based
use of these character sets. I would appreciate hearing more (or at
least receiving copies of the previous group discussion about it; sorry
I missed it.)
5. International standards already exist, so the Internet should just
adopt them:
Sorry, but no. The Internet is quite selective in its use of
specifications, including those from outside the Internet. The fact
that a spec is a standard from another international body IS quite
important, but does not guarantee use within the Internet. Worse, it is
quite clear that the character set topic is in considerable flux. It is
not simply that there are multiple specs because there are multiple
real-world character sets; it appears that there are multiple specs
which cover the same territory. That is, specs which overlap. This is
an invitation to interoperability problems and we
should not ignore the potential.
6. RFC XXXX must cite some or all of the character set specifications,
or else there will be no support for multiple character sets:
I believe that this is a technically incorrect assessment of the results
of following my suggestion. RFC XXXX has many places in which it allows
extension to various lists, via IANA. Audio, for example, cites only
one spec, but leaves the door open for more. I also should note that
the debate over the citation for ulaw ended up making things absolutely
as simple as possible, even removing the ability to specify options. I
am merely suggesting that we keep the same philosophy for charset.
In any event, having XXXX point to IANA, for the list of authorized
character sets, is entirely sufficient. I do not understand the
assertion that the failure to include the details in XXXX somehow
cripples XXXX. It doesn't.
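A minimal sketch of the arrangement I have in mind, assuming a simple lookup table stands in for the IANA list. The names REGISTERED, register and decode_body are hypothetical, and the table contents are illustrative, not the actual registry.

```python
# Hypothetical stand-in for an IANA-maintained list; not the real registry.
REGISTERED = {"us-ascii": "ascii"}

def register(label, codec):
    """Add a charset label later, without touching the mail format code."""
    REGISTERED[label.lower()] = codec

def decode_body(octets, label):
    """Decode a body by its charset label; return None for unknown labels."""
    codec = REGISTERED.get(label.lower())
    if codec is None:
        return None  # unregistered charset: the caller decides what to do
    return octets.decode(codec)

before = decode_body(b"\xe9", "ISO-8859-1")  # label not yet registered
register("ISO-8859-1", "iso-8859-1")
after = decode_body(b"\xe9", "ISO-8859-1")
```

The format code never changes when a charset is added; only the table does, which is the same extension pattern XXXX already uses for audio.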
7. Use of "X-" labelled charsets is inappropriate:
While I don't agree with the severity of this response, I think I erred
in referencing only X- labels. Klensin thinks that X- means
experimental; I believe it merely means "private". That is where I erred:
I believe that IANA can register names for specs which are published but
not yet standardized. Hence, the X- needn't be used; the details for
the charset can be available; and it is only the standards status of
each charset spec that would remain at issue.
The RFC Editor probably can clarify this procedural point.
8. ... Wrapping up...
The first Internet network management MIB specified only a very small,
very simple set of variables. Many, many more have been specified since
then. But the concern, initially, was to require only a minimum set, so
that the focus could be on building the network management
infrastructure, rather than on the details of all the network management
information. For example, it took about two additional years to get a
reasonably complete set of MIB extensions for the common media.
I am suggesting that we take a similarly conservative approach for XXXX,
since I see it as serving a similar role of establishing an
infrastructure. This can result in a free market for various
extensions, including character sets, if it does not lock down an issue
too quickly and if we can get the infrastructure stable. The
standardized use of varied character sets, in the Internet, appears to
be a complex issue, and we need an opportunity to gain experience with it. XXXX
provides the platform for gaining that experience, but it gets bogged
down when it tries to state the details of specific character sets. It
shouldn't try. It doesn't need to.
Dave