Nathaniel writes...
Yeah, well, maybe. But I think that "X-ISO-8859-1" is kind of offensive
-- there's hardly anything experimental about it, which is what "X-"
usually implies. How about the following compromise proposal:
...
While I am sympathetic to Dave's concerns -- character set issues are
*very* complex and won't be completely and elegantly "solved" any time
soon, since there are not only multiple Standards but multiple
*models* -- it seems to me that we are in danger of "throwing out the
baby with the bathwater" and taking a serious step backwards.
ISO-8859-n, especially for values of "n" equal to 1, isn't experimental.
It is/they are widely used and, indeed, are implemented in the hardware
of several vendors. For 8859-1 in particular, we even know what to do
with it at gateways into networks whose native character set is not
ASCII-based. Call it ISO-8859-1 (or some agreed-upon lexical variant on
that form) and I can implement gateway code to do rational things with
it. Call it X-ISO-8859-X, or X-foobar, and, if I am cautious, I can't
do a thing with it unless I keep tables of known senders whose
definition of that X-token is the same as mine.
For a gateway between major networks that has to perform character set
translations, X-tokens are pretty close to useless: one can either
reject the mail as untranslatable or can send it through with a "this
may be trash" warning of some sort.
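
To make the gateway argument concrete, here is a minimal sketch in Python
of the decision such a gateway faces. Everything in it is an assumption
for illustration: the function name gateway_translate, the table
KNOWN_CHARSETS, and the choice of an EBCDIC code page (cp037) as the
"native" non-ASCII set are mine, not taken from any of the drafts.

```python
# Illustrative table (an assumption): agreed-upon charset name -> Python codec.
KNOWN_CHARSETS = {
    "US-ASCII": "ascii",
    "ISO-8859-1": "latin-1",
}


def gateway_translate(charset, body, native="cp037", reject_unknown=False):
    """Decide what a translating gateway can do with a message body.

    charset: the charset token from the message header.
    body: the message body as bytes.
    native: codec of the destination network (EBCDIC cp037 here, hypothetical).
    Returns (action, payload), where action is "translated", "warn" or "reject".
    """
    codec = KNOWN_CHARSETS.get(charset.strip().upper())
    if codec is None:
        # X-tokens (and anything else without a shared definition) land here:
        # the gateway cannot translate, only reject or pass through flagged.
        if reject_unknown:
            return ("reject", None)
        return ("warn", body)
    try:
        text = body.decode(codec)
        return ("translated", text.encode(native))
    except UnicodeError:
        # Body does not match its declared charset, or the native set
        # cannot represent it.
        return ("reject", None)
```

With a well-known name the gateway has a translation to apply; with
"X-foobar" the only choices left are the two described above, unless one
keeps per-sender tables of what each X-token means.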
Now the Japanese use of ISO 2022 is a little different from this, since
it is not established in an ISO Standard and we have had some difficulty
obtaining an official definition. But it is in very common use in part
of the world, the people who use it know what it means, it is possible
to build gateways to convert to locally-preferred forms on private
networks if one knows what is coming in, and, again, there is absolutely
nothing experimental about it.
10646 is different, significantly different, for two reasons. First, it
is not a Standard but a proposal that has spent its long life (and
through several versions) mired in controversy. It is impossible to
know what will ultimately appear, and when. And, second, it raises
issues of how to handle and use 16- and/or 32-bit characters and there is
little or no production-level experience with doing that in
heterogeneous data communication environments. So there are strong
arguments for saying "let 10646 stabilize and get itself approved in
some form, then write rules for using it that are consistent with the
final, Standard, definition". Experimenting with X-10646 would
certainly be consistent with that approach.
While I personally like it a lot, and don't see the internal
contradictions in it that I see in the current 10646 draft, RFC-CHAR is
in somewhat similar status. Still evolving a bit, not nearly the kind
of production-use experience that exists with, e.g., ISO 8859-1 or the
Japanese use of 2022. So I can see arguments for saying "let's defer
locking ourselves into that for the moment", even though I hope we can
avoid that decision.
For whatever it is worth, please, folks, remember where this WG started
a year ago. The major mandate--reinterpreted in the spring to separate
out dealing with the transport issues--was to make international
character sets--especially western European character sets--"work" in a
well-defined, canonical way. Let's not let go of that: it is
at least as important to some major communities as sound and pictures.
And, if I'm left in a situation in which I can send international
characters in a canonical way by converting pages to images and then
sending them as image types, but I can't send them canonically as
characters, we may find ourselves wiping out one of the major advantages
of email over fax machines, the manipulability of the transmitted text.
RFC-XXXX defines a place to put the "charset" value, as it does now. It
defines "US-ASCII" as the string to use for expressing, well, US-ASCII.
It says other values may be used among consenting mail systems, and
SUGGESTS that the names given to the character sets should be taken from
RFC-CHAR. End of story. That is functionally equivalent to the current
draft, I believe. Would it be satisfactory to all parties?
No, I don't think so.
--john