ietf-822

Re: RFC 2047 and gatewaying

2003-01-14 09:53:56

Charles Lindsey wrote:
In <3E22ADC2(_dot_)5050702(_at_)Sonietta(_dot_)blilly(_dot_)com> Bruce Lilly 
<blilly(_at_)erols(_dot_)com> writes:


Charles Lindsey wrote:

Not so. Untagged data in "any other" charset would be non-compliant with
the standard,


Clearly, that is being handled as a religious issue; 'thou shalt have no
gods other than utf-8', and would declare those other charsets 'heretical'.
If you think carefully and objectively about why untagged 8-bit content
in charsets other than utf-8 is bad, you will realize why untagged utf-8
is also bad.  But fundamentalist religious zealots never think objectively
about their beliefs...


Currently, the email standards require that 'thou shalt have no gods other
than US-ASCII' (and that position does seem to be being defended in the
manner of a religious issue).

That is an inaccurate characterization of (a) the standards, and (b) the
positions being taken.  The standards specify *a subset* of us-ascii due
to a number of historical reasons, including the basis of initial electronic
mail (prior to SMTP) on FTP and TELNET protocols, network limitations, and
some charset conversion issues.  MIME extended that to provide for
representation of the remainder of us-ascii as well as other charsets by
the means which is currently defined in RFC 2047 as amended by RFC 2231
and errata, and which has evolved to include language-tagging capability
as well.  With the exception of providing for us-ascii as a *default*, all
charsets are treated equally. If a default were not provided, one would
need to tag everything. US-ASCII as a default is both consistent with
historical practice (arising out of the origins of the Internet as a US
Department of Defense Advanced Research Projects Agency effort) and as a
common subset of many other charsets (including most of the ISO-8859
series).  In some cases, even with the default, it is necessary to use
tagging, e.g. for =?us-ascii*en?q?boot?=. There is a great deal of
software, widely implemented and interoperating, based on the standards.
The position regarding the standards can be summed up as:
1. compatibility with those earlier standards is critical to maintaining
   interoperability with good-faith existing implementations compliant
   with those standards -- injecting illegal data (w.r.t. existing
   standards) is not conforming.
2. compatibility and interoperability can only be assured by proper
   formulation of a standard. They cannot be achieved by an "it sorta
   kinda works (maybe) some of the time" approach.
3. compatibility with existing standards means *true* compatibility,
   not "I'll ignore the part about header content in the MIME-part
   headers of message and multipart composite media types", for example.
4. compatibility and interoperability considerations require looking at
   the "big picture", not just an isolated niche. In this case, that
   means considering gateways to email, which may subsequently hit list
   expanders, travel over diverse networks, etc., and IMAP, which has
   specific features for news as well as email.
5. outright violation of existing draft standards is simply unacceptable
   (whether bald-faced or via a "back-door" approach).
6. a single scheme (e.g. use 2047/2231) is preferable to multiple choice
   in standards
7. one cannot avoid encoding; utf-8 is yet another encoding
8. language tagging is important in many contexts (RFC 2277, sect. 4)
9. scalable interoperability is more important than compatibility with
   existing practices which are non-conformant with existing standards
10. RFC 2047 encoding is intended for items used for human (not machine)
   interpretation
11. existence of a single implementation that handles some input X is
   uninteresting -- what is important (w.r.t. compatibility and
   interoperability) is what happens to implementations which conform
   to existing standards when presented with X
12. "elegance" and "aesthetics" are subordinate to interoperability
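
To make the language-tagged form =?us-ascii*en?q?boot?= concrete, here is
a minimal Python sketch of how an encoded-word carrying the optional
language suffix might be split and decoded. The function name
decode_encoded_word and the regular expression are illustrative
assumptions, not taken from any RFC text or library; the sketch ignores
length limits, folding, and adjacent encoded-words.

```python
import base64
import re

# =?charset[*language]?encoding?encoded-text?=
ENCODED_WORD = re.compile(
    r"=\?([^?*\s]+)(?:\*([^?\s]+))?\?([bBqQ])\?([^?\s]*)\?="
)

def _decode_q(payload):
    # Q encoding: "_" stands for a space, "=XX" is one hex-escaped octet.
    text = payload.replace("_", " ")
    text = re.sub(r"=([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)), text)
    # Every character is now in the range 0..255, so latin-1 maps it 1:1 to octets.
    return text.encode("latin-1")

def decode_encoded_word(word):
    """Return (decoded_text, language); language is None when untagged."""
    m = ENCODED_WORD.fullmatch(word)
    if m is None:
        raise ValueError("not an encoded-word: %r" % (word,))
    charset, language, encoding, payload = m.groups()
    if encoding.lower() == "b":
        raw = base64.b64decode(payload)
    else:
        raw = _decode_q(payload)
    return raw.decode(charset), language

print(decode_encoded_word("=?us-ascii*en?q?boot?="))
```

The charset tag tells the reader how to turn octets into characters, and
the language tag says which language the resulting text is in; neither
can be recovered reliably from the octets alone.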

Exclusive worship of the untagged utf-8 god requires sacrificing
compatibility, interoperability, existing conforming implementations,
scalability, internationalization (language-tagging), and simplicity
of software implementation.  Conversely, observing the principles
expounded above ensures those attributes, but prevents neither free
worship of any of the charset gods nor charset agnosticism.

> Expanding that, within Netnews, to 'thou
> shalt have no gods other than utf-8' is clearly a viable option,

Bad premise -> bad conclusion. GIGO.

> whereas
> saying 'thou mayst have any untagged gods thou likest' is not viable
> because there is no way to tell which god you are invoking, which is going
> to cause some considerable confusion on the top of Mount Olympus.

On that we can agree.  But I see no problem with permitting any *tagged*
charset, while there are problems with *any* untagged charset (unless
it happens to be a common denominator that is used in the current
standards, and therefore has no backwards-compatibility issues, such
as the subset of us-ascii called netascii).
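
The ambiguity of untagged 8-bit data can be illustrated with a short
Python sketch: the same octets decode without error under more than one
charset and yield different text. The helper candidate_readings is
hypothetical, and the list of charsets tried is an arbitrary assumption
chosen for the example.

```python
def candidate_readings(raw, charsets=("us-ascii", "utf-8", "iso-8859-1")):
    """Map each charset to the text it yields for raw, or None on failure."""
    readings = {}
    for charset in charsets:
        try:
            readings[charset] = raw.decode(charset)
        except UnicodeDecodeError:
            readings[charset] = None
    return readings

# Five untagged octets as they might appear in a header field body.
untagged = b"caf\xc3\xa9"
for charset, text in candidate_readings(untagged).items():
    print(charset, repr(text))
```

Here the us-ascii attempt fails, while both utf-8 and iso-8859-1 succeed
with different results, so a receiver cannot tell from the octets alone
which reading the sender intended. A tag removes the guess.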

Indeed, some GB18030 heretics may try to get away with it, but at least it
is easy to detect those heresies when they occur.

I have no objection to properly-tagged use of GB18030 -- it's in the
official IANA charset list.  Ditto for properly-tagged utf-8.

