ietf-822

Re: Transformation of Non-ASCII headers

2003-02-03 14:29:23

Charles Lindsey wrote:
In <3E32A979(_dot_)5080904(_at_)erols(_dot_)com> Bruce Lilly 
<blilly(_at_)erols(_dot_)com> writes:


Charles Lindsey wrote:


In <3E2EB579(_dot_)5080200(_at_)erols(_dot_)com> Bruce Lilly 
<blilly(_at_)erols(_dot_)com> writes:



1. The draft is incompatible with the MIME standards ...

The existing MIME standards relate only to Email.


Untrue; MIME has many uses, including HTTP and voice messaging.


The only reason that MIME applies to HTTP is that RFC 2616 explicitly
declares it to be so (modulo some small changes). So the MIME standards do
not currently apply to Netnews (but our draft will fix that).

Incorrect; RFC 1036 specifically mentions that news articles use the
text message format defined in RFC 822, and RFC 2047 explicitly revises
the RFC 822 ABNF for phrase and for comment to permit encoded-words
(see RFC 2047 section 5 which clearly and unambiguously spells this out).
So in the specific case of encoded-words (which is the topic under
discussion), MIME has applied to news for at least a decade (since
RFC 1342).

Neither utf-8 nor any other charset provides language tagging.
Unicode 3.1 and 3.2 documents provide for language tagging specifically
for ACAP; news articles != ACAP.  No known news or mail user agents
generate or recognize the Unicode 3.1/3.2 language tagging.


What is ACAP? I see no mention of it in the Unicode documents. Anyway,
there seems to be some confusion in this area, so I am writing to Martin
Duerst for clarification.

This was all thrashed out many months ago. Look in the archive.
Or go to the Unicode consortium web site and search for "ACAP"
(which yields precisely one match).  Look up ACAP in the rfc-index
(FYI, it's related to MIBs and SNMP).

No sane, rational person who has given a moment's thought would propose
such nonsense. The fields in a message header may contain multiple languages.
Indeed, a single field may contain multiple languages, e.g.:
Keywords: =?us-ascii*en?q?boat?=, =?iso-8859-1*de?q?boot?=
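
For illustration only, here is a minimal sketch of how an agent might pull the charset and language out of encoded-words of this shape (RFC 2047 syntax with the RFC 2231 "*language" suffix). The regular expression and the function name parse_encoded_words are assumptions made for this sketch, not part of any standard library, and the sample field body writes the encoding marker on both words.

```python
import base64
import binascii
import re

# charset, optional "*language" suffix (RFC 2231), encoding letter, payload
ENCODED_WORD = re.compile(
    r"=\?([^?*\s]+)(?:\*([^?\s]+))?\?([QqBb])\?([^?\s]*)\?="
)

def parse_encoded_words(field_body):
    """Return a (text, charset, language) tuple for each encoded-word found."""
    results = []
    for charset, language, encoding, payload in ENCODED_WORD.findall(field_body):
        if encoding.lower() == "q":
            # header=True makes "_" decode to a space, as RFC 2047 requires
            raw = binascii.a2b_qp(payload.encode("ascii"), header=True)
        else:
            raw = base64.b64decode(payload)
        results.append((raw.decode(charset), charset.lower(), language or None))
    return results

keywords = "=?us-ascii*en?q?boat?=, =?iso-8859-1*de?q?boot?="
for text, charset, language in parse_encoded_words(keywords):
    print(f"{text!r} charset={charset} language={language}")
```

Every octet of the tagged field body is printable us-ascii, so the tags can be typed and read without any tooling.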


Ah! You mean I might write a message with
    Subject: I left my boots im boot
which would surely confuse any agent with a text-to-audio facility.

So how would you write that header (in current email)? And do you suppose
any user will ever go to the trouble of writing language tags in headers
even if we give him the facility to do so?

With the appropriate language tags, of course, that's how (it
depends on where the language changes, which is not entirely clear
from your example).  Users may do so where they recognize the
potential for confusion and where their user agent of choice
provides a convenient facility for doing so, or where the user
goes to the trouble to provide tags for the benefits of his
readers in the absence of UA support.  At least it is relatively
simple for a user to do the latter; he has only to use us-ascii
characters -- without consulting a table of Unicode tags and/or
utf-8 encoding tables, how would you present the raw
utf-8/Unicode 3.1 equivalent of the Keywords example?
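
As a rough sketch of what that raw equivalent would involve, the snippet below spells the language codes with the Plane 14 tag characters (U+E0001 LANGUAGE TAG followed by one tag character per ASCII letter, each at U+E0000 plus the ASCII code) and dumps the resulting UTF-8 octets. The helper names are hypothetical, and the layout of the field body (a tag before each keyword) is an assumption about how such tagging would be applied.

```python
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG

def unicode_language_tag(lang):
    """Spell an ASCII language code with Plane 14 tag characters."""
    return LANGUAGE_TAG + "".join(chr(0xE0000 + ord(c)) for c in lang)

def tagged(lang, text):
    """Prefix text with the tag sequence for lang."""
    return unicode_language_tag(lang) + text

field_body = tagged("en", "boat") + ", " + tagged("de", "boot")
raw = field_body.encode("utf-8")

# Each tag character occupies four octets, all with the high bit set.
print(raw.hex(" "))
print(len(raw), "octets in the tagged raw UTF-8 form")
```

Each language tag costs twelve non-printable octets that a user would have to look up, which is the practical difficulty the question points at.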

3. There is no backwards compatibility ...

What you describe is not backwards compatibility, it is forwards compatibility
(existing content vs. some postulated future software). That is sophistry. You
know what backwards compatibility really means; it has been explained to you
multiple times by many people (including those in the IESG who will have to
review and vote on the draft if it ever is submitted). Presenting software which
has been designed in good faith in compliance with RFC 1036, MIME, etc. with
data which is illegal (8-bit content in message or MIME-part header fields) is
not backwards compatible.


You will not win any arguments by changing the meaning of the English
language to prove your point.

If there is a feature in some existing protocol which becomes illegal in
the new version, then that is a breach of "backwards compatibility".

Sorry, no -- you have that wrong.

If there is a feature in some new protocol that will not work with
implementations written for the old protocol, then that is a breach of
"forwards compatibility".

Nope, wrong again.

It seems you want to define these terms the other way around. Does anyone
else here agree with your view? The weight of evidence is surely against
you.

Don't take my word for it. Dave Crocker has clearly and unambiguously
explained to you what backwards compatibility does and does not
mean to the IETF, which is what counts.  And inflicting illegal
(per existing standard) content on existing implementations (e.g.
of gateways), as the Usefor draft would do, is most definitely
contrary to maintaining backwards compatibility.

A moment's thought should make it clear that "particular attention
to backwards compatibility" in the Usefor WG charter refers to the
same thing as the IETF usage -- one would hardly require paying
particular attention to the reverse, i.e. old articles being fed
into new implementations; after about 1 day such old articles
would be rejected as stale, and would hardly warrant "particular
attention".

Just to check, I grepped through my small collection of online RFCs, and
found usage of "backwards compatibility" in my sense in RFCs 1945, 2060 and
2156. I found only one (RFC 2646) that seemed to use it your way, so the
majority is against you. You are, however, welcome to produce further
examples.

[Apologies to everybody other than Charles for repeating the content
below rather than simply providing references to the online archives,
but in the past when presented with references Charles has whined about
having to fire up a browser, etc.]

Reread what Dave wrote in 
<38167744153(_dot_)20030107093025(_at_)brandenburg(_dot_)com>:
"
I have included segments from your posting that pertain to the question of
impact on installed base.  This issue is fundamental.

To add to Ned's correction, news->email gatewaying has been around for at
least 20 years, as far as I can recall. Organizations have often wanted to
plug mailing lists into newsgroups. To permit the newsgroup readers to
participate in the mailing list, a news->email gateway is required. In other
environments, the 2-way gatewaying is simply part of a model that lets users
decide how they want to receive and process their group discussion messages.

With respect to the appearance of new capabilities (such as new data formats
and encodings) in the installed base, your use of the term "gobbledegook"
needs to be made more precise. If it means that a legacy system
receives the data and the data are legal but meaningless -- MIME base64 is
an example of this approach -- then it is fine. It permits incremental adoption
without breaking existing systems. The downside for existing systems is that
they do not get the benefit of the new feature, but everything else
continues to work fine.

If it means that the legacy system receives data that are illegal, with
respect to the existing service, then this is not at all fine.
Interoperability requires conforming to a standard, rather than conforming
to it whenever it is convenient. Injecting illegal data is not conforming.

The difference between an upgrade that protects the installed base, versus
an upgrade that requires creating a new installed base is absolutely
fundamental.  Each can be appropriate but explicitly choosing between them
is essential.

The importance of preserving the installed base -- by way of doing an
incremental upgrade rather than a parallel replacement -- is frequently
missed. However the market power of an installed base is massive. Engineers
and vendors ignore it at their peril.

The very strong IETF philosophy has been to try to preserve the installed
base, through optional, incremental upgrade.  It is part of the reason that the
architectural history has been to keep the core infrastructure as simple as
possible.

Infrastructure changes rarely permit incremental upgrade. Hence, a change to
the infrastructure often must be done to the *entire* infrastructure before
there can be benefit to any users. Think about the effort -- and delay --
this requires for a large infrastructure, like the IP Internet, or Usenet. A
delay that is effectively infinite becomes probable when the infrastructure
is very large.

It is certainly true that this approach creates solutions that are ugly. The
encoding aspect of MIME is the epitome of ugliness, in my opinion. And, yes,
incremental ugliness can create an aggregate messiness that eventually
requires complete replacement.  And it is important to look for that
critical mass of cumbersomeness.

But the reason for choosing the MIME ugliness was to permit incremental
upgrade while preserving the installed base -- and to make the upgrade be
strictly in the leaves, with no changes at all to the email infrastructure.

There were previous attempts to introduce multi-media email to the Internet,
through operation of an independent, parallel service that had a multi-media
core.  The installed base ignored those efforts.
"

and in <15225657263(_dot_)20030109074805(_at_)brandenburg(_dot_)com>
"
The concern is not whether there is an existence proof for the capability,
or even whether there is already a strong support base. The concern is what
happens to the platforms that do not yet support it.
"

and in  <868770917(_dot_)20030109194639(_at_)brandenburg(_dot_)com>
"
you seem to miss the point that the IETF does not make new standards that
break old standards.  parallel effort is fine, but you do not lay down a new
thing that violates an old one.

your use of "unlikely" basically ends this discussion.

standards work uses statistics to make guesses about needs, but it does not
use it to decide whether a technology is valid.  either it breaks things or
it doesn't.  "unlikely" is not in the vocabulary.  (it breaks the
discussion.)
"

and in <736366624(_dot_)20030110115649(_at_)brandenburg(_dot_)com>
"
Except in any of the places in the installed base that are not expecting
UTF-8.

Oh.  That's right.  That means every place using current standards.

oops.
"

and in <89108836668(_dot_)20030111162443(_at_)brandenburg(_dot_)com>
"
1.  netascii has a 30-year installed base.

2.  compatibility with netascii is an explicit goal when working with email.

3.  the logic behind that goal is applicable to any other application having
a similar installed base history.

4.  paying attention to the installed base requires worrying about
compatibility with what has been established practise, not what "might" work
or what is "frequently" available.

5.  encodings are different from representations.  representation is the
real, "native" data.  encoding is a way of transporting that data over a
constrained environment.

6.  the fact that one 7-bit encoding scheme was not successful does not
condemn all 7-bit encoding schemes.

7.  breaking standards is not measured by whether someone's code core-dumps.
it is measured by whether it violates the specification.  sending 8-bit data
in a 7-bit environment breaks the specification.  sending valid strings of 7-bit
data that might need further interpretation (to obtain the semantics) does
not.
"
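
The distinction drawn in point 7 can be sketched in a few lines: the same text is illegal as raw 8-bit octets in a header field, but legal once carried as an RFC 2047 encoded-word. The function name is_legal_7bit and the sample subject are hypothetical, chosen only for this illustration; Header comes from Python's standard email package.

```python
from email.header import Header

def is_legal_7bit(field_body: bytes) -> bool:
    """True when every octet is in the US-ASCII range 0..127."""
    return all(octet < 128 for octet in field_body)

subject = "Grüße aus Köln"  # hypothetical sample text

# Representation: the native text as raw UTF-8 octets (8-bit).
raw_utf8 = subject.encode("utf-8")

# Encoding: the same text carried as an RFC 2047 encoded-word (7-bit).
encoded = Header(subject, charset="utf-8").encode().encode("ascii")

print(is_legal_7bit(raw_utf8))   # raw form contains octets >= 128
print(is_legal_7bit(encoded))    # encoded-word form is pure us-ascii
print(encoded.decode("ascii"))
```

A legacy agent that does not understand the encoded-word still receives legal data; it merely cannot display the intended text.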

and in <138136228145(_dot_)20030114085738(_at_)brandenburg(_dot_)com>
"
The issue is not whether there is an existence proof for the practise you
want to standardize.  The issue is, instead, the impact of a new standard on
the installed base that uses the old standard.
"

Others have also given similar advice, and I could have quoted Ned Freed
or others as well.  After all of that, do you really fail to comprehend
what backward compatibility means in terms of protecting the installed
base, or are we just witnessing more of your sophistry?

4. Compatibility (as exists under RFC 1036) would be broken with interoperating
 protocols including IMAP and SMTP.


There is no requirement to transport Netnews articles by SMTP. In the case
of moderation, they are required first to be converted to Email messages
(and our draft tells you how). The Email messages may then be transported
by SMTP.

Problem is, the draft method isn't part of the existing installed
base of software.  Under the current standard, news articles
*are* email messages (with some minor differences), so a news
article *can* be (and is) mailed to a moderator, almost always
via SMTP (and possibly via IMAP as well).

No, there is no problem with IMAP, which works quite well for mail and news,
and the IMAP RFCs are fully compliant with RFC 1036 and other relevant
RFCs. ...


IMAP works with the existing news protocols. If, as a result of our draft,
the news protocols change, then a matching change in the IMAP protocol
would be in order.

That would again break backwards compatibility, and so would almost
certainly never be approved as such. Now, if you want to propose such
an incompatible format as part of a separate, parallel implementation,
that's another matter.  That may well suffer the same fate as the
incompatible multimedia mail implementations mentioned by Dave Crocker.
Perhaps not. But existing protocols, software installations, etc. could
either continue to support RFC 1036 as is or adopt the hypothetical
new format or both, where practical.

5. Provision of two alternative mechanisms ... is a dubious practice.

But necessary in order to pave the way for future progress. The
alternative is for all the internet protocols to remain stagnant for the
next 1000 years.


The "1000 years" remark is the height of absurdity.


If you insist that forwards incompatibility can NEVER be allowed, then you
should be aware that "1000 years" is considerably less than "NEVER". I do
not see the absurdity of that.

Your failure to see the absurdity does not make it vanish. We
are talking about backwards compatibility, and there are mechanisms,
discussed at length in the past, which permit gradual migration
without breaking backwards compatibility.  If you cannot see the
absurdity of making a 1000 year prediction based on something
like a 30-year history (that's for TCP/IP and all of the protocols
built on top of it) in which changes in technology have been
accelerating (compare the number of RFCs issued per decade), then
you have a problem that I cannot help you to solve.

6. The draft requires modification of non-trace header fields, which is incompatible
 with message signing. ...


I am not aware of any circumstance where non-trace field modification can
break S/MIME.


Modification of body content (e.g. MIME-part header fields) is one such case.


No, this situation cannot arise. See RFC 3156 which states:
    "For this reason all data signed according to this protocol MUST be
     constrained to 7 bits (8- bit data MUST be encoded using either
     Quoted-Printable or Base64)."
So you could not have a UTF-8 header such as
    Content-Disposition: attachment; filename = "æøå"
within the signed portion (so RFC 2231 encoding would be necessary in that
case).
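
As a sketch of what the RFC 2231 form of that parameter could look like, the snippet below uses encode_rfc2231 from Python's standard email.utils module. The choice of utf-8 as the declared charset and the empty language field are assumptions for the illustration.

```python
from email.utils import encode_rfc2231
from urllib.parse import unquote

filename = "æøå"

# charset'language'percent-encoded-octets; the language is left empty here
value = encode_rfc2231(filename, charset="utf-8")
header = f"Content-Disposition: attachment; filename*={value}"
print(header)

# The header line is pure us-ascii and the original name is recoverable.
charset, language, quoted = value.split("'", 2)
print(header.isascii(), unquote(quoted, encoding=charset) == filename)
```

The resulting line satisfies the 7-bit constraint, so it can sit inside a signed portion without needing modification in transit.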

That is precisely why signed content must originate in a
compatible format at the UA and be maintained so.  It is
pointless to try to sign something that is required to be
modified (e.g. by a gateway to a moderator) in transit.

7. The use of raw utf-8 is incompatible with administrative tools such as grep ...

Modern versions of grep can handle UTF-8 without problem. ...


Examples have been provided by others.


And refuted by others.

And the refutations have been refuted. OK, so that discussion
is going nowhere. No matter, it's only one point of many, and
a relatively minor one.

8. The draft provides no means of moderation ...

The draft describes in great detail how to communicate with moderators

...

No, it does not.


Then please point out where it fails to do so. We discussed this very
extensively some months ago.

Indeed we did. And what I actually said is "The draft provides no
means of moderation of extended newsgroups which is both acceptable
to moderators and compatible with the existing installed base of
RFC 1036-compliant UAs and gateways.", and that remains the case.
Encapsulation is unacceptable to moderators, and either passing
raw utf-8 or attempting to second-guess the UA in a gateway is
incompatible with the installed base (for that matter, encapsulation
is also incompatible with the existing installed base).

Moderators who wish to forward articles
making use of the newly introduced features will, of course,


There are, of course, no new features.


If there are "no new features" in the draft, then what the Hell are you
complaining about?

A "feature" is something like internationalized display names,
comments, and text in unstructured fields.  And that feature has
been around for over a decade in a backwards-compatible manner via
MIME.  Raw utf-8 is not a feature; it is a means of implementation.
In this case it also happens to be incompatible with the installed
base, inconsistent with Best Current Practice, etc.

One problem is the originator's UA.  The draft currently permits the UA to
generate illegal content, ...


May I suggest you take the trouble to READ the latest draft. The UA
problem was fixed months ago. It would have been helpful if you had taken
the trouble to comment upon my fix when I presented it.

I have read it; the problem persists. Existing gateways cannot handle
the raw utf-8 content (content which is illegal under the current
standard) that the draft permits UAs to generate.