ietf-822

Re: Why current RFC-XXXX is unsuitable for non-English languages (Re: Let us finish RFC-XXXX NOW!)

1991-09-28 03:46:46
Various authors write:

I think I have to clear up a couple of things, perhaps to smooth the
very hard discussion just now. When Peter talks about "using our
special characters", he means that before RFC-XXXX, all mailers and
computers in Sweden used, for example, the left brace for adiaeresis.
The left brace was displayed as an adiaeresis, and the keyboard was
also labelled that way, so the user actually thought he was using the
adiaeresis and not (as he was doing) the left brace.
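
To make the situation described above concrete, here is a minimal sketch of how the same 7-bit codes read on a Swedish terminal and on a plain ASCII one. It assumes the usual Swedish national 7-bit variant, in which six ASCII code points are displayed as national letters; the names PAIRS, as_seen_in_sweden and as_seen_elsewhere are made up for this illustration and do not come from any mailer.

```python
# Assumed mapping: Swedish 7-bit national variant of ASCII.
# Six bracket/brace code points are shown as national letters.
PAIRS = {
    "[": "Ä", "\\": "Ö", "]": "Å",
    "{": "ä", "|": "ö", "}": "å",
}

TO_NATIONAL = str.maketrans(PAIRS)
TO_ASCII = str.maketrans({letter: code for code, letter in PAIRS.items()})


def as_seen_in_sweden(wire_text: str) -> str:
    """Render 7-bit text the way a Swedish terminal would display it."""
    return wire_text.translate(TO_NATIONAL)


def as_seen_elsewhere(typed_text: str) -> str:
    """Show what actually goes on the wire when a user types national letters."""
    return typed_text.translate(TO_ASCII)


if __name__ == "__main__":
    print(as_seen_elsewhere("Subject: räksmörgås"))
    print(as_seen_in_sweden("Subject: r{ksm|rg}s"))
```

The point of the sketch is that nothing on the wire says which reading is intended: the bytes are legal ASCII, and only the receiving terminal decides whether they are braces or letters.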

Now almost all computers are turning to 8-bit (at least). Sun, among
others, has supported ISO-8859-1 for a year or so. Our users continue to
use what they think is adiaeresis, which on a Sun has now suddenly become
a prohibited character in the Subject line.

Not true. It was prohibited by standards before. It is prohibited now.

We do not violate RFC822! On NORDUnet we have for several years
been running a conversion of 8-bit into 7-bit on ALL of our mailers.
We have also used an "ISOC-8859-1" extension to RFC822 (here we
actually have a violation...) to check whether the receiver can accept
ISO-8859-1 text. The problem is how to use the different characters.

I'm glad you finally say it is a violation. Let's call it what it is, OK?

I note in passing that your voice was not raised in the debate about how to
accomplish all of this using mnemonic or whatever. (At least I don't recall any
input from you -- correct me if I'm wrong.) We could have used your input
then...

I must correct you now. One of the first proposals was made by
a group inside the NORDUNET-ETF, and that started all this debate.
We made the TEXT-HEX encoding, we have discussed with Keld his
mnemonic encoding and hopefully helped him with some good input,
and NORDUnet has also made a proposal for RFC-822 that has been
discussed at the IETF meetings by our representative,
Jan Michael Rynning, so I think we have surely done our share of
work over here in the Nordic countries too. But still, there are only
8 million of us in Sweden, far fewer than the number of people in
New York, so you might not have heard our voice.

Was it presented on this list? I remember TEX-HEX being mentioned a couple of
times. I don't think an explanation of what TEX-HEX is was ever posted, despite
requests for it.

I don't much care how many of you there are in Sweden, or how many of us there
are in America, for that matter. What I do care about is that I don't see you
supporting issues that apparently matter very much to you until a decision to
defer is reached, and then you object to the decision that was reached rather
than contributing to the discussion that led up to it.

The ONLY argument against your great work to make the RFC-XXXX is the
use of non-ASCII in headers.

To repeat the present position: The current proposal is not to address this
issue in RFC-XXXX. In other words, we are changing nothing. What is illegal
before remains illegal now. By doing this we can reach closure on RFC-XXXX
without having to wait to close this issue. Now that we seem to have your
attention we may be able to drive this to closure relatively rapidly ;-)

I think that we should then produce a separate document (yes, I'm willing to
help write it) that describes whatever approach we want to take to the header
character set problem. If this can occur within the right time frame it may
well be possible to fold it into RFC-XXXX, but we can cross that bridge when we
come to it. Having it in a separate RFC is not the end of the world.

Our "positive" proposal (now repeated for the third time) is to
have a separate header to describe the character sets used in the Subject,
as you saw in the first message from Peter that started this unnecessary
flaming.

Actually, this is not the third repetition. It is about the tenth. This is
pretty much the same as what Bob Smart and I, among others, have been proposing
for about a month.

We do understand you and your arguments I think, but still: Why stop now!
We do not want to use our local usage of ASCII.

We want to stop now because there is great pressure to achieve closure on what
we have now. My count of the participants prior to deciding to shelve this
issue showed that people were split right down the middle on how to solve
this problem (actually, it was Bob Smart's tally and not mine). You have now
jumped into the fray. Fine, let's use that to achieve some closure here, but
let's not hold up other orthogonal material in order to deal with this stuff.

Perhaps it will be as Keld writes: we will still have to use
our old 7-bit Swedish, Danish, or Norwegian version of ASCII. I hope
not.

Please, let's not reduce this to a country-specific issue. I've already argued
this point and I do not propose to repeat what I have said before.

Your persistence in thinking that this is a problem that we provincial
Americans don't want to grapple with turns into a self-fulfilling prophecy. By
yelling and screaming that we're overlooking your needs you become your own
worst enemy and virtually guarantee that we will give your views less
attention than they deserve.

Besides, I happen to agree (basically) with your approach. I'm not the
person you have to convince.

...

As I said in my first contribution, I am now only concerned with the
"*text" fields (actually only the Subject field). Isn't it reasonable
to treat them separately from the fields which are parsed?

I think it is. Other people did not agree and we could not come to closure on
this.

Yes, we have been violating RFC822 when we have been using non-ASCII,
but what would you expect? That we all should have been communicating
in English? Besides, in the "*text" headers, the violation consists only
in that the "}" and "{" etc. were not meant to be interpreted by the
receiver as braces. Do you have examples of problems this has caused?

Yes I do. I have presented them on the list previously. Look them up.

No, nothing is changed, but that's the problem -- the body character
set changes, but not the subject ditto! Observe that I'm not talking
theory but reality -- the users *have* been using and *will* continue
to use special characters.

Everyone realizes this, but the problem remains that we have not come to
closure on these issues yet.

You are living in a fantasy world if you think
your present violations of RFC822 and RFC821 are not causing operability
problems. Sorry, they do cause them. We're trying to come up with a
scheme that does not.

Could you give examples?  Does this really include the Subject line?

I have already presented examples on this list more times than I can count.
Look them up. No, they did not include problems specifically related to
the Subject: line. I can construct artificial problems that are specific
to Subject: lines, but I'm not going to bother -- the present body of
text on problems is plenty large enough.

... And consequently I have a little trouble with dealing with your input
now. If you want to solve these problems, offer up a proposal that addresses
the issues at hand, or endorse one or more of the proposals we have in
front of us that addresses the issues.

I thought I did that, but as several seem to have missed it, I repeat:

    So far I have not seen anybody argue that there would be any parsing
    problems with the headers which contain only text: Subject, Comments
    and (from RFC-XXXX) Content-Description. How about adding two new
    header-fields, to be applied to *only* those headers:

          Text-Header-Field-Type
          Text-Header-Field-Transfer-Encoding

    which are directly parallel to Content-Type and Content-Transfer-
    Encoding respectively, with the restriction that the only permitted
    values for the former are Text/* and Text-Plus/* (and X-*).  (This idea
    is not mine, I'm just supporting and forwarding it.)

You have specified different headers, but apart from that this is pretty much
the proposal Bob Smart and I have been advocating. We advocated the
specification of the character set and the encoding used on it in a single
header, and we advocated the use of mnemonic (although something like
8859-1/quoted-printable would be allowed). I don't have strong feelings about
the exact form the information takes; all I care about is
that it be there.
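
As a rough illustration of how a receiving agent might apply the two-header variant of this proposal, here is a minimal sketch. It is not a specification: the field names are the ones quoted above, but the value syntax ("text/plain; charset=iso-8859-1" and "quoted-printable") is an assumption borrowed from the body-part conventions, and the sample message, the example.se address and the function name decode_text_headers are invented for the example.

```python
import quopri
from email import message_from_string

# Hypothetical message using the proposed (never standardized) header fields.
RAW = (
    "From: peter@example.se\n"
    "Subject: R=E4ksm=F6rg=E5s till lunch\n"
    "Text-Header-Field-Type: text/plain; charset=iso-8859-1\n"
    "Text-Header-Field-Transfer-Encoding: quoted-printable\n"
    "\n"
    "body\n"
)

# The proposal applies only to the unstructured, text-only fields.
TEXT_ONLY_FIELDS = ("Subject", "Comments", "Content-Description")


def decode_text_headers(raw: str) -> dict:
    """Decode the text-only fields according to the two proposed headers."""
    msg = message_from_string(raw)
    field_type = msg.get("Text-Header-Field-Type", "text/plain; charset=us-ascii")
    encoding = msg.get("Text-Header-Field-Transfer-Encoding", "7bit").strip().lower()

    # Assumed parameter syntax: "type/subtype; charset=NAME".
    charset = "us-ascii"
    for param in field_type.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            charset = value.strip().strip('"')

    decoded = {}
    for field in TEXT_ONLY_FIELDS:
        value = msg.get(field)
        if value is None:
            continue
        data = value.encode("ascii")  # the wire form stays pure 7-bit
        if encoding == "quoted-printable":
            data = quopri.decodestring(data)
        decoded[field] = data.decode(charset)
    return decoded


if __name__ == "__main__":
    print(decode_text_headers(RAW))
```

Parsed fields such as From and To are untouched in this sketch, which is the separation between text-only and structured headers that the proposal relies on. A single combined header, as Bob Smart and I suggested, would only change how charset and encoding are read, not the decoding step.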

I went a little further and proposed covering phrases before route
addresses and comments with the encoding. I'll post a message in a bit
that summarizes where this has gone.

...

In the first instance the
headers will be an area where private agreements continue but as Ned
has indicated this is not intended to be a permanent, or hopefully
even a long-lived, situation.

Do you really mean that the overwhelming majority of all mail
exchanged in non-English speaking countries should be referred
to the area of "private agreements"?

That's the way it is now. I propose to change it, as does Bob, but not
necessarily with RFC-XXXX.

  The new 822+XXXX standard
should be so restricted as far as the Subject line is concerned
that it is adequate only for communication in English (and maybe
Italian and Swahili also)?  A new standard should be published
which allows me to combine typographically pleasing Text-plus,
verbatim fax images, and a voice message in the same multipart
electronic letter, but doesn't allow me to correctly spell a
French name in the Swedish Subject line of the letter?

Once again, RFC-XXXX is not the last RFC that is going to be written
on extensions to e-mail! In fact, it is hoped that RFC-XXXX will only be
the first in a series of RFCs that do good things to mail.

There is nothing special about RFC-XXXX. It attempts to solve a bunch of
problems with existing systems. There's room for additional RFCs to
solve more problems.

...

At the risk of starting another firestorm, one solution to this for 
users in Sweden (or elsewhere with similar problems, e.g., JIS 2022 
isn't ASCII either) might be to implement and disseminate an 8bit
transport plan with negotiation (or, for that matter, a 7bit transport
plan with negotiation).   Such a negotiated model might very well take a 
more relaxed attitude toward non-ASCII characters in headers than is 
feasible with the 821/XXXX combination, since the negotiation itself 
would force the use of gateways that would presumably be able to do 
something acceptable.

In a way, that suggestion is for a return to the enclave model, although 
with a different tone.  Within Sweden (?  Nordunet?  Western Europe?) 
you use a negotiation extension to SMTP and transport, e.g., ISO8859-1 
either "native" and over an 8bit connection or encoded in some
appropriate way (presumably mnemonic) over a 7bit connection.

Isn't it MUCH simpler to just extend the Type and Encoding
mechanism that is already there in RFC-XXXX for message bodies
to the text-only header fields Subject, Comments, and
Content-Description?  (This can be done by two new headers,
Text-Header-Field-Type and Text-Header-Field-Transfer-Encoding,
as Peter Svanberg proposed already in his first message to this
list.)  A further advantage is that this allows e.g. Swedes in
Sweden to communicate with other Swedes working e.g. in the US
in Swedish with no artificial restrictions.

You're preaching to the converted insofar as the advantages of this
approach go. The only problem I'm attempting to address here is _where_
this stuff will go, that's all.

The need for national characters in address phrases is not as
urgent as in the case of the Subject header and can wait for a
future general solution to the header extension problem.

I can turn your argument back on you and say: "would it not be simpler
to deal with this NOW rather than later?" Even if we decide NOT to
deal with it, I want to make that decision rather than omit the discussion
that leads up to it.

                                Ned
