Re: Don't change RFC822 for the worse!


I have been reading this "debate" with frankly mounting
disbelief.  As a relative newcomer I hesitated to add my
tuppence-worth, but perhaps this perspective will help.

Having now written this, it's turned out a bit longer than I'd
intended.  There's a summary at the end, so you can skip the
rest if it looks boring.


General Remarks from a Trying-to-be Implementor
===============================================

We have recently written an MUA and mail edit/display program.
When we started this project we collected all the RFCs and sat
down to read them.  We were pleased to find that we were not
faced with a heavy-weight specification -- no stack of barely
comprehensible paper several feet deep.  This was going to be
easy...

Of course, we were wrong (yup, we was wrong).

The RFCs contain significant numbers of what my father was
pleased to call "clearly defined areas of doubt and
uncertainty".  Having seen how matters of detail are debated
and resolved, I am no longer surprised.  [My father was a
diplomat.  Sometimes in a negotiation the parties cannot
resolve a difference.  One strategy then is: first you clearly
define the area of difference, and agree on that; second you
agree on some important matters of principle by which this
difference can be settled.  By now you've agreed on a lot of
things, where a while ago you were getting ready for war !  A
few rounds of this, and the parties come away with an
agreement.  What you must not do in this process is to attempt
to examine how the parties interpret the agreement -- this is
where the doubt and uncertainty come in.  You can obviously
agree, for example, that something must be divided fairly (who
wouldn't agree to that?).  What you absolutely must not then
do is try to agree concrete criteria for deciding fairness.
That way lies war...]

More important, however, is that the RFCs appear to be relevant
only to the extent that the average implementor once met
somebody in a pub (or other watering hole) who had seen the
relevant RFCs in early draft form, and who had some recall of
the interesting bits....  (Sorry, I exaggerate a little.)  The
real point is that there is nothing to gain by worrying whether
this or that mailer is in some strict sense (i.e. as per the
RFCs) "broken" (always assuming you can establish that
definitely).
As a purely practical matter "interoperability" requires you
to accept and understand all sorts of extra requirements and
caveats.

In brief: the RFCs are only an introduction.  There are no
absolute specifications.  There is a lot of additional "you
know, and I know that it is generally accepted that..."

RFC writers please note: this is not a flame.  The strength of
the Internet "culture" is that it works.  If it didn't, the
Internet wouldn't be growing the way it is.  But the reason
the thing works is not that the RFCs are minor miracles of
completeness -- indeed, quite the opposite !

A Quick Look at RFC822
======================

RFC822 doesn't say much about the message body, does it ?  I
can find :

   in 1.2: "Messages consist of lines of text.  No special
   provisions are made for encoding drawings, facsimile,
   speech or structured text."

   in 3.1: "The body is simply a sequence of lines containing
   ASCII characters."

   in 4.1: "message = fields *( CRLF *text )".

   in Appendix D: "CHAR  = <any ASCII character>  ; (0-177,
   0.-127.)" and "text = <any CHAR, including bare CR and bare
   LF, but NOT including CRLF>".

And in the Bibliography there is a reference to "USA Standard
Code for Information Interchange", X3.4.
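
Read literally, those quotations ask for very little.  A minimal
sketch in Python, assuming the body arrives as raw bytes (the
function name is my own invention, not anything in the RFC):

```python
def is_rfc822_body(data: bytes) -> bool:
    """Literal reading of the quoted grammar: every octet is a CHAR (0-127)."""
    # "text" excludes CRLF only because CRLF separates the lines, so any
    # sequence of 7-bit octets can be split at CRLF into lines of "text".
    return all(octet <= 127 for octet in data)


print(is_rfc822_body(b"Hello\r\nworld\r\n"))          # True
print(is_rfc822_body(b"NUL \x00 and DEL \x7f"))       # True: both are within 0-127
print(is_rfc822_body("caf\u00e9".encode("latin-1")))  # False: 0xE9 is 8-bit
```

On this reading NUL, DEL and bare control codes are all permitted,
and the only thing ruled out is 8-bit data.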

In addition to RFC822 "you know, and I know" (inter alia) the
following :

 o while lines shouldn't be more than about a thousand characters
   long, you'd be well advised to keep lines a lot shorter
   than that, if you want to be nice to the rest of the world.

 o you can try sending NULs if you like, but don't be
   surprised if they don't get to the other end.  (Maybe we'd
   better stay away from DEL, as well.)

 o if you do send control codes other than CR and LF, they'll
   probably get to the other end, but what will happen then is
   more than a bit variable -- TAB will probably tab to the
   next multiple of eight column; but don't bet the farm.

 o your mail could go through an EBCDIC gateway, which may (or
   may not) mangle some of the characters (oy vey).

 o your mail could go through systems that strip trailing
   spaces off lines -- better avoid that too.

 o mail messages of more than 64K bytes may (or may not)
   get through intact -- though some gateways impose a smaller
   limit.  But not to worry, the odds are that much bigger
   messages will work, for most people.

 o actually, lots of systems will quite happily handle 8-bit
   data.  So if you're used to working with 8-bit characters,
   why worry ?  (Let's be pragmatic.)

 o the RFC may say ASCII, but all sorts of people send any old
   7 or 8 bit data -- for example various national variants of
   ISO 646, or ISO 8859, or ....
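
The customs above could be turned into a rough checker along the
following lines.  This is only a sketch: the function name, the
78-character line limit and the wording of the warnings are my own
assumptions, and none of it comes from an RFC.

```python
def check_body(body: bytes, max_line: int = 78, max_size: int = 64 * 1024) -> list[str]:
    """Return warnings for a message body, following the customs listed above.

    The default thresholds are illustrative assumptions, not standard values.
    """
    warnings = []
    if len(body) > max_size:
        warnings.append(f"body larger than {max_size} bytes")
    if b"\x00" in body or b"\x7f" in body:
        warnings.append("contains NUL or DEL")
    if any(octet > 127 for octet in body):
        warnings.append("contains 8-bit data")
    for number, line in enumerate(body.split(b"\r\n"), start=1):
        if len(line) > max_line:
            warnings.append(f"line {number} longer than {max_line} characters")
        if line.endswith(b" "):
            warnings.append(f"line {number} has trailing spaces")
        # NUL is reported above; TAB is flagged because its effect varies.
        if any(octet < 32 and octet not in (0, 10, 13) for octet in line):
            warnings.append(f"line {number} has control codes other than CR and LF")
    return warnings


for warning in check_body(b"Short line\r\nna\xefve, with a trailing space \r\n"):
    print(warning)
```

An empty list means only that the body avoids these particular
hazards, not that every gateway will pass it.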

In this context, what grounds are there for an argument on a
strict, or otherwise, interpretation of the meaning or even
the intended meaning of RFC822 ?

[Of course MIME as an "upward compatible" extension lives
within not just RFC822 but all the "you know, and I know"
restrictions.  The dreaded Content-Transfer-Encoding comes of
this -- the "Quoted-Printable" seems especially popular ;-).]

So What is Actually Happening ?
===============================

OK, well maybe RFC822 is a bit vague about message bodies, and
in any case custom and practice has overriding practical
precedence.

When I send a piece of mail all I care about is that the
recipient can read it.

Because it's a lowest common denominator, if I send stuff which
is ISO 646 (or USASCII), in reasonable length lines, avoiding
control codes and some contentious character codes, then
whoever is at the other end will probably be able to read it.

This is about as much as can be guaranteed.  [This may even be
what RFC822 intended -- though it could be deliberately widely
drafted, so as to allow for future improvements.]  Little
though this is, it is enough for me to communicate from my
desk to yours -- wherever you are on the planet.

Of course, if I know the recipient is another Windows user I
can go ahead and use all sorts of characters -- unless we're
unlucky and have some awkward 7-bit transport between us.

And, if I needed to write Cyrillic then I'd have to find out,
or guess, a suitable encoding.

Or, if I needed to communicate in Japanese then I'd have to
use whatever the recipient could decode and display -- which I
gather would probably be ISO-2022 based.

If all else fails I can take a document, UUENCODE it, and send
that.  (And, unless I am really unlucky, the data will reach
the other end.)
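
To illustrate why this tends to get through, here is a small sketch
using Python's standard binascii module.  The helper names are
hypothetical, and the "begin"/"end" wrapper lines of a real
uuencoded file are left out.

```python
import binascii


def uuencode_lines(data: bytes) -> list[bytes]:
    """Encode data as uuencoded body lines (no begin/end wrapper)."""
    # b2a_uu takes at most 45 bytes per call and returns one encoded line.
    return [binascii.b2a_uu(data[i:i + 45]) for i in range(0, len(data), 45)]


def uudecode_lines(lines: list[bytes]) -> bytes:
    """Reverse uuencode_lines."""
    return b"".join(binascii.a2b_uu(line) for line in lines)


payload = bytes(range(256))  # every octet value, NUL and 8-bit included
encoded = uuencode_lines(payload)

print(all(octet < 128 for line in encoded for octet in line))  # 7-bit only?
print(uudecode_lines(encoded) == payload)                      # round trip?
```

Every encoded line is short and contains only 7-bit printable
characters, which is why the result survives most of the hazards
listed earlier.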

All these ad hoc schemes work wonderfully.  The Internet
allows much and forgives all -- so all sorts of local
conventions allow people to talk to each other with the
minimum of fuss.

So what is the Problem, then ?
==============================

Now I immediately ask forgiveness if I have misunderstood the
matters of substance -- and ask for further enlightenment.

 o Is the use of ISO-2022-.. contrary to RFC822 ?

   If it is, does it matter ?  If communities are happily
   using it, and it gets through their transports, and it
   doesn't break other people's machines, ...

   [For standards experts only: what is the relationship
   between USASCII and ISO-646 ?  I note in my copy of ISO-
   646-1983, paragraph "4.1 c) Code extension control
   characters", the following: "Procedures for the use of the
   code extension control characters are specified in ISO
   2022."]

 o Does the fact that many cannot read ISO-2022-.. messages
   mean that they are a bad thing ?

   Surely not.  Provided the messages don't cause my mailer to
   fall over, or mess up my screen.  (And my mailer should be
   tolerant of at least codes 1.-127. !)

   If I get a message in (say) Polish I'm afraid I cannot read
   it, even if the characters are legible.  Even if my mailer
   understood ISO-2022-JP (which I'm afraid it doesn't) and
   displayed the necessary, I still wouldn't be able to read
   messages in Japanese or Korean or Chinese -- ignorance on
   my part again, I fear.

   If I wanted to read these things I would go and learn the
   language, and get myself the right software -- I guess I
   would have plenty of time while attempting the first to
   achieve the second.

   In the meantime, if these messages look like a jumble of
   odd characters on the screen, why should I care ?

 o Would it be a good idea to be able to declare the character
   encoding used in a message ?

   Let me see now.  What is it that bears do in the woods ?

   When I do get a mailer that does understand ISO-2022-..,
   and maybe some other commonly used codes, I can see it
   would be handy for it to be able to tell which code (or
   even codes) a message uses -- so I don't have to guess and
   tell it !

   Since MIME has a mechanism for declaring a character set,
   would adopting that do the trick ?

 o What should a mailer assume in the absence of a declared
   character set ?

   What do mailers do now ?  They assume something convenient
   for their users.  Why should that change ?

   Surely, it is not seriously suggested that after all this
   time some strict interpretation of RFC822 should enforce
   USASCII ?  Not the pragmatic, practical Internet way, is
   it?

 o Should the Internet adopt ISO-2022 ?

   Has anyone asked this ?  If not, why not !?

   Since ISO-2022 sits over (under?, beside?, with?) ISO-646,
   would this not be a logical step up ?

   From my experience the ISO-8859 sets are sufficient for
   European languages (please correct me if I'm wrong).  I
   presume there's a standard way of switching between these
   in ISO-2022 manner.

   If ISO-2022 is already in use for Asian languages isn't
   that a good enough reason ?  Or is there a better solution?
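
Two of the questions above can be made concrete with a short sketch
using Python's standard email package.  The helper name and the
choice of ISO-8859-1 as the local default are assumptions for
illustration only; a real mailer would pick whatever suits its own
users.

```python
from email import message_from_string

LOCAL_DEFAULT = "iso-8859-1"  # assumption: something convenient for local users


def charset_for(raw_message: str) -> str:
    """Return the MIME-declared charset, or the local default if none is declared."""
    msg = message_from_string(raw_message)
    return msg.get_content_charset() or LOCAL_DEFAULT


declared = "Content-Type: text/plain; charset=ISO-2022-JP\n\nbody\n"
undeclared = "Subject: no declaration\n\nbody\n"
print(charset_for(declared))
print(charset_for(undeclared))

# ISO-2022-JP switches character sets with ESC sequences, so the encoded
# octets all stay within the 7-bit range that RFC822 calls CHAR.
sample = "日本語".encode("iso2022_jp")
print(all(octet < 128 for octet in sample))
print(0x1B in sample)
```

With a declaration the mailer no longer has to guess, and without
one it carries on as it does today.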


Summary
=======

As a relatively recent implementor of software for the
Internet I have been impressed by the practical "let's get on
and get something working" culture.

The down-side of this is that finding a full specification to
work to is not simply a matter of reading the RFCs !
(Wonderful bed-time reading though they are.)

I have found the tone and content of the recent "debate" quite
incomprehensible.  I don't know what axes are being ground
here -- but they must be very, very sharp by now.

"You know, and I know" that the RFCs are actually less
important than the custom and practice that has built up
since.  So it doesn't seem to make sense to argue over a
strict interpretation of an RFC.

Is this really the way that new features are "designed" ?
Surely not !  I do hope that the Internet has not grown so big
that only an International Standards Body type of organisation
will now work.


I apologise for the length of this.  The Internet is a noble
enterprise (if I can say that without sounding unutterably
pompous).  It has pained me to see it being used so ignobly,
and to such little practical effect.

Chris

------------------------------------------------------------------
Chris Hall          : Dorking Business Park : Tel: +44 1306 747700
Managing Director   : DORKING               :
Turnpike Ltd.       : Surrey,  UK.  RH4 1HN :  ChrisH(_at_)turnpike(_dot_)com