
Re: Why the normative form of IETF Standards is ASCII

2010-03-12 10:11:14
Julian Reschke wrote:


>> Actually, the page breaks _are_ useful.  Like when referencing a
>> specific part/paragraph of a document with a URL in a long section,
>> e.g.
>>     http://tools.ietf.org/html/rfc5246#page-36
>> which contains the message flow of a full TLS handshake.
>> And that message flow is just perfect in ASCII art.

> That URL points to an HTML document, not a TXT document. There is
> (unfortunately) no fragment identifier syntax for text/plain (at
> least not one that UAs actually support).

Wrong.  It points to a TXT document that is rendered as HTML.

If you abide by certain conventions in your plain ASCII text,
then everyone can recognize and use them (RFC/I-D -> HTML or -> PDF
converters, accessibility tools like text->speech).  And it still
renders just fine in pure text environments and over very low
bandwidth links.

I-Ds and RFCs are not "publish and forget" documents; they are
living snapshots of working group discussions in constant motion,
and one of the most important aspects is that others can easily
build derivative works from an existing document (especially from
expired I-Ds).

So it is extremely important that the published format is easy to
quote in plain-text emails and in ASCII code comments, and easily
incorporated into new documents.

Just try NroffEdit's I-D -> authoring nroff source conversion and
see how easy that is.  It's a single all-in-one tool written in
Java, basically WYSIWYG with a spell checker included, and it makes
I-D editing extremely easy.


> And guess what: if we go directly to HTML, we'd have anchors as
> well, not only for section numbers, but also for figures, tables,
> or even individual paragraphs.

"Anchors" in plain-ASCII text that are human-comprehensible can
be automatically converted into real URLs and anchors with simple
tools.  These tools exist and work just fine with the existing
plain-ascii text documents.
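
To illustrate, here is a minimal sketch of such a tool in Python (my
own illustration, not one of the existing converters); it assumes
RFC-style headings like "5.2.  Security Considerations" and emits
anchors modeled on the style the tools.ietf.org renderer uses:

    #!/usr/bin/env python3
    # Sketch: turn RFC-style section headings in plain ASCII text
    # into HTML anchors.  The heading pattern and the "section-N.M"
    # anchor naming are assumptions, not a documented format rule.
    import html
    import re
    import sys

    HEADING = re.compile(r'^(\d+(?:\.\d+)*)\.?\s+\S')

    def to_html(lines):
        yield '<pre>'
        for line in lines:
            text = html.escape(line.rstrip('\n'))
            m = HEADING.match(line)
            if m:
                # A human-readable section number becomes a real anchor.
                yield '<a id="section-%s">%s</a>' % (m.group(1), text)
            else:
                yield text
        yield '</pre>'

    if __name__ == '__main__':
        for out in to_html(sys.stdin):
            print(out)

Feeding an I-D or RFC through it on stdin yields an HTML page where
fragment identifiers like "#section-5.2" just work.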



>> [...] all Unicode codepoints from their glyphs (and a number of
>> them can not be distinguished by their glyphs), and even worse,
>> most machines/environments do not even have fonts to display
>> glyphs for most of the Unicode codepoints.

> That is an argument for not allowing *all* Unicode characters.

Which languages do you want to discriminate against, and on what
grounds?  Anything beyond US-ASCII is unfair to those who are not
in the set.  I'm typing on a German keyboard, but there is a
significant number of characters in ISO-8859-1 (or ISO-8859-15)
that I cannot type.  Over the years I've become accustomed to not
using any umlauts in emails, even when writing in German.

The employee names in our Outlook address book also do not use any
non-ASCII characters, out of consideration for our world-wide
subsidiaries.  Adding names to an email address book that most
people cannot type on their keyboards just doesn't make any sense.


Have you ever heard of X.400 mail?
The story goes like this:

Two people meet at a conference; both have Internet email.  They
exchange their email addresses and happily communicate thereafter.

Two other people meet at a conference; one of them has Internet
email, the other X.400.  The X.400 person takes the Internet email
address home, sends an email, and both hope that the "reply"
function will work.

Yet another two people meet at a conference; both of them have
X.400.  They exchange their phone numbers.



If you want people to communicate, they need to share a common
language.  If they don't already happen to know one common language
(in which case the difficulty/complexity of that language doesn't
matter), then they should probably standardize on a language that
is easy for both of them to learn and has a high likelihood of
being useful for communication with other people as well.

And it makes perfect sense to standardize on that single language
not only for spoken communication, but also for written
communication.

Anyone who enters IETF discussions, which are email-based for a
large part, should provide a rendering of his own name in letters
from the US-ASCII alphabet, rather than forcing others to guess how
to do it from gibberish codepoints in awkward codepages.


Some people think internationalized domain names are a good idea.
I think they are a pretty stupid idea, because they are a
significant roadblock for international communication.  Lots of
people around the globe will have severe difficulties accessing a
web site that uses a fancy internationalized domain name, or
reaching someone who uses a fancy internationalized email address.
If you don't happen to know the language, recognize the glyphs, and
have a platform where you can actually type them on your keyboard,
then you will not be able to read and use such web or email
addresses from print or television ads or from a business card.


We are really lucky that the world standardized on a single set of
glyphs to represent digits, on the decimal system, and on a single
orientation for the notation of numbers.

Standardizing on a set of icons/pictograms for certain signs
(including traffic signs) also works remarkably well.

But when it comes to standardizing on an alphabet for communication
among an international community, some show much less rationality.



Stuff that can only be copy-and-pasted by a large part of the
internet population, but neither typed nor displayed, is nothing
that we need in our specs.

>> Using HTML or PDF for RFCs is about the same as moving from
>> English-language RFCs to Mandarin-language RFCs.  There is a huge
>> number of people who can read it, but there is also a large
>> number of current RFC and I-D consumers and producers who cannot
>> and do not want to use Mandarin.

> Sorry? Are you implying anybody is unable to display HTML?

Yes, of course.  The majority of devices and a huge number of
environments are completely unable to display HTML.
Only the fancy gadgets with graphic displays and plenty of CPU
horsepower, memory, and network bandwidth to waste support HTML --
and even they need tools to do so, while at the same time limiting
what else can be done with information that is encoded as HTML.

>> I do not doubt that there are tools available for heavy graphical
>> user interfaces and specific platforms that can deal with Mandarin
>> just fine.  But I do not understand Mandarin, my tools cannot cope
>> with it, and a lot of my platforms and environments cannot cope
>> with it.  HTML and PDF are only marginally better than Mandarin.

> Sorry? With all due respect, this statement is really ridiculous;
> at least in the context of HTML.

>> Btw. printing out I-Ds and RFCs on paper (even 2-up and
>> double-sided) has always worked just fine for me with tools like
>> a2ps.

> Sounds great if you have a PostScript printer.

I'm pretty sure there are similar ASCII-to-PDF converters by now.
I would not be surprised if there were even a web service that
converts an I-D into PDF, in case you can print neither formatted
ASCII text nor PostScript on your printer.  And if not, creating
such a tool is probably trivial -- much, much simpler than a tool
that makes a fancy document format like HTML or PDF viewable on a
purely textual display.
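
Here is a minimal sketch of such a converter in Python, assuming the
third-party reportlab library is available (the page geometry, font,
and lines-per-page figure below are my own choices, not anything the
RFC format mandates):

    #!/usr/bin/env python3
    # Sketch: render plain ASCII text as a paginated PDF in a
    # fixed-width font.  Real RFCs carry form feeds for page breaks;
    # a simple line counter is enough for illustration.
    import sys
    from reportlab.lib.pagesizes import letter
    from reportlab.pdfgen import canvas

    def txt_to_pdf(txt_path, pdf_path, lines_per_page=58):
        page = canvas.Canvas(pdf_path, pagesize=letter)
        _, height = letter

        def fresh_text():
            t = page.beginText(40, height - 40)
            t.setFont('Courier', 10)
            return t

        text, count = fresh_text(), 0
        with open(txt_path) as f:
            for line in f:
                if count == lines_per_page:
                    # Flush the full page and start a new one.
                    page.drawText(text)
                    page.showPage()
                    text, count = fresh_text(), 0
                text.textLine(line.rstrip('\n'))
                count += 1
        page.drawText(text)
        page.save()

    if __name__ == '__main__':
        txt_to_pdf(sys.argv[1], sys.argv[2])

Run as "txt2pdf.py draft.txt draft.pdf"; the point is only that the
ASCII -> PDF direction is a few dozen lines, while the reverse is not.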


>> Printing PDF works, but reading PDF on screen is a royal PITA.

> Yes.

>> Even with a 1600x1200 screen, displaying a single page is

> The concept of a "page" is part of the problem - for the current
> format as well. "Pages" are good when the primary output format is
> paper, and the paper size is known in advance. Today, that's the
> edge case.

The paper sizes Legal, Letter, A4, etc. have been around for many
years, and they will still be around in 30 years.  I'm not so sure
about any current incarnation of PDF or HTML.



>> difficult to achieve -- and hardly legible with many PDFs that
>> I've come across -- and if you choose a legible size, then the
>> page doesn't fit and you have to page down to see the bottom of
>> the page.

>> Getting 62 lines of pure ASCII text displayed in a legible font
>> on a 1024x768 screen is much easier, and the result is much more
>> legible.  And doing a 2-up of an RFC or I-D has a predictable
>> legibility.
>> ...

> How is that different for (a properly selected subset of) HTML?

Printing HTML is still a research topic.  It just doesn't work.
The common "tool" for displaying HTML is called a web browser, and
the only influence, if any, that you have on what comes out of the
printer is the "paper size" and possibly the "margins".

But more often than not, the screen-oriented formatting of HTML
results in printouts that are truncated at the right border or
filled with white space.  And removing parts of the page before
printing it (like a navigation box that becomes useless on paper)
is a feature that is currently unsupported.


HTML is definitely a nightmare when it comes to printing.

And support for mobile devices just isn't there.  Many web sites
even have difficulties supporting certain browser brands -- so much
for a standard.  Ever tried to read the Microsoft SDK documentation
online with Firefox?  It's unusable.  Fortunately, the "printer
friendly version" displays in a readable fashion.


-Martin
