Re: More liberal draft formatting standards required

On 29 jun 2009, at 23:32, Andrew Sullivan wrote:

On Mon, Jun 29, 2009 at 01:37:31PM -0700, David Morris wrote:
1000 years from now, it will certainly be easier to recover content from
an ascii 'file' than an html, xml, or pdf 'file' created now. It is
probably an unjustified assumption that 'software' available 1000 years
from now will be able to render today's html, xml, or pdf.

I am not sure I agree with this assertion.  In 1000 years, I have
every hope that some versions of PDF will be widely usable; but the
currently prescribed format of electronic versions of RFCs I think is
already obsolete, and will be unreadable in 1000 years.

My original message was about problems _creating_ a certain format. But this is of course related to _reading_ a format.

Don't underestimate how quickly formats go away. Anyone here try to open a WordPerfect document recently?

Assuming that in 1000 years people still understand English and can still read Latin script, it's trivial to decode a plain ASCII file. HTML is only slightly more difficult: just remove everything between < and > and you have something that's mostly readable. With XML you should be able to recover most of the text, but I'm pretty sure in 1000 years nobody is going to understand what <rfc ipr="trust200902" category="exp"> means. Not exactly sure what the insides of a PDF file look like, but I'll go out on a limb and say that it won't be possible to get anything useful out of a PDF file without software that understands PDF. I don't think that will be around in 1000 years. However, because PDF unambiguously maps to an image it should be possible to convert from PDF to other image formats without losing any content. (And then a decade or two later run OCR on that to retrieve the original text...)
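To make the "remove everything between < and >" point concrete, here is a minimal sketch in Python. The function name strip_tags and the sample strings are made up for illustration, and this is deliberately naive: it does not decode entities such as &amp; and it would also eat a literal "<" in running text, which is part of why the result is only "mostly" readable.

```python
import re

def strip_tags(markup: str) -> str:
    """Naively drop everything between '<' and '>' (hypothetical helper)."""
    # [^>]* matches any run of characters up to the next '>',
    # so each tag is removed together with its attributes.
    return re.sub(r"<[^>]*>", "", markup)

# HTML: the text survives, the markup is gone.
print(strip_tags("<p>RFCs are <b>free</b> and in English.</p>"))

# XML: the text survives too, but whatever the attributes meant is lost.
print(strip_tags('<rfc ipr="trust200902" category="exp">Some text</rfc>'))
```

Note that the second call recovers "Some text" but silently discards ipr and category, which is exactly the information a future reader would have no way to interpret anyway.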

So I'd say that if we want to change our archival format, a carefully documented subset of HTML would probably be a good choice. This is easy to display on a variety of screen sizes, prints reasonably without effort, and can be made to print very well with additional tools. It has a lot more structure than flat text, so scraping tools could potentially be more effective than today's, especially considering that old RFCs weren't formatted as rigorously as recent ones.
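As a rough illustration of what enforcing a "carefully documented subset" could look like, the sketch below uses Python's standard html.parser module to list any tags outside an allowed set. The contents of ALLOWED_TAGS and the names SubsetChecker and find_disallowed_tags are purely hypothetical; nobody has proposed this particular list.

```python
from html.parser import HTMLParser

# Hypothetical subset, chosen only for illustration.
ALLOWED_TAGS = {"html", "head", "title", "body", "h1", "h2",
                "p", "pre", "ul", "ol", "li", "a"}

class SubsetChecker(HTMLParser):
    """Collects the names of start tags that are not in ALLOWED_TAGS."""

    def __init__(self):
        super().__init__()
        self.violations = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag not in ALLOWED_TAGS:
            self.violations.append(tag)

def find_disallowed_tags(markup: str) -> list:
    checker = SubsetChecker()
    checker.feed(markup)
    checker.close()
    return checker.violations

print(find_disallowed_tags('<p>ok</p><font size="2">messy</font>'))
```

A document that passes such a check would still need its attributes and nesting looked at, so this only covers the simplest part of the problem.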

PDF would be a disaster because it's not compatible with text-only displays, not compatible with any scraping tools, can't be viewed without non-trivial software and doesn't scale to display size.

ASCII, on the other hand, doesn't meet any of the librarians'
criteria, and never did.  It is too restrictive even to deal with
non-American titles in the library catalogue (e.g. books priced in
pounds sterling), never mind to deal with non-English titles.


Last time I checked RFCs were free and in English...

Consider this: even if we could use non-Latin scripts for author names in RFCs, would that be a good idea?

Back to my original problem: although there are tons of modern tools that create HTML, they usually create completely unstructured and very messy HTML that would be unusable for archiving or pretty much anything else. With a modern word processor you can basically create an unstructured and unformatted ASCII file without even line and page breaks, or create something highly structured that requires conversion tools to create something that looks like draft format.


We've really painted ourselves into a corner here.
_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www.ietf.org/mailman/listinfo/ietf