On Mon, Jun 29, 2009 at 01:37:31PM -0700, David Morris wrote:
> 1000 years from now, it will certainly be easier to recover content from
> an ascii 'file' than an html, xml, or pdf 'file' created now. It is
> probably an unjustified assumption that 'software' available 1000 years
> from now will be able to render today's html, xml, or pdf.
I am not sure I agree with this assertion. In 1000 years, I have
every hope that some versions of PDF will be widely usable; but I
think the currently prescribed format for electronic versions of RFCs
is already obsolete, and will be unreadable in 1000 years.
PDF/A-1 is an ISO standard preferred by the U.S. Library of Congress
for page-oriented textual (or primarily textual) documents "when
layout and visual characteristics are more significant than logical
structure" ([...] visited 2009-06-29). One could construct a reasonable argument that in
the case of RFCs, the layout and visual characteristics are _not_ more
significant than logical structure. But under the current publication
regime, they are in fact more significant: we have significant rules
for publication about the exact "page" layout, the number of lines,
the margins, the headers and footers, and even what "character"
(i.e. line-printer character) ends a page. We have practically no
guidance about the logical structure of documents, except that if the
document is a given number of pages, it needs a table of contents.
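
To make the page-layout point concrete, here is a minimal sketch of what
a reader has to do to recover pages from such a file. It assumes the
page-ending character is the ASCII form feed (0x0C), the usual
line-printer convention; the function name and the sample text are made
up for illustration.

```python
FORM_FEED = "\f"  # ASCII 0x0C, assumed to be the page-ending character


def split_pages(text: str) -> list[str]:
    """Split line-printer style text into pages at each form feed."""
    pages = text.split(FORM_FEED)
    # A trailing form feed leaves an empty final chunk; drop it.
    if pages and pages[-1].strip() == "":
        pages.pop()
    return pages


# Invented two-page sample, each page terminated by a form feed.
sample = "Page one body\n\fPage two body\n\f"
print(len(split_pages(sample)))
```

Note that nothing in the text itself says what the control character
means; the page structure exists only by convention with the printer.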
Whether the logical structure of the document ought to be of higher
concern in relation to the publication form is a topic argued
elsewhere in this thread. I want to pay attention to whether PDF will
be usable in 1000 years.
The Library of Congress, and librarians generally, take archival
formats terribly seriously. There is just about no hope of dislodging
the MARC standard, for instance, even though every librarian I ever
spoke to in my admittedly brief library career granted that MARC is
miserably adapted to relational databases (which hadn't been invented
when MARC was settled upon). The reason MARC can't be replaced is
that it's the format they picked, and so everything has to work
around it. Period. The technology it was invented around was
obsolete before the standard even got widely adopted? Too bad. This
is an _archival_ format, and therefore it Will Not Change. All future
technology will simply be specified to use it. And it is so
specified: one library automation system I knew of when I last looked
at this (nearly 10 years ago, mind) stored every MARC record in BLOBs,
and just did everything up in the application. Everyone except the
sales people thought this a miserable hack, but the MARC format was
preserved. Thus do relational theorists go slowly insane.
If librarians have picked PDF/A-1 as an electronic format that they're
going to use -- particularly, if LC has picked it -- then I have every
confidence that the format will be supported somewhere for roughly as
long as there remain readers on Earth. I am more concerned, in fact,
about widespread inability to read than I am about librarians stopping
support of some archival format they selected. They are way more
serious about keeping old archival formats working than the IETF has
ever been about making FTP continue to work everywhere.
ASCII, on the other hand, doesn't meet any of the librarians'
criteria, and never did. It is too restrictive even to deal with
non-American titles in the library catalogue (e.g. books priced in
pounds sterling), never mind to deal with non-English titles. ASCII
was a bugbear in library automation systems from the very beginning.
Certainly, files of supposedly plain text containing the occasional
control character used to format pages for a specific line printer
that was once attached to some ancient computer system on the Internet
are not an archival format that any responsible librarian would sign up
for.
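
The restriction is easy to demonstrate. The following is a minimal
sketch, not anything a library system actually runs: the helper name and
the catalogue entries are invented for illustration, and it only checks
whether a string survives encoding as 7-bit ASCII.

```python
def ascii_safe(text: str) -> bool:
    """Return True if every character in text is representable in 7-bit ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical catalogue entries, invented for illustration.
entries = [
    "Price: $12.50",        # American price: fits in ASCII
    "Price: £8.99",         # pound sterling sign is outside ASCII
    "Über die Grundlagen",  # non-English title with an umlaut
]

for entry in entries:
    print(ascii_safe(entry), entry)
```

Only the first entry passes; the pound sign and the umlaut both fall
outside the 128 code points ASCII defines.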