mail-ng

Human Readable vs Machine Readable

2004-04-29 04:38:44

On Thu, 29 Apr 2004 06:59, Keith Moore wrote:
...it's a direct result of the broader
design choice to make email messages be both machine-readable and
human-readable. Changing the date format without revisiting the larger
choice is probably wasted effort.

Then let me make it clear that the larger choice is what I wish to bring into
question -- it's just that RFC822 dates provide the best examples of why we
ought to do so.

Without necessarily advocating harshly opaque binary formats, I wish to bring
into serious question the idea that any representation can be considered
generally human-readable when it includes elements of a particular natural
language (e.g. "Jan", "Feb", "Mar", etc.). Such a practice makes it more
readable for a particular subset of humans while not aiding the others, and
arguably makes the machine side of things harder. Unless your particular
function library already contains date-handling functions which recognise
this notation, you'll probably need a table of month names with which to
back-convert into a month number, and that may only be an intermediate step.
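To illustrate the month-table step described above, here is a minimal sketch in Python. The function name parse_rfc822_date and the MONTHS table are my own hypothetical names, and the sketch handles only the date fields, not the time or zone; a real RFC822 parser has rather more cases to worry about.

```python
# Hypothetical lookup table: English month abbreviations to month numbers.
MONTHS = {
    name: number
    for number, name in enumerate(
        "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1
    )
}


def parse_rfc822_date(text):
    """Return (year, month, day) from an RFC822-style date string.

    Only the date fields are handled; time and zone are ignored.
    """
    # The leading day-of-week ("Thu,") is optional, so drop it if present.
    if "," in text:
        text = text.split(",", 1)[1]
    fields = text.split()
    day = int(fields[0])
    # The back-conversion step: a month name needs a table lookup,
    # and only names in the table (English ones) are understood.
    month = MONTHS[fields[1].capitalize()]
    year = int(fields[2])
    return year, month, day
```

The month number obtained this way is, as noted, usually only an intermediate value on the way to some other internal form.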

The ISO numeric date format is arguably both more machine-readable and more
human-readable, since Arabic numerals in a position-based decimal system are
far closer to a universal language than any other form of notation, so far as
I'm aware. (Anyone with specific knowledge care to comment on this?) They are
limited in that they represent numbers only, and it would be confusing to use
them in any other context, but dates are quite amenable to numeric
representation. Any machine manipulation of a date will generally pass
through a numeric intermediate form of some sort.
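By way of comparison, a sketch of reading an ISO-style numeric date in Python. I am assuming the plain "YYYY-MM-DD" form here, and parse_iso_date is a hypothetical name; the point is only that no name table is needed.

```python
def parse_iso_date(text):
    """Return (year, month, day) from a 'YYYY-MM-DD' string."""
    # Every field is a decimal integer, so the ordinary integer
    # conversion routine does all of the work.
    year, month, day = (int(part) for part in text.split("-"))
    return year, month, day
```

A side effect of the fixed-width, most-significant-first layout is that sorting such strings as text also sorts them chronologically.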

With regard to the non-numeric elements of a protocol, I have no objection to
English-derived keywords (like the "HELO" and "RCPT" of SMTP). An easily
parsed grammar can include a "keyword" element (or similar) which can be
almost any string. For these, historical practice has been to make them
representable in the alphabetic subset of ASCII, and mnemonic relative to the
English language. It's mildly regrettable that we have to play favourites
with some natural language or another in this context, but unavoidable so far
as I can tell (unless you find "mnemonic to nobody" better than "mnemonic to
those literate in English"). I see no need to change this practice.

A general principle I would derive from the above discussion is as follows.
Some data are represented both numerically and with words, such as dates
(with particular reference to months). For improved machine and human
readability, the numeric format should be used, absent any compelling reasons
otherwise.

The amount of work in decoding an integer (say a UNIX-style
time_t) to a date is approximately equal to the amount of work in
parsing an RFC822-style date, and it's at least as easy to botch.

Most operating systems or languages have a function for producing a text-based
date from a purely numeric epoch-offset format (like time_t). The reverse
operation is rarely provided, because it's much harder (both to define and to
implement). So, in actual practice, a Unix programmer uses 'ctime()' or its
ilk to decode a time_t to a string, which is a trivial exercise, whereas an
RFC822 date parser is more often implemented on the spot from lower level
tools.
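As a rough illustration of how little work the epoch-to-text direction involves, here is a sketch using Python's standard time module in place of C's ctime(). The function name epoch_to_text and the output layout are my own choices, and I use UTC so that the result does not depend on the local zone.

```python
import time


def epoch_to_text(seconds):
    """Format an epoch offset (seconds since 1970-01-01 UTC) as UTC text."""
    # gmtime() breaks the integer into calendar fields (much like
    # struct tm); strftime() renders those fields as a string.
    return time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(seconds))
```

If the value arrived on the wire as ASCII decimal digits, something like epoch_to_text(int("1083213524")) would be the whole of the decoding step.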

General principle I'm advocating here: if we know that a particular data
format or function is widely available to implementers, we should prefer
protocol elements amenable to those formats and functions, rather than
disregarding them. "Most environments provide functions to interpret decimal
integers expressed in ASCII format" is, I believe, an uncontroversial
example. The RFC822 date format does not seem to have been designed with this
principle in mind.

Do you show dates
in the recipient's time zone, or the time zones of the various
senders?

The sender's time zone.

My MUA is showing me the date at which your message was sent in *my* time zone
(the recipient's time zone), and I happen to like that behaviour. In order to
do so, it has to perform time zone calculus on an RFC822 date, which almost
certainly involves going via some internal date format based on an
epoch-offset (like time_t) or a structured numeric format (like struct tm).
If we are to stipulate that MUAs should present dates in either the sender's
zone or the recipient's zone according to user preference, then we facilitate
this behaviour by representing the date in-protocol in the most general form
-- the one most readily converted into all the desired target forms. For
dates, this will mean universal time plus sender time zone.
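A sketch of what "universal time plus sender time zone" buys us, again in Python. The function name local_view and the two offsets in the usage note are hypothetical; I assume the zone travels as a fixed offset from UTC in minutes, which is an assumption on my part and not something any existing protocol specifies.

```python
from datetime import datetime, timedelta, timezone


def local_view(utc_seconds, offset_minutes):
    """Return the wall-clock time of a UTC epoch offset in a fixed-offset zone.

    offset_minutes is minutes east of UTC (negative for west).
    """
    zone = timezone(timedelta(minutes=offset_minutes))
    # The same universal instant converts directly into any zone;
    # no textual date has to be parsed first.
    return datetime.fromtimestamp(utc_seconds, tz=zone)
```

With a hypothetical sender at UTC-4 and recipient at UTC+10, local_view(1083213524, -240) and local_view(1083213524, 600) give the two presentations of the same instant, and the MUA can show whichever one the user prefers.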

General principle: where a datum can be expressed in more than one form and a
choice of forms is a reasonable thing to offer, the protocol ought to adopt
the form which maximises ease of conversion to the desired destinations (if
such a maximisation is possible). In the case of dates, we have "universal
time", which is the most convenient form from which to compute particular
local times.

