Re: Human Readable vs Machine Readable


On Thu, 29 Apr 2004 06:59, Keith Moore wrote:

...it's a direct result of the broader
design choice to make email messages be both machine-readable and
human-readable. Changing the date format without revisiting the
larger choice is probably wasted effort.


Then let me make it clear that the larger choice is what I wish to
bring into question -- it's just that RFC822 dates provide the best
examples of why we ought to do so.


If dates are really the best example (which I doubt) then we should
definitely stay with a text-based format for ease in debugging.

Without necessarily advocating harshly opaque binary formats, I wish
to bring into serious question the idea that any representation can be
considered generally human-readable when it includes elements of a
particular natural language (eg "Jan", "Feb", "Mar", etc).


You may as well complain about header field names being English words.
It may seem unfair and biased towards English, but it really is useful.
Lots of people who don't speak English understand enough English to
grasp what the header fields mean.  And from a programmer's perspective,
comparing for "Jan" vs. "Feb" etc. is not much more difficult than
comparing for "1" vs. "2" etc.  Also "Jan" is unambiguously a month
name, whereas "1" could be almost anything.
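To make the programmer's side of that comparison concrete, here is a
minimal sketch of a month-name lookup. The helper name month_number is
made up for illustration; it compares case-insensitively on the
assumption that a parser should accept "jan" as well as "Jan".

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Return 1..12 for a three-letter English month abbreviation,
 * or 0 if the name is not recognized.
 * month_number is a hypothetical helper, not part of any library.
 */
int
month_number (const char *name)
{
    static const char months[12][4] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
    };
    int m, i;

    if (name == NULL)
        return 0;

    for (m = 0; m < 12; m++) {
        /* compare the first three characters, ignoring case */
        for (i = 0; i < 3; i++) {
            if (name[i] == '\0' ||
                tolower ((unsigned char) name[i]) !=
                tolower ((unsigned char) months[m][i]))
                break;
        }
        /* accept only an exact three-letter match */
        if (i == 3 && name[3] == '\0')
            return m + 1;
    }
    return 0;
}
```

Calling month_number ("Feb") yields 2, and anything that is not one of
the twelve abbreviations yields 0, so a stray "1" is rejected rather
than silently accepted.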

The ISO numeric date format is arguably both more machine readable and
more human-readable, since Arabic numerals in a position-based decimal
system are far closer to a universal language than any other form of
notation, so far as I'm aware.


The empirical evidence suggests that programmers can't put the fields
in proper order (even when they spell them correctly, which they often
do) for rfc822 format. What makes you think that they can put the
fields in proper order for ISO format?
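For reference, the numeric form under discussion can be sketched as
follows. This assumes the ISO 8601 layout "YYYY-MM-DDThh:mm:ssZ" in UTC;
the helper name iso8601_utc is made up for illustration. It uses only
numeric strftime() conversions, so no month or weekday names appear in
the output.

```c
#include <stddef.h>
#include <time.h>

/*
 * Format *t as a UTC timestamp such as "2004-04-29T06:59:00Z".
 * iso8601_utc is a hypothetical helper, not part of any library.
 * Returns buf on success, NULL on failure (bad argument, gmtime()
 * failure, or buffer too small).
 */
char *
iso8601_utc (const time_t *t, char *buf, size_t buflen)
{
    struct tm *gmt;

    if (t == NULL || buf == NULL)
        return NULL;

    gmt = gmtime (t);
    if (gmt == NULL)
        return NULL;

    /* fields run from most to least significant: year first */
    if (strftime (buf, buflen, "%Y-%m-%dT%H:%M:%SZ", gmt) == 0)
        return NULL;

    return buf;
}
```

The field order is fixed by the format string, which is exactly the
part a programmer can still get wrong when writing it by hand.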

A general principle I would derive from the above discussion is as
follows. Some data are represented both numerically and with words,
such as dates (with particular reference to months). For improved
machine and human readability, the numeric format should be used,
absent any compelling reasons otherwise.


I don't think you've given us any compelling reasons for your
"principle".

Now this may seem like a trivial discussion, but (except for killing
innocent people for political purposes) nothing bothers me so much as
trying to do protocol design based on superstition or false premises.

The amount of work in decoding an integer (say a UNIX-style
time_t) to a date is approximately equal to the amount of work in
parsing an RFC822-style date, and it's at least as easy to botch.


Most operating systems or languages have a function for producing a
text-based date from a purely numeric epoch-offset format (like
time_t). The reverse operation is rarely provided, because it's much
harder (both to define and to implement).


You also get into issues like whether to count leap seconds.  What this
means is that different versions of an integer-to-separate-fields
decoder can give different results.

So, in actual practice, a
Unix programmer uses 'ctime()' or its ilk to decode a time_t to a
string, which is a trivial exercise, whereas an RFC822 date parser is
more often implemented on the spot from lower level tools.


Why is it trivial to convert time_t to a string for display (it's not
as if you want to use ctime format for display) and non-trivial to 
convert it to an rfc822 date?  I mean, how hard is this?

#include <stdio.h>
#include <time.h>

char *
arpadate (const time_t *t)
{
    struct tm gmt;
    struct tm *lt;
    static char datebuf[100];
    int gmtoff;
    char sign;
    static char *months[] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
    };
    static char *wdays[] = {
        "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat",
    };
    
    if (t == NULL)
        return NULL;

    /*
     * "I'm ashamed of this."  - SEK
     *
     * there's no portable function to get the offset between local time
     * and gmt.  so we call both localtime() and gmtime() with the same
     * time clock and calculate the difference.   also, gmtime() and
     * localtime() share the same static return structure, so we have to
     * copy the result of one before we call the other.  yeech.
     */
    gmt = *gmtime (t);
    lt = localtime (t);
    gmtoff = (lt->tm_hour - gmt.tm_hour) * 60 + lt->tm_min - gmt.tm_min;
    if (lt->tm_year != gmt.tm_year)
        gmtoff += (lt->tm_year - gmt.tm_year) * 24 * 60;
    else
        gmtoff += (lt->tm_yday - gmt.tm_yday) * 24 * 60;

    sign = '+';
    if (gmtoff < 0) {
        sign = '-';
        gmtoff = -gmtoff;
    }
    sprintf (datebuf, "%s, %d %s %04d %02d:%02d:%02d %c%02d%02d",
             wdays[lt->tm_wday], lt->tm_mday, months[lt->tm_mon],
             lt->tm_year + 1900, lt->tm_hour, lt->tm_min, lt->tm_sec,
             sign,
             gmtoff / 60,
             gmtoff % 60);

    return datebuf;
}
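Going the other way is not much harder. The following is a minimal
sketch, not a full RFC822 parser: it assumes the input has exactly the
shape "Thu, 29 Apr 2004 06:59:00 +1000" (day of week present, four-digit
year, numeric zone), matches month names case-sensitively, and does not
range-check the fields. The names arpa_fields and parse_arpadate are made
up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Broken-down fields of a date header (hypothetical type). */
struct arpa_fields {
    int mday, mon, year;        /* mon is 1..12 */
    int hour, min, sec;
    int zone_minutes;           /* offset east of UTC, in minutes */
};

/*
 * Parse a date of the form "Thu, 29 Apr 2004 06:59:00 +1000".
 * Returns 0 on success, -1 if the string does not match.
 * The day-of-week token is read but not validated.
 */
int
parse_arpadate (const char *s, struct arpa_fields *out)
{
    static const char *const months[] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
    };
    char wday[4], mon[4], sign;
    int zone, i;

    if (s == NULL || out == NULL)
        return -1;

    if (sscanf (s, "%3s, %d %3s %d %d:%d:%d %c%4d",
                wday, &out->mday, mon, &out->year,
                &out->hour, &out->min, &out->sec, &sign, &zone) != 9)
        return -1;

    if ((sign != '+' && sign != '-') || zone < 0)
        return -1;

    /* map the month abbreviation to 1..12 */
    for (i = 0; i < 12; i++)
        if (strcmp (mon, months[i]) == 0)
            break;
    if (i == 12)
        return -1;
    out->mon = i + 1;

    /* zone is written hhmm; convert to signed minutes */
    out->zone_minutes = (zone / 100) * 60 + zone % 100;
    if (sign == '-')
        out->zone_minutes = -out->zone_minutes;

    return 0;
}
```

A real parser also has to cope with the optional day of week, two-digit
years and named zones that the spec allows, which is where reading the
spec comes in.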

General principle I'm advocating here: if we know that a particular
data format or function is widely available to implementers, we should
prefer protocol elements amenable to those formats and functions,
rather than disregarding them.


ARRRGH. Don't define protocols in terms of APIs.  The APIs will change
and we'll be screwed.  ctime() used to produce exactly the same string
on all UNIX systems - then localization mucked things up.

"Most environments provide functions to
interpret decimal integers expressed in ASCII format" is, I believe,
an uncontroversial example. The RFC822 date format does not seem to
have been designed with this principle in mind.


The only real problem with the RFC822 date is that programmers don't
bother to read the spec.  Why you think they'll read an even less
accessible spec instead is baffling.

Do you show dates
in the recipient's time zone, or the time zones of the various
senders?


The sender's time zone.


My MUA is showing me the date at which your message was sent in *my*
time zone (the recipient's time zone), and I happen to like that
behaviour.


There's no accounting for taste.

--
Regime change 2004 - better late than never.