On Sat, 1 May 2004 06:58, Iljitsch van Beijnum wrote:
An interesting question is whether it's better to implement a binary
date format as a timestamp or a concatenation of
year/month/day/hour/second fields.
I can see that you've thought about this, but it seems you still haven't quite
noticed how full of worms the "dates" can is. The last time I studied up on
dates, I was prompted to write a short essay about it, which you may find
amusing (and even informative).
http://www.nutters.org/log/dating-game
(BTW I once implemented a date format as a floating point value
counting the days since Y2K, which gives good precision right now but
still allows dates far in the past and future. And ignoring the leap
second problem is easier.)
That's actually not a bad approach, and ignoring the leap seconds problem
isn't as evil as some might think. We actually have (at least) two
"universal" times: UTC and UT1. UT1 is a purely solar time, and always has
86400 "seconds" per day. The fundamental unit of UT1 is the "day", where
"day" is a rotation of the Earth relative to the sun, and all the smaller
components are constant fractions of that day.
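
To make the "constant fractions of a day" point concrete, here is a minimal
Python sketch. It assumes a date stored as a floating point count of days
from some epoch (which epoch does not matter here) and splits it into whole
days plus hours, minutes and seconds. The function names are mine, purely
for illustration, and math.ulp() needs Python 3.9 or later.

```python
import math

SECONDS_PER_DAY = 86400  # constant when leap seconds are ignored, as in UT1


def split_day_fraction(days):
    """Split a fractional day count into (whole_days, hour, minute, second)."""
    whole = math.floor(days)
    # Round the fractional part of the day to the nearest whole second.
    secs = round((days - whole) * SECONDS_PER_DAY)
    if secs == SECONDS_PER_DAY:  # rounding carried over into the next day
        whole, secs = whole + 1, 0
    hour, rem = divmod(secs, 3600)
    minute, second = divmod(rem, 60)
    return whole, hour, minute, second


def float_resolution_seconds(days):
    """Gap between adjacent doubles near `days`, expressed in seconds."""
    return math.ulp(days) * SECONDS_PER_DAY
```

For a day count in the low thousands, float_resolution_seconds() comes out
far below a microsecond, which is the "good precision right now" mentioned
in the quoted text. The gap grows as the day count moves away from zero.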
In UTC, the fundamental unit is the "atomic second" (which is what the SI
nomenclature considers a "second" to be). Given the current rules of UTC, a
"day" is 86400 seconds, with an optional one-second adjustment (plus or
minus) on leap second days, which are presently at the end of March, June,
September, and December, with the June and December days given higher
priority. I say "presently", because the whole thing is defined by a
committee which may well decide to change its collective mind in the future
(as it has done in the past). Leap seconds are inserted into (or removed
from) UTC in order to keep it within 0.9s of UT1. It is not reasonable to project UTC
dates into the future (beyond the next leap second day), because leap seconds
are inserted or removed based on the vagaries of the Earth's rotation at the
time.
There's also TAI, which is UTC without the leap seconds. That is, it is based
around the atomic second with exactly 86400 atomic seconds per day, but it
drifts with respect to the solar day because it lacks leap seconds.
The above descriptions are a rather condensed version of the information given
at the following page.
http://tycho.usno.navy.mil/leapsec.html
Given the purpose and nature of date-stamps in email, I would advocate using
UT1 as the time base. UTC is a PITA because of arbitrary leap seconds, and we
don't have any use for the precision of atomic seconds in this application.
TAI isn't useful for civilian timekeeping. In UT1 there are always 86400
seconds in a day (simple! predictable!), and it provides a date-stamp which
is within 0.9s of UTC, which strikes me as more than good enough for the job.
(Please note that the "second" component of an RFC2822 time-of-day production
is already optional.)
Rather than use the number of days since the year 2000, however, I suggest
using the Modified Julian Day, simply because it's a standard epoch that
someone else already invented. A date can be expressed in the form "MJD
+53125.1666", and this would have meaning to historians who already use MJD.
This format is capable of covering the entire history of the human race, it
isn't tied to a particular calendar, and it can be parsed with scanf(). The
format generalises to other epoch-offsets, such as Unix time_t, which could
be expressed "U+1083384342". (The two dates expressed aren't exactly the
same, and I don't advocate actually using the Unix time_t epoch. I merely
demonstrate that this is a general epoch-offset format, not specific to MJD.)
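
As a rough illustration of how such an epoch-offset stamp might be parsed,
here is a small Python sketch. The regular expression, the function names
and the restriction to the two tags "MJD" and "U" are my assumptions for
the example, not a proposed grammar. The only outside fact it relies on is
that the Unix epoch (1970-01-01 00:00) falls on MJD 40587.

```python
import re

MJD_OF_UNIX_EPOCH = 40587  # 1970-01-01 00:00 expressed as an MJD
SECONDS_PER_DAY = 86400    # leap seconds ignored

# Epoch tag, optional whitespace, then a signed decimal offset.
_STAMP = re.compile(r"^\s*(MJD|U)\s*([+-]\d+(?:\.\d+)?)\s*$")


def parse_stamp(text):
    """Return (tag, offset) for stamps like 'MJD +53125.1666' or 'U+1083384342'."""
    match = _STAMP.match(text)
    if match is None:
        raise ValueError("not an epoch-offset stamp: %r" % (text,))
    return match.group(1), float(match.group(2))


def to_mjd(tag, offset):
    """Convert a parsed stamp to a fractional MJD."""
    if tag == "MJD":
        return offset  # already in days since the MJD epoch
    if tag == "U":
        # Unix offsets are in seconds, so scale to days and shift the epoch.
        return MJD_OF_UNIX_EPOCH + offset / SECONDS_PER_DAY
    raise ValueError("unknown epoch tag: %r" % (tag,))
```

Run through to_mjd(), the Unix example above works out to roughly MJD
53126.17, which fits the remark that the two sample dates are not exactly
the same.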
I have sample Perl code for conversion between MJD and Gregorian dates if
anyone wants to see it. It's nothing spectacular.
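
For anyone who would rather not wait for the Perl, here is a minimal Python
sketch of the same kind of conversion. It is not the Perl code mentioned
above. It leans on the standard datetime module, so it only handles
(proleptic) Gregorian years 1 to 9999, and the function names are made up
for the example. MJD 0 is midnight at the start of 1858-11-17.

```python
import math
from datetime import date, timedelta

MJD_EPOCH = date(1858, 11, 17)  # MJD 0 begins at 00:00 on this date


def gregorian_to_mjd(year, month, day):
    """Whole-number MJD for 00:00 on the given Gregorian date."""
    return (date(year, month, day) - MJD_EPOCH).days


def mjd_to_gregorian(mjd):
    """Gregorian (year, month, day) containing the given MJD.

    Any fractional part (the time of day) is discarded.
    """
    d = MJD_EPOCH + timedelta(days=math.floor(mjd))
    return d.year, d.month, d.day
```

With these, mjd_to_gregorian(53125.1666) gives (2004, 4, 30) and
gregorian_to_mjd(2004, 5, 1) gives 53126.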