mail-ng

Re: OT: Re: Less is more

2004-04-30 13:59:02

On 30-apr-04, at 21:13, Keith Moore wrote:

> not really, because it's not representative of the kinds of errors
> that programmers make when writing 822 date parsers.
>
> Frode's field was syntactically valid (as far as I could see).

If someone of your experience can't say for sure, then something isn't right...

> If that field occurred in an actual message generated by an actual MUA
> I'd claim it was a programmer error even if it was syntactically valid
> :)  Anyone who put that in a shipping product ought to be sacked.

And what about the protocol designer? "Fool me once, shame on you. Fool me twice, shame on me. Fool me 4294967295 times, shame on the IETF."

>> If a program can't parse Frode's field, it can't parse the RFC822 date
>> field syntax as specified.

> True, but it could quite possibly parse 99.999% of the dates that occur
> in actual use, including dates that aren't valid - at which point the
> inability to parse dates is insignificant in comparison to failures that
> are due to other problems.  If you're concerned about reliability you
> care about how well it works in actual use, not whether it handles
> really obscure corner cases.  (security concerns are an exception -
> since crackers specifically look for corner cases.)

This sounds like a decent pragmatic approach for protocols that are so complex or ambiguously defined that writing code to handle all possible permutations is infeasible. When building a new protocol, however, it makes sense to design it so that the number of possible permutations is small enough that all of them can be implemented and tested.

An interesting question is whether it's better to implement a binary date format as a timestamp or as a concatenation of year/month/day/hour/minute/second fields. The advantage of a timestamp is that it makes date comparisons easy and there is no ambiguity, as every possible value is a valid date/time. Testing is also easy because there are only three exceptional cases: underflow, overflow and wraparound. But the problem with a timestamp is that it doesn't allow for leap seconds, so the easy-math advantage is pretty much fictional, and it's hard to debug. (BTW, I once implemented a date format as a floating point value counting the days since Y2K, which gives good precision right now but still allows dates far in the past and future. And ignoring the leap second problem is easier that way.)
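For illustration, a quick Python sketch of the days-since-Y2K idea (the exact epoch, the helper names and the use of a double-precision float are illustrative assumptions here, not a description of the original implementation):

    from datetime import datetime, timedelta, timezone

    # Hypothetical epoch: midnight UTC at the start of 2000 ("Y2K").
    Y2K = datetime(2000, 1, 1, tzinfo=timezone.utc)

    def to_days_since_y2k(dt):
        # Dividing one timedelta by another yields a float, so the
        # fraction of a day carries the time-of-day information.
        return (dt - Y2K) / timedelta(days=1)

    def from_days_since_y2k(days):
        return Y2K + timedelta(days=days)

    # In 2004 the value is around 1600 days; a double's 53-bit mantissa
    # then still resolves far better than a microsecond, and precision
    # degrades only gradually for dates far in the past or future.
    print(to_days_since_y2k(datetime(2004, 4, 30, 21, 13, tzinfo=timezone.utc)))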

Another, somewhat related issue: overhead. The message I'm replying to is 3906 bytes long on my system. 2768 of that is header. 299 bytes of that is date/time information (8 fields) and 398 bytes is host name/address info for 7 "received" lines. Now if we encode this in binary, a "received" line can be a timestamp plus two IP addresses = 12 bytes; add another 8 bytes of overhead per line and that's 20 bytes each, or 140 bytes rather than 665, which saves about 19% of the header bytes. We can save even more by removing redundant information such as localhost received lines and software advertising.
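As a sketch of what such a fixed-layout binary record could look like (the tag/length/flags framing, the 32-bit timestamp and the IPv4-only addresses are illustrative assumptions, not a worked-out proposal):

    import socket
    import struct
    import time

    # Hypothetical 20-byte binary "received" record: 8 bytes of framing
    # (tag, length, flags) plus the 12 payload bytes from the text above,
    # i.e. a 32-bit timestamp and two IPv4 addresses.
    RECEIVED = struct.Struct("!HHII4s4s")

    def pack_received(from_ip, by_ip, when=None):
        ts = int(time.time() if when is None else when)
        return RECEIVED.pack(0x0001, RECEIVED.size, 0, ts,
                             socket.inet_aton(from_ip), socket.inet_aton(by_ip))

    rec = pack_received("192.0.2.1", "192.0.2.2")
    assert len(rec) == 20   # 7 such records = 140 bytes, as computed above

Note that the 32-bit timestamp eventually wraps, which is exactly the wraparound case mentioned above.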

Add to that that a binary format with explicit length values is much faster to parse (especially on disk, where you can seek over large uninteresting parts) and less prone to buffer overflow problems. On the other hand, the binary format must be simple enough that it can be implemented and debugged easily, to avoid SNMP-like troubles.
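To make the parsing point concrete, here is a sketch of walking such length-prefixed records, where validating each length against the remaining buffer before trusting it is what prevents the classic overflow (the record layout is the assumed one from the sketch above):

    import struct

    HEADER = struct.Struct("!HHI")  # tag, length, flags: the 8-byte framing

    def iter_records(buf):
        off = 0
        while off + HEADER.size <= len(buf):
            tag, length, flags = HEADER.unpack_from(buf, off)
            # Check the length before using it: it must cover at least
            # the header and must not run past the end of the buffer.
            if length < HEADER.size or off + length > len(buf):
                raise ValueError("corrupt record length at offset %d" % off)
            yield tag, buf[off + HEADER.size:off + length]
            off += length   # seek straight to the next record, no scanning
        if off != len(buf):
            raise ValueError("truncated record at offset %d" % off)

Records with uninteresting tags can be skipped just by ignoring the yielded payload, which is the "seek over large uninteresting parts" property.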

