
[Namazu-users-en] Re: Problems with date sorts

2007-01-13 11:14:59
On Sat, 2007-01-13 at 13:58 +0900, Tadamasa Teranishi wrote:
Lindsay Haisley wrote:

Does the date information in all of the Mailman documents accurately
follow the RFC 2822 format?

Is there any mail with an illegal Date: field?

Please show the Date: field of the mail.

OK, here is an example.  I used the following query:

http://www.kca-tx.org/mailman/kca/namazu.cgi?query=Laptop&submit=Search%21&idxname=kca&max=100&result=short&sort=date%3Aearly

Here's the result:

1. win 98SE (score: 2)
    /pipermail/kca/2002-September/000192.html (4,152 bytes)
...
You can see from the path names that these are out of order.  Here are the Date
fields in each of these, copied and pasted from the files themselves:

Sun Sep 22 14:07:51 CDT 2002 
Fri Jul 19 11:54:00 CDT 2002
Fri Aug 23 08:44:57 CDT 2002
Mon Mar  3 10:14:12 CST 2003
Tue Jan 14 17:37:11 CST 2003

To begin with, the Date in the pipermail files was not in RFC 2822 form.
However, the pipermail Date is correctly reflected in the 'Date' field.

The pipermail format contains no RFC822 header, but morphs the "Date"
header from the original post into an HTML element, and the date format
is different from what was in the original message/rfc822 format.  This
is apparently correctly parsed by mknmz.  For instance, the original
message contained:

Date: Sun, 22 Sep 2002 14:07:51 -0500

Whereas Mailman's pipermail conversion converts this to:

<I>Sun Sep 22 14:07:51 CDT 2002</I>

Namazu must handle this difference.  Pipermail files, at least those
generated by Mailman, aren't in RFC822 format, nor can the program
reasonably expect them to be in this format.
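
As a rough illustration of the difference between the two formats, here is a
sketch in Python. It is not Namazu's own code: the function name
parse_pipermail_date and the small zone-offset table are assumptions made for
this example, covering only the CST/CDT abbreviations quoted in this thread.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Assumed UTC offsets (in hours) for the zone abbreviations seen in this thread.
TZ_OFFSETS = {"CST": -6, "CDT": -5}
MONTHS = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()

# Matches e.g. "Sun Sep 22 14:07:51 CDT 2002" (the day may be space-padded).
PIPERMAIL_RE = re.compile(
    r"^\w{3} (\w{3}) +(\d{1,2}) (\d{2}):(\d{2}):(\d{2}) (\w+) (\d{4})$"
)

def parse_pipermail_date(text):
    """Parse a pipermail-style date string into a timezone-aware datetime."""
    match = PIPERMAIL_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a pipermail date: {text!r}")
    mon, day, hh, mm, ss, zone, year = match.groups()
    if mon not in MONTHS or zone not in TZ_OFFSETS:
        raise ValueError(f"unknown month or zone in: {text!r}")
    tzinfo = timezone(timedelta(hours=TZ_OFFSETS[zone]))
    return datetime(int(year), MONTHS.index(mon) + 1, int(day),
                    int(hh), int(mm), int(ss), tzinfo=tzinfo)

# The original RFC 2822 header and its pipermail rendering.
original = parsedate_to_datetime("Sun, 22 Sep 2002 14:07:51 -0500")
archived = parse_pipermail_date("Sun Sep 22 14:07:51 CDT 2002")
print(original == archived)
```

Under those assumptions the two values describe the same instant, so nothing
is lost in the conversion other than the RFC 2822 layout itself.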

Next, let's examine the time stamp of the pipermail file.
For instance, please check the following file and confirm its date
with 'ls -alF'.

The Unix time stamp is irrelevant!  The entire pipermail archive file 
hierarchy can be rebuilt using the Mailman "arch" utility (arch --wipe 
listname) in which case the time stamp on these files will be the time of the 
rebuild, not the posting time.  No indexing utility (or any other utility for 
that matter) can expect to get valid posting date/time data from the Unix file 
mod time!

One method is to change the time stamp of the file according
to the Date information in the contents of the file.

No accessory utility such as namazu should ever make any changes to the
data source it's analyzing, even if it's only a matter of changing the
Unix time stamp.

Even if this weren't bad software design, it's a rather excessive and 
inefficient way to do the job.  If namazu can correctly parse out the date from 
a pipermail html file, as it seems to be able to do in pipermail.pl, then it 
can certainly store this information in an index so it can be used to sort the 
files in correct date order.  Moreover, although I haven't looked at all the 
code, nowhere in the code I looked at did it look as if namazu was trying to 
sequence files based on the Unix time stamp!
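
To make the point concrete, here is a minimal sketch in Python of sorting on
the parsed Date field rather than on the file mod time, using the five date
strings quoted above. This is only an illustration and not how Namazu stores
or sorts its index: the sort_key helper and the zone-offset table are
assumptions of this example.

```python
from datetime import datetime, timedelta, timezone

# Assumed UTC offsets for the two zone abbreviations quoted in this thread.
TZ_OFFSETS = {"CST": timedelta(hours=-6), "CDT": timedelta(hours=-5)}

def sort_key(date_field):
    """Return UTC epoch seconds for a pipermail date string."""
    parts = date_field.split()   # split() also absorbs the double space in "Mar  3"
    zone = parts.pop(4)          # the zone abbreviation, e.g. "CDT"
    naive = datetime.strptime(" ".join(parts), "%a %b %d %H:%M:%S %Y")
    aware = naive.replace(tzinfo=timezone(TZ_OFFSETS[zone]))
    return int(aware.timestamp())

# Date fields in the order the search returned them.
date_fields = [
    "Sun Sep 22 14:07:51 CDT 2002",
    "Fri Jul 19 11:54:00 CDT 2002",
    "Fri Aug 23 08:44:57 CDT 2002",
    "Mon Mar  3 10:14:12 CST 2003",
    "Tue Jan 14 17:37:11 CST 2003",
]

# Earliest first, which is what sort=date:early should produce.
for field in sorted(date_fields, key=sort_key):
    print(field)
```

A numeric key like this could be stored once at index time, so the sort never
needs to consult the file system and a rebuild of the archive does not affect
the order.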

I'm not sure what you're saying here.  Does namazu expect to be able to do date 
sorts based on the Unix timestamp on pipermail files?  I find this hard to 
believe since the namazu code looks pretty intelligent otherwise.  I'll run 
some tests to see if this is true, but if it is, we'll have to abandon our 
project to integrate namazu into mailman since this kind of behavior would 
constitute a serious design flaw.

Another method is to use the field sorting UTC function.
Do you know how to set up the field sorting UTC function?

I'm not familiar with the field sorting UTC function.  What does namazu
use, and where in the code is the sorting done?

Are you one of the authors of namazu?

-- 
Lindsay Haisley       |  "The voice of dissent  |     PGP public key
FMP Computer Services | was arrested before the |      available at
512-259-1190          |  president cleared his  | http://pubkeys.fmp.com
http://www.fmp.com    |     throat to speak     |
                      |        of freedom"      |
                      |     (Chris Chandler)    |

