Thus spake Tadamasa Teranishi on Fri, Jan 12, 2007 at 09:18:26AM CST
Lindsay Haisley wrote:
I'm running into a rather nasty problem with date sorting on Mailman pipermail
archives. When I sort on date:early or date:late there appears to be some
other sort being applied, although if I do a date sort in the reverse order
the order of the messages is indeed reversed, indicating that the sort is
working, albeit with an incorrect algorithm.
Does the date information in all of the Mailman documents accurately follow
the RFC 2822 format?
Is there any mail with an illegal Date: field?
Please show the Date: field of the mail.
OK, here is an example. I used the following query:
http://www.kca-tx.org/mailman/kca/namazu.cgi?query=Laptop&submit=Search%21&idxname=kca&max=100&result=short&sort=date%3Aearly
Here's the result:
1. win 98SE (score: 2)
/pipermail/kca/2002-September/000192.html (4,152 bytes)
2. Linux install plus a note on jedit (score: 2)
/pipermail/kca/2002-July/000052.html (4,432 bytes)
3. Canon BJC-2100, Restart in DOS mode (score: 2)
/pipermail/kca/2002-August/000103.html (3,073 bytes)
4. March Newscard (score: 2)
/pipermail/kca/2003-March/000353.html (4,296 bytes)
5. New TurboTax "feature" (score: 2)
/pipermail/kca/2003-January/000331.html (6,865 bytes)
You can see from path names that these are out of order. Here are the Date
fields in each of these, copy-n-pasted from the files themselves:
Sun Sep 22 14:07:51 CDT 2002
Fri Jul 19 11:54:00 CDT 2002
Fri Aug 23 08:44:57 CDT 2002
Mon Mar 3 10:14:12 CST 2003
Tue Jan 14 17:37:11 CST 2003
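For comparison, here is a minimal Python sketch (not part of Namazu) that sorts these five results chronologically, which is the order sort=date:early would be expected to produce. The CST/CDT offset table and the name parse_pipermail_date are assumptions made for illustration only.

```python
from datetime import datetime, timedelta, timezone

# Assumed UTC offsets (in hours) for the two US Central abbreviations above.
TZ_OFFSETS = {"CST": -6, "CDT": -5}

def parse_pipermail_date(text):
    """Parse a string like 'Tue Jan 14 17:37:11 CST 2003' (hypothetical helper)."""
    _dow, mon, day, clock, tz, year = text.split()
    naive = datetime.strptime(f"{mon} {day} {clock} {year}", "%b %d %H:%M:%S %Y")
    return naive.replace(tzinfo=timezone(timedelta(hours=TZ_OFFSETS[tz])))

# Paths and dates copied from the search result and the archive files.
results = [
    ("2002-September/000192.html", "Sun Sep 22 14:07:51 CDT 2002"),
    ("2002-July/000052.html", "Fri Jul 19 11:54:00 CDT 2002"),
    ("2002-August/000103.html", "Fri Aug 23 08:44:57 CDT 2002"),
    ("2003-March/000353.html", "Mon Mar 3 10:14:12 CST 2003"),
    ("2003-January/000331.html", "Tue Jan 14 17:37:11 CST 2003"),
]

# Earliest first, i.e. what a correct date:early sort should return.
in_order = sorted(results, key=lambda item: parse_pipermail_date(item[1]))
for path, date in in_order:
    print(path, "|", date)
```

Sorted this way the July 2002 message comes first and the March 2003 message last, which is not the order namazu.cgi returned.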
The date information isn't in a standard RFC2822 header format once the files
are in a pipermail archive, but embedded in HTML markup, e.g.:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE> New TurboTax "feature"
</TITLE>
<LINK REL="Index" HREF="index.html" >
<LINK REL="made"
HREF="mailto:kca%40lists.kca-tx.org?Subject=New%20TurboTax%20%22feature%22&In-Reply-To=20030114.100528.-140453.1.bstrohm%40juno.com">
<META NAME="robots" CONTENT="index,nofollow">
<META http-equiv="Content-Type" content="text/html; charset=us-ascii">
<LINK REL="Previous" HREF="000330.html">
<LINK REL="Next" HREF="000332.html">
</HEAD>
<BODY BGCOLOR="#ffffff">
<H1>New TurboTax "feature"</H1>
<B>Dale Cockle</B> <A
HREF="mailto:kca%40lists.kca-tx.org?Subject=New%20TurboTax%20%22feature%22&In-Reply-To=20030114.100528.-140453.1.bstrohm%40juno.com"
TITLE="New TurboTax "feature"">k5jic at kca-tx.org
</A><BR>
<I>Tue Jan 14 17:37:11 CST 2003</I>
etc....
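To make the difference concrete, here is a rough Python sketch that pulls the date out of the <I>...</I> element shown above and prints it in RFC 2822 form. It only illustrates the two formats; whether Namazu's HTML filter looks at this element at all is not known here. The regular expression, the CST/CDT offset table, and the function name are assumptions.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Assumed UTC offsets (in hours) for the abbreviations seen in this archive.
TZ_OFFSETS = {"CST": -6, "CDT": -5}

# Matches the italic date line in the pipermail page, e.g.
# <I>Tue Jan 14 17:37:11 CST 2003</I>
DATE_IN_ITALICS = re.compile(
    r"<I>(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \w{3} \d{4})</I>"
)

def rfc2822_date_from_pipermail(html):
    """Return the page's date as an RFC 2822 string, or None if not found."""
    match = DATE_IN_ITALICS.search(html)
    if match is None:
        return None
    _dow, mon, day, clock, tz, year = match.group(1).split()
    naive = datetime.strptime(f"{mon} {day} {clock} {year}", "%b %d %H:%M:%S %Y")
    aware = naive.replace(tzinfo=timezone(timedelta(hours=TZ_OFFSETS[tz])))
    return format_datetime(aware)

sample = "</A><BR>\n<I>Tue Jan 14 17:37:11 CST 2003</I>\n"
print(rfc2822_date_from_pipermail(sample))  # Tue, 14 Jan 2003 17:37:11 -0600
```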
Could that be a problem? Should I perhaps be indexing the mbox file? Would
namazu understand that better?
--
Lindsay Haisley | "Fighting against human | PGP public key
FMP Computer Services | creativity is like | available at
512-259-1190 | trying to eradicate | <http://pubkeys.fmp.com>
http://www.fmp.com | dandelions" |
| (Pamela Jones) |
_______________________________________________
Namazu-users-en mailing list
Namazu-users-en(_at_)namazu(_dot_)org
http://www.namazu.org/cgi-bin/mailman/listinfo/namazu-users-en