On September 25, 1998 at 18:21, "Christian de la Salle" wrote:
Let me put it in other words: sometimes MHonArc misses subject-based
threading and I can't find the reason why. A few examples follow, but I
must confess my analysis does not lead to a solution: for some failing thread
configurations, I have other situations where a similar configuration works.
Three failing examples:
I- Below, MHonArc missed that 3 is a reply to 1.
1- Chaîne d'Info (Active Channel), Christian de la Salle - 24/07/98
2- Mise en place d'archives, Christian de la Salle - 07/08/98
3- Re : Chaîne d'Info (Active Channel), Christian de la Salle - 23/08/
-----^
SUBJECTREPLYRXP fails to match "Re :" because of the space before the colon.
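A minimal Python sketch of this mismatch, for illustration only: MHonArc itself is written in Perl, and both patterns below are hypothetical stand-ins, not its actual default SUBJECTREPLYRXP. A prefix pattern that demands the colon right after "Re" leaves "Re : ..." untouched, while one that allows optional whitespace before the colon strips it.

```python
import re

# Hypothetical reply-prefix patterns for illustration only; neither is
# MHonArc's actual default SUBJECTREPLYRXP.
# STRICT: the colon must immediately follow the prefix, as in "Re:".
STRICT = re.compile(r"^\s*(?:re|tr|fwd?):\s*", re.IGNORECASE)
# TOLERANT: optional whitespace is allowed before the colon, as in "Re :".
TOLERANT = re.compile(r"^\s*(?:re|tr|fwd?)\s*:\s*", re.IGNORECASE)


def base_subject(subject: str, prefix_rx: re.Pattern) -> str:
    """Strip one leading reply/forward prefix if the pattern matches."""
    return prefix_rx.sub("", subject, count=1)


original = "Chaîne d'Info (Active Channel)"
reply = "Re : Chaîne d'Info (Active Channel)"

print(base_subject(reply, STRICT) == original)    # False: "Re :" is left in place
print(base_subject(reply, TOLERANT) == original)  # True
print(base_subject("Re: Accès Magic On Line", STRICT))  # Accès Magic On Line
```

Allowing the space is a judgment call: the looser the prefix pattern, the greater the chance of stripping text that really belongs to the subject.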
II- Below, MHonArc missed that 4 is a reply to 2.
1- Mailing lists, Christian de la Salle - 23/06/98
2- Securité / Stats, Christian de la Salle - 02/07/98
3- Majordomo : archives, Christian de la Salle - 23/07/98
4- Re: Securité / Stats, Christian de la Salle - 23/07/98
5- Redirect Problem and Majordomo Archiving Question, Christian de la Salle - 28/07/98
Notes: Same author too. They contain special characters (accents), and when I
look at the mbx file they are both encoded the same way:
Subject: =?iso-8859-1?Q?Tr:_Securit=E9_/_Stats?=
Subject: =?iso-8859-1?Q?Securit=E9_/_Stats?=
MHonArc does not decode subject text when checking for subject-based
threads (less overhead). Here the "Tr:" is buried inside the encoded
text, where SUBJECTREPLYRXP cannot strip it. So, since the "base" subject
text does not match after SUBJECTREPLYRXP is applied, no subject thread is
detected. If the "Tr:" were not part of the encoded text (i.e., if it came
before it), then a match would have been made.
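The following Python sketch reproduces this with the two headers quoted above. The RFC 2047 decoding uses the standard library's email.header.decode_header and make_header; the prefix pattern REPLY_RX and the helper names are hypothetical and do not come from MHonArc. Compared raw, the "Tr:" stays hidden in the encoded word and the base subjects differ; decoded first, they match.

```python
import re
from email.header import decode_header, make_header

# Hypothetical stand-in for SUBJECTREPLYRXP (not MHonArc's real default).
REPLY_RX = re.compile(r"^\s*(?:re|tr|fwd?)\s*:\s*", re.IGNORECASE)


def base(subject: str) -> str:
    """Strip one leading reply/forward prefix, if present."""
    return REPLY_RX.sub("", subject, count=1)


def decode_subject(raw: str) -> str:
    """Decode RFC 2047 encoded-words (=?charset?Q?...?=) into text."""
    return str(make_header(decode_header(raw)))


forwarded = "=?iso-8859-1?Q?Tr:_Securit=E9_/_Stats?="
original = "=?iso-8859-1?Q?Securit=E9_/_Stats?="

# Undecoded: "Tr:" is hidden inside the encoded-word, so nothing is stripped.
print(base(forwarded) == base(original))  # False
# Decoded first: "Tr: Securité / Stats" loses its prefix and matches.
print(decode_subject(forwarded))  # Tr: Securité / Stats
print(base(decode_subject(forwarded)) == base(decode_subject(original)))  # True
```

The same decode-then-strip step covers a pair where only one header is encoded, as in the next example, assuming the raw 8-bit header has already been read in as text; a raw 8-bit header carries no charset label, so a real implementation would have to guess one.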
III- Below, MHonArc missed that 3 is a reply to 1.
1- Bonjour à tous!, Pierre / JP Derrier - 02/09/98
2- un humble avis sur les questions-réponses, Anne GUILLIEN - 03/09/98
3- Re: Bonjour à tous!, Christian de la Salle - 03/09/98
Notes: Different authors. They contain special characters (accents), and when
I look at the mbx file they are encoded differently:
Subject: =?iso-8859-1?Q?Bonjour_=E0_tous!?=
Subject: Re: Bonjour à tous!
Same issue as the previous example: one subject is encoded and the other is
not, so the undecoded "base" subjects differ.
On the other hand, the following examples work OK:
I- Below, MHonArc found that 3 is a reply to 2.
1- afa, Support Technique - 02/03/98
2- Accès Magic On Line, Christian de la Salle - 13/03/98
3- Re: Accès Magic On Line, Arnaud Pignard - 13/03/98
Notes: Different authors. They contain special characters (accents),
and when I look at the mbx file they are not encoded:
Subject: Accès Magic On Line
Subject: Re: Accès Magic On Line
Here, the "base" subjects match after SUBJECTREPLYRXP is applied, i.e.,
there are no encoding variations to mess things up.
II- Below, MHonArc found that 4 and 5 are replies to 1.
1- Actions prévues, Christian de la Salle - 07/09/98
2- HELP ! (www.afa.asso.fr), Christian de la Salle - 09/09/98
3- AFA - Thx, Christian de la Salle - 09/09/98
4- Tr: Actions prévues, Christian de la Salle - 17/09/98
5- Re: Tr: Actions prévues, Christian de la Salle - 17/09/98
Notes: Same author. They contain special characters (accents),
and when I look at the mbx file they are not encoded:
Subject: Actions prévues
Subject: Tr: Actions prévues
Subject: Re: Tr: Actions prévues
Same as the previous example.
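This example works even though message 5 carries two stacked prefixes, "Re: Tr:". A Python sketch of stripping such a run in one pass, again with a hypothetical pattern rather than MHonArc's real SUBJECTREPLYRXP:

```python
import re

# Hypothetical pattern that strips a whole run of reply/forward prefixes,
# e.g. "Re: Tr: " (not MHonArc's actual SUBJECTREPLYRXP).
PREFIX_RUN = re.compile(r"^(?:\s*(?:re|tr|fwd?)\s*:\s*)+", re.IGNORECASE)


def base_subject(subject: str) -> str:
    """Remove every leading reply/forward prefix in one substitution."""
    return PREFIX_RUN.sub("", subject, count=1)


subjects = ["Actions prévues", "Tr: Actions prévues", "Re: Tr: Actions prévues"]
print({base_subject(s) for s in subjects})  # {'Actions prévues'}
```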
In sum, subject-based detection will fail if the "base" subject text
does not match after SUBJECTREPLYRXP is applied. Alternate encodings of
the same non-ASCII subject can cause such a mismatch.
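Putting the pieces together, here is a hypothetical Python sketch of subject-based detection: each prefixed subject is linked to the first earlier message with the same base subject, run with and without decoding on the header pairs quoted in examples II and III. The prefix pattern, function names, and message numbering are illustrative assumptions, not MHonArc internals; the optional decode step is the extra per-message work that MHonArc avoids.

```python
import re
from email.header import decode_header, make_header

# Hypothetical reply/forward prefix pattern (not MHonArc's SUBJECTREPLYRXP).
PREFIX_RUN = re.compile(r"^(?:\s*(?:re|tr|fwd?)\s*:\s*)+", re.IGNORECASE)


def decode_subject(raw: str) -> str:
    """Decode any RFC 2047 encoded-words into plain text."""
    return str(make_header(decode_header(raw)))


def subject_threads(subjects: list[str], decode: bool) -> dict[int, int]:
    """Map each reply's message number to the first earlier message
    with the same base subject (messages are numbered from 1)."""
    first_seen: dict[str, int] = {}
    parent: dict[int, int] = {}
    for num, raw in enumerate(subjects, start=1):
        text = decode_subject(raw) if decode else raw
        key = PREFIX_RUN.sub("", text, count=1)
        # Only subjects that actually carried a prefix count as replies.
        if key != text and key in first_seen:
            parent[num] = first_seen[key]
        first_seen.setdefault(key, num)
    return parent


example_2 = ["=?iso-8859-1?Q?Securit=E9_/_Stats?=",
             "=?iso-8859-1?Q?Tr:_Securit=E9_/_Stats?="]
example_3 = ["=?iso-8859-1?Q?Bonjour_=E0_tous!?=",
             "Re: Bonjour à tous!"]

for name, subjects in [("II", example_2), ("III", example_3)]:
    print(name, subject_threads(subjects, decode=False),
          subject_threads(subjects, decode=True))
# II {} {2: 1}
# III {} {2: 1}
```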
Some implementation issues arise if decoding is done first, plus the
extra overhead will slow things down. Subject-based detection already
has its built-in deficiencies with respect to threading. So for now, I
see no compelling reason to change anything.
It's still an interesting problem.
--ewh
----
Earl Hood | University of California: Irvine
ehood(_at_)medusa(_dot_)acs(_dot_)uci(_dot_)edu | Electronic Loiterer
http://www.oac.uci.edu/indiv/ehood/ | Dabbler of SGML/WWW/Perl/MIME