mhonarc-users

Re: Is subject-based threading actually done in conjunction with explicit threading?

1998-09-25 12:40:33
On September 25, 1998 at 18:21, "Christian de la Salle" wrote:

Let me put it in other words: sometimes MHonArc misses the subject-based
threading and I can't find the reason why. A couple of examples follow, but I
must confess my analysis does not lead to a solution: for some failing thread
configurations, I have other situations where they work.

Three failing examples:
I- Below, MHonArc missed that 3 is a reply to 1
    1- Chaîne d'Info (Active Channel), Christian de la Salle - 24/07/98
    2- Mise en place d'archives, Christian de la Salle - 07/08/98
    3- Re : Chaîne d'Info (Active Channel), Christian de la Salle - 23/08/
---------^
SUBJECTREPLYRXP fails to match because of the space between "Re" and the colon.
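To see why a single space matters, here is a minimal Python sketch. The pattern is an assumed stand-in for SUBJECTREPLYRXP, not necessarily MHonArc's actual default: it strips leading reply prefixes (including the "Tr:" forward prefix seen in these examples) only when the colon immediately follows the prefix word. A hypothetical looser variant that tolerates whitespace before the colon is shown for comparison.

```python
import re

# Hypothetical stand-in for SUBJECTREPLYRXP (an assumption, not MHonArc's
# actual default): one or more reply prefixes, each followed *immediately*
# by a colon, plus any trailing whitespace.
STRICT_REPLY_RXP = re.compile(r"^(?:\s*(?:re|sv|fwd|fw|tr):\s*)+", re.IGNORECASE)

# Hypothetical looser variant that also tolerates whitespace before the colon.
LOOSE_REPLY_RXP = re.compile(r"^(?:\s*(?:re|sv|fwd|fw|tr)\s*:\s*)+", re.IGNORECASE)


def base_subject(subject: str, rxp: "re.Pattern[str]" = STRICT_REPLY_RXP) -> str:
    """Return the subject with any leading reply prefixes removed."""
    return rxp.sub("", subject)


original = "Chaîne d'Info (Active Channel)"
reply = "Re : Chaîne d'Info (Active Channel)"

print(base_subject(reply) == original)                   # False: "Re :" is left in place
print(base_subject(reply, LOOSE_REPLY_RXP) == original)  # True
```

Loosening the pattern is a trade-off: every extra allowance makes it more likely that an ordinary subject is mistaken for a reply.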


II- Below, MHonArc missed that 4 is a reply to 2
    1- Mailing lists, Christian de la Salle - 23/06/98
    2- Securité / Stats, Christian de la Salle - 02/07/98
    3- Majordomo : archives, Christian de la Salle - 23/07/98
    4- Re: Securité / Stats, Christian de la Salle - 23/07/98
    5- Redirect Problem and Majordomo Archiving Question, Christian de la Salle - 28/07/98
Notes: Same author too. They feature special characters (accents), and when I
look at the mbx file they are both encoded the same way:
    Subject: =?iso-8859-1?Q?Tr:_Securit=E9_/_Stats?=
    Subject: =?iso-8859-1?Q?Securit=E9_/_Stats?=

MHonArc does not decode subject text when checking for subject-based
threads (less overhead). Here the "Tr:" prefix sits inside the encoded
word, so SUBJECTREPLYRXP cannot strip it; the "base" subject texts do not
match, and no subject thread is detected. If the "Tr:" had not been part
of the encoded text (i.e., if it had come before it), a match would have
been made.
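The effect of skipping the decoding step can be sketched in Python using the standard library's RFC 2047 helpers (email.header.decode_header and make_header). The reply-prefix pattern is again a hypothetical stand-in for SUBJECTREPLYRXP; the two subjects are the raw headers quoted above.

```python
import re
from email.header import decode_header, make_header

# Hypothetical reply-prefix pattern (an assumption, not MHonArc's default).
REPLY_RXP = re.compile(r"^(?:\s*(?:re|sv|fwd|fw|tr):\s*)+", re.IGNORECASE)


def base_subject(subject: str) -> str:
    """Strip leading reply prefixes."""
    return REPLY_RXP.sub("", subject)


def decoded(subject: str) -> str:
    """Decode RFC 2047 encoded words (e.g. =?iso-8859-1?Q?...?=) to Unicode."""
    return str(make_header(decode_header(subject)))


first = "=?iso-8859-1?Q?Securit=E9_/_Stats?="
reply = "=?iso-8859-1?Q?Tr:_Securit=E9_/_Stats?="

# Without decoding, "Tr:" is hidden inside the encoded word and is not stripped.
print(base_subject(reply) == base_subject(first))                    # False
# Decoding first exposes "Tr:" to the pattern, and the base subjects match.
print(base_subject(decoded(reply)) == base_subject(decoded(first)))  # True
```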

III- Below, MHonArc missed that 3 is a reply to 1
    1- Bonjour à tous!, Pierre / JP Derrier - 02/09/98
    2- un humble avis sur les questions-réponses, Anne GUILLIEN - 03/09/98
    3- Re: Bonjour à tous!, Christian de la Salle - 03/09/98
Notes: Different authors. They feature special characters (accents), and when
I look at the mbx file they are encoded differently:
    Subject: =?iso-8859-1?Q?Bonjour_=E0_tous!?=
    Subject: Re: Bonjour à tous!

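Same issue as the previous example: the original subject is MIME-encoded
while the reply's is raw 8-bit text, so their undecoded base subjects differ.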
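This mixed case adds one more wrinkle: the reply's raw 8-bit subject carries no charset label, so a decode-first approach has to guess one. The Python sketch below assumes ISO-8859-1 as that fallback (an assumption, not something the headers state) and uses a hypothetical stand-in for SUBJECTREPLYRXP.

```python
import re
from email.header import decode_header, make_header

# Hypothetical reply-prefix pattern (an assumption, not MHonArc's default).
REPLY_RXP = re.compile(r"^(?:\s*(?:re|sv|fwd|fw|tr):\s*)+", re.IGNORECASE)


def base_subject(subject: str) -> str:
    """Strip leading reply prefixes."""
    return REPLY_RXP.sub("", subject)


def normalize(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Turn a raw Subject header value from the mailbox into Unicode text.

    Unlabeled 8-bit bytes are decoded with an assumed fallback charset;
    any RFC 2047 encoded words in the result are then decoded as well.
    """
    text = raw.decode(fallback)
    return str(make_header(decode_header(text)))


first_raw = b"=?iso-8859-1?Q?Bonjour_=E0_tous!?="
reply_raw = b"Re: Bonjour \xe0 tous!"  # raw 8-bit Latin-1, no charset label

# Without RFC 2047 decoding, one side is still an encoded word.
print(base_subject(reply_raw.decode("iso-8859-1")) == first_raw.decode("iso-8859-1"))  # False
# Normalizing both sides first makes the base subjects agree.
print(base_subject(normalize(reply_raw)) == base_subject(normalize(first_raw)))  # True
```

The guessed fallback charset is one of the implementation issues a decode-first approach raises: if it is wrong, the decoded text is wrong too.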


On the other hand, the following examples work OK:

I- Below, MHonArc found that 3 is a reply to 2
    1- afa, Support Technique - 02/03/98
    2- Accès Magic On Line, Christian de la Salle - 13/03/98
    3- Re: Accès Magic On Line, Arnaud Pignard - 13/03/98
Notes: Different authors. They feature special characters (accents), and
when I look at the mbx file they are not encoded:
    Subject: Accès Magic On Line
    Subject: Re: Accès Magic On Line

Here, the "base" subjects match after SUBJECTREPLYRXP is applied, i.e.,
there are no encoding variations to mess things up.


II- Below, MHonArc found that 4 and 5 are replies to 1
    1- Actions prévues, Christian de la Salle - 07/09/98
    2- HELP ! (www.afa.asso.fr), Christian de la Salle - 09/09/98
    3- AFA - Thx, Christian de la Salle - 09/09/98
    4- Tr: Actions prévues, Christian de la Salle - 17/09/98
    5- Re: Tr: Actions prévues, Christian de la Salle - 17/09/98
Notes: Same author. They feature special characters (accents), and when
I look at the mbx file they are not encoded:
    Subject: Actions prévues
    Subject: Tr: Actions prévues
    Subject: Re: Tr: Actions prévues

Same as the previous example.

In sum, subject-based detection will fail if the "base" subject text
does not match after SUBJECTREPLYRXP is applied. Alternative encodings
of the same non-ASCII subject can cause a match to fail.
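The summary can be illustrated with a toy model of subject-based threading in Python. It is not MHonArc's actual algorithm, only a sketch under stated assumptions: subjects arrive in date order, they are compared exactly as stored (no decoding), a hypothetical pattern stands in for SUBJECTREPLYRXP, and a subject that loses a reply prefix is attached to the earliest earlier message with the same base subject. The inputs are the subjects from examples I (failing) and II (working) above.

```python
import re
from typing import Dict, List, Optional

# Hypothetical reply-prefix pattern (an assumption, not MHonArc's default).
REPLY_RXP = re.compile(r"^(?:\s*(?:re|sv|fwd|fw|tr):\s*)+", re.IGNORECASE)


def subject_threads(subjects: List[str]) -> Dict[int, Optional[int]]:
    """Map each message index to the index of its subject-based parent.

    Toy model: subjects are in date order and compared exactly as stored
    (no decoding). A subject that loses a reply prefix is attached to the
    earliest earlier message with the same base subject.
    """
    first_seen: Dict[str, int] = {}
    parents: Dict[int, Optional[int]] = {}
    for i, subject in enumerate(subjects):
        base = REPLY_RXP.sub("", subject)
        is_reply = base != subject
        parents[i] = first_seen.get(base) if is_reply else None
        first_seen.setdefault(base, i)
    return parents


working = [
    "Actions prévues",
    "HELP ! (www.afa.asso.fr)",
    "AFA - Thx",
    "Tr: Actions prévues",
    "Re: Tr: Actions prévues",
]
failing = [
    "Chaîne d'Info (Active Channel)",
    "Mise en place d'archives",
    "Re : Chaîne d'Info (Active Channel)",  # space before the colon
]

print(subject_threads(working))  # {0: None, 1: None, 2: None, 3: 0, 4: 0}
print(subject_threads(failing))  # {0: None, 1: None, 2: None}
```

Indices are 0-based, so entries 3 and 4 in the first result correspond to messages 4 and 5 in the listing, both attached to message 1.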

Decoding first raises some implementation issues, plus the extra
overhead would slow things down. Subject-based detection already has
built-in deficiencies with respect to threading, so for now I see no
compelling reason to change anything.

It's still an interesting problem.

        --ewh

----
             Earl Hood              | University of California: Irvine
      ehood(_at_)medusa(_dot_)acs(_dot_)uci(_dot_)edu      |      Electronic Loiterer
http://www.oac.uci.edu/indiv/ehood/ | Dabbler of SGML/WWW/Perl/MIME