namazu-users-en

[Namazu-users-en] Re: html data not indexed in text/html mails

2005-08-04 05:41:02
swati wrote:

Hello all,
This is a sample mail that I was trying to index and search on.


snip


In this mail I am able to search on words like rober and streams, which 
exist in the header part. But words like fear, member, or primers, 
which exist inside the HTML part of the mail, are not indexed or searchable. I 
also tried the new version of Namazu (namazu-2.0.15pre1), but with that too I 
am not able to index or search this type of mail.

Can anyone give some suggestions as to how I can make these mails indexed 
and searchable as well?


I made a patch for this type of mail (against namazu-2.0.15pre1).

bash$ diff -ub filter/mailnews.pl.org filter/mailnews.pl
--- filter/mailnews.pl.org      Mon Jun  6 14:41:42 2005
+++ filter/mailnews.pl  Thu Aug  4 21:13:53 2005
@@ -65,7 +65,7 @@
     util::vprint("Processing mail/news file ...\n");
 
     uuencode_filter($cont);
-    mailnews_filter($cont, $weighted_str, $fields);
+    mailnews_filter($cont, $weighted_str, $headings, $fields);
     mailnews_citation_filter($cont, $weighted_str);
 
     gfilter::line_adjust_filter($cont);
@@ -79,11 +79,12 @@
 
 # Original of this code was contributed by <furukawa(_at_)tcp-ip(_dot_)or(_dot_)jp>. 
 sub mailnews_filter ($$$) {
-    my ($contref, $weighted_str, $fields) = @_;
+    my ($contref, $weighted_str, $headings, $fields) = @_;
 
     my $boundary = "";
     my $line     = "";
     my $partial  = 0;
+    my $htmlmail = "";
 
     $$contref =~ s/^\s+//;
     # Don't handle if first like does'nt seem like a mail/news header.
@@ -125,6 +126,10 @@
                 # contributed by Hiroshi Kato <tumibito(_at_)mm(_dot_)rd(_dot_)nttdata(_dot_)co(_dot_)jp>
                 $partial = $1;
                 util::dprint("((partial: $partial))\n");
+            } elsif ($line =~ m!text/html!i) {
+               # The simplest form of an HTML email message.
+               util::dprint("text/html mail\n");
+               $htmlmail = "yes";
             } elsif ($line !~ m!text/plain!i) {
                 $$contref = '';
                 return;
@@ -161,6 +166,9 @@
        multipart_process($contref, $boundary, $weighted_str, $fields);
 
     }
+    if ($htmlmail) {
+       html::html_filter($contref, $weighted_str, $fields, $headings);
+    }
 }
 
 # Prototype declaration for avoiding
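
To illustrate the idea outside of Namazu, here is a minimal standalone sketch
of the Content-Type check that the patch adds. The helper name
classify_content_type and the sample header lines are made up for this
example; the real filter does the check inline in mailnews_filter and also
handles multipart and partial messages, which are left out here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of Namazu: classify one Content-Type
# header line using the same regexes and the same order as the patched
# elsif chain (text/html is tested before the "not text/plain" case).
sub classify_content_type {
    my ($line) = @_;
    return 'html'  if $line =~ m!text/html!i;    # new branch: pass body to the HTML filter
    return 'plain' if $line =~ m!text/plain!i;   # plain text: index the body as-is
    return 'skip';                               # anything else: the body is discarded
}

# Sample header lines (assumed for illustration only).
for my $header ('Content-Type: text/html; charset=us-ascii',
                'Content-Type: TEXT/PLAIN',
                'Content-Type: image/png') {
    printf "%-45s => %s\n", $header, classify_content_type($header);
}
```

Before the patch, a text/html line fell into the last case, so the whole body
was emptied and only the header words stayed searchable.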


Yukio USUDA

_______________________________________________
Namazu-users-en mailing list
Namazu-users-en(_at_)namazu(_dot_)org
http://www.namazu.org/cgi-bin/mailman/listinfo/namazu-users-en
