swati wrote:
Hello all,
This is a sample mail that I was trying to index and search.
snip
In this mail I am able to search for words like rober and streams, which
exist in the header part. But words like fear, member, or primers, which
exist inside the HTML part of the mail, are not indexed or searchable. I
tried the new version of Namazu (namazu-2.0.15pre1), but I still cannot
index or search this type of mail.
Can anyone suggest how I can get these mails indexed and searchable as
well?
I made a patch for this type of mail (against namazu-2.0.15pre1).
bash$ diff -ub filter/mailnews.pl.org filter/mailnews.pl
--- filter/mailnews.pl.org Mon Jun 6 14:41:42 2005
+++ filter/mailnews.pl Thu Aug 4 21:13:53 2005
@@ -65,7 +65,7 @@
util::vprint("Processing mail/news file ...\n");
uuencode_filter($cont);
- mailnews_filter($cont, $weighted_str, $fields);
+ mailnews_filter($cont, $weighted_str, $headings, $fields);
mailnews_citation_filter($cont, $weighted_str);
gfilter::line_adjust_filter($cont);
@@ -79,11 +79,12 @@
# Original of this code was contributed by <furukawa(_at_)tcp-ip(_dot_)or(_dot_)jp>.
-sub mailnews_filter ($$$) {
+sub mailnews_filter ($$$$) {
- my ($contref, $weighted_str, $fields) = @_;
+ my ($contref, $weighted_str, $headings, $fields) = @_;
my $boundary = "";
my $line = "";
my $partial = 0;
+ my $htmlmail = "";
$$contref =~ s/^\s+//;
# Don't handle if first like does'nt seem like a mail/news header.
@@ -125,6 +126,10 @@
# contributed by Hiroshi Kato <tumibito(_at_)mm(_dot_)rd(_dot_)nttdata(_dot_)co(_dot_)jp>
$partial = $1;
util::dprint("((partial: $partial))\n");
+ } elsif ($line =~ m!text/html!i) {
+ # The simplest form of an HTML email message.
+ util::dprint("text/html mail\n");
+ $htmlmail = "yes";
} elsif ($line !~ m!text/plain!i) {
$$contref = '';
return;
@@ -161,6 +166,9 @@
multipart_process($contref, $boundary, $weighted_str, $fields);
}
+ if ($htmlmail) {
+ html::html_filter($contref, $weighted_str, $fields, $headings);
+ }
}
# Prototype declaration for avoiding
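The idea behind the patch is: when the Content-Type header says text/html, remember that, and run the body through an HTML filter instead of discarding it as non-text/plain. The standalone Perl sketch below illustrates only that dispatch for a single-part message, assuming the header and body are separated by the first blank line. It is not Namazu code: body_kind, strip_html and text_for_index are hypothetical names, the tag stripper is deliberately crude, and multipart mail (which the real filter hands to multipart_process()) is out of scope.

```perl
#!/usr/bin/perl
# Hypothetical standalone sketch, NOT Namazu code: it mimics the branch
# the patch adds to mailnews_filter() for single-part messages only.
use strict;
use warnings;

# Classify a single-part message by its Content-Type header.
# A missing header defaults to text/plain (RFC 2045).
sub body_kind {
    my ($header) = @_;
    return 'plain' unless $header =~ /^Content-Type:\s*([^;\s]+)/mi;
    my $type = lc $1;
    return 'html'  if $type eq 'text/html';
    return 'plain' if $type eq 'text/plain';
    return 'skip';    # other types are discarded, like the filter's else-branch
}

# Crude tag stripper for illustration only; the patch instead passes the
# body to Namazu's own html::html_filter().
sub strip_html {
    my ($html) = @_;
    $html =~ s/<!--.*?-->//gs;    # remove comments (may span lines)
    $html =~ s/<[^>]*>/ /g;       # replace each tag with a space
    $html =~ s/&lt;/</g;
    $html =~ s/&gt;/>/g;
    $html =~ s/&amp;/&/g;         # decode &amp; last to avoid double-decoding
    $html =~ s/\s+/ /g;           # collapse runs of whitespace
    $html =~ s/^ | $//g;          # trim the leading/trailing space
    return $html;
}

# Return the text that would be handed to the indexer.
sub text_for_index {
    my ($mail) = @_;
    my ($header, $body) = split /\r?\n\r?\n/, $mail, 2;
    $header //= '';
    $body   //= '';
    my $kind = body_kind($header);
    return strip_html($body) if $kind eq 'html';
    return $body             if $kind eq 'plain';
    return '';
}

my $sample = "Subject: sample\nContent-Type: text/html; charset=us-ascii\n\n"
           . "<html><body><p>fear member primers</p></body></html>\n";
print text_for_index($sample), "\n";    # prints: fear member primers
```

Without the text/html branch, a mail like the sample above falls into the "skip" case, so only its header words (such as the Subject) would be indexed, which matches the symptom in the original question.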
Yukio USUDA
_______________________________________________
Namazu-users-en mailing list
Namazu-users-en(_at_)namazu(_dot_)org
http://www.namazu.org/cgi-bin/mailman/listinfo/namazu-users-en