
[Namazu-users-en] Re: html data not indexed in text/html mails

2005-08-05 00:10:44
Hello all,
Sorry for yet another mail, but I have one small question: when will the new version of Namazu (namazu-2.0.15) be formally released?

Thanks and regards,
Swati





swati wrote:

Hello,

Thank you very much for that patch. I am now able to index and search those text/html mails, and everything seems to work fine.
Thanks again.

Regards,
Swati

Yukio USUDA wrote:

swati wrote:

Hello all,
This is a sample mail that I was trying to index and search on.


snip

In this mail I am able to search on words like rober and streams, which
exist in the header part. But words like fear, member, or primers, which
exist inside the HTML part of the mail, are not indexed or searchable. I also
tried the new version of Namazu (namazu-2.0.15pre1), and with that I am still
not able to index or search this type of mail.

Can anyone suggest how I can get these mails indexed and searchable as well?
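
One way to confirm that a message is of this kind (a single-part mail whose top-level Content-Type is text/html, as in the subject line) is to look only at its header block. The following is a minimal sketch, not part of Namazu: the helper name `classify_mail`, the sample message, and the output labels are all made up for illustration, and folded (multi-line) headers are not handled.

```shell
#!/bin/sh
# Hypothetical helper (not part of Namazu): classify a mail file by the
# first Content-Type header found in its header block.
classify_mail() {
    # Keep only the headers, i.e. everything up to the first blank line.
    ctype=$(sed '/^$/q' "$1" | grep -i '^Content-Type:' | head -n 1)
    # Compare case-insensitively by lowercasing the header first.
    case "$(printf '%s' "$ctype" | tr '[:upper:]' '[:lower:]')" in
        *text/html*)  echo "html" ;;
        *text/plain*) echo "plain" ;;
        "")           echo "none" ;;
        *)            echo "other" ;;
    esac
}

# Throwaway sample message, invented for this example.
sample=$(mktemp)
cat > "$sample" <<'EOF'
From: someone@example.com
Subject: streams
Content-Type: text/HTML; charset=us-ascii

<html><body><p>fear member primers</p></body></html>
EOF

classify_mail "$sample"   # prints "html"
rm -f "$sample"
```

A mail that reports "html" here has no text/plain part at all, which matches the situation described above, where only the header words are found.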


I made a patch for this type of mail (against namazu-2.0.15pre1).

bash$ diff -ub filter/mailnews.pl.org filter/mailnews.pl
--- filter/mailnews.pl.org      Mon Jun  6 14:41:42 2005
+++ filter/mailnews.pl  Thu Aug  4 21:13:53 2005
@@ -65,7 +65,7 @@
     util::vprint("Processing mail/news file ...\n");
     uuencode_filter($cont);
-    mailnews_filter($cont, $weighted_str, $fields);
+    mailnews_filter($cont, $weighted_str, $headings, $fields);
     mailnews_citation_filter($cont, $weighted_str);
     gfilter::line_adjust_filter($cont);
@@ -79,11 +79,12 @@
 # Original of this code was contributed by <furukawa(_at_)tcp-ip(_dot_)or(_dot_)jp>.
 sub mailnews_filter ($$$) {
-    my ($contref, $weighted_str, $fields) = @_;
+    my ($contref, $weighted_str, $headings, $fields) = @_;
     my $boundary = "";
     my $line     = "";
     my $partial  = 0;
+    my $htmlmail = "";
     $$contref =~ s/^\s+//;
     # Don't handle if first like does'nt seem like a mail/news header.
@@ -125,6 +126,10 @@
                 # contributed by Hiroshi Kato <tumibito(_at_)mm(_dot_)rd(_dot_)nttdata(_dot_)co(_dot_)jp>
                 $partial = $1;
                 util::dprint("((partial: $partial))\n");
+            } elsif ($line =~ m!text/html!i) {
+               # The simplest form of an HTML email message.
+               util::dprint("text/html mail\n");
+               $htmlmail = "yes";
             } elsif ($line !~ m!text/plain!i) {
                 $$contref = '';
                 return;
@@ -161,6 +166,9 @@
         multipart_process($contref, $boundary, $weighted_str, $fields);
     }
+    if ($htmlmail) {
+       html::html_filter($contref, $weighted_str, $fields, $headings);
+    }
 }
 # Prototype declaration for avoiding


Yukio USUDA
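
For readers unfamiliar with the workflow, a unified diff like the one above is normally produced with `diff -u` and applied with `patch`. The sketch below shows that round trip on scratch files only. The file `filter/demo.pl` and its contents are placeholders invented for the example, not files from the Namazu tree; the `.org` suffix just mirrors the naming used in the thread.

```shell
#!/bin/sh
# Sketch of a diff/patch round trip on scratch files.
# All file names and contents below are placeholders for illustration.
set -u
work=$(mktemp -d)
cd "$work" || exit 1
mkdir filter

# Stand-in for the unmodified file, saved under a .org name.
printf 'line one\nline two\nline three\n' > filter/demo.pl.org
# Stand-in for the edited file.
printf 'line one\nline 2 (edited)\nline three\n' > filter/demo.pl

# diff exits with status 1 when the files differ, so that is not an error here.
diff -u filter/demo.pl.org filter/demo.pl > demo.patch || true

# Put the pristine content back, then apply the patch to it.
# Naming the target file explicitly avoids relying on how patch picks
# between the old and new file names in the diff header.
cp filter/demo.pl.org filter/demo.pl
patch filter/demo.pl < demo.patch

# The edited line is back in place.
cat filter/demo.pl
```

For the patch in this thread the target would be `filter/mailnews.pl` in the namazu-2.0.15pre1 source tree. A copy taken from a web archive may have lost whitespace or blank context lines, in which case the changes may need to be made by hand.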




------------------------------------------------------------------------

_______________________________________________
Namazu-users-en mailing list
Namazu-users-en(_at_)namazu(_dot_)org
http://www.namazu.org/cgi-bin/mailman/listinfo/namazu-users-en


