namazu-users-en

Re: PDF and description/keywords

2002-04-03 01:08:27
In article <20020326162406(_dot_)4FB6(_dot_)DARREN(_at_)flyingcolor(_dot_)com>
darren(_at_)flyingcolor(_dot_)com writes:

Is this possible with PDF files? I had a look at html.pl and pdf.pl in
/usr/share/namazu/filter and it looks like I could hack something if the
information was in the pdf file and I could get at it. Has anyone tried
something like this?

Yes, you can. The key is the weight_element function in
filter/html.pl. The $heading variable is used to build the summary
information. It is processed in the make_summary function of the mknmz
command.

You can modify filter/pdf.pl in the same way. However, it is a little
hard to determine which sentence is appropriate as the summary, because
the output of the pdftotext command is plain text with no markup (HTML
is a structured format, so it's easier).
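
To illustrate the difficulty, here is a rough sketch of one possible
heuristic for picking a summary out of plain pdftotext output. It is
not Namazu code: it is written in Python only for brevity (a real
change would be Perl inside filter/pdf.pl), and the function name
pick_summary, its thresholds, and the assumption that paragraphs are
separated by blank lines are all hypothetical.

```python
import re


def pick_summary(text, min_words=8, max_chars=200):
    """Return the first paragraph that looks like running prose, or ""."""
    # Assumption: paragraphs are separated by blank lines; pdftotext also
    # emits a form feed between pages, which is treated as a break here.
    paragraphs = re.split(r"\n\s*\n", text.replace("\f", "\n\n"))
    for para in paragraphs:
        # Collapse line breaks and runs of spaces into single spaces.
        candidate = " ".join(para.split())
        # Skip short fragments such as titles or page numbers.
        if len(candidate.split()) < min_words:
            continue
        # Require sentence punctuation so headings are not taken as prose.
        if not re.search(r"[.!?]", candidate):
            continue
        return candidate[:max_chars]
    return ""
```

Such a guess will often be wrong (cover pages, tables of contents,
and multi-column layouts all confuse it), which is why HTML, with its
explicit elements, is the easier case.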
-- 
NOKUBI Takatsugu
E-mail: knok(_at_)daionet(_dot_)gr(_dot_)jp
        knok(_at_)namazu(_dot_)org / knok(_at_)debian(_dot_)org

