Hello,
On Sun 3 Dec, Tadamasa Teranishi wrote:
Mr. koi_san has written an image filter, though I do not know
whether it also examines EXIF tags.
http://www.interq.or.jp/japan/koi_san/trash/2004/namazu_filter2.htm
Many thanks (and thanks to Google's translation page :-)
I've installed this filter alongside the others in /usr/share/namazu/filter/
and installed the required modules Image::Info and IO::String (I ran
"make test" and all seemed OK).
Image::Info comes with a directory of sample images and a test script
that dumps info from these images. There too, all seemed OK.
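A passing "make test" only shows that the modules build; it does not show that the perl found at run time can load them. The sketch below is one way to check that. The helper name check_module is made up for illustration, and it tests the first perl on PATH, which is only an assumption about the interpreter mknmz actually uses (the last lines print mknmz's own interpreter line for comparison).

```shell
#!/bin/sh
# Report whether a Perl module can be loaded by the first `perl` on PATH.
# Prints "ok <module>" or "missing <module>"; it never aborts the script.
# check_module is a hypothetical helper, not part of Namazu.
check_module() {
    if perl -M"$1" -e 1 >/dev/null 2>&1; then
        echo "ok $1"
    else
        echo "missing $1"
    fi
}

# The two modules named in this message.
check_module Image::Info
check_module IO::String

# Show which interpreter mknmz itself starts with, if mknmz is on PATH.
if mknmz_path=$(command -v mknmz); then
    head -n 1 "$mknmz_path"
fi
```

If the interpreter line printed for mknmz names a different perl than the one on PATH, the check would need to be repeated with that perl.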
I've adapted the mknmzrc sample (see below) and ran, as root:
mknmz -d -V -f /etc/namazu/mknmzrc.img -O /var/namazu/index/img/ /root/Image-Info-1.16/img/
and no file is indexed:
@@ Reading rcfile:
@@ Reading rcfile:
@@ /etc/namazu/mknmzrc.img
// Invoked: /usr/bin/wvWare --version
// Invoked: /usr/bin/pdftotext
// Invoked: /usr/bin/pdfinfo
// tmpnam: /var/namazu/index/img//NMZ.tmp_i.tmp
// tmpnam: /var/namazu/index/img//NMZ.tmp_p.tmp
// tmpnam: /var/namazu/index/img//NMZ.tmp_pi.tmp
// tmpnam: /var/namazu/index/img//NMZ.tmp_w.tmp
// tmpnam: /var/namazu/index/img//NMZ.checkpoint.tmp
// tmpnam: /var/namazu/index/img//NMZ.flist.tmp
// tmpnam: /var/namazu/index/img//NMZ.i.tmp
// tmpnam: /var/namazu/index/img//NMZ.ii.tmp
// tmpnam: /var/namazu/index/img//NMZ.p.tmp
// tmpnam: /var/namazu/index/img//NMZ.pi.tmp
// tmpnam: /var/namazu/index/img//NMZ.r.tmp
// tmpnam: /var/namazu/index/img//NMZ.t.tmp
// tmpnam: /var/namazu/index/img//NMZ.w.tmp
// tmpnam: /var/namazu/index/img//NMZ.wi.tmp
// NMZ: /var/namazu/index/img//NMZ.tmp_i.tmp
// NMZ: /var/namazu/index/img//NMZ.tmp_p.tmp
// NMZ: /var/namazu/index/img//NMZ.tmp_pi.tmp
// NMZ: /var/namazu/index/img//NMZ.tmp_w.tmp
// NMZ: /var/namazu/index/img//NMZ.checkpoint.tmp
// NMZ: /var/namazu/index/img//NMZ.flist.tmp
// NMZ: /var/namazu/index/img//NMZ.i.tmp
// NMZ: /var/namazu/index/img//NMZ.ii.tmp
// NMZ: /var/namazu/index/img//NMZ.p.tmp
// NMZ: /var/namazu/index/img//NMZ.pi.tmp
// NMZ: /var/namazu/index/img//NMZ.r.tmp
// NMZ: /var/namazu/index/img//NMZ.t.tmp
// NMZ: /var/namazu/index/img//NMZ.w.tmp
// NMZ: /var/namazu/index/img//NMZ.wi.tmp
// NMZ: /var/namazu/index/img//NMZ.body
// NMZ: /var/namazu/index/img//NMZ.err
// NMZ: /var/namazu/index/img//NMZ.field
// NMZ: /var/namazu/index/img//NMZ.foot
// NMZ: /var/namazu/index/img//NMZ.head
// NMZ: /var/namazu/index/img//NMZ.i
// NMZ: /var/namazu/index/img//NMZ.ii
// NMZ: /var/namazu/index/img//NMZ.lock
// NMZ: /var/namazu/index/img//NMZ.lock2
// NMZ: /var/namazu/index/img//NMZ.log
// NMZ: /var/namazu/index/img//NMZ.msg
// NMZ: /var/namazu/index/img//NMZ.p
// NMZ: /var/namazu/index/img//NMZ.pi
// NMZ: /var/namazu/index/img//NMZ.r
// NMZ: /var/namazu/index/img//NMZ.result
// NMZ: /var/namazu/index/img//NMZ.slog
// NMZ: /var/namazu/index/img//NMZ.status
// NMZ: /var/namazu/index/img//NMZ.t
// NMZ: /var/namazu/index/img//NMZ.tips
// NMZ: /var/namazu/index/img//NMZ.version
// NMZ: /var/namazu/index/img//NMZ.w
// NMZ: /var/namazu/index/img//NMZ.wi
Looking for indexing files...
@@ find_target starting: Sun Dec 3 10:35:15 2006
@@ Denied: /root/Image-Info-1.16/img/test.jpg
@@ Not allowed: /root/Image-Info-1.16/img/test.svg
@@ Denied: /root/Image-Info-1.16/img/gps.jpg
@@ Denied: /root/Image-Info-1.16/img/test.png
@@ Not allowed: /root/Image-Info-1.16/img/test.rle
@@ Not allowed: /root/Image-Info-1.16/img/test.xbm
@@ Not allowed: /root/Image-Info-1.16/img/test.ppm
@@ Not allowed: /root/Image-Info-1.16/img/tiny.pgm
@@ Not allowed: /root/Image-Info-1.16/img/test.pgm
@@ Denied: /root/Image-Info-1.16/img/test.gif
@@ Not allowed: /root/Image-Info-1.16/img/test.xpm
@@ Not allowed: /root/Image-Info-1.16/img/test.pbm
@@ find_target finished: Sun Dec 3 10:35:15 2006
@@ Target Files: 0 (Scan Performance: Elapsed Sec.: 1, Files/sec: 0.0)
@@ Possible: 12, Not allowed: 8, Denied: 4, Excluded: 0
@@ MTIME too old: 0, MTIME too new: 0
No files to index.
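To make the pattern in this log easier to see, here is a small sketch that tallies the "@@ Denied:" and "@@ Not allowed:" lines by file suffix. The function name tally is made up for illustration; the input lines are copied from the log above.

```shell
#!/bin/sh
# Summarise mknmz debug output read from stdin: count files per
# (status, suffix). Only "@@ Denied:" and "@@ Not allowed:" lines are used.
# tally is a hypothetical helper, not part of Namazu.
tally() {
    awk '
        /^@@ (Denied|Not allowed): / {
            status = $0
            sub(/^@@ /, "", status)     # drop the "@@ " prefix
            sub(/: .*$/, "", status)    # keep "Denied" or "Not allowed"
            ext = $0
            sub(/^.*\./, "", ext)       # keep the text after the last dot
            count[status " ." ext]++
        }
        END { for (k in count) print count[k], k }
    ' | LC_ALL=C sort -k2
}

tally <<'EOF'
@@ Denied: /root/Image-Info-1.16/img/test.jpg
@@ Not allowed: /root/Image-Info-1.16/img/test.svg
@@ Denied: /root/Image-Info-1.16/img/gps.jpg
@@ Denied: /root/Image-Info-1.16/img/test.png
@@ Not allowed: /root/Image-Info-1.16/img/test.rle
@@ Not allowed: /root/Image-Info-1.16/img/test.xbm
@@ Not allowed: /root/Image-Info-1.16/img/test.ppm
@@ Not allowed: /root/Image-Info-1.16/img/tiny.pgm
@@ Not allowed: /root/Image-Info-1.16/img/test.pgm
@@ Denied: /root/Image-Info-1.16/img/test.gif
@@ Not allowed: /root/Image-Info-1.16/img/test.xpm
@@ Not allowed: /root/Image-Info-1.16/img/test.pbm
EOF
```

In this log every Denied entry has a .jpg, .png or .gif suffix, and every Not allowed entry has some other suffix.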
Here is the output of mknmz -C. As you can see, the image types are preceded
by a minus (???). I recompiled Namazu after installing the new libraries,
and to be sure that my previous /etc/namazu/mknmzrc doesn't interfere,
I renamed it to /etc/namazu/mknmzrc.all (for all the other MIME types).
Can you see an explanation?
System: linux
Namazu: 2.0.16
Perl: 5.008004
File-MMagic: 1.25
NKF: no
KAKASI: no
ChaSen: no
MeCab: no
Lang_Msg: C
Lang: C
Coding System: euc
CONFDIR: /etc/namazu
LIBDIR: /usr/share/namazu/pl
FILTERDIR: /usr/share/namazu/filter
TEMPLATEDIR: /usr/share/namazu/template
Supported media types: (37)
Unsupported media types: (11) marked with minus (-) probably missing
application in your $path.
application/excel: excel.pl
application/gnumeric: gnumeric.pl
application/ichitaro5: taro56.pl
application/ichitaro6: taro56.pl
- application/ichitaro7: taro7_10.pl
application/macbinary: macbinary.pl
application/msword: msword.pl
application/pdf: pdf.pl
application/postscript: postscript.pl
application/powerpoint: powerpoint.pl
- application/rtf: rtf.pl
application/vnd.kde.kivio: koffice.pl
application/vnd.kde.kpresenter: koffice.pl
application/vnd.kde.kspread: koffice.pl
application/vnd.kde.kword: koffice.pl
application/vnd.oasis.opendocument.graphics: ooo.pl
application/vnd.oasis.opendocument.presentation: ooo.pl
application/vnd.oasis.opendocument.spreadsheet: ooo.pl
application/vnd.oasis.opendocument.text: ooo.pl
application/vnd.sun.xml.calc: ooo.pl
application/vnd.sun.xml.draw: ooo.pl
application/vnd.sun.xml.impress: ooo.pl
application/vnd.sun.xml.writer: ooo.pl
application/x-apache-cache: apachecache.pl
application/x-bzip2: bzip2.pl
application/x-compress: compress.pl
- application/x-deb: deb.pl
- application/x-dvi: dvi.pl
application/x-gzip: gzip.pl
- application/x-js-taro: taro7_10.pl
application/x-rpm: rpm.pl
- application/x-tex: tex.pl
application/x-zip: zip.pl
- audio/mpeg: mp3.pl
- image/bmp: image.pl
- image/gif: image.pl
- image/jpeg: image.pl
- image/png: image.pl
message/news: mailnews.pl
message/rfc822: mailnews.pl
text/hnf: hnf.pl
text/html: html.pl
text/html; x-type=mhonarc: mhonarc.pl
text/html; x-type=pipermail: pipermail.pl
text/plain
text/plain; x-type=rfc: rfc.pl
text/x-hdml: hdml.pl
text/x-roff: man.pl
Here is my /etc/namazu/mknmzrc.img:
#
# This is a Namazu configuration file for mknmz.
#
package conf; # Don't remove this line!
#===================================================================
#
# Administrator's email address
#
$ADDRESS = 'gauthier(_at_)courrier(_dot_)adt';
#===================================================================
#
# Regular Expression Patterns
#
#
# This pattern specifies HTML suffixes.
#
# $HTML_SUFFIX = "html?|[ps]html|html\\.[a-z]{2}";
#
# This pattern specifies file names which will be targeted.
# NOTE: It can be specified by --allow=regex option.
# Do NOT use `$' or `^' anchors.
# Case-insensitive.
#
$ALLOW_FILE = ".*\\.jpg|.*\\.jpeg" . # Jpeg files
"|.*\\.png" . #
"|.*\\.gif" #
;
# This pattern specifies fields which used for field-specified
# searching. NOTE: case-insensitive
#
# $SEARCH_FIELD = "message-id|subject|from|date|uri|newsgroups|to|summary|size";
#
# This pattern specifies meta tags which used for field-specified
# searching. NOTE: case-insensitive
#
$META_TAGS = "keywords|description";
#
# This pattern specifies aliases for NMZ.field.* files.
# NOTE: Editing NOT recommended.
#
# %FIELD_ALIASES = ('title' => 'subject', 'author' => 'from');
#
# This pattern specifies HTML elements which should be replaced with
# null string when removing them. Normally, the elements are replaced
# with a single space character.
#
$NON_SEPARATION_ELEMENTS =
'A|TT|CODE|SAMP|KBD|VAR|B|STRONG|I|EM|CITE|FONT|U|'.
'STRIKE|BIG|SMALL|DFN|ABBR|ACRONYM|Q|SUB|SUP|SPAN|BDO';
#
# This pattern specifies attribute of a HTML tag which should be
# searchable.
#
$HTML_ATTRIBUTES = 'ALT|SUMMARY|TITLE';
#===================================================================
#
# Critical Numbers
#
#
# The max size of files which can be loaded in memory at once.
# If you have much memory, you can increase the value.
# If you have less memory, you can decrease the value.
#
$ON_MEMORY_MAX = 5000000;
#
# The max file size for indexing. Files larger than this
# will be ignored.
# NOTE: This value is usually larger than TEXT_SIZE_MAX because
# binary-formated files such as PDF, Word are larger.
#
$FILE_SIZE_MAX = 2000000;
#
# The max text size for indexing. Files larger than this
# will be ignored.
#
$TEXT_SIZE_MAX = 600000;
#
# The max length of a word. the word longer than this will be ignored.
#
$WORD_LENG_MAX = 128;
#
# Weights for HTML elements which are used for term weightning.
#
%Weight =
(
'html' => {
'title' => 16,
'h1' => 8,
'h2' => 7,
'h3' => 6,
'h4' => 5,
'h5' => 4,
'h6' => 3,
'a' => 4,
'strong' => 2,
'em' => 2,
'kbd' => 2,
'samp' => 2,
'var' => 2,
'code' => 2,
'cite' => 2,
'abbr' => 2,
'acronym'=> 2,
'dfn' => 2,
},
'metakey' => 32, # for <meta name="keywords" content="foo bar">
'headers' => 8, # for Mail/News' headers
);
#
# The max length of a HTML-tagged string which can be processed for
# term weighting.
# NOTE: There are not a few people has a bad manner using
# <h[1-6]> for changing a font size.
#
# $INVALID_LENG = 128;
#
# The max length of a field.
# This MUST be smaller than libnamazu.h's BUFSIZE (usually 1024).
#
$MAX_FIELD_LENGTH = 200;
#===================================================================
#
# Softwares for handling a Japanese text
#
#
# Network Kanji Filter nkf v1.71 or later
#
$NKF = "no";
#
# KAKASI 2.x or later
# Text::Kakasi 1.05 or later
#
$KAKASI = "no";
#
# ChaSen 2.02 or later (simple wakatigaki)
# Text::ChaSen 1.03
#
$CHASEN = "no";
#
# ChaSen 2.02 or later (with noun words extraction)
#
$CHASEN_NOUN = "no";
#
# MeCab
#
$MECAB = "no";
#
# Default Japanese processer: KAKASI or ChaSen.
#
$WAKATI = $none;
#===================================================================
#
# Directories
#
# $LIBDIR = "@PERLLIBDIR@";
# $FILTERDIR = "@FILTERDIR@";
# $TEMPLATEDIR = "@TEMPLATEDIR@";
# 1;
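One thing I notice is that this rc file sets $ALLOW_FILE but not $DENY_FILE, so some default deny pattern may still apply. The sketch below replays a few file names from the log against the $ALLOW_FILE pattern above and against a deny pattern that is purely an assumption on my part (I have not checked it against the mknmz source). The order of the two checks and the whole-name, case-insensitive matching are also assumptions; grep stands in for the Perl match, and classify is a made-up helper.

```shell
#!/bin/sh
# Pattern copied from $ALLOW_FILE in the rc file above.
ALLOW='.*\.jpg|.*\.jpeg|.*\.png|.*\.gif'
# ASSUMPTION: a deny pattern that also covers image suffixes. The rc file
# does not set $DENY_FILE, so this value is only a stand-in for illustration.
DENY='.*\.(gif|png|jpg|jpeg)'

# Classify one file name (hypothetical helper).
# ASSUMPTION: patterns match the whole name, ignoring case, and the allow
# check comes before the deny check.
classify() {
    if ! printf '%s\n' "$1" | grep -q -i -x -E "$ALLOW"; then
        echo "Not allowed"
    elif printf '%s\n' "$1" | grep -q -i -x -E "$DENY"; then
        echo "Denied"
    else
        echo "Target"
    fi
}

# A few of the file names from the log.
for f in test.jpg test.svg gps.jpg test.png test.rle test.gif; do
    printf '%s: %s\n' "$f" "$(classify "$f")"
done
```

With these assumed patterns the image files come out as Denied and the others as Not allowed, which matches the split in my log. If that is really what happens, setting $DENY_FILE explicitly in mknmzrc.img to a pattern without image suffixes would be the next thing to try.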
--
Gauthier Vandemoortele <gauthier(_dot_)vandemoortele(_at_)skynet(_dot_)be>