[Namazu-users-en] Re: meta-information on large binary files

On Tue, Jul 11, 2006 at 01:58:37PM +0900, NOKUBI Takatsugu wrote:

> At Sat, 8 Jul 2006 20:13:42 +0200,
> Alexander Oelzant wrote:
>
>> Is there any possibility (or planned feature) to have namazu read just a
>> few KB of a file in order to extract metadata? In analogy to the mp3
>> filter I've written an ogg plugin, but for the large radio recordings
>> it's prohibitively slow.
>>
>
> If the target files are all of a single media type, you can do it like
> this:
>
> $ mknmz -O indexdir -t audio/mpeg target-dir
>
> The -t (--media-type) option skips reading each target file to detect
> its binary signature.


Thanks, but unfortunately the filter still has to read in the entire
file for indexing; for a 200 MB file that produces processes like the
following:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
22312 user      15  10 1598m 592m 1104 R  0.3 39.2   0:08.19 mknmz             

With enough swap, it takes about twenty minutes to extract the 5 lines
of data and insert those in the db ;-)
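For comparison: in an Ogg Vorbis file those few lines of metadata sit in the comment header, which is the second header packet of the stream. In principle, then, a filter only needs the first few KB. The following is a minimal sketch in Python (Namazu's own filters are Perl). It assumes a single logical bitstream and a comment header that lies within the first 64 KiB. HEAD_BYTES and all function names are hypothetical and not part of Namazu:

```python
import struct

# Assumption: the Vorbis comment header fits in the first 64 KiB of the file.
HEAD_BYTES = 64 * 1024


def ogg_packets(data):
    """Yield complete packets reassembled from the Ogg pages in `data`.

    Assumes a single logical bitstream; CRCs are not checked. Stops quietly
    when a page is cut off by the read limit.
    """
    pos = 0
    packet = b""
    while pos + 27 <= len(data):
        if data[pos:pos + 4] != b"OggS":
            raise ValueError("no Ogg page at offset %d" % pos)
        n_segs = data[pos + 26]  # byte 26 of the page header: segment count
        seg_table = data[pos + 27:pos + 27 + n_segs]
        if len(seg_table) < n_segs:
            return  # segment table truncated by the read limit
        body = pos + 27 + n_segs
        for lacing in seg_table:
            if body + lacing > len(data):
                return  # page body truncated by the read limit
            packet += data[body:body + lacing]
            body += lacing
            if lacing < 255:  # a lacing value below 255 ends the packet
                yield packet
                packet = b""
        pos = body  # the next page starts right after this page's body


def parse_comment_header(buf):
    """Parse a Vorbis comment header body (after the 7-byte packet prefix)."""
    (vendor_len,) = struct.unpack_from("<I", buf, 0)
    pos = 4 + vendor_len  # skip the vendor string
    (count,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    comments = {}
    for _ in range(count):
        (length,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        field = buf[pos:pos + length].decode("utf-8", "replace")
        pos += length
        key, _, value = field.partition("=")
        # Field names are case-insensitive, so normalise them to upper case.
        comments.setdefault(key.upper(), []).append(value)
    return comments


def vorbis_comments_from_bytes(data):
    """Return the comment fields found in `data`, or None if not reached."""
    for packet in ogg_packets(data):
        if packet[:7] == b"\x03vorbis":  # packet type 3 = comment header
            return parse_comment_header(packet[7:])
    return None


def vorbis_comments(path, limit=HEAD_BYTES):
    """Read at most `limit` bytes of `path` and extract its Vorbis comments."""
    with open(path, "rb") as f:
        return vorbis_comments_from_bytes(f.read(limit))
```

If the comment header does not fit within the limit (embedded cover art, for instance, can make it large), the sketch returns None. A filter built this way could then retry with a larger limit or fall back to reading the whole file.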

I was hoping that setting $ON_MEMORY_MAX = 5000000; would avoid this,
e.g. by only reading in the first part of the file, but according to
tips.html it only influences the size of the db files kept in memory.
That is only logical, though: redesigning namazu to read files chunk by
chunk would probably involve rewriting all the filters to use a
read_chunk() function instead of accessing $$contref directly.
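The read_chunk() idea could look roughly like the sketch below. It is a hypothetical design in Python, not an existing Namazu or Perl API, and read_chunks, head_from_chunks and CHUNK_SIZE are invented names. Instead of filling $$contref, the indexer would hand a filter an iterator of chunks, and a metadata-only filter would stop pulling chunks as soon as it has enough bytes:

```python
# Hypothetical chunk size; Namazu has no such setting.
CHUNK_SIZE = 64 * 1024


def read_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield the contents of `path` in pieces of at most `chunk_size` bytes."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk


def head_from_chunks(chunks, limit):
    """Take at most `limit` bytes from a chunk iterator, then stop pulling.

    Memory use is bounded by roughly `limit` plus one chunk, independent
    of the file size.
    """
    if limit <= 0:
        return b""
    parts, total = [], 0
    for chunk in chunks:
        piece = chunk[:limit - total]
        parts.append(piece)
        total += len(piece)
        if total >= limit:
            break  # do not read any further chunks
    return b"".join(parts)
```

In this design the filter, not the indexer, decides when to stop. A full-text filter could still join every chunk, so the current behaviour remains possible, while an ogg or mp3 filter would consume only the first chunk or two.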

hand
   Alexander


-- 
Alexander Oelzant (Durchlaufstr. 7/4/5, A-1200 Wien)
alexander(_at_)oelzant(_dot_)priv(_dot_)at 
aoe(_at_)fsinf(_dot_)htu(_dot_)tuwien(_dot_)ac(_dot_)at
       ex-internic, ripe, bofh, priv.at: !ao418
            +43 1 3500929 +43 676 84441065                                  McQ
_______________________________________________
Namazu-users-en mailing list
Namazu-users-en(_at_)namazu(_dot_)org
http://www.namazu.org/cgi-bin/mailman/listinfo/namazu-users-en