namazu-users-en

Re: Stop Words

2001-11-04 23:43:59
On Sat, 3 Nov 2001, Subramanian Radhakrishnan wrote:

How do I implement stop words in Namazu search? Which files need to be
modified for this purpose?

Before you try my way, I'd really suggest that you pre-process the query 
string with something like Perl before sending it to Namazu.  If that 
doesn't work for you, then try this:
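
As a rough illustration of the pre-processing idea, here is a minimal 
sketch in C rather than Perl.  The function name strip_stop_words() and 
its parameters are hypothetical and not part of Namazu; it assumes the 
query is plain whitespace-separated words and simply drops any word found 
in a caller-supplied list before the query is passed on.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive comparison of a token of length n against a C string. */
static int token_equals(const char *tok, size_t n, const char *word)
{
    size_t i;

    if (strlen(word) != n)
        return 0;
    for (i = 0; i < n; i++) {
        if (tolower((unsigned char) tok[i]) != tolower((unsigned char) word[i]))
            return 0;
    }
    return 1;
}

/*
 * Copy query into out, dropping whitespace-separated tokens that appear
 * in stop[0..nstop-1].  Returns the number of tokens kept, or -1 if out
 * is too small.  Hypothetical helper, not part of Namazu.
 */
int strip_stop_words(const char *query, const char *const *stop, size_t nstop,
                     char *out, size_t outsz)
{
    size_t used = 0;
    int kept = 0;

    assert(query != NULL && out != NULL && outsz > 0);
    out[0] = '\0';

    while (*query != '\0') {
        size_t len = 0, i;
        int is_stop = 0;

        /* Skip leading whitespace, then measure the next token. */
        while (*query != '\0' && isspace((unsigned char) *query))
            query++;
        while (query[len] != '\0' && !isspace((unsigned char) query[len]))
            len++;
        if (len == 0)
            break;

        for (i = 0; i < nstop; i++) {
            if (token_equals(query, len, stop[i])) {
                is_stop = 1;
                break;
            }
        }

        if (!is_stop) {
            /* Room for an optional separator, the token, and the NUL. */
            size_t need = len + (kept > 0 ? 1 : 0);
            if (used + need + 1 > outsz)
                return -1;
            if (kept > 0)
                out[used++] = ' ';
            memcpy(out + used, query, len);
            used += len;
            out[used] = '\0';
            kept++;
        }
        query += len;
    }
    return kept;
}
```

The cleaned string in out is what you would then hand to the search.  The 
original case of the kept words is preserved; only the comparison against 
the stop list ignores case.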

Instructions follow:

These are for namazu-2.0.5; I have not tested with later versions.  This 
is also not the best way to do it.  I can think of better ways, but 
haven't tried them yet.  I will try to get this to work more like the 
rest of Namazu, but for now, it works for me.

I also have an implementation of synonyms along the same lines.

I have attached two files: stop-list.c and stop-list.h.

Additionally, you will need to create a text file called stopwords.txt 
with one word per line.  This file must be in the same directory as your 
index.
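
The attached stop-list.c is not reproduced in this message, so the 
following is only a sketch of how a one-word-per-line file could be 
loaded and queried.  The names stop_list_load(), stop_list_contains() and 
stop_list_clear() are made up for this example; they play roughly the 
roles of read_stop_list(), is_stop_word() and clear_word_list() used 
below, but the real signatures may differ (here the path is passed in 
explicitly, which is an assumption).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define STOP_WORD_MAX 256   /* assumed maximum line length, newline included */

static char **stop_words = NULL;   /* dynamically grown array of words */
static size_t stop_count = 0;
static size_t stop_cap = 0;

/* Free every stored word and reset the list to empty. */
void stop_list_clear(void)
{
    size_t i;

    for (i = 0; i < stop_count; i++)
        free(stop_words[i]);
    free(stop_words);
    stop_words = NULL;
    stop_count = 0;
    stop_cap = 0;
}

/* Load one word per line from path; returns the word count, or -1 on error. */
long stop_list_load(const char *path)
{
    char line[STOP_WORD_MAX];
    FILE *fp;

    assert(path != NULL);
    stop_list_clear();
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        char *copy;

        line[strcspn(line, "\r\n")] = '\0';   /* strip the line ending */
        if (line[0] == '\0')
            continue;                          /* skip blank lines */
        if (stop_count == stop_cap) {
            size_t ncap = stop_cap ? stop_cap * 2 : 16;
            char **grown = realloc(stop_words, ncap * sizeof *grown);
            if (grown == NULL)
                goto fail;
            stop_words = grown;
            stop_cap = ncap;
        }
        copy = malloc(strlen(line) + 1);
        if (copy == NULL)
            goto fail;
        strcpy(copy, line);
        stop_words[stop_count++] = copy;
    }
    fclose(fp);
    return (long) stop_count;

fail:
    fclose(fp);
    stop_list_clear();
    return -1;
}

/* Return 1 if word is in the loaded list (exact, case-sensitive match). */
int stop_list_contains(const char *word)
{
    size_t i;

    if (word == NULL)
        return 0;
    for (i = 0; i < stop_count; i++) {
        if (strcmp(stop_words[i], word) == 0)
            return 1;
    }
    return 0;
}
```

A linear scan is fine for a stop list of a few hundred words.  For a much 
larger list, sorting once after loading and using bsearch() would be the 
obvious next step.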

Put stop-list.c and stop-list.h in the nmz/ directory, and add the 
following to nmz/query.c:

#include "stop-list.h"         (at the top)


In nmz_make_query():

after:
    /* If too much items in query, return with error */
    if (tokennum > QUERY_TOKEN_MAX) {
        return ERR_TOO_MANY_TOKENS;
    }

add:
    /* Read stop list from file */
    read_stop_list();


after:
        if (query.str[i] != '\0')
            query.str[i++] = '\0';

add:
        /* If the word is in the stop list, then purge it */
        if (is_stop_word(query.tab[tokennum])) {
            query.tab[tokennum] = (char *) NULL;
        }

after the end of the for loop, add:
    /* Clear stop list */
    clear_word_list();
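
To show the effect of purging tokens during splitting, here is a 
standalone sketch that is independent of the Namazu sources.  It is a 
variant of the change above, not the same thing: instead of setting the 
slot to NULL, it never stores a stop word, so the token table stays 
compact.  split_query() and demo_is_stop() are hypothetical names, and 
splitting on single spaces is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in predicate; a real one would consult the list from stopwords.txt. */
int demo_is_stop(const char *word)
{
    return strcmp(word, "the") == 0 || strcmp(word, "of") == 0;
}

/*
 * Split str in place on spaces, storing pointers to the non-stop-word
 * tokens in tab[0..max-1].  Returns the number of tokens stored, or -1
 * if more than max tokens survive filtering.
 */
int split_query(char *str, char **tab, int max, int (*is_stop)(const char *))
{
    int tokennum = 0;
    size_t i = 0;

    assert(str != NULL && tab != NULL && is_stop != NULL);

    while (str[i] != '\0') {
        char *start;

        while (str[i] == ' ')           /* skip separators */
            i++;
        if (str[i] == '\0')
            break;
        start = &str[i];
        while (str[i] != ' ' && str[i] != '\0')
            i++;
        if (str[i] != '\0')
            str[i++] = '\0';            /* terminate the token in place */

        if (is_stop(start))
            continue;                   /* purge: do not store this token */
        if (tokennum >= max)
            return -1;
        tab[tokennum++] = start;
    }
    return tokennum;
}
```

With the query "the life of brian" and the stand-in predicate, only 
"life" and "brian" end up in the table.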



-- 
The program isn't debugged until the last user is dead.


Visit my webpage at http://www.ncst.ernet.in/~philip/
Read my writings at http://www.ncst.ernet.in/~philip/writings/

  MSN  philiptellis                         Yahoo!  philiptellis
  AIM  philiptellis                         ICQ     129711328

Attachment: stop-list.tar.gz
Description: GNU Zip compressed data
