procmail

Re: formail -D & using hashcodes instead?

1996-05-20 01:32:29
"Guy" == Guy Geens <Guy(_dot_)Geens(_at_)elis(_dot_)rug(_dot_)ac(_dot_)be> writes:

    Guy> I would suggest to use MD5 for the hash
    Guy> calculations. Otherwise, you could get false positives.

*All* hash methods give false positives, unless the hash has at least
as many possible values as there are distinct inputs (the pigeonhole
principle).  MD5 may make them rarer.  However, is the extra
computation of running MD5 over every file, compared with CRC or
pjw-hash, worth it?  With any decent hash, when two different files
do collide, the odds are good that a byte-by-byte comparison of the
actual files will reject identity on the first byte.  That is the
question.
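
To make the trade-off concrete, here is a minimal sketch in Python of
the "cheap hash first, full comparison only on a match" approach.  It
is not how formail -D works; the class name DuplicateDetector, the
seen_before method, and the choice to keep whole message bodies in
memory are assumptions made purely for illustration.  The default
hash is CRC-32 via zlib.crc32, and any other function from bytes to
an integer can be passed in instead.

```python
import zlib
from collections import defaultdict


class DuplicateDetector:
    """Hypothetical helper: cheap hash first, byte comparison on a match."""

    def __init__(self, hash_func=zlib.crc32):
        self._hash = hash_func
        # Maps a hash value to every distinct body seen with that value.
        self._buckets = defaultdict(list)
        # Counts how often the expensive full comparison was needed.
        self.full_compares = 0

    def seen_before(self, body: bytes) -> bool:
        """Return True only if an identical body was already recorded."""
        bucket = self._buckets[self._hash(body)]
        for earlier in bucket:
            self.full_compares += 1
            # A hash match alone proves nothing; compare the actual bytes.
            if earlier == body:
                return True
        bucket.append(body)
        return False


detector = DuplicateDetector()
print(detector.seen_before(b"first message\n"))   # False
print(detector.seen_before(b"second message\n"))  # False
print(detector.seen_before(b"first message\n"))   # True
```

Because the final verdict always comes from the byte comparison, a
hash collision costs only an extra comparison and never a wrong
answer.  A stronger hash such as MD5 would lower full_compares, not
change the results.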

-- 
                           Stephen John Turnbull
University of Tsukuba                                        Yaseppochi-Gumi
Institute of Policy and Planning Sciences  http://turnbull.sk.tsukuba.ac.jp/
Tennodai 1-1-1, Tsukuba, 305 JAPAN    turnbull(_at_)sk(_dot_)tsukuba(_dot_)ac(_dot_)jp