procmail

Re: formail -D & using hashcodes instead?

1996-05-17 08:06:24
Robert <dummy(_at_)c2(_dot_)org> writes:
R> Has anyone considered the thought of using a hash function on the body of
R> a message instead of merely using the Message-ID field for "formail -D"?

On Fri, 17 May 96 08:49 JST,
turnbull(_at_)turnbull(_dot_)sk(_dot_)tsukuba(_dot_)ac(_dot_)jp (Stephen J. Turnbull) said:

S> I don't know if it's ever been done for private use with email, but the
S> spam-hunters were doing this kind of thing on Usenet.

   A few years ago, the same folks who wrote "agrep" and "glimpse" presented
   a Usenix paper on a program called "sif" (similar files).  Their approach
   was to generate and store a series of hash values from a moving 70- or
   80-byte window in a file and then look for recurring combinations of
   hashcodes.  It took up some disk space but worked well for finding
   occurrences of one file which had been included in another one, etc.
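
   For anyone who wants to experiment, here is a rough Python sketch of the
   moving-window idea as I described it above.  It is not the "sif" code and
   makes no claim to match the paper's details: the function names, the
   choice of BLAKE2 as the hash, and the "containment" score are all my own
   assumptions for illustration.  It also rehashes every window from scratch
   and keeps every hash, which is the simplest thing that works rather than
   anything space-efficient.

```python
import hashlib

WINDOW = 80  # bytes per window; assumed from the "70- or 80-byte" figure above


def _hash_chunk(chunk: bytes) -> int:
    """Return a 64-bit hash of one window (BLAKE2b is an arbitrary choice)."""
    return int.from_bytes(hashlib.blake2b(chunk, digest_size=8).digest(), "big")


def window_hashes(data: bytes, window: int = WINDOW) -> set:
    """Hash every `window`-byte slice of `data`, sliding one byte at a time.

    Input shorter than one window is hashed as a single chunk; empty input
    yields an empty set.
    """
    if not data:
        return set()
    if len(data) < window:
        return {_hash_chunk(data)}
    return {
        _hash_chunk(data[i:i + window])
        for i in range(len(data) - window + 1)
    }


def containment(small: bytes, big: bytes, window: int = WINDOW) -> float:
    """Fraction of the distinct window hashes of `small` that also occur in `big`.

    A value near 1.0 suggests `small` was included in `big` more or less
    verbatim; a value near 0.0 suggests the two share no long runs of bytes.
    """
    wanted = window_hashes(small, window)
    if not wanted:
        return 0.0
    found = window_hashes(big, window)
    return len(wanted & found) / len(wanted)


if __name__ == "__main__":
    # Hypothetical data: a message body quoted in full inside a longer one.
    body = b"".join(str(i).encode() for i in range(200))
    reply = b"Someone wrote:\n" + body + b"\nI agree.\n"
    print(containment(body, reply))
```

   Since every window of the quoted body also appears in the longer message,
   the score printed for that pair should be 1.0.  For duplicate detection
   in a mail filter you would presumably store the hash sets (or a sample of
   them) for recent message bodies and compare each new body against them.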

-- 
Karl Vogel                 vogelke(_at_)c17mis(_dot_)wpafb(_dot_)af(_dot_)mil
Control Data Systems, Inc.           ASC/YCOA, Wright-Patterson AFB, OH 45433