Re: duplicate detection with MD5

#    The disadvantage with this method is its complexity: an additional
#    "pending" file is needed, and there is a small performance hit for
#    each mail, in order to maintain the pending file.
#
#    The advantage of this method is its completeness: all duplicates
#    will still be detected and dropped.
#
# Both methods require the use of the TRAP command; they set additional
# commands into the TRAP variable.  If the TRAP command has already been
# set, the new commands are added to the list of commands.


As a non-expert, I'm asking rather than recommending--what would be wrong with:

1. Store in $DUP_ID whether  formail -D #### ID.cache  reports a duplicate.

2. Move  Message-Id: <xxxxx@yyyyy>  to  Orig-Message-Id: <xxxxx@yyyyy>

3. Do the whitespace/signature/etc. filtering and compute the checksum
   "asdfghjk" (this is where I'm skeptical of claims of "completeness"--I
   don't think it's possible to automate the discarding of every kind of
   perverse variation people might come up with, e.g., quoting methods,
   signature formats, conversion of Macintosh fonts into ISO Latin-1, etc.)

4. Add  Message-Id: <asdfghjk@MD5>

5. Store in $DUP_MD5 whether  formail -D #### MD5.cache  reports a duplicate.

6. Move  Message-Id: <asdfghjk@MD5>  to  X-Checksum: <asdfghjk@MD5>

7. Put  Message-Id: <xxxxx@yyyyy>  back the way it was.

8. Do whatever you want with the message based on the values of $DUP_ID
   and $DUP_MD5.
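
To make the intent of the eight steps concrete, here is a rough model of the logic in Python. It is not a procmail recipe and not anyone's actual implementation: two in-memory sets stand in for ID.cache and MD5.cache, seen_before() only imitates the effect of formail -D, and normalize_body() is a deliberately crude, hypothetical filter (it assumes a single-part plain-text message that carries a Message-Id). The header shuffling in steps 2, 4 and 7 is there only because formail -D keys on the Message-Id: field, so the model skips it and looks up each key directly.

```python
import hashlib
import re
from email import message_from_string


def normalize_body(body: str) -> str:
    """Crude filter: drop the signature, quote markers and all whitespace."""
    # Cut everything from the conventional "-- " signature separator onward.
    body = re.split(r"^-- ?$", body, maxsplit=1, flags=re.MULTILINE)[0]
    # Strip leading quote markers such as "> " or "| " from each line.
    lines = [re.sub(r"^[>|\s]+", "", line) for line in body.splitlines()]
    # Remove the remaining whitespace so that reflowed text compares equal.
    return "".join("".join(lines).split())


def seen_before(key: str, cache: set) -> bool:
    """Imitate formail -D: report a duplicate, otherwise remember the key."""
    if key in cache:
        return True
    cache.add(key)
    return False


def classify(raw: str, id_cache: set, md5_cache: set):
    """Return (dup_id, dup_md5, x_checksum) for one raw message."""
    msg = message_from_string(raw)
    # Step 1: duplicate check on the original Message-Id.
    dup_id = seen_before(msg.get("Message-Id", ""), id_cache)
    # Step 3: filter the body and compute its MD5 checksum.
    filtered = normalize_body(msg.get_payload())
    digest = hashlib.md5(filtered.encode("utf-8")).hexdigest()
    # Step 5: duplicate check on the checksum, against a separate cache.
    dup_md5 = seen_before(digest, md5_cache)
    # Step 6: the value that would end up in the X-Checksum: header.
    return dup_id, dup_md5, f"<{digest}@MD5>"
```

With fresh caches, a message followed by a re-quoted copy that has a new Message-Id and a different signature comes back as (False, False) and then (False, True). That is the case the checksum is meant to catch, and also the case where step 3 can fail: any variation the filter does not anticipate produces a different checksum.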
