[Asrg] DCC and IP checksums

Although it looks like the DCC servers have to collect the IP
information, they don't have to give it out.  So the potential for
abuse is limited to the people who run the DCC servers, not to anyone
who can query them.

That's not an effective response to the privacy issues.


Well, most of the other proposals on this forum are far more invasive
than DCC (witness the argument involving Hadmut and others).

The current, real-life global network of ~120 DCC servers already
involves far more people and organizations than I think can be
trusted.  It's not that any single DCC server operator is
untrustworthy, but that every group of 120 or more people is
untrustworthy.


The privacy implications of leaking the fact that IP address x.y.z.w
sent the same message 1,000 times, or had 25 failed RCPT commands, are
not huge.

No; way less than this.  For a mail message, we collect the body
checksum, the sending-IP checksum and maybe a few flags indicating
failed RCPT commands.  At most 60-100 bytes/message.
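
To make the "60-100 bytes/message" figure concrete, here is a minimal sketch of a per-message report containing only a body checksum, a checksum of the sending IP address, and a flag byte. The field sizes, the use of MD5, the flag bit, and the names `Report` and `make_report` are all assumptions for illustration, not DCC's actual wire format.

```python
import hashlib
import ipaddress
import struct
from dataclasses import dataclass

# Hypothetical flag bit; the post only mentions "a few flags" for failed RCPTs.
FLAG_FAILED_RCPT = 0x01

# Body digest (16 bytes), IP digest (16 bytes), flag byte; "!" means no padding.
RECORD_FORMAT = "!16s16sB"


@dataclass(frozen=True)
class Report:
    body_checksum: bytes  # 16-byte digest of the body (MD5 assumed)
    ip_checksum: bytes    # 16-byte digest of the sender's address, not the address
    flags: int            # small bit field, e.g. FLAG_FAILED_RCPT

    def pack(self) -> bytes:
        return struct.pack(RECORD_FORMAT, self.body_checksum, self.ip_checksum, self.flags)


def make_report(body: bytes, sender_ip: str, failed_rcpt: bool) -> Report:
    # Binary form of the address: 4 bytes for IPv4, 16 for IPv6.
    ip_bytes = ipaddress.ip_address(sender_ip).packed
    return Report(
        body_checksum=hashlib.md5(body).digest(),
        ip_checksum=hashlib.md5(ip_bytes).digest(),
        flags=FLAG_FAILED_RCPT if failed_rcpt else 0,
    )


report = make_report(b"Buy now!\n", "192.0.2.1", failed_rcpt=True)
print(len(report.pack()))  # 33
```

The 33-byte payload fits well inside the quoted budget; the remainder would presumably go to protocol framing.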

If I thought you knew how to make a single checksum that is fuzzy
enough to ignore "hashbusters" but not so fuzzy that it has false
positives, I'd ask you in private about it.


I use the following algorithm in CanIt.  It is by no means perfect,
but it's pretty good:

- Ignore the headers
- Read message lines as follows:
  - Delete leading and trailing whitespace
  - Ignore blank lines
  - Ignore MIME part delimiters
  - Skip a line starting "Dear "
  - Collapse multiple spaces into single spaces
  - Stop reading once you have 200 non-ignored lines or reach end of message

- If you have between 10 and 20 lines, ignore the first and last line, else
- If you have between 20 and 40 lines, ignore the first and last 3 lines, else
- If you have more than 40 lines, ignore the first and last 4 lines

- Do a SHA1 hash on what's left.
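
The steps above can be sketched in Python as follows. This is a rough approximation, not CanIt's code: the caller is assumed to pass the body with headers already removed, the MIME-delimiter regex is a guessed heuristic, every line starting "Dear " is skipped, lines are joined with "\n" before hashing, and the boundary choices at 20 and 40 lines (where the ranges above overlap) are assumptions. The function name `fuzzy_body_checksum` is hypothetical.

```python
import hashlib
import re

# Assumed heuristic for MIME part delimiters: "--" followed by a boundary token
# with no spaces (e.g. "--b1_abc" or "--b1_abc--").
MIME_DELIMITER = re.compile(r"^--\S+$")


def fuzzy_body_checksum(body: str, max_lines: int = 200) -> str:
    """SHA-1 hex digest of a normalized message body (headers already removed)."""
    kept = []
    for raw in body.splitlines():
        # Strip leading/trailing whitespace and collapse internal whitespace
        # runs (spaces and tabs) into single spaces.
        line = " ".join(raw.split())
        if not line:
            continue  # blank line
        if MIME_DELIMITER.match(line):
            continue  # MIME part delimiter
        if line.startswith("Dear "):
            continue  # salutation; assumption: every such line is skipped
        kept.append(line)
        if len(kept) >= max_lines:
            break  # stop after max_lines non-ignored lines

    n = len(kept)
    # Boundary handling at 20 and 40 lines is an assumption.
    if n > 40:
        trim = 4
    elif n >= 20:
        trim = 3
    elif n >= 10:
        trim = 1
    else:
        trim = 0  # fewer than 10 lines: nothing trimmed (not specified)
    if trim:
        kept = kept[trim:n - trim]

    # Joining with "\n" before hashing is an assumption.
    return hashlib.sha1("\n".join(kept).encode("utf-8")).hexdigest()
```

Trimming the first and last lines is what defeats hashbusters placed at the edges: two 12-line messages that differ only in their opening and closing lines reduce to the same ten lines and hash identically.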

My imperfect current solution involves 3 checksums.


How are 3 checksums better than one, unless you're checksumming different
parts of the message?

checksum would discover no two IP addresses ever send the same
message, except for trivial cases such as some virus warnings,
because its answers would all differ.
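
Whether two IP addresses ever send the same message is, on the server side, a grouping question: collect the distinct senders seen for each body checksum and look for checksums with more than one. A minimal sketch, assuming reports arrive as (IP checksum, body checksum) string pairs; the function name `shared_checksums` and the `min_senders` threshold are hypothetical. With a checksum that hashbusters defeat, nearly every group would have a single sender; a fuzzier checksum should produce more multi-sender groups.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def shared_checksums(
    reports: Iterable[Tuple[str, str]], min_senders: int = 2
) -> Dict[str, Set[str]]:
    """Return body checksums reported by at least `min_senders` distinct IPs."""
    senders: Dict[str, Set[str]] = defaultdict(set)
    for ip_checksum, body_checksum in reports:
        senders[body_checksum].add(ip_checksum)
    return {body: ips for body, ips in senders.items() if len(ips) >= min_senders}


reports = [("ip-A", "sum-1"), ("ip-B", "sum-1"), ("ip-A", "sum-2"), ("ip-A", "sum-1")]
print(shared_checksums(reports))
```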


I have hard evidence to the contrary.  See, for example,
http://www.roaringpenguin.com/canit/showincident.php?id=3037

(You may have to authenticate as "demo/demo" and then enter
3037 as the incident ID.)

I have lots more messages like that, and that's just on my server.

--
David.
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg