Re: use of HTML in AddressModifyCode

2002-05-27 03:30:04
On May 27, 2002 at 14:20, John Belmonte wrote:

The AddressModifyCode works on the raw data.  As for using an entity
reference for "." as address obfuscation, it is a very weak form since any
decent address harvester would expand entity references before
doing detection.  Why not use something like:

  s/\./ dot /g;
  s/\@/ AT /g;

I think both entities and dot/at are equally weak against harvesters. 
Entities have the advantage of maintaining address appearance.

Well, all obfuscations are really weak.  Either the people who write
the harvesters are not too bright, or they are, and are harvesting
the addresses by de-obfuscating the data.  Or, they do not care since
they get a lot of regular hits anyway.

Since entity reference resolution is a standard thing to do when
parsing HTML/XML, it seems to be the weakest of all obfuscations.
Munging the address is better since it requires some analysis by the
harvester developer to determine what heuristics should be added to
de-obfuscate data.  Resolving entity references is a no-brainer.
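
To illustrate the difference in effort, here is a minimal Python sketch.
It is not taken from any actual harvester; the sample address and the
demunge() helper are hypothetical.  Decoding entity references is a single
standard-library call, while undoing a dot/AT munging needs a pattern
written for that one convention.

```python
import html
import re

# Hypothetical sample address, obfuscated two ways.
entity_form = "user&#64;example&#46;com"
munged_form = "user AT example dot com"

# Entity references: any HTML-aware parser resolves these as a matter of course.
decoded = html.unescape(entity_form)

def demunge(text):
    """Undo the ' AT ' / ' dot ' munging; a heuristic tied to this one convention."""
    text = re.sub(r"\s+AT\s+", "@", text)
    return re.sub(r"\s+dot\s+", ".", text)

print(decoded)
print(demunge(munged_form))
```

Both forms are still easy to reverse, but the second requires someone to
notice the convention and write a rule for it.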

If you really want to use entity references, modify mhonarc's
htmlize() routine to convert '.'s, '@'s, et al. to entity references.
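
For reference, the transformation itself is small.  The following is a
Python sketch of the idea only, not mhonarc's htmlize() (which is Perl);
the function name to_entities() and its chars parameter are made up for
illustration.

```python
def to_entities(text, chars=".@"):
    """Replace each character listed in `chars` with its decimal character reference."""
    return "".join(f"&#{ord(c)};" if c in chars else c for c in text)

# Hypothetical sample address.
print(to_entities("user@example.com"))
```

A browser renders the result as the original address, which is the
"maintaining address appearance" advantage mentioned above.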