Re: Stripping signature / tagline / adline

2005-06-05 12:19:20

Wouldn't mind seeing your filtering code.

Two things I've found:
<hr> - I think Yahoo uses this HTML tag to end the mail and begin their ad
<x-sigsep> - Eudora is nice enough to mark up the sig this way (not
usually an ad, but useful nonetheless).
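
For what it's worth, here is a rough Python sketch of cutting an HTML body at those two markers. The function name and the regexes are my own invention, not anything MHonArc provides, and it assumes the ad or sig is everything after the first <x-sigsep> or, failing that, after the last <hr>. It also chops off any closing tags that follow, so treat it as a starting point only.

```python
import re

# Hypothetical patterns, based only on the two observations above.
SIGSEP_RE = re.compile(r"<x-sigsep\b[^>]*>", re.IGNORECASE)
HR_RE = re.compile(r"<hr\b[^>]*>", re.IGNORECASE)

def strip_html_trailer(html: str) -> str:
    """Drop everything from the first <x-sigsep>, else from the last <hr>."""
    m = SIGSEP_RE.search(html)
    if m:
        return html[:m.start()].rstrip()
    # Use the *last* <hr>: the author may have used rules in the body too.
    hrs = list(HR_RE.finditer(html))
    if hrs:
        return html[:hrs[-1].start()].rstrip()
    return html  # no marker found; leave the message alone
```

Anything after the cut (including </body>) is gone, so a real filter would need to re-close the document.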


On 6/4/05, Jym Dyer <jym(_at_)econet(_dot_)org> wrote:
Is there any standard way to tell Mhonarc to strip the
signature / tagline / adline (for free email providers)?
In plaintext, they are often delineated by multiple dashes
and a newline.

=v= The RFC standard is "-- \n", which Earl's code addresses,
but not many people seem to use it these days.
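
A minimal Python sketch of that delimiter check, for illustration only (this is not Earl's code, and the function name is made up). It assumes the delimiter is a line consisting of exactly dash, dash, space, and it cuts at the last such line.

```python
def strip_signature(body: str) -> str:
    """Remove the last "-- " delimiter line and everything after it."""
    lines = body.splitlines(keepends=True)
    # Walk backwards so an earlier "-- " inside the body is not mistaken
    # for the signature separator.
    for i in range(len(lines) - 1, -1, -1):
        if lines[i].rstrip("\r\n") == "-- ":
            return "".join(lines[:i])
    return body  # no delimiter; nothing to strip
```

Note that a bare "--" without the trailing space is deliberately not matched here; plenty of mailers drop the space, so a real filter may want to be more lenient.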

I think they are delineated in html by a tag - but I'm
not sure.

=v= Generally not, since people appending ads to messages
aren't interested in making it easy to detect them.  I've
found that they tweak the format now and then, seemingly
at random.  If you search for ASCII lines, make sure you've
found the *last* such line in the message, since the message
author might be doing something with lines as well.
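
Roughly, in Python, the "last such line" search could look like the sketch below. The regex, the function name, and the max_trailer_lines guard are all assumptions of mine: a "rule" here is four or more of the same character from -_=*~ on a line by itself, and the cut is skipped when too much text follows it to plausibly be an ad.

```python
import re

# Hypothetical definition of an "ASCII line": 4+ repeats of one rule character.
RULE_RE = re.compile(r"^\s*([-_=*~])\1{3,}\s*$")

def strip_after_last_rule(body: str, max_trailer_lines: int = 10) -> str:
    """Cut the message at the last rule line, if the trailer is short."""
    lines = body.splitlines()
    last = None
    for i, line in enumerate(lines):
        if RULE_RE.match(line):
            last = i  # keep overwriting so we end up with the final match
    if last is None:
        return body
    if len(lines) - last - 1 > max_trailer_lines:
        return body  # too much text after the rule to be an appended ad
    return "\n".join(lines[:last]).rstrip() + "\n"
```

With a body that uses its own row of dashes as a section break, only the final rule and what follows it are dropped.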

=v= Topica is the worst of the email list services in this
regard; they append *and* prepend ads, and they jiggle the
format around from time to time.  I've got a Perl filter to
get rid of this junk, but I find I have to change it from
time to time.

=v= Stuff from Hotmail usually has a one-liner appended to
it, and it almost always has an apostrophe.  Actually, what
it almost always has is a "Windows 1252" charset "smart
single quote", often turning an ASCII message into one with
exactly one 8bit character.  (This isn't always apparent from
the headers, which say "text/plain".)
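
As a sketch of how one might spot that pattern in Python: the check below works on the raw message bytes and looks for a body whose only 8-bit byte is 0x92, which is the right single quotation mark in Windows-1252. The function name is made up, and whether the flagged line really is an ad is still a judgment call.

```python
def lone_cp1252_quote_line(raw: bytes):
    """Return the index of the line holding the message's only 8-bit byte,
    provided that byte is 0x92 (a Windows-1252 "smart" apostrophe).
    Returns None for any other message."""
    high = [b for b in raw if b >= 0x80]
    if high != [0x92]:
        return None  # pure ASCII, or more 8-bit content than one quote
    for i, line in enumerate(raw.splitlines()):
        if b"\x92" in line:
            return i
    return None
```

For example, given an otherwise ASCII body ending in an invented one-liner such as b"It\x92s free!", this returns the index of that final line, which a filter could then drop.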
