Hi,
I'm trying to count all-uppercase words in the message body, but
I can't get the scoring recipe right. Say I tolerate 3 uppercase
words; if there are more, I consider the message UBE. The regexp
should ignore some words, such as SMTP, AM and IP, and it should
skip base64-encoded lines.
I started with a simple word count, but it doesn't work.
The regexp is supposed to
- start at a word boundary
- match at least 3 uppercase letters
- end with a trailing space
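To make the three rules concrete outside procmail, here is a minimal sketch in Python. It only illustrates the intended pattern and is not a procmail recipe. The names UPPER_WORD and find_upper_words are made up for this example, and Python's \b is a zero-width word boundary, which may not behave exactly like procmail's \<.

```python
import re

# Hypothetical pattern for illustration only:
# word boundary, 3 or more uppercase ASCII letters, one trailing space.
UPPER_WORD = re.compile(r"\b([A-Z]{3,}) ")


def find_upper_words(text):
    """Return the uppercase words (without the trailing space) found in text."""
    return UPPER_WORD.findall(text)


if __name__ == "__main__":
    # Two-letter words such as "AM" are too short to match.
    print(find_upper_words("TAMAN PAIVAN OSALTA ALKAA txt 11:04 AM txt"))
```

Because the pattern requires a literal trailing space, a word followed by punctuation or a line break is not counted in this sketch.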
max = 3
# Count capitalized words
:0 D
*$ -$max^0
*$ B ?? 1^0 ()\<[A-Z][A-Z][A-Z]+[ ]
{
count = $=
dummy = "$count capitalized words"
}
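The counting that the recipe is aiming for can be sketched outside procmail as follows. This is a Python illustration under stated assumptions, not a replacement recipe: MAX_TOLERATED, IGNORED, count_upper_words and looks_like_ube are hypothetical names, and the ignore list holds only the words mentioned above.

```python
import re

MAX_TOLERATED = 3                # same role as "max = 3" in the recipe
IGNORED = {"SMTP", "AM", "IP"}   # words that should never be counted

# Assumed pattern: word boundary, 3+ uppercase letters, one trailing space.
UPPER_WORD = re.compile(r"\b([A-Z]{3,}) ")


def count_upper_words(body):
    """Count every uppercase word in body, skipping the ignored ones."""
    return sum(1 for word in UPPER_WORD.findall(body) if word not in IGNORED)


def looks_like_ube(body):
    """True when the body holds more uppercase words than tolerated."""
    return count_upper_words(body) > MAX_TOLERATED


if __name__ == "__main__":
    sample = "TAMAN PAIVAN OSALTA ALKAA PROJEKTITEHTAILU OLLA VAIHTEEKSI KASASSA. ="
    print(count_upper_words(sample), looks_like_ube(sample))
```

In this sketch every match is counted, and the raw count is compared with the tolerance in a separate step. A base64 line that contains no spaces never matches this particular pattern, because the pattern requires a trailing space.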
Example body:
------ =_NextPart_000_01BD0A3F.2D0B88F0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
-----Alkuper=E4inen viesti-----
L=E4hett=E4j=E4: Jari Aalto
[SMTP:jari(_dot_)aalto(_at_)ntc(_dot_)nokia(_dot_)com]
L=E4hetetty: Tuesday, December 16, 1997 11:04 AM
Vastaanottaja: xx xx
Aihe: xxx
TAMAN PAIVAN OSALTA ALKAA PROJEKTITEHTAILU OLLA VAIHTEEKSI KASASSA. =
txt txt txt ...
------ =_NextPart_000_01BD0A3F.2D0B88F0
Content-Type: application/ms-word
Content-Transfer-Encoding: base64
eJ8+IgkOAQaQCAAEAAAAAAABAAEAAQeQBgAIAAAA5AQAAAAAAADoAAEIgAcAGAAAAElQTS5NaWNy
b3NvZnQgTWFpbC5Ob3RlADEIAQ2ABAACAAAAAgACAAEEkAYAyAEAAAEAAAAQAAAAAwAAMAIAAAAL