procmail

RE: Get [[:upper:]] percentage from subject

2008-06-27 12:37:23
Michelle Konzack wrote on Thursday, June 26, 2008 1:39 PM:

I have tried to get the percentage of [[:upper:]] in the subject
line natively in procmail, but failed.

In the meantime while searching for a faster solution I am using:

----[ '/usr/share/tdtools-procmail/FLT_subject_upper' ]-----------------
<snip>
    MSG_SUBJECT=`formail -czx Subject:`
<snip>
    TMPVAR1=`echo "${MSG_SUBJECT}" |wc --chars `
    TMPVAR2=`echo "${MSG_SUBJECT}" |sed 's|[[:lower:]]||g' |wc --chars `
        HIT=`echo "${TMPVAR2}00/${TMPVAR1}" |bc`
<snip>

Geez, that's really pretty ugly, Michelle. :-)

I'm pretty sure we did this problem years ago in procmail
(so it would be in the archives).  Off the top of my head,
though:


   SP = ' '
   # The quotes on the next line must hold one literal tab character.
   TAB = '	'
   WS = $SP$TAB

   :0
   * $ ^Subject:.*\/[^$WS].*
   *     MATCH ?? [a-z]
   * 1^1 MATCH  ?? .
   {
       # We're only here if there was a Subject with a
       # 7-bit ASCII letter in it (this test is not
       # case-sensitive, so [a-z] matches capitals too).
       # If you want to look for German or French letters
       # that have caps also, and you expect them to come
       # in the Subject without being encoded (which they
       # normally wouldn't), add them to the char class
       # we're looking for.

       MSG_SUBJECT = $MATCH
       LEN_SUBJECT  = $=

       :0 D   # case-sensitive
       *    1 ^1  MSG_SUBJECT ?? [A-Z]
       * 1000 ^1  MSG_SUBJECT ?? [a-z]
       {
           # Both counts are packed into one score so we
           # only need one recipe instead of two: each
           # capital adds 1, each lower-case letter adds
           # 1000.  I'm assuming the Subject won't have
           # more than 999 capital letters in it.

           RATIO_SUBJECT = $=
       }
   }
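
If you want to sanity-check the packed score outside procmail, the
two weighted conditions can be imitated in plain shell.  This is only
a sketch: pack_counts is a made-up helper name, not part of the
recipe, and it assumes a POSIX sh with tr and wc available.

```shell
#!/bin/sh
# Hypothetical helper: imitate the two weighted conditions of the
# case-sensitive recipe, i.e. 1000 per lower-case ASCII letter plus
# 1 per upper-case ASCII letter.
pack_counts() {
    # tr -cd deletes every byte that is NOT in the given set,
    # so wc -c is left counting only the letters we care about.
    upper=$(printf '%s' "$1" | LC_ALL=C tr -cd 'A-Z' | wc -c)
    lower=$(printf '%s' "$1" | LC_ALL=C tr -cd 'a-z' | wc -c)
    echo $(( 1000 * $lower + $upper ))
}

pack_counts 'Top US stock picks analysis'   # prints 20003
```

For the Subject used in the sample run this gives the same number
that procmail assigns to RATIO_SUBJECT.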


Here's a sample run:

 9:29pm [~/Mail/spam] 415[0]> procmail -m ~/Mail/rc < $lastf
procmail: [10176] Fri Jun 27 21:29:51 2008
procmail: Assigning "DEFAULT=/dev/null"
procmail: Assigning "SP= "
procmail: Assigning "TAB=       "
procmail: Assigning "WS=        "
procmail: Assigning "MATCH="
procmail: Assigning "MATCHLEFT="
procmail: Matched "Top US stock picks analysis"
procmail: Match on "^Subject:.*\/[^     ].*"
procmail: Match on "[a-z]"
procmail: Score:      27      27 "."
procmail: Assigning "MSG_SUBJECT=Top US stock picks analysis"
procmail: Assigning "LEN_SUBJECT=27"
procmail: Score:       3       3 "[A-Z]"
procmail: Score:   20000   20003 "[a-z]"
procmail: Assigning "RATIO_SUBJECT=20003"
procmail: Assigning "LASTFOLDER=/dev/null"
procmail: Opening "/dev/null"
From Adrian-llassulp(_at_)3gmobil(_dot_)no  Fri Jun 27 21:22:26 2008
 Subject: Top US stock picks analysis
  Folder: /dev/null                                                        1776

So that Subject has 27 chars (LEN_SUBJECT), of which 20 are lower-case
and 3 are upper-case (RATIO_SUBJECT = 20 * 1000 + 3), and 4 are "other"
(spaces, in this case).
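
Procmail itself won't do the division for you, so getting from those
two variables to an actual percentage still takes a little shell
arithmetic.  A rough sketch, with a made-up function name
(unpack_ratio), and assuming the percentage wanted is capitals
relative to the whole Subject length:

```shell
#!/bin/sh
# Hypothetical helper: split the packed score back into its parts.
#   $1 = RATIO_SUBJECT (1000 * lower-case count + upper-case count)
#   $2 = LEN_SUBJECT   (total characters in the Subject)
unpack_ratio() {
    lower=$(( $1 / 1000 ))
    upper=$(( $1 % 1000 ))
    other=$(( $2 - lower - upper ))
    # Integer division, so the percentage is rounded down.
    pct=$(( 100 * upper / $2 ))
    echo "lower=$lower upper=$upper other=$other upper_pct=$pct"
}

unpack_ratio 20003 27   # prints lower=20 upper=3 other=4 upper_pct=11
```

If you'd rather have capitals as a share of letters only, divide by
(lower + upper) instead of the total length.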

Dallman

____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
