
Changing case in procmail

2005-06-26 09:11:07
We've discussed changing case a number of times on this
list.  There are procmail-only and piped solutions that
have been discussed.  For example, from 2002 is this,
from David Tamkin:

http://www.xray.mpe.mpg.de/mailing-lists/procmail/2002-01/msg00453.html


In 2003, David alluded to that posting in a reply to
LuKreme, who was using a pipe to tr (a decent solution,
actually):

http://www.xray.mpe.mpg.de/mailing-lists/procmail/2003-07/msg00267.html
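
For anyone who hasn't seen the piped approach, a minimal sketch
of it might look like the following.  This is not the recipe
from the linked post: the variable name ADDR and the choice of
header are made up for illustration, and it assumes a POSIX tr
and printf on the PATH.  Note that tr with these ranges handles
only the ASCII letters.

  :0
  * ^Return-Path:.*<\/[^>]+
  {
    ADDR = `printf '%s' "$MATCH" | tr 'A-Z' 'a-z'`
  }

The cost is one shell and one tr process per message, which is
what the all-procmail solutions avoid.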


In 2004, Ruud H.G. van Tol offered an alternative all-procmail
solution:

http://www.xray.mpe.mpg.de/mailing-lists/procmail/2004-06/msg00294.html

That's very nice, actually.


I've discussed the subject a few times myself, most recently,
it seems, in January 2005, here:

http://www.xray.mpe.mpg.de/mailing-lists/procmail/2005-01/msg00245.html


After that, David showed us a very nice trick.  I will,
however, give a reference to Ruud's follow-up to it:

http://www.xray.mpe.mpg.de/mailing-lists/procmail/2005-04/msg00084.html

David's idea is very slick.  Here is the gist of it, from
the above archive reference:

  MATCHSTRING='(announce|general)'

  :0:
  * $ ^TO_\/$MATCHSTRING@lists\.example\.com
  * MATCH ?? ()\/[^@]+
  * $ MATCHSTRING ?? ()\/$\MATCH
  lists-$MATCH


The only catch is that, for that to work, you need to know
ahead of time what you're going to get.  (MATCHSTRING has to
be pre-set.)

There have been other mentions of the subject as well over
the years.

Okay, anyway, I have an updated version that is similar to
Ruud's of a year ago, which itself built on David's from a year
before that.  (With this group of recipes, one doesn't need to
know ahead of time what's in the input.)
Each of those earlier versions grabs one character at a time,
tests what it grabbed for case, and converts as necessary.
Each recurses as many times as there are characters to test.

Mine has one improvement, which is why I'm posting: it grabs
a bunch at a time -- all the non-capital chars it can find in
a row, or, if the next char is upper-case, then just that one.

So converting "dmanspam@Nomotek.com" to "dmanspam@nomotek.com"
takes three iterations ("dmanspam@", then "N", then
"omotek.com").

I wrote this today as part of a package I'm developing in
procmail, about which I'm not ready to say much yet.  But
I'll share this function rc-file with the list.  After the
procmail code below, I'll discuss a couple of points:

# ############################################################################
# More like this coming soon to www.spamless.us
#
# ############################################################################
#
# Module name:         func.lowercase
# Program version:     1.0
# Last edited:         25-Jun-2005 (dman)
# Last change:         whole-cloth (from non-production draft)
# Abstract:            Make _func_in lower-case
#
# Discussion:          All-procmail solution, contained recursion

# ############################################################################
# Take _func_in and make it all-lower-case; return _func_out
#
# ############################################################################

 :0                                            # this recipe runs just once
 * $! SWITCHRC ?? ^^$_^^
 {
   _LOWER = AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZzÄäÖöÜü
   _CAPs  = ${_CAPs:-A-ZÄÖÜ}                   # raw chars only; no brackets
 }


 :0 D
 * $ _func_in ?? ^^$\_LEFTPART\/([^${_CAPs}]+|.)
 {
   _LEFTPART = ${_LEFTPART}${MATCH}

   :0 D
   * $  MATCH ?? [${_CAPs}]
   * $ _LOWER ?? $MATCH\/.
   { }

   _func_out = ${_func_out}${MATCH}  SWITCHRC = $_
 } _func_in _LEFTPART _LOWER                   # unset to clear env space

# ############################  END OF MODULE  ###############################
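
To make the chunking concrete, here is a hand trace of the
module above.  The input is a made-up address, not one from a
real message: _func_in is set to "joe@Example.COM", and _CAPs
has its default value.

  pass  _LEFTPART on entry   MATCH      appended to _func_out
  1     (empty)              joe@       joe@
  2     joe@                 E          e
  3     joe@E                xample.    xample.
  4     joe@Example.         C          c
  5     joe@Example.C        O          o
  6     joe@Example.CO       M          m

On the seventh pass nothing is left after _LEFTPART, so the
outer condition fails, the recursion stops, and _func_out
holds "joe@example.com".  A one-char-at-a-time version would
need fifteen passes for the same input.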


Okay, here's a use of it I have in actual production:

 :0 D
 * $ H_RP ?? [${_CAPs}]
 {
   _func_in  = $H_RP
   INCLUDERC = ${_SUBS}/func.lowercase
   H_RP      = ${_func_out} _func_out
 }


H_RP is the Return-Path address,[1] which I had previously grabbed
and set the variable for.  Note that I'm using _CAPs here, so
obviously I had it defined elsewhere earlier; since the INCLUDERC
hasn't been called yet at the point I use it just above, the
definition inside the function-rcfile won't do me any good.
Actually, I put _CAPs in my general vars rcfile.  The only
reason I didn't put _LOWER in there too is that my use for
this lower-casing function is rare, and I don't want to fill
up my env space needlessly.  (Note that I also unset _LOWER at
the end of the function module.)

[1] I want the Return-Path address case-controlled, because I
use the lower-case address in whitelist file hashes, among other
things.
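
For context, the production snippet above assumes a few things
are already in place before it runs.  A sketch of that setup,
under stated assumptions: the directory assigned to _SUBS is
hypothetical, and the Return-Path recipe is only one plausible
way to fill H_RP, not necessarily the one used in production.

  _SUBS = $HOME/.procmail/subs        # hypothetical home of func.lowercase
  _CAPs = A-ZÄÖÜ                      # raw chars only; no brackets

  :0
  * ^Return-Path:.*<\/[^>]+
  {
    H_RP = $MATCH
  }

With _CAPs set here, the default inside the module is never
used, since the module only assigns _CAPs when it is empty.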

One thing I want to mention is that I've moved, in my current
private .procmailrc and in my current projects in procmail, to
a non-colliding namespace set.  That's why I've prepended a '_'
to all my vars.  (Obviously, if you do that too, then
we'll collide.  But my namespace set won't collide with procmail
vars or other private varsets that use something other than
a leading underscore char.)  Thus, if I write some procmail
code and Joe Blow includes it as an INCLUDERC in his stuff,
he doesn't have to worry that I'm going to overwrite his own
private vars.  He will know (because I'll say so in the docs
for what I provide) that my vars follow this scheme; so if
he's using the same scheme, then he'll want to be careful.

It's actually a bit inconvenient to use a leading
underscore in a var, because (it turns out) you have to
put curly braces around the name (or use standard procmail
quoting, $\_LIKE_THIS, instead, if you want hard-quoted
content).  I'm presuming that most people would rather have
an easier namespace set such as drVAR or dr_VAR or DR_VAR
or something, so they don't have to use the curly braces
*all* the damn time.  (I've just written 3700-plus lines
of procmail code in the last 6 weeks using this naming
convention -- and I'm not through -- and I've gotten pretty
good at typing { and } without making a typo, now.)  :-)

(Oh: the "H_RP" doesn't follow the new convention, because
it's imported from an older module that I am not ready to
change; besides, "H" stands for "header-vars" anyway.  Ruud
first suggested that prefix, as far as I know, and it's a
decent way of doing things also.)

Okay, the next thing I want to mention is how this line --

 * $ _func_in ?? ^^$\_LEFTPART\/([^${_CAPs}]+|.)

 -- interestingly (to me, anyway) captures the longer match
(i.e., [^${_CAPs}]+ ) in preference to the shorter
one ( . , which will here be a capital letter).
This is true even if the lone dot is placed on the
left in the or-statement, "(.|[^${_CAPs}]+)" .

I'll also mention that at first I had:

           (stuff)\/([^${_CAPs}]+|[${_CAPs}])

but I found the dot works fine instead of the last
expression.
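
A quick way to check this on your own installation is a
throwaway recipe along these lines.  The variable name _demo
and the log text are made up for illustration.  If the
behavior described above holds, the log should show "abc"
rather than "a", which fits the usual description of \/ :
whatever is to its right is matched as long as possible.

  _demo = abcDEF

  :0 D
  * _demo ?? ^^\/(.|[^A-Z]+)
  {
    LOG = "MATCH is $MATCH
"
  }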


Next I'll mention that I deal sometimes with German text, so
I went ahead and put the three German umlauted capital letters
in there too.  I don't use French or other languages with
upper-case diacritics, so I didn't put them in.  If you use
those, you might want to add them to _CAPs (and add the
matching upper/lower pairs to _LOWER, since that is where the
module looks up the lower-case char).
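
As a sketch of what such an addition might look like for
French, the two assignments inside the module would become
something like the lines below.  The particular letters are
just an example.  This assumes a single-byte encoding such as
ISO-8859-1; procmail's regexps work on bytes, so multi-byte
UTF-8 letters are unlikely to behave inside a bracket
expression or with the single-char lookup.

   _LOWER = AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZzÄäÖöÜüÀàÂâÇçÉéÈèÊê
   _CAPs  = ${_CAPs:-A-ZÄÖÜÀÂÇÉÈÊ}

If _CAPs is set in a general vars rcfile instead, as described
earlier, the new capitals have to go there, because the
default in the module is then never used.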

Later,
Dallman



  • Changing case in procmail, Dallman Ross <=