procmail

Re: To: field handling Question.

1999-04-13 03:06:31
On Wed, 7 Apr 1999 10:10:42 -0400 , "Banerjee, Tapas"
<Tapas(_dot_)Banerjee(_at_)gs(_dot_)com> wrote:
     I am trying to strip first line of body which gives the name of the
mailing list to which the processed mail should be send. So I strip out that
and use in my Bcc: list. But the problem happens when user sends an
attachment - since my Unix mail reader sometimes cat not read the document
type, mail body shows
"---------------------------------------------------------------------------
----------------------------------------------------------------
"This message is in MIME format. Since your mail reader does not understand
this format, some or all of this message may not be legible.
 
------ =_NextPart_000_01BE80FC.ADC85E7A
Content-Type: text/plain
 
sizar  <---- This is the mailing list I would like to extract"
----------------------------------------------------------------------------
-----------------------------------------------------------  

You are opening up quite a big can of worms here, but I'm going to
assume that by the time a message gets as far as this recipe, you can
be fairly certain that if it's a MIME multipart message, the first
MIME body part will be text/plain, with the token you want to extract
on the first nonblank line of the body of that part. (Bodies and body
parts ... and body part body part headers ... this MIME terminology
sure is ugly.)

You can probably rather trivially merge this with what you already
have. (You'd probably try this first, then put your original
extraction recipe after this one with an :E flag. This says to try
that recipe if the conditions on the preceding one weren't all true.)

    :0
    * ^Content-type:\<*multipart/mixed;.*boundary="?\/[^";      ]+
    {
        BOUNDARY=$MATCH
        # In the body, the opening delimiter line is "--" plus the boundary
        :0B
        * $ ^^(.*($))*--$\BOUNDARY(($).+)+($)($)+\/.*
        { SUBJ=$MATCH }

        # No closing brace here yet; see below ...

I'm cheating a bit on e.g. the boundary string matching so you will
want to test this against some real-world examples before putting it
into anything like production use. (Not sure about multipart/mixed --
could these be multipart/alternative or some other variation instead?
Tweak to suit your needs.)
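
One way to try the boundary grab against real-world headers outside of
procmail is a small sed filter. This is only a rough approximation of
the header condition above, not a replacement for it: the function name
boundary_of and the sample headers are made up for illustration, and it
assumes the whole Content-Type: header sits on a single line.

```shell
#!/bin/sh
# Rough stand-in for the header condition above: print the boundary
# parameter of a multipart/mixed Content-Type header read from stdin.
# Unlike procmail, sed matches case-sensitively; only the capitalization
# of the header name is allowed to vary here.
boundary_of() {
    sed -n 's|^[Cc]ontent-[Tt]ype:.*multipart/mixed;.*boundary="\{0,1\}\([^";[:space:]]*\).*|\1|p'
}

# Made-up sample headers: one quoted boundary, one unquoted.
printf '%s\n' 'Content-Type: multipart/mixed; boundary="----_=_NextPart_000_01BE80FC.ADC85E7A"' | boundary_of
printf '%s\n' 'Content-type: multipart/mixed; boundary=XYZ' | boundary_of
```

Feeding it the Content-Type: lines from a handful of saved messages
should show fairly quickly whether multipart/alternative or oddly
formatted boundary parameters turn up in practice.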

The first grab will get the MIME boundary string (I hope) and the
final grab will get the entire first non-blank line of the first body
part, whatever that line happens to contain. So you will need to make
sure that messages you don't want to do this extraction on are bypassed
completely, probably with an outer set of braces with the conditions
you are already using (perhaps they should be more stringent than what
you have now, if I'm allowed to make a recommendation. This is my
excuse for being lazy :-).

          # Remove first line containing subject, from body
          :0 fbw
          | awk 'NR>1'

Should the first body part be discarded completely, or just the first
line of the first body part? Either way, this complicates things.
Perhaps you should just break down and use a MIME-aware tool to
process the MIME messages. But here's a hack-splutter-cough in the
spirit of the rest of the partial solution above: Count the number of
lines in the string we already matched on once (from start of body up
to the first line of the first body part), then discard line number
so-and-so from the body of the message.
  (You can't get the match into MATCH above because you need MATCH for
other things. So you have to match on the same condition again, only
this time grabbing everything instead of just the token you wanted.
[There are probably more elegant ways to do this. Hmm.])

        :0B
        * $ ^^\/(.*($))*--$\BOUNDARY(($).+)+($)($)+
        {
            :0fbw
            * 1^1 MATCH ?? ^.*$
            | sed $=d
        }
    } # Closing brace for the whole MIME multipart kludge

(For what it's worth, I believe sed will be significantly faster than
awk on many systems. What you mean by "significantly" can be a matter
of debate, of course. If you think sed is too cryptic for anything at
all, I can sympathize.)
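
The count-then-delete idea can also be tried in plain sh, outside of
procmail, which makes it easier to test. This is a minimal sketch under
stated assumptions: the function name token_line_number, the boundary
XYZ and the sample body are all invented, and awk does the counting
here instead of procmail's scoring. Note that in sh the line number has
to be written as "${n}d", since the $= variable only exists inside
procmail.

```shell
#!/bin/sh
# Print the line number of the first nonblank line after the part
# headers of the first body part. $1 is the boundary string.
token_line_number() {
    awk -v delim="--$1" '
        !seen    { if ($0 == delim) seen = 1; next }   # skip the preamble
        !in_body { if ($0 == "") in_body = 1; next }   # skip the part headers
        $0 != "" { print NR; exit }                    # first nonblank line
    '
}

# Made-up sample body; the boundary and the list name are hypothetical.
body='This message is in MIME format.

--XYZ
Content-Type: text/plain

sizar
Hello, world.
--XYZ--'

n=$(printf '%s\n' "$body" | token_line_number XYZ)
stripped=$(printf '%s\n' "$body" | sed "${n}d")
printf '%s\n' "$stripped"
```

Only the token line is dropped; the rest of the first body part, and
the MIME structure around it, is left alone.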
         
          #  Change BCC and save in a file,
          #  attach :; tag to To: field to prevent mail bounce
          :0:
          | (formail -A"Bcc: $SUBJ" \
          -I"To: =?iso-2022-jp?B?GyRCJDQ5WEZJJE4zJyQ1JF4kWBsoQg==?= :;" \
          -I"MIME-Version: 1.0" \
          -I"From: =?iso-2022-jp?B?GyRCJTQhPCVrJUklXiVzISYlNSVDJS8lOUVqPy4bKEI=?= :;" ) \
          >> "$SUBJ"_mails

The problems with this have been pointed out before, but allow me to
regurgitate and perhaps expand a bit on some points.

If you are inserting Mime-Version: 1.0 you should also be doing
something to ensure that the rest of the message is MIME-compliant.
What I'd recommend here is that -- assuming your messages will
generally already be MIME messages, if they contain any text in
Japanese, such as the first line with the token in Japanese -- you
simply pass on any existing MIME headers, using formail (probably a
second invocation of formail, actually) to extract Content- and Mime-
headers. I pointed this out before.

Another possible approach -- which would be necessary at least if you
sometimes receive non-MIME messages -- would be to add an additional
layer of MIME encoding, perhaps putting the entire original message in
a message/rfc822 "attachment" or something if it's already a MIME
message, and otherwise make an informed guess as to the nature of the
contents (probably guessing text/plain, us-ascii, 8bit is not too
far-fetched ... I guess as a matter of fact this is implied if you
don't put in anything to the contrary). This is kind of elegant
MIME-wise, but I don't think typical MIME clients will be very helpful
in displaying the results the way you'd like them to if you end up
embedding MIME messages within MIME messages.

The parentheses around the formail call are unnecessary and wasteful,
too. Unless there is some subtle point to invoking an extra shell
which I am missing.

I believe you said you would be sending these messages onward with
formail -s sendmail -oi -t at some point. I am prepared to bet that
relying on this combination of To: and Bcc: and the absence of any
other relevant headers (Cc:, Resent-anything) is going to bite you
sooner than you think. You should at the very least (try to) make sure
you zap any Cc: and Resent- headers in the incoming message if you are
going to feed it to sendmail -t later (and don't want those Cc:ed and
Resent-To: etc. recipients to receive a copy then).
  Anyway, since you already know -- from the $SUBJ string if nothing
else -- who these messages should be sent to, you're better off simply
handing that to sendmail explicitly at the time you send the messages,
rather than encode it in the Bcc: header at this point.
  If you want some sort of compromise, how about adding your own
custom header, extracting that at send time, then starting sendmail
with the result on the command line?

    #!/bin/sh
    who=`formail -1 -zx X-Tapas-Recipient: -s <inputfile`
    formail -s sendmail -oi "$who" <inputfile

(where inputfile, obviously, is the $SUBJ_mails file in question).
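
For the storing side, something along these lines might do: zap the
headers that could name extra recipients and record the list name in
the custom header, all in one formail call. This is a sketch, not a
tested setup; the function name tag_message is made up, and the list of
headers to remove is my guess at what is relevant for your messages.

```shell
#!/bin/sh
# Filter one message from stdin to stdout. $1 is the list name.
# -I with a bare field name removes that field; -I with a value
# replaces any existing field of the same name.
tag_message() {
    formail -I "Cc:" -I "Bcc:" \
            -I "Resent-To:" -I "Resent-Cc:" -I "Resent-Bcc:" \
            -I "X-Tapas-Recipient: $1"
}

# Usage (file names are hypothetical):
#   tag_message sizar <one_message >>sizar_mails
```

The send-time script above can then pick up X-Tapas-Recipient: without
any To:, Cc: or Bcc: trickery being involved.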

Well, hope this helps. It's not exactly pretty. Sometimes one wishes
for decent MIME support in Procmail itself (or perhaps some sort of
derivative). 

/* era */

Would it be too heretical to suggest you use a real mailing list manager
for this whole business? These messages you are sending out are in
essence a collection of messages sent to a list in a certain period of
time, after which the accumulated messages are sent out and a new
period started? How about sending all messages as a digest? Any basic
mailing list software should allow you to do that, clean and simple.

-- 
.obBotBait: It shouldn't even matter whether     <http://www.iki.fi/era/>
I am a resident of the state of Washington. <http://members.xoom.com/procmail/>
 * Sign the European spam petition! <http://www.politik-digital.de/spam/en/> *
