procmail

Re: mime-decode.rc

1997-05-29 03:33:00
On Wed, 28 May 1997 18:41:09 +0300 (EET DST), era eriksson 
<era(_at_)iki(_dot_)fi> said:
On Wed, 28 May 1997 11:27:45 -0400,
Roderick Schertler <roderick(_at_)argon(_dot_)org> wrote:

My line of reasoning was that if you have a multipart where the
different parts are encoded differently, the script would attempt to
apply the same decoding on them all.
  This might still happen if the embedded data is type message, or can
it?

Not if I understand you.  A multipart message (or a multipart part of a
multipart message, and so on all the way down) is just not allowed to be
encoded:

    If an entity is of type "multipart" the Content-Transfer-Encoding
    is not permitted to have any value other than "7bit", "8bit" or
    "binary".

This script only looks at the top level encoding so multipart messages
won't be touched.
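
As an illustration of that rule (a Python sketch added for clarity, not part of the script; needs_top_level_decoding and the two sample messages are made up for the example), the check below looks only at the top-level Content-Transfer-Encoding, the way the recipe does. A well-formed multipart message always comes out as "leave alone", even when one of its parts is base64:

```python
from email import message_from_string

# Encodings that mean "the body is stored as-is"; per the MIME rule
# quoted above, these are the only values a multipart entity may carry.
IDENTITY_ENCODINGS = {"7bit", "8bit", "binary"}


def needs_top_level_decoding(raw_message: str) -> bool:
    """True if the top-level header names a real encoding (assumed helper)."""
    msg = message_from_string(raw_message)
    # A missing header is treated as 7bit, which is the MIME default.
    encoding = msg.get("Content-Transfer-Encoding", "7bit").strip().lower()
    return encoding not in IDENTITY_ENCODINGS


# Hypothetical sample: multipart wrapper with one base64 part inside.
multipart_message = (
    "Content-Type: multipart/mixed; boundary=xyz\n"
    "Content-Transfer-Encoding: 7bit\n"
    "\n"
    "--xyz\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "aGVsbG8=\n"
    "--xyz--\n"
)

# Hypothetical sample: a lone attachment sent without a multipart wrapper.
single_part_message = (
    "Content-Type: application/zip\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "aGVsbG8=\n"
)
```

The inner part's base64 header is never consulted, which is why differently encoded parts cannot be decoded by mistake.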

There are two problems with yours:  It doesn't allow spaces before
the colon (easily fixed), [...]

RFC822 doesn't permit whitespace before the colon, does it?

The last time I read it I thought it did.  Looking now I think I was
wrong: the bit saying that space is allowed anywhere between lexical
tokens is in the section on structured field bodies, and we're not
talking about field bodies.

Here's something, though:  From this same interpretation it falls out
that the subject in "Subject: foo" is " foo", note space.  If you wanted
the subject to be just "foo" you'd have to write "Subject:foo".  If
that's not correct then I think space is allowed before the colon.

It's a pedantic point either way, of course.  Perhaps one can take
"liberal in what you accept" too far.  I've never seen anybody put
space before the colon.
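
To make that literal reading concrete, here is a tiny Python sketch (added for illustration; split_header is a hypothetical helper, not anything from RFC822 or procmail) that cuts a header line at the first colon and does nothing else:

```python
def split_header(line: str) -> tuple[str, str]:
    """Split a header line at the first colon, keeping all whitespace."""
    name, _colon, body = line.partition(":")
    return name, body


# Under the strict reading, the leading space belongs to the field body.
strict_body = split_header("Subject: foo")[1]

# Only this form yields a body of exactly "foo".
tight_body = split_header("Subject:foo")[1]

# With space before the colon, the space ends up in the field name.
loose_name = split_header("Subject : foo")[0]
```

A parser that wants "foo" in the first case has to strip whitespace around the colon itself, which is the liberty being debated here.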

[...] I can't spot a way to strip the trailing space but otherwise
capture the entire value without using an external process).

Hokay, how about

    * ^Content-transfer-encoding:[ 	]*\/[^ 	](.*[^ 	])*

That is excellent.  Thanks.  Here's my revised version.

# $Id: mime-decode.rc,v 1.3 1997-05-28 12:26:01-04 roderick Exp $
#
# Roderick Schertler <roderick(_at_)argon(_dot_)org>

# Undo the Content-Transfer-Encoding applied to the entire body.
#
# Note that a multipart message is not allowed to have an encoding (other
# than 7bit, 8bit or binary) so this won't touch such.
#
# Most "attachments" are sent as multipart messages so this will not
# decode them.  Check the Content-Type header for "multipart/" to see
# if this is the case.  Smart mailers, however, will send an attachment
# that has no accompanying text as a plain (non-multipart) message, so
# this script will decode it.  (Netscape gets this right, e.g.)  Save the
# body with flags "br" and you'll recover the original exactly.  I
# process zip files this way.

mime_decode_space = ' 	'      # space and tab

:0
* $ ^Content-Transfer-Encoding[$mime_decode_space]*:[$mime_decode_space]*\
        \/[^$mime_decode_space](.*[^$mime_decode_space])*
{
    mime_decode_encoding = $MATCH

    :0 fbrw
    * mime_decode_encoding ?? ^^quoted-printable^^
    # grr, beware, you can't bundle the switches
    | mimencode -q -u

    :0 afhw
    | formail -i 'Content-Transfer-Encoding: 8bit'

    :0 Efbrw
    * mime_decode_encoding ?? ^^base64^^
    | mimencode -u

    :0 afhw
    | formail -i 'Content-Transfer-Encoding: binary'

    mime_decode_encoding
}

mime_decode_space
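
For anyone who wants to try the logic outside procmail, here is a rough Python sketch of the same steps (an illustration only, not a replacement for the recipe; ENCODING_RE and decode_top_level are names invented for it). It assumes the header and body have already been separated, ignores folded header lines, and stands in for mimencode with the standard quopri and base64 modules. Unlike "formail -i", which keeps the old field under an "Old-" prefix, it simply rewrites the field in place.

```python
import base64
import quopri
import re

# Same idea as the recipe's condition: optional space or tab around the
# colon, then capture from the first to the last non-blank character so
# trailing whitespace is dropped.  "\n" is excluded explicitly because a
# negated class in Python, unlike in procmail, would match a newline.
ENCODING_RE = re.compile(
    r"^Content-Transfer-Encoding[ \t]*:[ \t]*([^ \t\n](?:.*[^ \t\n])?)",
    re.IGNORECASE | re.MULTILINE,
)


def decode_top_level(header: str, body: bytes) -> tuple[str, bytes]:
    """Undo a top-level quoted-printable or base64 encoding.

    Returns the header and body unchanged for any other encoding,
    which includes every well-formed multipart message.
    """
    match = ENCODING_RE.search(header)
    if match is None:
        return header, body

    encoding = match.group(1).lower()
    if encoding == "quoted-printable":
        new_body, label = quopri.decodestring(body), "8bit"
    elif encoding == "base64":
        new_body, label = base64.b64decode(body), "binary"
    else:
        return header, body

    # Relabel the message so nothing downstream decodes it a second time.
    new_header = ENCODING_RE.sub(
        "Content-Transfer-Encoding: " + label, header, count=1
    )
    return new_header, new_body


# Hypothetical input: note the trailing blanks after "base64".
sample_header = "Subject: test\nContent-transfer-encoding:  base64 \t\n"
sample_body = b"aGVsbG8=\n"
decoded_header, decoded_body = decode_top_level(sample_header, sample_body)
```

The relabelling mirrors the two "formail -i" recipes: 8bit after quoted-printable, binary after base64.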

-- 
Roderick Schertler
roderick(_at_)argon(_dot_)org
