On Wed, 15 Dec 2004, 06:10 GMT+01 Ruud H.G. van Tol wrote:
When we tickled Robert Allerstorfer, this came out:
whitespace between adjacent 'encoded-word's. That whitespace should
then be removed. But this does not seem easy to do with
procmail. I still have to think about how to convert
=?ISO-8859-1?Q?a?= =?ISO-8859-1?Q?b?= c =?ISO-8859-1?Q?d?=
to
=?ISO-8859-1?Q?a?==?ISO-8859-1?Q?b?= c =?ISO-8859-1?Q?d?=
in order to deobfuscate it to
ab c d
http://www.xs4all.nl/~rvtol/procmail/
Check out my 'bq_wrap.rc', which demos 'inc/snr_wild.inc'.
It does a rather opportunistic search&replace
of '\?=[$WS]+=\?'
by '?==?'.
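
For illustration only, the opportunistic search and replace described above can be sketched in Python. This is not the procmail include itself; `WS` and `naive_join` are hypothetical names, and `WS` is assumed to hold a space and a tab. The second call shows why the replacement is opportunistic: it also fires on text that merely looks like the boundary between two 'encoded-word's.

```python
import re

# Hypothetical stand-in for procmail's $WS (assumed here: space and tab).
WS = " \t"

def naive_join(header):
    """Replace every '?=' + whitespace + '=?' by '?==?', wherever it occurs."""
    return re.sub(r"\?=[" + WS + r"]+=\?", "?==?", header)

# Whitespace between two adjacent encoded-words is removed as intended.
print(naive_join("=?ISO-8859-1?Q?a?= =?ISO-8859-1?Q?b?= c =?ISO-8859-1?Q?d?="))

# Plain text that only resembles the boundary is rewritten as well.
print(naive_join("<?= =?> First"))
```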
I have now written a sophisticated pure-procmail routine that
deobfuscates the subject while fully respecting RFC 2047's rule that,
if adjacent 'encoded-word's are separated by whitespace only, that
whitespace must not be displayed (so my code removes it).
Rather than just searching for "\?=[$WS]+=\?", it removes
whitespace only if it really sits between two 'encoded-word's. Thus, it also
works on
Subject: <?= =?> First=?utf-8?q?_line=7E?=
=?us-ascii?q?Second l?=ine
which will result in
Subject: <?= =?> First line~Second line
No need to call any external program.
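
As a rough Python illustration of the difference (not the pure-procmail routine, which is not shown here), the whitespace can be removed only when a complete 'encoded-word' stands on both sides of it. The pattern borrows the loose shape of the regex visible in the procmail log; `ENCODED_WORD`, `ADJACENT` and `join_adjacent` are hypothetical names, and RFC 2047's own grammar is stricter than this pattern.

```python
import re

# One 'encoded-word', in the same loose shape as the regex in the procmail log
# (an illustrative assumption; RFC 2047's grammar is stricter).
ENCODED_WORD = r"=\?[a-z][a-z0-9_-]+[a-z0-9]\?[bq]\?[^?]+\?="

# An encoded-word, then whitespace, then (as a lookahead) another encoded-word.
ADJACENT = re.compile(
    "(" + ENCODED_WORD + r")[ \t\r\n]+(?=" + ENCODED_WORD + ")",
    re.IGNORECASE,
)

def join_adjacent(header):
    """Drop whitespace only where it separates two complete encoded-words."""
    return ADJACENT.sub(r"\1", header)

subject = "Subject: <?= =?> First=?utf-8?q?_line=7E?=\n =?us-ascii?q?Second l?=ine"
# The '<?= =?>' at the start is left alone; only the folded whitespace
# between the two encoded-words is removed.
print(join_adjacent(subject))
```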
In the case where more than one 'encoded-word' is present and
they do not all use the same charset, it also sets
$av_MULTICHARSETS to true, indicating that the message should be treated
as trash (similar to your '*$ ^[^:]+:.*${bq_regex}.*${bq_regex}', but it
explicitly checks each charset and stores it as $av_SUBJECT_CHARSET, usable
for additional spam tests).
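
The charset check can be pictured with a small Python sketch. It is only an approximation of the idea, not the procmail code: `EW`, `charsets_in` and `has_multiple_charsets` are hypothetical names, and the pattern again follows the loose regex from the log.

```python
import re

# Loose encoded-word pattern; group 1 captures the charset.
EW = re.compile(r"=\?([a-z][a-z0-9_-]+[a-z0-9])\?[bq]\?[^?]+\?=", re.IGNORECASE)

def charsets_in(header):
    """Return the charset of every encoded-word, in order, lower-cased."""
    return [m.group(1).lower() for m in EW.finditer(header)]

def has_multiple_charsets(header):
    """True if the encoded-words in the header do not all share one charset."""
    return len(set(charsets_in(header))) > 1

mixed = "<?= =?> First=?utf-8?q?_line=7E?= =?us-ascii?q?Second l?=ine"
print(charsets_in(mixed))
print(has_multiple_charsets(mixed))
```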
The basic steps are:
Assigning "av_SUBJECT=<?= =?> First=?utf-8?q?_line=7E?= =?us-ascii?q?Second l?=ine"
Assigning "INCLUDERC=/etc/procmailrcs/SoftlabsAV-dev/inc/av_bq-subj.inc"
Assigning "av_SUBJECT_BQMATCH==?utf-8?q?_line=7E?="
Assigning "av_SUBJECT_START=<?= =?> First"
Assigning "av_SUBJECT_END==?us-ascii?q?Second l?=ine"
Assigning "av_BQ_ENCODING=q"
Assigning "av_SUBJECT_CHARSET=utf-8"
Assigning "av_ENCODED_TEXT=_line=7E"
Assigning "snr_Return=_line~"
Assigning "bq_Return= line~"
Assigning "av_SUBJECT=<?= =?> First line~=?us-ascii?q?Second l?=ine"
Assigning "INCLUDERC=/etc/procmailrcs/SoftlabsAV-dev/inc/av_bq-subj.inc"
Assigning "av_SUBJECT_BQMATCH==?us-ascii?q?Second l?="
Assigning "av_SUBJECT_START=<?= =?> First line~"
Assigning "av_SUBJECT_END=ine"
Assigning "av_BQ_ENCODING=q"
Assigning "av_MULTICHARSETS=yes"
Assigning "av_SUBJECT_CHARSET=us-ascii"
Assigning "av_ENCODED_TEXT=Second l"
Assigning "bq_Return=Second l"
Assigning "av_SUBJECT=<?= =?> First line~Second line"
Assigning "INCLUDERC=/etc/procmailrcs/SoftlabsAV-dev/inc/av_bq-subj.inc"
No match on "()\/=\?([a-z][a-z0-9_-]+[a-z0-9])\?[bq]\?[^?]+\?="
Assigning "SWITCHRC"
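
The trace above suggests a loop: find the next 'encoded-word', split the subject into the part before it, the match and the rest, decode the match, reassemble, and repeat until nothing matches. The Python sketch below imitates that loop under some assumptions. It uses the standard `quopri` and `base64` modules instead of the bq/qpr/snr includes, it keeps a cursor rather than rescanning from the start, and all function and variable names are hypothetical.

```python
import base64
import quopri
import re

# Loose encoded-word pattern modelled on the regex in the log:
# group 1 = charset, group 2 = encoding (b or q), group 3 = encoded text.
EW = re.compile(r"=\?([a-z][a-z0-9_-]+[a-z0-9])\?([bq])\?([^?]+)\?=", re.IGNORECASE)

def decode_word(charset, encoding, text):
    raw = text.encode("ascii")
    if encoding.lower() == "q":
        # header=True also turns '_' into a space, as Q-encoding requires.
        data = quopri.decodestring(raw, header=True)
    else:
        data = base64.b64decode(raw)
    return data.decode(charset, errors="replace")

def deobfuscate(subject):
    pos = 0
    while True:
        m = EW.search(subject, pos)
        if m is None:
            return subject
        start, rest = subject[:m.start()], subject[m.end():]
        # Drop whitespace only if another encoded-word follows it directly.
        gap = re.match(r"[ \t\r\n]+", rest)
        if gap and EW.match(rest, gap.end()):
            rest = rest[gap.end():]
        decoded = decode_word(*m.groups())
        subject = start + decoded + rest
        pos = len(start) + len(decoded)  # continue after the decoded text

print(deobfuscate("<?= =?> First=?utf-8?q?_line=7E?= =?us-ascii?q?Second l?=ine"))
```

Moving the cursor past the decoded text means that decoded output which happens to look like an 'encoded-word' is not decoded a second time.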
It of course also uses 'inc/bq.inc' for decoding.
Of course, I also use your bq-decoding recipes, in a slightly modified
form.
To do: decode underscores to spaces (inside q-encoded strings).
In my inc/av_bq.inc, which is based on your bq, I am using your snr
after the qpr for that, like this:
# insert by roal, to replace all "_" by " ":
snr_Return = $qpr_Return
qpr_Return
snr_Search = '_'
:0
* $ snr_Return ?? [$snr_Search]
{
  snr_Replace = ' '
  snr_Search_Len = '1'
  snr_Replace_Len = '1'
  snr_Head
  VERBOSE = 'off'
  INCLUDERC = $av_INSTALLDIR/inc/av_snr.inc
  VERBOSE = $av_VERBOSE_SAVED
}
# end of insert
bq_Return = $snr_Return
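
One detail that may be worth checking is the order of the two steps. In RFC 2047 Q-encoding a literal underscore has to be written as =5F, because a bare "_" always stands for a space. If the =XX sequences are decoded first and the underscores replaced afterwards, a decoded =5F would also become a space. The Python sketch below compares both orders; the function names are hypothetical, and each =XX is mapped to a single character, which is only adequate for ASCII or Latin-1 text.

```python
import re

HEX = re.compile(r"=([0-9A-Fa-f]{2})")

def _unhex(text):
    # Map each =XX to one character (illustrative; fine for ASCII/Latin-1 only).
    return HEX.sub(lambda m: chr(int(m.group(1), 16)), text)

def q_decode_hex_first(text):
    """Decode =XX first, then turn every '_' into a space."""
    return _unhex(text).replace("_", " ")

def q_decode_underscore_first(text):
    """Turn '_' into a space first, then decode =XX."""
    return _unhex(text.replace("_", " "))

# Both orders agree on the example from the log.
print(repr(q_decode_hex_first("_line=7E")))
print(repr(q_decode_underscore_first("_line=7E")))

# They differ once an encoded literal underscore (=5F) appears.
print(repr(q_decode_hex_first("a=5Fb_c")))
print(repr(q_decode_underscore_first("a=5Fb_c")))
```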
This stuff will all be released with SoftlabsAV 0.8.3.
:-)
rob.