procmail

removing whitespace between adjacent 'encoded-word's (was: Re: How to avoid s/\n/ /g when unfolding a header)

2004-12-14 19:28:59
On Wed, 15 Dec 2004, 01:37 GMT+01 Ruud H.G. van Tol wrote:

When we tickled Robert Allerstorfer, this came out:

the solution of removing all newlines from a string with sed is

sed -e :a -e '$!N; s/\n//; ta'

tested and works :-)

Why not leave at least a single space where each \n was?
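
For illustration, a variant of the sed command above that leaves one space in place of each removed newline could look like the following. This is only a sketch: the function name join_lines is made up for this example, and it assumes a sed that accepts \n in the pattern of the s command (GNU sed does).

```shell
# Hypothetical helper: join all input lines into one line,
# putting a single space where each newline was.
join_lines() {
    # :a      label for the loop
    # $!N     unless on the last line, append the next line to the pattern space
    # s/\n/ / replace the newline that N just added with a space
    # ta      if a substitution happened, jump back to :a
    sed -e ':a' -e '$!N' -e 's/\n/ /' -e 'ta'
}

printf 'a\nb\nc\n' | join_lines   # prints: a b c
```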

That was only for "completeness", to show how s/\n//g (perl regex) can
be done with sed. Further down in the mail in question I also said that
I have completely dropped the idea of using sed for the original purpose
of deobfuscating a subject. The reason is the rule about whitespace
between 'encoded-word's, as defined in RFC 2047, which you pointed out.
From the examples section of RFC 2047 that you mentioned:

   encoded form                                displayed as
   ---------------------------------------------------------------------

   (=?ISO-8859-1?Q?a?= =?ISO-8859-1?Q?b?=)     (ab)

           White space between adjacent 'encoded-word's is not
           displayed.

With this rule in mind, the spaces that procmail inserts at the places
where the \n were, when capturing MATCH with
* $ ^subject:[$WS]+\/.+
are no longer a problem, as long as MATCH is then checked carefully for
whitespace between adjacent 'encoded-word's. That whitespace should
then be removed. But this does not seem to be easy to do with
procmail. I still have to think about how to convert
=?ISO-8859-1?Q?a?=  =?ISO-8859-1?Q?b?=  c =?ISO-8859-1?Q?d?=
to
=?ISO-8859-1?Q?a?==?ISO-8859-1?Q?b?=  c =?ISO-8859-1?Q?d?=
in order to deobfuscate it to
ab  c d
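
One possible way to do the first step of that conversion outside procmail is a small sed loop. The sketch below is untested against real-world mail. The function name strip_ew_ws is hypothetical, and the pattern is a simplified approximation of the RFC 2047 'encoded-word' syntax (=?charset?B-or-Q?text?=), not a full parser. It only removes the whitespace between adjacent 'encoded-word's and does not decode anything.

```shell
# Hypothetical helper: remove whitespace between adjacent encoded-words
# in the string given as the first argument.
strip_ew_ws() {
    # Simplified encoded-word pattern (an assumption, not the full RFC 2047
    # grammar): =? charset ? B|Q ? encoded-text ?=
    ew='=?[^?[:space:]]*?[BbQq]?[^?[:space:]]*?='
    # Loop instead of using the g flag: with three or more adjacent
    # encoded-words the matches overlap, so each pass joins one pair.
    printf '%s\n' "$1" |
        sed -e ':a' -e "s/\\($ew\\)[[:space:]][[:space:]]*\\($ew\\)/\\1\\2/" -e 'ta'
}

strip_ew_ws '=?ISO-8859-1?Q?a?=  =?ISO-8859-1?Q?b?=  c =?ISO-8859-1?Q?d?='
# prints: =?ISO-8859-1?Q?a?==?ISO-8859-1?Q?b?=  c =?ISO-8859-1?Q?d?=
```

Whitespace next to ordinary text, such as the spaces around "c" here, is left alone, because the pattern requires an 'encoded-word' on both sides.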


rob.


____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
