
Re: removing whitespace between adjacent 'encoded-word's

2004-12-18 19:57:18
When we tickled Robert Allerstorfer, this came out:
Ruud H.G. van Tol:

See http://www.professional.org/procmail/furrin.rc for several
charsets that will not match even your regex, like those with two
underscores in them.

But my regex matches all the W3C approved character Encoding names
(charsets), as listed on  http://validator.w3.org/detailed.html

OK, but it is you (and w3c) against the bad guys.


Suggestion:   av_CHARSET = '([a-z][a-z0-9_-]+[a-z0-9])'
(single quotes, because no variables have to be expanded)

Could you please give an example of a charset that this regex catches,
but that would not be caught by the regex I am currently using?

I already gave the example of ones with more than a single underscore.
See the furrin.rc for other examples.
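
To make the suggestion above easier to try out, here is a minimal sketch
in Python rather than procmail. It assumes Python's re module is close
enough to procmail's regex dialect for this particular pattern, and it
uses re.IGNORECASE to approximate procmail's default case-insensitive
matching. The helper name is_charset_name and the sample names are
illustrative only; they are not taken from furrin.rc.

```python
import re

# The suggested pattern, copied from the message. re.IGNORECASE is an
# assumption that approximates procmail's case-insensitive default.
AV_CHARSET = re.compile(r'([a-z][a-z0-9_-]+[a-z0-9])', re.IGNORECASE)


def is_charset_name(name):
    """Return True if the whole string matches the suggested pattern."""
    return AV_CHARSET.fullmatch(name) is not None


if __name__ == "__main__":
    # Hypothetical sample names, chosen for illustration only.
    for name in ("iso-8859-1", "UTF-8", "ks_c_5601-1987", "-bad-", "x"):
        print(name, is_charset_name(name))
```

Note that the pattern needs at least three characters, so a one- or
two-character name never matches, while a name with two underscores
such as ks_c_5601-1987 does.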


If you prefer single quotes, why don't you use
SPACE = ' '  instead of  SPACE = " " ?

That is because I haven't gone through that piece of code yet to
correct it. Using single quotes where possible will shave off some
CPU cycles, but I wouldn't overuse them either.


There are points where you only use ^ to anchor to the start of
a variable-value. Remember: ^ is any linefeed, ^^ is the start.

Yes, this was inconsistent. I have been using both ^^ and ^ to anchor
the start of a variable's value; I also cleaned this up for SoftlabsAV
0.8.3. However, this was not a real must, because the variable values
in question are all one-liners, so using ^^ or ^ at the beginning of
the regex for such variables has exactly the same effect.

But ^^ is more expressive for somebody who reads your code, and using
only ^ where ^^ is meant could bite you when code is reused.
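
The difference only shows up once a value spans more than one line. As
a rough analogy (Python's re module, not procmail's own matcher), ^ with
re.MULTILINE behaves loosely like procmail's ^, and \A loosely like a
leading ^^. The function names and sample values below are made up for
the illustration.

```python
import re


def matches_any_line_start(word, value):
    """Analogue of procmail's ^: may match after any linefeed."""
    return re.search(r'^' + re.escape(word), value, re.MULTILINE) is not None


def matches_true_start(word, value):
    """Analogue of a leading ^^: matches only at the very start."""
    return re.search(r'\A' + re.escape(word), value) is not None


if __name__ == "__main__":
    one_liner = "second line"
    two_lines = "first line\nsecond line"
    for value in (one_liner, two_lines):
        print(repr(value),
              matches_any_line_start("second", value),
              matches_true_start("second", value))
```

For the one-liner both checks agree, which is why the cleanup was not
strictly needed; for the two-line value only the line-start check
matches, which is the case that can bite when code is reused.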


[$ascii, tilde]
Did not observe any problems with the tilde. Would vote for shipping
your asc.inc with support for decoding the tilde, or at least
commenting on why you don't support it. Too bad that you can't remember
which side effects this could cause.

I have to rerun my tests to see what it was. Taking the tilde out is
ugly, so I wouldn't have done that if it didn't cause problems
somewhere. I hope it is not just the editor that I used at the time. ;)
(must have been pico)


BTW, your bq recipe rocks! In fact I haven't seen a procmail recipe so
far that is more sophisticated than your 'b64.inc.inc' and
'qpr.inc.inc'. Do you still plan to expand b64's capabilities from
decoding 4 characters at a time to doing it line by line, so it could
be a real replacement for mimencode? I remember you mentioned
something when you released your very first version.

It has been capable of doing that since the early days. Of course it
all starts with a group of 4 characters, because that is how base64
works. See b64_demo.rc and bq_demo.rc and (especially) bq_head.rc and
bq_wrap.rc
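
For readers who want to see why everything starts with a group of 4
characters: each base64 character carries 6 bits, so 4 characters give
24 bits, which is exactly 3 bytes. The Python sketch below shows that
arithmetic and how a whole line is just a run of such groups. It is an
illustration only, not how b64.inc is written; decode_quad and
decode_line are hypothetical names, and '=' padding is deliberately not
handled.

```python
B64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def decode_quad(quad):
    """Decode one unpadded group of 4 base64 characters into 3 bytes."""
    if len(quad) != 4:
        raise ValueError("a base64 group must be exactly 4 characters")
    bits = 0
    for ch in quad:
        # Each character contributes 6 bits; index() raises ValueError
        # for characters outside the alphabet (including '=').
        bits = (bits << 6) | B64_ALPHABET.index(ch)
    return bytes([(bits >> 16) & 0xFF, (bits >> 8) & 0xFF, bits & 0xFF])


def decode_line(line):
    """Decode a line whose length is a multiple of 4, group by group."""
    if len(line) % 4 != 0:
        raise ValueError("line length must be a multiple of 4")
    return b"".join(decode_quad(line[i:i + 4])
                    for i in range(0, len(line), 4))


if __name__ == "__main__":
    print(decode_quad("TWFu"))
    print(decode_line("TWFuTWFu"))
```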

Just in: rep_fast.inc. See rep_fast_demo.rc for just another demo.

-- 
Grtz, Ruud


____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
