procmail

Re: Simple (for you) regexp question

1998-11-16 13:50:53
On Mon, 16 Nov 1998 19:20:45 +0200 (EET), Kimmo Jaskari
<kimmo(_at_)alcom(_dot_)aland(_dot_)fi> wrote:
ended the same way. I just wondered how are you going to match a line
that can start with anything but end with something specific without
resorting to .*
What do you mean by "for some reason that is not the way to do it"?
Yes, generally speaking, .* is too prone to accidentally match
something you didn't intend it to, so if you want to match on
something more specific, you should spell it out. Is this what you are
referring to?
Yeah. I've seen numerous times here that using .* at the beginning of
an expression is a no-no, so that's why I started wondering how else

(So the proper way for this sentence to begin is "No." ;^)

I was supposed to do it. It works as is, I just started thinking that
perhaps it might be useful to know how to make it work more elegantly.

The lesson you want to learn is that .* at end of line, or beginning
of line, is implied anyway and basically just a slightly wasteful
no-op. The issue is not that it's doing something wrong, just that you
demonstrate that you don't understand how regular expressions work (to
put it bluntly).

The prototypical case where gurus cringe over hopeless newbie behavior
is something like

    grep '.*..*'

(or, worse yet, 

    grep '*.*'                  # yikes, horrors

but let's not even think about that :-) when you could really just as
well say 

    grep '.'

So the answer to "how do I match something anywhere on a line" is

    grep something

(and not grep '.*something.*' or grep '^.*something.*$', which are
syntactically correct but pretty much a royal waste of resources) and
now you're also prepared to learn that "something at beginning of
line" is '^something' and "something at end of line" is 'something$'.
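
If you want to convince yourself of the above, here is a minimal
sketch. The sample lines and the temporary file are made up for
illustration, and it assumes a POSIX shell with grep and mktemp
available.

```shell
# Hypothetical sample input, invented for this illustration
# (four non-empty lines and one empty line).
sample=$(mktemp)
printf '%s\n' \
    'something first' \
    'in the middle something here' \
    'ends with something' \
    'nothing relevant' \
    '' > "$sample"

# The bare pattern and the two padded variants select the same lines.
plain=$(grep 'something' "$sample")
padded=$(grep '.*something.*' "$sample")
anchored=$(grep '^.*something.*$' "$sample")
[ "$plain" = "$padded" ] && [ "$plain" = "$anchored" ] && echo "same lines"

# Likewise, '.*..*' selects nothing that a lone '.' would not:
# both count every line containing at least one character.
dot=$(grep -c '.' "$sample")
dotstar=$(grep -c '.*..*' "$sample")

# Anchors are what you use when position actually matters.
starts=$(grep -c '^something' "$sample")   # lines beginning with it
ends=$(grep -c 'something$' "$sample")     # lines ending with it
echo "$dot $dotstar $starts $ends"

rm -f "$sample"
```

The padded versions buy you nothing; the anchored ones are the only
variants here that change which lines get selected.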

Then we get to the intermediate level where J. Hapless Luser basically
knows how to wield regular expressions for even moderately complex
tasks, but doesn't contemplate what happens when you dump in a .*
where you really mean whitespace followed by three to eleven tokens
followed by whitespace. That's what I was referring to above.
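
As a rough sketch of the difference, assume the thing you really want
is "begin", whitespace, three to eleven whitespace-separated tokens,
then "end". The words "begin" and "end" and the sample lines are
hypothetical placeholders, and the strict version assumes a grep that
supports -E with interval expressions.

```shell
# Hypothetical sample input, invented for this illustration.
sample=$(mktemp)
printf '%s\n' \
    'begin a b c end' \
    'begin a end' \
    'beginend' \
    'begin one two three four end' > "$sample"

# The lazy version: .* happily accepts anything in between,
# including nothing at all.
loose=$(grep -c '^begin.*end$' "$sample")

# The spelled-out version: whitespace, then three to eleven tokens,
# each followed by whitespace, then "end".
strict=$(grep -Ec '^begin[[:space:]]+([^[:space:]]+[[:space:]]+){3,11}end$' "$sample")

echo "loose=$loose strict=$strict"
rm -f "$sample"
```

The loose pattern counts all four lines, while the strict one counts
only the two that really have at least three tokens in the middle.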

Of course, a lot of times, if it's a non-critical regular expression,
a lot of us are simply too lazy to construct a better regular
expression, and .* is good enough (until two months later you figure
out why that recipe never did work properly :-)

People who have read Friedl's "Hip Owls" book (highly recommended!)
tend to even have some sort of idea about the efficiency impact of a
regular expression with two redundant .*:s where none are necessary,
but if that's something you're concerned with, by all means read the
book. (Procmail's minimal matching behavior makes it slightly more
immune to this than typical greedy regex engines.)

/* era */

-- 
.obBotBait: It shouldn't even matter whether    <http://www.iki.fi/~era/>
I am a resident of the state of Washington. <http://members.xoom.com/procmail/>
