procmail

Re: Matching repeating lines?

1997-02-10 14:37:32
At 01:48 PM 2/10/97 -0500, Lars Kellogg-Stedman wrote:
:What's the most efficient way of filtering lines that are repeated more
:than <n> times?

If you're willing to settle for n=1, try the "uniq" command (but don't

That would work, except that the lines aren't identical -- each one has a 
timestamp.  Uniq won't prune these lines out of the message because 
they're not really identical.

For anyone following this thread:  a syslog entry looks something like:

Feb  9 10:09:53 hostname <some sort of message text>

I want to prune out lines with duplicate messages -- even if they have 
different timestamps.  I suppose something like this would be possible in 
perl, but I'm not at all familiar with perl.  Any hints?

I'm still not clear on what you have and what you want.  I'm going to use
awk, not perl, because I'm more familiar with it.

If I make the following assumptions:
        - each syslog entry is one line, and the text you want starts in
          column 17
        - timestamps can be ignored
        - lines are shorter than 10000 characters
        - you want to see how many times each message occurs, followed by
          the message itself

Then you can feed it to awk with the following program:
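        length($0) > 16 {               # ignore shorter lines
                # count each message, keyed on the text from column 17 on
                quantity[substr($0,17,9999)]++
                }
        END {
                # print each distinct message once, preceded by its count
                for (x in quantity) {
                        print quantity[x], x
                        }
                }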

This exploits awk's ability to use strings as array subscripts.
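
The same string-subscript trick can also drop the repeats outright instead of
counting them, which is closer to the original question.  What follows is only
a rough sketch under the assumptions listed above (16-character timestamp,
message text from column 17).  The function name prune_repeats, the variable
names max, seen and msg, and the sample log lines are all made up for
illustration; none of them come from this thread.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print a syslog line only the
# first MAX times its message text (column 17 onward) has been seen.
prune_repeats() {
    awk -v max="${1:-1}" '
        length($0) > 16 {                 # ignore shorter lines
            msg = substr($0, 17)          # text after the 16-char timestamp
            if (++seen[msg] <= max)       # per-message counter, keyed on text
                print                     # keep the line with its timestamp
        }'
}

# Made-up sample input, for illustration only
sample='Feb  9 10:09:53 hostname disk full
Feb  9 10:09:54 hostname disk full
Feb  9 10:10:01 hostname login ok
Feb  9 10:10:07 hostname disk full'

printf '%s\n' "$sample" | prune_repeats 1
```

With a limit of 1, this should print only the 10:09:53 and 10:10:01 lines;
the later "disk full" entries are dropped even though their timestamps differ.
Unlike the counting program, it prints lines as they arrive, so the original
order is kept.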
This is probably getting off topic for procmail, though.

Cheers,
Stan.
