I receive log information from a large number of computers.
Syslog on some machines is broken, and instead of adding a note to the
effect of "previous message repeated 271 times", it will include all 271
separate log messages.
I don't need to see all of these.
What's the most efficient way of filtering out lines that are repeated more
than <n> times? If I were trying to match specific text, this would be
trivial, but since I'm trying to match *any* repeated text I'm not sure how
to proceed.
-- Lars
This is obviously not a procmail question.
However I'll give you a hint in awk:
#! /usr/local/bin/gawk -f
## rmsyslogdupes.awk
## by James T. Dennis (jim(_at_)starshine(_dot_)org)
##
## remove consecutive duplicate messages in syslog files
{ line = $0                  # save the original line for printing
  # strip the date/time stamp (the first three fields) so we compare
  # only the part of the line past the date:
  $1 = ""; $2 = ""; $3 = ""
  # skip to the next line if it matches l (the last line we kept);
  # in any event remember the new l and print the original line:
  if (NR > 1 && $0 == l) { next }
  l = $0
  print line
}
This script is not tested -- just something off the cuff
-- however the principle should work.
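A variation on the same principle, equally off the cuff: rather than dropping
the duplicates outright, collapse each run into its first line plus a count,
which restores the "previous message repeated N times" note that the broken
syslogd omits. The sample log lines here are made up for illustration:

```shell
out=$(awk '
function flush() {
    if (first != "") {
        print first
        if (count > 1)
            printf "(previous message repeated %d times)\n", count - 1
    }
}
{
    line = $0
    # strip the date/time stamp (first three fields) before comparing:
    $1 = ""; $2 = ""; $3 = ""
    if ($0 == prev) { count++; next }   # same message as the last one
    flush()                             # emit the run that just ended
    prev = $0; first = line; count = 1  # start a new run
}
END { flush() }                         # emit the final run
' <<'EOF'
Jan  1 00:00:01 host sshd[1]: failed login
Jan  1 00:00:02 host sshd[1]: failed login
Jan  1 00:00:03 host sshd[1]: failed login
Jan  1 00:00:04 host kernel: something else
EOF
)
printf '%s\n' "$out"
```

For the simpler drop-the-dupes case, note that POSIX uniq can skip leading
fields, so something like `uniq -f 3` may do the whole job without awk --
field counting varies with your log format, so check it first.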
--
Jim Dennis,                                  info(_at_)mail(_dot_)starshine(_dot_)org
Proprietor,                            consulting(_at_)mail(_dot_)starshine(_dot_)org
Starshine Technical Services                          http://www.starshine.org