:What's the most efficient way of filtering lines that are repeated more
:than <n> times?
If you're willing to settle for n=1, try the "uniq" command (but don't
That would work, except that the lines aren't identical -- each one has a
timestamp. Uniq won't prune these lines out of the message because
they're not really identical.
Lars,
GNU uniq has some extensions. Not enough of them
(since I don't see one to specify the field separator),
but probably enough for this task.
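The extension I have in mind is -f (--skip-fields), which skips a number of leading whitespace-separated fields before comparing adjacent lines; there's no way to change the separator, but for whitespace-delimited syslog entries that's enough. A minimal sketch (the log lines are made up for illustration):

```shell
# Syslog-style lines: three timestamp fields, a host name, then the
# message.  'uniq -f 4' skips those first four whitespace-separated
# fields, so adjacent lines are compared on the message alone.
printf '%s\n' \
  'Jan  1 00:00:01 host1 sendmail: stat=Sent' \
  'Jan  1 00:00:02 host1 sendmail: stat=Sent' \
  'Jan  1 00:00:03 host1 kernel: eth0 up' \
  | uniq -f 4
```

Only the first of the two sendmail lines survives; the kernel line differs past field four, so it stays.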
(1) Grab the message portion of the entry
(2) Check to see if it's in TMPFILE
(3) If not, output the line and add the message to TMPFILE
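The three steps above can be sketched in awk, using an associative array in place of TMPFILE; I'm assuming syslog-style input where fields 1-4 are the timestamp and host name and the message starts at field 5:

```shell
#!/bin/sh
# For each line: (1) grab the message portion, (2) check whether we've
# seen it, (3) if not, print the line and remember the message.
# An awk array stands in for TMPFILE, so this also catches
# non-adjacent dupes.
awk '{
    msg = ""
    for (i = 5; i <= NF; i++)     # (1) message = fields 5..NF
        msg = msg " " $i
    if (!(msg in seen)) {         # (2) not recorded yet?
        print                     # (3) emit the line ...
        seen[msg] = 1             #     ... and record the message
    }
}' "$@"
```

With no file arguments it filters stdin, so it drops straight into a pipeline.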
Are you trying to remove adjacent dupes or non-adjacent
dupes?
We both define "dupes" the same way -- the portion of the
line past the date and host name.
My script handles adjacent dupes. I'd feed its input
through 'sort +4' (using the skip-fields
option of that utility -- which is zero-based!).
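For reference, the historic 'sort +4' syntax (skip four fields, counted from zero) is written 'sort -k5' on current systems: start the sort key at field 5, the message portion past the date and host name. Sorting groups identical messages together so an adjacent-dupe filter can prune them; /var/log/messages here is just a stand-in for whatever log you're filtering:

```shell
# Group lines by message (fields 5 onward), then drop adjacent dupes
# that agree past the first four fields.  'sort -k5' is the POSIX
# spelling of the old zero-based 'sort +4'.
sort -k5 /var/log/messages | uniq -f 4
```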
Personally I think this approach is flawed -- since
you may miss patterns of errors that occur at longer
intervals.
For a more comprehensive approach to log file filtering
you might want to look at swatch (available at Linux
FTP sites everywhere -- it should be portable to other
systems since it's written in perl (if I recall
correctly). I have to admit that I haven't used
swatch personally, since I've written my own simple
awk script and developed my own filters for it.
I really think we should take any further discussion
of this topic off line, however. I'd like to thank
the rest of the list for their indulgence and patience
in tolerating this off-topic thread up until this point.
--
Jim Dennis,
info(_at_)mail(_dot_)starshine(_dot_)org
Proprietor,
consulting(_at_)mail(_dot_)starshine(_dot_)org
Starshine Technical Services http://www.starshine.org