Re: Matching repeating lines?

Is there an easy way to identify syslog mailings?  Can you then sort -u
on a field (such as the message text?) that will be alike on all the
lines that are duplicates, ignoring the timestamp?  Something like


The syslog mailings are easy to identify -- they're already sorted into
their own mail folders.  On the other hand, 'sort -u' won't work because 
I don't want to rearrange the *order* of lines in the file (otherwise, I 
would have just used sort -u and not bothered anybody :-).

I think we need more detail as to exactly what these lines look like, what


The text of the syslog message is variable.  The only part of the line 
with a fixed format is the initial timestamp and hostname, from my 
original example:

Feb  9 10:09:53 hostname <some sort of message text>

The remainder of the text is free format (but it is guaranteed to be on 
the same line).

else can occur in the same message, and what part has to be identical
(everything except the timestamp, including the hostname?) for you to want
them grouped together.


Yes.  (That is, consider them identical if everything but the timestamp 
and hostname matches)
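
To make that rule concrete, here is a minimal sketch in Python (not the sed
script under discussion). It assumes every line has the layout shown in my
example: month, day, time, hostname, then free text. The names message_text
and same_message are made up for illustration.

```python
def message_text(line):
    """Return the free-format part of a syslog line.

    Assumes the layout "Mon DD HH:MM:SS hostname text...". Splitting on
    runs of whitespace copes with the padded day in "Feb  9".
    """
    stripped = line.rstrip("\n")
    parts = stripped.split(None, 4)
    # Lines that do not have all five fields are compared as a whole.
    return parts[4] if len(parts) == 5 else stripped


def same_message(a, b):
    """True if two lines match once timestamp and hostname are ignored."""
    return message_text(a) == message_text(b)
```

With this, two lines that differ only in timestamp and hostname compare as
equal, and anything else in the text makes them different.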

Also, what if this happens:

  198 lines that should be grouped together
  1 different line
  73 more lines like the first 198

Would you want the seventy-three combined with the 198 to make 271 or kept
separate?


It would probably make sense to keep them separate.  In an ideal world, 
there would be a trivial way to check the timestamp and see how far apart 
they were chronologically, and decide whether or not to combine them.  
However, there are certain things I think are best not handled by a mail 
filter, and that would probably be one of them.
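
For what it's worth, the timestamp check might look roughly like the Python
sketch below. It is only an illustration under several assumptions: the
five-minute limit (max_gap) is arbitrary, the names parse and collapse_runs
are made up, every line is assumed to be well formed, and since syslog
timestamps carry no year a placeholder year is supplied. Only adjacent
repeats are dropped, so the order of the remaining lines is unchanged and an
intervening different line starts a new group.

```python
from datetime import datetime, timedelta


def parse(line):
    """Split a line into (timestamp, message text).

    Assumes "Mon DD HH:MM:SS hostname text..."; a line with fewer fields
    raises ValueError. The year 2000 is a placeholder (an assumption),
    because the log line itself has no year.
    """
    month, day, clock, _host, text = line.rstrip("\n").split(None, 4)
    stamp = datetime.strptime(f"2000 {month} {day} {clock}",
                              "%Y %b %d %H:%M:%S")
    return stamp, text


def collapse_runs(lines, max_gap=timedelta(minutes=5)):
    """Keep the first line of each run of adjacent repeats.

    A line counts as a repeat only if its text equals the previous line's
    text and the two timestamps are no more than max_gap apart.
    """
    kept = []
    prev_text = prev_stamp = None
    for line in lines:
        stamp, text = parse(line)
        repeat = text == prev_text and abs(stamp - prev_stamp) <= max_gap
        if not repeat:
            kept.append(line)
        prev_text, prev_stamp = text, stamp
    return kept
```

Note that each line is compared with the one immediately before it, so a
long, steady stream of repeats still collapses to a single line.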

It sounds as if the sed script would work just fine; I'll try it out on 
some samples and see what happens.  Incidentally, I don't need a count :-).

-- Lars

---
Lars Kellogg-Stedman * lars(_at_)bu(_dot_)edu * (617)353-8277
Office of Information Technology, Boston University