Is there an easy way to identify syslog mailings? Can you then sort -u
on a field (such as the message text?) that will be alike on all the
lines that are duplicates, ignoring the timestamp? Something like
The syslog mailings are easy to identify -- they're already sorted into
their own mail folders. On the other hand, 'sort -u' won't work because
I don't want to rearrange the *order* of lines in the file (otherwise, I
would have just used sort -u and not bothered anybody :-).
I think we need more detail as to exactly what these lines look like, what
The text of the syslog message is variable. The only part of the line
with a fixed format is the initial timestamp and hostname, from my
original example:
Feb 9 10:09:53 hostname <some sort of message text>
The remainder of the text is free format (but it is guaranteed to be on
the same line).
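For illustration only, here is a rough Python sketch of splitting such a line into its fixed and free-format parts. The regular expression and the names `SYSLOG_LINE` and `split_syslog_line` are hypothetical. The pattern assumes the classic "Mon D HH:MM:SS hostname text" layout shown in the example above and nothing more.

```python
import re

# Assumed layout (taken from the example line above):
#   "Feb 9 10:09:53 hostname <free-form message text>"
SYSLOG_LINE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"  # timestamp
    r"(?P<host>\S+)\s+"                                            # hostname
    r"(?P<text>.*)$"                                               # the rest
)

def split_syslog_line(line):
    """Return (stamp, host, text), or None if the line does not fit the layout."""
    m = SYSLOG_LINE.match(line.rstrip("\n"))
    if m is None:
        return None
    return m.group("stamp"), m.group("host"), m.group("text")
```

Lines that do not fit the assumed layout come back as `None`, so a caller can pass them through untouched.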
else can occur in the same message, and what part has to be identical (every-
thing except the timestamp, including the hostname?) for you to want them
grouped together.
Yes. (That is, consider them identical if everything but the timestamp
and hostname matches)
Also, what if this happens:
198 lines that should be grouped together
1 different line
73 more lines like the first 198
Would you want the seventy-three combined with the 198 to make 271 or kept
separate?
It would probably make sense to keep them separate. In an ideal world,
there would be a trivial way to check the timestamp and see how far apart
they were chronologically, and decide whether or not to combine them.
However, there are certain things I think are best not handled by a mail
filter, and that would probably be one of them.
It sounds as if the sed script would work just fine; I'll try it out on
some samples and see what happens. Incidentally, I don't need a count :-).
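For comparison, the same behaviour can be sketched outside of sed. This is not the sed script mentioned above, just a hypothetical Python equivalent under the assumptions stated in this thread: the first four whitespace-separated fields are the timestamp (month, day, time) and the hostname, two lines count as duplicates when everything after those fields matches, only adjacent duplicates are collapsed (so a later run stays separate), the original order is kept, and no count is produced. The names `message_key` and `collapse_runs` are made up for the example.

```python
from itertools import groupby

def message_key(line):
    """Comparison key: the line minus its timestamp and hostname fields."""
    # Fields 0-2 are assumed to be month, day and time; field 3 the hostname.
    parts = line.split(None, 4)
    if len(parts) == 5:
        return parts[4].rstrip("\n")
    # Lines that do not fit the assumed layout are compared as a whole.
    return line.rstrip("\n")

def collapse_runs(lines):
    """Yield the first line of each run of adjacent lines with equal keys."""
    for _, run in groupby(lines, key=message_key):
        yield next(run)
```

Because `groupby` only groups neighbouring items, 198 matching lines, one different line, and 73 more matching lines come out as three lines, in their original order. Feeding it a mail folder would look something like `sys.stdout.writelines(collapse_runs(sys.stdin))`.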
-- Lars
---
Lars Kellogg-Stedman * lars(_at_)bu(_dot_)edu * (617)353-8277
Office of Information Technology, Boston University