
Re: Subscriber List Damage

2008-06-30 19:30:48
Unfortunately I really don't have anything but assumptions about what happened at this point.

An increase in TMDA activity caused the system to run out of resources. We saw an extremely high load average and a sudden decrease in kernel buffer memory. Processes started failing to fork. Our engineers connected and rebooted the server.

Upon reboot, we quickly discovered that the IETF list, which we moderate, was no longer letting us in with the normal list password. A quick check showed that the database file config.pck was now only about 50% of its original size, and that many subscribers had been removed and others added. Passwords and other settings had been "reverted" to pre-cutover values. A comparison with our recent backup of the file showed massive differences: not just removals, but additions. We hypothesize that Mailman fell back to some type of cached copy of the database from 2005, which was also in the directory, and recreated the data from that.
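The two checks described above (file size against the backup, and subscribers removed or added) could be scripted roughly as follows. This is only a sketch under stated assumptions: the function names and the 0.75 threshold are hypothetical, the subscriber addresses are assumed to have already been exported to plain lists by some other means, and nothing here reads Mailman's config.pck format or calls any Mailman API.

```python
import os


def size_ratio(current_path, backup_path):
    """Return the current file's size divided by the backup's size."""
    backup_size = os.path.getsize(backup_path)
    if backup_size == 0:
        raise ValueError("backup file is empty; nothing to compare against")
    return os.path.getsize(current_path) / backup_size


def looks_truncated(current_path, backup_path, threshold=0.75):
    """Flag a file that has shrunk below `threshold` of its backup size.

    The threshold is an arbitrary, hypothetical choice for illustration.
    """
    return size_ratio(current_path, backup_path) < threshold


def membership_diff(current_members, backup_members):
    """Compare two iterables of addresses, ignoring case and blank entries.

    Returns (removed, added): addresses only in the backup, and addresses
    only in the current list, each sorted.
    """
    current = {a.strip().lower() for a in current_members if a.strip()}
    backup = {a.strip().lower() for a in backup_members if a.strip()}
    return sorted(backup - current), sorted(current - backup)
```

A file that is half the size of its backup, as reported here, would be flagged by looks_truncated, and membership_diff would list both the removals and the unexpected additions.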

Unfortunately, we have no way to verify what happened, and certainly don't want to try to cause it again.

I believe our solution is going to be to remove TMDA completely, which should eliminate side effects such as this one.

I'm happy to discuss this further; however, I do not want to pollute the various lists with this. I'm cc'ing the lists in this case so everyone understands we are responding to emails, but I'm going to take any further threads off-list to keep the lists focused on their primary purposes.

Glen

Eric Rescorla wrote:
At Mon, 30 Jun 2008 15:48:10 -0700,
Michael Thomas wrote:
1) Have you brought this up with the mailman folks? I've interacted with
them and they seem like a responsive set of folks. I'm sure that this sort
of thing would horrify them.

I agree that this is horrifying.

More importantly, doesn't this mean that this is a problem we actually
need a solution for pronto? As I understand Glen's message, he's
saying that this is a bug in mailman triggered by some problem in
TMDA. I realize that TMDA is being replaced, but presumably Henrik's
code isn't perfect, so don't we have to worry about it triggering the
same behavior?

Glen, I'm sure there are some people on this list who understand
mailman well. I realize you may not have complete info, but if you can
provide us some more information--e.g., what file(s) got stomped and
which code you think stomped it--about what you think happened, maybe
they can help track it down?

-Ekr


2) 3 years since the last backup? Oi.

       Mike

Glen wrote:
All -

I was asked by the IAOC to post a message to the IETF and SIP lists, to ensure that people were aware that the subscriber lists for the IETF and SIP lists were damaged as a result of an anomaly in TMDA and Mailman that occurred Thursday night.

Basically, TMDA misbehaved, and, in the process, caused Mailman to encounter a transient failure in the reading of its databases for these two lists. As a result, rather than simply holding the mail and retrying it, Mailman decided to discard the current list databases and re-create them from 3-year-old data, for both the IETF and the SIP lists.

*sigh*

No email was lost to the system or the archives; however, some people may have missed some messages, or may still not be resubscribed to the list.

Of course we restored the files from backups; however, we want to make sure that everyone gets the mail they missed, and that everyone is subscribed to these lists who wishes to be subscribed.

So...

If you're reading this message in your email box, you're subscribed to the list identified in the subject line, and all should be okay.

If you're reading this message in the archives, wondering why you're not getting list mail, please take a moment to resubscribe yourself to the list, which should resolve your problem.

And regardless, if you feel you missed any mail, we do have the archives available for your reference.

IETF List Subscription Link:  https://www.ietf.org/mailman/listinfo/ietf
IETF List Archive Link:  http://www.ietf.org/mail-archive/web/ietf/

SIP List Subscription Link:  https://www.ietf.org/mailman/listinfo/sip
SIP List Archive Link:  http://www.ietf.org/mail-archive/web/sip/

We are in the home stretch of getting TMDA removed and replaced on the servers, and I apologize for any inconvenience caused by this issue. Because server problems apparently happen only in the dead of night, you can be sure that we feel any and all pain anyone may be experiencing.

If you need any assistance, please contact the IETF Secretariat, using the links at: http://www.ietf.org/secretariat/

Thank you,
Glen Barney
IT Director
AMS (IETF Secretariat)
_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www.ietf.org/mailman/listinfo/ietf
