Robin Dunn
2009-05-05 00:16:20 UTC
Hi all,
We've once again had a CPU load spike that caused the hosting provider
to shut down the VPS, and once again it appears to be related to the
mail server. I've put what I think are reasonable limits on the number
of processes that postfix can spawn, and then dropped the count by a few
more. (The defaults were 100, and I've had them set at 25, 15, then 10,
and now 8.) The spikes are less frequent than they were before I
started investigating this, but they can still happen. The problem
seems to be that managing the mail queues can be fairly disk
intensive, so the system can get into a state where more than one
process is waiting on the disk, and that significantly raises the load
average even though the processes are using almost no user CPU.
The htop tool will report several processes in the 'D'
state (uninterruptible sleep) with less than 1% CPU, but the total system
or kernel CPU utilization is very high. Also, when other non-mail
processes like apache need to access the disk at the time of the spike,
they have to wait for the mail processes to finish their disk access, so
they go into the 'D' state too.
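For anyone who wants to watch for this condition without keeping htop open, here is a minimal sketch that counts processes by state. It assumes a Linux /proc filesystem, and the function names are mine, not part of any existing tool. On Linux the load average includes processes in uninterruptible sleep as well as runnable ones, which is why a pile of 'D' processes raises it.

```python
import os
from collections import Counter


def parse_state(stat_text):
    """Return the one-letter process state from the text of /proc/<pid>/stat."""
    # The command name sits in parentheses and may contain spaces or ')',
    # so take everything after the LAST ')' before splitting on whitespace.
    _, _, rest = stat_text.rpartition(")")
    fields = rest.split()
    return fields[0] if fields else "?"


def count_states(stat_texts):
    """Count how many processes are in each state (R, S, D, Z, ...)."""
    return Counter(parse_state(text) for text in stat_texts)


def read_proc_stats(proc_root="/proc"):
    """Read the stat file of every numeric (per-process) directory in /proc."""
    texts = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, name, "stat")) as fh:
                texts.append(fh.read())
        except OSError:
            continue  # the process exited while we were scanning
    return texts


if __name__ == "__main__" and os.path.isdir("/proc"):
    counts = count_states(read_proc_stats())
    print("processes in uninterruptible sleep (D):", counts.get("D", 0))
```

Run from cron every minute or so, the 'D' count could be logged next to the load average to confirm whether the spikes really line up with disk waits.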
Anyway, I'm contemplating moving the mail lists to Google Groups and am
seeking opinions about doing this. One of the main reasons I am
considering Google Groups over other possible solutions is that Google
is at least partially to blame for our spikes. Since about half of our
subscribed addresses are @gmail.com, when Gmail gets bogged down and
starts deferring delivery of the messages, the number of
messages waiting in our active queue skyrockets. Here is a current
snapshot of the top of the active queue:
                       T     5    10    20    40    80   160   320   640  1280 1280+
TOTAL               5271   170   702  1068  1071   576  1467    40    16     9   152
gmail.com           2682    94   339   533   469   331   916     0     0     0     0
softwaremagic.net     86     0     0     0     1     0     0     2     3     5    75
yahoo.com             82     3    13    18     9     5    34     0     0     0     0
2p.pl                 63     0     0     2     0     0     0     4     2     2    53
googlemail.com        62     4     7    15     9     5    22     0     0     0     0
hotmail.com           57     1    12    13     4     3    24     0     0     0     0
gmx.de                40     0     8     8    17     0     7     0     0     0     0
gmx.net               32     1     8     9     5     9     0     0     0     0     0
163.com               23     0     1     1    10     9     2     0     0     0     0
free.fr               19     3     4     7     5     0     0     0     0     0     0
xs4all.nl             13     1     2     3     5     0     2     0     0     0     0
mac.com               12     1     1     2     3     2     3     0     0     0     0
sauco.org             12     0     1     1     0     0     1     8     1     0     0
worldnet.att.net      12     0     1     1     0     0     1     8     1     0     0
cox.net               10     1     2     3     1     3     0     0     0     0     0
yahoo.it              10     0     4     4     1     0     1     0     0     0     0
ymail.com             10     0     2     2     1     1     4     0     0     0     0
t-online.de           10     0     1     1     5     0     3     0     0     0     0
web.de                 9     0     3     3     2     0     1     0     0     0     0
The columns represent the "message bucket" the messages are stored in
where the numbers on the top row are the number of minutes that the
messages in that bucket have been undeliverable. (The first column of
numbers is the totals.) So it should be obvious that at this moment a
very disproportionate share of the problem deliveries is headed to Gmail,
and it has been that way for about 3 hours. At least a few of the other
times that we have had load spikes the queue shape has looked similar.
Normally the active queue will only have between zero and a few dozen
pending message deliveries and the qshape status shown above would be
mostly empty.
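To put a number on "disproportionate", here is a rough sketch that parses a qshape-style snapshot and computes each domain's share of the total. The parsing helper and its name are my own, and it assumes the whitespace-separated layout shown above; only a few rows of the snapshot are repeated to keep it short.

```python
SNAPSHOT = """\
T 5 10 20 40 80 160 320 640 1280 1280+
TOTAL 5271 170 702 1068 1071 576 1467 40 16 9 152
gmail.com 2682 94 339 533 469 331 916 0 0 0 0
yahoo.com 82 3 13 18 9 5 34 0 0 0 0
googlemail.com 62 4 7 15 9 5 22 0 0 0 0
"""


def parse_qshape(text):
    """Return (header, rows); rows maps a domain to its list of integer counts."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header = lines[0]  # 'T' followed by the age buckets in minutes
    rows = {fields[0]: [int(x) for x in fields[1:]] for fields in lines[1:]}
    return header, rows


def share_of_total(rows, domain):
    """Fraction of all queued messages that are addressed to this domain."""
    return rows[domain][0] / rows["TOTAL"][0]


if __name__ == "__main__":
    header, rows = parse_qshape(SNAPSHOT)
    for domain in ("gmail.com", "googlemail.com", "yahoo.com"):
        print(f"{domain:16s} {share_of_total(rows, domain):6.1%}")
```

With the figures in this snapshot, gmail.com alone accounts for 2682 of the 5271 queued messages, a little over half.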
So anyway, I'm thinking that moving the lists to Google will likely at
least result in better delivery to the Gmail addresses, and they should
have the resources to provide good delivery to the non-Gmail
addresses too. It also adds the ability to read and respond to messages
via just the web browser, without actual mail messages being delivered to
the inbox (making it more like forums), for people that like that approach.
Now the downside: There doesn't seem to be a supported way to import
list archives into a Google Group. Some searching turns up some hacks
to do it, involving subscribing an address to both the current list and
the Google Group and then requesting a resend of messages to that
address. It has to be done carefully, though, since Google will block
message floods. Since we have a few hundred thousand messages in our
archives, this is not something I want to spend a lot of time on or have
to babysit. Does anybody have any experience with this, or know of an
automated way to do it?
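If it does come down to replaying the archive, the pacing part at least is easy to automate. The following is only a sketch under assumptions: the archive is available as an mbox file, `send` is a hypothetical callable you would supply to do the actual delivery, and the default rate of 20 per minute is a placeholder guess, not a limit that Google documents.

```python
import mailbox
import time


def paced(items, per_minute, sleep=time.sleep):
    """Yield items one by one, pausing 60/per_minute seconds between them."""
    delay = 60.0 / per_minute
    for index, item in enumerate(items):
        if index:
            sleep(delay)  # no pause before the very first item
        yield item


def replay_archive(mbox_path, send, per_minute=20):
    """Pass each archived message to send() at a limited rate; return the count.

    `send` is a caller-supplied function (hypothetical here), for example
    one that re-injects the message through the local mail server.
    """
    count = 0
    for message in paced(mailbox.mbox(mbox_path), per_minute):
        send(message)
        count += 1
    return count
```

At 20 messages per minute, a few hundred thousand messages would take well over a week, so whatever rate turns out to be tolerated, this would have to run unattended and be able to resume where it left off.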
On the other hand, we do have good third-party archives already available
(gmane, nabble, etc.) with at least the recent history in them, so it
would be feasible to just decide not to bother importing the archives into
Google Groups.
Thoughts? Comments? Opinions? Alternatives? I would also be interested
in hearing from anybody who has managed multiple medium to high traffic
lists at Google Groups about what you like or dislike, and whether you are
happy that you chose to use GG.
--
Robin Dunn
Software Craftsman
http://wxPython.org Java give you jitters? Relax with wxPython!