RFC: Migrating to Google Groups?
Robin Dunn
2009-05-05 00:16:20 UTC
Hi all,

We've once again had a CPU load spike that caused the hosting provider
to shut down the VPS, and once again it appears to be related to the
mail server. I've put what I think are reasonable limits on the number
of processes that postfix can spawn, and then dropped the count by a few
more. (The defaults were 100, and I've had them set at 25, 15, then 10,
and now 8.) The spikes are less frequent than they were before I
started investigating this, but they can still happen. The problem
seems to be that managing the mail queues can be fairly disk intensive,
so the system can get into a state where more than one process is
waiting on the disk. That significantly raises the load average even
though no real amount of user CPU is being used by the processes. The
htop tool will report several processes in the 'D' state
(uninterruptible sleep) with less than 1% CPU, but the total system or
kernel CPU utilization is very high. Also, when other non-mail
processes like apache need to access the disk at the time of the spike
they have to wait for the mail processes to finish their disk access, so
they go into the 'D' state too.
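
A minimal sketch of how the 'D' state processes could be listed without
htop, assuming a Linux /proc filesystem. The function names parse_stat
and blocked_processes are made up for this example. On Linux, tasks in
uninterruptible sleep count toward the load average, which fits the
pattern of a high load with very little user CPU.

```python
import glob
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    head, _, tail = stat_text.rpartition(")")
    pid, _, comm = head.partition(" (")
    state = tail.split()[0]
    return int(pid), comm, state


def blocked_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in the 'D' state."""
    found = []
    for path in glob.glob(os.path.join(proc_root, "[0-9]*", "stat")):
        try:
            with open(path) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we were looking at it
        if state == "D":
            found.append((pid, comm))
    return sorted(found)


if __name__ == "__main__":
    for pid, comm in blocked_processes():
        print(pid, comm)
```

Running this in a loop during a spike would show whether the blocked
processes are mostly mail daemons or whether apache is stuck behind them.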

Anyway, I'm contemplating moving the mail lists to Google Groups and am
seeking opinions about doing this. One of the main reasons I am
considering Google Groups over other possible solutions is that Google
is at least partially to blame for our spikes. About half of our
subscribed addresses are @gmail.com, so when Gmail gets bogged down and
starts deferring delivery of the messages, the number of messages
waiting in our active queue skyrockets. Here is a current snapshot of
the top of the active queue:

                      T   5  10   20   40  80  160 320 640 1280 1280+
             TOTAL 5271 170 702 1068 1071 576 1467  40  16    9   152
         gmail.com 2682  94 339  533  469 331  916   0   0    0     0
 softwaremagic.net   86   0   0    0    1   0    0   2   3    5    75
         yahoo.com   82   3  13   18    9   5   34   0   0    0     0
             2p.pl   63   0   0    2    0   0    0   4   2    2    53
    googlemail.com   62   4   7   15    9   5   22   0   0    0     0
       hotmail.com   57   1  12   13    4   3   24   0   0    0     0
            gmx.de   40   0   8    8   17   0    7   0   0    0     0
           gmx.net   32   1   8    9    5   9    0   0   0    0     0
           163.com   23   0   1    1   10   9    2   0   0    0     0
           free.fr   19   3   4    7    5   0    0   0   0    0     0
         xs4all.nl   13   1   2    3    5   0    2   0   0    0     0
           mac.com   12   1   1    2    3   2    3   0   0    0     0
         sauco.org   12   0   1    1    0   0    1   8   1    0     0
  worldnet.att.net   12   0   1    1    0   0    1   8   1    0     0
           cox.net   10   1   2    3    1   3    0   0   0    0     0
          yahoo.it   10   0   4    4    1   0    1   0   0    0     0
         ymail.com   10   0   2    2    1   1    4   0   0    0     0
       t-online.de   10   0   1    1    5   0    3   0   0    0     0
            web.de    9   0   3    3    2   0    1   0   0    0     0

The columns represent the "message bucket" the messages are stored in
where the numbers on the top row are the number of minutes that the
messages in that bucket have been undeliverable. (The first column of
numbers is the totals.) So it should be obvious that at this moment a
very disproportionate number of problem deliveries are headed to gmail,
and it has been that way for about 3 hours. At least a few of the other
times that we have had load spikes the queue shape has looked similar.
Normally the active queue will only have between zero and a few dozen
pending message deliveries and the qshape status shown above would be
mostly empty.
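
To put a number on "disproportionate", here is a rough Python sketch
that parses qshape-style text and computes one domain's share of the
queue. The helper names parse_qshape and queue_share are made up for
this example, and the sample holds only the first four rows of the
snapshot above.

```python
SAMPLE = """\
                      T   5  10   20   40  80  160 320 640 1280 1280+
             TOTAL 5271 170 702 1068 1071 576 1467  40  16    9   152
         gmail.com 2682  94 339  533  469 331  916   0   0    0     0
 softwaremagic.net   86   0   0    0    1   0    0   2   3    5    75
"""


def parse_qshape(text):
    """Map each row label (TOTAL or a domain) to its list of integer counts."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] == "T":
            continue  # blank line, or the header row of bucket ages
        rows[fields[0]] = [int(f) for f in fields[1:]]
    return rows


def queue_share(rows, domain):
    """Fraction of all queued entries that belong to the given domain."""
    return rows[domain][0] / rows["TOTAL"][0]


if __name__ == "__main__":
    rows = parse_qshape(SAMPLE)
    print("gmail.com share: %.1f%%" % (100 * queue_share(rows, "gmail.com")))
```

With the snapshot above, gmail.com accounts for 2682 of the 5271 queued
entries, or roughly 51% of the active queue.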

So anyway, I'm thinking that moving the lists to Google will likely at
least result in better delivery to the gmail addresses, and they should
have the resources to manage providing good delivery to the non gmail
addresses too. It also adds the ability to read and respond to messages
via just the web browser without actual mail messages being delivered to
the inbox, (making it more like forums) for people that like that approach.

Now the downside: There doesn't seem to be a supported way to import
list archives into a google group. Some searching turns up some hacks
to do it, involving subscribing an address to both the current list and
the google group and then requesting a resend of messages to the
address. It has to be done carefully though since Google will block
message floods. Since we have a few hundred thousand messages in our
archives this is not something I want to spend a lot of time on, or have
to babysit. Anybody have any experience with this, or know of an
automated way to do it?

On the other hand, we do have good 3rd party archives already available
(gmane, nabble, etc.) with at least the recent history in them, so it
would be feasible to just decide to not bother importing them into the
Google groups.

Thoughts? Comments? Opinions? Alternatives? I would also be interested
in hearing from anybody who has managed multiple medium to high traffic
lists at google groups about what you like or dislike, and if you are
happy that you chose to use GG.
--
Robin Dunn
Software Craftsman
http://wxPython.org Java give you jitters? Relax with wxPython!
Vadim Zeitlin
2009-05-05 09:00:12 UTC
On Mon, 04 May 2009 17:16:20 -0700 Robin Dunn <***@alldunn.com> wrote:

RD> We've once again had a CPU load spike that caused the hosting provider
RD> to shut down the VPS, and once again it appears to be related to the
RD> mail server. I've put what I think are reasonable limits on the number
RD> of processes that postfix can spawn, and then dropped the count by a few
RD> more. (The defaults were 100, and I've had them set at 25, 15, then 10,
RD> and now 8.)

I wonder if switching to Exim could help. I've never had any shadow of
such problems with it, although to be fair I never used it with that many
messages either. OTOH, in spite of my dislike for Postfix, I have to admit
that it really looks more like a Linux kernel problem than a Postfix one:
the kernel really should be able to handle disk accesses better (I don't
know what architecture the server uses but under amd64 it's
catastrophically bad).

RD> Anyway, I'm contemplating moving the mail lists to Google Groups and am
RD> seeking opinions about doing this.

What are the alternatives? It's clear that we can't continue to live with
these kinds of problems all the time so we need to do something. And I don't
think anybody here can host the mailing lists (I have some spare hardware
but surely not enough bandwidth for them). So we need to host them
somewhere and Google is surely the best one among the free alternatives
(or does anybody seriously consider putting the lists on MSN?). As for paid
hosting, I don't know how good it is but your experience with the current
server seems rather dissuasive to me. OTOH maybe it's simply not the
best one... I've heard a lot of good things about DreamHost and they do
claim to provide "unlimited" mailing lists using mailman for $6/month. Has
anybody here had any experience with them?

RD> One of the main reasons I am considering Google Groups over other
RD> possible solutions is because they are at least partially to blame for
RD> our spikes.

This is weird, I've never had such huge delays with gmail addresses.
Again, I don't have nearly the same amount of traffic, of course, so maybe
it starts refusing messages after some threshold.

RD> Here is a current snapshot of the top of the active queue:
RD>
RD>                       T   5  10   20   40  80  160 320 640 1280 1280+
RD>              TOTAL 5271 170 702 1068 1071 576 1467  40  16    9   152
RD>          gmail.com 2682  94 339  533  469 331  916   0   0    0     0
RD> softwaremagic.net   86   0   0    0    1   0    0   2   3    5    75
...

Whoa, where do these thousands of messages come from? I didn't receive
more than a couple of dozen wx messages yesterday, are there any high
volume wx mailing lists I'm not subscribed to?

RD> So anyway, I'm thinking that moving the lists to Google will likely at
RD> least result in better delivery to the gmail addresses, and they should
RD> have the resources to manage providing good delivery to the non gmail
RD> addresses too. It also adds the ability to read and respond to messages
RD> via just the web browser without actual mail messages being delivered to
RD> the inbox, (making it more like forums) for people that like that approach.

Notice that you need a Google account to use these features, however
(although you can subscribe to the list normally without it). Probably not a
real problem in practice, but I was rather surprised that I couldn't
disable email delivery for a mailing list I was subscribed to under a
non-gmail address. (I finally could do it by adding this other address as an
alternative address to my Google account, then unsubscribing from the list
and resubscribing, but this wasn't exactly straightforward.)

RD> Now the downside: There doesn't seem to be a supported way to import
RD> list archives into a google group.

It's a pity, I liked the idea of migrating to Google Groups because I
already prefer to use Google search for wx-users (which is already indexed
there as a Usenet group) rather than using gmane, which is nicer for browsing
but rather bad for searching.

BTW, what about the wx-users Usenet gateway, how is this going to work if the
mailing list itself is hosted on Google Groups?

Thanks,
VZ
--
TT-Solutions: wxWidgets consultancy and technical support
http://www.tt-solutions.com/
Bryan Petty
2009-05-05 16:21:46 UTC
Post by Vadim Zeitlin
I've heard a lot of good things about DreamHost and they do
claim to provide "unlimited" mailing lists using mailman for $6/month. Has
anybody here had any experience with them?
They are one of the most spotty hosting companies as far as their history
goes. I've had a couple of friends who hosted through them and are
no longer using them because of issues just like the one we had here.
They also shut down customers that hit limits on their service without
notice, and they are well known for screwing up tons of things like
losing databases, accidentally charging customers for a full year's
worth of hosting, making customers change all their passwords when
they lost everyone's FTP passwords, and the list goes on. On top of all
that, they are shared hosting, not VPS or dedicated. It would be
downgrading service from what Robin has the lists and the rest of the
wx stuff on.

Just search "Dreamhost" on Google, and a good half of the results past
the first page are complaints.
Post by Vadim Zeitlin
RD>
RD>                       T   5  10   20   40  80  160 320 640 1280 1280+
RD>              TOTAL 5271 170 702 1068 1071 576 1467  40  16    9   152
RD>          gmail.com 2682  94 339  533  469 331  916   0   0    0     0
RD> softwaremagic.net   86   0   0    0    1   0    0   2   3    5    75
...
 Whoa, where do these thousands of messages come from? I haven't received
more than a couple of dozens wx messages yesterday, are there any high
volume wx mailing lists I'm not subscribed to?
I'm sure we have thousands of subscribers to at least wx-dev, where we
send wxTrac notifications. Each time a ticket is changed, the queue
jumps up a few thousand. And of course we still have wx-users, and the
SVN list. These numbers make sense to me.
Post by Vadim Zeitlin
 Notice that you need a Google account to use these features however
(although you can subscribe to the list normally without it). Not a real
problem in practice probably but I just was rather surprised that I
couldn't disable email delivery for a mailing list I was subscribed to
under a non-gmail address (I finally could do it by adding this other
address as alternative address to my Google account and unsubscribing from
the list and resubscribing again but this wasn't exactly straightforward).
I've actually been wondering how to do this myself. I've been so
confused for the longest time about how I could manage my Groups
subscriptions to other Google accounts that weren't @gmail.com
accounts.

Regards,
Bryan Petty
legalize+ (Richard)
2009-05-05 17:22:17 UTC
[Please do not mail me a copy of your followup]
Post by Bryan Petty
Post by Vadim Zeitlin
I've heard a lot of good things about DreamHost and they do
claim to provide "unlimited" mailing lists using mailman for $6/month. Has
anybody here had any experience with them?
They are one of the most spotty hosting companies as far as their history
goes. I've had a couple friends that have hosted through them and are
no longer using them because of issues just like the one we had here.
[...]
As a very satisfied customer of XMission, I would recommend them.
They offer a hosting service and provide mailing lists (via mailman)
for customers. They would never do anything to a customer's service
without contacting them first. They run linux on their mail servers,
with heavy amounts of resources, mostly due to dealing with large
amounts of spam flowing to their customers. They use exim as the
mailer and while I've not run any heavy mailing lists on their
machine, I'm sure they could handle it.
--
"The Direct3D Graphics Pipeline" -- DirectX 9 draft available for download
<http://www.xmission.com/~legalize/book/download/index.html>

Legalize Adulthood! <http://blogs.xmission.com/legalize/>
Kevin Ollivier
2009-05-05 17:06:32 UTC
Hi Robin,
Post by Robin Dunn
Hi all,
We've once again had a CPU load spike that caused the hosting
provider to shut down the VPS, and once again it appears to be
related to the mail server. I've put what I think are reasonable
limits on the number of processes that postfix can spawn, and then
dropped the count by a few more. (The defaults were 100, and I've
had them set at 25, 15, then 10, and now 8.) The spikes are less
frequent than they were before I started investigating this, but
they can still happen. The problem seems to be that since managing
the mail queues can be fairly disk intensive then the system can get
into the state where more than one process is waiting on the disk,
and that significantly raises the load average even though there is
no real amount of user CPU being used by the processes. The htop
tool will report several processes in the 'D' state (uninterruptible
sleep) with less than 1% CPU, but the total system or kernel CPU
utilization is real high. Also, when other non-mail processes like
apache need to access the disk at the time of the spike they have to
wait for the mail processes to finish their disk access so they go
into the 'D' state too.
Anyway, I'm contemplating moving the mail lists to Google Groups and
am seeking opinions about doing this. One of the main reasons I am
considering Google Groups over other possible solutions is because
they are at least partially to blame for our spikes. Since about
half of our subscribed addresses are @gmail.com then when Gmail gets
bogged down and they start deferring delivery of the messages then
the number of messages waiting in our active queue skyrockets. Here
is a current snapshot of the top of the active queue:
                      T   5  10   20   40  80  160 320 640 1280 1280+
             TOTAL 5271 170 702 1068 1071 576 1467  40  16    9   152
         gmail.com 2682  94 339  533  469 331  916   0   0    0     0
 softwaremagic.net   86   0   0    0    1   0    0   2   3    5    75
         yahoo.com   82   3  13   18    9   5   34   0   0    0     0
[snip]

Wow, gmail has about 31 times as many queued messages as any other
host. :( And, of course, the trend of people using gmail will likely
only continue to grow, meaning those delivery queues could get much
worse.

Even given the downside you mention, I think it's best to try a move
to Google Groups before we seriously look at alternatives. The free
ones may penalize us for having this sort of mail queue or may have
poor performance (I can't imagine a server running multiple lists like
this on it...), and the paid ones require payment. ;-) I don't think
you should be purchasing separate servers / services for each thing wx
needs - the bill will really start to add up, and although I'd hope
donations would help cover it, you never know how things will pan out.
Post by Robin Dunn
The columns represent the "message bucket" the messages are stored
in where the numbers on the top row are the number of minutes that
the messages in that bucket have been undeliverable. (The first
column of numbers is the totals.) So it should be obvious that at
this moment a very disproportionate amount of problem deliveries are
headed to gmail, and it has been that way for about 3 hours. At
least a few of the other times that we have had load spikes the
queue shape has looked similar. Normally the active queue will only
have between zero and a few dozen pending message deliveries and the
qshape status shown above would be mostly empty.
So anyway, I'm thinking that moving the lists to Google will likely
at least result in better delivery to the gmail addresses, and they
should have the resources to manage providing good delivery to the
non gmail addresses too. It also adds the ability to read and
respond to messages via just the web browser without actual mail
messages being delivered to the inbox, (making it more like forums)
for people that like that approach.
Now the downside: There doesn't seem to be a supported way to
import list archives into a google group. Some searching turns up
some hacks to do it, involving subscribing an address to both the
current list and the google group and then requesting a resend of
messages to the address. It has to be done carefully though since
Google will block message floods. Since we have a few hundred
thousand messages in our archives this is not something I want to
spend a lot of time on, or have to babysit. Anybody have any
experience with this, or know of an automated way to do it?
If we can get a list of the messages in the archives into Python
(downloading the mbox files should do it), then we could just have it
loop through the messages and sleep for a second or two after each.
This should avoid a flood, but we should probably have it record the
last successful message (and which mbox it was in) each time so that,
if the script fails, we can restart where we left off.
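
A rough sketch of that loop, using Python's standard mailbox module.
The names replay_mbox, load_checkpoint, and save_checkpoint are made up
for this example, and send is a placeholder for whatever actually
delivers a message to the group. The checkpoint is a small JSON file
mapping each mbox path to the index of its last successfully sent
message.

```python
import json
import mailbox
import os
import time


def load_checkpoint(path):
    """Return {mbox_path: last_sent_index}, or {} if no checkpoint exists."""
    if not os.path.exists(path):
        return {}
    with open(path) as fh:
        return json.load(fh)


def save_checkpoint(path, state):
    """Write to a temp file, then rename, so a crash cannot truncate it."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump(state, fh)
    os.replace(tmp, path)


def replay_mbox(mbox_path, send, checkpoint_path, delay=2.0, sleep=time.sleep):
    """Pass each not-yet-sent message in mbox_path to send(message).

    Returns the number of messages sent during this run.
    """
    state = load_checkpoint(checkpoint_path)
    last_done = state.get(mbox_path, -1)
    box = mailbox.mbox(mbox_path, create=False)
    sent = 0
    try:
        for index, key in enumerate(box.keys()):
            if index <= last_done:
                continue  # already sent in an earlier run
            send(box[key])
            # Record progress only after send() returned without raising.
            state[mbox_path] = index
            save_checkpoint(checkpoint_path, state)
            sent += 1
            sleep(delay)  # pause between messages to avoid a flood
    finally:
        box.close()
    return sent
```

For real use, send could wrap something like smtplib's
SMTP.send_message pointed at the group's posting address. If send
raises partway through, rerunning replay_mbox with the same checkpoint
file picks up at the message that failed.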
Post by Robin Dunn
On the other hand, we do have good 3rd party archives already
available (gmane, nabble, etc.) with at least the recent history in
them, so it would be feasible to just decide to not bother importing
them into the Google groups.
Yeah, the above aside ;-) , I'm not sure it's worth bothering with
either.

Thanks,

Kevin
Post by Robin Dunn
Thoughts? Comments? Opinions? Alternatives? I would also be
interested in hearing from anybody who has managed multiple medium
to high traffic lists at google groups about what you like or
dislike, and if you are happy that you chose to use GG.
--
Robin Dunn
Software Craftsman
http://wxPython.org Java give you jitters? Relax with wxPython!
_______________________________________________
wx-users mailing list
http://lists.wxwidgets.org/mailman/listinfo/wx-users
Robin Dunn
2009-05-05 18:32:16 UTC
Post by Vadim Zeitlin
RD> We've once again had a CPU load spike that caused the hosting provider
RD> to shut down the VPS, and once again it appears to be related to the
RD> mail server. I've put what I think are reasonable limits on the number
RD> of processes that postfix can spawn, and then dropped the count by a few
RD> more. (The defaults were 100, and I've had them set at 25, 15, then 10,
RD> and now 8.)
I wonder if switching to Exim could help. I've never had any shadow of
such problems with it, although to be fair I never used it with that many
messages either. OTOH, in spite of my dislike for Postfix, I have to admit
that it really looks more like a Linux kernel problem than a Postfix one,
it really should be able to handle disk accesses better
Yeah, I had that same impression when I was researching load problems
with postfix.
Post by Vadim Zeitlin
RD> Anyway, I'm contemplating moving the mail lists to Google Groups and am
RD> seeking opinions about doing this.
What are the alternatives?
There's Yahoo Groups.
Post by Vadim Zeitlin
It's clear that we can't continue to live with
this kind of problems all the time so we need to do something. And I don't
think anybody here can host the mailing lists (I have some spare hardware
but surely not enough bandwidth for them). So we need to host them
somewhere and Google is surely the best one among the free alternatives
(or does anybody seriously consider putting the lists on MSN?). As for paid
hosting, I don't know how good is it but your experience with the current
server seems to be rather dissuasive to me. OTOH maybe it's simply not the
best one... I've heard a lot of good things about DreamHost and they do
claim to provide "unlimited" mailing lists using mailman for $6/month. Has
anybody here had any experience with them?
RD> One of the main reasons I am considering Google Groups over other
RD> possible solutions is because they are at least partially to blame for
RD> our spikes.
This is weird, I've never had such huge delays with gmail addresses.
Again, I don't have nearly the same amount of traffic, of course, so maybe
it starts refusing messages after some threshold.
RD>
RD>                       T   5  10   20   40  80  160 320 640 1280 1280+
RD>              TOTAL 5271 170 702 1068 1071 576 1467  40  16    9   152
RD>          gmail.com 2682  94 339  533  469 331  916   0   0    0     0
RD> softwaremagic.net   86   0   0    0    1   0    0   2   3    5    75
...
Whoa, where do these thousands of messages come from?
On wx-dev for example there are 156 gmail addresses, so multiply that
times every wx-dev message including all the wxTrac updates. I think
the count shown above is the actual destination addresses, but the mail
server will coalesce the messages and only send one copy of the message
for every N addresses at the same domain. (I think N is something like
50.) But gmail will still have to process the message for all the
destination addresses on their end, so they may treat those messages
differently and throttle them at an earlier threshold.
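
The arithmetic behind that coalescing can be sketched in a few lines of
Python. This assumes a per-domain batch size of 50 recipients, which is
the N guessed above and also Postfix's documented default for
default_destination_recipient_limit. The function name smtp_deliveries
is made up for this example, and a list manager that personalizes each
copy (VERP, for instance) would not batch this way.

```python
import math


def smtp_deliveries(recipients_at_domain, recipient_limit=50):
    """Number of separate SMTP deliveries needed for one list message."""
    return math.ceil(recipients_at_domain / recipient_limit)


if __name__ == "__main__":
    # 156 is the number of gmail addresses on wx-dev mentioned above.
    print(smtp_deliveries(156))
```

Under those assumptions, one wx-dev message to the 156 gmail addresses
goes out as 4 deliveries even though it shows up as 156 queue entries.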
Post by Vadim Zeitlin
I haven't received
more than a couple of dozens wx messages yesterday, are there any high
volume wx mailing lists I'm not subscribed to?
The message buckets were all emptied by early evening yesterday, but now
they are stacking up again today with about half for gmail.com.
Post by Vadim Zeitlin
RD> So anyway, I'm thinking that moving the lists to Google will likely at
RD> least result in better delivery to the gmail addresses, and they should
RD> have the resources to manage providing good delivery to the non gmail
RD> addresses too. It also adds the ability to read and respond to messages
RD> via just the web browser without actual mail messages being delivered to
RD> the inbox, (making it more like forums) for people that like that approach.
Notice that you need a Google account to use these features however
(although you can subscribe to the list normally without it). Not a real
problem in practice probably but I just was rather surprised that I
couldn't disable email delivery for a mailing list I was subscribed to
under a non-gmail address (I finally could do it by adding this other
address as alternative address to my Google account and unsubscribing from
the list and resubscribing again but this wasn't exactly straightforward).
I discovered yesterday that you can create a GG account using a
non-gmail address as your user name, so it should be possible to access
all the features without needing people to use gmail. Not sure how long
they've allowed that.
Post by Vadim Zeitlin
RD> Now the downside: There doesn't seem to be a supported way to import
RD> list archives into a google group.
It's a pity, I liked the idea of migrating to Google Groups because I
already prefer to use Google search for wx-users (which is already indexed
there as Usenet group) rather using gmane which is nicer for browsing but
rather bad for searching.
BTW, what about wx-users Usenet gateway, how is this going to work if the
mailing list itself is hosted on Google Groups?
I don't know. I've been thinking a bit about it but haven't come up
with any ideas yet that are worth pursuing.
--
Robin Dunn
Software Craftsman
http://wxPython.org Java give you jitters? Relax with wxPython!
Marcin 'Malcom' Malich
2009-05-05 19:19:37 UTC
Post by Robin Dunn
Post by Vadim Zeitlin
RD> Now the downside: There doesn't seem to be a supported way to import
RD> list archives into a google group.
It's a pity, I liked the idea of migrating to Google Groups because I
already prefer to use Google search for wx-users (which is already indexed
there as Usenet group) rather using gmane which is nicer for browsing but
rather bad for searching.
BTW, what about wx-users Usenet gateway, how is this going to work if the
mailing list itself is hosted on Google Groups?
I don't know. I've been thinking a bit about it but haven't come up
with any ideas yet that are worth pursuing.
Currently all messages from wx-users are being forwarded to the Usenet
group comp.soft-sys.wxwindows, so do we have any reason not to use
this group directly instead of wx-users?

For wx-dev and the other lists, a Google/Yahoo group should be the best way, imo.

--
Pozdrowienia,
Marcin 'Malcom' Malich
***@malcom.pl
http://malcom.pl
Bob Paddock
2009-05-06 11:30:47 UTC
Post by Marcin 'Malcom' Malich
Currently all messages from wx-users are being forwarded to the Usenet
group comp.soft-sys.wxwindows, so do we have any reason not to use
this group directly instead of wx-users?
Some paranoid Admins ban the use of usenet.
legalize+ (Richard)
2009-05-06 20:25:22 UTC
[Please do not mail me a copy of your followup]
Post by Bob Paddock
Post by Marcin 'Malcom' Malich
Currently all messages from wx-users are being forwarded to the Usenet
group comp.soft-sys.wxwindows, so do we have any reason not to use
this group directly instead of wx-users?
Some paranoid Admins ban the use of usenet.
There are plenty of news servers you can access elsewhere if your
local admin doesn't want to run usenet.

XMission provides a full usenet feed; it's how I read this list as a
newsgroup. I'm a very satisfied customer and have been for about 15
years.

<http://www.xmission.com>
--
"The Direct3D Graphics Pipeline" -- DirectX 9 draft available for download
<http://www.xmission.com/~legalize/book/download/index.html>

Legalize Adulthood! <http://blogs.xmission.com/legalize/>
Marcin 'Malcom' Malich
2009-05-06 20:42:01 UTC
Post by legalize+ (Richard)
Post by Bob Paddock
Some paranoid Admins ban the use of usenet.
There are plenty of news servers you can access elsewhere if your
local admin doesn't want to run usenet.
XMission provides a full usenet feed; it's how I read this list as a
newsgroup. I'm a very satisfied customer and have been for about 15
years.
<http://www.xmission.com>
Or any web gateway for usenet, like google groups.
I'm using it for exploring usenet ;)

--
Pozdrowienia,
Marcin 'Malcom' Malich
***@malcom.pl
http://malcom.pl
