Azathoth wrote:
What is weird in all of this is that you have 10+ hours of reduced bandwidth, no visibly significant growth in CPU consumption (albeit there is a visible growth trend toward the end) and then suddenly collapse with lots of disk IO and swapping. I could be wrong, but somehow I doubt Apache OOMed, it would start swapping immediately, not after 10 hours, unless I'm gravely mistaken...
Syslog (which had better info than logview) shows what must be 100+ httpd processes whereas I rarely see more than 20 on top. So unless I'm interpreting it wrong I think something really did make Apache fire up everything.
In Apache's error_log there are a lot of:
Code:
[Mon Nov 28 17:50:11 2011] [error] [client 0.0.0.0] mod_wsgi (pid=7268): Exception occurred processing WSGI script '/home/kyle/example.com/bin/django.wsgi'.
[Mon Nov 28 17:50:11 2011] [error] [client 0.0.0.0] IOError: failed to write data
But I'm pretty sure that just happens when a client closes their connection before receiving the result. If it happens in some places Django will actually email you the exception.
Five minutes before things go wrong a lot of these start appearing in access_log and sometimes appear in bundles of 5-10:
Code:
::1 - - [30/Nov/2011:21:02:28 +0000] "OPTIONS * HTTP/1.0" 200 - "-" "Apache (internal dummy connection)"
A couple seconds later these start appearing (32 in total) in the error_log:
Code:
[Wed Nov 30 21:02:42 2011] [error] [client 0.0.0.0] mod_wsgi (pid=32437): Exception occurred processing WSGI script '/home/kyle/example.com/bin/django.wsgi'., referer: http://blog.blogspot.com/
[Wed Nov 30 21:02:42 2011] [error] [client 0.0.0.0] Traceback (most recent call last):, referer: http://blog.blogspot.com/
[Wed Nov 30 21:02:42 2011] [error] [client 0.0.0.0] File "/home/kyle/example.com/eggs/Django-1.3.1-py2.7.egg/django/core/handlers/wsgi.py", line 274, in __call__, referer: http://blog.blogspot.com/
[Wed Nov 30 21:02:42 2011] [error] [client 0.0.0.0] signals.request_finished.send(sender=self.__class__), referer: http://blog.blogspot.com/
[Wed Nov 30 21:02:42 2011] [error] [client 0.0.0.0] File "/home/kyle/example.com/eggs/Django-1.3.1-py2.7.egg/django/dispatch/dispatcher.py", line 172, in send, referer: http://blog.blogspot.com/
[Wed Nov 30 21:02:42 2011] [error] [client 0.0.0.0] response = receiver(signal=self, sender=sender, **named), referer: http://blog.blogspot.com/
[Wed Nov 30 21:02:42 2011] [error] [client 0.0.0.0] File "/home/kyle/example.com/eggs/Django-1.3.1-py2.7.egg/django/db/__init__.py", line 85, in close_connection, referer: http://blog.blogspot.com/
[Wed Nov 30 21:02:42 2011] [error] [client 0.0.0.0] conn.close(), referer: http://blog.blogspot.com/
[Wed Nov 30 21:02:42 2011] [error] [client 0.0.0.0] File "/home/kyle/example.com/eggs/Django-1.3.1-py2.7.egg/django/db/backends/__init__.py", line 244, in close, referer: http://blog.blogspot.com/
[Wed Nov 30 21:02:42 2011] [error] [client 0.0.0.0] self.connection.close(), referer: http://blog.blogspot.com/
[Wed Nov 30 21:02:42 2011] [error] [client 0.0.0.0] InterfaceError: connection already closed, referer: http://blog.blogspot.com/
This seems to be a Django signal sent when a request is finished to close the database connection. Since it's sent right at the end it's presumably outside of Django's error handling and so I received no emails and nothing was logged in the app itself.
Then finally it happens:
Code:
[Wed Nov 30 21:07:25 2011] [error] server reached MaxClients setting, consider raising the MaxClients setting
So I'm thinking, PostgreSQL fell over which also hung all the Apache processes connected to it and so Apache launched new processes and quickly ran out of memory. In this case lowering MaxClients would only be half the fix.
PostgreSQL doesn't seem to log to a file by default.