I'm building a Django app on my Linode 512, and I'm running into swap and I/O problems. Right now, only two of us are using/testing the site. Every 30 minutes, a cron job launches a number of Celery tasks that fetch a lot of data online and store the new records in a MySQL database. The site then provides an interface to that data.
If I don't reboot for a day or two, the site becomes very slow. After a reboot, swap usage climbs slowly with each cron run (by 3 to 11 percentage points per run), while memory usage holds steady at about 44%. I also get occasional I/O warning emails, especially when it's been a while since the last reboot.
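To put numbers on the per-run swap climb, something like the following could be run just before and just after each cron-triggered batch. It's a minimal sketch, assuming a Linux `/proc/meminfo` that reports `SwapTotal` and `SwapFree` in kB; the function names are hypothetical.

```python
"""Report swap usage from /proc/meminfo (Linux only); a minimal sketch."""
import os


def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to their numeric values (kB for most)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def swap_used_percent(fields):
    """Percentage of swap in use, or 0.0 if no swap is configured."""
    total = fields.get("SwapTotal", 0)
    free = fields.get("SwapFree", 0)
    if total == 0:
        return 0.0
    return 100.0 * (total - free) / total


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        print(f"swap used: {swap_used_percent(parse_meminfo(f.read())):.1f}%")
```

Logging its output with a timestamp at the start and end of the cron job would show which part of the cycle the increase happens in.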
It also seems like some tasks are hanging. For instance, I just ran:
Code:
ps -eo pmem,pcpu,rss,vsize,args | sort -k 1 -r | less
And got the following output:
Code:
%MEM %CPU RSS VSZ COMMAND
8.2 1.1 40892 194712 /home/dave/.virtualenvs/rsproject/bin/python /srv/www/redacted.com/application/rsproject/manage.py celeryd --time-limit=300 --loglevel=INFO --concurrency=8 -n w1.rs --logfile=/var/log/celery/w1.log --pidfile=/var/run/celery/w1.pid
7.2 1.1 36376 190448 /home/dave/.virtualenvs/rsproject/bin/python /srv/www/redacted.com/application/rsproject/manage.py celeryd --time-limit=300 --loglevel=INFO --concurrency=8 -n w1.rs --logfile=/var/log/celery/w1.log --pidfile=/var/run/celery/w1.pid
7.1 1.2 35548 190368 /home/dave/.virtualenvs/rsproject/bin/python /srv/www/redacted.com/application/rsproject/manage.py celeryd --time-limit=300 --loglevel=INFO --concurrency=8 -n w1.rs --logfile=/var/log/celery/w1.log --pidfile=/var/run/celery/w1.pid
7.0 1.2 35192 189532 /home/dave/.virtualenvs/rsproject/bin/python /srv/www/redacted.com/application/rsproject/manage.py celeryd --time-limit=300 --loglevel=INFO --concurrency=8 -n w1.rs --logfile=/var/log/celery/w1.log --pidfile=/var/run/celery/w1.pid
7.0 1.1 35364 189356 /home/dave/.virtualenvs/rsproject/bin/python /srv/www/redacted.com/application/rsproject/manage.py celeryd --time-limit=300 --loglevel=INFO --concurrency=8 -n w1.rs --logfile=/var/log/celery/w1.log --pidfile=/var/run/celery/w1.pid
7.0 1.1 34992 189624 /home/dave/.virtualenvs/rsproject/bin/python /srv/www/redacted.com/application/rsproject/manage.py celeryd --time-limit=300 --loglevel=INFO --concurrency=8 -n w1.rs --logfile=/var/log/celery/w1.log --pidfile=/var/run/celery/w1.pid
6.8 1.1 33980 194196 /home/dave/.virtualenvs/rsproject/bin/python /srv/www/redacted.com/application/rsproject/manage.py celeryd --time-limit=300 --loglevel=INFO --concurrency=8 -n w1.rs --logfile=/var/log/celery/w1.log --pidfile=/var/run/celery/w1.pid
6.2 1.1 31292 187828 /home/dave/.virtualenvs/rsproject/bin/python /srv/www/redacted.com/application/rsproject/manage.py celeryd --time-limit=300 --loglevel=INFO --concurrency=8 -n w1.rs --logfile=/var/log/celery/w1.log --pidfile=/var/run/celery/w1.pid
5.6 0.3 28388 477988 /usr/sbin/apache2 -k start
5.4 0.3 26988 189016 /home/dave/.virtualenvs/rsproject/bin/python /srv/www/redacted.com/application/rsproject/manage.py celeryd --time-limit=300 --loglevel=INFO --concurrency=8 -n w1.rs --logfile=/var/log/celery/w1.log --pidfile=/var/run/celery/w1.pid
21.3 1.9 106192 1411040 /usr/sbin/mysqld
1.9 0.1 9512 2140256 /usr/lib/erlang/erts-5.8.5/bin/beam.smp -W w -K true -A30 -P 1048576 -- -root /usr/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -noshell -noinput -sname rabbit@rs -boot /var/lib/rabbitmq/mnesia/rabbit@rs-plugins-expand/rabbit -kernel inet_default_connect_options [{nodelay,true}] -sasl errlog_type error -sasl sasl_error_logger false -rabbit error_logger {file,"/var/log/rabbitmq/rabbit@rs.log"} -rabbit sasl_error_logger {file,"/var/log/rabbitmq/rabbit@rs-sasl.log"} -os_mon start_cpu_sup true -os_mon start_disksup false -os_mon start_memsup false -mnesia dir "/var/lib/rabbitmq/mnesia/rabbit@rs"
1.4 0.0 7460 27452 -bash
0.5 0.0 2764 196076 /usr/bin/python /usr/bin/fail2ban-server -b -s /var/run/fail2ban/fail2ban.sock
0.4 0.0 2116 1821748 /usr/sbin/apache2 -k start
0.4 0.0 2016 1952708 /usr/sbin/apache2 -k start
0.2 0.0 1192 330208 /usr/bin/memcached -m 64 -p 11211 -u memcache -l 127.0.0.1
0.2 0.0 1000 9728 ps -eo pmem,pcpu,rss,vsize,args
0.1 0.0 968 73360 sshd: dave@pts/0
0.1 0.0 900 73360 sshd: dave [priv]
0.1 0.0 844 21816 sort -k 1 -r
0.1 0.0 724 90904 /usr/sbin/apache2 -k start
0.1 0.0 712 185508 whoopsie
0.1 0.0 612 37696 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 106:113
0.1 0.0 608 90636 /usr/sbin/apache2 -k start
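The nine celeryd rows above add up to 310,624 kB (about 303 MiB) of RSS, which is consistent with one parent process plus the eight pool processes that `--concurrency=8` starts. RSS counts shared pages once per process, so the real footprint of the forked workers is somewhat lower. Note also that `sort -k 1 -r` compares text, not numbers, which is why mysqld (21.3%) lands below the 5.4% row; `sort -k1,1 -rn` sorts numerically. The sketch below (hypothetical helper names, assuming the same `ps -eo pmem,pcpu,rss,vsize,args` columns) does the counting and a numeric sort in one place.

```python
"""Summarize ps output: count matching processes, total their RSS,
and sort rows numerically by %MEM."""
import shutil
import subprocess


def parse_ps(text):
    """Parse `ps -eo pmem,pcpu,rss,vsize,args` output into
    (pmem, pcpu, rss_kb, vsz_kb, args) tuples, skipping the header."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 4)
        if len(parts) < 5:
            continue
        try:
            pmem, pcpu = float(parts[0]), float(parts[1])
            rss, vsz = int(parts[2]), int(parts[3])
        except ValueError:
            continue  # header row or a wrapped continuation line
        rows.append((pmem, pcpu, rss, vsz, parts[4]))
    return rows


def summarize(rows, pattern):
    """Return (process count, total RSS in kB) for rows whose command contains pattern."""
    matching = [r for r in rows if pattern in r[4]]
    return len(matching), sum(r[2] for r in matching)


def by_mem_desc(rows):
    """Rows sorted numerically by %MEM, largest first."""
    return sorted(rows, key=lambda r: r[0], reverse=True)


if __name__ == "__main__" and shutil.which("ps"):
    out = subprocess.run(
        ["ps", "-eo", "pmem,pcpu,rss,vsize,args"],
        capture_output=True, text=True, check=True,
    ).stdout
    rows = parse_ps(out)
    count, rss_kb = summarize(rows, "celeryd")
    print(f"celeryd: {count} processes, {rss_kb / 1024:.0f} MiB RSS (shared pages counted per process)")
    for row in by_mem_desc(rows)[:5]:
        print(row[0], row[4][:60])
```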
It seems to me that those celeryd processes shouldn't still be in memory (that snapshot was taken safely between cron runs). It also looks to me as if writing to the logs is part of the problem: the Celery log shows up in iotop too, along with mysqld. Here is an iotop snapshot taken while the cron job was running (the entries bounce around, so I don't know whether this is representative; earlier I had seen many more Celery processes writing to the log):
Code:
Total DISK READ: 29.78 K/s | Total DISK WRITE: 420.20 K/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
2761 be/4 mysql 3.31 K/s 52.94 K/s 0.00 % 7.77 % mysqld
3482 be/4 mysql 0.00 B/s 26.47 K/s 0.00 % 5.09 % mysqld
4051 be/4 mysql 3.31 K/s 36.40 K/s 0.00 % 2.36 % mysqld
1267 be/4 root 0.00 B/s 6.62 K/s 0.00 % 1.79 % [kjournald]
3481 be/4 mysql 0.00 B/s 36.40 K/s 0.00 % 1.68 % mysqld
4390 be/4 mysql 3.31 K/s 46.32 K/s 0.00 % 1.10 % mysqld
3487 be/4 mysql 0.00 B/s 16.54 K/s 0.00 % 0.30 % mysqld
4448 be/4 mysql 3.31 K/s 29.78 K/s 0.00 % 0.30 % mysqld
2538 be/4 mysql 0.00 B/s 29.78 K/s 0.00 % 0.19 % mysqld
3083 be/4 root 0.00 B/s 3.31 K/s 0.00 % 0.00 % python /srv/www/redacted.com/application/rsp~elery/w1.log --pidfile=/var/run/celery/w1.pid
2512 be/4 mysql 0.00 B/s 26.47 K/s 0.00 % 0.00 % mysqld
1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % init
2 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kthreadd]
3 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/0]
4 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/0:0]
5 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/0:0H]
6 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/u:0]
7 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/u:0H]
8 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/0]
9 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/1]
10 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/1:0]
11 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/1:0H]
12 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/1]
13 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/2]
14 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/2:0]
15 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/2:0H]
16 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/2]
17 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/3]
18 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/3:0]
19 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/3:0H]
20 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/3]
21 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [cpuset]
22 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [khelper]
23 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kdevtmpfs]
24 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [netns]
25 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/u:1]
538 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [jfsSync]
28 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [xenwatch]
29 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [xenbus]
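In this single snapshot, the nine mysqld threads add up to about 301 K/s of writes, against 3.31 K/s for the Celery worker line and 6.62 K/s for kjournald (the rows shown don't sum to the 420.20 K/s total). Since the entries bounce around, tallying several snapshots would give a fairer picture. A minimal sketch, assuming the column layout shown above; the function name is hypothetical.

```python
"""Tally per-command disk writes from an iotop snapshot."""
from collections import defaultdict

# Normalize iotop's rate units to K/s.
UNITS = {"B/s": 1 / 1024, "K/s": 1.0, "M/s": 1024.0, "G/s": 1024.0 ** 2}


def writes_by_command(text):
    """Sum the DISK WRITE column (in K/s) per command name.

    Expected row layout: TID PRIO USER READ unit WRITE unit SWAPIN % IO % COMMAND
    """
    totals = defaultdict(float)
    for line in text.splitlines():
        parts = line.split(None, 11)
        if len(parts) < 12 or not parts[0].isdigit():
            continue  # skip the "Total DISK ..." and column-header lines
        rate, unit = float(parts[5]), parts[6]
        if unit not in UNITS:
            continue
        command = parts[11].split()[0]  # e.g. "mysqld", "python", "[kjournald]"
        totals[command] += rate * UNITS[unit]
    return dict(totals)
```

Pasting several snapshots into one string (or capturing `iotop -b -n 1` a few times, assuming its batch output uses the same columns) and sorting the resulting dict by value would show which processes dominate the writes over time.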
Admittedly, I haven't changed the MaxClients setting in apache2.conf, but I'm fairly sure that's not the problem, since swap usage increases during each cron job and we only have two sporadic users. I have adjusted the MySQL settings to these:
Code:
key_buffer = 16K
max_allowed_packet = 1M
thread_stack = 64K
table_cache = 4
sort_buffer = 64K
net_buffer_length = 2K
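To compare these values with mysqld's 106192 kB RSS in the ps output, a small script can convert the suffixed values into bytes. This is a rough illustration only: `max_allowed_packet` is an upper limit on a single packet rather than memory reserved up front, `table_cache` is a count of open tables rather than a size, and buffers that aren't in this list aren't modelled at all. The helper names are hypothetical.

```python
"""Convert suffixed MySQL option values (e.g. 16K, 1M) into bytes."""

# MySQL accepts K, M and G suffixes (either case) as powers of 1024.
SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

# The settings from the post.
MY_CNF_EXCERPT = """\
key_buffer = 16K
max_allowed_packet = 1M
thread_stack = 64K
table_cache = 4
sort_buffer = 64K
net_buffer_length = 2K
"""
COUNT_SETTINGS = {"table_cache"}  # a number of tables, not a byte size


def to_bytes(value):
    """Turn '64K' into 65536; plain integers are returned unchanged."""
    value = value.strip().upper()
    if value and value[-1] in SUFFIXES:
        return int(value[:-1]) * SUFFIXES[value[-1]]
    return int(value)


def parse_settings(text):
    """Parse 'name = value' lines into a dict of strings."""
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            settings[name.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    for name, raw in parse_settings(MY_CNF_EXCERPT).items():
        if name not in COUNT_SETTINGS:
            print(f"{name:20} {raw:>5} = {to_bytes(raw):>9,} bytes")
```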
Another thing I tried recently was adding:
Code:
vm.swappiness=10
in /etc/sysctl.conf, but the problem persists.
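Values in /etc/sysctl.conf are applied at boot or when `sysctl -p` is run, so it may be worth confirming that the running kernel actually uses 10. A minimal sketch comparing the live value in `/proc/sys/vm/swappiness` with the last `vm.swappiness` line in the config file; the function names are hypothetical.

```python
"""Compare the running vm.swappiness with the value in /etc/sysctl.conf."""
from pathlib import Path


def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Return the kernel's current vm.swappiness, or None if unavailable."""
    p = Path(path)
    if not p.exists():
        return None
    return int(p.read_text().strip())


def configured_swappiness(conf_text):
    """Return the last vm.swappiness value set in sysctl.conf-style text, or None."""
    value = None
    for line in conf_text.splitlines():
        line = line.strip()
        if line.startswith(("#", ";")):
            continue  # comment lines
        key, sep, val = line.partition("=")
        if sep and key.strip() == "vm.swappiness":
            value = int(val.strip())  # later lines override earlier ones
    return value


if __name__ == "__main__":
    conf = Path("/etc/sysctl.conf")
    configured = configured_swappiness(conf.read_text()) if conf.exists() else None
    print(f"running: {read_swappiness()}, /etc/sysctl.conf: {configured}")
```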
I'm pretty new to server administration, so any help at all would be appreciated; please let me know if you need any other output to help diagnose this. I'm hoping this can be solved with configuration changes rather than by paying for a larger Linode with more memory. Any ideas?