Linode Community Forums
PostPosted: Tue Oct 09, 2012 11:12 pm 
Senior Newbie

Joined: Tue Oct 09, 2012 3:57 pm
Posts: 8
I have a 768 MB Linode running a CentOS LAMP stack with Drupal Aegir, just 3 Open Atrium sites, and about 10-20 concurrent users. At random times it pukes an Out of Memory kill, and I can't figure out what's causing it. I'm not sure if I need to do some memory-usage tweaking on my CentOS LAMP stack. I need to get this under control quickly or management is going to kill this project. After a couple hours of a crash-and-burn OOM, here's some info. To my eye, it doesn't look like anything is wrong... no? Please help. Thank you.

First the ugly OOM output

ck:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 700 754 754
Normal free:5060kB min:3348kB low:4184kB high:5020kB active_anon:339984kB inactive_anon:340208kB active_file:0kB inactive_file:0kB unevictable:1408kB isolated(anon):0kB isolated(file):0kB present:717288kB mlocked:1408kB dirty:0kB writeback:0kB mapped:604kB shmem:28kB slab_reclaimable:7776kB slab_unreclaimable:8448kB kernel_stack:1128kB pagetables:4404kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 0 428 428
HighMem free:116kB min:128kB low:192kB high:256kB active_anon:17624kB inactive_anon:17932kB active_file:0kB inactive_file:0kB unevictable:3136kB isolated(anon):0kB isolated(file):0kB present:54868kB mlocked:3136kB dirty:0kB writeback:0kB mapped:3080kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
DMA: 6*4kB 2*8kB 7*16kB 8*32kB 8*64kB 7*128kB 5*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3096kB
Normal: 639*4kB 297*8kB 9*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 5076kB
HighMem: 6*4kB 1*8kB 0*16kB 3*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 128kB
56664 total pagecache pages
55697 pages in swap cache
Swap cache stats: add 17100319, delete 17044622, find 7567477/9043262
Free swap = 0kB
Total swap = 524284kB
198640 pages RAM
13826 pages HighMem
6639 pages reserved
14545 pages shared
188067 pages non-shared
Out of memory: Kill process 3448 (httpd) score 41 or sacrifice child
Killed process 3448 (httpd) total-vm:74716kB, anon-rss:26132kB, file-rss:1120kB
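
For reference, the full OOM reports land in the kernel log (on CentOS that's /var/log/messages; path assumed), so you can collect every kill rather than just the one you happened to catch. A quick sketch, using the kill line above as sample input:

```shell
# On a live box, list recent OOM kills:
#   grep "Out of memory" /var/log/messages | tail
# Pull the victim's PID and command out of a kill line, e.g. the one above:
oom_line='Out of memory: Kill process 3448 (httpd) score 41 or sacrifice child'
echo "$oom_line" | sed 's/.*Kill process \([0-9]*\) (\([^)]*\)).*/pid=\1 cmd=\2/'
# -> pid=3448 cmd=httpd
```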

Processes running

[root@li21-298 ~]# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 2208 492 ? Ss 19:34 0:00 init [3]
root 2 0.0 0.0 0 0 ? S 19:34 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S 19:34 0:00 [ksoftirqd/0]
root 4 0.0 0.0 0 0 ? S 19:34 0:00 [kworker/0:0]
root 5 0.0 0.0 0 0 ? S 19:34 0:00 [kworker/u:0]
root 6 0.0 0.0 0 0 ? S 19:34 0:00 [migration/0]
root 7 0.0 0.0 0 0 ? S 19:34 0:00 [migration/1]
root 9 0.0 0.0 0 0 ? S 19:34 0:00 [ksoftirqd/1]
root 10 0.0 0.0 0 0 ? S 19:34 0:00 [migration/2]
root 12 0.0 0.0 0 0 ? S 19:34 0:00 [ksoftirqd/2]
root 13 0.0 0.0 0 0 ? S 19:34 0:00 [migration/3]
root 15 0.0 0.0 0 0 ? S 19:34 0:00 [ksoftirqd/3]
root 16 0.0 0.0 0 0 ? S< 19:34 0:00 [cpuset]
root 17 0.0 0.0 0 0 ? S< 19:34 0:00 [khelper]
root 18 0.0 0.0 0 0 ? S 19:34 0:00 [kdevtmpfs]
root 19 0.0 0.0 0 0 ? S 19:34 0:00 [kworker/u:1]
root 21 0.0 0.0 0 0 ? S 19:34 0:00 [xenwatch]
root 22 0.0 0.0 0 0 ? S 19:34 0:00 [xenbus]
root 162 0.0 0.0 0 0 ? S 19:34 0:00 [sync_supers]
root 164 0.0 0.0 0 0 ? S 19:34 0:00 [bdi-default]
root 166 0.0 0.0 0 0 ? S< 19:34 0:00 [kblockd]
root 176 0.0 0.0 0 0 ? S 19:34 0:00 [kworker/3:1]
root 178 0.0 0.0 0 0 ? S< 19:34 0:00 [md]
root 262 0.0 0.0 0 0 ? S< 19:34 0:00 [rpciod]
root 263 0.0 0.0 0 0 ? S 19:34 0:00 [kworker/2:1]
root 275 0.0 0.0 0 0 ? S 19:34 0:04 [kswapd0]
root 276 0.0 0.0 0 0 ? SN 19:34 0:00 [ksmd]
root 277 0.0 0.0 0 0 ? S 19:34 0:00 [fsnotify_mark]
root 281 0.0 0.0 0 0 ? S 19:34 0:00 [ecryptfs-kthr]
root 283 0.0 0.0 0 0 ? S< 19:34 0:00 [nfsiod]
root 284 0.0 0.0 0 0 ? S< 19:34 0:00 [cifsiod]
root 287 0.0 0.0 0 0 ? S 19:34 0:00 [jfsIO]
root 288 0.0 0.0 0 0 ? S 19:34 0:00 [jfsCommit]
root 289 0.0 0.0 0 0 ? S 19:34 0:00 [jfsCommit]
root 290 0.0 0.0 0 0 ? S 19:34 0:00 [jfsCommit]
root 291 0.0 0.0 0 0 ? S 19:34 0:00 [jfsCommit]
root 292 0.0 0.0 0 0 ? S 19:34 0:00 [jfsSync]
root 293 0.0 0.0 0 0 ? S< 19:34 0:00 [xfsalloc]
root 294 0.0 0.0 0 0 ? S< 19:34 0:00 [xfs_mru_cache]
root 295 0.0 0.0 0 0 ? S< 19:34 0:00 [xfslogd]
root 296 0.0 0.0 0 0 ? S< 19:34 0:00 [glock_workque]
root 297 0.0 0.0 0 0 ? S< 19:34 0:00 [delete_workqu]
root 298 0.0 0.0 0 0 ? S< 19:34 0:00 [gfs_recovery]
root 299 0.0 0.0 0 0 ? S< 19:34 0:00 [crypto]
root 862 0.0 0.0 0 0 ? S 19:34 0:00 [khvcd]
root 976 0.0 0.0 0 0 ? S< 19:34 0:00 [kpsmoused]
root 1016 0.0 0.0 0 0 ? S< 19:34 0:00 [deferwq]
root 1019 0.0 0.0 0 0 ? S 19:34 0:00 [kjournald]
root 1023 0.0 0.0 0 0 ? S 19:34 0:00 [kworker/1:1]
root 1044 0.0 0.0 0 0 ? S 19:34 0:00 [kauditd]
root 1077 0.0 0.0 2424 356 ? S<s 19:34 0:00 /sbin/udevd -d
root 2692 0.0 0.0 2452 40 ? Ss 19:34 0:00 /sbin/dhclient
root 2759 0.0 0.0 10624 420 ? S<sl 19:34 0:00 auditd
root 2761 0.0 0.0 11184 444 ? S<sl 19:34 0:00 /sbin/audispd
root 2781 0.0 0.0 1964 532 ? Ss 19:34 0:00 syslogd -m 0
root 2784 0.0 0.0 1808 288 ? Ss 19:34 0:00 klogd -x
named 2825 0.0 0.1 58936 1032 ? Ssl 19:34 0:00 /usr/sbin/named
dbus 2847 0.0 0.0 2896 504 ? Ss 19:34 0:00 dbus-daemon --s
root 2884 0.0 0.0 23272 524 ? Ssl 19:34 0:00 automount
root 2903 0.0 0.0 7256 632 ? Ss 19:34 0:00 /usr/sbin/sshd
ntp 2917 0.0 0.5 4548 4544 ? SLs 19:34 0:00 ntpd -u ntp:ntp
root 2928 0.0 0.0 5344 160 ? Ss 19:34 0:00 /usr/sbin/vsftp
root 2964 0.0 0.0 4676 572 ? S 19:34 0:00 /bin/sh /usr/bi
root 3018 0.0 0.0 0 0 ? S 19:34 0:00 [flush-202:0]
mysql 3057 7.9 1.7 126808 13296 ? Sl 19:34 14:23 /usr/libexec/my
root 3089 0.0 0.0 9372 696 ? Ss 19:34 0:00 sendmail: accep
smmsp 3097 0.0 0.0 8284 336 ? Ss 19:34 0:00 sendmail: Queue
root 3106 0.0 0.0 2044 152 ? Ss 19:34 0:00 gpm -m /dev/inp
root 3115 0.0 0.1 27820 1320 ? Ss 19:34 0:00 /usr/sbin/httpd
root 3123 0.0 0.0 5380 552 ? Ss 19:34 0:00 crond
xfs 3141 0.0 0.0 3308 436 ? Ss 19:34 0:00 xfs -droppriv -
apache 3235 0.0 4.6 56924 35484 ? S 19:34 0:03 /usr/sbin/httpd
apache 3239 0.0 5.7 72740 44192 ? S 19:34 0:04 /usr/sbin/httpd
apache 3240 0.0 4.4 56636 34068 ? S 19:34 0:02 /usr/sbin/httpd
apache 3241 0.0 4.1 52836 31660 ? S 19:34 0:01 /usr/sbin/httpd
apache 3242 0.0 3.9 52800 30372 ? S 19:34 0:01 /usr/sbin/httpd
apache 3243 0.0 4.0 52788 31428 ? S 19:34 0:02 /usr/sbin/httpd
apache 3244 0.0 4.4 56924 34556 ? S 19:34 0:02 /usr/sbin/httpd
apache 3245 0.0 4.5 57196 34828 ? S 19:34 0:03 /usr/sbin/httpd
root 3264 0.0 0.0 2408 180 ? Ss 19:34 0:00 /usr/sbin/atd
root 3279 0.0 0.2 26680 2192 ? SN 19:34 0:00 /usr/bin/python
root 3281 0.0 0.0 2704 536 ? SN 19:34 0:00 /usr/libexec/ga
root 3282 0.0 0.2 19420 1660 ? Ss 19:34 0:00 /usr/bin/perl /
apache 3491 0.0 3.3 52792 26060 ? S 19:35 0:01 /usr/sbin/httpd
apache 3492 0.0 4.8 59656 37352 ? S 19:35 0:02 /usr/sbin/httpd
apache 3493 0.0 4.5 56956 34564 ? S 19:35 0:01 /usr/sbin/httpd
apache 3494 0.0 4.0 52788 31128 ? S 19:35 0:03 /usr/sbin/httpd
root 5343 0.0 0.0 3028 624 ? Ss 19:49 0:00 login -- root
root 5796 0.0 0.0 4808 604 hvc0 Ss 19:53 0:00 -bash
root 6054 0.0 0.0 0 0 ? S 19:55 0:00 [kworker/0:2]
root 6583 0.0 0.0 4320 352 hvc0 S+ 20:00 0:00 less
root 21913 0.0 0.0 0 0 ? S 21:53 0:00 [kworker/2:0]
root 22407 0.0 0.0 0 0 ? S 21:57 0:00 [kworker/3:0]
root 23117 0.0 0.0 0 0 ? S 22:02 0:00 [kworker/1:0]
root 24625 0.0 0.3 10264 2656 ? Ss 22:13 0:00 sshd: root@nott
root 24628 0.0 0.1 6692 1536 ? Ss 22:13 0:00 /usr/libexec/op
root 26703 0.0 0.3 10108 2932 ? Rs 22:28 0:00 sshd: root@pts/
root 26812 0.0 0.1 4812 1452 pts/0 Ss 22:29 0:00 -bash
root 27493 0.0 0.0 0 0 ? S 22:34 0:00 [kworker/1:2]
root 27494 0.0 0.1 4400 932 pts/0 R+ 22:34 0:00 ps aux


Free usage stats

[root@li21-298 ~]# free -m
total used free shared buffers cached
Mem: 750 497 252 0 6 77
-/+ buffers/cache: 413 336
Swap: 511 87 424


[root@li21-298 ~]# free
total used free shared buffers cached
Mem: 768004 490708 277296 0 8680 79892
-/+ buffers/cache: 402136 365868
Swap: 524284 73136 451148
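
One note on reading `free`: the "Mem:" row counts kernel buffers and page cache as used, so the row that matters is "-/+ buffers/cache". Extracting the memory actually available to applications from the same output:

```shell
# The "-/+ buffers/cache" row shows memory used by / available to
# applications once reclaimable kernel caches are discounted.
free_output='             total       used       free     shared    buffers     cached
Mem:        768004     490708     277296          0       8680      79892
-/+ buffers/cache:     402136     365868
Swap:       524284      73136     451148'

echo "$free_output" | awk '/buffers\/cache/ {print "available_kb=" $4}'
# -> available_kb=365868   (free + buffers + cached)
```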


List of running processes sorted by memory use

# ps -eo pmem,pcpu,rss,vsize,args | sort -k 1 -r | less

%MEM %CPU RSS VSZ COMMAND
4.9 0.0 38240 59656 /usr/sbin/httpd
4.7 0.0 36324 72740 /usr/sbin/httpd
4.6 0.0 35724 56940 /usr/sbin/httpd
4.6 0.0 35512 56924 /usr/sbin/httpd
4.2 0.0 32676 56924 /usr/sbin/httpd
4.2 0.0 32312 56380 /usr/sbin/httpd
3.8 0.0 29604 52800 /usr/sbin/httpd
3.7 0.0 29024 52792 /usr/sbin/httpd
3.7 0.0 28992 52788 /usr/sbin/httpd
3.7 0.0 28664 52788 /usr/sbin/httpd
1.8 7.8 13928 127624 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
0.9 0.0 7260 52836 /usr/sbin/httpd
0.5 0.0 4544 4548 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
0.3 0.0 2940 10108 sshd: root@pts/0
0.3 0.0 2680 26680 /usr/bin/python -tt /usr/sbin/yum-updatesd
0.2 0.0 2208 53236 /usr/sbin/httpd
0.2 0.0 1660 19420 /usr/bin/perl /usr/libexec/webmin/miniserv.pl /etc/webmin/miniserv.conf
0.1 0.0 1460 4812 -bash
0.1 0.0 1320 27820 /usr/sbin/httpd
(remaining output truncated)


Type of MPM in use by Apache

[root@li21-298 ~]# httpd -V | grep 'MPM'
Server MPM: Prefork
-D APACHE_MPM_DIR="server/mpm/prefork"

Current settings in my httpd.conf file (/etc/httpd/httpd.conf)

<IfModule prefork.c>
StartServers 8
MinSpareServers 5
MaxSpareServers 20
ServerLimit 256
MaxClients 256
MaxRequestsPerChild 4000
</IfModule>

Current settings in mysql (located /etc/my.cnf)

[mysqld]
max_allowed_packet=50M
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
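
(For what it's worth, that my.cnf sets no memory limits at all, so mysqld runs with defaults. The knobs people usually pin down on a box this size look something like the following; the values are illustrative guesses, not recommendations for this workload:

```ini
[mysqld]
# Illustrative memory caps for a ~768 MB VPS; tune against your own data.
max_connections         = 30      # each connection allocates per-thread buffers
key_buffer_size         = 16M     # MyISAM index cache
innodb_buffer_pool_size = 64M     # InnoDB data/index cache
query_cache_size        = 16M
tmp_table_size          = 16M
max_heap_table_size     = 16M
```
)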


PostPosted: Tue Oct 09, 2012 11:27 pm 
Senior Member

Joined: Fri Jan 09, 2009 5:32 pm
Posts: 634
Your MaxClients is way too high.


PostPosted: Tue Oct 09, 2012 11:36 pm 
Senior Newbie

Joined: Tue Oct 09, 2012 3:57 pm
Posts: 8
Thanks. What do you suggest MaxClients should be? I'm not sure I ever really understood that setting. We have Aegir spitting out Open Atrium sites. As I mentioned, we only have 3 sites, but we'd like to plan to scale to at least 50.


PostPosted: Wed Oct 10, 2012 1:25 am 
Senior Member

Joined: Sun Dec 27, 2009 11:12 pm
Posts: 1038
Location: Colorado, USA
Drupal is a resource hog; if you plan on scaling to 50+ sites, plan on getting more VPSes.

Search the forum for MaxClients - it's been discussed many many times.

_________________
Either provide enough details for people to help, or sit back and listen to the crickets chirp.
Security thru obscurity is a myth - and really really annoying.


PostPosted: Wed Oct 10, 2012 11:06 am 
Senior Newbie

Joined: Tue Oct 09, 2012 3:57 pm
Posts: 8
I'll research MaxClients a bit more. I agree Drupal is a resource hog, but I have a beefy server with 1GB RAM and just a few sites and a handful of users, yet I still get OOM kills at random times. See the Linode I/O graphs below. Things look nice and calm at the moment. I suspect Apache is the culprit, and/or maybe some PHP process that goes crazy at weird times. A hunch tells me I have a bunch of system-generated tasks piling up and banging things around, but I'm at a loss as to how to root-cause this problem and where to look. I've installed sysstat and run some basic queries. All looks pretty good, would you agree? But you can see from my history chart that things go crazy every once in a while. Any tips on how I can dig into this?

# iostat

avg-cpu: %user %nice %system %iowait %steal %idle
6.11 0.00 2.29 1.00 0.75 89.84

Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
xvda 11.79 289.42 58.40 509698 102848
xvdb 0.01 0.50 0.00 872 0

# iostat -d -x 2 5

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
xvda 0.65 4.58 8.64 3.02 281.50 60.80 29.36 0.18 15.87 4.58 5.34
xvdb 0.05 0.00 0.01 0.00 0.48 0.00 48.44 0.00 8.72 5.22 0.01

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
xvda 0.00 2.50 0.00 1.50 0.00 32.00 21.33 0.01 9.00 9.00 1.35
xvdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
xvda 0.00 5.50 0.00 11.50 0.00 136.00 11.83 0.75 65.39 14.30 16.45
xvdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
xvda 0.00 1.00 0.00 1.50 0.00 20.00 13.33 0.06 41.67 41.67 6.25
xvdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
xvda 0.00 2.00 0.00 4.50 0.00 52.00 11.56 0.11 25.22 25.22 11.35
xvdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

# mpstat

10:52:39 AM CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
10:52:39 AM all 6.24 0.00 2.17 0.95 0.00 0.15 0.72 89.77 1230.74

24Hr Avg I/O: [graph image]

30Day Avg I/O: [graph image]


PostPosted: Wed Oct 10, 2012 11:02 pm 
Senior Member

Joined: Sun Jan 18, 2009 2:41 pm
Posts: 830
In rough numbers, each of your Apache processes is using 4.5% of your Linode's memory. It takes only 23 simultaneous page requests to cause 23 simultaneous Apache processes to be started, using up all your memory and forcing your Linode to start swapping. When this will occur cannot be predicted in advance, because you don't know exactly when a flood of requests will arrive (unless you cause it yourself, of course).

Reduce MaxClients and ServerLimit to a number less than 20. If more requests than this number come in, they will enter a queue and wait patiently for the few milliseconds it takes the request ahead of them to be served. If your users experience significant delays in page loading, consider reducing KeepAliveTimeout or disabling keepalives entirely with KeepAlive Off.
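
To put numbers on that, a back-of-the-envelope sizing sketch (the figures come from the ps output earlier in the thread and are assumptions, not measurements of your exact box):

```shell
# How many ~32 MB Apache children fit after reserving headroom for MySQL,
# the kernel, and everything else?
total_kb=768004          # total RAM, from "free"
reserved_kb=250000       # MySQL + OS headroom, a guess
avg_httpd_kb=33000       # typical httpd child RSS above (~28-38 MB)

echo "MaxClients ~ $(( (total_kb - reserved_kb) / avg_httpd_kb ))"
# prints: MaxClients ~ 15
# On a live server, measure the average child size yourself with:
#   ps -C httpd -o rss= | awk '{s+=$1; n++} END {print s/n}'
```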


PostPosted: Thu Oct 11, 2012 6:32 am 
Senior Member

Joined: Sat Aug 30, 2008 1:55 pm
Posts: 1739
Location: Rochester, New York
Vance wrote:
It takes only 23 simultaneous page requests to cause 23 simultaneous Apache processes to be started


It's worse than that, even. 23 simultaneous requests, period. Watch your waterfall chart as you load a web page: you're doing more than one concurrent request per page, more often than not.

Alternate answer: PHP-FPM, if your distro supports it. mod_php ruins Apache's performance.
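
If you do go the PHP-FPM route, the pool config is where the equivalent memory cap lives. A minimal sketch (path and values assumed, not tested on a CentOS install of that vintage):

```ini
; /etc/php-fpm.d/www.conf (path assumed) -- illustrative values only
[www]
listen = 127.0.0.1:9000
user = apache
group = apache
pm = dynamic
pm.max_children = 10      ; hard cap on PHP workers, the analogue of MaxClients
pm.start_servers = 3
pm.min_spare_servers = 2
pm.max_spare_servers = 5
pm.max_requests = 500     ; recycle workers periodically to contain leaks
```

Apache then serves static files with cheap workers and only hands .php requests to the pool, so a burst of image/CSS requests no longer spawns heavyweight PHP processes.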

_________________
Code:
/* TODO: need to add signature to posts */


Powered by phpBB® Forum Software © phpBB Group