Linode Community Forums
 Post subject: Diagnosing Lag Spikes
Posted: Sun Oct 03, 2010 11:31 am
Newbie

Joined: Sun Oct 03, 2010 11:26 am
Posts: 2
I'm running a MUD on my VPS, and for the past couple of weeks I've been noticing pretty pronounced lag spikes. I started profiling the MUD with gprof and added benchmark timers to the code to further profile major loops, database functions, file I/O, etc.

I can't find anything in the MUD itself that would be causing the lag spikes, so I'm turning to the VPS itself or something else running on it. Even in a shell over SSH, I notice random bursts of lag, so I can confirm that the problem is affecting both me in my shell and my MUD players.

I've been looking at "top" to measure CPU and memory usage, but I'm not finding anything out of the ordinary.

Are there any tools or utilities out there I can use to keep track of various system stats over time?

Anyone have any suggestions on how to otherwise track down or diagnose problems like this?


Posted: Sun Oct 03, 2010 4:20 pm
Senior Newbie

Joined: Tue Jan 26, 2010 4:25 pm
Posts: 10
I am also having problems with unexpected load. My VPS used to run idle most of the time, but not anymore; any use seems to slow everything down. I also use Linode backups, and the time it takes to do backups has increased 4-5 fold. I opened a ticket twice and was told everything is OK despite my high disk I/O.

You wouldn't happen to be on dallas152?


Posted: Sun Oct 03, 2010 5:59 pm
Senior Member

Joined: Fri Feb 18, 2005 4:09 pm
Posts: 594
crazylane wrote:
You wouldn't happen to be on dallas152?


zunzun.com is on dallas105 and is seeing very bad lag spikes and lost packets. I thought this was on my end at first, but according to my tests here it isn't; only Linode is affected.

James


Posted: Sun Oct 03, 2010 6:09 pm
Newbie

Joined: Sun Oct 03, 2010 11:26 am
Posts: 2
crazylane wrote:
You wouldn't happen to be on dallas152?


Nope, atlanta8.

I haven't done much digging into network stats yet. I can't find any hints in CPU usage, memory usage, or file I/O though, despite the fact that my SSH connection lags at the same time players on my MUD complain, so something is definitely up.


Posted: Sun Oct 03, 2010 6:20 pm
Senior Newbie

Joined: Tue Jan 26, 2010 4:25 pm
Posts: 10
The best answer I received from support was that they can move me to a different server. I would rather have a better answer than that; I do not want the downtime at all.


Posted: Sun Oct 03, 2010 6:22 pm
Senior Newbie

Joined: Tue Jan 26, 2010 4:25 pm
Posts: 10
Here is my current top:


top - 18:21:03 up 1 day, 9:22, 1 user, load average: 14.26, 10.75, 7.53
Tasks: 190 total, 2 running, 187 sleeping, 0 stopped, 1 zombie
Cpu(s): 1.2%us, 0.9%sy, 0.1%ni, 41.4%id, 56.4%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1549180k total, 1247216k used, 301964k free, 153940k buffers
Swap: 524280k total, 4180k used, 520100k free, 765600k cached

12017 apache    20   0 34708  17m 3716 R  4.6  1.2   0:00.23 httpd
 2889 mysql     20   0  246m  95m 5340 S  3.3  6.3  64:45.48 mysqld
10947 apache    20   0 35316  18m 4752 S  0.7  1.2   0:01.29 httpd
11478 apache    20   0 34156  17m 4292 S  0.7  1.2   0:00.55 httpd
12018 apache    20   0 34164  17m 3812 S  0.7  1.1   0:00.22 httpd
 9488 root      20   0 13408 9928 2380 S  0.3  0.6   0:00.51 backup.pl
10686 root      20   0 13140  10m 1560 S  0.3  0.7   0:28.26 lfd
11622 apache    20   0 34208  17m 4412 S  0.3  1.2   0:00.38 httpd
22437 root      39  19  1968  648  284 S  0.3  0.0   1:43.58 gzip
25285 root      20   0 28604  14m 4724 S  0.3  1.0   0:28.19 httpd
28122 root      20   0  2416 1184  828 R  0.3  0.1   0:22.29 top
    1 root      20   0  2152  604  560 S  0.0  0.0   0:00.38 init


Posted: Sun Oct 03, 2010 7:02 pm
Senior Member

Joined: Tue Nov 24, 2009 1:59 pm
Posts: 362
No idea if this is related, but for the past few days (not sure exactly; less than a week, though I might not have been watching dstat during these loads earlier) I've been noticing much heavier I/O contention. Whenever I hit data that's not in cache, a whole CPU core gets stuck in iowait, and the disk read speed is measured in hundreds of KB rather than tens of MB as before. These are the exact same files (a large database used once a day), so it's something in the host's disk subsystem. Even launching emacs seems to get stuck in iowait for quite a few seconds, and this is not my SSH connection, because the other half of my split screen keeps scrolling the dstat output.
I'm on newark10.

Try running one of the line-per-second monitoring tools (maybe in a screen session?), like 'vmstat 1' or 'dstat -c', and look for large amounts of CPU time spent in iowaits when the stalls happen.
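
If you would rather log `vmstat 1` to a file and scan it afterwards, here is a minimal Python sketch of that check. It assumes the text has the usual vmstat layout with a header row naming the `b` and `wa` columns. The function names and the 50% threshold are illustrative choices of mine, not anything vmstat itself provides.

```python
def iowait_samples(vmstat_output):
    """Return a list of (blocked_procs, iowait_percent) per vmstat sample row."""
    columns = None
    samples = []
    for line in vmstat_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The header row is the one that names the columns (r, b, ..., wa).
        if "wa" in fields and "b" in fields:
            columns = fields
            continue
        # Skip anything that is not a purely numeric row matching the header.
        if columns is None or len(fields) != len(columns):
            continue
        if not all(f.isdigit() for f in fields):
            continue
        row = dict(zip(columns, (int(f) for f in fields)))
        samples.append((row["b"], row["wa"]))
    return samples


def stalled(samples, threshold=50):
    """Keep only the samples whose iowait is at or above the threshold."""
    return [s for s in samples if s[1] >= threshold]
```

Feed it the captured text, for example the contents of a file written by `vmstat 1 > vmstat.log`, and compare the timestamps of the stalls you felt against the rows that `stalled` returns.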


Posted: Sun Oct 03, 2010 8:15 pm
Senior Newbie

Joined: Tue Jan 26, 2010 4:25 pm
Posts: 10
Anything intensive (rsync, tar, gzip) just totally kills the VPS.


procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  5   3956 270452 170008 785656    0    0    23    16    3    5  1  0 75 24  0
 0  6   3956 265384 170044 785632    0    0    36     0  824  401  1  1 30 69  0
 0  7   3956 261408 170088 785700    0    0    32    92  370  164  0  0 42 57  0
 0  9   3956 254920 170120 785668    0    0    32     0  508  168  1  1  9 89  0
 0  9   3956 252740 170160 785700    0    0    40     0  212   91  0  0 35 65  0
 1  9   3956 243120 170172 785700    0    0    12     0  406  162  1  1  0 98  0
 0 10   3956 236460 170188 785700    0    0    16     0  972  501  1  1 41 56  0
 0 11   3956 236460 170224 785664    0    0    32   236  166   82  0  0  0 100 0
 0 11   3956 236460 170260 785700    0    0    36     0   43   71  0  0 38 62  0
 0 11   3956 236460 170284 785676    0    0    24     0   45   54  0  0  0 100 0
 0 11   3956 236540 170320 785704    0    0    36     0   69   71  0  0 40 60  0
 0 11   3956 236540 170356 785668    0    0    32     8   75   60  0  0  0 100 0
 0 11   3956 236572 170368 785704    0    0    12     4   44   49  0  0 40 60  0
 0 10   3956 236572 170380 785692    0    0    12     0   47   43  0  0  0 100 0
 0 10   3956 236760 170404 785704    0    0    24     0   45   58  0  0 39 61  0
 1 10   3956 236760 170420 785688    0    0    16    32   48   48  0  0 12 88  0
 0 10   3956 236760 170440 785704    0    0    20     0   49   48  0  0 35 65  0
 0 10   3956 235140 170468 785676    0    0    36    40  169   56  0  0  0 99  0
 0  6   3956 235320 170504 785712    0    0    36     0   87   83  0  0 38 62  0
 0  6   3956 235196 170528 785688    0    0    24     0   61   54  0  0  0 100 0
 0  5   3956 238056 170572 785708    0    0    44     0  224  114  0  0 40 60  0
 1  5   3956 236320 170608 785712    0    0    36     0  233  118  0  0  0 99  0


Posted: Mon Oct 04, 2010 10:15 am
Senior Member

Joined: Tue May 26, 2009 3:29 pm
Posts: 1691
Location: Montreal, QC
Well, you really should take up Linode on their offer to move you to another host. If somebody else on the box is hitting the disk pretty hard, why would you want to stay on that box?


Posted: Mon Oct 04, 2010 11:33 am
Senior Member

Joined: Tue Nov 24, 2009 1:59 pm
Posts: 362
Code:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  5   3956 270452 170008 785656    0    0    23    16    3    5  1  0 75 24  0
 0  6   3956 265384 170044 785632    0    0    36     0  824  401  1  1 30 69  0
 0  7   3956 261408 170088 785700    0    0    32    92  370  164  0  0 42 57  0
 0  9   3956 254920 170120 785668    0    0    32     0  508  168  1  1  9 89  0
 0  9   3956 252740 170160 785700    0    0    40     0  212   91  0  0 35 65  0
 1  9   3956 243120 170172 785700    0    0    12     0  406  162  1  1  0 98  0
 0 10   3956 236460 170188 785700    0    0    16     0  972  501  1  1 41 56  0
 0 11   3956 236460 170224 785664    0    0    32   236  166   82  0  0  0 100 0
 0 11   3956 236460 170260 785700    0    0    36     0   43   71  0  0 38 62  0
 0 11   3956 236460 170284 785676    0    0    24     0   45   54  0  0  0 100 0
 0 11   3956 236540 170320 785704    0    0    36     0   69   71  0  0 40 60  0
 0 11   3956 236540 170356 785668    0    0    32     8   75   60  0  0  0 100 0
 0 11   3956 236572 170368 785704    0    0    12     4   44   49  0  0 40 60  0
 0 10   3956 236572 170380 785692    0    0    12     0   47   43  0  0  0 100 0
 0 10   3956 236760 170404 785704    0    0    24     0   45   58  0  0 39 61  0
 1 10   3956 236760 170420 785688    0    0    16    32   48   48  0  0 12 88  0
 0 10   3956 236760 170440 785704    0    0    20     0   49   48  0  0 35 65  0
 0 10   3956 235140 170468 785676    0    0    36    40  169   56  0  0  0 99  0
 0  6   3956 235320 170504 785712    0    0    36     0   87   83  0  0 38 62  0
 0  6   3956 235196 170528 785688    0    0    24     0   61   54  0  0  0 100 0
 0  5   3956 238056 170572 785708    0    0    44     0  224  114  0  0 40 60  0
 1  5   3956 236320 170608 785712    0    0    36     0  233  118  0  0  0 99  0


Yep, iowaits. You can try complaining so the Linode staff locates whoever's the disk hog and convinces him to stop, or accept the host switch offer.

And I thought I had it bad with 25-30% in waits... you seem to be having 60-100%... >.>
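
For reference, the iowait percentages that tools like top and vmstat show can be approximated from two snapshots of the aggregate `cpu` line in `/proc/stat`, taken some interval apart. Below is a rough Python sketch of that arithmetic. The function names are hypothetical, and it assumes the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal); the trailing guest fields are ignored because guest time is already counted under user.

```python
def cpu_ticks(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counts."""
    fields = stat_line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    # Take at most user..steal (8 values); guest columns are skipped on purpose.
    ticks = [int(v) for v in fields[1:9]]
    if len(ticks) < 5:
        raise ValueError("line has no iowait field")
    return ticks


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two 'cpu' line snapshots."""
    old, new = cpu_ticks(before), cpu_ticks(after)
    deltas = [n - o for o, n in zip(old, new)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    # Index 4 is the iowait counter in the assumed field order.
    return 100.0 * deltas[4] / total
```

To use it, read the first line of `/proc/stat`, wait a second, read it again, and pass both strings to `iowait_percent`. Appending the result to a log with a timestamp gives a cheap record to line up against the lag spikes.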

PS. [ code ] tag is useful.

