Linode Forum
Linode Community Forums
 FAQFAQ    SearchSearch    MembersMembers      Register Register 
 LoginLogin [ Anonymous ] 
Post new topic  Reply to topic
Author Message
PostPosted: Fri Mar 25, 2011 8:38 pm 
Offline
Junior Member

Joined: Wed Feb 09, 2011 7:41 am
Posts: 28
Hello,

We recently moved one of our dedicated servers to a Linode 4096 and the performance was great for the past week or two, with the site having a load average of less than 0.10 during the day and down to 0.05 in the evening.

This evening, the load is up to 0.50 under the same level of user activity (we've not had a traffic spike or anything)

So I'm wondering where we start investigating what's increasing the load.

This is what collectl -10 -sCDN reports

Code:
# SINGLE CPU STATISTICS
#   Cpu  User Nice  Sys Wait IRQ  Soft Steal Idle
      0     3    0    1    0    0    0     0   94
      1     0    0    0    0    0    0     0   99
      2     0    0    0    0    0    0     0   98
      3     0    0    0    0    0    0     0   99

# DISK STATISTICS (/sec)
#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
xvda             1      0    0    4      67      5   12    6       5     0    79      3    4
xvdb             0      0    0    0       0      0    0    0       0     0     0      0    0


As you can see the CPU is doing pretty much nothing, but there's wait on the (non-swap) disk. The site is serving perhaps 1-2 pages a second. Conversely another smaller Linode server delivering 20+ banner ads per second has zero wait.

Here's the output of top:

Code:
top - 08:53:16 up 19:01,  4 users,  load average: 0.63, 0.69, 0.99
Tasks: 113 total,   1 running, 112 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.3%us,  0.0%sy,  0.0%ni, 86.1%id, 13.4%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4194508k total,  1791364k used,  2403144k free,   149148k buffers
Swap:   262136k total,   104808k used,   157328k free,   430576k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                               
 8803 mysql     15   0  717m 214m 4284 S  0.3  5.2  27:00.06 mysqld                                                                 
16741 nobody    16   0  521m  94m 4752 S  0.3  2.3   0:01.24 httpd                                                                 
15766 nobody    16   0  524m  97m 4896 S  0.2  2.4   0:01.59 httpd                                                                 
16739 nobody    16   0  522m  94m 4756 S  0.2  2.3   0:00.56 httpd                                                                 
16740 nobody    16   0  522m  94m 4768 S  0.2  2.3   0:00.41 httpd                                                                 
15767 nobody    16   0  523m  95m 4888 S  0.1  2.3   0:01.82 httpd                                                                 
16375 nobody    16   0  522m  94m 4792 S  0.1  2.3   0:00.93 httpd                                                                 
17988 root      15   0 12744 1272  932 R  0.1  0.0   0:00.04 top                                                                   
    1 root      15   0 10348  188  160 S  0.0  0.0   0:09.98 init                                                                   
    2 root      RT   0     0    0    0 S  0.0  0.0   0:00.24 migration/0                                                           
    3 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0                                                           
    4 root      RT   0     0    0    0 S  0.0  0.0   0:00.29 migration/1                                                           
    5 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/1     


Again, the CPU isn't doing much, and there's plenty of spare memory.

Do you know how I can find what process is associated with the disk wait ?

The server is running Centos 5.5, MySQL 5.5, Apache 2.2 + mod_perl.

Here's the my.cnf, SHOW PROCESSLIST just shows 12 connections (one for each httpd process)

Code:
key_buffer = 192M
table_cache = 512
sort_buffer_size = 2M
read_buffer_size = 1M
join_buffer_size = 1M
read_rnd_buffer_size = 4M
myisam_sort_buffer_size = 64M
thread_cache_size = 32
table_open_cache = 512

query_cache_type = 1
query_cache_size = 24M
query_cache_limit = 2M

max_heap_table_size = 56M
tmp_table_size = 48M

thread_concurrency = 8


Here's the httpd.conf, there are never more than 12-13 httpd processes, so it's not running out of those IMO.

Code:
 
  StartServers 10
  MinSpareServers 5
  MaxSpareServers 15
  MaxClients 45
  MaxRequestsPerChild 200


Top
   
 Post subject:
PostPosted: Sat Mar 26, 2011 5:11 am 
Offline
Senior Member

Joined: Sat Mar 12, 2011 3:43 am
Posts: 76
Location: Russia
Hello.

Maybe your scripts does something with disk? I don't know, well... maybe some previews generating (just for example).

Your mysql-settings looks very modest for 4096 RAM :)
key_buffer can be increased, if necessary (as minimum take 256M for this conf, as max - 1024M, but I doubt that it's necessary).
Also can be increased sort_buffer_size, read_buffer_size, query_cache_size and query_cache_limit (definitely).

This script: http://www.day32.com/MySQL/tuning-primer.sh
can analyze your mysql values and give more concrete tips about tuning.
Or this one: http://www.ubuntugeek.com/mysqltuner-ch ... mance.html
but I prefer first variant :)

Also, check error logs of Apache - maybe somebody parses your site for vulnerability.


Last edited by OZ on Sat Mar 26, 2011 5:25 am, edited 1 time in total.

Top
   
 Post subject:
PostPosted: Sat Mar 26, 2011 5:22 am 
Offline
Junior Member

Joined: Wed Feb 09, 2011 7:41 am
Posts: 28
Thanks for taking time to respond Oz

If it was that MySQL didn't have enough resources then surely the disks would be busy reading stuff that's not in memory.

My problem is that the CPU is idle, the disks are idle doing "waits"


I use mysqltuner.pl all the time and it didn't have any suggestions that would relate to my problem (IMO)

Here's what it says

[--] Skipped version check for MySQLTuner script
[OK] Currently running supported MySQL version 5.5.8
[OK] Operating on 64-bit architecture

-------- Storage Engine Statistics -------------------------------------------
[--] Status: +Archive -BDB -Federated +InnoDB -ISAM -NDBCluster
[--] Data in MyISAM tables: 642M (Tables: 97)
[--] Data in InnoDB tables: 560K (Tables: 11)
[--] Data in PERFORMANCE_SCHEMA tables: 0B (Tables: 17)
[--] Data in MEMORY tables: 2M (Tables: 6)
[--] Data in SPHINX tables: 0B (Tables: 2)
[!!] Total fragmented tables: 24


-------- Performance Metrics -------------------------------------------------
[--] Up for: 18h 39m 43s (1M q [22.085 qps], 2K conn, TX: 4B, RX: 296M)
[--] Reads / Writes: 28% / 72%
[--] Total buffers: 408.0M global + 8.2M per thread (151 max threads)
[OK] Maximum possible memory usage: 1.6G (40% of installed RAM)
[OK] Slow queries: 0% (10/1M)
[OK] Highest usage of available connections: 12% (19/151)
[OK] Key buffer size / total MyISAM indexes: 192.0M/410.1M
[OK] Key buffer hit rate: 100.0% (465M cached / 117K reads)
[OK] Query cache efficiency: 32.2% (178K cached / 552K selects)
[!!] Query cache prunes per day: 26487
[OK] Sorts requiring temporary tables: 0% (0 temp sorts / 96K sorts)
[OK] Temporary tables created on disk: 8% (5K on disk / 58K total)
[OK] Thread cache hit rate: 99% (19 created / 2K connections)
[OK] Table cache hit rate: 96% (335 open / 346 opened)
[OK] Open file limit used: 34% (405/1K)
[OK] Table locks acquired immediately: 99% (1M immediate / 1M locks)
[OK] InnoDB data size / buffer pool: 560.0K/128.0M

Variables to adjust:
query_cache_size (> 24M)

Having a query_cache_size too small wouldn't result in IO waits.


Last edited by exiges on Sat Mar 26, 2011 5:30 am, edited 1 time in total.

Top
   
 Post subject:
PostPosted: Sat Mar 26, 2011 5:28 am 
Offline
Senior Member

Joined: Sat Mar 12, 2011 3:43 am
Posts: 76
Location: Russia
Quote:
[OK] Key buffer size / total MyISAM indexes: 192.0M/410.1M

Your buffer is smaller than indexes, set key_buffer to 512M
Quote:
[OK] Query cache efficiency: 32.2% (178K cached / 552K selects)

It's not 'OK', this value should be around 70-80%.
Quote:
Having a query_cache_size too small wouldn't result in IO waits.

mmm.... not enough buffers/cache size it's reason for full scan of table, it's reading from disk.


Last edited by OZ on Sat Mar 26, 2011 5:52 am, edited 1 time in total.

Top
   
 Post subject:
PostPosted: Sat Mar 26, 2011 5:35 am 
Offline
Junior Member

Joined: Wed Feb 09, 2011 7:41 am
Posts: 28
OZ wrote:
Quote:
[OK] Key buffer size / total MyISAM indexes: 192.0M/410.1M

Your buffer is smallest than indexes, set key_buffer to 512M
Quote:
Having a query_cache_size too small wouldn't result in IO waits.

mmm.... not enough buffers/cache size it's reason for full scan of table, it's reading from disk.


To be sure I've now increased the key_buffer_size but I'm inclined to think it's something else because prior to moving it to Linode the server had a 192MB setting for the last 3yrs without issue (and the dataset is the same size)

If it was a case of not enough buffers and it was reading from disk then I would expect to see the disks busy in "svctime", rather than "wait"

Code:
DISK STATISTICS (/sec) 
#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
xvda             1      0    0    4      67      5   12    6       5     0    79      3    4
xvdb             0      0    0    0       0      0    0    0       0     0     0      0    0


Top
   
 Post subject:
PostPosted: Sat Mar 26, 2011 5:42 am 
Offline
Senior Member

Joined: Sat Mar 12, 2011 3:43 am
Posts: 76
Location: Russia
I can't (and don't want) to argue, I can only make assumptions.

After changing values in my.cnf mysql should be restarted.
If reason not in mysql, check error_logs of apache.


Top
   
 Post subject:
PostPosted: Sat Mar 26, 2011 5:49 am 
Offline
Junior Member

Joined: Wed Feb 09, 2011 7:41 am
Posts: 28
OZ wrote:
I can't (and don't want) to argue, I can only make assumptions.


Your help is appreciated :D


Top
   
 Post subject:
PostPosted: Sat Mar 26, 2011 6:01 am 
Offline
Senior Member

Joined: Sat Mar 12, 2011 3:43 am
Posts: 76
Location: Russia
I hope somebody else will help you more effectively.
As last advice: try to use 'iotop' to monitor IO using.


Top
   
 Post subject:
PostPosted: Sat Mar 26, 2011 9:01 am 
Offline
Senior Member
User avatar

Joined: Fri Oct 24, 2003 3:51 pm
Posts: 965
Location: Netherlands
You might find that the utilities in the sysstat suite give you a clearer picture of what is going on.

_________________
/ Peter


Top
   
 Post subject:
PostPosted: Sat Mar 26, 2011 10:07 am 
Offline
Junior Member

Joined: Wed Feb 09, 2011 7:41 am
Posts: 28
I've raised a ticket with Linode support to see if they can investigate the suspected disk performance issue.

Here's an exerpt from iostat -x 10

Code:
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
xvda 0.00 24.60 1.60 18.20 18.40 325.60 17.37 2.51 124.97 12.78 25.31
xvdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00


As I understand it, await is the time to satisfy IO requests including the time service the request (svctm), and according to manpages await is supposed to be very close to svctm.

Looking at my iostat figures, await - svctm is up to 100ms, a long time.


Top
   
 Post subject:
PostPosted: Sat Mar 26, 2011 1:45 pm 
Offline
Junior Member

Joined: Wed Feb 09, 2011 7:41 am
Posts: 28
Update:

Turns out it was some disk performance issue.

I contacted Linode. Within 5 minutes I had a reply asking for some IOSTAT information. After supplying that information I had another reply within 5 minutes saying an engineer was being dispatched to investigate.

I enquired whether it was possible to migrate my Linode to another physical server, and again had an almost immediate reply saying that my Linode had been configured for migration and I just needed to press the button to get the server moved.

20 minutes later my Linode was running on another machine perfectly.

iostat -x 10 for the new Linode is:

await: 1.29 svctm: 1.29

as you can see, the response time from the disk is now 1.29ms of which all is spend servicing the request.

Compare that with the old server which spent over 100ms "waiting" over and above the service time (svctm)


Top
   
 Post subject:
PostPosted: Tue Mar 29, 2011 8:14 am 
Offline
Newbie

Joined: Tue Mar 29, 2011 8:06 am
Posts: 2
It is a shame more people haven't discovered collectl. When I wrote it I was annoyed by so many utilities, all with different switches and capabilities. It is good that exiges has seen the light!

The funny thing about monitoring is the data that collectl reports is identical to that of sysstat tools like sar, mpstat, iostat, etc, but in my opinion is much more readable. At least the columns for disk stats line up and if you try "collectl -sD --home", you'll see something more like 'top' format only for disks. If you've ever tried to use nfsstat, you'd immediately see the value in collectl for monitoring nfs servers/clients.

But it also does so much more, like making data available for plotting or loading into a spreadsheet PLUS it can collectl a whole lot more than just the basic stuff too.

The other real key is collectl can log to disk and play the data back, like sar, but again much more user friendly - at least in my opinion. Perhaps that's what some of the world's largest high performance clusters use it.

-mark


Top
   
Display posts from previous:  Sort by  
Post new topic  Reply to topic


Who is online

Users browsing this forum: No registered users and 1 guest


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum

Search for:
Jump to:  
RSS

Powered by phpBB® Forum Software © phpBB Group