Linode Forum
Linode Community Forums
Posted: Thu Feb 20, 2014 6:32 pm
Offline
Senior Member

Joined: Thu Feb 20, 2014 5:06 pm
Posts: 58
My workplace is in the process of switching from dedicated servers to Linode for hosting our website, and I'm trying to decide which plan to choose for each server. This has been pretty straightforward for most of the servers, except for the one that'll host the main part of our site, which is a fairly complex Rails application. The application gets a lot of traffic, and I want to get a good idea of how Linode will perform before migrating. I decided to start by ordering a 4096 Linode and a 2048 Linode (both in the Newark, NJ datacenter), provisioning them identically using Puppet, and benchmarking them by running ApacheBench from a third Linode over the private IPs. Both Linodes are running CentOS 6.5 64-bit and have the same configuration profile.

To my surprise, the 2048 performed significantly better in terms of latency: average response times were about 60% higher on the 4096 Linode, and there was much higher variation. The 4096 performed better in terms of throughput since it could run more Passenger workers, but I'm more concerned about latency. The parts of the Rails application I'm benchmarking are mainly CPU-bound, so I'm guessing this is due to the noisy neighbor problem (i.e., the host the 4096 is on has a lot more tenants using the CPU than the host the 2048 is on). To check, I ran "sysbench --test=cpu" on both Linodes, and the results seem to confirm my suspicions:

4096 Linode:
Code:
[root@web masonm]# sysbench --test=cpu --cpu-max-prime=100000 --num-threads=2 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 2

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 100000


Test execution summary:
    total time:                          266.7814s
    total number of events:              10000
    total time taken by event execution: 533.4837
    per-request statistics:
         min:                                 33.73ms
         avg:                                 53.35ms
         max:                                202.86ms
         approx.  95 percentile:              89.52ms

Threads fairness:
    events (avg/stddev):           5000.0000/1.00
    execution time (avg/stddev):   266.7419/0.01


2048 Linode:
Code:
[root@web shared]# sysbench --test=cpu --cpu-max-prime=100000 --num-threads=2 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 2

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 100000


Test execution summary:
    total time:                          141.9363s
    total number of events:              10000
    total time taken by event execution: 283.8482
    per-request statistics:
         min:                                 27.67ms
         avg:                                 28.38ms
         max:                                 32.22ms
         approx.  95 percentile:              29.39ms

Threads fairness:
    events (avg/stddev):           5000.0000/0.00
    execution time (avg/stddev):   141.9241/0.00


Is there something else that can explain this? The next thing I'm going to try is switching datacenters, but it feels like I'm missing something.
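
To put the gap between the two runs into numbers, here is a minimal Python sketch that only divides the figures from the two sysbench summaries above. The dictionary keys and the `ratio` helper are names made up for illustration, not anything sysbench provides.

```python
# Figures copied by hand from the two sysbench summaries above.
results = {
    "4096": {"total_s": 266.7814, "avg_ms": 53.35, "p95_ms": 89.52, "max_ms": 202.86},
    "2048": {"total_s": 141.9363, "avg_ms": 28.38, "p95_ms": 29.39, "max_ms": 32.22},
}


def ratio(metric):
    """Return how many times larger the 4096 figure is than the 2048 figure."""
    return results["4096"][metric] / results["2048"][metric]


for metric in ("total_s", "avg_ms", "p95_ms", "max_ms"):
    print(f"{metric}: 4096 is {ratio(metric):.2f}x the 2048")
```

The average and total time differ by a bit under 2x, while the 95th percentile and maximum differ by considerably more, which matches the higher variation on the 4096.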

EDIT: I got the plans wrong in my original message: the server I said was an 8192 is actually a 4096, and the one I said was a 4096 is actually a 2048. Sorry for any confusion!


Last edited by masonm on Thu Feb 20, 2014 11:58 pm, edited 2 times in total.

Posted: Thu Feb 20, 2014 8:49 pm
Offline
Senior Member

Joined: Mon Jan 02, 2012 12:45 pm
Posts: 365
Have you asked Linode support about this?


Posted: Thu Feb 20, 2014 11:41 pm
Offline
Senior Member

Joined: Thu Feb 20, 2014 5:06 pm
Posts: 58
Main Street James wrote:
Have you asked Linode support about this?


No. I figured I'd ask here in case I was doing something stupid so I wouldn't bother support unnecessarily.


Posted: Fri Feb 21, 2014 12:02 am
Offline
Senior Member

Joined: Fri May 02, 2008 8:44 pm
Posts: 1121
masonm wrote:
sysbench --test=cpu --cpu-max-prime=100000 --num-threads=2 run


Perhaps the difference is due to the fact that your benchmark only uses 2 threads. Likewise, your Rails app probably uses only one thread per request.

A lot of things could be different between the host that houses your 2GB Linode and the one that houses your 4GB Linode. One of the possibilities is that the 4GB host has a larger number of slower CPUs.

This would slow down single-threaded CPU-bound apps, but the total amount of CPU that is shared among the tenants would be similar, and there would be half as many tenants on average. (The fact that you only see 8 cores in both cases is irrelevant because you're never supposed to max out all the cores.)

Linode has gone through many generations of servers, so I wouldn't be surprised if this were the case. And of course there could be noisy neighbors as you said.


Last edited by hybinet on Fri Feb 21, 2014 12:06 am, edited 2 times in total.

Posted: Fri Feb 21, 2014 12:03 am
Offline
Senior Member

Joined: Fri May 02, 2008 8:44 pm
Posts: 1121
deleted duplicate post


Posted: Fri Feb 21, 2014 4:27 am
Offline
Senior Member

Joined: Sat May 03, 2008 4:01 pm
Posts: 567
Website: http://www.mattnordhoff.com/
As hybinet brought up, the two servers may have different model CPUs. You can check that yourself with `cat /proc/cpuinfo`.
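
If you would rather have a one-line summary than the full dump, something like the following Python sketch should do. It assumes the usual Linux layout where each logical CPU has a `model name : ...` line; `cpu_models` is just a name picked for this example.

```python
from collections import Counter


def cpu_models(cpuinfo_text):
    """Count logical CPUs per 'model name' in /proc/cpuinfo-style text."""
    models = Counter()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            models[value.strip()] += 1
    return models


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            for model, count in cpu_models(f.read()).items():
                print(f"{count} x {model}")
    except FileNotFoundError:
        print("/proc/cpuinfo not found (not a Linux system?)")
```

Running it on both Linodes and comparing the output shows right away whether the hosts use the same CPU model.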

To see if noisy neighbors are limiting your CPU resources, run something CPU-intensive, open `top` or `htop`, and see if the "st" (steal) percentage is high.
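
If you want a number you can log rather than watching `top`, the same figure can be approximated from `/proc/stat`. This is a rough Python sketch, assuming the standard field order on the aggregate `cpu` line (user, nice, system, idle, iowait, irq, softirq, steal, ...). The function names and the one-second interval are arbitrary choices for illustration.

```python
import time


def read_cpu_times(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat-style text as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def steal_percent(before, after):
    """Steal time as a percentage of all CPU time between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    # Only the first eight fields are summed: guest and guest_nice are
    # already counted inside user and nice.
    total = sum(deltas[:8])
    steal = deltas[7] if len(deltas) > 7 else 0
    return 100.0 * steal / total if total else 0.0


def sample(path="/proc/stat"):
    """Read the current aggregate CPU counters."""
    with open(path) as f:
        return read_cpu_times(f.read())


if __name__ == "__main__":
    try:
        first = sample()
        time.sleep(1)  # assumed sampling interval
        print(f"steal: {steal_percent(first, sample()):.1f}%")
    except FileNotFoundError:
        print("/proc/stat not found (not a Linux system?)")
```

Run it while the CPU-intensive job is going; a single sample can be misleading, so take several.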

_________________
Matt Nordhoff (aka Peng on IRC)


Posted: Fri Feb 21, 2014 11:45 am
Offline
Senior Member

Joined: Thu Feb 20, 2014 5:06 pm
Posts: 58
mnordhoff wrote:
As hybinet brought up, the two servers may have different model CPUs. You can check that yourself with `cat /proc/cpuinfo`.


Yes, I meant to include that in my original post. The 4096 has a Xeon E5-2670, while the 2048 has a Xeon E5-2680 v2. From some Googling, it looks like the E5-2680 v2 is a bit faster, but not nearly enough to account for the differences I'm seeing.

Quote:
To see if noisy neighbors are limiting your CPU resources, run something CPU-intensive, open `top` or `htop`, and see if the "st" (steal) percentage is high.


Cool, I didn't know about the steal percentage metric. I ran sysbench again while monitoring the steal percentage on both hosts. On the 2048 it never went above 1%, while on the 4096 it fluctuated widely from ~6% to ~70%. That pretty much clinches it. I'm going to file a support ticket to have the 4096 moved to California and hope I have better neighbors this time.

Thanks for your help, hybinet and mnordhoff!


Posted: Fri Feb 21, 2014 2:08 pm
Offline
Senior Member

Joined: Tue May 26, 2009 3:29 pm
Posts: 1691
Location: Montreal, QC
Keep in mind that the "v2" in the model number means that it's a different generation: Ivy Bridge vs. Sandy Bridge. On top of that, the E5-2670 is an 8-core CPU and the E5-2680 v2 is a 10-core CPU. And then the clock speed is higher...

Benchmarks seem to indicate a ~10% performance improvement from the newer architecture. On top of that you've got an ~8% improvement in clock speed and a 25% improvement in core count. Multiplied together, that should produce a ~48% performance improvement, which would seem to account for a good chunk of the difference you're seeing.
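
For anyone checking the arithmetic, the three gains compound by multiplication rather than addition. The Python sketch below uses only the estimates quoted in this post and the sysbench averages from earlier in the thread; the variable names are made up for illustration. The second product, which leaves out the core-count term, is included on the assumption that extra cores may not help a fixed two-thread run, not as a measured result.

```python
# Estimates quoted in the post above, not independent measurements.
arch_gain = 0.10   # Ivy Bridge vs. Sandy Bridge
clock_gain = 0.08  # clock speed
core_gain = 0.25   # 10 cores vs. 8 cores

# Independent speedups multiply.
combined = (1 + arch_gain) * (1 + clock_gain) * (1 + core_gain) - 1

# Assumption for comparison only: drop the core-count term.
per_thread = (1 + arch_gain) * (1 + clock_gain) - 1

# Observed gap in average event time from the sysbench runs in this thread.
observed = 53.35 / 28.38 - 1

print(f"combined estimate:       {combined:.1%}")
print(f"without core-count term: {per_thread:.1%}")
print(f"observed sysbench gap:   {observed:.1%}")
```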

