Linode Forum
Linode Community Forums
Posted: Mon Jan 16, 2012 12:12 pm
Senior Newbie

Joined: Mon Jan 16, 2012 12:04 pm
Posts: 11
Location: United States
I am reaching out for some help here. I have spent quite a few hours searching this forum and Google to work out what the issue may be, without much success. I have the following setup:

(1) - One load balancer doing a round robin with least connections to the following two nodes
(2) - 2GB nodes, each running Nginx + php-fpm + APC and connected to a separate MySQL instance running on a dedicated machine over a local IP address. Each node has (4) dedicated cores.

I am running a WordPress site on each node; basically, each node runs an identical copy of the same WordPress PHP code.

Everything is running as it should, but the CPU load is very high. We are averaging the following load on the nodes:

Node 1: Load average: 4.11 3.84 3.95
Node 2: Load average: 4.20 3.94 4.95

Why are the cores spiking and holding the load? The whole website gets about 2.1 million requests a day, balanced over the (2) nodes.
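
For a rough sense of scale, the daily figure can be converted into an average request rate per node. This is only a back-of-the-envelope sketch: it assumes traffic is spread evenly over the day and over both nodes, which real traffic is not (peaks will be higher), and not every request necessarily reaches PHP. The function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rps(requests_per_day: float, nodes: int) -> float:
    """Average requests per second handled by each node.

    Assumes a perfectly even spread over time and over nodes,
    which is an idealization rather than a measurement.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return requests_per_day / SECONDS_PER_DAY / nodes


if __name__ == "__main__":
    per_node = average_rps(2_100_000, nodes=2)
    print(f"{per_node:.1f} requests/s per node on average")
```

That works out to roughly 12 requests per second per node on average, so the interesting question is how much CPU time each request costs.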

Is this an nginx configuration or php-fpm issue, or is it just a matter of needing to add one or two more nodes?

Thank you in advance for your help.
Dave

My nginx.conf

[ddavtian@mobilefood-1 nginx]$ more nginx.conf

Code:
    #######################################################################
    #
    # This is the main Nginx configuration file. 
    #
    # More information about the configuration options is available on
    #   * the English wiki - http://wiki.nginx.org/Main
    #   * the Russian documentation - http://sysoev.ru/nginx/
    #
    #######################################################################
   
    #----------------------------------------------------------------------
    # Main Module - directives that cover basic functionality
    #
    #   http://wiki.nginx.org/NginxHttpMainModule
    #
    #----------------------------------------------------------------------
   
    user              nginx;
    worker_processes  4;
    worker_rlimit_nofile 30000;
   
    error_log  /var/log/nginx/error.log;
    #error_log  /var/log/nginx/error.log  notice;
    #error_log  /var/log/nginx/error.log  info;
   
    pid        /var/run/nginx.pid;
   
   
    #----------------------------------------------------------------------
    # Events Module
    #
    #   http://wiki.nginx.org/NginxHttpEventsModule
    #
    #----------------------------------------------------------------------
   
    events {
        worker_connections  1024;
    }
   
   
    #----------------------------------------------------------------------
    # HTTP Core Module
    #
    #   http://wiki.nginx.org/NginxHttpCoreModule
    #
    #----------------------------------------------------------------------
   
    http {
        include       /etc/nginx/mime.types;
        default_type  application/octet-stream;
   
        log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                          '$status $body_bytes_sent "$http_referer" '
                          '"$http_user_agent" "$http_x_forwarded_for"';
   
        #access_log  /var/log/nginx/access.log  main;
   
        sendfile        on;
        #tcp_nopush     on;
   
        #keepalive_timeout  0;
        keepalive_timeout  15;
   
        #gzip  on;
       
        # Load config files from the /etc/nginx/conf.d directory
        # The default server is in conf.d/default.conf
        include /etc/nginx/conf.d/*.conf;
    }



My fastcgi_params

Code:
    
    fastcgi_param  QUERY_STRING       $query_string;
    fastcgi_param  REQUEST_METHOD     $request_method;
    fastcgi_param  CONTENT_TYPE       $content_type;
    fastcgi_param  CONTENT_LENGTH     $content_length;
   
    fastcgi_param  SCRIPT_NAME        $fastcgi_script_name;
    fastcgi_param  REQUEST_URI        $request_uri;
    fastcgi_param  DOCUMENT_URI       $document_uri;
    fastcgi_param  DOCUMENT_ROOT      $document_root;
    fastcgi_param  SERVER_PROTOCOL    $server_protocol;
   
    fastcgi_param  GATEWAY_INTERFACE  CGI/1.1;
    fastcgi_param  SERVER_SOFTWARE    nginx/$nginx_version;
   
    fastcgi_param  REMOTE_ADDR        $remote_addr;
    fastcgi_param  REMOTE_PORT        $remote_port;
    fastcgi_param  SERVER_ADDR        $server_addr;
    fastcgi_param  SERVER_PORT        $server_port;
    fastcgi_param  SERVER_NAME        $server_name;
   
    fastcgi_connect_timeout 60;
    fastcgi_send_timeout 180;
    fastcgi_read_timeout 180;
    fastcgi_buffer_size 128k;
    fastcgi_buffers 4 256k;
    fastcgi_busy_buffers_size 256k;
    fastcgi_temp_file_write_size 256k;
    fastcgi_intercept_errors on;
   
    # PHP only, required if PHP was built with --enable-force-cgi-redirect
    fastcgi_param  REDIRECT_STATUS    200;



My default.conf

[ddavtian@mobilefood-1 conf.d]$ more default.conf

Code:

    #
    # The default server
    #
    server {
        listen       80;
        server_name  mobilefoodblog.com;
   
        #charset koi8-r;
       
        access_log off;
        #access_log  logs/host.access.log  main;
   
        location /nginx_status {
         stub_status on;
         access_log   off;
         allow 67.23.12.32;
         deny all;
        }
   
        location / {
            root   /var/www/html/blog;
            index  index.php index.html index.htm;
            try_files $uri $uri/ /index.php?q=$uri&$args;
        }
   
        error_page  404              /404.html;
        location = /404.html {
            root   /usr/share/nginx/html;
        }
   
        # redirect server error pages to the static page /50x.html
        #
        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   /usr/share/nginx/html;
        }
   
        # proxy the PHP scripts to Apache listening on 127.0.0.1:80
        #
        #location ~ \.php$ {
        #    proxy_pass   http://127.0.0.1;
        #}
   
        # pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000
        #
        #location ~ \.php$ {
        #    root           html;
        #    fastcgi_pass   127.0.0.1:9000;
        #    fastcgi_index  index.php;
        #    fastcgi_param  SCRIPT_FILENAME  /scripts$fastcgi_script_name;
        #    include        fastcgi_params;
        #}
   
        location ~ \.php$ {
           root           /var/www/html/blog;
           fastcgi_pass   127.0.0.1:9000;
           fastcgi_index  index.php;
           fastcgi_param  SCRIPT_FILENAME  /var/www/html/blog$fastcgi_script_name;
           include        fastcgi_params;
           set_real_ip_from  192.168.255.0/24;
           real_ip_header X-Forwarded-For;
        }
   
   
        # deny access to .htaccess files, if Apache's document root
        # concurs with nginx's one
        #
        #location ~ /\.ht {
        #    deny  all;
        #}
    }



My php-fpm.conf

[ddavtian@mobilefood-1 php-fpm.d]$ more www.conf

Code:

    ; Start a new pool named 'www'.
    [www]
   
    ; The address on which to accept FastCGI requests.
    ; Valid syntaxes are:
    ;   'ip.add.re.ss:port'    - to listen on a TCP socket to a specific address on
    ;                            a specific port;
    ;   'port'                 - to listen on a TCP socket to all addresses on a
    ;                            specific port;
    ;   '/path/to/unix/socket' - to listen on a unix socket.
    ; Note: This value is mandatory.
    listen = 127.0.0.1:9000
   
    ; Set listen(2) backlog. A value of '-1' means unlimited.
    ; Default Value: -1
    ;listen.backlog = -1
     
    ; List of ipv4 addresses of FastCGI clients which are allowed to connect.
    ; Equivalent to the FCGI_WEB_SERVER_ADDRS environment variable in the original
    ; PHP FCGI (5.2.2+). Makes sense only with a tcp listening socket. Each address
    ; must be separated by a comma. If this value is left blank, connections will be
    ; accepted from any ip address.
    ; Default Value: any
    listen.allowed_clients = 127.0.0.1
   
    ; Set permissions for unix socket, if one is used. In Linux, read/write
    ; permissions must be set in order to allow connections from a web server. Many
    ; BSD-derived systems allow connections regardless of permissions.
    ; Default Values: user and group are set as the running user
    ;                 mode is set to 0666
    ;listen.owner = nobody
    ;listen.group = nobody
    ;listen.mode = 0666
   
    ; Unix user/group of processes
    ; Note: The user is mandatory. If the group is not set, the default user's group
    ;       will be used.
    ; RPM: apache Choosed to be able to access some dir as httpd
    user = apache
    ; RPM: Keep a group allowed to write in log dir.
    group = apache
   
    ; Choose how the process manager will control the number of child processes.
    ; Possible Values:
    ;   static  - a fixed number (pm.max_children) of child processes;
    ;   dynamic - the number of child processes are set dynamically based on the
    ;             following directives:
    ;             pm.max_children      - the maximum number of children that can
    ;                                    be alive at the same time.
    ;             pm.start_servers     - the number of children created on startup.
    ;             pm.min_spare_servers - the minimum number of children in 'idle'
    ;                                    state (waiting to process). If the number
    ;                                    of 'idle' processes is less than this
    ;                                    number then some children will be created.
    ;             pm.max_spare_servers - the maximum number of children in 'idle'
    ;                                    state (waiting to process). If the number
    ;                                    of 'idle' processes is greater than this
    ;                                    number then some children will be killed.
    ; Note: This value is mandatory.
    pm = dynamic
   
    ; The number of child processes to be created when pm is set to 'static' and the
    ; maximum number of child processes to be created when pm is set to 'dynamic'.
    ; This value sets the limit on the number of simultaneous requests that will be
    ; served. Equivalent to the ApacheMaxClients directive with mpm_prefork.
    ; Equivalent to the PHP_FCGI_CHILDREN environment variable in the original PHP
    ; CGI.
    ; Note: Used when pm is set to either 'static' or 'dynamic'
    ; Note: This value is mandatory.
    pm.max_children = 60
   
    ; The number of child processes created on startup.
    ; Note: Used only when pm is set to 'dynamic'
    ; Default Value: min_spare_servers + (max_spare_servers - min_spare_servers) / 2
    pm.start_servers = 10
   
    ; The desired minimum number of idle server processes.
    ; Note: Used only when pm is set to 'dynamic'
    ; Note: Mandatory when pm is set to 'dynamic'
    pm.min_spare_servers = 5
   
    ; The desired maximum number of idle server processes.
    ; Note: Used only when pm is set to 'dynamic'
    ; Note: Mandatory when pm is set to 'dynamic'
    pm.max_spare_servers = 35
     
    ; The number of requests each child process should execute before respawning.
    ; This can be useful to work around memory leaks in 3rd party libraries. For
    ; endless request processing specify '0'. Equivalent to PHP_FCGI_MAX_REQUESTS.
    ; Default Value: 0
    ;pm.max_requests = 500
   
    ; The URI to view the FPM status page. If this value is not set, no URI will be
    ; recognized as a status page. By default, the status page shows the following
    ; information:
    ;   accepted conn    - the number of request accepted by the pool;
    ;   pool             - the name of the pool;
    ;   process manager  - static or dynamic;
    ;   idle processes   - the number of idle processes;
    ;   active processes - the number of active processes;
    ;   total processes  - the number of idle + active processes.
    ; The values of 'idle processes', 'active processes' and 'total processes' are
    ; updated each second. The value of 'accepted conn' is updated in real time.
    ; Example output:
    ;   accepted conn:   12073
    ;   pool:             www
    ;   process manager:  static
    ;   idle processes:   35
    ;   active processes: 65
    ;   total processes:  100
    ; By default the status page output is formatted as text/plain. Passing either
    ; 'html' or 'json' as a query string will return the corresponding output
    ; syntax. Example:
    ;   http://www.foo.bar/status
    ;   http://www.foo.bar/status?json
    ;   http://www.foo.bar/status?html
    ; Note: The value must start with a leading slash (/). The value can be
    ;       anything, but it may not be a good idea to use the .php extension or it
    ;       may conflict with a real PHP file.
    ; Default Value: not set
    ;pm.status_path = /status
     
    ; The ping URI to call the monitoring page of FPM. If this value is not set, no
    ; URI will be recognized as a ping page. This could be used to test from outside
    ; that FPM is alive and responding, or to
    ; - create a graph of FPM availability (rrd or such);
    ; - remove a server from a group if it is not responding (load balancing);
    ; - trigger alerts for the operating team (24/7).
    ; Note: The value must start with a leading slash (/). The value can be
    ;       anything, but it may not be a good idea to use the .php extension or it
    ;       may conflict with a real PHP file.
    ; Default Value: not set
    ;ping.path = /ping
   
    ; This directive may be used to customize the response of a ping request. The
    ; response is formatted as text/plain with a 200 response code.
    ; Default Value: pong
    ;ping.response = pong
     
    ; The timeout for serving a single request after which the worker process will
    ; be killed. This option should be used when the 'max_execution_time' ini option
    ; does not stop script execution for some reason. A value of '0' means 'off'.
    ; Available units: s(econds)(default), m(inutes), h(ours), or d(ays)
    ; Default Value: 0
    ;request_terminate_timeout = 0
     
    ; The timeout for serving a single request after which a PHP backtrace will be
    ; dumped to the 'slowlog' file. A value of '0s' means 'off'.
    ; Available units: s(econds)(default), m(inutes), h(ours), or d(ays)
    ; Default Value: 0
    ;request_slowlog_timeout = 0
     
    ; The log file for slow requests
    ; Default Value: not set
    ; Note: slowlog is mandatory if request_slowlog_timeout is set
    slowlog = /var/log/php-fpm/www-slow.log
     
    ; Set open file descriptor rlimit.
    ; Default Value: system defined value
    ;rlimit_files = 1024
     
    ; Set max core size rlimit.
    ; Possible Values: 'unlimited' or an integer greater or equal to 0
    ; Default Value: system defined value
    ;rlimit_core = 0
     
    ; Chroot to this directory at the start. This value must be defined as an
    ; absolute path. When this value is not set, chroot is not used.
    ; Note: chrooting is a great security feature and should be used whenever
    ;       possible. However, all PHP paths will be relative to the chroot
    ;       (error_log, sessions.save_path, ...).
    ; Default Value: not set
    ;chroot =
     
    ; Chdir to this directory at the start. This value must be an absolute path.
    ; Default Value: current directory or / when chroot
    ;chdir = /var/www
     
    ; Redirect worker stdout and stderr into main error log. If not set, stdout and
    ; stderr will be redirected to /dev/null according to FastCGI specs.
    ; Default Value: no
    ;catch_workers_output = yes
     
    ; Pass environment variables like LD_LIBRARY_PATH. All $VARIABLEs are taken from
    ; the current environment.
    ; Default Value: clean env
    ;env[HOSTNAME] = $HOSTNAME
    ;env[PATH] = /usr/local/bin:/usr/bin:/bin
    ;env[TMP] = /tmp
    ;env[TMPDIR] = /tmp
    ;env[TEMP] = /tmp
   
    ; Additional php.ini defines, specific to this pool of workers. These settings
    ; overwrite the values previously defined in the php.ini. The directives are the
    ; same as the PHP SAPI:
    ;   php_value/php_flag             - you can set classic ini defines which can
    ;                                    be overwritten from PHP call 'ini_set'.
    ;   php_admin_value/php_admin_flag - these directives won't be overwritten by
    ;                                     PHP call 'ini_set'
    ; For php_*flag, valid values are on, off, 1, 0, true, false, yes or no.
   
    ; Defining 'extension' will load the corresponding shared extension from
    ; extension_dir. Defining 'disable_functions' or 'disable_classes' will not
    ; overwrite previously defined php.ini values, but will append the new value
    ; instead.
   
    ; Default Value: nothing is defined by default except the values in php.ini and
    ;                specified at startup with the -d argument
    ;php_admin_value[sendmail_path] = /usr/sbin/sendmail -t -i -f www@my.domain.com
    ;php_flag[display_errors] = off
    php_admin_value[error_log] = /var/log/php-fpm/www-error.log
    php_admin_flag[log_errors] = on
    ;php_admin_value[memory_limit] = 32M



Last edited by ddavtian on Mon Jan 16, 2012 12:47 pm, edited 1 time in total.

Posted: Mon Jan 16, 2012 12:34 pm
Senior Member

Joined: Fri May 02, 2008 8:44 pm
Posts: 1121
Code:
pm.max_children = 60 

That looks pretty high, even on a 2GB Linode. Imagine what would happen if a server tried to generate 60 web pages simultaneously. That's a lot of CPU, a lot of connections to the database, and potentially a lot of RAM, too.

What does your memory usage look like when the load spike happens? Post the output of
Code:
free -m


I'd normally recommend an aggressive caching plugin for WordPress, but caching gets tricky when you have more than one server that needs to produce identical results. Memcached might help, but I don't have any experience with WordPress caching plugins in a load-balanced situation so I'll leave that topic to someone else.

By the way, some nice recipes you've got there. I'm getting hungry!


Posted: Mon Jan 16, 2012 12:45 pm
Senior Newbie

Joined: Mon Jan 16, 2012 12:04 pm
Posts: 11
Location: United States
Hello,

I have tried lowering the:

Quote:
pm.max_children = 60


to a lower number. If memory serves me correctly, I tried values like 30, 40 and 50, and the CPU load stayed about the same as mentioned above. The servers seem to be using a normal amount of memory, or at least they are not swapping.

Code:
[ddavtian@mobilefood-1 php-fpm.d]$ free -m
             total       used       free     shared    buffers     cached
Mem:          1997        535       1462          0         33        268
-/+ buffers/cache:        234       1763
Swap:         2047          0       2047
[ddavtian@mobilefood-1 php-fpm.d]$


Code:
[ddavtian@mobilefoodblog-2 ~]$ free -m
             total       used       free     shared    buffers     cached
Mem:          1997        804       1193          0         38        463
-/+ buffers/cache:        302       1694
Swap:         2047          0       2047
[ddavtian@mobilefoodblog-2 ~]$


As for caching, we are using PHP-based banner ads, and I am always fearful that if I start caching the content (which I am sure will help with the CPU), it will start caching the banner ads as well.

Thank you for the kind words about the content :D We try to add recipes as often as possible. Since it's a mobile-oriented website, 98% of our visitors are on mobile (iPhone + Android), and we use WPTouch to render the website for mobile use.

Thanks
Dave


Posted: Mon Jan 16, 2012 4:04 pm
Senior Member

Joined: Fri May 02, 2008 8:44 pm
Posts: 1121
Whoa, only using 300MB of RAM out of 2GB? Swapping is definitely ruled out then.
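
For anyone wondering where the ~300MB figure comes from: the "-/+ buffers/cache" row of free -m is the "used" column with buffers and page cache subtracted, since the kernel can reclaim those when applications need the memory. A small sketch of that arithmetic, using the numbers from the two outputs above (the function name is made up for illustration):

```python
def used_excluding_cache(used_mb: int, buffers_mb: int, cached_mb: int) -> int:
    """Memory held by applications, in MB: 'used' minus buffers and page cache."""
    return used_mb - buffers_mb - cached_mb


if __name__ == "__main__":
    # Values copied from the 'Mem:' rows of the two free -m outputs.
    node1 = used_excluding_cache(used_mb=535, buffers_mb=33, cached_mb=268)
    node2 = used_excluding_cache(used_mb=804, buffers_mb=38, cached_mb=463)
    print(f"node 1: {node1} MB used by applications")  # free reports 234
    # free reports 302 for node 2; the 1 MB gap is presumably a rounding
    # difference, since free converts each figure from KB separately.
    print(f"node 2: {node2} MB used by applications")
```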

What does top show?

Do you have APC installed?

Is your database server working fine?

As for adjusting pm.max_children, did you restart php5-fpm after each adjustment, or did you restart nginx instead? Unlike with Apache, restarting nginx will not have any effect on PHP. (Pretty basic stuff here, but sometimes people miss it.)

You could also try disabling WordPress plugins, one at a time, for a few minutes each, on only one server. Do the same with the ad script if possible. See if the load average goes down. This could help identify any misbehaving component of your site.


Posted: Mon Jan 16, 2012 5:11 pm
Senior Newbie

Joined: Mon Jan 16, 2012 12:04 pm
Posts: 11
Location: United States
Question: if I install memcached in addition to APC, does memcached cache database queries only? I am trying to avoid caching our banner ads.

Update: I disabled all of the plugins except WPTouch, which is needed to render the content for mobile phones. Disabling the plugins didn't change the CPU load numbers.

Yes, memory consumption looks fine and neither node is swapping. Here's the output from htop; as you can see, php-fpm seems to be consuming the most CPU.

Code:

  1  [|||||                       12.3%]     Tasks: 53, 4 thr; 1 running
  2  [||||                         8.8%]     Load average: 2.34 2.28 2.20
  3  [||||                         7.9%]     Uptime: 2 days, 02:11:33
  4  [||||                         8.2%]
  Mem[||||||||||||||||       379/1997MB]
  Swp[                         0/2047MB]

  PID USER     PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
11750 apache    20   0  508M 28264 16620 S  2.0  1.4  0:06.10 php-fpm: pool www
11727 apache    20   0  513M 33252 16968 S  2.0  1.6  0:08.99 php-fpm: pool www
11757 apache    20   0  508M 27308 15852 S  2.0  1.3  0:06.04 php-fpm: pool www
11762 apache    20   0  508M 27380 15852 S  2.0  1.3  0:05.75 php-fpm: pool www
11775 apache    20   0  508M 28352 16620 S  2.0  1.4  0:05.75 php-fpm: pool www
11806 apache    20   0  508M 28088 16620 S  2.0  1.4  0:01.48 php-fpm: pool www
11807 apache    20   0  508M 27340 15840 S  2.0  1.3  0:00.92 php-fpm: pool www
11758 apache    20   0  508M 28236 16636 S  1.0  1.4  0:06.32 php-fpm: pool www
11774 apache    20   0  508M 27536 15852 S  1.0  1.3  0:05.65 php-fpm: pool www
11742 apache    20   0  509M 29484 17200 S  1.0  1.4  0:08.75 php-fpm: pool www
11747 apache    20   0  508M 27408 15916 S  1.0  1.3  0:08.42 php-fpm: pool www
11761 apache    20   0  508M 28300 16620 S  1.0  1.4  0:05.96 php-fpm: pool www
11754 apache    20   0  508M 27572 15852 S  1.0  1.3  0:06.14 php-fpm: pool www
11779 apache    20   0  508M 27376 15908 S  1.0  1.3  0:03.87 php-fpm: pool www
11755 apache    20   0  508M 28100 16624 S  1.0  1.4  0:06.11 php-fpm: pool www
11751 apache    20   0  508M 27640 15896 S  0.0  1.4  0:06.06 php-fpm: pool www
11778 apache    20   0  508M 28336 16620 S  0.0  1.4  0:04.00 php-fpm: pool www
11743 apache    20   0  508M 27976 16024 S  0.0  1.4  0:08.50 php-fpm: pool www
11745 apache    20   0  508M 28176 16720 S  0.0  1.4  0:08.31 php-fpm: pool www
11780 apache    20   0  508M 27356 15908 S  0.0  1.3  0:03.56 php-fpm: pool www
11777 apache    20   0  508M 28108 16620 S  0.0  1.4  0:05.65 php-fpm: pool www
11748 apache    20   0  508M 28316 16792 S  0.0  1.4  0:08.50 php-fpm: pool www
11749 apache    20   0  508M 28336 16656 S  0.0  1.4  0:06.20 php-fpm: pool www
11759 apache    20   0  508M 28084 16620 S  0.0  1.4  0:06.11 php-fpm: pool www
11781 apache    20   0  508M 28052 16620 S  0.0  1.4  0:02.73 php-fpm: pool www
11756 apache    20   0  508M 28316 16624 S  0.0  1.4  0:06.16 php-fpm: pool www
11760 apache    20   0  508M 28096 16616 S  0.0  1.4  0:05.98 php-fpm: pool www
11776 apache    20   0  508M 28320 16620 S  0.0  1.4  0:05.66 php-fpm: pool www
11744 apache    20   0  508M 28104 16648 S  0.0  1.4  0:08.72 php-fpm: pool www


Quote:
Do you have APC installed?


Yes, APC is installed on both nodes.

Quote:
Is your database server working fine?


Yes, I even ran mysqltuner to make sure all the numbers are intact.

Quote:
As for adjusting pm.max_children, did you restart php5-fpm after each adjustment, or did you restart nginx instead? Unlike with Apache, restarting nginx will not have any effect on PHP. (Pretty basic stuff here, but sometimes people miss it.)


Yes, after every change to php-fpm I did issue a restart of php-fpm to make sure the changes are reflected.

Quote:
You could also try disabling WordPress plugins, one at a time, for a few minutes each, on only one server. Do the same with the ad script if possible. See if the load average goes down. This could help identify any misbehaving component of your site.


Thank you, will try this next.

Dave


Posted: Mon Jan 16, 2012 5:34 pm
Senior Member

Joined: Tue May 26, 2009 3:29 pm
Posts: 1691
Location: Montreal, QC
Yikes, that's insane overkill. I've always been of the opinion that there's not much point running much more than 6-8 PHP processes on any size of linode. You've only got 4 cores, so as long as you're not blocked waiting on something else non-CPU related (such as a database on a different machine, disk IO, etc), and you don't have any long-living scripts, you're not getting any real additional benefit except to unnecessarily increase contention and RAM usage.

Of course, if you are relying on long-lived PHP scripts or a remote database, the rules change.
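
One way to put numbers on this is a common rule of thumb for sizing worker pools: cores * (1 + wait time / CPU time). This is a generic heuristic, not something specific to php-fpm or taken from this thread, and the timings below are hypothetical placeholders; you would need to measure your own requests. The function name is made up for illustration.

```python
import math


def suggested_workers(cores: int, wait_ms: float, cpu_ms: float) -> int:
    """Rule-of-thumb pool size: cores * (1 + wait / compute), rounded up.

    wait_ms: time a request spends blocked (database, disk, network).
    cpu_ms:  time a request spends actually using a CPU core.
    """
    if cores < 1 or cpu_ms <= 0 or wait_ms < 0:
        raise ValueError("need cores >= 1, cpu_ms > 0 and wait_ms >= 0")
    return math.ceil(cores * (1 + wait_ms / cpu_ms))


if __name__ == "__main__":
    # Purely CPU-bound requests: nothing to wait on, so one worker per core.
    print(suggested_workers(cores=4, wait_ms=0, cpu_ms=50))
    # Hypothetical request that burns 50 ms of CPU and waits 100 ms on a
    # remote database: more workers are useful, but still far fewer than 60.
    print(suggested_workers(cores=4, wait_ms=100, cpu_ms=50))
```

With these made-up timings the heuristic suggests 4 and 12 workers respectively. The point is only that the useful number scales with how much of each request is spent waiting rather than computing.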


Posted: Mon Jan 16, 2012 7:04 pm
Senior Member

Joined: Fri May 02, 2008 8:44 pm
Posts: 1121
APC will not cause any page content to be cached. Its primary purpose is to cache the compiled opcode of your PHP scripts, so the PHP engine doesn't need to interpret your scripts every time somebody requests a web page. This can result in a 2x speed boost even without any other optimizations, and with absolutely no change to the behavior of your scripts. Yep, that's free CPU for you! APC also has the ability to cache data, just like Memcached, and some plugins (such as Total Cache) can be configured to make use of this. But that's totally optional.

Your htop output shows that you're only using ~40% CPU even though your load average is above 2. Does your Dashboard show a similar level of CPU usage? Does the server feel sluggish at all when the load average is above 4? Your site seemed to load pretty quickly when I checked it out earlier today. If your CPU usage and disk I/O are under control and the site doesn't feel slow, you might not need to worry about the load average all that much. The load average isn't a particularly accurate representation of system resource utilization anyway. It just means that you have a lot of processes competing for CPU time, so you might want to reduce the number of processes you're running. Which leads back to Guspaz's comment above:

Guspaz wrote:
Yikes, that's insane overkill. I've always been of the opinion that there's not much point running much more than 6-8 PHP processes on any size of linode. You've only got 4 cores, so as long as you're not blocked waiting on something else non-CPU related (such as a database on a different machine, disk IO, etc), and you don't have any long-living scripts, you're not getting any real additional benefit except to unnecessarily increase contention and RAM usage.

Very good idea. If 30 didn't help, try reducing it even further. Since WordPress relies very heavily on database calls and you're not using any caching plugins, you might want to aim a little higher than Guspaz's suggestion: start with 12-16 and make adjustments over time to find your sweet spot. If you're skeptical about this experiment, fire up a smaller linode (768 should be more than enough) with the lower settings and add it to the load balancer to see how it performs. You might have been wasting money on a pair of oversized linodes :P
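
To sanity-check whether a smaller Linode would have enough RAM, you can multiply the number of children by the RES column from the htop output earlier in the thread (about 28,000 KB per php-fpm process). Treat the result as a loose upper bound: RES includes shared pages (the SHR column, around 16,000 KB here), so the real footprint should be lower. The function name is made up for illustration.

```python
def pool_memory_upper_bound_mb(children: int, res_kb_per_child: int) -> float:
    """Upper bound on pool memory in MB, counting shared pages once per child."""
    return children * res_kb_per_child / 1024


if __name__ == "__main__":
    res_kb = 28264  # RES of one php-fpm child in the first htop output
    for children in (16, 30, 60):
        mb = pool_memory_upper_bound_mb(children, res_kb)
        print(f"{children:2d} children: at most ~{mb:.0f} MB")
```

By this estimate 16 children stay under about 450 MB, while 60 children could approach 1.6 GB if they were all busy at once.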

(Until recently, php5-fpm shipped with 150 children by default. That's even more insane than Apache's prefork MPM using 150 children by default, because at least some of those Apache children would be serving static requests. Fortunately, the default value was reduced in PHP 5.3.9, released just a few days ago.)


Posted: Mon Jan 16, 2012 7:28 pm
Senior Newbie

Joined: Mon Jan 16, 2012 12:04 pm
Posts: 11
Location: United States
Here are the latest htop results from each of the nodes. The site now seems to be heading into heavier traffic than earlier in the morning:

Node 1 CPU:

Code:
  1  [|||||||||||||||||||||||||||||||||||||||94.4%]     Tasks: 62, 9 thr; 17 running
  2  [|||||||||||||||||||||||||||||||||||||| 82.9%]     Load average: 4.81 4.78 4.11
  3  [||||||||||||||||||||||||||||||||||||   77.4%]     Uptime: 2 days, 04:16:31
  4  [|||||||||||||||||||||||||||||||||      72.6%]
  Mem[|||||||||||||||||||               469/1997MB]
  Swp[                                    0/2047MB]

  PID USER     PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
29186 apache    20   0  511M 30656 18032 S 11.0  1.5  1:02.53 php-fpm: pool www
29203 apache    20   0  511M 29308 17296 R 10.0  1.4  0:59.24 php-fpm: pool www
29208 apache    20   0  510M 32336 20564 R  9.0  1.6  0:58.67 php-fpm: pool www
29184 apache    20   0  511M 30372 17748 S  9.0  1.5  1:02.48 php-fpm: pool www
29210 apache    20   0  510M 32552 20780 S  9.0  1.6  0:58.96 php-fpm: pool www
29197 apache    20   0  510M 29088 17336 S  9.0  1.4  0:59.09 php-fpm: pool www
29196 apache    20   0  514M 40232 24704 R  9.0  2.0  0:58.52 php-fpm: pool www
29191 apache    20   0  512M 37376 23940 S  9.0  1.8  1:02.39 php-fpm: pool www
29282 apache    20   0  510M 28840 17296 R  9.0  1.4  0:11.29 php-fpm: pool www
29183 apache    20   0  512M 30644 17508 S  9.0  1.5  1:02.14 php-fpm: pool www
29189 apache    20   0  509M 28436 17488 R  9.0  1.4  1:03.10 php-fpm: pool www


Node 2 CPU:

Code:
  1  [|||||||||||||||||||||||||   70.6%]     Tasks: 61, 9 thr; 4 running
  2  [|||||||||||||||||||         54.9%]     Load average: 2.04 2.17 2.27
  3  [|||||||||||||||             42.5%]     Uptime: 2 days, 04:17:21
  4  [|||||||||||||               34.7%]
  Mem[|||||||||||||||||      427/1997MB]
  Swp[                         0/2047MB]

  PID USER     PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
12961 apache    20   0  510M 28228 16404 S  6.0  1.4  0:14.71 php-fpm: pool www
12956 apache    20   0  510M 31316 19488 S  6.0  1.5  0:15.21 php-fpm: pool www
12950 apache    20   0  511M 28728 16888 S  6.0  1.4  0:15.32 php-fpm: pool www
12967 apache    20   0  510M 28208 16388 S  6.0  1.4  0:13.10 php-fpm: pool www
12965 apache    20   0  510M 27684 15860 R  6.0  1.4  0:13.04 php-fpm: pool www
13056 apache    20   0  510M 27520 15840 S  6.0  1.3  0:01.41 php-fpm: pool www
12952 apache    20   0  510M 28028 16388 S  6.0  1.4  0:15.25 php-fpm: pool www
12968 apache    20   0  511M 28232 16396 S  5.0  1.4  0:13.28 php-fpm: pool www
12987 apache    20   0  510M 28216 16396 S  5.0  1.4  0:09.88 php-fpm: pool www
12959 apache    20   0  510M 28376 16556 S  5.0  1.4  0:14.91 php-fpm: pool www
13034 apache    20   0  510M 28000 16368 S  5.0  1.4  0:03.03 php-fpm: pool www


As for graphs:

Node 1 CPU: [Image]

Node 1 IO: [Image]

Node 2 CPU: [Image]

Node 2 IO: [Image]

Quote:
Does the server feel sluggish at all when the load average is above 4?


Not really, things are loading quite fast
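
For what it's worth, a load average is easier to judge once it is divided by the number of cores. A minimal Python sketch using the 1-minute figures from the htop output above (the function name is just for illustration; on Linux the load average also counts tasks stuck in uninterruptible I/O wait, so it is not a pure CPU figure):

```python
def load_per_core(load_average, cores):
    """Normalize a load average by core count; around 1.0 means the cores are fully busy."""
    return load_average / cores


if __name__ == "__main__":
    # 1-minute load averages from the htop output above; each node has 4 cores.
    for name, load in (("Node 1", 4.81), ("Node 2", 2.04)):
        print(f"{name}: {load_per_core(load, 4):.2f} tasks per core")
```

Node 1 comes out at about 1.20 per core (slightly oversubscribed) and Node 2 at about 0.51, which matches pages still loading quickly.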

Quote:
Guspaz's suggestion: start with 12-16 and make adjustments over time to find your sweet spot


Right now php-fpm is running with 30 children. I will keep reducing this and keep an eye on things. I have also installed memcached on both servers and am watching to see if it makes any difference.

Thank you!
Dave


Posted: Mon Jan 16, 2012 9:56 pm
Senior Member

Joined: Fri May 02, 2008 8:44 pm
Posts: 1121
OK, CPU usage really is getting high. Disregard my comment about only using 40% CPU. But the suggestion about pm.max_children is still valid, so keep playing with it.

Meanwhile, here's another question: Why does node 2 only peak at 110% when node 1 peaks at 350-400%? Does node 1 get 3 times as much traffic as node 2 does? No wait, node 2 has nearly constant I/O regardless of CPU usage. Why the difference?

Other than reducing pm.max_children and playing with plugins, here are a few other changes that you can try:

- Make the ads load separately from the pages themselves, using iframes or JavaScript. Then you can use aggressive caching on WordPress without worrying about your ads getting cached.

- Use 4 Linode 1024s instead of 2 Linode 2048s. That way, you'll still have plenty of RAM, but your CPU usage will be more widely spread out. Using 350% on a single node is not nice to your neighbors.


Posted: Mon Jan 16, 2012 10:43 pm
Senior Newbie

Joined: Mon Jan 16, 2012 12:04 pm
Posts: 11
Location: United States
Yes, about this time (7:00 p.m. PST) the CPUs really kick in, since this is usually when we get the most traffic.

Quote:
Meanwhile, here's another question: Why does node 2 only peak at 110% when node 1 peaks at 350-400%? Does node 1 get 3 times as much traffic as node 2 does? No wait, node 2 has nearly constant I/O regardless of CPU usage. Why the difference?


Excellent question, and I have no idea. Both nodes are configured with the same weight (100) within the node balancer, so in essence both should get the same amount of traffic. The only thing is that the node balancer is configured to send traffic based on "least connections": it calculates (somehow) which node has the fewest active connections and routes the new connection to that node.

Quote:
Other than reducing pm.max_children and playing with plugins, here are a few other changes that you can try:


I have been playing with this number all day on both nodes. If I reduce it, nginx starts dropping connections and throwing "no connections" errors. If I increase it, CPU usage goes up. It's a chicken-and-egg game now. I am currently at about 35 on both nodes.
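
One way to put rough numbers on this trade-off is Little's law (requests in flight = arrival rate x time per request). The Python sketch below is only an estimate: the 2.1 million requests a day and the two nodes come from the first post, while the peak factor of 3 and the 0.5 s average PHP time are made-up assumptions that would need to be measured.

```python
import math

SECONDS_PER_DAY = 86400


def required_children(requests_per_day, nodes, mean_request_seconds, peak_factor):
    """Estimate concurrent PHP workers needed per node at peak (Little's law).

    peak_factor scales the daily average rate up to the busiest period.
    Both peak_factor and mean_request_seconds are assumptions to be measured.
    """
    avg_rps_per_node = requests_per_day / SECONDS_PER_DAY / nodes
    peak_rps = avg_rps_per_node * peak_factor
    return math.ceil(peak_rps * mean_request_seconds)


if __name__ == "__main__":
    # 2.1 million requests/day over 2 nodes; 0.5 s per request and 3x peak are guesses.
    print(required_children(2_100_000, nodes=2, mean_request_seconds=0.5, peak_factor=3.0))
```

With those guesses the estimate is 19 workers per node. If the real figure is higher than what four cores can sustain, lowering the child count only moves the waiting into the queue.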

Quote:
- Make the ads load separately from the pages themselves, using iframes or JavaScript. Then you can use aggressive caching on WordPress without worrying about your ads getting cached.


The ad provider ONLY provides a PHP-based SDK for this, so JavaScript is really out of the question, unless I write my own JavaScript and have it call the PHP page itself. An iframe is a good solution; I will give it a try.

Quote:
- Use 4 Linode 1024s instead of 2 Linode 2048s. That way, you'll still have plenty of RAM, but your CPU usage will be more widely spread out. Using 350% on a single node is not nice to your neighbors.


Agreed, 350% CPU is not being a friendly neighbor.

Thanks for all the help today, I appreciate it.

Dave


Posted: Tue Jan 17, 2012 10:44 am
Senior Member

Joined: Fri May 02, 2008 8:44 pm
Posts: 1121
ddavtian wrote:
Excellent question, and I have no idea. Both nodes are configured with the same weight (100) within the node balancer, so in essence both should get the same amount of traffic. The only thing is that the node balancer is configured to send traffic based on "least connections": it calculates (somehow) which node has the fewest active connections and routes the new connection to that node.

What do the traffic graphs look like? If traffic looks more or less the same, then there must be some other difference between the two nodes. If traffic is skewed, try using a different load balancing algorithm.

ddavtian wrote:
I have been playing with this number all day on both nodes. If I reduce it, nginx starts dropping connections and throwing "no connections" errors. If I increase it, CPU usage goes up. It's a chicken-and-egg game now. I am currently at about 35 on both nodes.

502 Bad Gateway or 504 Gateway Timeout?

ddavtian wrote:
The ad provider ONLY provides a PHP-based SDK for this, so JavaScript is really out of the question, unless I write my own JavaScript and have it call the PHP page itself. An iframe is a good solution; I will give it a try.

iframes would work perfectly if the ads are not contextual, but the ad provider might notice that the referer is always the same, because it's always the page inside the iframe that gets used as the referer. In extreme cases, the linked page might load inside the iframe, which is not only useless but also looks a lot like you're trying to fake the clicks.

If you have jQuery, you can do something like
Code:
$('#ad_space').load('/path/to/ads.php');

and the ads will be loaded into the <div> with id="ad_space". Much cleaner than using an iframe, and the referer will be correct, too. But please do check with your ad provider whether this is permitted.


Posted: Tue Jan 17, 2012 2:28 pm
Senior Newbie

Joined: Mon Jan 16, 2012 12:04 pm
Posts: 11
Location: United States
Quote:
Use 4 Linode 1024s instead of 2 Linode 2048s. That way, you'll still have plenty of RAM, but your CPU usage will be more widely spread out. Using 350% on a single node is not nice to your neighbors.


Thank you very much for your continuous help here. I took your advice (a very valid one, for that matter) and switched to four Linode 1024 instances. Things are operating much better now; I will know more tonight when the traffic picks up. In any case it should be better as far as the CPUs are concerned, since we went from 8 CPUs to 16 CPUs.

Quote:
502 Bad Gateway or 504 Gateway Timeout?


I was seeing 504s.

Thanks again.
Dave


Posted: Tue Jan 17, 2012 5:13 pm
Senior Member

Joined: Fri May 02, 2008 8:44 pm
Posts: 1121
For 504 errors, try increasing listen.backlog in your FPM pool configuration.
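
A bigger backlog lets more requests queue for a free child instead of being refused, but a queued request still has to be answered before nginx gives up on it (fastcgi_read_timeout, 60 seconds by default). Here is a rough Python sketch of the worst-case wait. The function name and all three input numbers are hypothetical, chosen only for illustration.

```python
def worst_case_queue_wait(backlog, children, mean_request_seconds):
    """Seconds the last request in a full backlog waits before a child picks it up.

    Simplified model: each of `children` workers finishes one request every
    `mean_request_seconds`, so the queue drains at children / mean_request_seconds.
    """
    drain_rate = children / mean_request_seconds  # requests per second
    return backlog / drain_rate


if __name__ == "__main__":
    # Hypothetical values: backlog of 1024, 35 children, 0.5 s per request.
    wait = worst_case_queue_wait(backlog=1024, children=35, mean_request_seconds=0.5)
    print(f"worst-case wait: {wait:.1f} s")
```

If the computed wait gets anywhere near the nginx timeout, a larger backlog alone will not stop the 504s and the requests themselves need to get faster.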


Posted: Thu Jan 26, 2012 4:57 pm
Senior Newbie

Joined: Thu Apr 30, 2009 2:37 am
Posts: 12
Location: Deerfield Beach, FL
2 cents -

Do you have a wordpress caching plugin enabled?
Are you using plugins to deter trackback/spam ?

If you've got any plugins (php scripts) doing DNS lookups/IP translations, that'd be sure to spike your php5-fpm usage.

For giggles, I'd recommend you disable a batch of your anti-spam plugins (e.g. popular ones like Akismet, Simple Trackback Validation with Topsy Blocker, etc.), restart nginx/php5-fpm, and observe htop.

Also, bear in mind that the newest WordPress cores, especially 3.3.1, have deprecated many functions, so while older plugins may still work, they do so with significantly more overhead. So, in some cases, rolling back to the last known stable core is also particularly helpful.

_________________
Faisal Humayun


Posted: Thu Jan 26, 2012 6:00 pm
Senior Newbie

Joined: Mon Jan 16, 2012 12:04 pm
Posts: 11
Location: United States
Hello,

Quote:
Do you have a wordpress caching plugin enabled?


Yes, W3 Total Cache is enabled with very minimal caching: object caching, database caching, and a configured CloudFlare account.

Quote:
Are you using plugins to deter trackback/spam ?


Yes, Akismet has been configured to block spam.

Right now, we have spread the load over 4 nodes using a load balancer and things are OK, but I will take your advice and disable some of the above and see if anything changes as far as load goes.

Thank You!
Dave

