IMO, monit's greatest advantages are that it is relatively simple and light. I'm not sure I would choose it if I needed super-duper uptime or detailed stats, but for my purpose it is good enough.
Here are selections from my monitrc. Config-specific variables have been redacted and are marked with %%%. Also, note that Postfix is installed in a send-only configuration, so I only care if it is up and accessible from localhost. The built-in HTTP server is set to only bind to localhost. On the rare occasions I need to use it, I do so via an SSH tunnel with the command:
ssh -L 2812:localhost:2812 mylogin@mylinodeipaddress
Code:
###############################################################################
## Global section
###############################################################################
##
## Start monit in background (run as daemon) and check the services at 2-minute
## intervals.
#
set daemon 120
#
## Set syslog logging with the 'daemon' facility. If the FACILITY option is
## omitted, monit will use 'user' facility by default. You can specify the
## path to the file for monit native logging.
#
# set logfile syslog facility log_daemon
set logfile /var/log/monit.log
#
## You can set the alert recipients here, which will receive the alert for
## each service. The event alerts may be restricted using the list.
#
# set alert sysadm@foo.bar # receive all alerts
# set alert manager@foo.bar only on { timeout } # receive just service-
# # timeout alert
set alert %%%YOUR ADMIN E-MAIL ADDRESS%%%
## Monit has an embedded webserver, which can be used to view the
## configuration, actual services parameters or manage the services using the
## web interface.
#
set httpd port 2812 and
  use address localhost # only accept connection from localhost
  allow localhost # allow localhost to connect to the server and
  allow %%%LOGIN%%%:%%%PASS%%% # require user LOGIN with password PASS
###############################################################################
## Services
###############################################################################
##
## Check the general system resources such as load average, cpu and memory
## usage. Each rule specifies the tested resource, the limit and the action
## which will be performed in the case that the test failed.
#
check system localhost
  if loadavg (1min) > 10 then alert
  if loadavg (5min) > 8 then alert
  if memory usage > 80% then alert
  if cpu usage (user) > 70% for 2 cycles then alert
  if cpu usage (system) > 50% for 2 cycles then alert
  if cpu usage (wait) > 50% for 2 cycles then alert
  if loadavg (1min) > 20 for 3 cycles then exec "/sbin/shutdown -r now"
  if loadavg (5min) > 15 for 5 cycles then exec "/sbin/shutdown -r now"
  if memory usage > 97% for 3 cycles then exec "/sbin/shutdown -r now"
## Check that a process is running, responding on the HTTP request,
## check its resource usage such as cpu and memory, number of children.
## In the case that the process is not running, monit will restart it by
## default. In the case that the service was restarted very often and the
## problem remains, it is possible to disable the monitoring using the
## TIMEOUT statement. The service depends on another service (mysql) which
## is defined in the monit control file as well.
check process apache with pidfile /var/run/apache2.pid
  start program = "/etc/init.d/apache2 start"
  stop program = "/etc/init.d/apache2 stop"
  if cpu > 80% for 5 cycles then restart
  if children > 50 then alert
  if children > 60 then restart
  # Apache MaxClients = 60
  if failed host %%%PUBLIC IP ADDR%%% port 80 protocol http
    and request "/index.html"
    # Some smallish page that should be available when server is up
    with timeout 10 seconds
    for 2 cycles
    # Sometimes Apache doesn't respond right away, so give it two chances before
    # forcing a restart.
    then restart
  depends on mysql
  if 3 restarts within 8 cycles then timeout
check process mysql with pidfile /var/run/mysqld/mysqld.pid
  start program = "/etc/init.d/mysql start"
  stop program = "/etc/init.d/mysql stop"
  if cpu > 80% for 5 cycles then restart
  if totalmem > 200.0 MB for 5 cycles then restart
  # Base above value on your experience
  if failed unixsocket /var/run/mysqld/mysqld.sock protocol mysql
    # If you use the network instead of a UNIX socket, adjust settings
    with timeout 15 seconds
    then restart
  if 3 restarts within 5 cycles then timeout
check process sshd with pidfile /var/run/sshd.pid
  start program = "/etc/init.d/ssh start"
  stop program = "/etc/init.d/ssh stop"
  if cpu > 80% for 5 cycles then restart
  if totalmem > 200.0 MB for 5 cycles then restart
  if failed host %%%PUBLIC IP ADDR%%% port 22 protocol ssh 2 times within 2 cycles
    then restart
  if 3 restarts within 8 cycles then timeout
check process postfix with pidfile /var/spool/postfix/pid/master.pid
  start program = "/etc/init.d/postfix start"
  stop program = "/etc/init.d/postfix stop"
  if cpu > 30% for 5 cycles then restart
  if totalmem > 60.0 MB for 3 cycles then restart
  if failed host localhost port 25 protocol smtp
    with timeout 60 seconds
    then restart
  if 3 restarts within 8 cycles then timeout
## Check the device permissions, uid, gid, space and inode usage. Other
## services such as databases may depend on this resource and automatic
## graceful stop may be cascaded to them before the filesystem will become
## full and the data will be lost.
check device filesystem with path /dev/xvda
  if space usage > 80% for 5 times within 15 cycles then alert
  # monit runs exec commands directly rather than through a shell, so the
  # chained stop commands are wrapped in /bin/sh -c
  if space usage > 95% then exec "/bin/sh -c '/etc/init.d/apache2 stop; /etc/init.d/mysql stop'"
  if inode usage > 70% then alert
  if inode usage > 95% then exec "/bin/sh -c '/etc/init.d/apache2 stop; /etc/init.d/mysql stop'"
## Check a file's checksum: when the content of the file changes, send an
## alert.
#
# Monitor denyhosts activity, but not as often
check file hosts.deny path /etc/hosts.deny
  every 3 cycles
  if changed checksum then alert
There are probably many optimizations I could make to the above, but it works well enough to avoid downtime of more than a few minutes. Configuration is easy enough to figure out, which was a major plus in my book. As obs points out, it's not a substitute for proper configuration, but is a useful fallback when things go unexpectedly wrong.
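One habit that goes well with a config like the one above: have monit check the control file before the running daemon re-reads it, so a typo is caught first. Here is a minimal sketch, not something from my actual setup. The helper name check_and_reload is made up for illustration, and the /etc/monit/monitrc path in the usage line is an assumption (the usual Debian/Ubuntu location), so adjust it for your distro.

```shell
#!/bin/sh
# Hypothetical helper, not part of monit: syntax-check a control file and
# only ask the running daemon to re-read it if the check passes.
check_and_reload() {
    rc="$1"                       # path to the control file
    if monit -t -c "$rc"; then    # -t: syntax check only, nothing is started
        monit -c "$rc" reload     # reload: the daemon re-reads its control file
    else
        echo "monitrc syntax check failed; not reloading" >&2
        return 1
    fi
}

# Usage (the path is an assumption; this normally needs root):
# check_and_reload /etc/monit/monitrc
```

Running monit summary afterwards gives a quick one-line-per-service view to confirm everything is still being watched.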