It's very important to know about server problems before they have an impact on overall system performance and stability. Typical problems that may occur on a server:
- running out of disk space
- high server load (caused by CPU/IO)
In the past I had web applications fail because the disk used for session storage ran out of free space. Since then I have installed monitoring for every resource that could become a problem for the system.
The easiest way (Keep It Simple, Stupid) to set up monitoring is to redirect email for root@localhost to your own mailbox and install checks in root's crontab. Any output a job writes to stdout/stderr is mailed to the crontab owner after the script runs. Our scripts generate output only when a problem is found. For example:
Notify when the 5-minute load average is above 4:
*/5 * * * * cat /proc/loadavg | awk '$2 > 4 { print "High 5-minute load", $2 }'
Notify when used disk space is above 90%:
00 21 * * * df -P | awk '/^\// && $5+0 > 90 { print $0 }'
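If the one-liners start to multiply, the same two checks can live in a small script so the thresholds are easy to change. This is only a minimal sketch: the function names check_load and check_disk, the "load"/"disk" arguments and the default limits are my own choices for illustration, not a standard tool. Two details matter. df prints the usage column as text such as 95%, so the sketch adds 0 to the field to force a numeric comparison (a plain string comparison would miss 100%). And df -P keeps each filesystem on a single line.

```shell
#!/bin/sh
# Hypothetical helpers; names and default limits are illustrative only.

# Print a warning when the 5-minute load average exceeds the limit.
# Reads a /proc/loadavg-style line from stdin.
check_load() {
    awk -v limit="${1:-4}" '$2 > limit { print "High 5-minute load", $2 }'
}

# Print every mounted device whose usage exceeds the limit (in percent).
# Reads `df -P` output from stdin; $5+0 turns "95%" into the number 95.
check_disk() {
    awk -v limit="${1:-90}" '/^\// && $5+0 > limit { print $0 }'
}

# Run against the live system only when asked to, so cron mail stays
# empty unless a check actually finds a problem.
case "${1:-}" in
    load) check_load 4 < /proc/loadavg ;;
    disk) df -P | check_disk 90 ;;
esac
```

Assuming the script is saved somewhere like /usr/local/bin/server-checks.sh (a made-up path), the crontab entries keep their schedules and simply call it with the load or disk argument.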
Many additional checks could be configured that way.
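As an illustration of what such extra checks might look like, here is a sketch of two more: one for inode usage (a disk full of tiny session files can run out of inodes long before it runs out of space) and one for available memory. The function names and limits are hypothetical. The memory check assumes a Linux kernel recent enough to report MemAvailable in /proc/meminfo, and the inode check assumes your df supports -i together with -P.

```shell
#!/bin/sh
# Hypothetical extra checks; names and limits are illustrative only.

# Print devices whose inode usage exceeds the limit (in percent).
# Reads `df -Pi` output from stdin; column 5 is IUse%.
check_inodes() {
    awk -v limit="${1:-90}" \
        '/^\// && $5+0 > limit { print "Inode usage above limit:", $0 }'
}

# Print a warning when MemAvailable (in kB) is below the limit.
# Reads /proc/meminfo-style text from stdin.
check_mem() {
    awk -v limit="${1:-102400}" \
        '$1 == "MemAvailable:" && $2 < limit { print "Low available memory:", $2, "kB" }'
}

# Check the live system only when called with an explicit argument.
case "${1:-}" in
    inodes) df -Pi | check_inodes 90 ;;
    mem)    check_mem 102400 < /proc/meminfo ;;
esac
```

Like the checks above, both stay silent when everything is fine, so cron sends mail only when there is something to look at.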
In addition, I usually install munin. It produces graphs that show the levels of various resources (by day, week, month and year), so you can see your whole system in one place. Here is a sample graph showing a month of load on a VPS instance:
Do you have similar tools to monitor your server performance?