One of my customers had many problems with the stability of his server. The service stopped under moderate web traffic, and the only remedy was a server restart. I was asked to optimise the core components (the MySQL database and the Apache2 web server), but the results were not very satisfactory. I managed to limit swapping by limiting web server concurrency, but that did not address the real problem.
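Capping concurrency is done in Apache's prefork MPM configuration; the snippet below is an illustrative sketch (the values are examples, not the ones used on this server), the idea being that the total RSS of all worker processes must fit in RAM:

```apache
# Apache prefork MPM — illustrative values, tune to available RAM
<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients           20    # cap concurrent requests; 20 x per-process RSS must fit in RAM
    MaxRequestsPerChild 500    # recycle children to limit per-process memory growth
</IfModule>
```

A rough rule of thumb: MaxClients should be about (free RAM) / (RSS of one Apache+PHP process), so a leaky application forces this number painfully low.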
Finally I traced the problem to a PHP application running on this server. Joomla (especially the forum script) caused the system to crawl. A single process needed over 60 MB of RSS memory; imagine the system handling 20 concurrent requests. The problems became visible once the database size exceeded 200 MB. Source code analysis located the cause: big recordsets (whole tables) were being loaded into memory. A single PHP process without a memory limit can "swap out" the whole server.
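The quickest safety net against such a runaway process is a per-process memory cap in php.ini; a minimal sketch with an illustrative value:

```ini
; php.ini — illustrative cap, adjust to what legitimate pages need
; a script exceeding this limit dies with a fatal error
; instead of pushing the whole server into swap
memory_limit = 32M
```

This does not fix the application (queries loading whole tables still need pagination), but it converts a server-wide outage into a single failed request.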
My customer decided to rewrite the source code from scratch; below you will find the results of this optimisation (mostly Munin graphs). The new system was installed on 27th March.
Here's the number of slow queries recorded by MySQL; you can see they have almost disappeared:
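Slow queries like these are captured by MySQL's slow query log, which must be enabled explicitly; a my.cnf fragment with illustrative settings (directive names as in the MySQL 5.0 era this setup dates from):

```ini
# my.cnf — [mysqld] section, illustrative settings
[mysqld]
log_slow_queries = /var/log/mysql/mysql-slow.log
long_query_time  = 2              # log statements taking longer than 2 seconds
log-queries-not-using-indexes     # also log full table scans
```

The resulting log can then be summarised with the bundled mysqldumpslow tool to find the worst offenders.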
And below, MySQL queries; the two red spikes are the database upgrade during the transition to the new system. After the new version was installed, the number of SQL queries dropped several times over:
And here's CPU usage; again, it dropped below 20%:
Thanks to the optimisation, the system load is now below 1:
And the network can handle more traffic now that the application is optimised (faster page generation and loading):
And finally: the response time recorded by the site-uptime.net service (as "seen" by your browser) is almost a flat line!
In this case a big optimisation (a rewrite from scratch) delivered "big" results. The hardest part is always guessing where the performance bottleneck is before optimising. Sometimes it's IO, sometimes CPU (especially on virtual servers), sometimes the application is simply badly written. Monitoring tools (like Munin or site-uptime.net) help a lot with this task.
Now the system has plenty of room to grow.