Auto Profiling For Continuous Integration

When you already have auto-build and auto-test processes in place, you can use the same infrastructure to catch performance problems early as well. It's not as complicated as you might think.

First of all, you can monitor CPU/IO usage during tests and take snapshots in "errant" situations. If no heavy local processing is expected, 100% CPU usage probably means there is a performance problem in your software. Sample:

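# Column numbers and the idle-time format depend on the top version; adjust as needed.
top -b -n 1 | awk '
  / 0% idle/ { enable=1 }   # summary line reports no idle CPU
  enable && $1 ~ /^[0-9]+$/ && $7 > 20 { print "kill -s USR2 " $1 }   # $1 = PID, $7 = CPU usage (assumed)
' | sh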

The script above checks for 0% idle time and sends SIGUSR2 to every process that uses more than 20% of the CPU. A signal handler installed in the process then takes a snapshot of the currently running thread, which gives enough information to fix the performance issue.
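The post does not show the signal handler itself, so here is a minimal sketch of what one could look like. It assumes Linux with glibc (backtrace() and backtrace_symbols_fd() come from execinfo.h); the names snapshotHandler, installSnapshotHandler and snapshotCount are made up for illustration. Note that a process-directed signal is delivered to one thread that does not block it, so in a multi-threaded program the dumped stack is not necessarily that of the busiest thread.

```cpp
#include <cassert>
#include <execinfo.h>   // backtrace(), backtrace_symbols_fd(): glibc-specific
#include <signal.h>     // sigaction(), SIGUSR2, sig_atomic_t
#include <unistd.h>     // write(), STDERR_FILENO

// Number of snapshots taken so far (hypothetical helper, handy for checks).
static volatile sig_atomic_t snapshotCount = 0;

// Hypothetical handler: dumps the call stack of the interrupted thread to stderr.
extern "C" void snapshotHandler(int) {
    static const char header[] = "--- SIGUSR2 stack snapshot ---\n";
    ssize_t ignored = write(STDERR_FILENO, header, sizeof(header) - 1);
    (void) ignored;

    void* frames[64];
    int depth = backtrace(frames, 64);
    // backtrace_symbols_fd() writes straight to the descriptor without malloc().
    backtrace_symbols_fd(frames, depth, STDERR_FILENO);
    snapshotCount = snapshotCount + 1;
}

// Installs the handler for SIGUSR2; returns true on success.
bool installSnapshotHandler() {
    // Call backtrace() once up front so that any lazy initialization
    // (which may allocate memory) does not happen inside the handler.
    void* warmup[1];
    backtrace(warmup, 1);

    struct sigaction action = {};
    action.sa_handler = snapshotHandler;
    sigemptyset(&action.sa_mask);
    action.sa_flags = SA_RESTART;  // restart interrupted system calls instead of failing with EINTR
    return sigaction(SIGUSR2, &action, nullptr) == 0;
}
```

Call installSnapshotHandler() once at program start. Function names in the dump are usually more readable when the binary is linked with -rdynamic; otherwise the raw addresses can be resolved afterwards with a tool such as addr2line.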

If your environment doesn't allow you to inspect the current thread state at any time, you can try to guess the source of performance problems and introduce special profiling code to catch them early. For example, a typical performance problem is resource over-usage: a resource (remote server, database, …) may be called very often to retrieve the same data, which makes it a candidate for local caching.

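void ResourceProfiler::count() {
    // Profiling is disabled when no trigger count is configured.
    if (!triggerCount) {
        return;
    }
    // The first event since the last reset opens a new measurement window.
    // Doing this before the threshold check also covers triggerCount == 1.
    if (!timeOfFirstOccurenceS) {
        timeOfFirstOccurenceS = time(NULL);
    }
    counter++;
    if (counter >= triggerCount) {
        time_t deltaS = time(NULL) - timeOfFirstOccurenceS;
        // Warn only if triggerCount events arrived within intervalS seconds.
        if (deltaS <= intervalS) {
            const int BUFFER_SIZE = 256;
            char buffer[BUFFER_SIZE];
            snprintf(buffer, BUFFER_SIZE, "Abused resource: %s=%d, %s=%ds", ENV_TRIGGER_COUNT, counter, ENV_INTERVAL_S, intervalS);
            warning(buffer);
        }
        // Reset the state so the next event starts a fresh window.
        timeOfFirstOccurenceS = 0;
        counter = 0;
    }
}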

The example above shows the core method of the resource-usage profiling code, which issues a warning when at least triggerCount events occur within an intervalS time frame. You can install it by calling count() whenever the given resource is used (a server is called, a database record is loaded, …). Over-usage will then be detected dynamically during auto-tests.

The method above can be generalized to count usage per SQL query in order to locate very frequent queries (the typical N+1 SELECT problem). Instead of a single counter, use a Map<String, int> for counters and a Map<String, time_t> for timestamps. This is left as an exercise for the reader.
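For readers who want a starting point for that exercise, one possible sketch follows. It is not the original ResourceProfiler: the class name KeyedResourceProfiler, its constructor parameters, and the optional now argument (added only to make the logic easy to test) are assumptions, and std::unordered_map stands in for the Map mentioned above. Instead of calling warning() itself, count() returns true so the caller can decide how to report the abuse.

```cpp
#include <cassert>
#include <string>
#include <time.h>
#include <unordered_map>

// Hypothetical per-key variant of the resource profiler: one counter and one
// window start time for every distinct key (for example, an SQL query string).
class KeyedResourceProfiler {
public:
    KeyedResourceProfiler(int triggerCount, time_t intervalS)
        : triggerCount(triggerCount), intervalS(intervalS) {}

    // Records one use of `key` at time `now`. Returns true when this call
    // completes triggerCount uses of the key within intervalS seconds.
    bool count(const std::string& key, time_t now = time(NULL)) {
        if (triggerCount <= 0) {
            return false;  // profiling disabled
        }
        // The first use of a key opens its measurement window.
        if (counters[key] == 0) {
            firstOccurrenceS[key] = now;
        }
        if (++counters[key] < triggerCount) {
            return false;
        }
        bool abused = (now - firstOccurrenceS[key]) <= intervalS;
        // Drop both entries so the next use of the key starts a fresh window.
        counters.erase(key);
        firstOccurrenceS.erase(key);
        return abused;
    }

private:
    int triggerCount;
    time_t intervalS;
    std::unordered_map<std::string, int> counters;
    std::unordered_map<std::string, time_t> firstOccurrenceS;
};
```

A call site in the database layer could then look like `if (profiler.count(sql)) warning(...)`. Keying on the SQL text with placeholders rather than on the text with bound values keeps the maps small and groups the repeated queries of an N+1 SELECT under a single key.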
