Dariusz on Software Quality & Performance

14/06/2013

Find most CPU-hungry threads under Linux

Filed under: en — Tags: , , — dariusz.cieslak @

It's pretty easy to find CPU usage per process (top, ps), but If you want to find top CPU users per thread there's a method:

$ awk '{printf "TID: %6u  %8u %8u %8u \t %s \n", $1, $14, $15, $14+$15, $2;}' /proc/*/task/*/stat | sort -n -k 5 | tail -10
TID:   3600     33700    11461    45161      (metacity) 
TID:   3612     38053     7716    45769      (nautilus) 
TID:   3634     65079        0    65079      (dconf 
TID:   3601     64168     4997    69165      (gnome-panel) 
TID:  17298    137525    15650   153175      (firefox) 
TID:   3840    168736    21351   190087      (skype) 
TID:  11622     76119   150472   226591      (multiload-apple) 
TID:   3616    492076    53810   545886      (gnome-terminal) 
TID:   3617    370649   220327   590976      (skype) 
TID:   1072    626086  1031732  1657818      (Xorg)

Results are sorted by cumulated CPU usage (user + system).

23/05/2013

Server flood automatic detection

Filed under: en — Tags: , — dariusz.cieslak @

My current customer develops embedded devices used by many end users. In order to save server load devices use multicasts for downloading data: every device registers itself on multicast channel using IGMP and listens to UDP packets. No connections to be managed results in lower overhead.

However, some data (or some requests) cannot be downloaded from multicasts channels and HTTP/HTTPS must be used to interact with server. As the number of devices is very high special methods have been used in order not to overload backend servers (randomised delays, client software optimization).

Consequently, small bug in client software that will trigger more events than usual can be very dangerous to whole system stability (because the effect of thousands of devices – perfect DDOS may kill back-end). In order to catch such errant behaviour as soon as possible I've developed daily report that controls server usage in my customer office.

First of all, we need to locate the most "interesting" device by IP address from logs (we list top 20 IPs based on server usage):

    ssh $server "cat /path-to-logs/localhost_access_log.$date.log" | awk '
        {
            t[$1 " " $7]++
            ip[$1]++
        }
        END {
            for(a in t) { print t[a], a }
            max = 0
            max_ip = ""
            for(a in ip) { if(ip[a] > max) { max=ip[a]; max_ip=a; } }
            print max_ip > "/tmp/max_ip"
        }
    ' | sort -n | tail -20
    IP="`cat /tmp/max_ip`"

Then selected IP will be examined hour-by-hour to locate patterns in behavior:

(more…)

Google.com performance analysis over few years

Filed under: en — Tags: , , , — dariusz.cieslak @

Time is money. No doubt about that. The more time your customer waits for your service response the less coins may hit your pocket. You may ask: why? If your page loading is too slow your "almost" customer hits back in his browser and selects another link from sponsored links Google provides him. You loose this customer in favour of another supplier.

What about Google? Was their service (no kidding, it is a sevrice that searches information, paid by ads) always so blazingly fast? Let's check this site performance monitored from London server over few years:

As we can see there were some small problems in 2009 (~5% of requests required more than 1 second to be processed). Not so bad.

(more…)

21/09/2011

Automatic free disk space monitoring on Linux server

Filed under: en — Tags: , , — dariusz.cieslak @

The simplest way to monitor free disk space on Linux serwer, just place it in crontab:

0 8 * * *     df -h | awk '/^\// && $5 > 95 { print "missing space", $0 }'

and ensure e-mail to this user is forwarded to you then you will see e-mail when occupied space is higher than 95% on any disk.

Pretty simple.

20/05/2010

MySpace vs Facebook vs Twitter uptime comparision

Filed under: en — Tags: , — dariusz.cieslak @

Most of us are using Web2.0 sites but massive user base that logins every second is a big challenge to system performance. Let's see how engineers working for MySpace, Facebook and Twitter are doing their job.

Note: all uptime buttons and images below are generated in real-time, you can click on images / baners to see reports with details directly from site-uptime.net.

MySpace

MySpace uptime

MySpace uptime

(more…)

Older Posts »

Powered by WordPress