Dariusz on Software Quality

14/06/2013

Find most CPU-hungry threads under Linux

Filed under: en — Tags: , — dariusz.cieslak @

It’s pretty easy to find CPU usage per process (top, ps), but if you want to find the top CPU consumers per thread, here is a method:

$ awk '{printf "TID: %6u  %8u %8u %8u \t %s \n", $1, $14, $15, $14+$15, $2;}' /proc/*/task/*/stat | sort -n -k 5 | tail -10
TID:   3600     33700    11461    45161      (metacity) 
TID:   3612     38053     7716    45769      (nautilus) 
TID:   3634     65079        0    65079      (dconf 
TID:   3601     64168     4997    69165      (gnome-panel) 
TID:  17298    137525    15650   153175      (firefox) 
TID:   3840    168736    21351   190087      (skype) 
TID:  11622     76119   150472   226591      (multiload-apple) 
TID:   3616    492076    53810   545886      (gnome-terminal) 
TID:   3617    370649   220327   590976      (skype) 
TID:   1072    626086  1031732  1657818      (Xorg)

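Results are sorted by cumulative CPU time (user + system), expressed in clock ticks since each thread started. The columns are: thread ID, user time, system time, their sum, and the thread name. Note that a thread name containing a space, like the truncated “(dconf” entry above, shifts awk’s field numbering, so the figures on that line come from the wrong columns.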
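
The same listing can be approximated in Python. This is only a minimal sketch, assuming a Linux /proc filesystem in which utime and stime are fields 14 and 15 of each stat file; the function names parse_stat_line and top_threads are made up for this example. It splits each line around the parentheses, so thread names with spaces do not shift the columns.

```python
import glob
import os


def parse_stat_line(line):
    """Return (tid, comm, utime, stime) from one /proc/<pid>/task/<tid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    left, _, rest = line.partition("(")
    comm, _, tail = rest.rpartition(")")
    fields = tail.split()
    # tail starts at field 3 (state), so fields 14 and 15 are at index 11 and 12
    return int(left), comm, int(fields[11]), int(fields[12])


def top_threads(limit=10):
    """Return the `limit` threads with the highest utime + stime, ascending."""
    rows = []
    for path in glob.glob("/proc/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path) as stat_file:
                rows.append(parse_stat_line(stat_file.read()))
        except (OSError, ValueError, IndexError):
            continue  # the thread exited while we were scanning
    rows.sort(key=lambda row: row[2] + row[3])
    return rows[-limit:]


if __name__ == "__main__":
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    for tid, comm, utime, stime in top_threads():
        total = utime + stime
        print("TID: %6d %8d %8d %8d (%.1f s)\t(%s)"
              % (tid, utime, stime, total, total / ticks_per_second, comm))
```

Dividing by SC_CLK_TCK converts the tick counts into seconds, which is easier to compare between machines.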

25/05/2013

Migration to python subprocess module

After a recent OS upgrade, one of my unit tests started to fail (to be precise, it started to hang). A quick check showed that the CGI process started by os.popen() hung. The old source code:

f = os.popen("./cgi_script.cgi > /dev/null", "w")
f.write(postBody)
f.flush()
f.close()

As os.popen() is now deprecated (I know, it’s a very old codebase that started with Python 1.5), I’ve moved to the newer subprocess module:

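import subprocess

# Discard the CGI script's output
fNull = open("/dev/null", "w")
# No shell is involved; the POST body is fed to the script through a pipe on stdin
p = subprocess.Popen("./cgi_script.cgi", shell=False, bufsize=1024, stdin=subprocess.PIPE, stdout=fNull)
fw = p.stdin
fw.write(postBody)
fw.flush()
fw.close()
del p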

As you can see, it’s more verbose now, but I’ve eliminated the shell (slightly faster operation).

Some notes from the migration:

  • without “del p” the process may not be terminated, which causes problems with DB state (the CGI process updates the database and the test checks this state later); calling p.wait() is a more explicit way to make sure the child has finished
  • I/O configuration is more flexible than with os.popen(): pipes are easier to set up
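
For comparison, here is a minimal Python 3 sketch of the same hand-off that waits for the child explicitly instead of relying on object deletion. The helper name post_to_cgi is hypothetical, and the body is assumed to be bytes; this is an illustration, not the code used in the test suite.

```python
import subprocess


def post_to_cgi(command, post_body):
    """Feed post_body (bytes) to the command's stdin and wait for it to exit."""
    process = subprocess.Popen(
        command,
        shell=False,
        stdin=subprocess.PIPE,
        stdout=subprocess.DEVNULL,  # same effect as redirecting to /dev/null
    )
    # communicate() writes the data, closes stdin and waits for the child,
    # so the database is in its final state when this function returns.
    process.communicate(post_body)
    return process.returncode
```

A call such as post_to_cgi(["./cgi_script.cgi"], postBody) would then return the script’s exit code, which the test can also check.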

23/05/2013

Server flood automatic detection

My current customer develops embedded devices used by many end users in the Netherlands. To reduce server load, the devices download data over multicast: every device registers itself on a multicast channel using IGMP and listens for UDP packets. With no connections to manage, the overhead is lower.

However, some data (or some requests) cannot be served over multicast channels, so HTTP/HTTPS must be used to interact with the server. As the number of devices is very high, special measures have been taken so as not to overload the backend servers (randomised delays, client software optimization).

Consequently, a small bug in the client software that triggers more events than usual can be very dangerous to the stability of the whole system (thousands of devices misbehaving at once amount to a perfect DDoS that may kill the back-end). In order to catch such errant behaviour as soon as possible, I’ve developed a daily report that monitors server usage at my customer’s office.

First of all, we need to locate the most “interesting” device by IP address in the logs (we list the 20 most frequent IP/URL pairs and save the busiest IP):

    ssh $server "cat /path-to-logs/localhost_access_log.$date.log" | awk '
        {
            t[$1 " " $7]++
            ip[$1]++
        }
        END {
            for(a in t) { print t[a], a }
            max = 0
            max_ip = ""
            for(a in ip) { if(ip[a] > max) { max=ip[a]; max_ip=a; } }
            print max_ip > "/tmp/max_ip"
        }
    ' | sort -n | tail -20
    IP="`cat /tmp/max_ip`"
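
The counting step of the awk script above can be sketched in Python as well. This is a rough equivalent under the assumption that the access log uses the usual whitespace-separated layout, with the client IP in the first column and the request path in the seventh (the same $1 and $7 that awk reads); the function name summarize_access_log is invented for this example.

```python
from collections import Counter


def summarize_access_log(lines, top_n=20):
    """Return (most frequent (ip, url) pairs with counts, busiest ip)."""
    pair_counts = Counter()
    ip_counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip malformed or truncated lines
        ip, url = fields[0], fields[6]  # same columns as $1 and $7 in awk
        pair_counts[(ip, url)] += 1
        ip_counts[ip] += 1
    busiest_ip = ip_counts.most_common(1)[0][0] if ip_counts else None
    return pair_counts.most_common(top_n), busiest_ip
```

Unlike the shell version, the busiest IP is returned directly instead of being passed through a temporary file.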

The selected IP is then examined hour by hour to locate patterns in its behavior:

(more…)

Google.com performance analysis over a few years

Time is money. No doubt about that. The longer your customer waits for your service to respond, the fewer coins land in your pocket. You may ask: why? If your page loads too slowly, your “almost” customer hits Back in the browser and picks another of the sponsored links Google provides. You lose this customer to another supplier.

What about Google? Was their service (no kidding, it is a service that searches information, paid for by ads) always so blazingly fast? Let’s check this site’s performance, monitored from a London server over a few years:

As we can see, there were some small problems in 2009 (~5% of requests took more than 1 second to process). Not so bad.

(more…)

01/05/2013

HTTP(S) exchange analysis using Wireshark

Wireshark is a tool that captures network packets and lets you analyse a network connection without direct access to the server or the client. Today we will show a simple method for analysing TCP connections using this tool.

A TCP connection is made up of many IP packets, which Wireshark ties together with a common stream index number. You can select a particular TCP stream using the Analyze / Follow TCP Stream option, or select a given stream directly by its index:

tcp.stream eq 9

If you want to track every opened connection, you can check the first packet of every TCP stream opened to a particular server IP (213.75.34.114 in our example):

tcp.flags.syn==1 and tcp.flags.ack==0 and ip.dst == 213.75.34.114
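
To make clear what this filter tests, here is a small Python sketch of the same condition applied to a raw TCP header. It assumes a standard 20-byte header without options; the helper names build_tcp_header and is_initial_syn are made up for illustration and are not part of Wireshark.

```python
import struct

TCP_FLAG_SYN = 0x02
TCP_FLAG_ACK = 0x10


def build_tcp_header(flags, src_port=40000, dst_port=80):
    """Build a minimal 20-byte TCP header carrying the given flag bits."""
    data_offset = 5 << 4  # header length of 5 32-bit words, no options
    return struct.pack("!HHIIBBHHH", src_port, dst_port, 0, 0,
                       data_offset, flags, 8192, 0, 0)


def is_initial_syn(tcp_header):
    """True for the first packet of a handshake: SYN set, ACK clear."""
    flags = tcp_header[13]  # byte 13 of the TCP header holds the flag bits
    return bool(flags & TCP_FLAG_SYN) and not (flags & TCP_FLAG_ACK)
```

The server’s SYN+ACK reply (flags 0x12) is rejected by this test, which is why the filter yields exactly one packet per opened connection.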

Note that with HTTP/1.1 things may be more complicated, as this protocol supports a “Persistent/Keep Alive” mode that allows multiple requests over one connection, so you may see only one packet matching “tcp.flags.syn==1 and tcp.flags.ack==0” for many requests. To see the full exchange you have to analyse the protocol contents for request / response pairs.
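
As a rough illustration of counting requests on a persistent connection, the Python sketch below scans the client-to-server bytes of one stream (for example, saved from the Follow TCP Stream dialog) for HTTP request lines. The function name count_http_requests is hypothetical, and the approach is deliberately naive: a request body that happens to contain a line shaped like a request line would be counted too, so a real parser should follow Content-Length instead.

```python
import re

# Matches lines such as "GET /index.html HTTP/1.1" at the start of a line
REQUEST_LINE = re.compile(
    rb"^(?:GET|HEAD|POST|PUT|DELETE|OPTIONS|PATCH) \S+ HTTP/1\.[01]\r?$",
    re.MULTILINE,
)


def count_http_requests(client_bytes):
    """Count HTTP/1.x request lines in the client side of one TCP stream."""
    return len(REQUEST_LINE.findall(client_bytes))
```

Several matches within a single stream indicate that the connection was reused for multiple requests.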

Another complication is HTTPS (HTTP over an SSL layer): you won’t even be able to count requests (when “Keep Alive” mode is used). In this scenario you have to inspect the traffic behind the HTTPS termination node or just inspect the server logs.
