Entries from September 2012.

Time Tracking in Bug Trackers

Modern bug tracking software is more than just a bug tracker. Typical additional components include:

  • Wiki systems: to share knowledge
  • Estimations
  • Time tracking

Today I'd like to focus on time tracking capabilities and compare a few mainstream bug trackers: Trac, Redmine, FogBugz and Jira.

Why do I expect time tracking to be included in a ticket management system? Some of our projects are invoiced on a time and materials basis, so it is useful for a developer (or designer, or tester, ...) to register the hours spent on a subject in their basic workflow tool (the bug tracker).

What functionality do I expect from such a time tracking module?

  • Able to start/stop work on a particular issue
  • Able to fix existing time entries (in case of a mistake)
  • Show the total time spent on an issue
  • Compute the total time spent on a milestone / software component / ... (reporting with aggregations)

Let's review the time tracking abilities of some bug trackers, then.

Trac

Trac is a Python-based ticket tracker, well known for its integrated Wiki and extremely powerful query tool.

We have a few options to select from:

Redmine

Redmine, on the other hand, is written in Ruby. It is positioned as Trac's biggest competitor. Unlike Trac, "batteries are included" in the default installation (no need to install additional plugins).

Redmine contains a "Log time" option, but there are no start/stop options available.

FogBugz

FogBugz is a commercial, hosted solution that my customer uses at the moment.

There's a "Working On" menu that you can use to start (by selecting an issue) or stop (by selecting "Nothing") time measurement. It's a very convenient feature: I don't have to track time myself, I just switch the current task and the resulting reports are generated automatically.

Jira

I used this commercial, hosted solution in the past for managing WebSphere-based projects. It is probably written in Java.

Time tracking is included here as well, but without start/stop (is there a plugin that adds such a feature?).

Summary

Currently I see FogBugz as the only viable option for time tracking; however, Trac's query language (you can even embed queries in your Wiki pages) draws my attention to Trac. It looks like I will have to write a start/stop time tracking plugin for Trac myself. Have I missed any existing plugin with that functionality?

[SOLVED] "too many jobs" mysterious bug in libupnp

One of my responsibilities in my current project is the analysis of "hard" bugs: problems that are not easily reproducible, or whose cause is so mysterious that they remain unresolved (or worse: resolved as not reproducible) even after a few iterations between the developer and the integration team. A problem that is visible in the production environment but not reproducible by the developer is pretty simple: you have to check the logs carefully and match them against the source code to derive results. A random problem with unknown exact reproduction steps that is only sometimes visible in the PROD environment is harder: there isn't even an easy validation procedure for our fixes.

Recently I hit one of the most mysterious problems I have ever seen. After some time, the application logs fill up with the "too many jobs" error message. A Google search traced the source of this message to the libupnp library pretty easily.

Libupnp handles the UPnP (Universal Plug and Play) protocol, which is used for auto-discovery of network services (printers, streaming applications, ...). When a new multicast announcement is received, the UPnP client downloads the XML description of that service over HTTP. This is done in a newly created, separate thread. Libupnp limits the number of threads to 100 in order not to abuse system resources.

And here is where the problem was localised: with frequent multicast announcements and a very slow (or unavailable) HTTP service, it is possible for libupnp to end up with 100 threads waiting for HTTP responses, after which no new threads can be created. I proved that by:

  • Starting a local UPnP service (mediatomb) on my laptop: service mediatomb start
  • Blocking the HTTP port that serves the description: iptables -A INPUT -p tcp --dport 49152 -j DROP ("DROP" prevents any response from being sent, so the peer doesn't know what's going on)
  • Generating some UPNP_* events (hello/bye): while true; do service mediatomb restart; sleep 1; done
  • Watching the logs

The solution seemed to be very easy: lower the HTTP timeout so that "unavailable" UPnP devices (note the quotes; in my opinion it's a network configuration error) will not block our device. After searching the source code I located this line:

#define HTTP_DEFAULT_TIMEOUT   30

I lowered it to 2 seconds, rebuilt libupnp and did my test again. And it didn't work.

I've analysed where this timeout is applied, checked every socket API call and found the following line:

ret_code = connect( handle->sock_info.socket,
    ( struct sockaddr * )&peer->hostport.IPv4address,
    sizeof( struct sockaddr_in ) );

Note that it's a blocking call whose timeout is set by the operating system (on Linux it depends on /proc/sys/net/ipv4/tcp_syn_retries).

So even if we apply timeouts properly once the HTTP (TCP) port is open, it's the time taken to open the connection that causes such a big delay (>20 seconds in our case). The solution was to use a non-blocking connection together with a proper select() call (pseudo code below):
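As a rough illustration of how tcp_syn_retries turns into a connect() timeout, the sketch below sums the waits under a simplified model: the first SYN waits for an initial retransmission timeout, and every retry doubles the previous wait. The function name estimate_connect_timeout_sec is hypothetical, the initial timeout is an assumption that depends on the kernel version, and the kernel's upper bound on the retransmission timeout is ignored.

```c
#include <assert.h>

/* Hypothetical helper: estimate how long a blocking connect() waits
 * before giving up on a peer that silently drops SYN packets.
 *
 * Simplified model (assumption): the initial SYN waits initial_rto_sec,
 * and each of the syn_retries retransmissions doubles the previous wait.
 * The kernel's cap on the retransmission timeout is not modelled. */
unsigned estimate_connect_timeout_sec(unsigned initial_rto_sec,
                                      unsigned syn_retries)
{
    assert(syn_retries <= 16); /* keep the doubling well within range */

    unsigned total = 0;
    unsigned rto = initial_rto_sec;
    for (unsigned i = 0; i <= syn_retries; i++) {
        total += rto; /* wait for this attempt to time out */
        rto *= 2;     /* exponential backoff for the next attempt */
    }
    return total;
}
```

Under this model, an initial timeout of 1 second with 6 retries gives 1 + 2 + 4 + 8 + 16 + 32 + 64 = 127 seconds, and an initial timeout of 3 seconds with 2 retries gives 3 + 6 + 12 = 21 seconds. Either way the wait is far longer than a 2-second HTTP timeout.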

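// remember the current flags and set non-blocking mode
flags = fcntl(sock, F_GETFL, 0);
fcntl(sock, F_SETFL, flags | O_NONBLOCK);
// start connecting in background (returns -1 with errno == EINPROGRESS)
connect(sock, (struct sockaddr *)&sa, size);
// wait until connection is made or timeout is reached (ret == 0 means timeout)
ret = select(sock + 1, &rset, &wset, NULL, (timeout) ? &ts : NULL);
// a ready socket may still have failed: SO_ERROR holds the connect result
getsockopt(sock, SOL_SOCKET, SO_ERROR, &err, &errlen);
// put socket back in blocking mode
fcntl(sock, F_SETFL, flags);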

After that change, devices that were unavailable (due to a network error or a local firewall) were detected quickly (within 2 seconds).