Dariusz on Software

Methods and Tools

Entries from October 2008.

Thu, 09 Oct 2008 16:46:01 +0000

Shared hosting is cheaper than a virtual server or a dedicated machine because resources are shared between many customers: one can place hundreds of accounts on a single physical server. Adding too many accounts leads to slow server responses and to instability in general. This is called overselling. Do you know how to check whether your hosting account is oversold?

The Fighters

I'm comparing three hosting providers here:

  • DreamHost.com - a popular hosting provider from the USA
  • Kei.pl - good hosting with 24/7 administration, from Poland
  • Linuxpl.com - a Polish company with servers in Germany, cheap hosting with an SSH option

I'm using a separate Linux box located in Poland to probe the analysed servers with a few simple tools (wget, awk, cron, bash).
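
Here is a minimal sketch of such a probe; the URL, log path and timeout are assumptions, not the exact script I used:

    #!/bin/bash
    # probe.sh - log "timestamp fetch-time" for one page; -1 marks a failed fetch
    URL="http://example.com/test.php"    # hypothetical dynamically generated page
    LOG="$HOME/probe.log"
    START=$(date +%s.%N)
    if wget -q -O /dev/null --timeout=30 "$URL"; then
        END=$(date +%s.%N)
        echo "$(date +%s) $(echo "$END $START" | awk '{printf "%.3f", $1 - $2}')" >> "$LOG"
    else
        echo "$(date +%s) -1" >> "$LOG"
    fi

A crontab entry like */5 * * * * $HOME/probe.sh gives the 5-minute sampling interval used below.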

The Time

First, the most important value you can measure is the time needed to get a response from your server. Checking ping values alone is not sufficient here; you have to measure HTTP traffic (the typical task of a hosting account is to serve content over HTTP). Let's think about which resources can slow down your response time:

  • CPU (too much load caused by other customers)
  • bandwidth (too much concurrent traffic to/from the server)
  • SQL database overload (typically on a separate server)

Let's observe the three hosts at work by measuring their time to serve a simple dynamically generated page (5-minute interval):

Legend:

  • red: DreamHost.com
  • green: Kei.pl
  • blue: Linuxpl.com

As you can see, sometimes a server takes a very long time to serve a page (up to 14 seconds). You can check how often such situations occur and build a histogram from the timings (see the sketch after the averages below). Let's zoom into a fragment of this graph:

You can see the difference here between the servers in the same region (low ping values) and the one from the USA (higher minimal time: pings above 200 ms). The closest server (green) has a higher average time because of higher variance.

I record a -1 value when the HTTP client (here: wget) returns an error while fetching a page. You can see the red host (DreamHost) was inaccessible for 25 minutes. Let's compare the average time to get a page:

  • Kei.pl: 0.522 s
  • Linuxpl.com: 0.330 s
  • Dreamhost.com: 0.747 s
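
These averages (and the histogram mentioned above) can be computed from the collected log with awk one-liners; a hedged sketch, assuming the "timestamp fetch-time" format produced by the probe above:

    # average fetch time, skipping failed (-1) probes
    awk '$2 >= 0 { sum += $2; n++ } END { printf "avg: %.3f s\n", sum / n }' probe.log

    # histogram: how many probes fell into each 1-second bucket
    awk '$2 >= 0 { h[int($2)]++ } END { for (b in h) print b " s: " h[b] }' probe.log | sort -n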

Availability

The higher the availability, the fewer visitors will see a blank screen while loading your page. You can measure this parameter by collecting response times (as shown above) and comparing the number of failed (or very slow) responses to all responses, which gives a percentage availability. Let's treat responses slower than 4 seconds as unacceptable (a sketch of the computation follows the results). Here are the results:

  • Kei.pl: 99.4914 %
  • Linuxpl.com: 98.5758 %
  • Dreamhost.com: 99.0854 %
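
These percentages come from a computation like the following hedged sketch (same log format, 4-second threshold):

    # availability: percentage of probes that succeeded in at most 4 seconds
    awk '{ n++; if ($2 >= 0 && $2 <= 4) ok++ } END { printf "availability: %.4f %%\n", 100 * ok / n }' probe.log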

So: the best result comes from the most expensive of the three providers compared.

CPU Load & Database

Let's check how CPU load and DB efficiency impact the response time on a DreamHost account. This time we measure from inside the server to get load values, sampling the system at a 1-minute interval (a sketch of the measurement loop follows the legend):

Legend:

  • red: CPU loadavg (1-minute)
  • green: time to locally fetch, over HTTP, a CGI script that does not connect to the DB
  • blue: time to locally fetch, over HTTP, a script that uses an SQL connection
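
The internal measurement loop can be as simple as the sketch below, run from cron every minute; the local URLs and script names are assumptions:

    #!/bin/bash
    # inside-server probe: 1-minute loadavg plus local fetch times of two test scripts
    LOAD=$(cut -d' ' -f1 /proc/loadavg)
    T_NODB=$( { time -p wget -q -O /dev/null http://localhost/nodb.cgi; } 2>&1 | awk '/^real/ { print $2 }' )
    T_SQL=$( { time -p wget -q -O /dev/null http://localhost/withdb.cgi; } 2>&1 | awk '/^real/ { print $2 }' )
    echo "$(date +%s) $LOAD $T_NODB $T_SQL" >> "$HOME/load.log"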

Let's zoom into an interesting fragment:

As you can see:

  • simple scripts can operate efficiently even under high load (30)
  • the average load of my server is pretty good (under 2)
  • the peak load is not very high (30, once per few days)
  • there's a factor that impacts execution time, but it's not the load average (see the next picture); it's the SQL database load, which is not directly measured here

Why won't I use DreamHost for any serious task?

Let's see the situation after one week:

And let's look at an interesting fragment:

You can see regular page loads (with SQL queries) taking over 10 seconds, for a period of 7 hours! No comment.

Summary

Using simple tools (wget, cron, bash) you can measure the parameters of your new hosting account and decide whether it offers a sufficient level of availability (i.e. it is not oversold). In the past I had many critical load-average problems with a few hosting companies, so it's better to monitor a host for a while before placing any serious web application on it.

Tags: hosting.
Sat, 04 Oct 2008 08:08:14 +0000

CGI is a standard protocol for running dynamic web applications over HTTP. Its advantage: it's a standard, available everywhere. Its disadvantage: it's slow (a process is created for every HTTP request). To overcome this limitation FastCGI was created (one process serves many HTTP requests), but FastCGI is not supported everywhere. I'll show how to use a special bridge to run FastCGI applications in a CGI-only environment.

First, we have to download the current FastCGI development kit and compile the cgi-fcgi binary for the same architecture our hosting provider uses. Then we copy the compiled binary to the server and create a small script to run our bridge; let's name it app.bridge:

    #! ./cgi-fcgi -f
    -connect app.sock app.fcgi 1

Here "1" means: create one process, multiple requests will be handled by internal thread manager in application.  If your application is not multi threaded you can pass higher number here.

The bridge file must have the correct permissions and the following fragment must be added to .htaccess to allow calling this bridge script:

    Options +ExecCGI
    AddHandler cgi-script .bridge
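
Putting it together, a hedged sketch of the whole deployment; the devkit URL, host names and paths are assumptions, adjust them to your provider:

    # on a machine matching the server's architecture: build cgi-fcgi
    wget http://www.fastcgi.com/dist/fcgi.tar.gz    # FastCGI development kit (assumed URL)
    tar xzf fcgi.tar.gz && cd fcgi-*
    ./configure && make                             # produces cgi-fcgi/cgi-fcgi

    # copy the binary next to app.bridge and app.fcgi, make all three executable
    scp cgi-fcgi/cgi-fcgi user@host:~/domain.com/app/
    ssh user@host 'cd ~/domain.com/app && chmod 755 cgi-fcgi app.bridge app.fcgi'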

That's all, folks!

Tags: linux.
Sat, 04 Oct 2008 07:15:49 +0000

We (Aplikacja.info) are a small Polish software company; we write mostly in Python and use Linux for development. GIT is an advanced version control system created by Linus Torvalds to maintain the Linux kernel source tree. I'll show how GIT can be integrated with the make tool in a Unix shell environment. Why was GIT chosen to support our version control needs?

  • A failure of the development server will not block our work
  • Access to history is very fast (it's stored in every working copy)
  • I used to work during flights; GIT supports all VCS operations off-line (commit, diff, etc.)

The hard thing about GIT is that it may confuse beginners. If you are one, remember:

  • Always use git diff HEAD instead of git diff
  • Always use git commit -a instead of git commit

Some of these gotchas have been addressed by alternative GIT interfaces, but those are deprecated now. In order to ease GIT (and, earlier, CVS) usage we introduced additional Makefile targets that handle typical source-control tasks.

Examining the working copy

make st shows the current working copy state with all local branches and the currently selected branch:
    st:
        git branch
        git status -a | cat
(cat simply disables the pager.)

make di shows the differences between the last committed version and the working copy:
    di:
        git diff HEAD

The locally stored history (the last few commits) can be inspected with make ch:
    ch:
        git log --pretty=oneline -15 | cat

We use a topic-branch strategy for development, so switching branches (make sb) is a very common operation:
    sb:
        @read -p "name of existing branch to switch to [a-z_0-9]+: "\
            branch_name;\
        git checkout $$branch_name;

Local commits

If we have some uncommitted changes in the working copy, we can commit them now:
    commit: di st
        git commit -a

The above target shows the current changes to be committed (di) and the state of the repository (st), then lets you enter a commit message.

Going remote

Let's synchronise our working copy with the central repo (uncommitted local changes are not transferred): make sync:
    sync:
        -git pull $(REPO) +`git branch | awk '/^\*/ {print $$2}'`
        git push $(REPO) `git branch | awk '/^\*/ {print $$2}'`

It downloads the current branch from the central server (this operation can fail when the branch doesn't exist there yet, so we ignore errors with the "-" prefix) and pushes any un-synced local commits.
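
The awk pipeline just extracts the name of the currently checked-out branch ($$ in a Makefile escapes to a single $ in the shell). With a hypothetical branch list it behaves like this:

    $ git branch
      master
    * feature_x
    $ git branch | awk '/^\*/ {print $2}'
    feature_x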
Tags: git.


Created by Chronicle v3.5