Skip to content

Entries from March 2010.

Optimisation Matters, Really

One of my customers had many problems with stability of his server. Service stopped under medium web traffic and the only resolution was server restart. I was asked for optimisation of core components (MySQL database / Apache2 web server) but the result was not very satisfactory. I managed to limit swapping by limitting web server concurrency but it was not real problem solution.

Finally I traced problem source to PHP application run on this server. Joomla (especially forum script) caused system to crawl. One process needed above 60 MB of RSS memory. Imagine system handling 20 concurrent requests. The problems became visible when database size exceeded 200 MB. Source code analysis located the problem: loading big recordsets (whole tables) into memory. One PHP process without memory limit might "swap out" whole server.

My customer decided to rewrite from scratch source code, you will find below results of this optimisation (mostly Munin graphs). New system was installed on 27th March.

Here's number of slow queries recorded by MySQL, you can see they almost disappeared:

And below MySQL queries, two red spikes are database upgrade during transition into new system. After new version installation SQL queries dropped few times:

And here's CPU usage, again: dropped below 20%:

Thanks to the optimisation system load is now under 1:

And network can handle more traffic when application is optimised (faster page generation and loading):

And finally: response time recorded from site-uptime.net service ("visible" by your browser) is almost flat line!

Now systems performs very well.

In this case big optimisation (rewrite from scratch) deliveded "big" results. The hardest part is always to guess what's the performance bottleneck before optimisation. Sometime it's IO, sometimes CPU (especially on virtual servers), sometimes application is badly written. Monitoring tools (like Munin or site-uptime.net) will help a lot in this task.

Now the system has a huge space to grow.

How To Effectively Migrate Web Application Between Hosts

In Agile world there are no immutable constraints. Your requirements may change, libraries used may be replaced during development, application may outgrown your current server setup etc. I'll show you how to make web application migration between servers as fast as possible: with minimum downtime and data consistency preserved (techniques also apply to hosting providers environment).

Known Problems

You may say: moving a site? No problem: just copy your files, database and voila! Not so fast. There are many quirks you may want to handle properly:

  • DNS propagation time
  • Database consistency
  • Preserve logs
  • Preserve external system configuration
  • Environment change impact integration tests

DNS Propagation Time

DNS is a distributed database handed by servers along the globe. In order to be efficient some kind of caching had to be introduced. For every DNS record you can define so called "TTL": "Time To Leave" (time in seconds when the information can be cached safely). Typical it's half hour. I suggest you to make TTL shorter (5 minutes for example) during transition.

Then create new site under new temporary address and validate if it's working properly. I suggest to configure temporary.address.com in your DNS and redirect it to your new location. This way you can access new site and validate if all is working properly under new address.

Setup redirection on old site. This redirection will work during DNS update time. Here's example that can be used for Apache (place it in .htaccess if AllowOverride is enabled):

Redirect 301 / http://temporary.address.com

Here's the poor-mans' version that use HTML HTTP-EQUIV header (use it if you don't have .htaccess enabled on your account):

<meta HTTP-EQUIV="REFRESH" content="0; url=http://temporary.address.com">

This will redirect all visitors that open your site in transition period. After it ends add redirect in opposite way: from temporary address to main application address.

Database Consistency

The worst thing you can do is to leave two public versions of your application and leave them "open" for customers. You will end up with two separate databases modifications that will result in data loss (it's practically impossible to merge updates from two databases). I suggest the following procedure:

  1. Setup new site under temporary address (see previous section)
  2. Prepare fully automated script that on new site (easy with SSH + RSA keys + Bash):
    1. block both application instances (service message)
    2. download database dump from old site
    3. install database dump on new site
    4. unblock old site and redirect to temporary adress
    5. unblock new site
  3. Test the script (only 2, 3 steps)
  4. Run the script
  5. Reconfigure DNS-es

Above procedure ensures:

  • no concurrent DB state modifications (only one public version available)
  • no updates lost (freeze during move)
  • minimum downtime (process is automated)

Preserve Logs

Application logs hold information on application history. They are not critical to application but you may want to move them with application in order to leave historic information for analysis.

Logs could be transferred in the same step as application database. Then new location will just append to transferred logs and all history is preserved.

Preserve External System Configuration

To be checked/migrated (listed in random order).

  • Webserver configuration (i.e. virtuals setup)
  • Crontab setup that supports the application
  • Mail Agent configuration (specific to application being moved)

Environment Change Impact Integration Tests

Your new environment may differ in environment setup to old one. That's why temporary DNS address is very useful here. You can make integration test on new location before migration to be sure no environment changes will break application. The following environment attributes need to be taken into account:

  • OS architecture (32 / 64 bit)
  • OS (i.e. Debian, Centos)
  • Webserver software used (Apache / Lighttpd / Ngnix)
  • Database versions
  • Application server (JBoss, Weblogic or just plain Tomcat?)

Are there other issues I haven't mentioned?

Generic Types in Java 1.5

Generics are very useful Java language feature introduced in Java 1.5. Starting from 1.5 you can statically declare expected types of objects inside collections and compiler will enforce this assumption during compilation:

Map<String, BankAccount> bankAccounts = new HashMap<String, BankAccount>();
bankAccounts.put("a1", bankAccount);
bankAccounts.add("a2", "string"); <-- compilation error
Integer x = bankAccounts.get("a1"); <-- compilation error
bankAccounts.put(new Integer("11"), bankAccount); <-- compilation error

Many projects, however, keep 1.4 compatibility mode for many reasons. I think 1.5 is mature enough (ok, let's say that: old) so it may be used safely.

Recently I've started migration from big 1.4 project into 1.5 (in order to introduce generics). I changed Java project compliance level and got the following error:

Java compiler level does not match the version of the installed Java project facet

Quick googling for this error showed the two possible solutions:

  • Right mouse click on project / Properties / Projects Facets / change Java version
  • OR: right click on error in Problems view / Select Quick Fix / Choose 1.5 compiler level

After this fix workspace compiles without errors. Generics can be added now. Eclipse supports this refactoring very well: just select: "Refactor / Infer Generic Type Arguments" and your current file will be filled with generic arguments. Very nice!

And the best at the end: you can select whole project for generic type update - Eclipse will insert generics where possible. It will speed up refactoring greatly!

How To Debug Hibernate SQL Queries With Parameters

I bet everyone knows how to enable SQL logging for Hibernate. If you add this parametr to Hibernate configuration:

<property name="hibernate.show_sql">true</property>

you will see queries like this in log file:

select roles0_.EntityKey as Entity9_1_, roles0_.ENTITYKEY as ENTITY9_168_0_
from USERROLE roles0_
where roles0_.EntityKey=?

but wait: what value is passed as parameter by Hibernate? (parameters are marked as "?" in JDBC) You have configure TRACE level for some log categories, here's example for simplelog.properties:

org.apache.commons.logging.simplelog.log.org.hibernate.SQL=trace
org.apache.commons.logging.simplelog.log.org.hibernate.engine.query=trace
org.apache.commons.logging.simplelog.log.org.hibernate.type=trace
org.apache.commons.logging.simplelog.log.org.hibernate.jdbc=trace

For your convenience here's log4j configuration:

log4j.logger.org.hibernate.SQL=trace
log4j.logger.org.hibernate.engine.query=trace
log4j.logger.org.hibernate.type=trace
log4j.logger.org.hibernate.jdbc=trace

Then you will see all passed parameters and results in logs:

[DEBUG] (generated SQL here with parameter placeholders)
[TRACE] preparing statement
[TRACE] binding '1002' to parameter: 1
[TRACE] binding '1002' to parameter: 2
[TRACE] binding '1002' to parameter: 3
[DEBUG] about to open ResultSet (open ResultSets: 0, globally: 0)
[DEBUG] about to close ResultSet (open ResultSets: 1, globally: 1)
[DEBUG] about to close PreparedStatement (open PreparedStatements: 1, globally: 1)
[TRACE] closing statement

Nothing hidden here. Happy debugging!

Building tools review for software developers

Creating software projects is composed of many activities. One of them (not very appreciated by typical developer) is building process. The activity should create executables or libraries from source code. You may say now: hey, you forget about documentation, generated API specification, installation, deployment, ...! As you can see there are many task realised by this activity.

I'll review most popular building tools and will point out their strengths and weaknesses:

  • Make
  • Ant
  • Maven
  • IDE-based builders

Make

Make is the oldest tool mentioned here. Comes from UNIX world, is very popular among all non-java projects. Make uses "Makefile" text file with build specification. Here is sample Makefile that builds executable from C source code:

helloworld: helloworld.o
     cc -o $@ $<

helloworld.o: helloworld.c
     cc -c -o $@ $<

clean:
     rm -f helloworld helloworld.o

General makefile syntax:

target: dependicies
<tab>command1
<tab>command2
...

You can chain dependencies and make will resolve them properly and build them in correct order.

Make is used for Linux kernel development and (with configure, imake, ... support) for many Unix based programs.

Make main benefits:

  • compact syntax
  • basic functionality is a standard among all implementations
  • very popular among operating system distributions

Main disadvantages:

  • Sometimes <tab> usage requirement may be hard for novices
  • Syntax may be cryptic for beginners

Ant

Few quirks built into Make tool (TAB requirement for instance, portability problems, etc) caused Ant to be born. It's Java-based, XML-driven tool that is used mainly for building Java based projects.

<?xml version="1.0"?>
<project name="Hello" default="compile">
    <target name="clean" description="remove intermediate files">
        <delete dir="classes"/>
    </target>
    <target name="clobber" depends="clean" description="remove all artifact files">
        <delete file="hello.jar"/>
    </target>
    <target name="compile" description="compile the Java source code to class files">
        <mkdir dir="classes"/>
        <javac srcdir="." destdir="classes"/>
    </target>
    <target name="jar" depends="compile" description="create a Jar file for the application">
        <jar destfile="hello.jar">
            <fileset dir="classes" includes="**/*.class"/>
            <manifest>
                <attribute name="Main-Class" value="HelloProgram"/>
            </manifest>
        </jar>
    </target>
</project>

Ant tries to extend portability by supplying many built-in operations (make uses shell commands here).

Ant is still low-level tool. You can compose your build process any way you want (you can customize almost everything). The need for creating higher level of abstraction created next tool.

Maven

Maven uses "convention over configuration" philosophy. Configuration is also written in XML, but, unlike Ant, you tell Maven the "WHAT", not "HOW". Most parameters (source folders, target folders) have sensible defaults and you can (theoretically) start work on any Maven based project without surprises.

Main Maven benefits:

  • standardised project structure
  • binary libraries are stored outside source tree (you declare dependencies, libraries are downloaded on demand during build)
  • support for multi-project builds with complicated dependencies

The main problem you can face is the distance from low-level build commands to user interface. Some details are not visible and you need big experience with Maven to diagnose properly problems that may occur during a build.

IDE-based builders

Very popular option for novice/Windows programmers. You can setup build configuration within your IDE (Integrated development Environment) and expect all executables/libraries be build.

Benefits:

  • Easy to use
  • Incremental builds

Drawbacks:

  • Hard to connect to continuous integration tools

Every big project I was working on had IDE-based build duplicated with Maven/Ant based scripts. Why? Because Hudson/Cruise Control builds were created using scripting interface. You can use IDE builders as help during development (it's faster than script rebuilds).

What to choose?

It depends:

  • Small project created on Unix machine: use Make
  • Bigger project with custom build commands: use Ant
  • Multi project build with complicated dependencies: use Maven

Good luck :-)