Skip to content

Entries from October 2010.

Micro-tools: Linux command line find & replace

Sometimes you have to update many files at once. If you are using an IDE there is option during replace "search in all files". I'll show you how make those massive changes faster, using just Linux command line. Place this file in ~/bin/repl

#!/bin/sh

PATTERN="$1"
FROM="$2"
TO="$3"

FILES=$(grep -le "$FROM" `find . -name "$PATTERN"`)
echo Files: $FILES

sed -i "s!$FROM!$TO!g" $FILES

Sample usage is:

repl "*.cpp" "loadFiles" "retrieveFiles"

Some explanation about script:

  • grep -l: shows on stdout only filenames of matching files
  • grep -e: accepts pattern as argument
  • sed -i: changing files "in place"
  • grep -le: we want only files that will match $FROM to be updated (minimizes build time after change)

Pretty easy and powerful. Enjoy!

"svn status" for Perforce

Status command is very important part of any VCS (Version Control System) local interface. It allows you to check your workspace state and ensure correct commit will be created.

Perforce is a commercial VCS that is similar to CVS (revisions per file) and SVN (global revisions of whole tree). It's support for status command is very clumsy. Let's check how we can emulate status with this tool using small script:

echo === extra files not tracked by Perforce ===
find . '!' -type d '!' -executable | p4 -x - have 2>&1 | grep 'not on client' | \
 sed '/\/moc_/d;/\.so/d;/\.o /d;/Makefile /d;/\.a /d'

echo === Current changelist ===
p4 changelist -o

As you can see we implemented "ignore" mechanism in above script (by sed filtering). "Extra files not tracked by Perforce" reflects "?"-status from CVS/SVN. "Current changelist" reflects "A,D,U"-statuses from CVS/SVN.

By using such script you can ensure your commit contains all files from your workspace, thus will be buildable on other developers' machines.

Codebase Integration Scenarios

Last week I've been observing details of integrating source code coming from three different development teams (located in different countries). Each team owns some subset of modules and has R/W access only to those modules. Of course compile dependencies cross the borders in many places, so global changes usually done in one commit had to be split into at least two commits (because of missing W permission for someone).

There was one development branch and one integration branch. Development branch most of this time was in not "buildable" state (permanent build error), so no one was able to ensure changes are not breaking the build before commit until stabilisation. Integration branch was loaded using files copied from development branch when stable state was achieved (I know, lame). This integration style (allowing for non-buildable commits on development branch) blocks parallel integrations (one have to wait for stable state on development branch).

Are there better ways to organise such integration?

Shared Codebase Ownership

In this scenario every team has full write access to all modules and it's possible to create single commit by team T1 that spans module boundaries: refactors module A (owned by T2) AND calling code from module B (owned by T1).

There will be only one "a must" rule: all codebase should be compilable after every commit. Modifications to "not owned" modules should be minimised (only to make code compilable), probably with some "FIXME" left in code.

Doing that operations in single commit allows you (1) to inspect that commit afterwards (2) retain "green build" property after every commit.

Having always-compilable head we will be able to schedule such operations in paralell. Nobody will be blocked (of course teams have to communicate/consult such cross-border changes before commit).

In this case applied changes are visible by diff-ing selected changesets.

Topic Branches

If globally shared codebase is not an idea to be considered we could use short-living branches (called sometimes "topic branches") that get updates from both teams and after stabilisation they are merged back into main branch (and deleted).

Code on topic branch could be unstable, on development branch must be always stable.

Note that merging person should have write access to all "dev" branch (in this aspect this strategy is similar to previous one). I used to place at this point code review process.

In this case pending changes are visible by doing a diff on existing topic branch.

Moving "Stable" Tag

This strategy can be used for file-tracking systems like CVS or Perforce. For those VCS-es one can move tag to new version for subset of files (SVN, GIT, Bazaar will not allow that).

Person that created "stable" commit should create/move some tag to current revision.

$ p4 tag -l STABLE ...

Then anyone interested in stable state would sync to that state:

$ p4 sync ...@STABLE

Then I can fetch fresh (from head) files I'm going to change:

$ p4 sync P1/...

After adding fixes I'm committing the change:

$ p4 submit

and moving stable label for files I've changed (I must check if full build is green before):

$ p4 tag -l STABLE P1/...

In this case pending changes are visible by diff-ing STABLE..head revisions range.

Summary

If some independent activities cannot be performed in paralell it's a bad sign. It means some artificial dependencies were introduced and it results in slower progress for project (caused by serialisation).

In this case missing "W" access was the cause for additional burden. Unstable main branch was a global semaphore that blocked everyone (at least in integration terms).

DCSim: digital circuit simulator

Recently I've found my old project that was prepared during studies at Warsaw University of Technology. It's called DCSim (Digital Circuit Simulator) and is written using C++ with old Athena widget library (does anyone remember libxt6.so?).

Software allows to define compound circuits and construct large networks from basic digital gates.

Probably the code will not compile cleanly using current compiler set (I used GCC 2.7 then), I hope someone will clone this project and adapt for more current widget library (QT/GTK). Any volunteers :-) ?

Micro-tools: making developers life easier / C++ symbols

Good software developers are lazy persons. They know that any work that can be automated should be automated. They have tools and skills to find automated (or partially automated) solution for many boring tasks. As a bonus only the most interesting tasks (that involve high creativity) remain for them.

I think I'm lazy ;-). I hope it's making my software development better because I like to hire computer for additional "things" apart from typewriter and build services. If you are reading this blog probably you are doing the same. Today I'd like to focus on C++ build process example.

The problem

If you're programmer with Java background C++ may looks a bit strange. Interfaces are prepared in textual "header" files that are included by client code and are processed together with client code by a compiler, .o files are generated. At second stage linker collects all object files (.o) and generates executable (in simplest scenario). If there are missing symbols in header file compiler raises an error, if symbol is missing on linking stage linker will complain.

On recent project my team was responsible for developing a C++ library (header and library files). This C++ library will be used then by another teams to deliver final product. Functionality was defined by header files. We started development using (automated by CPPUnitLite) unit tests then first delivery was released.

When other teams started to call our library we discovered that some method implementations were missing! It's possible to build a binary with some methods undefined in *.cpp files as long as you are not calling them. When you call that method from source code linker will find undefined symbol.

I raised that problem on our daily stand-up meeting and got the following suggestion: we have to compare symbols stored in library files (*.a, implementation) with method signatures from public *.h files (specification). Looks hard?

The solution

It's not very hard. First: collect all symbols from library files, strip method arguments for easier parsing:

nm -C ./path/*.a | awk '$2=="T" { sub(/\(.*/, "", $3); print $3; } '\
  | sort | uniq > $NMSYMS

Then: scan header files using some magic regular expressions and a bit of AWK programming language:

$1 == "class" {
   gsub(/:/, "", $2)
   CLASS=$2
}

/^ *}/ {
   CLASS = ""
}

/\/\*/,/\*\// {
   next
}

buffer {
   buffer = buffer " " $0
   if (/;/) {
     print_func(buffer)
     buffer = ""
   }
   next
}

function print_func(s) {

   if (s ~ /= *0 *;/) {
     # pure virtual function - skip
     return
   }
   if (s ~ /{ *}/) {
     # inline method - skip
     return
   }

   sub(/ *\(.*/, "", s);
   gsub(/\t/, "", s)
   gsub(/^ */, "", s);
   gsub(/ *[^ ][^ ]*  */, "", s);
   print CLASS "::" s
}

CLASS && /[a-zA-Z0-9] *\(/ && !/^ *\/\// && !/= *0/ && !/{$/ && !/\/\// {
   if (!/;/) {
     buffer = $0;
     next;
   }
   print_func($0)
}

Then make above script work:

awk -f scripts/scan-headers.awk api/headers/*.h | sort | uniq > $HSYMS

Now we have two files with sorted symbols that looks like those entries:

ClassName::methodName1
ClassName2::~ClassName2

Let's check if there are methods that are undefined:

diff $HSYMS $NMSYMS | grep '^<'

You will see all found cases. Voila!

Limitations

Of course, selected soultion has some limitations:

  • header files are parsed by regexp, fancy syntax (preprocessor tricks) will make it unusable
  • argument types (that count in method signature) are simply ignored

but having any results are better that have no analysis tool in place IMHO.

I found very interesting case by using this tool: an implementation for one method was defined in *.cpp file but resulting *.o file was merged into private *.a library. This way public *.a library has still this method missing! It's good to find such bug before customer do.

Conclusions

I spent over one hour on developing this micro tool, but saved many hours of manual analysis of source code and expected many bug reports (it's very easy to miss something when codebase is big).