Entries tagged "git".

Git Version Control System usage techniques

We (Aplikacja.info) are a small Polish software company that writes mostly in Python and uses Linux for development. GIT is an advanced version control system that was created by Linus Torvalds to maintain the Linux kernel source tree. I'll show how GIT can be combined with the make tool in a Unix shell environment. Why did we choose GIT for our version control needs?
  • A failure of the development server will not block our work
  • Access to history is very fast (it's stored in every working copy)
  • I often work during flights, and GIT supports all VCS operations off-line (commit, diff, etc.)
The hard thing about GIT is that it may confuse beginners. If you are one, remember:
  • Always use git diff HEAD instead of git diff
  • Always use git commit -a instead of git commit
Some of these gotchas were addressed by some GIT front ends, but they are deprecated now. To ease GIT (and earlier, CVS) usage we introduced additional Makefile targets that handle typical source control tasks.

Examining working copy.

make st shows the current state of the working copy together with all local branches and the currently selected branch:
    st:
        git branch
        git status | cat
(piping through cat simply disables the pager). make di shows the differences between the last committed version and the working copy:
    di:
        git diff HEAD
Locally stored history (the last few commits) can be inspected with make ch:
    ch:
        git log --pretty=oneline -15 | cat
We use a topic-branch strategy to develop software, so switching branches (make sb) is a very common operation:
    sb:
        @read -p "name of existing branch to switch to [a-z_0-9]+: "\
            branch_name;\
        git checkout $$branch_name;

Local commits

If we have some uncommitted changes in the working copy we can commit them now:
    commit: di st
        git commit -a
The target above first shows the changes to be committed (di) and the state of the repository (st), then lets you enter a commit message.

Going remote

Let's synchronise our working copy with the central repo (assuming there are no uncommitted local changes) with make sync:
    sync:
        -git pull $(REPO) +`git branch | awk '/^\*/ {print $$2}'`
        git push $(REPO) `git branch | awk '/^\*/ {print $$2}'`
It downloads the current branch from the central server (this operation can fail when the branch does not exist there yet, so we ignore errors with the "-" prefix) and then pushes the commits that have not been synced yet.
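The branch name that the sync target extracts from git branch output can also be obtained directly. A minimal sketch, assuming a GIT version that supports git symbolic-ref --short; the helper name current_branch is hypothetical and not part of our Makefile:

```shell
#!/bin/sh
# Hypothetical helper (not from the original Makefile):
# print the name of the currently checked-out branch.
# Prints nothing and returns non-zero when HEAD is detached.
current_branch() {
    git symbolic-ref --quiet --short HEAD
}
```

Inside a recipe the same git symbolic-ref call could replace the backquoted git branch | awk pipeline.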

GIT rationale by Linus Torvalds

GIT is a very powerful Source Control Management System that we use to manage most of our projects. Let's give the floor to GIT's creator, Linus Torvalds:

Is "commit freeze" really required for software development?

During software development (especially when done in an agile way) there are often times when a working software release must be prepared for customer evaluation or internal testing. I have found that many software release managers use a practice called "commit freeze": no one can commit to the main development branch (trunk/master) until the release is packaged. I doubt it is really required.

The possible reasons for freezing commits:

Creating releases

If you want to make minor changes related to the release and prevent any other (probably riskier) changes from being accidentally introduced, you needn't freeze commits. The more efficient solution here is to fork a branch. On a separate branch you can make any adjustments you need to build the binaries for the release.

Merging

For time-consuming merges (especially when many conflicts are present) it's tempting to prevent commits on the target branch, to minimize problems caused by ongoing development while the merged changes are being committed. I think the person doing the merge should instead update frequently from the target branch and adapt the merged changes to the current trunk state.

Switch version control software / repositories

Switching between version control systems is a big change for a development team. Everyone has to learn a new toolset to work efficiently with the new system. Postponing commits to the old repository is not required, though. Those changes can be reapplied later by creating patches from the missing changesets and applying them to the new repository. The patch format is a standard that allows changesets to be moved between different repositories.

Summary

In my opinion temporarily blocking commits (the so-called "commit freeze") is not a good idea. Agile methodology (the one we use at Aplikacja.info) requires frequent information sharing. There are alternatives that have a lower impact on development and do not get in the way of the normal code flow.

Do not reformat whole files on commit, PLEASE!

What's the purpose of internal project documentation? To help people do their jobs. Developers need the knowledge to be distributed across the team, testers need definition of proper system behaviour, marketing needs information on product features to sell it.

Questions

Important knowledge that developers may need when making updates can be summarized in a few questions:

  • Who recently changed that line of code?
  • When was this method changed?
  • Why does the algorithm work that way?

There's a simple method of automatically saving and retrieving this kind of information: Subversion (or any other version control system). How?

Answers

There's a nice feature of version control systems that is not the most frequently used but is very useful: annotation/blame. For a given file this special view shows you:

  • Who changed this line?
  • When was this line changed?
  • Revision number of commit => Log entry => Bug tracker task ID => rationale (Why)

After locating this information you may have a better understanding of the source code.

How to check annotation using different tools:

  • svn annotate filename
  • git annotate filename
  • bzr annotate filename
  • Eclipse: Team / Show Annotation

The problem

Looks simple, but there's a "quirk" here. If you make massive code changes (to enforce the n+1th coding standard) you overwrite the information about the original authors of the source code. Thus annotation (and log) becomes useless.

That's why I'm asking you:

Do not reformat whole files on commit, PLEASE!
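When a reformatting commit changed only whitespace (re-indentation, for example), GIT can look past it while annotating. A minimal sketch, assuming the -w option of git blame; the function name is hypothetical, and this does not help with reformatting that rewraps or reorders code:

```shell
#!/bin/sh
# Hypothetical helper: annotate a file while ignoring whitespace-only
# changes, so a pure re-indentation does not hide the original author.
blame_ignoring_whitespace() {
    git blame -w -- "$1"
}
```

This only softens the problem; the request above still stands.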

Coloured GIT output

Coloured git output on the console is a very helpful feature. It was enabled by default in the git interface cogito. I liked it, but had to switch to raw git a few years ago (cogito is deprecated now).

Fortunately the current version of GIT supports that nice feature. It can be enabled with a few settings in the ~/.gitconfig file:

[diff]
color = true

[pager]
color = true

[status]
color = true

Additionally, the pager used (less in my case) should support ANSI colors (~/.bash_profile):

export LESS="-R"

And now diffs are rendered using colors that improve readability (is there anyone who doesn't review changesets before commit? ;-) ) and look much better on an X terminal.

Change "origin" of your GIT repository

GIT is a distributed version control system, which means it doesn't require any central repository. It's possible to build a system by exchanging commits between equal nodes. It's convenient, however, to mark one repository as the central one. Of course you can change your decision at any time. I'll show you how to do that.

If you created your copy of the repo with the "clone" operation you will have an "origin" remote defined. This remote can be used to pull/push changes.

$ git remote -v
origin zeus.aplikacja.info:cust-proj1

If you decide to change this definition later you can issue the following commands:

$ git remote rm origin
$ git remote add origin git@github.com:aplikacjainfo/proj1.git
$ git config branch.master.remote origin
$ git config branch.master.merge refs/heads/master

After this change you can push your commits to the new repository location (origin is selected as the default remote for master; it's configured in .git/config):

$ git push

That's all. Much simpler than moving Subversion repository.

UPDATE 2011-12-09: replaced sed command with much simpler "git config" replacement.
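Newer GIT versions can also change the URL in place, which keeps the existing branch configuration. A minimal sketch, assuming git remote set-url is available in your GIT; the function name is hypothetical and only wraps that command for illustration:

```shell
#!/bin/sh
# Hypothetical helper: point an existing remote at a new URL in place
# and show the result. Usage: change_remote_url origin <new-url>
change_remote_url() {
    git remote set-url "$1" "$2" &&
    git remote -v
}
```

For example: change_remote_url origin git@github.com:aplikacjainfo/proj1.git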

How to push local GIT branch to a remote repository

GIT is a Distributed Version Control System (DVCS) that was "born" for Linux kernel development. It's not the easiest to use for novices, but it is fast and allows advanced code sharing scenarios.

Sometimes you want to share a locally created GIT branch with other developers (code review, collaborative development on a branch, etc.). You can do it easily by issuing a command:

$ git push origin local_branch_name

If you decide to drop the remote branch just issue:

$ git push origin :local_branch_name
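The colon syntax is easy to mistype. A minimal sketch of the same two operations, assuming a GIT version where git push accepts --delete; the function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers wrapping the two commands above.

# Publish a local branch under the same name on the remote.
# Usage: publish_branch origin local_branch_name
publish_branch() {
    git push "$1" "$2"
}

# Remove a branch from the remote;
# equivalent to "git push <remote> :<branch>".
drop_remote_branch() {
    git push "$1" --delete "$2"
}
```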

Happy GIT-ing!

Git: "pull --rebase" by default

GIT is a distributed version control system that allows developers to share a codebase. Born in the Linux kernel world, it has proved to be very useful for any programming task. "Distributed" means you can commit locally (during a flight); syncing commits to some external repository is done in a separate step (at the airport, waiting for luggage).

While browsing my commit list in gitk I noticed that many developers accidentally merge their changes without a so-called "fast forward" (an additional merge commit is created and the history is not linear). Why? The cause is that they pull changes from the server AFTER a local commit. An example commit tree taken from this worth-reading article:

The solution to this problem is to use "git pull --rebase" when downloading changes from the repository. Existing local commits will be "rebased" (SHA-ids will change) and the history will stay linear. It looks much better:

"Rebasing" can be requested during pull by using this syntax:

git pull --rebase

I bet you will forget that after the n-th commit ;-) That's why GIT allows you to make rebase the default option. Just run this for every branch you have (including "master"):

git config branch.master.rebase true
git config branch.branch_10.rebase true
...

And tell GIT to setup such rule for every new branch:

git config branch.autosetuprebase always

Of course you can disable automatic "rebasing" when needed:

git pull --no-rebase
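In newer GIT versions a single setting covers all branches of a repository at once. A minimal sketch, assuming the pull.rebase configuration key is supported by your GIT; the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: make "git pull" rebase by default for every
# branch of the repository in the current directory.
enable_pull_rebase() {
    git config pull.rebase true
}
```

Running git config --global pull.rebase true instead applies the same default to all your repositories.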

Software Releases Using GIT

Releasing software is not just packing the latest version into a tarball and sending it to an SFTP server. It requires preparation and some planning to be done properly. I'll describe the release procedure I applied on one of my latest projects. The supporting version control system is GIT.

The goals of the release procedure:

  • allow for a testing window before the release date
  • make it possible to examine a released version to test for reported bugs
  • make it possible to manage existing releases (hot-fixing critical bugs)

Prepare Release Candidate Branch

We want to stabilize and test a snapshot of the current development branch. That's why, a few days before the release, I fork an RC branch from "master" and switch to it (it will be used for hot-fixing):

git branch RELEASE_0.2.1_branch
git push origin RELEASE_0.2.1_branch
git checkout RELEASE_0.2.1_branch

I mark the RC state (RC=Release Candidate) to be able to see the changes made for the released version:

git tag RELEASE_0.2.1_RC
git push --tags

Anyone familiar with advanced CVS usage will see similarities to tagging for CVS merge purposes. GIT tracks merges automatically; however, marking the branch starting point is still a good idea.

As you can see, I created a simple naming convention to manage releases.

Prepare Release

The time between creating the RC branch and releasing from it is the time for testers to do their job and sweep out as many defects as possible. Fixes are added directly on the release branch (we will port them to master later).

When the tests are done we prepare the release and tag the current version as RELEASE_0.2.1:

git commit -a -m "version number changed (#XYZ)"
git tag RELEASE_0.2.1
git push --tags

Thanks to this tag we will be able to inspect the exact version that was sent to our clients.

Prepare hotfix

Sometimes our safety net, composed of an automated test suite and a testing team, fails and we have to fix errors reported from production. That is the purpose of the release candidate branch. First, switch to the branch that supports the hot-fixed release:

git branch --track RELEASE_0.2.1_branch origin/RELEASE_0.2.1_branch
git checkout RELEASE_0.2.1_branch
==OR== (if the local branch already exists): git checkout RELEASE_0.2.1_branch; git pull origin

Tag the current version that goes into this hotfix as RELEASE_0.2.1_hotfix_YYYYMMDD:

git commit -a -m "version number changed (#XYZ)"
git tag RELEASE_0.2.1_hotfix_YYYYMMDD
git push --tags

As you might have noticed, the naming convention is based on the branch name. Thanks to this convention we can answer the following questions:

  • what hot-fixes were prepared for release X?
  • what is the latest hotfix for release X?
  • what was delivered in the latest hotfix of release X?
  • etc.
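The questions above can be answered from the tag names alone. A minimal sketch, assuming the RELEASE_x_hotfix_YYYYMMDD convention described here (so that sorting by name equals sorting by date); the function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers built on the tag naming convention.

# List all hotfix tags of a release, oldest first.
# Usage: list_hotfixes RELEASE_0.2.1
list_hotfixes() {
    git tag -l "${1}_hotfix_*" | sort
}

# Print the most recent hotfix tag of a release.
latest_hotfix() {
    list_hotfixes "$1" | tail -n 1
}
```

The content of the latest hotfix can then be inspected with git diff between the last two tags printed by list_hotfixes.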

Porting back changes from branch to master

Sometimes changes made on the branch will be useful for the next releases. You can easily merge them back to master:

$ git checkout master
$ git merge RELEASE_0.2.1
$ git push

GIT tracks what has already been merged, so you can merge/cherry-pick in both directions.

Typical Usage

What changes were included in latest hotfix compared to previous one:

 $ git diff RELEASE_0.2.1_hotfix_date1 RELEASE_0.2.1_hotfix_date2

What changes were added in new release:

$ git log  RELEASE_0.2..RELEASE_0.2.1
$ git diff RELEASE_0.2..RELEASE_0.2.1

"git cherry-pick" for Perforce

Cherry-picking is a technique of porting only selected commits from one branch to another. It's directly supported in GIT by a special command:

git cherry-pick <SHA-COMMIT-ID>

SVN also has a simple merge mode that supports selecting a single commit:

svn merge -c <REV-NO> <URL>

What about Perforce? After checking Perforce documentation for merging I hit the following syntax for selecting subset of changelists to merge:

p4 integrate //depot/release/jam/...@30,@30 ...

That's it: add the same change list id twice (separated by comma) to source URL and you will get cherry-pick!

[2011-04-20] Update: looks like sometimes @CL,@CL will not merge change properly, it's better to use SVN-style revision range mode (@CL-1,@CL):

p4 integrate //depot/release/jam/...@29,@30 ...

Perforce has an additional step to perform:

p4 resolve -af

that walks through all changes from the integrate step and lets you check for conflicts. Of course in the end you have to commit the changes.

p4 submit

BTW. I (still) don't see any benefit from using Perforce over well-known SVN. Anyone?

GIT: importing remote branches

GIT is a fast version control system that handles branching very efficiently and allows most operations to be done offline (it is a distributed VCS). Of course sometimes you have to exchange code with external GIT repos (maybe a central storage). Being distributed forces some design decisions: a distinction between local and remote branches was introduced.

A remote branch in GIT is a head that tracks a branch stored on the server. You shouldn't update (commit to) that branch directly. Instead, you update local branches and then use "push" to publish changes from a local branch to the remote. A remote branch can be used to answer the following questions:

  • is my local branch up to date with respect to the server state?
  • what changes (diff/log) were added to my local branch and haven't been submitted (pushed) yet?

To use remote branches properly you have to create a matching local branch for every remote branch. A boring and error-prone task. Let's automate it!

git branch -a | awk \
    '/RELEASE/ { sub("remotes/origin/", "", $1); \
    print "git branch --track " $1 " origin/" $1 }' | sh

The command above imports all branches whose names contain the string "RELEASE" (I assume we may be interested in checking release status).
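The same idea can be written without parsing git branch output. A minimal sketch, assuming the remote is called origin and using git for-each-ref; the function name is hypothetical, and branches that already exist locally are skipped:

```shell
#!/bin/sh
# Hypothetical helper: create a tracking local branch for every branch
# on "origin" whose name contains the given string.
# Usage: track_remote_branches RELEASE
track_remote_branches() {
    git for-each-ref --format='%(refname:short)' \
        "refs/remotes/origin/*${1}*" |
    while read -r remote_ref; do
        # remote_ref looks like "origin/RELEASE_0.2.1_branch"
        local_name=${remote_ref#origin/}
        # Skip branches that already exist locally.
        if ! git show-ref --verify --quiet "refs/heads/$local_name"; then
            git branch --track "$local_name" "$remote_ref"
        fi
    done
}
```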

GIT: Automatic source code version information

Supporting releases: it's very important to know the exact version used by a customer. In order to reproduce an error you have to switch to the codebase used for the reported release and analyse the problem.

If you have fixed release cycles it's pretty easy to embed the version number on the "About" page and get that information with the bug report from the customer. But wait, what about "continuous delivery" practices (there may be many different software versions pulled between official releases)? And what about human mistakes (one can forget to update the version before releasing the software)?

The answer is: automation. You have to embed the branch/commit ID somewhere in the application (visible to the end user). Then you can pinpoint the exact software version that was installed on a particular machine.

You can find below how we retrieve branch name / commit ID with GIT/C++ environment (our Makefile fragment):

git branch -v | sed 's/no branch/no_branch/' \
    | awk '/^\*/ { print "#define APP_VERSION \"" $$2 " " $$3 "\"" }' \
    > headers/version.h

The version.h file is regenerated automatically on every build (and is not stored under GIT), so it is always filled in properly. Based on this automatically generated file you can show version information in the UI or insert it into logs (it depends on your application type).
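An alternative sketch of the same idea uses git describe, which also reports the nearest tag and uncommitted changes. The function name and the output path argument are hypothetical; --tags is assumed because release tags created with a plain git tag are lightweight:

```shell
#!/bin/sh
# Hypothetical helper: write a C header with the version of the
# current checkout. Usage: write_version_header headers/version.h
write_version_header() {
    # Nearest tag plus commit distance, or a bare commit ID when no tag
    # is reachable; "-dirty" is appended for uncommitted changes.
    version=$(git describe --tags --always --dirty 2>/dev/null) ||
        version=unknown
    printf '#define APP_VERSION "%s"\n' "$version" > "$1"
}
```

Unlike the Makefile fragment above, this variant does not include the branch name.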

Perforce -> GIT import

I was recently assigned the task of preparing a development process for two teams that work with separate version control systems (GIT and Perforce in my case). One of the important parts of this task is to create an effective method of syncing codebases between both storages.

Of course we have git-p4 tool, but my requirements are a bit complicated for this tool:

  • Only subset of whole GIT repository will be stored in P4
  • GIT repository already exists with some history (the same for P4)

so I decided to write small script that will do at least P4 -> GIT sync.

My first attempt was:

  • Sync GIT with main repository: git pull
  • Sync dir to latest sync point: p4 sync -f subdir/...@$CL1
  • Reload local changes from GIT: git reset --hard
  • Make files acceptable to P4: find subdir -type f -print0 | xargs --null chmod u-w
  • Tell P4 about local GIT changes based on $CL1: p4 diff -se subdir/... | p4 -x - edit
  • Inspect local changes from P4 point of view: p4 diff -du subdir/...
  • Merge latest changes from P4 up to $CL2: p4 sync subdir/...@$CL2
  • Resolve potential conflicts: p4 resolve -af
  • Make files acceptable to GIT: find subdir -type f -print0 | xargs --null chmod u+w
  • Add missing files: git add .
  • Build + tests
  • Upgrade GIT repo: git commit -am "subdir merged up to CL $CL2"

But I noticed that merges performed by P4 are neither fast nor accurate. A developer from my team suggested that it may be useful to import P4 in smaller chunks so that bisection is possible if there's a bug in the imported codebase.

Then I created simple script:

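#!/bin/sh
# Prints (does not run) the commands that replay P4 changelists
# CL1..CL2 as individual GIT commits.

CL1=$1
CL2=$2

# "p4 changes" prints "Change <number> on <date> ...", so sort on field 2
p4 changes ...@$CL1,@$CL2 | sort -k2,2n | awk -v CL1=$CL1 '
BEGIN {
    # start from the state at the first changelist
    print "p4 sync mw/...@" CL1
    print "p4 sync ui/...@" CL1
}

{
    CL=$2

    # one GIT commit per changelist, with the P4 description as message
    print "p4 sync mw/...@" CL
    print "p4 sync ui/...@" CL
    print "git add -A ."
    print "p4 changes -l  ...@" CL ",@" CL " | git commit -a -F -"
}
'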

It creates a mirror of a subset of the P4 history, with one GIT commit per CL. After the import, merges can be done inside GIT (fast, with good algorithms).

GIT merge status

If you merge/cherry-pick changes frequently between GIT branches it's very useful to know exactly which changes were already merged, which are waiting to be merged and which will cause a conflict during the merge.

This information should be available from "git log", but unfortunately I did not get good results (even with --cherry-pick). So another solution had to be prepared.

I decided to create a small script that performs a series of cherry-picks and prepares a report showing the integration status. Usage is pretty simple: $ git-merge-status SHA1..SHA2

Note that, because of the way GIT stores commits internally, selecting the source branch is not necessary: the pair of commit IDs identifies the codebase exactly. Sample run:

$ git-merge-status 32e5886..568f4c0
+ 32e5886 Dariusz Cieslak Version upgraded to 0.17
. e54772d T* S* (#2101) Preload two pages when switching pages
. ea76457 R* S* Implement trickplay icon (#2050)
. bb18d64 R* S* Force trickplay icon update (#2050)
. f45efab R* S* Fix zapbanner hours played (#2159)
. 73411c4 R* S* Inherit parent's VOD skin
. 7446a00 D* S* red light on front display while recording (#1502) (cherry picked from commit 6e049cc2f81947621b17105869e312b559164ce9)
. 983d4cc D* S* pvrservice: fromdos (only formatting!) (#2183) (cherry picked from commit db6de56171a90a9bd8eed681f4db7177819611a3)
. 37ed65f A* R. C* (#1711) TSTVEnableDelay key in system.properties (cherry picked from commit c094431ac17fc910a55163b8680c8fb81913dad5)
. 139f4a0 R* S* Display all movie icons in zapbanner
. 5230758 R* S* Fix VOD movie length display (#2139)
C b80a5fb Dariusz Cieslak Appman scripts placed in mw/config/etc/... (#1629)
. 1c8ff68 D* M* (#2186) Fixed qt signals in PLTVHelper
(...)

already merged (.): 49
to be merged (+): 13
conflict during merge(C): 26

Script body:

#!/bin/sh

git log --pretty=format:"%h %cn %s" --reverse $* | awk '
BEGIN {
    system("git reset --hard >/dev/null 2>&1")

    # Collect abbreviated IDs of all commits already on the current branch
    CMD="git log --decorate --pretty=oneline --abbrev-commit"
    while (CMD | getline result > 0) {
        # awk strings are indexed from 1
        history[substr(result, 1, 7)] = 1
        if (result ~ /cherry picked from commit/) {
            # getline stored the line in result; $0 is empty in BEGIN
            CL2 = result
            sub(/.*commit /, "", CL2)
            CL2 = substr(CL2, 1, 7)
            if (CL2 in history) {
                revhistory[CL2] = 1
            }
        }
    }
    close(CMD)
}
{
    CID=$0
    gsub(/"/, " ", CID)

    # the abbreviated commit ID (%h) is the first field
    CL=substr($1, 1, 7)

    if (CL in revhistory) {
        # already merged in opposite direction
        print ". " CID
        PREVIOUSLY_MERGED ++
        next
    }

    if (/cherry picked from commit/) {
        CL2 = $0
        sub(/.*commit /, "", CL2)
        CL2 = substr(CL2, 1, 7)
        if (CL2 in history) {
            print ". " CID
            PREVIOUSLY_MERGED ++
            next
        }
    }

    CMD="git cherry-pick -x " $1 " 2>&1"
    buf = ""
    while (CMD | getline result > 0) {
        buf = buf result
    }
    close(CMD)

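    # newer GIT prints "1 file changed" for a single file
    if (buf ~ /files? changed/) {
        # Just merged
        print "+ " CID
        JUST_MERGED ++
    }
    # newer GIT reports a conflict as "could not apply <commit>"
    else if (buf ~ /Automatic cherry-pick failed|could not apply/) {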
        # Conflict, skip
        print "C " CID
        system("git reset --hard >/dev/null 2>&1")
        CONFLICT ++
    }
    else {
        # Already merged
        print ". " CID
        PREVIOUSLY_MERGED ++
    }

}
END {
    print ""
    print "already merged (.): " PREVIOUSLY_MERGED
    print "to be merged (+): " JUST_MERGED
    print "conflict during merge(C): " CONFLICT
}
'

Fixing invalid comment / branch name in GIT

Recently I was asked to help with fixing a branch that had:

  • an invalid name (wrong artifact number)
  • an invalid comment inside (also based on the wrong artifact number)

It was a mistake, and the programmer wanted to preserve the changes made on the branch, but under a different name.

The solution I proposed was to:

  1. clone existing branch under different name
  2. "amend" last commit on this new branch (to fix comment)
  3. push new branch into correct location
  4. drop old branch

Sequence of GIT commands was:

$ git branch correct-name origin/invalid-name
(We saved desired SHA with new name)
$ git checkout correct-name
$ git commit --amend
(Fix comment of last commit in editor)
$ git push origin correct-name
(Fixed branch published on our default remote)
$ git push origin :invalid-name
(Weird GIT syntax for deleting remote branches)

Bazaar to GIT migration

Today I moved site-uptime.net development from a Bazaar repository to GIT using the elegant bzr2git script. The why:

  • In-place branches (I use them heavily)
  • Faster (no Python libs loading during "cold" start)
  • Can't live without "git rebase -i" now :-)

GIT hooks: commit-msg to enforce commit rules

Recently I forgot to add the #reviewthis directive for modifications of a codebase that belongs to team A, and a subtle bug was introduced that way. Oops! I had agreed earlier that all changes made to moduleB should be passed to a reviewer who will do a peer review of that particular change. What a shame :-( (We are using GitHub's excellent review mechanism, BTW).

How to avoid that situation in the future? Should I rely on my memory? Is it possible for a human to track so many small rules manually? My intuition tells me that enforcement of such small rules should be automated.

GIT allows you to specify so-called "hooks" that can validate many stages of the GIT workflow. I'll use the simplest one, a local verification of the commit message. First the rule in plain text: if you are changing moduleB you should notify developerC about this change.

The implementation (.git/hooks/commit-msg):

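#!/bin/sh
# commit-msg hook: $1 is the path of the file holding the commit message

if git status -s | grep -q moduleB; then
    grep -q '#reviewthis:.*developerC@xyz.com' "$1" || {
        echo "Add '#reviewthis: developerC@xyz.com' to the commit message"
        echo "Commit aborted"
        false
    }
else
    true
fi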

If you make a change in moduleB and developerC is not notified, the commit operation is aborted. You cannot forget (or else your commit will fail). If you want to make that rule repository-wide (in order to prevent everyone from skipping it) you should use a different hook: pre-receive on the server.

GIT hooks are a very powerful validation / automation mechanism that makes our work more efficient.
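The hook above reacts to any modified file whose path mentions moduleB, staged or not. A minimal sketch of a stricter variant that looks only at files staged for the commit, assuming moduleB is a top-level directory of the repository; the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical variant of the commit-msg check.
# $1 is the path of the commit message file passed to the hook by GIT.
# Returns 0 when the commit may proceed, non-zero otherwise.
check_review_directive() {
    if git diff --cached --name-only | grep -q '^moduleB/'; then
        grep -q '#reviewthis:.*developerC@xyz.com' "$1"
    fi
}
```

In .git/hooks/commit-msg it could be called as: check_review_directive "$1" || { echo "Commit aborted"; exit 1; }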

Git local stashes minimal browser

The stash mechanism (sometimes called "shelve", in Bazaar for example) is responsible for holding your local changes aside from the remote branch. You can save your current work state, switch to a different version and restore it later.

To quickly inspect the content of the local stashes in your working copy you can use the following command:

git stash list | awk -F: '{ print "\n\n\n\n"; print $0; print "\n\n"; system("git --no-pager stash show -p " $1); }' | less

If you see a change that should be deleted it's pretty easy:

git stash drop "stash@{2}"

To load any change onto the working copy:

git stash pop "stash@{2}"

Pretty simple, but powerful tool.

Colouring GIT output

Default git colour setup is not so good. On a black terminal background it looks dark.

However, with a small change in the ~/.gitconfig file:

[color "diff"]
meta = yellow bold
frag = magenta bold
old = red bold
new = green bold

you get a much better diff display on your terminal.