Skip to content

Entries tagged "python".

How to make SQLObject instances "picklable"?

I tried to store SQLObject instances in a session, but had problems with pickling them. Here is smart base class for model classes that allow to fast pickle / unpickle operation:
from sqlobject import *
import logging

class Base(SQLObject):

    def __getstate__(self):
        return self.id

    def __setstate__(self,id):
        obj = self.__class__.get(id)
        self.__dict__ = obj.__dict__
Just inherit from that class and that's all. Remember: restoring object from db is cheap: SQLObject caches instances in memory.

Automatically migrate SQL schema in SQLObject

When you're building an application you have to extends your model (and alter database schema that is placed under the model). I've found a method to automatically upgrade SQL schema (only ADD COLUMN operations for now):
from sqlobject import *
import logging

class Base(SQLObject):

    def migrate(self, *args, **kv):

        conn = self._connection
        schema = {}
        cols = self.sqlmeta.columns

        for a in conn.columnsFromSchema(self.sqlmeta.table, self):
            schema[a.name] = a
        logging.debug("existing schema: %s" % schema)

        for name, col in cols.iteritems():
            if name not in schema.keys():
                logging.info("add column: %s.%s" % (self.__class__.__name__, name))
                conn.addColumn(self.sqlmeta.table,col)

    migrate = classmethod(migrate)
migrate() class method only handles simplest case (adding a new column), but it may be sufficient in typical development process (more advanced transformations will still require SQL migration scripts written by hand). Enjoy!

Django 1.1 released

New version (1.1) of very popular web framework Django has been released. New version includes ORM improvements, better testing performance, conditional view processing and URL namespaces.

ORM Improvements

One of SQLObject benefits over Django ORM was in my opinion  possibility to make aggregations without issuing raw SQL-s. This feature is now present in Django allowing to make more efficient reports implementation.

Model improvements

A table can be marked now as "unmanaged": syncdb/reset will not touch such table. Also: proxy models and deferred fields now are present.

Testing improvements

Big optiomalisation for unit testing fans like me: tests are run in one transaction, so performance is much better than before. I don't use Django unit test infrastructure (selected nosetests for this task), but it's pleased to hear about this improvement. Hope it will make unit testing more popular among Django programmers.

django-logo-negative

Migrate Database Schema in Django

When your application grows your database must be extended together with application needs. Schema migrations could be a pain if done improperly. Let's review some methods to do schema migrations in your database.

Manual schema migrations, single database instance

This approach assumes you are using database administration tool to manually change your database schema to reflect application. Typically there's one single "master" database that has "official" schema and is used for tests and (possibly cloned version) for production. It's the way most novice programmers attack the problem. This approach is simple and strighforward, but does't have more benefits. Problems that may occure:
  • It's problematic to develop in paralell on branches: you need to change schema but other developers will see your changes and may break their code
  • schema migration is chaotic - it's very easy to forget add some field on production database

Migrations by series of SQL scripts, manual apply

Another, more advanced method, is to use series of SQL files that will ALTER database and optionally UPDATE some fields to reflect migration. You can order those scripts by prepending date and make all developers manually path their databases. Benefits:
  • Arbitrary schema modifications can be implementaed this way
Some problems to notice:
  • It's still very easy to forget applying some patches (or applying them in incorrect order) making your env broken
  • Additional work is needed to prepare scripts

Migrations by series of SQL scripts, applied automatically

A variation of above method. SQL migrations scripts are recorded (by name) after apply in special database table. They are applied automatically (system lists special upgrade/*.sql directory) and already applied patches are silently ignored. Benefits:
  • Automatic retest of whole sequence of patches (load old database dump and run upgrade)
  • Any compicated schema migrations could be implemented this way
  • Automatic order of execution
  • Exclude "double execution" problem (patch names recorded)
This solution is used by us and works perfectly for 6 years of continuous development (400 patches applied so far!).

Automatic schema migrations for ORM

ORM (Object Relational Mappers) have few possibilities to detect current database schema and compare it with existing model. Then migrations can be applied automatically on database to make it work with application. Examples: Using Django Evolution is very simple:
./manage.py evolve --execute --hint --noinput
Similar for Deseb:
./manage.py evolvedb --noinput

Which approach should I choose?

I suggest to start with Django Evolution (or others ORM equivalent) then if migrations become more advanced switch to registered, applied automatically SQL scripts. This way you will have speed at the initialisation of a project and later - more confidence with manually created SQL patches. 1

How To Migrate Django To Different Database Backend

Changing database location is simple - just launch dump on source database server, import it into destination database, redirect domain and voila! You can use this method to migrate your database into newer database engine version. But what can you do if you realize whole backend must be changed (i.e. from MySQL to PostgreSQL)?

Migrating SQL dump to different database dialect is not very easy (column types / dates formats as first examples come to mind). But you don't have to operate on SQL dumps. The simple answer here is: "dumpdata".

Django uses special manage.py script to manage typical operations like: initialisation of database, preloading data, dropping database etc. The command:

manage.py dumpdata appname

prints on stdout all data contained in appname in universal Json format. Then you can load dump just created by using:

manage.py sqlreset gabinet | psql ...
manage.py loaddata filename.json

Database state must be reset before import. That's why sqlreset is used. sqlreset alone prints DROP DATABASE statements on stdout allows to purge database from tables (if passed to SQL execution tool).

Additionally you can gzip JSON data created to make migration (much) faster:

manage.py dumpdata appname | gzip -c | ssh destinationserv 'cat > data.json.gz'
(login to destinationserv ...)
manage.py sqlreset appname | psql ...
gzip -dc data.json.gz | manage.py loaddata -

Happy migrating!

Python Web Framework Selection

I've been using many different Python Web Frameworks so far:

All frameworks have its strengths and weakness. For new project that will handle appointments using existing calendar I decided to give web2py a try, rationale:

  • all stuff included on board, no manual integration of 3rd party libraries
  • stable API
  • small and elegant
  • integrates with GAE (with subset of DB layer)
  • template selection separated from controller (easier unit testing)
  • easy template syntax (reuses Python code embedded into markup language)

After first phase of product I'll report if my expectations above were correct and what kind of problems were located (if any).

Static verificaiton tool for web2py templates

Web2Py is a full-stack Python web framework that can be compared to Django, but is easier to learn due to convention-over-explicit-statement preference. In this article I'll check how static verification techniques developed by me for many different environments (JSP, Django templates, TAL, ...) can be applied for web2py environment.

Static verification means locating simple bugs without running application thus very high (>95%) testing coverage (and high related cost) is not required. Instead with trying to cover by tests every possible screen/workflow/code line/... we can scan all codebase and search for some constraints. Most of them (based on my experience) are static - do not depend on runtime data thus can be effectively checked without running an application.

Here's short example of web2py template language:

<a href="{{=URL('books','show')}}">...</a>

As you can see web2py will substitute {{=<python-expression>}}s by evaluated result. In this example URL points to existing module controllers/books.py and function inside this module named 'show'. I assume you see the problem here: one can select undefined module / function and it will result in a runtime error.

First example is purely static: refence (URL('books','show')) will not change during runtime, neither the source code. Then our static checker might be applied succesfully: check if all URL's in all *.html files have proper functions defined in source code.

Technical solution can be composed to the following steps:

  • locating all resources to check: scanning given directory in filesystem tree
  • locating all interesting fragments in HTML file: I used regexp with arguments to easily extract interesting data
  • locating functions in *.py code: because controllers are not plain python modules (expects some data in global namespace) I decided just to scan them textually

Another check that can be done is references inside HTML files (to CSS resources, JS files, ...). This also can be automated:

<script type="text/javascript" src="svgcanvas.min.js"></script>

Source code refactorings might break your links/references and static scan might ensure you are not breaking application by refactoring.

Complete source code for URL() / SRC= / HREF= checker:

import sys

sys.path.append("web2py/applications")
sys.path.append("web2py")

import os
import copy
import re
import string
from gluon.globals import *
from gluon.http import *
from gluon.html import *

RE_SRC = re.compile(r'src *= *"([^"]*)"')
RE_HREF = re.compile(r'href *= *"([^"]*)"')
RE_URL = re.compile(r'{{=URL\(\'([^\']*)\'')
RE_URL2 = re.compile(r'{{=URL\(\'([^\']*)\' *, *\'([^\']*)\'')

FILENAME= None
FNR = 0

request = Request()

def report_error(s):

    print "%s:%d: %s" % (FILENAME, FNR, s)

def check_src_exists(arg):

    if arg[0] == "{":
        # variable value
        return

    elif arg[0].find("{{"):
        # skip this URL
        pass

    elif arg[0] == "/":
        # absolute file
        fullPath = "web2py/applications" + "/" + arg
        if not os.path.exists(fullPath):
            report_error("file %s doesn't exists" % fullPath)

    else:
        # relative file
        fullPath = os.path.dirname(FILENAME) + "/" + arg
        if not os.path.exists(fullPath):
            report_error("file %s doesn't exists" % fullPath)

def check_href_exists(arg):

    #print arg
    if arg.startswith("{{="):
        pass
    elif arg.find("{{") > 0:
        pass
    elif arg.startswith("/"):
        if not os.path.exists("web2py/applications" + arg):
            report_error("absolute file %s doesn't exists" % arg)
    elif arg.startswith("http://"):
        # external link, do not check
        pass
    elif arg.startswith("https://"):
        # external link, do not check
        pass
    elif arg.startswith("mailto:"):
        # external link, do not check
        pass
    elif arg.startswith("javascript:"):
        # external link, do not check
        pass
    elif arg.find("#") == 0:
        # anchor, skip
        pass
    else:
        fullPath = os.path.dirname(FILENAME) + "/" + arg
        if not os.path.exists(fullPath):
            report_error("relative file %s doesn't exists" % fullPath)

def templatePathToPythonPath(templatePath):

    return string.join(templatePath.replace("/views/", "/controllers/").split("/")[:-1], "/") + ".py"

def eq(got, expected):
    if got != expected:
        print "got:'%s' != expected:'%s'" % (got, expected)
        return False
    return True

assert eq(templatePathToPythonPath(
    "web2py/applications/ad/views/default/details.html"),
    "web2py/applications/ad/controllers/default.py")
assert eq(templatePathToPythonPath(
    "web2py/applications/ad/views/default/index.html"),
    "web2py/applications/ad/controllers/default.py")
assert eq(templatePathToPythonPath(
    "web2py/applications/examples/views/ajax_examples/index.html"),
    "web2py/applications/examples/controllers/ajax_examples.py")

name_to_contents = {}
def get_file_contents(fileName):

    global name_to_contents
    if not name_to_contents.has_key(fileName):
        if os.path.exists(fileName):
            f = file(fileName)
            name_to_contents[fileName] = f.read()
            f.close()
        else:
            report_error("Cannot load %s" % fileName)
            name_to_contents[fileName] = ""
    return name_to_contents[fileName]

def check_url_exists(url):

    if FILENAME.find("appadmin.html") > 0:
        return

    if url.find(".") > 0:
        functionName = url.split(".")[-1]
    else:
        functionName = url

    # print "check_url_exists(%s) moduleName=%s" % (url, moduleName)
    pythonFilePath = templatePathToPythonPath(FILENAME)

    if get_file_contents(pythonFilePath).find("def " + functionName + "()") < 0:
        report_error("cannot find %s in %s" % (functionName, pythonFilePath))

def check_url2_exists(moduleName, functionName):

    if moduleName == "static":
        return

    # print "check_url_exists(%s) moduleName=%s" % (url, moduleName)
    pythonFilePath = string.join(FILENAME.replace("/views/", "/controllers/").split("/")[:-1], "/") + "/" + moduleName + ".py"

    if get_file_contents(pythonFilePath).find("def " + functionName + "()") < 0:
        report_error("cannot find %s in %s" % (functionName, pythonFilePath))

def scan_file(path):

    global FILENAME
    global FNR

    FILENAME = path
    FNR = 0

    f = file(path)

    while 1:
        FNR += 1
        line = f.readline()
        if not line:
            break

        m = RE_SRC.search(line)
        if m:
            check_src_exists(m.group(1))

        m = RE_HREF.search(line)
        if m:
            check_href_exists(m.group(1))

        m = RE_URL2.search(line)
        if m:
            check_url2_exists(m.group(1), m.group(2))
        else:
            m = RE_URL.search(line)
            if m:
                check_url_exists(m.group(1))

    f.close()

def test_html(directory):

    for a in os.listdir(directory):
        if a == "epydoc":
            continue
        p = directory + "/" + a
        if os.path.isdir(p):
            test_html(p)
        if a.endswith(".html"):
            scan_file(p)

test_html("web2py/applications")

Web2py Lighttpd Deployment

Web2py is "full stack" Python Web Framework, Lighttpd is fast, multi-threaded HTTP server. I'll present a method to connect web2py-based application under lighttpd.

I assume the following setup is already done:

  • A domain named "myapp.com" is configured to point to your server
  • Python / lighttpd is already installed on server
  • Your web2py app is placed under /var/www/web2py
  • Your web2py app has application "myapproot" configured

First of all, you have to configure lighttpd to locate web2py application, create file /etc/lighttpd/conf-enabled/myapp.conf:

$HTTP["host"] =~ "(www\.)?myapp\.com" {
    server.indexfiles = ( "/myapproot" )
    server.document-root = "/var/www/myapp"
    server.dir-listing = "disable"
    fastcgi.server = (
        ".fcgi" => ("localhost" => (
            "check-local" => "disable",
            "min-procs" => "1",
            "max-procs" => "2",
            "socket" => "/tmp/myapp.sock")
        )
    )

    url.rewrite-once = (
        "^/$" => "/ad",
        "^(/.+?/static/.+)$" => "/applications$1",
        "(^|/.*)$" => "/fcgihandler.fcgi$1",
    )
    $HTTP["url"] !~ "^(/ad|/fcgihandler.fcgi|/applications/myapproot/static/)" {
        url.access-deny = ("")
    }
}

Explanation:

  • (www\.)?myapp\.com: regular expression to match domain with or without "www." prefix
  • server.indexfiles: specifies relative URL that should be called when only domain is given
  • server.document-root: specifies location of web2py app in filesystem
  • server.dir-listing: we do not want user to list our files using HTTP
  • fastcgi.server: specifies where socket file is located
  • url.rewrite-once: allow to use elegant (short) URLs
  • url.access-deny: files other than static directory should be forbidden (security)

Then you have to configure fcgihandler.fcgi script properly:

(...)
fcgi.WSGIServer(application, bindAddress='/tmp/myapp.sock').run()

Note that /tmp/myapp.sock must be the same as specified in lighttpd configuration.

Then you have to start the fcgihandler.fcgi proces and ensure it will start on every boot. That's all.

Migration to python subprocess module

After recent OS upgrade one of my unit tests started to fail (to be precise it started to hang). Quickly check showed me that CGI process started by os.popen() hanged. The old source code:

f = os.popen("./cgi_script.cgi > /dev/null", "w")
f.write(postBody)
f.flush()
f.close()

As os.popen() is deprecated now (I know, it's a very old codebase that started with Python 1.5) I've moved to new subprocess module:

fNull = file("/dev/null", "w")
p = subprocess.Popen("./cgi_script.cgi", shell=False, bufsize=1024, stdin = subprocess.PIPE, stdout = fNull)
fw = p.stdin
fw.write(postBody)
fw.flush()
fw.close()
del p

As you can see it's more verbose now but I've eliminated shell (slightly faster operation).

Some notes found during migration:

  • without "del p" process may be not terminated causing problems with DB state (CGI proces updates database and test checks this state later)
  • I/O configuration is more flexible than os.popen() - you can make pipes more easily