ComfyChair

ALPHA RELEASE!

Authors: Martin Pool and Tim Potter
Version: 0.2

Introduction

ComfyChair is a general framework for writing unit, regression, and system tests in Python.

ComfyChair is designed to be useful for testing not just code written in Python, but also native code. Python provides a flexible and simple language to perform test setup, execution, and evaluation. C code can be exposed to Python by small wrapper functions, or through the normal program interface.

ComfyChair presents a programming interface to make writing new tests straightforward. It also has a user interface for people running tests.

You could do all of this in another framework such as PyUnit, but I think ComfyChair works better, at least for the kind of projects that I work on.

ComfyChair releases can be downloaded from http://sourcefrog.net/projects/comfychair/releases/

The name is a Monty Python reference.

ComfyChair has been used in the test suites for distcc and rdiff.

For a larger example of a comfychair test suite, see distcc's test suite:

http://samba.org/ftp/unpacked/distcc/test/testdistcc.py

If you want a more powerful (but more complex) test suite, you might look at QMTest.

Principles

The goals and principles of comfychair are:

  1. Simplicity. Tests should be easy to write, and easy to run. Running tests should require no more than a single command. comfychair can be shipped with any program by including a single Python file.
  2. Use Python to test C. C is too low-level, perl and sh are too flaky.
  3. The code under test runs in its own process. Write little harness programs to expose C functions or subsystems, controlled by arguments and writing to stdout. Most of the test intelligence is in Python, and the harness just acts as a way to reach the code under test. Harnesses can be run by themselves for exploratory testing or debugging.
  4. Tests should be completely automatic. No manual setup or user intervention is required. Sometimes this requires a bit of thought about how to make things automatically testable. That effort usually improves the program's quality.
  5. Common test code should be in base classes. In particular, many different tests might have some setup/teardown in common.
  6. Tests should clean up wherever possible.
  7. Tests should not spam the user with details, but should record everything that might help in debugging. Avoid needing to re-run tests to see what went wrong.

Terminology

The basic element of ComfyChair testing is a "test case". This is defined by a little bit of Python code in a class inheriting from comfychair.TestCase. Each test case should exercise one particular aspect of the program.

TestCases are fragments of program code. Running the code produces a "test invocation" object, which is an instance of the class.

Test cases are grouped into suites of related tests. Suites are defined by Python modules (source files). Small programs (like distcc) might have only a single suite, and bigger projects like Samba may have several suites to cover different sets of functionality.

Running Tests

To run tests, just run the appropriate test suite as a Python program. For example:

$ ./tests/testdistcc.py

or, if it's not marked executable:

$ python2.2 ./tests/testdistcc.py

By default, all the test cases for this suite are run in the order specified by the author. However, you can also run particular tests, or change the order. To get a list of available test cases, use the '--list' option when running the suite. To run particular cases, just specify their names in order on the command line.

Tip

If there is a test that's failing intermittently (because of a timing issue, for example), then repeatedly running that test case can be a great way to recreate the failure. Just say something like this:

$ while ./tests/testdistcc.py BadInclude_Case; do :; done

Writing Tests

Programmers can write test suites that interact with ComfyChair to test their code.

Each test that can be run is defined by a Python class. The framework creates one instance of this class when the test is run. (Test suites never need to construct test instances. This is only ever done by the framework.)

Tests consist of three phases: setup, runtest, and teardown. Setup and teardown are optional. The simplest test is:

class OnePlusOne(comfychair.TestCase):
    def runtest(self):
        self.assert_(1 + 1 == 2)

Tests indicate failure by raising a Python exception. Failure can occur in any stage. If setup fails then runtest is skipped, but the cleanup still runs.

It is quite common that if a setup or runtest fails then the teardown will fail as well: in this case you get an exception from both of them.
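The phase ordering described above can be sketched in a few lines. This is only an illustration under stated assumptions: run_phases and BrokenSetup are hypothetical names, and the real ComfyChair runner may be organised quite differently.

```python
def run_phases(test):
    """Hypothetical runner, not ComfyChair's own code.

    Runs setup, runtest and teardown, and returns every exception raised.
    """
    failures = []
    try:
        test.setup()
        test.runtest()      # never reached if setup raised
    except Exception as exc:
        failures.append(exc)
    try:
        test.teardown()     # always attempted, even after a failure
    except Exception as exc:
        failures.append(exc)
    return failures


class BrokenSetup:
    """Stand-in test whose setup phase fails."""

    def __init__(self):
        self.ran = []

    def setup(self):
        self.ran.append("setup")
        raise RuntimeError("could not start daemon")

    def runtest(self):
        self.ran.append("runtest")

    def teardown(self):
        self.ran.append("teardown")
```

Running run_phases(BrokenSetup()) records the setup and teardown phases but not runtest, and reports the single exception from setup.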

Functions should call add_cleanup at the moment they allocate any resource that needs to be freed on completion. All of the cleanups are run in reverse order of allocation, even if some of them fail. One example of a resource that should be cleaned up is a process started during testing -- it should always be killed off.

(You can also use try/finally clauses, but using add_cleanup tends to work better with subclassed tests, where the base class may need to run some cleanups that the subclass does not know about.)
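To make the "reverse order, even if some of them fail" behaviour concrete, here is a minimal stand-alone sketch. CleanupStack, run_cleanups and demo_cleanup_order are hypothetical names invented for this illustration; only add_cleanup comes from the text above, and its real signature may differ.

```python
class CleanupStack:
    """Hypothetical stand-in, not ComfyChair's code."""

    def __init__(self):
        self._cleanups = []

    def add_cleanup(self, func):
        # Register the cleanup at the moment the resource is allocated.
        self._cleanups.append(func)

    def run_cleanups(self):
        errors = []
        # Last allocated, first released.
        while self._cleanups:
            func = self._cleanups.pop()
            try:
                func()
            except Exception as exc:
                # Keep going so one failed cleanup does not leak the rest.
                errors.append(exc)
        return errors


def demo_cleanup_order():
    log = []
    stack = CleanupStack()
    stack.add_cleanup(lambda: log.append("remove tmpdir"))

    def kill_daemon():
        raise RuntimeError("daemon already gone")

    stack.add_cleanup(kill_daemon)
    stack.add_cleanup(lambda: log.append("close logfile"))
    return log, stack.run_cleanups()
```

In the demo the log file is closed first and the temporary directory removed last, and the failing kill_daemon cleanup does not prevent the remaining one from running.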

Each test module needs a little bit of standard glue to transfer control to the ComfyChair test runner when the module is started as a program. Put this at the end of your test suite:

if __name__ == '__main__':
  comfychair.main(tests)

Each module also needs to pass a list of available test classes:

tests = [OnePlusOne]

Tests are run in the order listed, so faster or simpler tests should be put first so that somebody running the tests gets feedback as soon as possible.

Test assertions

Any exception thrown by a test class method indicates a failure of (that phase of) the test. comfychair also provides some convenience functions to check particular conditions and raise exceptions with explanatory messages if they have failed.

assert_
Fail unless the argument is true. Note the trailing underscore, which distinguishes this method from the builtin assert statement.
assert_equal
Check that two values are equal.
assert_re_match
Check that a string matches a regular expression.
assert_re_search
Check that a regular expression matches somewhere in a string.
assert_no_file
Check that a file does not exist.
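The difference between the two regular expression checks is easy to trip over. Assuming they follow the usual Python distinction between re.match and re.search (the names suggest this, but it is an assumption, not something stated above), the behaviour looks like this. The helper names and the sample output string are made up for the example.

```python
import re


def matches_at_start(pattern, text):
    # re.match only succeeds if the pattern matches at the start of text.
    return re.match(pattern, text) is not None


def matches_anywhere(pattern, text):
    # re.search succeeds if the pattern matches at any position in text.
    return re.search(pattern, text) is not None


output = "distcc: compile failed"

starts_with_distcc = matches_at_start(r"distcc", output)   # True
starts_with_failed = matches_at_start(r"failed", output)   # False
contains_failed = matches_anywhere(r"failed", output)      # True
```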

Test prerequisites

Sometimes tests need to be skipped: they have neither passed nor failed, but have simply not been run. Tests can indicate this by raising the NotRunError exception. They can check for preconditions by calling TestCase.require. For the special case of tests that must be run by the superuser, call TestCase.require_root.
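The following stand-alone sketch shows how a skipped test can be told apart from a failed one. NotRunError here is a local stand-in for the exception named above, and the require(condition, reason) signature and the classify helper are assumptions made for the illustration.

```python
class NotRunError(Exception):
    """Local stand-in for the exception named in the text."""


def require(condition, reason):
    # Assumed signature: skip the test unless the precondition holds.
    if not condition:
        raise NotRunError(reason)


def classify(test_func):
    """Hypothetical helper: report how a test function ended."""
    try:
        test_func()
    except NotRunError as exc:
        # Checked first, so a skip is never reported as a failure.
        return "NOTRUN: %s" % exc
    except Exception as exc:
        return "FAIL: %s" % exc
    return "OK"


def needs_root():
    require(False, "must be run as root")


def broken():
    raise ValueError("wrong answer")


def fine():
    require(True, "never shown")
```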

Running commands

runcmd(self, cmd, expectedResult = 0)
Run a command, and block until it completes. Fail if the exit code is not the expected value.
runcmd_background
Run a command as a background process and return its pid.
run_captured(self, cmd)
Run a command, capturing stdout and stderr. Returns a tuple of (waitstatus, stdout, stderr).
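For readers who want a feel for what these helpers do, here is a rough equivalent built on the standard subprocess module. It is not ComfyChair's implementation: it takes an argument list rather than whatever command form ComfyChair expects, and it returns the plain exit code rather than a wait status.

```python
import subprocess
import sys


def run_captured(cmd):
    """Run cmd (an argument list); return (exit code, stdout, stderr)."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout, proc.stderr


def runcmd(cmd, expected_result=0):
    """Run cmd and fail if the exit code is not the expected value."""
    code, out, err = run_captured(cmd)
    if code != expected_result:
        raise AssertionError(
            "%r exited with %d, expected %d" % (cmd, code, expected_result))
    return out


# Use the running interpreter as a portable example command.
hello = [sys.executable, "-c", "print('hi')"]
exit_three = [sys.executable, "-c", "import sys; sys.exit(3)"]
```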

Subclassing tests

It is common to have several tests with some parts in common: perhaps they all need to do some common setup and cleanup such as starting a daemon, or perhaps they just need to use a common function to check their results.

For example, this base class cleans the environment to make test results more repeatable:

import os
import comfychair

class SimpleDistCC_Case(comfychair.TestCase):
    '''Abstract base class for distcc tests'''
    def setup(self):
        self.stripEnvironment()

    def stripEnvironment(self):
        """Remove all DISTCC variables from the environment, so that
        the test is not affected by the development environment."""
        # Iterate over a copy of the keys, since entries are deleted
        # inside the loop.
        for key in list(os.environ.keys()):
            if key[:7] == 'DISTCC_':
                # NOTE: This only works properly on Python 2.2: on
                # earlier versions, it does not call unsetenv() and so
                # subprocesses may get confused.
                del os.environ[key]
        os.environ['TMPDIR'] = self.tmpdir
        ddir = os.path.join(self.tmpdir, 'distccdir')
        os.mkdir(ddir)
        os.environ['DISTCC_DIR'] = ddir

Extra tests

Some test suites contain tests that should not run by default. They might require special prior setup, or be a security risk, or rely on software not installed on all machines. For these there is an extra_tests parameter to comfychair.main, which takes a list of test classes that are off by default but can be requested on the command line.
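One way to picture the difference between the default list and the extra list is the selection sketch below. select_tests and the three test classes are hypothetical; this shows the idea of "off by default, available by name" and is not a description of how comfychair.main actually works.

```python
def select_tests(tests, extra_tests, requested_names):
    """Hypothetical selection logic, for illustration only."""
    if not requested_names:
        # Nothing named on the command line: run only the default tests.
        return list(tests)
    # Named tests may come from either list, and run in the order given.
    by_name = {cls.__name__: cls for cls in list(tests) + list(extra_tests)}
    return [by_name[name] for name in requested_names]


class Quick_Case:
    pass


class Compile_Case:
    pass


class NeedsRoot_Case:
    pass


default_tests = [Quick_Case, Compile_Case]
optional_tests = [NeedsRoot_Case]
```

With no names requested, only Quick_Case and Compile_Case are chosen; asking for NeedsRoot_Case by name brings it in.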

Configuring tests

Sometimes there is no way to avoid taking some configuration information from the user. For example, a suite that tests a remote machine might need to know that machine's IP address.

In the current release comfychair tests take this information from environment variables, through Python's os.environ dictionary.

Try to keep this information to a minimum. Often you can automatically detect settings, or arrange for them to be predictable.
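A small helper keeps environment lookups in one place and gives a clear message when a setting is missing. The helper name get_setting and the variable name TEST_REMOTE_HOST are invented for this sketch; they are not part of ComfyChair.

```python
import os


def get_setting(name, default=None, environ=os.environ):
    """Hypothetical helper: read one test setting from the environment."""
    value = environ.get(name, default)
    if value is None:
        # No value and no fallback: say exactly what the user must set.
        raise KeyError("set %s in the environment to run this test" % name)
    return value


# Passing a plain dict makes the helper easy to try out without
# touching the real environment.
example_env = {"TEST_REMOTE_HOST": "192.0.2.10"}
host = get_setting("TEST_REMOTE_HOST", environ=example_env)
port = get_setting("TEST_REMOTE_PORT", default="3632", environ=example_env)
```

Inside a test, a missing setting is probably better reported as a skipped test than as a failure.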

Designing programs for testability

(please write this :-)

Writing test harnesses

For testing C functions I find it useful to write a small wrapper program that exports just the relevant functionality in a way that is easy to call. Running each test in its own process makes it easy to see which test failed if it crashed, and gives good isolation both between tests and between the tests and the driver.

These harnesses should take arguments from argv and write to stdout. That fits in well with the comfychair framework and also allows the harness to be run by hand for exploratory testing or from within gdb.

There is a balance between how much intelligence is put into the harness and how much in the Python script. The right answer depends on the particular situation but I lean towards making the harnesses as simple as possible. I generally find it's better not to make the test evaluate its own correctness, but rather leave this to the Python script.
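The division of labour might look like the sketch below. So that the example runs anywhere, the "harness" is a Python one-liner standing in for a small C program; in a real suite HARNESS would be the path to the compiled wrapper. The names HARNESS, run_harness and check_addition are hypothetical.

```python
import subprocess
import sys

# Stand-in harness: takes numbers from argv and prints their sum to stdout.
# A real harness would be a small C program wrapping the function under test.
HARNESS = [
    sys.executable, "-c",
    "import sys; print(sum(int(a) for a in sys.argv[1:]))",
]


def run_harness(*args):
    """Run the harness in its own process and return what it printed."""
    proc = subprocess.run(
        HARNESS + [str(a) for a in args],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,   # a crash or non-zero exit raises CalledProcessError
    )
    return proc.stdout.strip()


def check_addition():
    # The harness only reports a result; the Python side judges it.
    assert run_harness(2, 3) == "5"
```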

Testing C programs

If you have a program written in C, how will you write the test suite?

Programs written in C tend to have these properties:

Why use Python for testing C?

You can write it in C, but C can be rather slow to write and debug. Test suites are rarely performance-critical and rarely need to access low-level libraries, so C is a waste.

You can write your test driver in shell script. Automake contains a builtin driver. The original rdiff test driver was done in shell, and I think I wasted a lot of time on it. It's hard to write a portable shell script because the commands you want to use like head or seq vary between machines and the only way to find problems is to try. It is hard to do systematic exception handling in shell, and this is a problem for test suites whose whole purpose is to

You can use Tcl, as in the DejaGNU test suite. Tcl is considered a pretty crufty language these days. The main strength of Tcl for testing is the very cool Expect system for driving interactive programs, but that is no longer an overwhelming factor. Many useful programs can be run in a non-tty mode: either reading stdin/writing stdout, or purely batch. If you do need to simulate a tty user, there are things like pexpect for Python.

You could use perl, but $0=~s(.*/)();#@#&!&*@*(&(*^@^@^@^@^@ NO CARRIER

Future Work