ALPHA RELEASE!
| Authors: | Martin Pool and Tim Potter |
|---|---|
| Version: | 0.2 |
ComfyChair is a general framework for writing unit, regression, and system tests in Python.
ComfyChair is designed to be useful for testing not just code written in Python, but also native code. Python provides a flexible and simple language to perform test setup, execution, and evaluation. C code can be exposed to Python by small wrapper functions, or through the normal program interface.
ComfyChair presents a programming interface to make writing new tests straightforward. It also has a user interface for people running tests.
You could do all of this in another framework such as PyUnit, but I think ComfyChair works better, at least for the kind of projects that I work on.
ComfyChair releases can be downloaded from http://sourcefrog.net/projects/comfychair/releases/
The name is a Monty Python reference.
ComfyChair has been used in test suites for distcc and rdiff.
For a larger example of a comfychair test suite, see distcc's test suite:
http://samba.org/ftp/unpacked/distcc/test/testdistcc.py
If you want a more powerful (but more complex) test suite, you might look at QMTest.
The goals and principles of comfychair are:
The basic element of ComfyChair testing is a "test case". This is defined by a little bit of Python code in a class inheriting from comfychair.TestCase. Each test case should exercise one particular aspect of the program.
TestCases are fragments of program code. Running the code produces a "test invocation" object, which is an instance of the class.
Test cases are grouped into suites of related tests. Suites are defined by Python modules (source files). Small programs (like distcc) might have only a single suite, and bigger projects like Samba may have several suites to cover different sets of functionality.
To run tests, just run the appropriate test suite as a Python program. For example:
$ ./tests/testdistcc.py
or, if it's not marked executable:
$ python2.2 ./tests/testdistcc.py
By default, all the test cases for this suite are run in the order specified by the author. However, you can also run particular tests, or change the order. To get a list of available test cases, use the '--list' option when running the suite. To run particular cases, just specify their names in order on the command line.
Tip
If there is a test that's failing intermittently (because of a timing issue, for example), then repeatedly running that test case can be a great way to recreate the failure. Just say something like this:
$ while ./tests/testdistcc.py BadInclude_Case; do :; done
Programmers can write test suites that interact with ComfyChair to test their code.
Each test that can be run is defined by a Python class. The framework instantiates one instance of this class when the test is run. (Test suites never need to construct test instances. This is only ever done by the framework.)
Tests consist of three phases: setup, runtest, and teardown. Setup and teardown are optional. The simplest test is:
class OnePlusOne(comfychair.TestCase):
    def runtest(self):
        self.assert_(1 + 1 == 2)
Tests indicate failure by raising a Python exception. Failure can occur in any stage. If setup fails then runtest is skipped, but the cleanup still runs.
It is quite common that if a setup or runtest fails then the teardown will fail as well: in this case you get an exception from both of them.
Functions should call add_cleanup at the moment they allocate any resource that needs to be freed on completion. All of the cleanups are run in reverse order of allocation, even if some of them fail. One example of a resource that should be cleaned up is a process started during testing -- it should always be killed off.
(You can also use try/finally clauses, but using add_cleanup tends to work better with subclassed tests, where the base class may need to run some cleanups that the subclass does not know about.)
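The reverse-order behaviour described above can be sketched in a few lines. This is only an illustration of the idea, not ComfyChair's own code: the class CleanupList and its run_cleanups method are hypothetical names invented for the example, and only the name add_cleanup comes from the text.

```python
class CleanupList:
    """Hypothetical stand-in for the cleanup bookkeeping of a test case."""

    def __init__(self):
        self._cleanups = []

    def add_cleanup(self, func):
        # Register the cleanup at the moment the resource is allocated.
        self._cleanups.append(func)

    def run_cleanups(self):
        """Run every cleanup, newest first, and return the errors raised."""
        errors = []
        while self._cleanups:
            func = self._cleanups.pop()
            try:
                func()
            except Exception as exc:
                # Remember the failure, but keep running the older cleanups.
                errors.append(exc)
        return errors


def demo_cleanup_order():
    """Register three cleanups, one of which fails, and run them all."""
    log = []
    cleanups = CleanupList()
    cleanups.add_cleanup(lambda: log.append("remove tmpdir"))

    def kill_daemon():
        raise RuntimeError("daemon already gone")

    cleanups.add_cleanup(kill_daemon)
    cleanups.add_cleanup(lambda: log.append("close logfile"))
    errors = cleanups.run_cleanups()
    return log, errors
```

In the demo the failing kill_daemon cleanup does not stop the older "remove tmpdir" cleanup from running, which is the property that matters when a base class has registered cleanups the subclass does not know about.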
Each test module needs a little bit of standard glue to transfer control to the ComfyChair test runner when the module is started as a program. Put this at the end of your test suite:
if __name__ == '__main__': comfychair.main(tests)
Each module also needs to pass a list of available test classes:
tests = [OnePlusOne]
Tests are run in the order listed, so faster or simpler tests should be put first so that somebody running the tests gets feedback as soon as possible.
Any exception thrown by a test class method indicates a failure of (that phase of) the test. comfychair also provides some convenience functions to check particular conditions and raise exceptions with explanatory messages if they have failed.
assert_equal
Sometimes tests need to be skipped: they have neither passed nor failed, but simply were not run. Tests can indicate this by raising the NotRunError exception. They can check for preconditions by calling TestCase.require. For the special case of tests that must be run by the superuser, call TestCase.require_root.
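To make the three outcomes concrete, here is a rough sketch of how a runner might tell a skipped test from a failed one. The NotRunError class and require function below are local stand-ins defined for the example (the real ones live in comfychair and their exact signatures are not shown in this document), and classify and the result strings are hypothetical.

```python
class NotRunError(Exception):
    """Local stand-in for the exception raised by a skipped test."""


def require(predicate, message):
    # Stand-in for a precondition check: skip the test unless predicate holds.
    if not predicate:
        raise NotRunError(message)


def classify(test_func):
    """Run test_func and report one of three outcomes (names are made up)."""
    try:
        test_func()
    except NotRunError:
        return "NOTRUN"
    except Exception:
        return "FAIL"
    return "OK"


def passing_test():
    assert 1 + 1 == 2


def failing_test():
    assert 1 + 1 == 3


def skipped_test():
    require(False, "needs a compiler that is not installed")
```

The order of the except clauses matters: the skip exception has to be caught before the general one, otherwise a skipped test would be reported as a failure.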
It is common to have several tests with some parts in common: perhaps they all need to do some common setup and cleanup such as starting a daemon, or perhaps they just need to use a common function to check their results.
For example, this base class cleans the environment to make test results more repeatable:
import os

class SimpleDistCC_Case(comfychair.TestCase):
    '''Abstract base class for distcc tests'''
    def setup(self):
        self.stripEnvironment()

    def stripEnvironment(self):
        """Remove all DISTCC variables from the environment, so that
        the test is not affected by the development environment."""
        # Iterate over a copy of the keys, because the loop deletes entries.
        for key in list(os.environ.keys()):
            if key[:7] == 'DISTCC_':
                # NOTE: This only works properly on Python 2.2: on
                # earlier versions, it does not call unsetenv() and so
                # subprocesses may get confused.
                del os.environ[key]
        os.environ['TMPDIR'] = self.tmpdir
        ddir = os.path.join(self.tmpdir, 'distccdir')
        os.mkdir(ddir)
        os.environ['DISTCC_DIR'] = ddir
Some test suites contain tests that should not run by default. They might require special prior setup, or be a security risk, or rely on software not installed on all machines. For these there is an extra_tests parameter to comfychair.main, which takes a list of test classes that are off by default but can be requested on the command line.
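The selection rule can be sketched as follows. This is an assumption about the behaviour described above, not a copy of comfychair.main: the function select_tests and the two example classes are hypothetical.

```python
def select_tests(default_tests, extra_tests, requested_names):
    """Pick the test classes to run.

    With no names on the command line, only the default tests run.
    Otherwise each name is looked up among both default and extra tests,
    and the tests run in the order the names were given.
    """
    if not requested_names:
        return list(default_tests)
    by_name = {}
    for cls in list(default_tests) + list(extra_tests):
        by_name[cls.__name__] = cls
    selected = []
    for name in requested_names:
        if name not in by_name:
            raise KeyError("unknown test: %s" % name)
        selected.append(by_name[name])
    return selected


class OnePlusOne:
    """Hypothetical default test."""


class NeedsRoot:
    """Hypothetical test that is off by default."""
```

With these definitions, select_tests([OnePlusOne], [NeedsRoot], []) leaves NeedsRoot out, while naming it explicitly brings it in.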
Sometimes there is no way to avoid taking some configuration information from the user. For example, a suite that tests a remote machine might need to know that machine's IP address.
In the current release comfychair tests take this information from environment variables, through Python's os.environ dictionary.
Try to keep this information to a minimum. Often you can automatically detect settings, or arrange for them to be predictable.
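A small helper keeps the environment lookups in one place and gives a readable message when a setting is missing. This is a sketch only: get_setting and the variable name TEST_REMOTE_HOST are invented for the example.

```python
import os


def get_setting(name, default=None, environ=None):
    """Read one test setting from the environment.

    environ defaults to os.environ; passing a plain dict makes the
    helper easy to exercise without touching the real environment.
    """
    if environ is None:
        environ = os.environ
    value = environ.get(name, default)
    if value is None:
        raise KeyError("please set %s before running the tests" % name)
    return value


# Hypothetical usage: the variable name is made up for this example.
example_host = get_setting("TEST_REMOTE_HOST", default="127.0.0.1")
```

Supplying a default wherever one is sensible follows the advice above: most people running the suite should not have to set anything.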
(please write this :-)
For testing C functions I find it useful to write a small wrapper program that exports just the relevant functionality in a way that is easy to call. Running each test in a single process makes it easy to see which test failed if it crashed, and gives good isolation between tests and between tests and the driver.
These harnesses should take arguments from argv and write to stdout. That fits in well with the comfychair framework and also allows the harness to be run by hand for exploratory testing or from within gdb.
There is a balance between how much intelligence is put into the harness and how much in the Python script. The right answer depends on the particular situation but I lean towards making the harnesses as simple as possible. I generally find it's better not to make the test evaluate its own correctness, but rather leave this to the Python script.
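The division of labour might look something like this: the harness only computes and prints, and the Python side decides whether the answer is right. To keep the sketch runnable without a compiler, a one-line Python program stands in for the compiled C harness; HARNESS, run_harness and check_sum are all hypothetical names.

```python
import subprocess
import sys

# Stand-in for a compiled C harness: it adds up its arguments and prints
# the total. A real suite would put the path of the built binary here.
HARNESS = [sys.executable, "-c",
           "import sys; print(sum(int(a) for a in sys.argv[1:]))"]


def run_harness(args):
    """Run the harness with the given arguments; return (status, stdout)."""
    result = subprocess.run(HARNESS + list(args),
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            universal_newlines=True)
    return result.returncode, result.stdout


def check_sum():
    """The Python script, not the harness, judges correctness."""
    status, out = run_harness(["2", "3"])
    assert status == 0, "harness exited with status %d" % status
    assert out.strip() == "5", "unexpected output %r" % out
```

Because the harness is an ordinary command, the same invocation can be pasted into a shell or run under gdb when a test fails.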
If you have a program written in C, how will you write the test suite?
Programs written in C tend to have these properties:
You can write it in C, but C can be rather slow to write and debug. Test suites are rarely performance-critical or need to access low-level libraries, and so C is a waste.
You can write your test driver in shell script. Automake contains a builtin driver. The original rdiff test driver was done in shell, and I think I wasted a lot of time on it. It's hard to write a portable shell script because the commands you want to use like head or seq vary between machines and the only way to find problems is to try. It is hard to do systematic exception handling in shell, and this is a problem for test suites whose whole purpose is to
You can use Tcl, as in the DejaGNU test suite. Tcl is considered a pretty crufty language these days. The main strength of Tcl for testing is the very cool Expect system for driving interactive programs, but that is no longer an overwhelming factor. Many useful programs can be run in a non-tty mode: either reading stdin/writing stdout, or purely batch. If you do need to simulate a tty user, there are things like pexpect for Python.
You could use perl, but $0=~s(.*/)();#@#&!&*@*(&(*^@^@^@^@^@ NO CARRIER
What information should be logged and where? Writing to a file is attractive because even if the test framework is interrupted (or even crashes) then the information is still available. It's also easier for somebody reporting a failure to the developer to capture all the files in that directory.
Some kind of progress indicator for long-running tests?
More standard helper functions?
Perhaps need to allow for running whole sets of tests with different parameters. For example, many Samba string-handling tests should be run in different character sets to get good coverage.
Better design patterns for writing tests are needed. Deeply nested class hierarchies are not easy to read and understand.
Test Variations:
Perhaps we need some kind of mixin-based system. We might want to run tests repeatedly under different conditions. For example for distcc it might make sense to run most tests both with compression turned on and with it off. Or perhaps all the Samba tests should be run with different character sets? Perhaps this should be done by just subclassing the particular test cases? But if the parameter applies to many or all of the test cases that's not so attractive, because it needlessly multiplies the number of tests.
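One possible shape for this, offered only as a sketch of the idea and not as a decided design: generate the variant classes mechanically instead of writing each subclass by hand. The TestCase class here is a minimal stand-in for comfychair.TestCase, and Compile_Case, the compression attribute and make_variations are all hypothetical.

```python
class TestCase:
    """Minimal stand-in for comfychair.TestCase, for illustration only."""

    def runtest(self):
        raise NotImplementedError


class Compile_Case(TestCase):
    # Hypothetical parameter that the variations will override.
    compression = False

    def runtest(self):
        return "compile with compression=%s" % self.compression


def make_variations(test_classes, attr, values):
    """Build one subclass per (test class, value) pair."""
    varied = []
    for cls in test_classes:
        for value in values:
            name = "%s_%s_%s" % (cls.__name__, attr, value)
            # type() creates a subclass whose only change is the attribute.
            varied.append(type(name, (cls,), {attr: value}))
    return varied


tests = make_variations([Compile_Case], "compression", [False, True])
```

This still multiplies the number of tests, as noted above, but the multiplication happens in one place and the generated names show which variation failed.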
Put each test into its own temporary directory?
Perhaps also set TMPDIR and similar things for that test?
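A rough sketch of what that could look like, using only the standard library. The function enter_test_tmpdir is hypothetical, and the returned restore function is the kind of thing that would be registered as a cleanup.

```python
import os
import shutil
import tempfile


def enter_test_tmpdir(test_name):
    """Create a fresh directory for one test, chdir into it, set TMPDIR.

    Returns the directory and a function that undoes all three steps.
    """
    saved_cwd = os.getcwd()
    saved_tmpdir = os.environ.get("TMPDIR")
    tmpdir = tempfile.mkdtemp(prefix=test_name + "-")
    os.environ["TMPDIR"] = tmpdir
    os.chdir(tmpdir)

    def restore():
        # Leave the directory before deleting it.
        os.chdir(saved_cwd)
        if saved_tmpdir is None:
            os.environ.pop("TMPDIR", None)
        else:
            os.environ["TMPDIR"] = saved_tmpdir
        shutil.rmtree(tmpdir, ignore_errors=True)

    return tmpdir, restore
```

Whether the directory should be removed after a failure is a separate question: keeping it around makes it easier to see what went wrong.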
Have some means for running all invoked commands under Valgrind or similar checker programs. (Perhaps a coverage/profiler would be interesting?). This might be tricky if the command that's being run is actually a complex shell command rather than just a simple invocation.
Output as a script the list of shell commands run by the program so that it can be traced through by a human. Probably actually not very practical because the Python script might also be doing important work.
Need a system for turning on run-time options like using Valgrind. Perhaps a --with-valgrind command-line option, or perhaps environment variables or something else?
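If environment variables were the answer, the switch could be as small as this. It is a sketch under that assumption: the variable name TEST_WITH_VALGRIND and the function wrap_command are invented, and it only handles commands given as an argument list, not shell command strings.

```python
import os


def wrap_command(argv, environ=None):
    """Prefix argv with valgrind when the (hypothetical) switch is set."""
    if environ is None:
        environ = os.environ
    if environ.get("TEST_WITH_VALGRIND"):
        # --quiet keeps valgrind's own banner out of the captured output.
        return ["valgrind", "--quiet"] + list(argv)
    return list(argv)
```

Every place that starts a subprocess would have to go through such a function, which is where the difficulty with complex shell commands shows up.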
Do we need (like autoconf) the top-level system not to check options, because they may be interpreted by particular tests? Or can we perhaps register the available options, so that there can be some checking against typos?
Cleanup does not work very well: an error in cleanup in a subclass can cause the parent cleanups not to be run. The problem with this is that they might for example leave a daemon running, or leave the process in a subdirectory, which can cause trouble with later tests.
Really we want something like Python's exception system, where there can be several nested cleanups. If an earlier one fails, the later ones are still run.
How can this be represented in the case of subclassed tests?
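One possibility, sketched here for a modern Python 3 rather than the Python 2.2 this document targets, is the standard library's contextlib.ExitStack: callbacks run in reverse order of registration, and a failing callback does not prevent the earlier-registered ones from running. BaseCase, SubCase and run_case are hypothetical names, and this is not how ComfyChair currently works.

```python
from contextlib import ExitStack


class BaseCase:
    def __init__(self):
        self.log = []

    def setup(self, stack):
        # The base class registers its cleanup first, so it runs last.
        stack.callback(self.log.append, "base: stop daemon")


class SubCase(BaseCase):
    def setup(self, stack):
        BaseCase.setup(self, stack)
        stack.callback(self._broken_cleanup)

    def _broken_cleanup(self):
        raise RuntimeError("subclass cleanup failed")


def run_case(case):
    """Run setup, then unwind every registered cleanup."""
    try:
        with ExitStack() as stack:
            case.setup(stack)
    except RuntimeError as exc:
        # The subclass failure is reported only after all cleanups ran.
        case.log.append("reported: %s" % exc)
    return case.log
```

In this sketch the parent's "stop daemon" cleanup still runs even though the subclass cleanup raised, which is the behaviour asked for above.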
Perhaps allow different postmortem debuggers to be attached, with --post-mortem=foo. Are there any good graphical ones that can be invoked this way?
What's the best way to invoke gdb on failed programs? We don't want to launch it on every one; that would be slow. Or can we avoid gdb loading symbols at startup?
Automatically call into C functions without needing to write specific wrappers. Perhaps do dlopen() or something similar. I don't know if this would be useful for nontrivial cases because building the data values to pass in could be quite hard.