Version control on tiny machines
One issue that tends to pop up in regard to new version control systems is: how can I use this for large projects on my tiny machine?
"Large projects" typically means something like the Linux kernel, gcc or X. "Tiny machines" typically means those old enough to support only old disk connections, and therefore limited to perhaps less than 1GB of disk.
(I occasionally get mail from people porting Linux or BSD to older architectures such as m68k, saying they find distcc very helpful in getting builds done in reasonable time.)
(I remember somebody from SCO posting on the Subversion list a while ago, saying that one of their kernels couldn't support disks larger than 8GB. Poor SCO, shambling zombie of the tech industry.)
The main problem for most of these people is that most (all?) new version control packages keep a pristine copy of the source on the client for quick reference. (At least Subversion, Arch and Darcs do this.)
The reason is that it allows checking for changes (svn diff) to be done locally, with no round trip to the server. It's nice that this common operation is fast, and it also helps make merges and commits fast.
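To make the point concrete, here is a minimal sketch of a local diff against a pristine copy. It is not how Subversion, Arch or Darcs actually implement it; the function name local_diff and the idea of passing the two paths explicitly are my own assumptions for illustration.

```python
import difflib
from pathlib import Path


def local_diff(pristine: Path, working: Path) -> str:
    """Return a unified diff from the pristine copy to the working file.

    Both files are on local disk, so nothing here touches the network.
    """
    old = pristine.read_text().splitlines(keepends=True)
    new = working.read_text().splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(old, new, fromfile=str(pristine), tofile=str(working))
    )
```

An unchanged file gives an empty string, so the same check also answers "has this file been modified?" without asking the server. The price is the second copy of every file on disk.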
So almost all of these new systems will use more disk than CVS does, maybe two or three or four times as much. In addition to pristine files, they tend to have more metadata files than CVS. Arch often has an ID file for each source file and each pristine file. Arch has a file for each change that has been merged in the past. Darcs has the compressed diff for each change in the past, going back to a horizon.
It's not quite as bad as it seems: Arch and Darcs both have clever schemes for keeping hardlinks between multiple related trees. If you're actively developing a project and have several checkouts or branches, hardlinks can save a lot of disk, and also reduce memory pressure. Under some circumstances Arch can even use hardlinks between the pristine and working copies, so that you get essentially copy-on-write allocation of space.
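The hardlink trick is easy to sketch. This is only an illustration of the general idea, not the scheme Arch or Darcs actually use; hardlink_tree and break_link_and_write are hypothetical names, and it assumes both trees are on the same filesystem.

```python
import os
from pathlib import Path


def hardlink_tree(src: Path, dst: Path) -> None:
    """Recreate src's directory layout under dst, hardlinking each file.

    File contents are shared, so the second tree costs only directory
    entries until a file is replaced.
    """
    for root, _dirs, files in os.walk(src):
        target = dst / Path(root).relative_to(src)
        target.mkdir(parents=True, exist_ok=True)
        for name in files:
            os.link(Path(root) / name, target / name)


def break_link_and_write(path: Path, text: str) -> None:
    """Write new content to a fresh file and rename it into place.

    Renaming breaks the hardlink, so the other tree keeps the old content.
    Writing in place would change both trees at once.
    """
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(text)
    os.replace(tmp, path)
```

The second function is the important caveat: this only behaves like copy-on-write if whatever modifies the files replaces them rather than overwriting them in place.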
In general, trading disk for reduced network usage is a good deal. Disk is cheap, getting cheaper, and usually easier to add. I just bought a 200GB disk with pocket money; this is enough for a thousand unpacked copies of the kernel source. Fast network access tends to be expensive, and not available at all in some places.
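The arithmetic behind that is worth spelling out, because it also shows why a sub-1GB disk hurts. The sketch below assumes 1GB = 1000MB and takes the per-tree size implied by the figures above (200GB divided by a thousand copies); the overhead factor is the "two or three or four times" estimate for pristine copies and metadata. copies_that_fit is just an illustrative helper.

```python
# Size of one unpacked kernel tree implied by "200GB holds a thousand copies".
TREE_MB = 200_000 // 1000  # 200 MB


def copies_that_fit(disk_mb: int, tree_mb: int = TREE_MB, overhead: int = 1) -> int:
    """How many working copies fit, if each costs overhead times the bare tree."""
    return disk_mb // (tree_mb * overhead)


big_disk = copies_that_fit(200_000)             # plain trees on the 200GB disk
tiny_plain = copies_that_fit(1000)              # plain trees on a 1GB disk
tiny_3x = copies_that_fit(1000, overhead=3)     # with roughly 3x overhead
```

On these assumptions the 1GB machine holds five bare trees but only one working copy once the overhead is three or four times, and that is before the compiler output goes anywhere.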
Nevertheless, if you're doing a port to an old hp9000, then you may not be able to get a new disk. What do you do then?
One good option can be to keep your working copies on NFS, mounted from a modern PC. Depending on the hardware, this might be faster than the local disk!
Assuming you don't want that, I think Quilt is probably the way to go. Disk overhead is very small indeed. You do need to tell it before you edit a file, at which point it makes a pristine copy. (This can be a pain if you ever forget and need to go back and fix it, but if you're still working on a 50MHz 68k I suppose you're patient and painstaking already.) I think this means disk usage is only proportional to the files touched in any particular changeset, which is about as good as one can reasonably expect.
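Here is a toy model of that "tell it before you edit" approach, to show why the overhead scales with the files touched rather than with the whole tree. It is not Quilt's implementation or on-disk format; the class PatchInProgress, its backup directory and its method names are all made up for the sketch.

```python
import difflib
import shutil
from pathlib import Path


class PatchInProgress:
    """Keeps pristine copies only of files the user has declared."""

    def __init__(self, tree: Path, backup_dir: Path) -> None:
        self.tree = tree
        self.backup_dir = backup_dir

    def add(self, relpath: str) -> None:
        """Save a pristine copy of one file; call this before editing it."""
        backup = self.backup_dir / relpath
        if backup.exists():
            return  # keep the first, truly pristine copy
        backup.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(self.tree / relpath, backup)

    def refresh(self) -> str:
        """Return a unified diff covering only the files that were added."""
        chunks = []
        backups = sorted(p for p in self.backup_dir.rglob("*") if p.is_file())
        for backup in backups:
            rel = backup.relative_to(self.backup_dir).as_posix()
            old = backup.read_text().splitlines(keepends=True)
            new = (self.tree / rel).read_text().splitlines(keepends=True)
            chunks.extend(difflib.unified_diff(old, new, f"a/{rel}", f"b/{rel}"))
        return "".join(chunks)
```

The sketch also shows the pain point: a file edited without calling add first never appears in the diff, because there is no pristine copy to compare it against.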
Another idea is to use reiserfs if you can. It stores small files efficiently, which is particularly important for Arch, but useful for source trees in general. It also does not use a static inode table as in ext2fs or ufs, which frees up a lot of blocks for data.
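To see why small files matter, consider a filesystem that rounds every file up to a whole number of blocks. The sketch below only counts that rounding waste; it ignores inodes and directories and is not a model of how reiserfs lays out data. The 4096-byte default block size and both function names are assumptions for illustration.

```python
def allocated_bytes(size: int, block: int = 4096) -> int:
    """Space used by one file when storage is handed out in whole blocks."""
    blocks = -(-size // block)  # ceiling division
    return blocks * block


def slack(sizes: list[int], block: int = 4096) -> int:
    """Total bytes wasted by rounding each file up to a block boundary."""
    return sum(allocated_bytes(s, block) - s for s in sizes)


# A thousand 100-byte metadata files, as a tree full of ID files might have.
waste_4k = slack([100] * 1000)        # 3,996,000 bytes lost to rounding
waste_1k = slack([100] * 1000, 1024)  # 924,000 bytes with 1KB blocks
```

Under these assumptions, 100KB of real data costs about 4MB with 4KB blocks, which is why a filesystem that packs small files together helps so much on a cramped disk.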
posted Fri 25 Jun 2004 in /software/vc
Copyright (C) 1999-2007 Martin Pool.