Martin Pool's blog

Loss of a server in Arch and Darcs

I wrote a while ago on some things I think are less than perfect in Arch.

I think the one that bugs me most is that branches are bound to a particular location, rather than being purely distributed. (I use the word branch here for the comfort of a general audience; in Arch they would strictly be called versions, which I think is a bit misleading.) I want to try to explain this a bit more.

The machine hosting sourcefrog.net crashed because of hardware problems the other week and was offline for a couple of days. I wanted to work on two projects which are hosted there, librsync and distcc. Because I am a version-control gourmet, distcc is in Arch and librsync is stored in Darcs.

Because sourcefrog is quite close to where I live, I normally work directly against its repository from Arch. I could instead make downstream repositories on each machine I work on, but that would introduce a lot of "noise" merges every time I moved code from those machines onto sourcefrog. Since there's only one distcc branch, and I'm the only person who commits, I'd rather just work directly on that branch.

A consequence of this is that when sourcefrog is down, I can't commit or update at all. I am stuck.

Or almost stuck. In fact, I can cheat: make an archive on my laptop and a new branch in that archive, and commit from my working copy onto that branch. When the main machine is back up, I can merge from my branch back to sourcefrog.

This is pretty neat. I don't think I could easily do it in either Subversion or CVS. With those systems, I'd probably keep hacking and just make one big commit at the end. (Which is not really such a bad thing, but not ideal.) At best, I could keep snapshots of the tree at different points and commit each one by hand as a separate patch.

On the other hand, what I did is not documented, and I'm not sure it's entirely kosher. It does require a certain amount of understanding of Arch internals, and some fiddling to get the merge back to work. It is a testament to the elegance and flexibility of the Arch design that it's possible to use it in this unintended way.

By contrast, in Darcs having your server go down makes no difference at all, except that you can't publish to that particular server. Because everything is always committed locally and then pushed up, the natural way of working has little dependency on anything but the local machine. None of this leaves any major permanent record, because revision names don't depend on the machine to which they were originally committed. With the server offline you can make changes, record them, roll them back, and make branches. If the machine's going to be down for a while, you can start committing to a different server, or email your changesets to someone else.
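To make the point about revision names concrete, here is a toy model in Python. It is only a sketch of the idea, not how Darcs actually stores or identifies patches: the Patch and Repo classes, the patch_id method, and the choice of SHA-1 over author, date and name are all assumptions made up for illustration, and it ignores patch contents, dependencies and commutation entirely. What it shows is that when a change's identity comes only from the change itself, moving it to another machine is a plain copy that leaves no trace of where it was first recorded.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """A hypothetical recorded change; not the real Darcs patch format."""
    author: str
    date: str
    name: str

    def patch_id(self) -> str:
        # Identity is derived only from the patch's own metadata.
        # No host name or archive name is mixed in.
        data = "\n".join([self.author, self.date, self.name]).encode("utf-8")
        return hashlib.sha1(data).hexdigest()


class Repo:
    """A hypothetical repository: a location plus a set of patches."""

    def __init__(self, location: str) -> None:
        self.location = location
        self.patches = {}  # maps patch_id -> Patch

    def record(self, patch: Patch) -> None:
        # Recording is purely local; no other machine is contacted.
        self.patches[patch.patch_id()] = patch

    def push(self, other: "Repo") -> int:
        # Copy across whatever the other side lacks; return how many were copied.
        missing = [pid for pid in self.patches if pid not in other.patches]
        for pid in missing:
            other.patches[pid] = self.patches[pid]
        return len(missing)


# The server is "down": record on the laptop only.
laptop = Repo("laptop")
server = Repo("sourcefrog")
laptop.record(Patch("mbp", "2004-05-01", "fix a typo"))

# The server is back: the first push copies one patch, the second copies none.
first = laptop.push(server)
second = laptop.push(server)
```

After the push, both repositories hold the same identifier for the same change, and nothing in that identifier says it was first recorded on the laptop. In Arch, by contrast, the revision's full name includes the archive it was committed to, which is why the workaround above needs a separate branch and a merge.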

You can do this in Arch, but it's more natural in Darcs.

I think at the moment I would compare them like this:

Arch has a lot of structure and metadata to let you see the history of every changeset and to organize large trees. That might be good for very large projects. It's also good for small projects, though the sheer complexity can be a disincentive.

Darcs is much simpler. I think you can show someone all they need to know in ten minutes. It's naturally very distributed. I rarely or never need to wait for network traffic.
