Martin Pool's blog

arch rocks: mirroring (updated)

There are plenty of good free software developers in the world who don't have big machines on good pipes where they can put their CVS server or downloads. People outside of the US often don't realize just how slow the combination of a modem and intercontinental latency can be. A former Prime Minister called Australia the "arse end of the world" for a reason.

Until now, these developers have generally either found a bigger project like gnome.org or samba.org to host them, or signed up for something like SourceForge. Now SourceForge is a pretty valuable thing, but it has been patchy recently. If that's where your CVS is hosted, you have little choice but to stop committing while it's offline.

Another drawback is that if you put CVS on SourceForge, then every time you diff or commit, the request has to go all the way to California and back. This is pretty slow. When I did this over a modem, it would take a good fraction of a minute just to diff a reasonably small tree. It is annoying. It grinds you down.

What I really wanted was to have my working repository close by: either on my own disk, or at least in the same city. At the same time, I wanted my public tree to be on a fast machine on a fat pipe.

I suppose I could have kludged it up using CVSup or rsync, but they're not completely satisfying solutions.

Finally, GNU Arch solves this, in a truly elegant way. Anyone can mirror a public archive. (An "archive" in Arch is roughly a "repository": it holds the history of changes.) In fact, several sites such as sourcecontrol.net have been set up just to mirror all the open source software they can find.
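
To make the idea concrete, here is a toy model in Python of why mirroring an append-only history is cheap. It is only a sketch: it does not reflect Arch's real archive layout or its commands, and the names `Archive` and `mirror` are made up for illustration. The assumption is simply that an archive is an ordered set of named changesets that never change once committed, so a mirror only ever needs to fetch the ones it is missing.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Archive:
    """Hypothetical stand-in for an archive: named changesets, append-only."""
    name: str
    changesets: Dict[str, bytes] = field(default_factory=dict)

    def commit(self, revision: str, changeset: bytes) -> None:
        # Existing revisions are never rewritten, which is what makes
        # incremental mirroring safe in this model.
        if revision in self.changesets:
            raise ValueError(f"{revision} already exists; history is append-only")
        self.changesets[revision] = changeset


def mirror(source: Archive, target: Archive) -> List[str]:
    """Copy only the revisions the target lacks; return the names copied."""
    copied = []
    for revision, changeset in source.changesets.items():
        if revision not in target.changesets:
            target.changesets[revision] = changeset
            copied.append(revision)
    return copied


# Work against a nearby archive, then push to the public one when convenient.
local = Archive("local")
public = Archive("public")
local.commit("patch-1", b"first change")
local.commit("patch-2", b"second change")
first_sync = mirror(local, public)   # copies both revisions
local.commit("patch-3", b"third change")
second_sync = mirror(local, public)  # copies only the new revision
```

In this model the diffs and commits all happen against the local archive, and the slow link is only used for the occasional mirror run, which moves nothing but the new changesets.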

(If you want to follow another developer's work closely, you can mirror their archive onto your own machine, and their entire history is available for quick consultation, even when you're offline. Conversely, and unlike BitKeeper, you are not *required* to keep their full history if you don't want it. If you merely want to download their most recent tree, or the patches to update to the most recent tree, that's what you get.)

Other people mirror only intermittently, as a backup in case a primary archive is lost. Even the humblest programmer can now adopt Linus's backup technique: write good software, and the world will do your backups for you!

What's more, because changesets are strongly GPG-signed, people using the archive can feel sure that they're getting the changes as the original author wrote them, without any accidental or intentional modifications.

Microsoft wrote a while ago that free software development scales up to the size of the internet better than Microsoft's own processes. Arch removes the scalability bottleneck of a single CVS server.

This is a really cool thing.
