Martin Pool's blog

Funding open source projects

Mark Shuttleworth, founder/CEO of Canonical, has a blog entry about the difficulties of funding open source projects. My take is that, in some respects, open source projects are no different from any other software project: many of them fail. In particular, projects will fail if they do not clearly focus on (in Mark's words) solving the unique problems first. For example, a version-control tool ought to have a friendly web interface, but this is not an essential or urgent problem. Solving it will not tell you whether you're on the right track or not. For a new design it is good to first tackle the problems which have the potential to falsify your model.

LCA2005 early-bird registration open

Preparations for linux.conf.au are proceeding apace. (For one thing, we got the delegation of the linux.conf.au domain name sorted out again.) We have also selected a program from many excellent submissions, secured a venue and accommodation for delegates and speakers, and are close to settling dinner and social venues.

Steeply-discounted early-bird registrations are open and are selling well. Jeremy wrote (we hope :-) a friendly and secure registration system.

We are just starting to accept media registrations. Some of the bigger names in Linux news and analysis will be there — and why not, with a chance to talk to so many prominent Linux and open source developers in one place?

Jeeves vs. Pooh

Fascinating short article by James Parker in the Boston Globe:

P.G. Wodehouse, creator of the ultimate literary butler, and A.A. Milne, creator of Winnie-the Pooh, started as friends in Edwardian London. But their falling out in 1941 revealed something essential about the men—and their lasting creations.

It doesn't matter what job you apply to and it never has

Interesting post from Heather Leigh at Microsoft on how recruiters try to find the right person from a stack of incoming resumes. She says that the particular position a candidate originally applied for matters much less than keyword matches, the general impression their resume gives, and so on.

Worth a read.

I guess the general lesson, as always, is to try to write in a way that will make sense to your audience.

Canon EOS 20D

I bought a Canon EOS 20D just before the holidays. It is really beautiful: a work of art that makes art.

I'll write more later, and pick a couple of good images to put up here. A few brief points:

Threads ignore the last 20 years of OS and hardware development

Thomas Smits:

The design of the Java Virtual Machine ignores the painful lessons operating system vendors have learned in the past 40 years. The concepts of processes, virtual memory management, and different protection modes for kernel and user code can be found in all modern operating systems. They focus on the question of isolation and therefore robustness: an application with errors cannot affect the other applications running in the system.

In contrast, Java follows the all-in-one-VM paradigm: everything is processed inside one virtual machine running in one operating system process. Inside the VM, parallelism is implemented using threads with no separation regarding memory or other resources. In this respect Java has not changed since its invention in the early nineties. The fact that Java was originally invented as a programming language for embedded devices may explain this approach.
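The isolation point is easy to see outside Java too. Here is a minimal Python sketch, assuming a POSIX system since it uses os.fork. A write made by a thread is visible to the rest of the program, because the thread shares the process's memory; the same write made in a forked child lands in the child's own copy. The names counter, bump, run_in_thread and run_in_process are hypothetical, chosen only for this illustration.

```python
import os
import threading

# Hypothetical shared state: a one-element list so functions can mutate it in place.
counter = [0]


def bump():
    """Increment the shared counter."""
    counter[0] += 1


def run_in_thread():
    """Run bump() in a new thread, which shares this process's memory."""
    worker = threading.Thread(target=bump)
    worker.start()
    worker.join()
    return counter[0]


def run_in_process():
    """Run bump() in a forked child, which gets its own copy of memory."""
    pid = os.fork()
    if pid == 0:
        bump()       # changes only the child's private copy of counter
        os._exit(0)  # exit the child at once, skipping the parent's cleanup code
    os.waitpid(pid, 0)
    return counter[0]


if __name__ == "__main__":
    print("after thread: ", run_in_thread())   # 1: the thread's write is visible
    print("after process:", run_in_process())  # still 1: the child's write is not
```

The parent still sees 1 after the child has run, because the child's increment happened in a separate address space. A buggy child can scribble on its own memory as much as it likes without corrupting the parent, which is exactly the protection a single shared VM gives up.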

Andrew Tridgell (via Tim Potter):

What is it about the word "thread" that people find so damn sexy? Maybe it needs a name change — "slow-as-hell-no-memory-protection-locks-dont-work" API might be suitable, but I suspect the standards committees wouldn't like that one.

The MMU was added to CPUs for a very good reason. Why is it so hard to understand that trying to avoid it is a bad idea?

Have you thought about the orders of magnitude here? With process switching on a modern CPU you basically have to swap one more register. That's one extra instruction. Modern CPUs have nanosecond cycle times.

Now, some CPUs also need to do an extra TLB flush or equivalent, but even that is cheap on all but the worst CPUs.

Compare this to the work that a file server has to do in responding to a packet. Say it's a SMBread of 4k. That is a 4k copy of memory. Memory is slow. Typically that SMBread will take tens of thousands of times longer than the context switch time.

But by saving that nanosecond you will make the read() system call slower! Why? Because in the kernel the file descriptor number needs to be mapped from an integer to a pointer to a structure. That means looking up a table. That table needs to be locked. If you have 100 threads doing this then they all lock the same structure, so you get contention, and suddenly your 16 cpu system is not scaling any more. With processes they lock different structures, so no contention, so better scaling.

This lock contention can be fixed with some really smart programming (like using RCU), and recently that has been done in Linux. That's one reason why Linux sucks less than other systems for threads.

Try thinking about this. How do threads do their IPC? They use the same system calls and mechanisms that are available to processes. The difference is that in a threads library these mechanisms are wrapped into a nice API that makes it convenient to do IPC really easily. You can do exactly the same types of IPC with processes, you just need to write a bit more code.

For many things, perhaps even for some file server applications, that extra convenience is worthwhile. Convenience means simpler code which means fewer bugs. So I'm not saying to never use threads, I'm just trying to kill this persistent meme that says threads are somehow faster. That's like believing in the tooth fairy.
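To make the "bit more code" point concrete, here is a minimal Python sketch, assuming a POSIX system since it uses os.fork and os.pipe. Both functions compute a value in a worker and hand it back: the thread version through queue.Queue, the process version through a pipe it has to create, close and read itself. square_with_thread and square_with_process are hypothetical names, and Python's threads are only a loose stand-in for the C threads library under discussion; the sketch shows the difference in code volume, not in speed.

```python
import os
import queue
import threading


def square_with_thread(n):
    """Threads: the library's queue does the handoff for us."""
    results = queue.Queue()
    worker = threading.Thread(target=lambda: results.put(n * n))
    worker.start()
    worker.join()
    return results.get()


def square_with_process(n):
    """Processes: set up a pipe and serialize the result by hand."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: send the answer as text, then exit without further cleanup.
        os.close(read_fd)
        os.write(write_fd, str(n * n).encode())
        os.close(write_fd)
        os._exit(0)
    # Parent: close our copy of the write end so read() sees EOF
    # once the child closes its copy.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)  # reap the child
    return int(b"".join(chunks).decode())
```

Both calls return the same answer; the process version is simply longer because the serialization and plumbing the queue hides have to be written out.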

Torrents of Samba source

Samba is a popular open-source project. Many people, happy with Samba, want to mirror the web site, both so that it is easily available from their network, and as a way of giving back to the project.

An unintended consequence is that the master server ends up sending a lot of traffic to small mirrors, which may well end up increasing the load on the original server they were trying to help. This is because people typically mirror the whole download directory, including old versions and binaries for obsolete and irrelevant platforms (SCO). (There are various small things we could do or have done to try to prevent this.)

A more interesting (partial) solution is to use BitTorrent, which is specifically designed for distributing large, popular files. There is now a .torrent file in the download directory; if you click it on a machine with a BitTorrent client installed, you'll get a copy of Samba from the p2p network.
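For the curious: a .torrent file is a small "bencoded" dictionary. Its info section records the file name, its length, the piece size, and the SHA-1 digest of every piece, and the SHA-1 of the bencoded info section (the "info hash") is what identifies the torrent on the network. Below is a minimal Python sketch of that, following the single-file metainfo layout in the BitTorrent specification as I understand it. It leaves out the tracker announce URL and the multi-file layout; make_info, info_hash and the 256 KiB default piece size are assumptions made for illustration, not what was used for the Samba torrent.

```python
import hashlib


def bencode(value):
    """Encode ints, byte/text strings, lists and dicts with BitTorrent's bencoding."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings, emitted in sorted raw-byte order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def make_info(name, data, piece_length=256 * 1024):
    """Build a single-file info dict: one 20-byte SHA-1 digest per piece."""
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    return {"name": name, "length": len(data),
            "piece length": piece_length, "pieces": pieces}


def info_hash(info):
    """The torrent's identity: SHA-1 of the bencoded info dict, as hex."""
    return hashlib.sha1(bencode(info)).hexdigest()
```

Given the tarball's bytes as data, info_hash(make_info(name, data)) yields the identifier peers use to find each other for that file.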

This also gives a very nice way for people to contribute to the cost of distributing Samba: just leave the client running when it's finished downloading, and it will pass the source on to other people. If you already have a copy of the tarball by other means, you can put it in the BitTorrent client's working directory and it will start serving immediately.
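Dropping in an existing tarball works because the client does not trust the file blindly: it hashes each piece of whatever it finds and compares the result with the digests listed in the .torrent. Pieces that match count as already downloaded and can be offered to peers; anything else is fetched again. A minimal Python sketch of that check, assuming a single-file torrent held in memory; verify_pieces is a hypothetical helper, and real clients do this incrementally against files on disk.

```python
import hashlib

DIGEST_SIZE = 20  # length in bytes of one SHA-1 digest


def verify_pieces(data, pieces, piece_length):
    """Return one boolean per piece: True where that slice of `data` hashes
    to the matching digest in the metainfo's concatenated `pieces` string."""
    expected = [pieces[i:i + DIGEST_SIZE] for i in range(0, len(pieces), DIGEST_SIZE)]
    have = []
    for index, digest in enumerate(expected):
        chunk = data[index * piece_length:(index + 1) * piece_length]
        # A missing (empty) chunk can never match; otherwise compare digests.
        have.append(bool(chunk) and hashlib.sha1(chunk).digest() == digest)
    return have
```

A complete, uncorrupted tarball verifies as all True, so the client can start seeding straight away; a truncated or damaged copy only loses the pieces that fail.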

(This is the theory; let's see how well it works out.)
