Martin Pool's blog

Threads ignore the last 20 years of OS and hardware development

Thomas Smits:

The design of the Java Virtual Machine ignores the painful lessons operating system vendors have learned in the past 40 years. The concepts of processes, virtual memory management, and different protection modes for kernel and user code can be found in all modern operating systems. They focus on the question of isolation and therefore robustness: an application with errors cannot affect the other applications running in the system.

In contrast, Java follows the all-in-one-VM paradigm: everything is processed inside one virtual machine running in one operating system process. Inside the VM, parallelism is implemented using threads with no separation regarding memory or other resources. In this respect Java has not changed since its invention in the early nineties. The fact that Java was originally invented as a programming language for embedded devices may explain this approach.
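As a rough illustration of "no separation regarding memory" (this sketch is not part of the quoted article, and it uses Python on a POSIX system rather than Java), compare a write made by a thread with the same write made by a forked child process. The function names are made up for the example.

```python
import os
import threading


def mutate_in_thread():
    """Let a thread overwrite a value owned by the caller."""
    data = [0]

    def worker():
        data[0] = 42  # same address space: the caller sees this write

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return data[0]


def mutate_in_child_process():
    """Let a forked child overwrite its own copy of the caller's value."""
    data = [0]
    pid = os.fork()
    if pid == 0:
        data[0] = 42  # only changes the child's private copy of the page
        os._exit(0)   # leave the child without running the parent's cleanup
    os.waitpid(pid, 0)
    return data[0]


if __name__ == "__main__":
    print("after thread:", mutate_in_thread())
    print("after child process:", mutate_in_child_process())
```

The thread's write is visible to the caller, while the child's write is not, because the child works on its own copy of the memory. That separation is what stops a buggy component from scribbling over its neighbours.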

Andrew Tridgell (via Tim Potter):

What is it about the word "thread" that people find so damn sexy? Maybe it needs a name change — "slow-as-hell-no-memory-protection-locks-dont-work" API might be suitable, but I suspect the standards committees wouldn't like that one.

The MMU was added to CPUs for a very good reason. Why is it so hard to understand that trying to avoid it is a bad idea?

Have you thought about the orders of magnitude here? With process switching on a modern CPU you basically have to swap one more register. That's one extra instruction. Modern CPUs have nanosecond cycle times.

Now, some CPUs also need to do an extra TLB flush or equivalent, but even that is cheap on all but the worst CPUs.

Compare this to the work that a file server has to do in responding to a packet. Say it's an SMBread of 4k. That is a 4k copy of memory. Memory is slow. Typically that SMBread will take tens of thousands of times longer than the context switch time.

But by saving that nanosecond you will make the read() system call slower! Why? Because in the kernel the file descriptor number needs to be mapped from an integer to a pointer to a structure. That means looking up a table. That table needs to be locked. If you have 100 threads doing this then they all lock the same structure, so you get contention, and suddenly your 16-CPU system is not scaling any more. With processes they lock different structures, so no contention, so better scaling.

This lock contention can be fixed with some really smart programming (like using RCU), and recently that has been done in Linux. That's one reason why Linux sucks less than other systems for threads.
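A small sketch (not part of the quoted message) can make the shared table visible from user space. It assumes a POSIX system, and the helper names are invented for the example. It does not measure contention; it only shows that threads operate on one file descriptor table while a forked process gets its own.

```python
import os
import threading


def fd_is_open(fd):
    """Return True if fd is a valid descriptor in this process."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False


def close_in_thread():
    """Close a descriptor from a thread, then check it from the caller."""
    r, w = os.pipe()
    t = threading.Thread(target=os.close, args=(r,))
    t.start()
    t.join()
    still_open = fd_is_open(r)  # same table, so the close is visible here
    os.close(w)
    return still_open


def close_in_child_process():
    """Close a descriptor in a forked child, then check it in the parent."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(r)  # affects only the child's copy of the table
        os._exit(0)
    os.waitpid(pid, 0)
    still_open = fd_is_open(r)  # the parent's table entry is untouched
    os.close(r)
    os.close(w)
    return still_open


if __name__ == "__main__":
    print("open after thread closed it:", close_in_thread())
    print("open after child closed it:", close_in_child_process())
```

After the thread closes the descriptor it is gone for the caller too, whereas the child's close leaves the parent's descriptor open. One table per process is exactly why separate processes end up taking separate locks.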

Try thinking about this. How do threads do their IPC? They use the same system calls and mechanisms that are available to processes. The difference is that in a threads library these mechanisms are wrapped into a nice API that makes it convenient to do IPC really easily. You can do exactly the same types of IPC with processes, you just need to write a bit more code.
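To give a feel for the "bit more code" that processes need, here is a minimal sketch (not part of the quoted message) of two processes communicating through shared memory. It assumes a POSIX system, where Python's anonymous `mmap` is shared across `fork` by default; `sum_in_child` is a made-up name.

```python
import mmap
import os
import struct


def sum_in_child(values):
    """Fork a child that writes sum(values) into shared memory; return it."""
    # Anonymous mapping; on Unix the default flag is MAP_SHARED, so the
    # mapping stays shared between parent and child after fork().
    shared = mmap.mmap(-1, 8)
    pid = os.fork()
    if pid == 0:
        # Child: publish the result as a signed 64-bit integer.
        try:
            shared[:8] = struct.pack("q", sum(values))
            code = 0
        except BaseException:
            code = 1
        os._exit(code)
    # Parent: waiting for the child doubles as the synchronisation step.
    _, status = os.waitpid(pid, 0)
    if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
        shared.close()
        raise RuntimeError("child process failed")
    (result,) = struct.unpack("q", shared[:8])
    shared.close()
    return result


if __name__ == "__main__":
    print(sum_in_child([1, 2, 3]))
```

Only the eight bytes that were explicitly mapped are shared; everything else stays private to each process. A threads library gives the opposite default, with everything shared unless you take care to keep it apart.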

For many things, perhaps even for some file server applications, that extra convenience is worthwhile. Convenience means simpler code which means fewer bugs. So I'm not saying to never use threads, I'm just trying to kill this persistent meme that says threads are somehow faster. That's like believing in the tooth fairy.
