linux.conf.au 2005 is in my home town of Canberra, so I've been pulled into helping organize it.
I was feeling really tired on Monday after helping Stephane print and
bind her thesis (go Steph!) so I spent some easy time adding some
content to the linux.conf.au
web site. PHP tells me
only 41 weeks to go!
Stephane likes my cooking. That makes me proud:
Martin made a great dinner. He was trying to duplicate a beef and scallop hot pot with black pepper as served at noted Canberra restaurant The Chairman and Yip. However, I think his version was even better. He added some mushrooms and green peppers. I thought it was pretty ballsy that he just broke up the mushrooms with his hands over the pot, rather than chopping them up with a knife. I liked the irregular shapes. He also made a spicy mashed pumpkin side dish with peanuts in it that I was really excited about.
I find when using Darcs it is nice to set up these shell aliases:

alias what='darcs what' record='darcs record'

Record records my changes. What is short for "what changed?": in other words, show a pseudo-diff with adds, removes, and so on.
But mostly I just like shouting what? what? at my shell, like a crotchety deaf old man.
Python is growing, but not towards Lisp. As Python becomes more popular, I expect advocates of other languages will try to claim it as a descendant of theirs (call it "Alexander Graham Bell is Canadian" syndrome). Python is really a little Lisp, says Graham. A Haskell programmer could claim that Python is really a little Haskell, thanks to its support for List Comprehensions and some lazy features. An Icon programmer could claim that Python is getting to be more like Icon with the addition of lazy generators. A Smalltalk programmer would recognize metaclasses, the unit testing features and probably the new method resolution order. The warning, logging and exception handling infrastructures are probably closest to Java. In ten years, Python will probably have stolen more ideas from these languages (including Common Lisp) and they may even have stolen some back. But if you expect Python to grow towards any particular one of these, you'll be waiting a long, long time.
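The two borrowed features name-checked above look like this in Python (a sketch of my own, not an example from any of the advocates quoted):

```python
# A list comprehension: the Haskell-flavoured feature.
squares = [n * n for n in range(5)]
assert squares == [0, 1, 4, 9, 16]

# A lazy generator: the Icon-flavoured feature. Values are produced
# on demand rather than all at once.
def naturals():
    n = 0
    while True:
        yield n
        n += 1

gen = naturals()
assert [next(gen) for _ in range(3)] == [0, 1, 2]
```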
Programmers do not like deeply nested expressions. They like a language that encourages a style where expression results are assigned names. A statement/expression distinction encourages (and in some cases requires) that. A symbol type is not a bad idea but the marginal gain over interned strings is minimal. And the Lisp S-expression notation has been loudly and explicitly rejected over the last half century.
I like Python. I love-hate Lisp. Should I learn Haskell?
The other day I heard someone make these two arguments, one after the other:
1. Red Hat is just mooching off the community's work.
2. Red Hat is wasting their money by giving work away to the community.
(OK, they are not *quite* contradictory, if you assume it is different communities that are respectively contributing and benefiting. But it's awfully close, and this person at least didn't notice it.)
One issue that tends to pop up in regard to new version control systems is: how can I use this for large projects on my tiny machine? Large projects here typically means something like the Linux kernel, gcc or X. Tiny machines typically means those old enough to support only old disk connections, and therefore limited to perhaps less than 1GB of disk.
(I remember somebody from SCO posting on the Subversion list a while ago, saying that one of their kernels couldn't support disks larger than 8GB. Poor SCO, shambling zombie of the tech industry.)
The main problem for most of these people is that most (all?) new version control packages keep a pristine copy of the source on the client for quick reference. (At least Subversion, Arch and Darcs do this.)
The reason is that it allows checking for changes (svn diff) to be done locally. It's nice that this common operation is fast, and it also helps make merges and commit fast.
So almost all of these new systems will use more disk than CVS does, maybe two or three or four times as much. In addition to pristine files, they tend to have more metadata files than CVS. Arch often has an ID file for each source file and each pristine file. Arch has a file for each change that has been merged in the past. Darcs has the compressed diff for each change in the past, going back to a horizon.
It's not quite as bad as it seems: Arch and Darcs both have clever schemes for keeping hardlinks between multiple related trees. If you're actively developing a project and have several checkouts or branches, hardlinks can save a lot of disk, and also reduce memory pressure. Under some circumstances Arch can even use hardlinks between the pristine and working copies, so that you get essentially copy-on-write allocation of space.
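The hardlink trick can be sketched in a few lines (my own illustration, not Arch's or Darcs' actual code): a hardlinked copy of a file shares its data blocks with the original, so an extra pristine or working tree costs little more than its directory entries.

```python
import os
import tempfile

# Create a "pristine" file and a hardlinked "working" copy of it.
tmp = tempfile.mkdtemp()
pristine = os.path.join(tmp, "pristine-foo.c")
working = os.path.join(tmp, "working-foo.c")

with open(pristine, "w") as f:
    f.write("int main(void) { return 0; }\n")

os.link(pristine, working)  # hard link: same inode, no new data blocks

# Both names refer to the same inode, now with a link count of 2.
assert os.stat(pristine).st_ino == os.stat(working).st_ino
assert os.stat(pristine).st_nlink == 2
```

Until one of the copies is modified (and the link broken by a copy-on-write), the second tree is essentially free.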
In general trading off disk for reduced network usage is a good deal. Disk is cheap, getting cheaper, and usually easier to add. I just bought a 200GB disk with pocket money; this is enough for a thousand unpacked copies of the kernel source. Fast network access tends to be expensive, and not available at all in some places.
Nevertheless, if you're doing a port to an old hp9000, then you may not be able to get a new disk. What do you do then?
One good option can be to keep your working copies on NFS, mounted from a modern PC. Depending on the hardware, this might be faster than the local disk!
Assuming you don't want that, I think Quilt is probably the way to go. Disk overhead is very small indeed. You do need to tell it before you edit a file, at which point it makes a pristine copy. (This can be a pain if you ever forget and need to go back and fix it, but if you're still working on a 50MHz 68k I suppose you're patient and painstaking already.) I think this means disk usage is only proportional to the files touched in any particular changeset, which is about as good as one can reasonably expect.
Another idea is to use reiserfs if you can. It stores small files efficiently, which is particularly important for Arch, but useful for source trees in general. It also does not use a static inode table as in ext2fs or ufs, which frees up a lot of blocks for data.
I don’t really think that view’s helpful: colours that you can’t actually see don’t make things easier to reason about; and while sometimes you have to come up with terms to describe things because there’s no more meaningful way to look at things, this isn’t one of those cases.
But colo[u]r is reasonably well established in science as a name for things that you can't actually see. Part of the charm of Matthew's essay is that it maps copyright onto a concept that is both strange and familiar to computer scientists: suppose there are colours you can't see. (The word magic is used in a similar sense, without implying belief in the supernatural.)
Matthew says, correctly(?), that you can't determine whether a particular bit string is copyrighted just by examining the bits. AJ thinks otherwise:
It’s not irrecoverable though – there’s no reason why you can’t just provide the software with all the information it actually needs: working out who the current copyright holder is could be made as easy as querying the Library of Congress’s website, or some similar body, governmental or private as appropriate. As long as you have the information your function actually needs, determining the copyright status of some bits is straightforward.
This is a decent practical approximation but not actually true: it's possible that you could have independently recreated the bits without copying them. Checking whether the string was previously registered for copyright doesn't imply the string was actually copied. Conversely, the fact that a string is not registered with the Library of Congress doesn't mean it is not copyrighted.
So this is to say: we can have an external lookup table which, given a string of bits, indicates what colour they are likely to have. But it will give false positives (independently recreated) and negatives (copyrighted but not registered).
Of course, as we see on Mediawatch, for nontrivial strings the chances that a string would be spontaneously reinvented are low. All this says, though, is that there are some domains where the heuristic is accurate. Copyright is still not a function of the bits, nor even a function of the bits and the LoC.
The colour of copyright persists on bits across arbitrary transformations: consider human translation into a different language. AJ's oracle could not detect the colour, but the law could. It would similarly fail on the XOR-pad thought experiment Matthew describes.
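The XOR-pad experiment can be sketched in a few lines (this is my reconstruction of the idea, not Matthew's code):

```python
import os

# XOR some "coloured" (copyrighted) bytes with a random one-time pad.
# Each resulting string, examined alone, is indistinguishable from random
# noise; yet XORing the two together restores the original exactly. Any
# bit-examining oracle must give the same answer for the pad and the
# share, even though their legal status presumably differs.
original = b"some copyrighted text"      # stands in for a coloured work
pad = os.urandom(len(original))          # random bits: no colour at all
share = bytes(a ^ b for a, b in zip(original, pad))

recovered = bytes(a ^ b for a, b in zip(share, pad))
assert recovered == original
```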
I think AJ demonstrates Matthew is right: even computer scientists who know a lot about IP will get mixed up as long as they think of copyright as an attribute of bitstrings.
While looking up something about CGI variables, Google returned this page from NCSA. I remember reading it in about 1996 when trying to write web applications for the first time. It still has an mtime of 11 June 1995. That's great.
It has been said that for the inexperienced pilot, in the case of an engine failure, the second engine is there to get you to the site of the crash.
Seth also has a beautiful passage from Lincoln:
Both [Union and Confederacy] read the same Bible and pray to the same God, and each invokes His aid against the other. It may seem strange that any men should dare to ask a just God's assistance in wringing their bread from the sweat of other men's faces, but let us judge not, that we be not judged. The prayers of both could not be answered. That of neither has been answered fully. The Almighty has His own purposes. "Woe unto the world because of offenses; for it must needs be that offenses come, but woe to that man by whom the offense cometh." If we shall suppose that American slavery is one of those offenses which, in the providence of God, must needs come, but which, having continued through His appointed time, He now wills to remove, and that He gives to both North and South this terrible war as the woe due to those by whom the offense came, shall we discern therein any departure from those divine attributes which the believers in a living God always ascribe to Him? Fondly do we hope, fervently do we pray, that this mighty scourge of war may speedily pass away. Yet, if God wills that it continue until all the wealth piled by the bondsman's two hundred and fifty years of unrequited toil shall be sunk, and until every drop of blood drawn with the lash shall be paid by another drawn with the sword, as was said three thousand years ago, so still it must be said "the judgments of the Lord are true and righteous altogether."
If I know you and you'd like to catch up, mail me.
Cory Doctorow gave a great talk about DRM, which is now available in wiki form. (It's published using MoinMoin, a descendant of my pikipiki code. Truly we all stand on each other's shoulders.) Appetizers:
Here's the social reason that DRM fails: keeping an honest user honest is like keeping a tall user tall.
anticircumvention lets rightsholders invent new and exciting copyrights for themselves -- to write private laws without accountability or deliberation -- that expropriate your interest in your physical property to their favor. Region-coded DVDs are an example of this: there's no copyright here or anywhere I know of that says that an author should be able to control where you enjoy her creative works, once you've paid for them. I can buy a book and throw it in my bag and take it anywhere from Toronto to Timbuktu, and read it wherever I am: I can even buy books in America and bring them to the UK, where the author may have an exclusive distribution deal with a local publisher who sells them for double the US shelf-price. When I'm done with it, I can sell it on or give it away in the UK. Copyright lawyers call this "First Sale," but it may be simpler to think of it as "Capitalism."
it's funny because it's funny.
I hadn't realized quite how simple darcs is, but Peter Maxwell did:
In the last couple of weeks I've unexpectedly crossed paths with darcs twice, once at the Canberra LUG (where Martin Pool talked about several new VC systems, but darcs was the only one simple enough to be demonstrated in the limited time...) and once googling for the obscure PloneConference code.
I think I can teach someone how to get started with Darcs in less time than with any other system. That doesn't prove much about the long-term usability, but it is an interesting data point.
I gave a talk about new version-control systems the other week at our LUG. Tridge challenged me: ok, so what's wrong with Arch?

I think it's important to see the bad points in whatever you're advocating. Distributed version control is pretty new; even the stable systems are themselves an experiment. The differences between competing systems are not just accidents of implementation, but also fundamentally different ideas about what software version control means, and how it should be done.
So: what's wrong with Arch? I like it quite a lot, but I'm going to put that aside and, just for this article, look for problems.
There is an elegant underlying simplicity to Arch, but it is expressed in a complex way: tla --help prints a long list of commands, which can confuse the novice user. It's actually possible to get by with a reasonably small subset, but the tutorial does not make that very clear.
Many of the commands expose lower-level operations that might be useful in writing scripts or fixing problems. For example, one command lets you tell Arch pretend I've merged these patches, without actually merging the text, a little cheat which can be useful in resolving some merges. Exposing atomic operations is an admirable goal; more programs should do it. But perhaps splitting them out into separate programs would make the main interface easier to understand.
I think to some extent this is driven by Tom's expressed desire for
Arch to become a platform for consulting work, rather than
primarily something people can just install and use. (Perhaps
the project is moving back from that position now.)
Perhaps this will make Arch a more desirable option for larger projects which want to do more complex operations.
It seems bizarre that despite all these commands there are some glaring gaps. For example, there is no single command to revert a file to its previous state. It is suggested instead that one get the diff and apply it through patch --reverse, or that one copy it from the pristine previous version. Both of these work, certainly, and they can be scripted, but it's puzzling that they're not built in.
Another gaping hole is that there is no command to find just the changesets that touched a particular file. Accommodating renames makes this slightly harder than in CVS, but only very slightly, since the file has a persistent ID. I often do svn log CPU.cpp, but in Arch I have to do without. Darcs can do this too.
On the other hand there is an excellent multi-level tla undo, which saves the removed changes so that they can be put back with tla redo if you change your mind.
In general Arch is prone to "there's more than one way to do it", which can be both good and bad. For example it handles renamed files very well, by associating a file id that remains constant for the life of the file even if it is renamed. This allows Arch to correctly merge changes across renames, something notably lacking (last I looked) from Subversion. File ids are a fine and elegant design. However, the implementation is complex and confusing: the id can be stored in an external file, can be derived from the name, or can be stored in the file itself in either of two different syntaxes. You can mix these methods within a single tree, and can customize to some extent the rules on which one is used. I guess you can make a case for each method being useful somewhere, but the end result is complex and hard to understand. Choosing only one method might not have hurt too much, and might have simplified the system.
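A toy model of why persistent file ids make renames mergeable (my own illustration, not Arch's actual storage format):

```python
# Each file gets a permanent id, and a tree is a mapping from id to its
# current path. Because changes are keyed by id rather than by path, a
# rename on one branch and an edit on another still apply to the same
# logical file.
tree = {"id-42": "src/old_name.c"}
contents = {"id-42": "int x = 1;\n"}

# Branch A renames the file; branch B edits it.
tree["id-42"] = "src/new_name.c"      # rename: only the path mapping changes
contents["id-42"] = "int x = 2;\n"    # edit: keyed by id, not by path

# A merge of both changes puts the edit at the renamed path.
assert tree["id-42"] == "src/new_name.c"
assert contents["id-42"] == "int x = 2;\n"
```

A path-keyed system like CVS has no way to tell that src/old_name.c and src/new_name.c are the same file, which is why it merges so badly across renames.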
Another area where Arch can be criticized for too much choice is in handling non-versioned files. Most vc systems have to accommodate files which exist in the source directory but that should not be versioned. The classic example is *.o files. CVS handles this fairly well with a list of patterns in .cvsignore. Fine.
Arch allows you to classify files using regexps into Source, Junk, Precious and Backup. Each class is treated slightly differently, but personally I am never sure if my .o files are more accurately Junk or Precious. I suppose there are cases where the distinction would be useful, but again I wonder if it would not have been simpler to just follow CVS in saying *.o is ignored, and leave it up to the user to decide which files ought to be automatically deleted and when. Being able to customize it to have simple behaviour is not as good as just being simple.
Some people think it uses too much disk: in some configurations you will have four inodes per source file. (Source, its id, pristine source and pristine source id.) This is pretty much constrained to people working on very large trees on very old hardware, and I don't think it is a general argument against arch. In arch's favour, it can intelligently manage hardlinked trees so that additional working copies are very cheap.
To share your source, Arch depends on having a read-only web server. This is an enormous advance over CVS, which requires a special cvs pserver. On the other hand, it is substantially harder than the current standard method of just mailing a patch. I asked a while ago if support for mailed changesets could be added, and despite some confusion about how it would be done it looks like it might go in eventually. Darcs has this already, which I count as a major feature.
Arch has a bit of a fetish about long names: one regularly has to type long fully-qualified identifiers. This would be less painful if it were possible to use short relative names more often: if I type an incomplete name it could be interpreted relative to wherever I'm standing at the moment. Unfortunately common operations like merging between a local and remote archive require giving a full name. (OK, it's not all that bad if you can copy&paste, or go back through command history. But it's a bit gross that it is necessary.)
By contrast, Darcs has barely any naming at all: branches are filesystem directories, identified by their directory name (and hostname, if remote). You can arrange directories in whatever organization makes most sense to you, and of course give darcs pull ../upstream to move changes between them.
Finally I have one issue which I think has not been mentioned before, which is a kind of meaning/mechanism mismatch in the way distributed operation works. Arch has excellent support for maintaining and merging between multiple branches. It also has good disconnected support: I can take my laptop to a desert island for a month, hack away, and come back and import all my changes, along with their history. Importantly I can also integrate those changes with whatever has happened while I've been away. So far, so wonderful.
The way I set up to do work on my laptop is to create a new branch, stored in an archive on my laptop. Suppose the main branch is firstname.lastname@example.org/foo--main--0, and on my laptop jolly I have email@example.com/foo--main--0. This is pretty clean: I can commit to the branch stored on my laptop when I'm offline, and I can merge back into the main branch when I'm online.
The problem is that this mixes mechanism with meaning. I don't want changes done on my laptop to look any different from those done online. I want to create different branches only for different streams of development, not for changes that happen to occur on disconnected machines.
Once the changes have been merged upstream you can still see what was
done, but only indirectly: all the individual commits get wrapped up
in a single change called something like
merge from jolly,
unless I manually go through and commit them.
I think this is a bit of a problem. I like the ability to zip up changes from a downstream branch when applying them as a single unit to an upstream branch. But I want to be able to do disconnected work completely orthogonal to which branch I'm working on, and without needing to create new branches.
Darcs, as far as I can tell, never wraps up commits into larger commits. All of my commits, once merged upstream, appear as part of the same branch, because Darcs doesn't really remember which branch a change was originally made on. That solves the immediate problem. But it does seem like in some projects you really would want to remember the way patches got bundled up...
I don't know if there is a perfect solution. Maybe either of them is good enough. What do you think?
ACM Queue David Ascher has an interesting and balanced case study of a company considering adopting an open-source library. The old chestnut of desktop linux is discussed by Bart Decrem, who is now at OSAF and was previously at Eazel and might have some perspective on it. Meanwhile, John Coates demonstrates that some Americans just don't understand sarcasm.
kate dumpfru% make dumpfru && sudo ./dumpfru
gcc -o dumpfru -Wall -g dumpfru.c -lezbmc -lm
FRU 0000/0000 8 1 1 9 d 1c 0 0
FRU common header checksum OK :-)
chassis info at offset 72
20 1 4 17 cb 0 0 0 | 0 0 0 0 0 0 0 0
cc 53 47 33 33 33 32 30 | 36 31 31 0 0 c1 0 0
32 bytes of chassis data
chassis info length 32
chassis type 0x17
chassis part number: ""
chassis serial number: "SG33320611"
board info at offset 104
128 bytes of board info:
80 1 f 0 0 0 0 ca | 68 70 0 0 0 0 0 0
0 0 e0 77 6f 72 6b 73 | 74 61 74 69 6f 6e 20 7a
78 36 30 30 30 20 73 79 | 73 74 65 6d 20 62 6f 61
72 64 0 d0 34 30 43 54 | 4c 56 4e 30 33 54 0 0
0 0 0 0 cb 41 37 32 | 33 31 2d 36 36 35 31 30
41 11 c8 64 8 44 0 0 | 0 0 0 c4 34 33 32 31
c2 44 0 10 1 17 0 0 | 0 0 0 0 0 0 0 0
0 0 0 0 c1 0 0 0 | 23 1 10 0 c2 68 70 e0
board manf: "hp"
board product: "workstation zx6000 system board"
board serial number: "40CTLVN03T"
board part nr: "A7231-66510"
board FRU file ID: 6bit: 1@,9
Christopher Hitchens on Abu Ghraib and consequences.
I used to think those headlines were just made up, but according to mediawatch, the ABC really did write:
New study finds bereaved suffer less after euthanasia
The Economist has a nice bit on Dennis Ritchie and the history of Unix.
The later history of Unix is convoluted, and indeed has again become mired in court battles. Following its origins at Bell Labs, a competing version sprang up at the University of California, Berkeley, which first released its version of Unix in 1977, under the leadership of a graduate student named Bill Joy, who later went on to found Sun Microsystems. Ideological battles raged between adherents of the two versions of Unix through much of the 1980s.
To an extent, this rivalry was stripped of relevance by an unexpected entrant. In 1991, an obscure university student in Finland, Linus Torvalds, announced a project to write a new, open-source clone of Unix from scratch — what has come to be known as Linux. That someone would seek to do this is a testament to the high regard in which programmers hold the achievement of the Bell Labs group. Dr Ritchie, in return, expresses a high regard for Linux, attributing its success to the fact that it was a unified effort, at a time when other competing versions of Unix were mired in legal battles.
Linux is also the true heir of the Unix tradition in the sense that its development process is collaborative. Dr Pike says that the thing he misses most from the 1970s at Bell Labs was the terminal room. Because computers were rare at the time, people did not have them on their desks, but rather went to the room, one side of which was covered with whiteboards, and sat down at a random computer to work. The technical hub of the system became the social hub.
It is that interplay between the technical and the social that gives both C and Unix their legendary status. Programmers love them because they are powerful, and they are powerful because programmers love them. David Gelernter, a computer scientist at Yale, perhaps put it best when he said, "Beauty is more important in computing than anywhere else in technology because software is so complicated. Beauty is the ultimate defence against complexity." Dr Ritchie's creations are indeed beautiful examples of that most modern of art forms.
Ken Brown and Darl McBride should read it and save themselves further ridicule.
- PEP 218: Built-In Set Objects
- PEP 237: Unifying Long Integers and Integers
- PEP 289: Generator Expressions
- PEP 322: Reverse Iteration
- Other Language Changes
- New, Improved, and Deprecated Modules
- Build and C API Changes
- Other Changes and Fixes
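Two of those PEPs are easy to show in a couple of lines (a quick sketch of my own, not the release notes' examples):

```python
# PEP 218: a built-in set type, with proper set operations.
evens = set(n for n in range(10) if n % 2 == 0)
assert evens & set([2, 3, 4]) == set([2, 4])    # intersection

# PEP 289: generator expressions compute values lazily, feeding a
# consumer like sum() without materializing an intermediate list.
total = sum(n * n for n in range(10))
assert total == 285
```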
Anthony Berno says:
I have elected to base my cuisine on the principles of evolutionary psychology.... This evolutionary approach has yielded four basic principles of cuisine: naturalism, organic symmetry, novelty, and disequilibrium. These are all different facets of the same core concept: that the goal of haute cuisine should be to stimulate the instincts that evolved within our species, and to do so more strongly than ordinary cuisine. It is this supernormal stimulation of our basic survival urge that makes food into art....
The final principle, and one of the most important, is that food must be always off-balance and in a state of change. In nature, things that are at equilibrium are dead, and are often dangerous to eat. Fresh foods, on the other hand, are either still alive - such as a carrot in a refrigerator - or so recently deceased that the disequilibrium of life still animates their form.
jmason quotes The Common Thread: Science, Politics, Ethics and the Human Genome, by John Sulston, head of the Sanger Centre, and a joint winner of the Nobel Prize for Medicine:
Once the first fluorescence sequencing machines arrived, it became clear that we had to take control of the software. The machines worked well, but ABI (jm: the vendor) wanted to keep control of the data analysis end by forcing their customers to use their proprietary software. ...
I could not accept that we should be dependent on a commercial company for the handling and assembly of the data we were producing. The company even had ambition to take control of the analysis of the sequence, which was ridiculous. ...
So, one hot summer Sunday afternoon, I sat on the lawn at home with printouts spread all around me and decrypted the ABI file that stored the trace data. ... Within a very few days, Rodger and his group had written display software that showed the traces - and there we were. The St Louis team joined in, and they all went to decrypt more of the ABI files, so that we had complete freedom to design our own display and analysis systems. It transformed our productivity. Previously we'd only been able to get the traces as printouts, which we bound together in fat notebooks ....
I certainly feel that between us we did push ABI back a bit and denied to them complete control of this downstream software. It was the first experience of the kind of battle for control of information that I seem to have been fighting with commercial companies ever since: a foretaste of the much larger battles that would later surround the human genome.
From this summary, his experience is remarkably similar to that of Richard Stallman several years earlier, when the frustration of closed-source printer software helped motivate him to start the GNU project. (The section from Free as in Freedom is also very good.) I think my favourite part of FaiF is the epilogue:
In The Autobiography of Malcolm X, Alex Haley gives readers a rare glimpse of that backstage drama. Stepping out of the ghostwriter role, Haley delivers the book's epilogue in his own voice. The epilogue explains how a freelance reporter originally dismissed as a "tool" and "spy" by the Nation of Islam spokesperson managed to work through personal and political barriers to get Malcolm X's life story on paper.
While I hesitate to compare this book with The Autobiography of Malcolm X, I do owe a debt of gratitude to Haley for his candid epilogue. Over the last 12 months, it has served as a sort of instruction manual on how to deal with a biographical subject who has built an entire career on being disagreeable.
Many of the primary sources for AdTI's Samizdat have come out and rebutted the book. I'm counting here only the people who were contacted in Brown's "extensive interviews".
Here is the current status:
- David Bloch, attorney
- Explicitly says he is not speaking about Linux/Unix, only about copyright law in general, but his remarks are recast to denigrate Prof Lions.
- Eric Levenez
- "My Unix chart is not a representation about copyright or patent," but AdTI uses it to imply Linux is a derived work from Unix.
- Nikolai Bezroukov
- Says that Linus did not write Linux by himself, but rather with help from other contributors. That's hardly news. His conclusions are not universally accepted.
- Linus Torvalds
- Only quoted, not interviewed by Brown. Torvalds says Brown did not even email him. Admits Linux was written by the Easter Bunny and Santa Claus. If only Brown had asked in the first place!
- Andrew Tanenbaum
- Tells AdTI that Linux is "free of any Minix code", but they don't want to believe him. He replied to Brown, whom he calls, kindly, "not the sharpest knife in the drawer". Brown says Tanenbaum is animated but tense, perhaps trying to imply Tanenbaum has something to hide. Tanenbaum replied again: he does not suffer fools like Brown gladly. Brown also seems to think Amsterdam is in Finland.
- Petri Kutvonen, Helsinki University
- Tells AdTI "I doubt if Linus ever did see a single line of original Unix code", but they don't want to believe him either.
- Jason Kipnis from Weil, Gotshal & Manges
- Tries to explain what "derived work" means in copyright law, but Brown doesn't seem to be listening.
- Eric Raymond
- Very unhappy at his words being twisted by Brown.
- Ilkka Tuomi
- Tuomi's paper on contributions to the kernel is cited to support Brown's theory that there is a conspiracy by Linus to not give credit to contributors from India and China. Brown doesn't name any such contributor. Tuomi says this is not a valid interpretation of his data, and gives four reasons why Brown is wrong.
- Fred N. van Kempen
- Tries to explain to AdTI the difference between (illegal) copyright infringement and (legal, ethical, normal) building on previous work: "Well, anyone who knows anything about programming, especially the art of OS design and programming, knows one does not 'invent' an OS." AdTI don't seem to understand this fundamental point.
- Dennis Ritchie
- Quoted out of context on the Lions book. Says the only "interview" was a brief email.
- Richard Stallman
- Quoting Stallman: "He [Linus] did not create an operating system. He wrote a kernel. What Linus released in 1991 was not a mature kernel, it was barely a functioning kernel. It took a couple of more years for him to arrive at a kernel with functionality comparable with the kernel of Unix. Nonetheless, it is true he got Linux to work in an amazingly short time, much less time than the Hurd needed. My only comment on that is that he is clearly a good programmer." Stallman says that Brown deliberately confuses his terms, and that Linus really did write the kernel. Brown again chooses not to believe his well-informed primary source.
- Dev Mazumdar
- States perfectly ordinary and reasonable policies about corporate contributions to open projects. Brown seems to feel he says something against Linus, but I don't see why, and Brown doesn't say why he quotes Mazumdar.
- Charles Mills, a due diligence consultant
- Tells Brown that leakage of proprietary code into open projects is far less of a problem than open code being appropriated by proprietary projects. That seems to directly contradict Brown's thesis that corporate code leakage into Linux is common and a big problem. I don't know why he quotes Mills. It later turns out that AdTI didn't interview Mills and Jones at all, but rather lifted the text from a private bulletin board. The owner of the board describes this as "extraordinarily shoddy journalism".
- Henry Jones of Intersect Technology Consulting
[I] know and work with plenty of companies that permit such OSS participation during working hours... Smart companies allow
talent to work on non-company projects (charity, civic, etc.). Smart companies
are now developing robust OSS strategies, processes, and staffing... Nobody's
laughing at Richard Stallman any more.
This also seems unremarkable. I don't understand why Brown quotes him.
It certainly doesn't help Brown's case. As with Mills, the so-called
interview was nothing of the kind.
- David Banks
- Oracle and other database suppliers face a growing threat from below: "open source" databases, which give customers a free or low-cost alternative to commercial products. Brown claims to be in favor of free markets, etc. But he sees giving customers a lower-cost option as a problem.
In summary: every primary source in Samizdat either contradicts Brown, is ignored or misinterpreted by Brown, or has later rebutted him. Not one person interviewed in the book has said they feel it correctly reports what they said. Not one person interviewed in the book agrees with its conclusions.
Bear in mind that AdTI says:
Brown's account is based on extensive interviews with more than two dozen leading technologists in the United States, Europe, and Australia, including Richard Stallman, Dennis Ritchie, and Andrew Tanenbaum.
Some people were not interviewed at all. Others were
"extensively interviewed", but asked only a couple of
questions. And almost all of those interviewed think Brown is wrong,
and many of them dislike his
exceptionally shoddy journalism.
Sometimes when one is investigating a topic, some of the people interviewed might disagree with the thesis. But to have every single one feel that the researcher is either wrong or missing the point is quite an outstanding achievement. I would think any person interested in writing a serious book would at that point take a step back and check whether their thesis was really right. A less ethical person might select different sources to support their case. Microsoft/AdTI did not even bother to find sources who agreed with their outlandish theories — they just went to print anyhow.
Microsoft/AdTI's Samizdat quotes text from Charles Mills and Henry Jones. The comments are basically about the question of whether it is useful to companies for their engineers to be involved in open source projects.
Brown cites them as, for example,
"Mills, Charles, interview with
AdTI, Apr 14 2004". One is given to believe that AdTI interviewed
these people in person, on the phone, or by email.
I found out today that in fact the text is taken, without permission or attribution, from a bulletin board at SoftwareCEO.com [registration required].
I think it's bad form to cast comments on a bulletin board as an
interview, to reprint in a book comments made to a semi-private
forum, and to misconstrue those comments. Orndorff
agreed to SoftwareCEO.com's terms of service, which he violated by reprinting the text.
Ilkka Tuomi's research on contributions to the Linux Kernel is cited in Ken Brown's Samizdat to support Brown's kooky hypothesis that Linus copied code from Minix. The final version of Tuomi's paper has now been published in First Monday. Abstract:
Evolution of the Linux Credits file: Methodological challenges and reference data for Open Source research by Ilkka Tuomi
This paper presents time-series data that can be extracted from the Linux Credits files and discusses methodological challenges of automatic extraction of research data from open source files. The extracted data is used to describe the geographical expansion of the core Linux developer community. The paper also comments on attempts to use the Linux Credits data to derive policy recommendations for open source software.
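The kind of extraction Tuomi describes is straightforward to sketch. The kernel's Credits file lists one contributor per blank-line-separated entry, with tagged fields (N: name, E: email, D: description, S: address lines, the last of which is conventionally the country). The sample entries and the last-S:-line-is-country convention below are illustrative assumptions, not Tuomi's actual code or data:

```python
# Sketch of extracting per-country contributor counts from a
# Linux-Credits-style file, in the spirit of Tuomi's analysis.
from collections import Counter

SAMPLE = """\
N: Alice Example
E: alice@example.org
D: Example driver
S: Helsinki
S: Finland

N: Bob Sample
E: bob@example.net
D: Sample filesystem
S: Canberra
S: Australia
"""

def credits_entries(text):
    """Split a Credits-style file into per-person (tag, value) lists."""
    for block in text.strip().split("\n\n"):
        yield [line.split(": ", 1) for line in block.splitlines() if ": " in line]

def countries(text):
    """Count entries by country, taken as the last S: line of each entry."""
    counts = Counter()
    for fields in credits_entries(text):
        s_lines = [value for tag, value in fields if tag == "S"]
        if s_lines:
            counts[s_lines[-1]] += 1
    return counts

print(countries(SAMPLE))
```

Run over successive kernel releases, counts like these give the time series of geographical expansion the paper describes.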
It includes a section responding to Brown's paper:
[Microsoft/AdTI] claimed that the future of open source software and Linux is therefore threatened by the problem of assigning authorship to specific pieces of code, and potential legal costs resulting from this. As the argument to an important extent has been based on the data presented in this paper, a few observations may be useful. [...]
Based on common knowledge about software development, it therefore appears that a single computer enthusiast could well have created the first Linux version in a couple of months. In fact, by reading the original source code, it is quite clear that a single author, still in the early phases of learning to program operating systems, has produced it. [...]
The difficulty to accurately allocate credit in software development projects should not, however, be automatically interpreted as evidence of misallocated credit or intellectual property rights infringements, as the Tocqueville report, for example, has done. Software products are often based on incremental innovation where existing technologies and knowledge are recombined to create new functionality. [...] [D]evelopers may deserve much more credit than there is intellectual property available today. One way to deal with this issue is to create explicit representations of moral authorship that are only loosely connected with current concepts of intellectual property. The Linux Credits file is an example of such an approach.
It is clear from the paper that Tuomi has a good understanding of how credit for contributions is recorded, and he proposes some quite interesting ideas about how ideas actually propagate as compared to how formal IP law works. I rather get the impression in reading the response to Brown that Tuomi does not like his serious research being twisted and misconstrued. (Who would?)
Should you require further evidence that none of Brown's sources support his conclusions, read section 6.
Groklaw has further coverage.
Linux Journal article about using SPF to prevent mail forgery.
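For context, SPF works by publishing, in a DNS TXT record for the sending domain, the hosts permitted to send its mail; receiving servers can then reject forged envelope senders. A hypothetical record (the domain and address here are made up):

```
example.org.  IN  TXT  "v=spf1 mx ip4:192.0.2.25 -all"
```

Here mx permits the domain's own mail exchangers, the ip4 term permits one extra host, and -all tells receivers to reject mail from anywhere else.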
Ken Brown of AdTI wrote a response to criticisms of his Samizdat book. The response is simply too appallingly bad to bother criticizing, but if you really want to then it is reviewed by liedra, Slashdot and Groklaw (twice).
There is also a particularly good one by Tim Lambert.
Not one person other than Brown has come out to defend the report, as far as I can see.
Reflections on Witty by Nicholas Weaver and Dan Ellis. Very good.
On March 20th, 2004, an attacker released a single-packet UDP worm, Witty, into the wild. Although only infecting roughly 12,000 machines, and less than 700 bytes long, this worm represents a dangerous trend in malicious code. The attack is well understood: there have been several analyses [lurhq, disassembly] of the worm itself, and an excellent analysis by Moore and Shannon on the network propagation [caida_witty], including the presence of seeding or hitlisting (starting the worm on a group of systems to speed the initial propagation). But what can we learn about the attacker?
Examining the timeline of events, the worm itself, its malicious payload, and the skills required all point to a sophisticated attacker. Witty was written by an author who was motivated, sophisticated, skilled, and malicious. Although there have been previous well-engineered worms (notably the Morris worm and Nimda), Witty represents a dangerous new trend, combining both skill and malice.
It's actually unfortunate that Witty hasn't gotten the attention lavished on previous worms, as it was a very significant attack. This worm contained a payload malicious to the host computer, was released with almost no time to patch systems. The worm contained no significant bugs, and was written by a malicious author deeply familiar with the theoretical and practical state-of-the-art in worm construction and computer security.
Analysis of Witty worm spread by CAIDA.
Crooked Timber points to an interesting speech by Robert Hughes about the Royal Academy, and by extension the idea of a professional academy in general. I think in the resurgence of open Unix (Unix as literary tradition) you can see some similar themes.
I believe it's not just desirable but culturally necessary that England should have a great institution through which the opinions of artists about artistic value can be crystallised and seen, there on the wall, unpressured by market politics: and the best existing candidate for such an institution is a revitalised Royal Academy, which always was dedicated to contemporary art.
Part of the Academy's mission was to teach. It still should be. In that regard, the Academy has to be exemplary: not a kindergarten, but a place that upholds the primacy of difficult and demanding skills that leak from a culture and are lost unless they are incessantly taught to those who want to have them. And those people are always in a minority. Necessarily. Exceptions have to be.
I'm not absolutely sure it's necessary, and I don't know who is best to do it. Who speaks about software not just as business or science, but as a culture?
Looks interesting: rpmerizor
rpmerizor is a Perl script that uses rpm as an archiving utility. It allows you to create an RPM package simply by specifying files on the command line and answering a few questions.
(I had told myself that I would stop flogging the dead horse that is Samizdat, but this is so delicious I really had to post.)
Democracy means rule by the people, but rule means something more than mere elections. In our tradition, it also means control through reasoned discourse. This was the idea that captured the imagination of Alexis de Tocqueville, the nineteenth-century French lawyer who wrote the most important account of early "Democracy in America." It wasn't popular elections that fascinated him - it was the jury, an institution that gave ordinary people the right to choose life or death for other citizens. And most fascinating for him was that the jury didn't just vote about the outcome they would impose. They deliberated. Members argued about the "right" result; they tried to persuade each other of the "right" result, and in criminal cases at least, they had to agree upon a unanimous result for the process to come to an end. [...]
Enter the blog. The blog's very architecture solves one part of this problem. People post when they want to post, and people read when they want to read. The most difficult time is synchronous time. Technologies that enable asynchronous communication, such as e-mail, increase the opportunity for communication. Blogs allow for public discourse without the public ever needing to gather in a single public place. [...]
I think the recent discussion of the organization operating under de Tocqueville's name provides another good example. In the past, if somebody published a bad book making unfounded claims, it would take a while for responses to come out. The most an ordinary person could manage would be to get something printed in a newspaper, which reaches only a small fraction of the world. Now we can deliberate collectively, criticizing the book before it even appears on paper and publishing our responses to anyone who cares to read them.
Television and newspapers are commercial entities. They must work to keep attention. If they lose readers, they lose revenue. Like sharks, they must move on.
But bloggers don't have a similar constraint. They can obsess, they can focus, they can get serious. If a particular blogger writes a particularly interesting story, more and more people link to that story. And as the number of links to a particular story increases, it rises in the ranks of stories. People read what is popular; what is popular has been selected by a very democratic process of peer-generated rankings.
Speaking of unanimity, I ought to go through Samizdat again and check if there are any of their primary sources who have not disowned the book. I think the only one is John Lions, who sadly is unable to, but I should check.
Ilkka Tuomi has some interesting papers on Linux seen from an organizational or economic point of view.
And so Mr. Brown would like to call his book, Samizdat, debasing a word covered with the blood of innocents. Brown has suggested the government "support" what he calls "true" open source code, and establish a government-approved "open source" code bank of some sort, giving money to universities to create it, to replace the free and open source code that thousands of creative volunteers have offered as a gift to the world already, code written by men and women who did it because they felt like expressing themselves, some of them because they wanted software code to be freely available to all, to benefit the world. [...]
It's important to keep clearly before us that software code is speech, a form of expression. Even the law sees it that way. What side would Brown have chosen in Stalinist Russia? I cannot say, but he is attacking an upright man without cause, unless perhaps you count politics or money as a worthy cause, not sending Linus to his death, of course, nothing as dramatic as that. But he does attempt to deface a man's life's work, diminishing his remarkable achievement by falsely implying that it was plagiarism, so as to destroy it and replace it with state-sponsored code, which won't be allowed in business but can be used in universities.
S was reading The Gulag Archipelago a little while ago. I should read it too.
The AdTI book on the origins of Linux seems to have been fairly resoundingly debunked here and on Groklaw. Many of the primary sources Brown interviewed have come out and said he was wrong, including Tanenbaum, Stallman and Salus. Ilkka Tuomi says he will be releasing something in the next couple of days.
Annoying though it is to see such slander, in a way it represents a kind of victory. Free software is so successful and so well known that Microsoft feels they need to fund publication of this kind of trash.
Copyright (C) 1999-2007 Martin Pool.