Martin Pool's blog

Sick of XML? Try YAML!

From Slashdot:

YAML(tm) (rhymes with "camel") is a straightforward machine parsable data serialization format designed for human readability and interaction with scripting languages such as Perl and Python. YAML is optimized for data serialization, configuration settings, log files, Internet messaging and filtering. YAML(tm) is a balance of the following design goals:

  • YAML documents are very readable by humans.
  • YAML interacts well with scripting languages.
  • YAML uses host languages' native data structures.
  • YAML has a consistent information model.
  • YAML enables stream-based processing.
  • YAML is expressive and extensible.
  • YAML is easy to implement.

Working more productively with bash 2.x

Ian Macdonald has a good page on Working more productively with bash 2.x, covering both his superb bash-completion package and some other tips. Plain Unix sh always seems so drab after bash.

SCO owns C++!

Courtesy of We Love the SCO Information Minister: McBride says "And C++ programming languages, we own those, have licensed them out multiple times, obviously. We have a lot of royalties coming to us from C++."

Wow.

Subversion tip -- What's new?

What's new in the repository since I last updated?

$ svn log -r BASE:HEAD | less
$ svn diff -r BASE:HEAD | less

Rapid Testing

Jason pointed me to James Bach's writing on rapid testing. It looks interesting:

How is Rapid Testing different from normal software testing?

Testing practice differs from industry to industry, company to company, and tester to tester. But there are some elements that most test projects have in common. Let's call those common elements "normal testing". In our experience, normal testing involves writing test cases against some kind of specification. These test cases are fragmentary plans or procedures that loosely specify what a tester will do to test the product. The tester is then expected to perform these test cases on the product, repeatedly, throughout the course of the project.

Rapid testing differs from traditional testing in several major ways:

pop quiz

What value is assigned to the macro by this line?

#define PEGASUS_ATOMIC_INT_NATIVE = 1

Why Changelogs?

In many GNU packages like gcc or emacs you'll see a ChangeLog file containing a description of all of the changes to the package. I'd always thought of them as a vestige of a previous era before CVS, but bje recently made a pretty good argument for continuing to use them.

Here's a sample from emacs's ChangeLog, for any readers who aren't familiar with the form:

2003-09-23  Dave Love  <fx@gnu.org>

* configure.in: Check members of struct ifreq.

2003-09-14 Kim F. Storm <storm@cua.dk>

* configure.in: Add checks for sys/ioctl.h and net/if.h.

2003-09-12 Luc Teirlinck <teirllm@mail.auburn.edu>

* Makefile.in (install-arch-indep, uninstall): Add SES manual.

2003-08-18 Lute Kamstra <Lute.Kamstra@cwi.nl>

* configure.in: Revert the change of 2003-07-29 as GTK+ 2.2 is not required anymore.

2003-08-07 Andrew Choi <akochoi@shaw.ca>

* configure.in [powerpc-apple-darwin*]: Use the -no-cpp-precomp option instead of -traditional-cpp for CPP.

The GNU coding standards specify both a text format for listing changes and what information ought to be in the entries. This list of what was changed, when, and by whom is pretty similar to what you might see in a CVS history.

So if you're storing your project in CVS or some other revision control system, then keeping a ChangeLog as well looks redundant. Indeed, if you keep the ChangeLog in CVS then every comment is literally being stored twice...

The main reason for using a ChangeLog is that it travels with the source it describes. Years in the future if somebody obtains the source they can still see its history, even if they no longer have access to the version control system. It's fairly common for projects to move between different vc systems over their lifetime. It's not unheard of for a vc system to be lost entirely, leaving only a set of tarballs as a record.

If some other party wants to fork the project, or develop it offline for a while before merging back, then the ChangeLog gives some hope that their history will be recorded.

Another advantage is that it's easy to scan through the log for mentions of a particular function. Easier than with any vc system I know of, at least if the log is well written.

I've recently got hold of a few source tarballs written by parties unknown, and existing in different versions with no clear indication of when things were changed or why. If they'd come with ChangeLogs, things might be a bit easier. Of course the kind of person who omits even a README might not write a ChangeLog, but if somebody else had started one perhaps they might have continued it.

If we all used a version control system like arch or bitkeeper that carried history with the source then perhaps this might be less necessary. But even then there's no guarantee that every person who gets the source in the future will want to keep using that system...

If things are being kept in both CVS and a ChangeLog, it ought to be easy to use a little script or macro to keep them in sync.

There are scripts such as cvs2cl that produce a ChangeLog for CVS sources.

GNU emacs also has commands to integrate version control and ChangeLogs.
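As a rough illustration of the "little script" idea, one half of the job is just rendering a commit's metadata in ChangeLog form. This is a sketch only: it assumes the date, author and notes have already been pulled out of the version control system, the name format_entry and its arguments are hypothetical, and it is not how cvs2cl or emacs do it.

```python
import textwrap


def format_entry(date, author, email, changes):
    """Render one ChangeLog entry.

    changes is a list of (filename, note) pairs. Each item is indented
    with a tab, following the usual GNU layout.
    """
    lines = [f"{date}  {author}  <{email}>", ""]
    for filename, note in changes:
        item = textwrap.fill(
            f"* {filename}: {note}",
            width=72,
            initial_indent="\t",
            subsequent_indent="\t",
        )
        lines.append(item)
        lines.append("")
    return "\n".join(lines)
```

The same text could then be used both as the commit message and as the new entry at the top of the ChangeLog, so the two never drift apart.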

Some projects require that ChangeLog entries be submitted with patches. This means an explanation of the change in the originator's own words always gets into the project history. If the ChangeLog standards are enforced then the entry will have a level of detail and precision that might not be present in an informal description of the change.

As with any history log, there is a small challenge in writing descriptions that will be comprehensible and useful to people reading them months or years hence.

I'm going to try this for a while...

Alli Russell

Alli takes a step towards nerdliness.

Ethics of replication

I rediscovered a post from Mark Wooding that I particularly like:

From: mdw.at.nsict.org (Mark Wooding)
Newsgroups: comp.text.pdf,sci.crypt,gnu.misc.discuss
Subject: Re: FBI - Adobe's lapdogs & government war on citizens
Date: 13 Aug 2001 20:45:40 GMT
Organization: National Society for the Inversion of Cuddly Tigers
Message-ID: <slrn9ngf3k.u7f.mdw@tux.nsict.org>

Robert J. Kolker wrote:

Without getting in pejorative terminology, do you think it is kosher to deny or deprive the owner of intellectual property an opportunity to sell it?

Yes. Not every situation is an appropriate sales opportunity.

I'm not qualified to decide on what's kosher, or halal for that matter.

Perhaps if you meant to ask a different question, you should have done.

For example, say you borrow a book from a library (no problem). You make two hard copies, one for you and one for your friend. Neither he nor you are likely to buy the book since you already have readable copies at hand.

Result. The publisher of the book has probably lost two sales.

I don't follow. First of all, you state that neither I nor my friend are likely to buy the book, since we have a copy at hand (in the library, presumably), and then complain that the publisher has lost sales. But libraries are OK.

And then there's the issue of a `lost' sale. How can it be lost? He never had it in the first place!

And this is the result of "sharing".

If the alternative is Stallman's `Right to Read' world, and we seem to be getting closer to that, uh, `ideal' every day, then count me down for sharing. Or whatever you want to call it.

Here's a thought experiment. Imagine you have a replicator, like in Star Trek. It costs about as much as a 100W light bulb to run, and is easy to maintain. It makes copies -- perfect working copies -- of inanimate objects[1]. All it takes is some space, to put the new copy in, and time to scan the original and make the new one. Suppose further that you bought yours from some guy in a corner shop for some small amount of money -- it's no hassle for him: he just replicates 'em, after all, and he's not in it for the money.

Which of these things do you think are morally `wrong', or should be `forbidden'? Justify your answers.

I think that last is the only one which is actually `wrong' in any obvious way. I can argue for and against the others, but tend to fall in favour of allowing them.

I'm interested in answers from both sides of the debate.

[1] I don't want to get into the ethics of replicating live people, or even animals. We'll allow replication of dead stuff, so food is fair game.

-- [mdw]

The Geneva Convention On The Treatment of Object Aliasing

The Geneva Convention On The Treatment of Object Aliasing, Hogg, Lea, Wills, deChampeaux and Holt.

Aliasing has been a problem in both formal verification and practical programming for a number of years. To the formalist, it can be annoyingly difficult to prove the simple Hoare formula {x = true} y := false {x = true}. If x and y refer to the same boolean variable, i.e., x and y are aliased, then the formula will not be valid, and proving that aliasing cannot occur is not always straightforward. To the practicing programmer, aliases can result in mysterious bugs as variables change their values seemingly on their own. A classic example is the matrix multiply routine mult(left, right, result) which puts the product of its first two parameters into the third. This works perfectly well until the day some unsuspecting programmer writes the very reasonable statement mult(a, b, a). If the implementor of the routine did not consider the possibility that an argument may be aliased with the result, disaster is inevitable.

Over the years, solutions or workarounds have been found for aliasing problems in traditional languages, and the matter is seemingly under control. Unfortunately, as described below these solutions tend to be too conservative to be useful in object-oriented programs.

The object paradigm has been sold partly on the basis of the strong encapsulation that it provides. This is a misleading claim. A single object may be encapsulated, but single objects are not interesting. An object must be part of a system to be useful, and a system of objects is not necessarily encapsulated.
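The mult(a, b, a) hazard from the excerpt is easy to reproduce. Here is a minimal sketch in Python, assuming square matrices stored as lists of lists; mult follows the signature in the paper's example, while mult_safe is a hypothetical fix of my own that computes into a temporary first.

```python
def mult(left, right, result):
    """Naive product: writes left * right into result, in place."""
    n = len(left)
    for i in range(n):
        for j in range(n):
            total = 0
            for k in range(n):
                # If result is the same object as left, some of these
                # reads see cells that were already overwritten.
                total += left[i][k] * right[k][j]
            result[i][j] = total


def mult_safe(left, right, result):
    """Same contract, but tolerates result aliasing an argument."""
    n = len(left)
    tmp = [[sum(left[i][k] * right[k][j] for k in range(n))
            for j in range(n)]
           for i in range(n)]
    for i in range(n):
        result[i][:] = tmp[i]


a = [[1, 2], [3, 4]]
b = [[0, 1], [1, 0]]
mult(a, b, a)
print(a)  # [[2, 2], [4, 4]], not the true product [[2, 1], [4, 3]]
```

Nothing in the call site hints at the problem, which is exactly the paper's point: the bug lives in an assumption the implementor never wrote down.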

Photos

Tell us how you really feel...

SCO toilet paper

SCO releases a list of files

At long last, SCO have released a list of the Linux source files they claim infringe on their copyrights and/or proprietary information.

Initial analysis seems to show they just grepped for anything with the word "SMP". In particular, they think include/asm-m68k/spinlock.h infringes. The entire file is:

#ifndef __M68K_SPINLOCK_H
#define __M68K_SPINLOCK_H
 
#error "m68k doesn't do SMP yet"
 
#endif

Jon Corbet writes:

The other amusing thing is that they listed the files in a different form:

include.asm-m68k.spinlock.h

People finally figured it out - they needed to flatten the entire kernel directory hierarchy in order to be able to grep through it. It seems that SCO's products, those luxury cars of operating systems, lack a recursive grep...

I can just imagine some law intern somewhere renaming all those files, one by one.
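For what it's worth, a recursive search needs no flattening at all. A minimal sketch in Python, assuming a plain substring match is enough; the name grep_tree is made up for illustration, and this says nothing about what SCO actually ran.

```python
import os


def grep_tree(root, needle):
    """Yield (path, line_number, line) for each matching line under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if needle in line:
                            yield path, number, line.rstrip("\n")
            except OSError:
                # Unreadable file: skip it and keep walking.
                continue
```

Something like list(grep_tree("linux", "SMP")) reports each hit with its real path, slashes and all.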

Spark Ada

Spark Ada, mentioned on RISKS, looks interesting: an annotated subset of Ada with unique and precise semantics, allowing static proof that, amongst other things, no run-time exceptions will occur.

From the Preface to the book,

SPARK has just those features required for writing reliable software: not so austere as to be a pain, but not so rich as to make program analysis out of the question. But it is sensible to share compiler technology with some other standard language and it so happens that Ada provides a better framework than many other languages. In fact, Ada seems to be the only language that has good lexical support for the concept of programming by contract by separating the ability to describe a software interface (the contract) from its implementation (the code) and enabling these to be analysed and compiled separately. The Eiffel language has created a strong interest in the concept of programming by contract which SPARK has embodied since its inception in the late 1980s.[...]

I have always been interested in techniques for writing reliable software, if only (presumably like most programmers) because I would like my programs to work without spending ages debugging the wretched things.

Perhaps my first realization that the tools used really mattered came with my experience of using Algol 60 when I was a programmer in the chemical industry. It was a delight to use a compiler that stopped me violating the bounds of arrays; it seemed such an advance over Fortran and other even more primitive languages which allowed programs to violate themselves in an arbitrary manner.

On the other hand I have always been slightly doubtful of the practicality of the formal theorists who like to define everything in some turgid specification language before contemplating the process known as programming. It has always seemed to me that formal specifications were pretty obscure to all but a few and might perhaps even make a program less reliable in a global sense by increasing the problem of communication between client and programmer.

No, SCO don't indemnify Samba customers

Turns out that SCO won't indemnify their customers:

In addition, the company continues to ship the GPL-covered Samba software, which lets Unix or Linux systems share files on Windows networks, as part of its UnixWare and OpenServer products.

SCO spokesman Blake Stowell said SCO doesn't offer indemnification, or legal protection, for use of Samba. As a hypothetical example, if Microsoft were to decide Samba violated its file system intellectual property and start suing companies that use the software, SCO would stop including Samba but wouldn't offer customers using the software legal protection, Stowell said.

"I'd be confident if we had any reservations that misappropriated code had gone into Samba, we ourselves would stop shipping it, and we would recommend to our users they stop using it," Stowell said. But of assuming responsibility for a Samba lawsuit, he said, "I don't think we could."

So, just to be clear: SCO's demanding that IBM and other vendors indemnify their customers against any problems arising from open source software, but SCO won't do that for their own customers. SCO would, if Stowell is believed, not only leave their customers open to lawsuits, but also stop providing updates. OK. Good to know.

And of course, if the GPL is invalid, SCO's presumably violating copyrights by distributing Samba without a licence...

SCO's Web site states unambiguously that it's not possible to offer indemnification on GPL software: "Some customers have asked their Linux distributors to indemnify them against intellectual property infringement claims in Linux. The Linux distributors are unable to do so because of the terms and conditions in the General Public License," a page describing SCO's Unix license said.

SCO has been suggesting that IBM should indemnify its Linux customers. "If IBM is so confident that Linux is free and clear, why don't they indemnify their users against any lawsuit SCO could bring against them?" Stowell said.

Not possible, eh? Unable? Not confident? HP just did it.

Quilt

Andrew Morton kicked off a little version control tool which is now called Quilt. It doesn't seem to have many web resources at the moment, so I have mirrored the README here. It is in Debian.

Quilt (an assembly of patches, right?) wins in simplicity. Essentially it helps you organize the open-source process of generating patches against somebody else's tree, which you will later presumably mail to them or something similar. Quilt helps you manage the common case of needing to, say, apply several patches on top of a Linus tree, and then write your own work on top of that.

Leaving the means of archiving, distributing, and reviewing patches out of the scope of the tool is pretty smart.

I *think* Arch is doing something like this on the inside, but it's too hard to understand in the time available.

I don't know if this is the direct cause, but it does seem like Larry McVoy has succeeded in taunting free software developers into writing something better than CVS, if not yet clearly better than BitKeeper. There has been a real flowering of interesting new version control systems. Not just different implementations, but genuinely new ways of thinking about the problem, or even of defining what the problem ought to be.

One thing Quilt suggests is that it really may be appropriate to use different tools at different times. The kind of operations you want for sending a single small patch to somebody else's package are quite different to when you're maintaining your own team's tree over many years. It may well be better to use different simple tools for each case rather than designing one complex one to do everything.

Reading about some of akpm's other work inspires a mix of nostalgia and awe.

Fort Collins

I'm in Fort Collins, Colorado at an HP internal get-together. Contrary to my expectations, the Denver-Loveland-Ft Collins strip of Colorado is completely flat and treeless. There are some very pretty and impressive mountains just next door though. I went for a walk in Rocky Mountain National Park and could feel the thin air when walking up hills. It's been snowing just enough to entertain. Temperatures in the 30s Fahrenheit don't feel uncomfortable compared to Canberra.

Pictures to follow.

A Personal Record

Reading Joseph Conrad's A Personal Record. Just great.

SHFS

SHFS: a Linux remote filesystem implemented by running shell commands over SSH, kind of like emacs Tramp mode.

N410c

/evo-N410c-acpi-static--2003-11-06.diff: kludgy patch to make ACPI work on the HP Compaq Evo N410c laptop, and a kernel configuration. A far better description of how to make this work is here.

I haven't tested it very much yet but this does at least allow X to run and prevents the machine going into thermal shutdown.
