I wrote a short implementation of secsh-filexfer in Python yesterday. It worked pretty well: it can successfully connect to the sftp server in OpenSSH, and open, read and write files. It's very nice and straightforward. Using e.g. "futures" to support pipelined operations should also work very well in Python.
I am trying to suppress the urge to write it in C. Python is far quicker to write but I have this nagging feeling that I will need to go to C eventually to win benchmarks and so I might as well do it now. But I think I can stick with Python for a bit longer.
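As a sketch of why pipelining falls out naturally, here is roughly how an SSH_FXP_READ request is framed per draft-ietf-secsh-filexfer, with several requests issued before any reply is awaited. The packet layout is from the draft; the handle value and the idea of a pending-requests table are illustrative only:

```python
import struct

SSH_FXP_READ = 5  # packet type number from draft-ietf-secsh-filexfer

def pack_read(request_id, handle, offset, length):
    """Encode an SSH_FXP_READ request.  Per the draft, each packet is a
    uint32 length prefix, a type byte, a uint32 request-id, then
    type-specific fields; strings are uint32-length-prefixed."""
    body = (struct.pack(">BI", SSH_FXP_READ, request_id)
            + struct.pack(">I", len(handle)) + handle
            + struct.pack(">QI", offset, length))   # uint64 offset, uint32 len
    return struct.pack(">I", len(body)) + body

# Pipelining: issue several reads without waiting for replies, then match
# responses back to requests by id -- a simple "future" per outstanding id.
pending = {}
for i in range(4):
    pending[i] = pack_read(i, b"handle0", i * 32768, 32768)
    # a real client would send each packet on the channel here
```

Because every response carries the request-id it answers, the client is free to keep many reads in flight and fill them in as replies arrive out of order.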
I think this will work out pretty well as a new rsync-like transport. The protocol needs to be extended in a few ways but there are standard extension breakouts to do that.
- Create hard links
- Get file digest
- Get file block checksums
- Get/set/list extended attributes
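These could all ride on the protocol's standard extension mechanism. A minimal sketch of encoding an SSH_FXP_EXTENDED request, assuming the packet framing from draft-ietf-secsh-filexfer — the extension name here is invented for illustration:

```python
import struct

SSH_FXP_EXTENDED = 200  # packet type number from draft-ietf-secsh-filexfer

def pack_string(s):
    # draft wire format: uint32 length followed by the bytes
    return struct.pack(">I", len(s)) + s

def pack_extended(request_id, name, payload):
    """Encode an SSH_FXP_EXTENDED request carrying a vendor extension.
    The framing is from the draft; the extension name below is made up."""
    body = (struct.pack(">BI", SSH_FXP_EXTENDED, request_id)
            + pack_string(name) + payload)
    return struct.pack(">I", len(body)) + body

# a hypothetical hard-link extension: two path strings as the payload
pkt = pack_extended(7, b"hardlink@superlifter.example",
                    pack_string(b"/src") + pack_string(b"/dst"))
```

A server that doesn't recognise the name just replies SSH_FX_OP_UNSUPPORTED, so extensions degrade gracefully.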
JW suggests that we can use "reverse rsync" for downloads so that less intelligence is required on the server.
I've been reading about the IETF Secure Shell working group, which is basically the standardization effort for the program/protocol we now know as "ssh".
("SSH" is a trademark of "SSH Communications Security", Tatu Ylonen's company. They've granted a licence for people to have programs called ssh, but I think the standard is trying to move away from using the trademark towards "secure shell" or "secsh".)
In particular, the draft-ietf-secsh-filexfer-04 protocol draft looks like an extremely interesting foundation for an rsync successor. Rather than inventing a protocol and program from scratch, such a successor could build on secsh-filexfer, add delta-compression operations to the protocol, and offer some programs for doing recursive/automatic transfers.
The protocol is not exactly as I would have done it, but it is remarkably close to the design that I and other people have been converging on: a pipelined sequence of reasonably simple file operations, running over a secure channel.
Another thing I recently discovered is why SSH has the "subsystem" mechanism -- at the moment the most popular is sftp, though there are others such as skermit. On Unix this maps into the server just executing an inetd-style server, so it's more or less equivalent to the client invoking ssh /usr/sbin/sftpd, much as rsync or CVS do. So why bother having a special mechanism rather than just invoking the server?
The reason is that on some systems such as Netware it is impossible to implement a shell over SSH: some systems don't have interactive shells, and others don't have the right pty or fifo mechanisms you'd need to do it properly. (Telnet on Windows has always been pretty flaky.) But if the server is invoked as an SSH subsystem, then the server can invoke it in some system-dependent way. You could for example implement sftpd as a loadable module or even a builtin on a non-Unix sshd, and it would be transparent to clients. The subsystem design basically only requires that the operating system be able to do TCP -- process models and IPC are implementation details. Very clever indeed.
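From the client side this is trivially easy to use: OpenSSH's -s flag asks the remote sshd to start a named subsystem instead of treating the argument as a shell command. A small sketch (the host name is made up):

```python
def subsystem_argv(host, name="sftp"):
    # OpenSSH's -s flag requests invocation of a subsystem on the remote
    # system; the last word is taken as the subsystem name, not a command.
    return ["ssh", "-s", host, name]

# usage sketch:
#   proc = subprocess.Popen(subsystem_argv("example.com"),
#                           stdin=subprocess.PIPE, stdout=subprocess.PIPE)
# and then speak the subsystem's protocol over proc.stdin/proc.stdout.
```

How the server end actually starts the subsystem — fork/exec, loadable module, builtin — is invisible at this level, which is the whole point.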
Ivan Sutherland on Technology and Courage.
"Superlifter" is a type of spaceship in the "Culture" series of novels by Iain M Banks:
The General System Vehicle Sanctioned Parts List appeared on the screen in the Superlifter's lounge as another point of light in the starfield. It became a silver dot and grew quickly to fill the screen, though there was no sign of detail on the shining surface.
~That'll be it.
~I suppose so.
~We've probably passed near several escort craft, though they wouldn't be making their presence so obvious. What the Navy called a High Value Unit; you never send them out alone.
~ I thought it might look a little more grand.
~They always look pretty unimposing from the outside.
The Superlifter plunged into the centre of the silver surface. Within it was like looking from an aircraft inside a cloud, then there was the impression of plunging through another surface, then another, then dozens more in quick succession, flicking past like thumbed paper pages in an antique book.
They burst from the last membrane into a great hazy space lit by a yellow-white line burning high above, beyond layers of wispy cloud. They were above and aft of the craft's stern. The ship was twenty-five kilometres long and ten wide. The top surface was parkland; wooded hills and ridges separated by and studded with rivers and lakes.
Bracketed by colossal ribbed and buttressed outriggers chevroned in red and blue, the GSV's sheer sides were a golden, tawny colour, scattered with a motley confusion of foliage-covered platforms and balconies and punctured by a bewildering variety of brightly lit openings, like a glowing vertical city set into sandstone cliffs three kilometres high. The air swarmed with craft of every type Quilan had ever seen or heard of, and more besides. Some were tiny, some were the size of a Superlifter. Still smaller dots were individual people, floating in the air.
Two other giant vessels, each barely an eighth of the size of the Sanctioned Parts List, shared the envelope of the GSV's surrounding field enclosure. Riding a few kilometres off each side, plainer and more dense-looking, they were surrounded with their own little concentrations of smaller flying craft.
~It is a little more impressive on the inside, isn't it?
Hadesh Hurler remained silent.
I have been thinking for a long time about a new program (working title of Superlifter) that would take some of the best ideas from rsync and leave behind the historical baggage.
There are already some old and sketchy notes about it here.
rsync follows an interesting pattern in open source projects of being quasi-transport-independent: it runs over a plain TCP socket, or over some other two-way stream provided by something like OpenSSH.
(The structure of the discussion here is based on the Pattern Form.)
- We want to optionally support as-fast-as-possible (wire speed?) operation with no overhead for an anonymous, typically read-only, configuration.
- We also want the option of strong cryptographic protection on authentication, encryption, and integrity.
- Some people will not want to increase the attack surface of their machine by adding new daemons listening for connections. In particular Subversion got a lot of resistance from administrators who didn't want to run Apache 2 with mod_dav_svn, regardless of how well written it is. Another listening process is another body of code that might possibly have bugs, that might need to be upgraded, that needs to be allowed for in firewall and intrusion-detection rules, that needs to be configured, ...
- People developing an application don't want to worry too much about security, for similar reasons: having any security functionality potentially means needing to rush out upgrades and certainly to worry a lot more about the code.
- Authentication should ideally be done over standard mechanisms. For most Linux machines, SSH is pretty standard. SSH in turn has some plugins for Kerberos, certificates, etc.
Write the application so that it only relies on a bidirectional byte stream, and then put in little mechanisms so that it can connect to the server either by opening a socket directly, or by invoking the server over ssh. (From the server's point of view this looks a lot like being started by inetd.)
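A minimal sketch of that dual-mode connection setup, assuming a hypothetical server binary name; in ssh mode the server sees the connection as its stdin/stdout, just as it would under inetd:

```python
import socket
import subprocess

def open_stream(host, port=None, ssh_command=None, server_path="superlifterd"):
    """Return a (reader, writer) pair of file-like objects connected to the
    server, either over a direct TCP connection or by running the server
    over ssh.  The port and server binary name here are invented."""
    if port is not None:
        sock = socket.create_connection((host, port))
        f = sock.makefile("rwb")
        return f, f
    cmd = (ssh_command or ["ssh"]) + [host, server_path, "--inetd"]
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)
    # from the server's point of view this looks like being run from
    # inetd: the connection is simply its stdin and stdout
    return proc.stdout, proc.stdin
```

Everything above this function only ever sees a byte stream, so the rest of the client is identical in both modes.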
CVS is the earliest program that I know of to follow this pattern. It is very common to have developers access the repository over SSH, and the rest of the world access a read-only mirror using pserver. However, some security-conscious projects (OpenBSD?) allow anonymous access over SSH so that public data is not modified in transit.
rsync also uses this pattern. Both modes are popular. In particular, TCP connections are often used for anonymous mirroring. One problem with the way this is used in rsync is that the two modes behave very differently: anonymous mode uses a "modules" virtual filesystem, allows access control, etc.
distcc borrowed this idea from rsync.
Subversion has recently added this mode of operation through the svn: URL scheme, and I think this will greatly ease adoption. Previously authenticated access required Apache 2 with the special mod_dav_svn module, which encountered some resistance from administrators.
(If it sounds like there are a lot of negative consequences, that is mostly because familiarity with this pattern has made me look at it more critically. It's really quite good.)
When run in SSH mode, the application is very secure without needing to include any security or crypto code of its own. As soon as the application starts running, it knows that the user has passed the system's requirements to open an SSH connection. This is not to say that the user may not still try to get up to mischief, but the program is in the second line, not the front line.
In TCP mode connections are about as fast and simple as you can get. Very standard Unix, Stevens and all that.
Server processes run under the persona of the user who started them. They should not interfere with each other; if they want to access other system resources they do so subject to normal security mechanisms.
Ordinary users can install the server and use it over their SSH connection without needing any administrative setup, assuming they are able to install programs in their own directory. They are not exposing the system to attacks in the way that they would be by listening on a port.
sshd can be configured to allow particular users to only run particular commands.
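For example, an authorized_keys entry can pin a key to a single command, so a key handed out for mirroring can run only the transfer server (the binary name here is invented; the option names are standard sshd ones):

```
# ~/.ssh/authorized_keys -- this key may only run the server, nothing else
command="/usr/local/bin/superlifterd --inetd",no-port-forwarding,no-pty,no-X11-forwarding ssh-rsa AAAA... mirror-key
```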
Although the code to open SSH connections is a little complex, it can be adapted (subject to licensing restrictions) from other programs. If the code can't be used, then it is straightforward to rewrite.
The default remote username is the same as the local user, which is often reasonable. It can be easily overridden.
SSH is happiest with a separate OS-level account for all users, although you could probably set it up to allow different people to use different keys to get into the same account with different limitations.
A lot of the documentation of the program has to say "see the SSH manual for details", or just replicate it. This makes the documentation easier to write, but perhaps not easier to understand.
Normally, allowing somebody to SSH in to run the server allows them to run arbitrary commands too, though it can be tied down.
It is harder, though not impossible, to put limitations on what people connected by SSH can do, because they run an arbitrary command. This is a little different to typical FTP or HTTP servers, where root can impose limits on authenticated and anonymous users alike.
There is an asymmetry between the server directly listening on a port in TCP mode, and the server being started by sshd in SSH mode. It might be nice if instead of opening the socket ourselves, there was something like in.rshd that would allow anonymous connections to run one particular program.
The application is limited to very traditional client-server TCP applications. UDP, peers connecting back to clients, and multicast can't happen (or at least can't be secured.)
The server has no control over when it is invoked, at least in SSH mode. It can't do things like preforking or server pools, which are relatively straightforward when it accepts its own connections. Apache is the most famous example of this, but distcc does it too from version 2.5, apparently gaining a small benefit and also a simple way to restrict the number of incoming connections.
SSH is slightly heavy, particularly in the latency for opening connections, compared to possible other protocols. The details will depend on the application but for example distcc is about 25% slower when run over SSH in a typical environment.
Secured connections tie together several issues that might usefully be separated: authentication, assignment to a server-side OS persona, integrity of the connection, confidentiality of the connection, etc.
For a reasonably readable implementation, see the distcc source. Some particular points to note:
- The distccd server can run either standalone, or (equivalently) from inetd or sshd. This is configured by command-line parameters. For friendliness the daemon tries to guess when neither --daemon nor --inetd is specified. The most reliable heuristic seems to be checking whether stdin is a socket or a tty. But there are plausible situations where the server just cannot tell: one might reasonably run ssh root@fatso distccd and want it to start a daemon.
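That heuristic is a one-liner with fstat; a sketch of the guessing logic, with the genuinely ambiguous pipe case left visible:

```python
import os
import stat

def fd_kind(fd):
    """Classify a file descriptor the way a dual-mode daemon might at
    startup: a socket means we were probably started from inetd or sshd
    with a connection already attached; a tty means an interactive user
    who most likely wants --daemon; anything else (e.g. the pipe you get
    from "ssh root@fatso distccd") is genuinely ambiguous."""
    if stat.S_ISSOCK(os.fstat(fd).st_mode):
        return "socket"
    if os.isatty(fd):
        return "tty"
    return "unknown"
```

The "unknown" case is exactly why the explicit --daemon/--inetd flags still need to exist.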
The client needs to fork a child process that will run ssh with a command line ssh $host distccd --inetd.
The "near end" of ssh needs to be connected to the client program. The best way to do this on Unix is with a socketpair, which creates an anonymous unix-domain socket. This is not available on all systems, so you may have to fall back to using a pair of pipe()s. These work fine, except that they are unidirectional, and therefore all the client code needs to be able to cope with having separate input and output fds. (Or you could just not support systems that don't have socketpair(), but using separate fds is not so hard if you start from the beginning.)
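The socketpair/fork/exec dance looks roughly like this — a sketch, with the argv (e.g. ["ssh", host, "distccd", "--inetd"]) supplied by the caller:

```python
import os
import socket

def spawn_over_socketpair(argv):
    """Run argv with its stdin/stdout attached to one end of an anonymous
    unix-domain socket; return (pid, our_end).  The caller gets a single
    bidirectional fd.  On systems without socketpair() you would fall back
    to two pipe()s and separate read/write fds instead."""
    parent_end, child_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        parent_end.close()
        os.dup2(child_end.fileno(), 0)   # child's stdin is the socket
        os.dup2(child_end.fileno(), 1)   # child's stdout too
        child_end.close()
        try:
            os.execvp(argv[0], argv)     # does not return on success
        except OSError:
            os._exit(127)
    child_end.close()
    return pid, parent_end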
I have no idea how you'd do this on Windows. I would start looking at Putty and see what that supports, if anything.
It's obviously useful for the user to be able to change the name of the command used to open the connection.
Because ~/.ssh/config can contain sections switched by destination host name, it is less necessary to have a way for the user to append arbitrary arguments to the command line. It can be faster and more reliable to execvp() ssh directly rather than going through a shell, which argues against letting the user specify anything more than the first word of the command. (CVS and distcc behave this way; rsync does not.)
The way $PATH gets set for commands invoked over ssh can be quite confusingly different to what users see interactively for many reasons. distccd is installed by default into /usr/local/bin rather than sbin to make it more likely that it can be found on the default path. This has persistently been a problem for people with rsync installed in say /opt/freeware/rsync/bin/rsync. There must be a way for the full path to the remote server to be specified on the client to fix this kind of problem.
- Use the SSH "subsystem" facility. I don't know much about it.
Leave out entirely the ability for the server to listen on sockets of its own accord, and depend on something like inetd or netcat to run it and possibly to do access control. This means the server sees almost no difference between TCP and SSH connections.
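Under that design the server's main loop never touches a socket API at all; the connection is just whatever stdin and stdout already are. A toy sketch (the line-upcasing "protocol" is a stand-in for the real one):

```python
import sys

def serve_stream(rfile=None, wfile=None):
    """Inetd-style service loop: the connection is whatever stdin/stdout
    already are -- a TCP socket under inetd or netcat, an SSH channel
    under sshd.  This toy protocol just upper-cases lines; a real server
    would speak the file-transfer protocol here instead."""
    rfile = rfile or sys.stdin.buffer
    wfile = wfile or sys.stdout.buffer
    for line in rfile:
        wfile.write(line.upper())
        wfile.flush()
```

Access control, listening, and connection limits all become someone else's problem (inetd's, sshd's, or netcat's), which is exactly the simplification wanted.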
In a first cut of the client, use netcat or socket to open connections.
rsync offers a weak challenge-response authentication protocol for use over TCP. This can be useful on mostly-secure networks. It does not send cleartext passwords, but file data is sent clear, and it would be vulnerable to a man-in-the-middle attack.
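The shape of that exchange is simple: the server sends a random challenge in the clear, and the client proves knowledge of the shared secret by returning a hash over secret-plus-challenge, so the password itself never crosses the wire. rsync's real scheme hashes with MD4; the sketch below uses SHA-1 purely to stay self-contained:

```python
import base64
import hashlib
import os

def make_challenge():
    # server side: a random nonce, sent in the clear
    return base64.b64encode(os.urandom(16))

def response(secret, challenge):
    # client side: prove knowledge of the secret without transmitting it.
    # rsync's actual scheme is shaped like this but hashes with MD4;
    # SHA-1 here is just for the sketch.
    return base64.b64encode(hashlib.sha1(secret + challenge).digest())

# the server recomputes response(stored_secret, challenge) and compares.
# note what this does NOT give you: the file data still travels in clear,
# and a man in the middle can simply relay the whole exchange.
```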
Sun wanted GSSAPI to do something like this: NFSv4 connections can either be weakly or strongly authenticated. But it does not seem to have become very generally popular on Linux, and therefore does not really have the "no overhead" benefit of SSH.
On a platform with a well-established secure RPC like Windows it might make sense to use it. But that is possibly less efficient than plain TCP, and it certainly constrains the application.
Copyright (C) 1999-2007 Martin Pool.