spam statistics; spam as steganographic cover
In a 24-hour period on samba.org, we received about 12751 messages, of which about 10950 were blocked by the system as either spam or viruses. So roughly 86% of incoming messages are trash. A bit more than half of them were blocked by blacklists such as Spamhaus and a third were rejected for containing malware signatures such as PE headers. Many of the remaining ones were bounced because they were going to invalid addresses, presumably coming either from dictionary attacks or from spammers who collected random strings containing @. SpamAssassin dealt with the remaining 528.
(SpamAssassin could probably pick out many more, but it's relatively expensive so we only run it on things that are not obviously bad.)
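As a quick check on the arithmetic, here is a small Python sketch using only the two totals quoted above. The helper name extra_junk_needed is made up for illustration; it asks how many of the messages that got through would also have to be junk for the overall junk fraction to exceed a given percentage.

```python
total = 12751     # messages received in the 24-hour period
blocked = 10950   # blocked system-wide as spam or viruses

blocked_fraction = blocked / total
not_blocked = total - blocked

def extra_junk_needed(percent: int) -> int:
    """Smallest number of additional junk messages, among those not
    blocked, that pushes the overall junk share strictly over percent."""
    smallest_junk_total = total * percent // 100 + 1
    return max(0, smallest_junk_total - blocked)

print(f"blocked: {blocked_fraction:.1%}")    # blocked: 85.9%
print(f"not blocked: {not_blocked}")         # not blocked: 1801
print(extra_junk_needed(90))                 # 526
print(extra_junk_needed(95))                 # 1164
```

So a bit under a third of the 1801 messages that were let through would need to be junk to pass 90%, and roughly two thirds to pass 95%.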
In fact, the fraction of spam is probably a bit higher because the system-wide filters are pretty conservative, and I am not counting messages filtered out by individual users. I think we're certainly over 90% spam/malware; possibly over 95%. It's a bit like John Birmingham's description of a sewer of pure shit coming straight into our living room.
On the other hand, it rather reminds me of Rivest's great Chaffing and Winnowing: Confidentiality without Encryption paper, and of the idea of steganography in general. Hiding messages is technically easy; the hard part is finding cover traffic. (In the standard example, the FBI wonders why Alice and Bob are posting each other so many pictures of puppies.) Spam is the perfect background noise to send invisible steganographic messages, as long as you can agree on a method for your eligible receiver to pick out the good bits.
Rivest writes:
We could thus have the following intriguing scenario: Alice is communicating with Bob using a standard packet-based communication scheme. Each packet is authenticated with a MAC created using a secret authentication key known only to Alice and Bob. (In practice, they might use a different key for packets in each direction, although this is not necessary if the packet contents identify sender and receiver.) Furthermore, each packet happens to contain only a single "message bit." (Alice wrote their software, and it contained a bug that caused this unusual behavior.)
So far, Alice and Bob are not encrypting anything, and are using standard messaging techniques that would not be considered as encryption and that would not be export-controlled. Alice and Bob have no intention of achieving confidentiality of their messages from an eavesdropper.
Now, Alice's packets to Bob may be routed from her computer through the computer of her Internet service provider, run by Charles, on another floor of her building, before being sent on to more major trunks of the Internet and then on to Bob.
Charles' computer, for whatever reason, then adds "chaff" packets to the packet sequence from Alice to Bob. All of sudden, Charles' activities provide a very high degree of confidentiality for the communications between Alice and Bob! Alice's and Bob's software have not been modified in the least to achieve this confidentiality! Charles does not know the secret authentication key used between Alice and Bob! Alice and Bob did not even want or care to have confidential communications! Charles is not using encryption and does not know any encryption key! Amazing!
In this case, Charles is COL CHARLES MOGUBE of the LIBERIAN ARMY.
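The mechanics of the scheme in the quoted passage can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Rivest's reference code: the packet layout (serial number, bit, MAC), the choice of HMAC-SHA256, and all function names are assumptions made for the example. Alice sends one authenticated packet per bit, Charles adds a packet with the opposite bit and a random MAC without knowing any key, and Bob keeps only the packets whose MAC verifies.

```python
import hashlib
import hmac
import secrets

MAC_LEN = 32  # length of an HMAC-SHA256 tag in bytes (assumed MAC choice)

def make_mac(key: bytes, serial: int, bit: int) -> bytes:
    """MAC over the serial number and the single message bit."""
    msg = serial.to_bytes(8, "big") + bytes([bit])
    return hmac.new(key, msg, hashlib.sha256).digest()

def wheat_packets(key: bytes, bits: list[int]) -> list[tuple[int, int, bytes]]:
    """Alice: one authenticated packet per message bit, no encryption."""
    return [(i, b, make_mac(key, i, b)) for i, b in enumerate(bits)]

def add_chaff(packets):
    """Charles: for each packet, add one carrying the opposite bit and a
    random MAC, in random order. He needs no key to do this."""
    rng = secrets.SystemRandom()
    mixed = []
    for serial, bit, mac in packets:
        pair = [(serial, bit, mac),
                (serial, 1 - bit, secrets.token_bytes(MAC_LEN))]
        rng.shuffle(pair)
        mixed.extend(pair)
    return mixed

def winnow(key: bytes, packets) -> list[int]:
    """Bob: discard every packet whose MAC does not verify."""
    good = [(s, b) for s, b, mac in packets
            if hmac.compare_digest(mac, make_mac(key, s, b))]
    return [b for _, b in sorted(good)]

def bytes_to_bits(data: bytes) -> list[int]:
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]

def bits_to_bytes(bits: list[int]) -> bytes:
    return bytes(int("".join(map(str, bits[i:i + 8])), 2)
                 for i in range(0, len(bits), 8))

shared_key = b"k" * 32   # known only to Alice and Bob
stream = add_chaff(wheat_packets(shared_key, bytes_to_bits(b"hi")))
recovered = bits_to_bytes(winnow(shared_key, stream))
print(recovered)
```

An eavesdropper sees both a 0 and a 1 for every serial number and cannot tell which is wheat without the key; someone winnowing with the wrong key ends up with nothing at all.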
posted Thu 6 May 2004 in /issues/spam
Copyright (C) 1999-2007 Martin Pool.