About this show...
GIMP is cool
Scheme is cool
This presentation was brought to you by PinPoint, the new GIMP presentation tool.
Text -> GIMP -> JPEG
http://linuxcare.com.au/projects/pinpoint/
%page
rproxy: dynamic web caching
Martin Pool
Linuxcare, Inc.
http://linuxcare.com.au/rproxy/
%page
Problem Statement
People use web resources repeatedly
Therefore: cache recently-used resources on client or proxy
On each request, check currency: either reload or use same
Increasingly, content is dynamic: all-or-nothing caches are less effective
%page
WIBNI
It would be nice if we could transfer only differences
Must interoperate smoothly with HTTP
Must work on dynamic documents
Must fit into popular HTTP software
%page
rsync
Fast file transfer protocol
Finds identical blocks between two files, therefore the delta
Send per-block checksums
Search for matching blocks
Whatever's left is the difference
%page
Integration with HTTP
Request/response protocol
Streaming
Proxies
Every response may be different
%page
Protocol
Client transmits signature of cached resource to server
Server computes & sends differences
Signature sent as new HTTP header
Delta as HTTP Transfer-Encoding
Ignored if not supported
%page
Standalone Proxy
Run one on client, one upstream
Compress across slow links
Already in Debian/Woody & Sid
%page
libhsync
Integrate smoothly with many apps
Become the encoding library for rsync 3.0
LGPL license for nonfree apps
%page
Hosting Applications:
Mozilla: threaded
Apache: multi-process-model
Squid: select/poll-based
Therefore: do no IO in library, caller supplies buffer
State machine
%page
Privacy problems?
Client holds server-supplied data & retransmits
A "stealth cookie"?
No more so than normal Last-Modified
Client-generated signatures are even safer
%page
Tuning
Encode particular content-types
Fuzzy-matching of resources
Cache signatures
Choose block size
~90% saving
%page
Other schemes
Explicit versioning
Client-side variable portions
%page
Bonus slide: rsync 3.0
Scale to larger trees (1TB+ data, 10M+ files, 1000 machines)
Less hardcoded structure
Cached signatures, fuzzy matching
Multicast 1:m, n:m
%page
rsync 3.0/2
Scriptable (Perl/Python/...): filtering, matching, reporting, ...
Simpler client-server architecture
Documented protocol
SSL?
rdiff tool: rsync-over-email?
%page
http://linuxcare.com.au/rproxy/
Questions?
Come and see Linus's penguin at the Canberra aquarium.
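%page
Appendix: delta sketch
The rsync and Protocol slides describe per-block checksums, a search for matching blocks, and a leftover difference. A minimal Python sketch of that idea follows.
Assumptions, none taken from rproxy or libhsync: the names signature, delta and patch, the 4-byte block size, and MD5 recomputed at every offset in place of rsync's cheaper rolling checksum.
This is not the rproxy wire format, only an illustration of the matching step.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny on purpose; real block sizes are far larger


def signature(old: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Client side: map the checksum of each full block of the cached copy to its index."""
    sig = {}
    for offset in range(0, len(old) - block_size + 1, block_size):
        digest = hashlib.md5(old[offset:offset + block_size]).digest()
        sig.setdefault(digest, offset // block_size)
    return sig


def delta(sig: dict, new: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Server side: describe the new resource as block references plus literal bytes."""
    ops, literal, pos = [], bytearray(), 0
    while pos + block_size <= len(new):
        digest = hashlib.md5(new[pos:pos + block_size]).digest()
        if digest in sig:
            # Matching block found: flush pending literal bytes, then reference the block.
            if literal:
                ops.append(("literal", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", sig[digest]))
            pos += block_size
        else:
            # No match at this offset: emit one literal byte and slide forward by one.
            literal.append(new[pos])
            pos += 1
    literal.extend(new[pos:])  # tail shorter than one block
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


def patch(old: bytes, ops: list, block_size: int = BLOCK_SIZE) -> bytes:
    """Client side: rebuild the new resource from the cached copy and the delta."""
    out = bytearray()
    for kind, value in ops:
        if kind == "copy":
            out += old[value * block_size:(value + 1) * block_size]
        else:
            out += value
    return bytes(out)


old = b"abcdefgh"   # cached copy held by the client
new = b"abcdXefgh"  # current resource held by the server
ops = delta(signature(old), new)
assert patch(old, ops) == new
```

Only the signature travels upstream and only the ops travel back, so unchanged blocks are never resent.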