Today’s links come direct from the floor of Gnomedex.
On the good news side, the team that I once PM’ed at Microsoft now has a blog of their own.
On the bad side, I see that they are discontinuing the DHTML Editing Control, something that I owned for a while when I was at Microsoft — first as a developer and then later as a Program Manager.
Win some, lose some!
- Ajax Blog: How vSocial Went from 0 to 71 Million Page Views in (about) 120 Days – “I would really encourage would-be entrepreneurs to think through their funding needs. On paper, it sounds great to bring in $5-10M of outside money, but there is a bundle of assumptions that go with it. What can you do with angel dollars, what can you do on a bootstrap, self-funded basis?”
- David McInnis: The Need for Something Other than ROI – “I have always measured ROI in terms of gross revenue and margin at the end of the month, not by any one component of our business process. I am sure that if I had taken the ROI measurement approach, I could not have justified more than half of the enhancements that we have made to the PRWeb platform.”
- Robert Scoble: What you really need to know about HDTV – “I see so many videoholics mouthing off about HDTV not teaching anyone what really matters that it pisses me off.”
- ValleyWag: How to Fix Conference WiFi – “All day at Supernova, attendees have suffered the most common conference headache — a wifi network that almost works.”
- Johan Sundström: Pimp My Site – “I want things my way!”
- Nat Torkington: Fun CS and Hard Tech Links.
- New WordPress Themes Directory.
- Renee Blodgett: Unplug Please – “It took moving to Africa to truly become unplugged for more than a year.”
- Edd Dumbill: In search of Agile Infrastructure for Web Applications – “Previously a high-powered technology, cheap virtualization is now with us.”
- Unusual Hotels of the World.
- Setting Up Your Own Shoutcast Server To Stream Into SL.
- Love At First Byte – Excellent and informative biographical sketch of Don Knuth.
- Kimbro Staken: 10 things to change in your thinking when building REST XML Protocols – “Maximize flexibility, do not fall into the trap of believing that a fully specified and constrained system will be more robust.”
I use a lot of screen shots on the AWS Blog. I think that the screen shots make the blog entries more attractive and informative, so I like to take the time to set them up, take them, and format them.
This is usually pretty easy — I load up the application, hit Alt-PrtScn (Print Screen), paste the image into The Gimp, reduce it to 300 pixels wide (using the Cubic method, for best results), save it out as a PNG, and then load it up into TypePad. Occasionally, the application is too tall to show in a single browser page. If I really want to show the entire thing, I will take multiple screen shots and then cut and paste until I have what I want. I also find that taking the time to float the images and to put nice borders and margins around them is worthwhile.
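If I ever wanted to script the resize step, a few lines of Python with the Pillow imaging library would do it. This is just a sketch — the file names are placeholders, and bicubic resampling stands in for The Gimp’s “Cubic” method:

```python
from PIL import Image

def shrink_screenshot(src_path, dst_path, target_width=300):
    """Open a screen shot, scale it to a fixed width while keeping the
    aspect ratio, and save it as a PNG. Bicubic resampling is roughly
    equivalent to The Gimp's "Cubic" method."""
    img = Image.open(src_path)
    new_height = round(img.height * target_width / img.width)
    img.resize((target_width, new_height), Image.BICUBIC).save(dst_path, "PNG")
```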
This is really tedious.
Earlier this week, my friend TDavid had a really tall screen shot in his story about Netscape’s Digg clone. Taking a screen shot this tall using my method would have taken me at least 20 minutes, so I figured there had to be a better way. I posted a comment on his blog, and he pointed me at Andy Mutton’s Screen Grab utility for Firefox. I installed it and restarted, and it seemed to work pretty well at first, but then I started getting some weird errors (perhaps because I have a really wide screen and was snapshotting some very tall pages).
I found a reference to GraphicsEx in another comment, and installed it instead. It worked perfectly, and took the screen shot at right. It is pretty tall, mostly because I got a bit too verbose when I was writing about Vista, BitTorrents and Amazon S3 earlier this week.
The picture at right shows this blog (before I posted this entry). The picture at left is the home page of Syndic8, my RSS and Atom feed directory.
- Mouse Surplus – Over 5000 theme park surplus items in stock; Jeff Lange has a nice photo tour of the place. I’ll be in Orlando this summer for the Affiliate Summit; maybe I will have some time for a visit!
- Esther Dyson: Visible Demand: The New Air-Taxi Market – “The air-taxi market is not about luxury travel or vacation getaways. It’s about productivity.”
- dzone.com – “Fresh links for developers.”
Disclaimer: This is a post which reflects my personal opinions, and should not be construed as something that in any way reflects the opinions, thoughts, policies, or beliefs of my employer.
Last week Microsoft made a beta version of Windows Vista available as an open and free download. Predictably, demand for this single-file, 3.1 gigabyte download was very high, and taxed Microsoft’s ability to keep up. Microsoft also offered to ship out copies of Vista on a DVD.
In a conversation reported by Chris Pirillo, Microsoft acknowledged that the download was very popular. I was fine with this; Vista has been a long time in the making, and a lot of people are anxious to give it a try. However, Chris also reported that Microsoft feared that opening up the pipes to allow additional concurrent downloads could actually put the infrastructure of the internet at risk. Without knowing the details behind this, I am a bit skeptical, but that’s not the point of this post.
In the same chat session where Microsoft acknowledged the reality of the slow downloads, an attendee asked why Microsoft didn’t create a BitTorrent seed for the file. Again, and somewhat predictably, Microsoft expressed their fear that this could result in people receiving corrupted or even fraudulent downloads.
Never one to sit idly by, Chris and his business partner Jake Ludington took the bull by the horns (no Longhorn joke intended) and created VistaTorrent.com, seeding it with an official copy of the 3.1 gigabyte Vista download. Chris announced this on Sunday evening.
If you’ve never used BitTorrent before, I should explain a few terms at this point. BitTorrent is a protocol which makes peer-to-peer file transfer simple and efficient. In the old days (before BitTorrent), file transfer took place on a point-to-point, server-to-client basis. If 100 clients downloaded the same file, the server would see 100 requests, and would consume bandwidth equal to 100 times the size of the file.
With BitTorrent, things are a lot different, and a bit more equitable. Instead of a central server, there’s a central location known as a tracker. As its name implies, the tracker keeps track of the set of BitTorrent clients where the bits and pieces of a file can be found. The clients all simultaneously download parts of the file while keeping the other running clients aware of which parts of the file they have and which pieces they need. By adding in some algorithmic randomization, all of the clients eventually wind up with a complete copy of the file. At various times in the downloading process, each “client” will be both a client (to download data) and a server (to provide data to other clients).
One important fact, which I skipped in my quick explanation above, is that the file has to come from somewhere. Any client which has a complete copy of the file is known as a seed, and putting that first complete copy of the file on the network is known as seeding the torrent. There’s also the term swarm, which simply refers to the entire set of peers (seeds and downloaders alike) sharing a given torrent.
The entire BitTorrent system has a number of remarkable properties. First, the system as a whole automatically tolerates the spontaneous appearance and disappearance of clients from the overall network. As long as there is (in aggregate) at least one copy of each chunk of the file residing somewhere in the network, all of the downloads will eventually complete. BitTorrent measures the number of complete copies using a factor called availability; as long as this value is at least 1.0, all of the downloads should succeed (assuming that no vital client goes offline afterward). Second, the protocol is self-regulating and self-adjusting in the face of slow networks, fast networks, slow clients, and so forth. Third, the block-oriented nature of the protocol makes it possible for clients to drop a connection and later reconnect without having to restart the download.
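My understanding of availability can be expressed in a couple of lines of Python. This piece-counting version is my own simplification, not the exact formula that real clients display:

```python
def availability(piece_sets, num_pieces):
    """Given the set of piece numbers held by each peer (seeds included),
    return the number of complete copies of the file the swarm can
    assemble: the replication count of the rarest piece. A value of 0
    means some piece is missing entirely, so downloads will stall."""
    return min(sum(p in held for held in piece_sets)
               for p in range(num_pieces))
```

A lone seed gives an availability of exactly 1; every extra copy of the rarest piece pushes it higher.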
There’s a huge amount of subtlety in the protocol, and Bram Cohen deserves some kind of international prize for making all of this work as well as it does.
When you “do the math” on the protocol, the results are very surprising. Let’s start with one seed and 100 clients that want to download the same file. In the best (perfectly random) case, each client will start up, pick a single random block, and request it from the seed. After that, the clients become peers, and each one obtains the other blocks of the file from another client, not from the seed. So, if all goes perfectly well, the seed will see just one request for each block of the file, and will use up bandwidth equal to the size of the file. Instead of the 100x cost that we saw with the client-server download model, the bandwidth cost to the seed is effectively constant, regardless of the number of downloads. There is traffic to and from the tracker, but it is very, very small when compared to the size of the file.
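That claim is easy to sanity-check with a toy simulation (my own sketch, nothing like the real protocol): clients request random missing blocks, and go to the seed only when no other client holds the block yet. No matter how many clients participate, the seed uploads each block exactly once:

```python
import random

def seed_uploads(num_clients, num_blocks, rng=None):
    """Count block transfers the seed performs before every client has
    the whole file. Clients fetch from a peer whenever any peer already
    holds the block, so the seed serves each block exactly once."""
    rng = rng or random.Random(0)
    have = [set() for _ in range(num_clients)]
    uploads = 0
    while any(len(h) < num_blocks for h in have):
        for i, blocks in enumerate(have):
            missing = [b for b in range(num_blocks) if b not in blocks]
            if not missing:
                continue
            block = rng.choice(missing)
            if not any(block in have[j] for j in range(num_clients) if j != i):
                uploads += 1  # nobody has it yet, so it comes from the seed
            blocks.add(block)
    return uploads
```

Real swarms have contention, churn, and tit-for-tat politics, so the seed works harder in practice, which is why an imperfect case is worth pricing out too.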
Ok, so there are some flies in the ointment. As you can imagine, BitTorrent is great for trading large media files (audio and video) that might or might not be legal. This has caused some people to equate “BitTorrent” and “piracy” in their minds. In fact, nothing could be further from the truth, and there are many, many ways to use BitTorrent in a totally legal fashion. For example, I use it to do backups of large data files from my Syndic8 server. New versions of the Linux kernel are made available in this way, and I am sure that there are many other great examples. If I were a lawyer, I’d mention something about “substantial non-infringing uses” in my defense of this technology.
You should know that the copyright owners of commonly pirated files have taken to posting “poisoned” downloads to the various BitTorrent directories. I don’t know a whole lot about this, but I do know that it is a fairly dirty trick and that it is a severe “monkey wrench” in the system.
Moving right along, one of the more interesting issues in web-scale computing is, literally, scalability. Specifically, it is the ability of centralized resources to grow in a cost-effective fashion to meet demand. As we can see from the Vista download crunch, client-server downloads don’t scale. When the richest company on the planet cannot afford sufficient resources to accommodate a download, something’s definitely wrong with the model. I hope that my little exposition of the BitTorrent protocol has shown that it can scale.
Now it’s time for the sales pitch (feel free to skip this part). Amazon’s Simple Storage Service (S3 for short) contains the foundational parts needed to build a scalable download solution. First, the actual bandwidth (20 cents per gigabyte) and storage (15 cents per gigabyte per month) charges are considered low by industry standards. Second, S3 supports a BitTorrent interface. Once a file has been uploaded to S3, it can be exposed as a fully seeded torrent.
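The mechanics are pleasantly simple: for a publicly readable object, requesting the object’s URL with ?torrent appended returns a .torrent file for which S3 itself acts as tracker and seed. The bucket and key below are made-up names, and I am only sketching the URL construction:

```python
def s3_torrent_url(bucket, key):
    """Build the URL that asks S3 for a .torrent describing an object
    (hypothetical bucket and key; the object must be publicly readable)."""
    return f"https://{bucket}.s3.amazonaws.com/{key}?torrent"
```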
Let’s compare the cost for our hypothetical 100 clients to download the Vista binary in the traditional client-server fashion and with S3’s BitTorrent interface. To keep things simple, let’s use S3’s bandwidth and storage prices for both cases, and assume that the binary is exactly 3 gigabytes in length.
The traditional download would cost 60 cents to upload, 45 cents to store and $60 to download, for a total of $61.05.
The BitTorrent download would again cost 60 cents to upload and 45 cents to store, but just 60 more cents to download, for a total of $1.65. As I noted above, this is the best possible case, where the clients use a perfectly random distribution of initial block requests, so each block is downloaded from the seed just once. Even if the clients aren’t perfect and each block is downloaded from the seed, say, 5 times, the total cost is still just $4.05, or about 7% of the traditional cost.
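Here’s the same arithmetic in a few lines of Python, using the prices and sizes above (a rough sketch that ignores tracker chatter):

```python
SIZE_GB = 3
TRANSFER_RATE = 0.20   # dollars per gigabyte transferred
STORAGE_RATE = 0.15    # dollars per gigabyte-month stored

def download_cost(gb_served):
    """Total cost: one upload, one month of storage, plus whatever the
    seed (or server) actually transfers out to the clients."""
    upload = SIZE_GB * TRANSFER_RATE
    storage = SIZE_GB * STORAGE_RATE
    return round(upload + storage + gb_served * TRANSFER_RATE, 2)

client_server = download_cost(100 * SIZE_GB)  # every client hits the server
best_torrent = download_cost(SIZE_GB)         # seed serves each block once
five_x_torrent = download_cost(5 * SIZE_GB)   # imperfect swarm, 5x from seed
```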
Microsoft has some legitimate qualms about the integrity of the downloaded files. It would be possible for some evil-doer to label almost any random 3 gigabyte pile of data as Vista and convince people to download it.
Chris and Jake sidestepped this possible impediment by computing and posting the MD5 hash of the file, and asking people to verify the value against what was posted. The MD5 hash establishes, with a high degree of certainty, that the downloaded bits are as offered.
So, if Microsoft wanted to join the modern peer-to-peer era, they could allow BitTorrent downloads and simply post MD5 hashes of the data. To make it even easier, they could distribute (via direct download) a small program that would verify the integrity of the data obtained via the torrent. The integrity checker could perform multiple checks on the data to give all parties involved a lot of confidence that the bits came from Microsoft.
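An integrity checker of that sort is only a few lines. Here’s a sketch of the MD5 portion; the file name and expected hash would come from whoever published the download:

```python
import hashlib

def md5_matches(path, expected_hex, chunk_size=1 << 20):
    """Hash a file a megabyte at a time (a 3 gigabyte download should
    not be read into memory all at once) and compare the result
    against the published MD5 value."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()
```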
I didn’t plan to write a book, but I hope that this has been at least somewhat educational. Leave me a comment, link to this, and let me know what you think.