In Praise of Evolvable Systems

(First appeared in the ACM’s net_worker, 1996)

Why something as poorly designed as the Web became The Next Big Thing, and what that means for the future.

If it were April Fool’s Day, the Net’s only official holiday, and you wanted to design a ‘Novelty Protocol’ to slip by the Internet Engineering Task Force as a joke, it might look something like the Web:
The server would use neither a persistent connection nor a store-and-forward model, thus giving it all the worst features of both telnet and e-mail.
The server’s primary method of extensibility would require spawning external processes, thus ensuring both security risks and unpredictable load.
The server would have no built-in mechanism for gracefully apportioning resources, refusing or delaying heavy traffic, or load-balancing. It would, however, be relatively easy to crash.
Multiple files traveling together from one server to one client would each incur the entire overhead of a new session call.
The hypertext model would ignore all serious theoretical work on hypertext to date. In particular, all hypertext links would be one-directional, thus making it impossible to move or delete a piece of data without ensuring that some unknown number of pointers around the world would silently fail.
The tag set would be absurdly polluted and user-extensible with no central coordination and no consistency in implementation. As a bonus, many elements would perform conflicting functions as logical and visual layout elements.
HTTP and HTML are the Whoopee Cushion and Joy Buzzer of Internet protocols, only comprehensible as elaborate practical jokes. For anyone who has tried to accomplish anything serious on the Web, it’s pretty obvious that of the various implementations of a worldwide hypertext protocol, we have the worst one possible.

Except, of course, for all the others.

MAMMALS VS. DINOSAURS

The problem with that list of deficiencies is that it is also a list of necessities — the Web has flourished in a way that no other networking protocol has except e-mail, not despite many of these qualities but because of them. The very weaknesses that make the Web so infuriating to serious practitioners also make it possible in the first place. In fact, had the Web been a strong and well-designed entity from its inception, it would have gone nowhere. As it enters its adolescence, showing both flashes of maturity and infuriating unreliability, it is worth recalling what the network was like before the Web.

In the early ’90s, Internet population was doubling annually, and the most serious work on new protocols was being done to solve the biggest problem of the day, the growth of available information resources at a rate that outstripped anyone’s ability to catalog or index them. The two big meta-indexing efforts of the time were Gopher, the anonymous ftp index; and the heavy-hitter, Thinking Machines’ Wide Area Information Server (WAIS). Each of these protocols was strong — carefully thought-out, painstakingly implemented, self-consistent and centrally designed. Each had the backing of serious academic research, and each was rapidly gaining adherents.

The electronic world in other quarters was filled with similar visions of strong, well-designed protocols — CD-ROMs, interactive TV, online services. Like Gopher and WAIS, each of these had the backing of significant industry players, including computer manufacturers, media powerhouses and outside investors, as well as a growing user base that seemed to presage a future of different protocols for different functions, particularly when it came to multimedia.

These various protocols and services shared two important characteristics: Each was pursuing a design that was internally cohesive, and each operated in a kind of hermetically sealed environment where it interacted not at all with its neighbors. These characteristics are really flip sides of the same coin — the strong internal cohesion of their design contributed directly to their lack of interoperability. CompuServe and AOL, two of the top online services, couldn’t even share resources with one another, much less somehow interoperate with interactive TV or CD-ROMs.

THE STRENGTH OF WEAKNESS AND EVOLVABILITY

In other words, every contender for becoming an “industry standard” for handling information was too strong and too well-designed to succeed outside its own narrow confines. So how did the Web manage to damage and, in some cases, destroy those contenders for the title of The Next Big Thing? Weakness, coupled with an ability to improve exponentially.

The Web, in its earliest conception, was nothing more than a series of pointers. It grew not out of a desire to be an electronic encyclopedia so much as an electronic Post-it note. The idea of keeping pointers to ftp sites, Gopher indices, Veronica search engines and so forth all in one place doesn’t seem so remarkable now, but in fact it was the one thing missing from the growing welter of different protocols, each of which was too strong to interoperate well with the others.

Considered in this light, the Web’s poorer engineering qualities seem not merely desirable but essential. Although every strong theoretical model of hypertext requires bi-directional links, in any heterogeneous system links have to be one-directional, because bi-directional links would require massive coordination in a way that would limit the Web’s scope. Despite the obvious advantages of persistent connections in terms of state-tracking and lowering overhead, a server designed to connect to various types of network resources can’t require persistent connections, because that would limit the protocols the Web could point to. The server must accommodate external processes, or its extensibility would be limited to whatever the designers of the server could put into any given release, and so on.
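The coordination cost behind one-directional links can be sketched in a few lines of Python (a toy model, not anything from the Web’s actual design):

```python
# One-directional links: publishing a link touches only the linking page.
# Everything below is an invented illustration of the coordination argument.
web = {
    "my-page": ["http://gopher.example/index", "ftp://ftp.example/pub"],
}
# Adding a link requires no cooperation from the target at all:
web["my-page"].append("http://wais.example/search")

# Bi-directional links: every link must also be registered at the target,
# so both endpoints need a writable, centrally coordinated link database --
# exactly the coordination a heterogeneous system cannot assume.
forward, backward = {}, {}

def bidi_link(src, dst):
    forward.setdefault(src, []).append(dst)
    backward.setdefault(dst, []).append(src)  # requires write access to dst
```

The one-way version scales to any number of uncooperative servers; the two-way version only works inside a single administrative domain.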

Furthermore, the Web’s almost babyish SGML syntax, so far from any serious computational framework (Where are the conditionals? Why is the Document Type Description so inconsistent? Why is the browsers’ enforcement of conformity so lax?), made it possible for anyone wanting a Web page to write one. The effects of this ease of implementation, as opposed to the difficulties of launching a Gopher index or making a CD-ROM, are twofold: a huge increase in truly pointless and stupid content soaking up bandwidth; and, as a direct result, a rush to find ways to compete with all the noise through the creation of interesting work. The quality of the best work on the Web today has not happened in spite of the mass of garbage out there, but in part because of it.

In the space of a few years, the Web took over indexing from Gopher, rendered CompuServe irrelevant, undermined CD-ROMs, and now seems poised to take on the features of interactive TV, not because of its initial excellence but because of its consistent evolvability. It’s easy for central planning to outperform weak but evolvable systems in the short run, but in the long run evolution always has the edge. The Web, jujitsu-like, initially took on the power of other network protocols by simply acting as pointers to them, and then slowly subsumed their functions.

Despite the Web’s ability to usurp the advantages of existing services, this is a story of inevitability, not of perfection. Yahoo and Lycos have taken over from Gopher and WAIS as our meta-indices, but the search engines themselves, as has been widely noted, are pretty lousy ways to find things. The problem that Gopher and WAIS set out to solve has not only not been solved by the Web, it has been made worse. Furthermore, this kind of problem is intractable because of the nature of evolvable systems.

THREE RULES FOR EVOLVABLE SYSTEMS

Evolvable systems — those that proceed not under the sole direction of one centralized design authority but by being adapted and extended in a thousand small ways in a thousand places at once — have three main characteristics that are germane to their eventual victories over strong, centrally designed protocols.

  • Only solutions that produce partial results when partially implemented can succeed. The network is littered with ideas that would have worked had everybody adopted them. Evolvable systems begin partially working right away and then grow, rather than needing to be perfected and frozen. Think VMS vs. Unix, cc:Mail vs. RFC-822, Token Ring vs. Ethernet.
  • What is, is wrong. Because evolvable systems have always been adapted to earlier conditions and are always being further adapted to present conditions, they are always behind the times. No evolving protocol is ever perfectly in sync with the challenges it faces.
  • Finally, Orgel’s Rule, named for the evolutionary biologist Leslie Orgel — “Evolution is cleverer than you are”. As with the list of the Web’s obvious deficiencies above, it is easy to point out what is wrong with any evolvable system at any point in its life. No one seeing Lotus Notes and the NCSA server side-by-side in 1994 could doubt that Lotus had the superior technology; ditto ActiveX vs. Java or Marimba vs. HTTP. However, the ability to understand what is missing at any given moment does not mean that one person or a small central group can design a better system in the long haul.

Centrally designed protocols start out strong and improve logarithmically. Evolvable protocols start out weak and improve exponentially. It’s dinosaurs vs. mammals, and the mammals win every time. The Web is not the perfect hypertext protocol, just the best one that’s also currently practical. Infrastructure built on evolvable protocols will always be partially incomplete, partially wrong and ultimately better designed than its competition.
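The logarithmic-versus-exponential claim can be made concrete with a toy calculation (the starting values and growth rates below are invented purely for illustration):

```python
import math

# A "strong" protocol starts out high-quality and improves logarithmically;
# a "weak" evolvable one starts out barely working and improves
# exponentially. However generous the head start, the curves cross.
def strong(t):
    """Centrally designed: high start, logarithmic improvement."""
    return 100 + 20 * math.log(1 + t)

def weak(t):
    """Evolvable: weak start, exponential improvement."""
    return 1.5 ** t

# First point at which the evolvable system overtakes the designed one:
crossover = next(t for t in range(100) if weak(t) > strong(t))
```

With these particular (arbitrary) numbers the weak system wins by step 13; change the constants and only the date of the crossover moves, never the fact of it.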

LESSONS FOR THE FUTURE

And the Web is just a dress rehearsal. In the next five years, three enormous media — telephone, television and movies — are migrating to digital formats: Voice Over IP, High-Definition TV and Digital Video Disc, respectively. As with the Internet of the early ’90s, there is little coordination between these efforts, and a great deal of effort on the part of some of the companies involved to intentionally build in incompatibilities to maintain a cartel-like ability to avoid competition, such as DVD’s mutually incompatible standards for different continents.

And, like the early ’90s, there isn’t going to be any strong meta-protocol that pushes Voice Over IP, HDTV and DVD together. Instead, there will almost certainly be some weak ‘glue’ or ‘scaffold’ protocol, perhaps SMIL (Synchronized Multimedia Integration Language) or another XML variant, to allow anyone to put multimedia elements together and synch them up without asking anyone else’s permission. Think of a Web page with South Park in one window and a chat session in another, or The Horse Whisperer running on top with a simultaneous translation into Serbo-Croatian underneath, or clickable pictures of merchandise integrated with a salesperson using a Voice Over IP connection, ready to offer explanations or take orders.

In those cases, the creator of such a page hasn’t really done anything ‘new’, since all the contents of those pages exist as separate protocols. As with the early Web, the ‘glue’ protocol subsumes the other protocols and produces a kind of weak integration, but weak integration is better than no integration at all, and it is far easier to move from weak integration to strong integration than from none to some. In five years, DVD, HDTV, Voice Over IP, and Java will all be able to interoperate because of some new set of protocols which, like HTTP and HTML, is going to be weak, relatively uncoordinated, imperfectly implemented and, in the end, invincible.

The Icarus Effect

First published in ACM, 11/97.

A funny thing happened on the way from the router. 

We are all such good students of Moore’s Law, the notion that processor speeds will double every year and a half or so, that in any digital arena we have come to treat it as our ‘c’, our measure of maximum speed. Moore’s Law, the most famously accurate prediction in the history of computer science, is treated as a kind of inviolable upper limit: the implicit idea is that since nothing can grow faster than chip speed, and chip speed is doubling every 18 months, that necessarily sets the pace for everything else we do.

Paralleling Moore’s Law is the almost equally rapid increase in storage density, with the amount of data accessible on any square inch of media growing by a similar amount. These twin effects are constantly referenced in ‘gee-whiz’ articles in the computer press: “Why, just 7 minutes ago, a 14 MHz chip with a 22K disk cost eleventy-seven thousand dollars, and now look! 333 MHz and a 9-gig drive for $39.95!”

All this breathtaking periodic doubling makes these measurements into a kind of physics of our world, where clock speeds and disk densities become our speed of light and our gravity – the boundaries that determine the behavior of everything else. All this is well and good for stand-alone computers, but once you network them, a funny thing happens on the way from the router: this version of the speed of light is exceeded, and from a most improbable quarter.

It isn’t another engineering benchmark that is outstripping the work at Intel and IBM; it’s the thing that often gets the shortest shrift in the world of computer science – the users of the network.

Chip speeds and disk densities may be doubling every 18 months, but network population is doubling roughly annually, half again as fast as either of those physical measurements. Network traffic, measured in packets, is doubling faster still: last year MAE-East, a major East Coast Internet interconnect point, was seeing twice the load every 4 months – an 8-fold annualized increase.

There have always been internal pressures for better, faster computers – weather-modelling programs and 3-D rendering, to name just two, can always consume more speed, more RAM, more disk – but the Internet, and particularly the Web and its multimedia cousins of Java applications and streaming media, presents the first external pressure on computers, where Moore’s Law simply can’t keep up and will never catch up. The network can put more external pressure on individual computers than they can handle, now and for the foreseeable future.

IF YOU SUCCEED, YOU FAIL. 

This leads to a curious situation on the Internet, where any new service risks the usual failure if there is not enough traffic, but also risks failure if there is too much traffic. In a literal update of Yogi Berra’s complaint about a former favorite hang-out, “Nobody goes there anymore. It’s too crowded”, many of the Web sites covering the 1996 US Presidential election crashed on election night, the time when they would have been most valuable, because so many people thought they were a good idea. We might dub this the ‘Icarus Effect’ – fly too high and you crash.

What makes this ‘Icarus Effect’ more than just an engineering oversight is the relentless upward pressure on both population and traffic – given the same scenario in the 2000 election, computers will be roughly 8 times better equipped to handle the same traffic, but they will be asked to handle roughly 16 times the traffic. (More traffic than that even, much more, if the rise in number of users is accompanied by the same rise in time spent on the net by each user that we’re seeing today.) 
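The compounding behind those figures can be checked in a few lines, using the essay’s own doubling periods (note that exact compounding over four years gives a capacity gain closer to 6x; the text rounds to “roughly 8 times”):

```python
# Rough compounding from the essay's figures: chip speed doubles every
# 18 months, user population roughly every 12. Over the four years
# between the 1996 and 2000 elections:
years = 4
capacity = 2 ** (years * 12 / 18)  # ~6.3x better-equipped computers
traffic  = 2 ** (years * 12 / 12)  # 16x traffic from population growth alone
gap = traffic / capacity           # demand outgrows supply by ~2.5x
```

Whatever the rounding, the point survives: the two exponents differ, so the gap between demand and capacity itself grows exponentially.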

This is obviously an untenable situation – computing limits can’t be allowed to force entrepreneurs and engineers to hope for only middling success, and yet everywhere I go, I see companies exercising caution whenever they are contemplating any moves that will increase traffic, even if that would make for a better site or service.

FIRST, FIX THE PROBLEM. NEXT, EMBRACE FAILURE. 

We know what happens when the need for computing power outstrips current technology – it’s a two-step process: first, beef up the current offering by improving performance and fighting off failure; then, when that line of development hits a wall (as it inevitably does), embrace the imperfection of individual parts and adopt parallel development to fill the gap.

Ten years ago, Wall St. had a problem similar to the Web’s today, except it wasn’t web sites and traffic, it was data and disk failure. When you’re moving trillions of dollars around the world in real time, a disk drive dying can be a catastrophic loss, and a backup that can get online ‘in a few hours’ does little to soften the blow. The first solution is to buy bigger and better disk drives, moving the Mean Time Between Failure from, say, 10,000 hours to 30,000 hours. This is certainly better, but in the end the result is simply spreading the pain of catastrophic failure over a longer average period of time. When the failure does come, it is the same catastrophe as before.

Even more disheartening, the price/performance curve is exponential, putting the necessary order-of-magnitude improvements out of reach. It would cost far more to go from 30K/hrs MTBF to 90K/hrs than it did to go from 10 to 30, and going from 90 to 270 would be unthinkably expensive. 

Enter the RAID, the redundant array of inexpensive disks. Instead of hoping for the Platonic ‘ideal disk’, the RAID accepts that each disk is prone to failure, however rare, and simply groups disks together in such a way that the failure of any one disk isn’t catastrophic, because the remaining disks hold all of the failed disk’s data in a matrix shared among them. As long as a new working disk is put in place of the failed drive, data is lost only if two disks fail at precisely the same time, giving a RAID made of ordinary disks a theoretical MTBF of something like 900 million hours.
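A back-of-the-envelope version of that arithmetic, using the standard mean-time-to-data-loss approximation for a mirrored pair (the 24-hour repair window is an assumption; the essay gives no repair figure, which is why its 900-million-hour estimate differs):

```python
# MTTDL for a mirrored pair: data is lost only if the second disk dies
# while the first is being rebuilt. Standard approximation:
#   MTTDL = MTBF^2 / (N * (N - 1) * MTTR)
mtbf = 30_000.0   # hours per ordinary disk (the essay's figure)
mttr = 24.0       # hours to swap in and rebuild a replacement (assumed)
n = 2             # disks in the mirrored pair

mttdl = mtbf ** 2 / (n * (n - 1) * mttr)
# ~18.75 million hours -- two orders of magnitude beyond any single disk,
# built entirely from disks that are individually unremarkable.
```

The exact result swings with the assumed repair time, but the shape of the argument doesn’t: redundancy plus prompt repair multiplies reliability in a way that no affordable single disk can match.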

A similar path of development led to the overtaking of the supercomputer by the parallel processor, where the increasingly baroque designs of single-CPU supercomputers were facing the same uphill climb as building single reliable disks, and where the notion of networking cheaper, slower CPUs proved a way out of that bottleneck.

THE WEB HITS THE WALL. 

I believe that with the Web we are now seeing the beginning of one of those uphill curves – there is no way that chip speed and storage density can keep up with the exploding user base, and this problem will not abate in the foreseeable future. Computers, individual computers, are now too small, slow and weak to handle the demand of a popular web site, and the current solution to the demands of user traffic – buy a bigger computer – simply postpones the day when that solution also fails.

What I can see the outlines of in current web site development is what might be called a ‘RAIS’ strategy – redundant arrays of inexpensive servers. Just as RAIDs accept the inadequacy of any individual disk, a RAIS would accept that servers crash when overloaded, and that when you are facing 10% more traffic than you can handle, having to buy a much bigger and more expensive server is a lousy solution. RAIS architecture comes much closer to the necessary level of granularity for dealing with network traffic increases.

If you were to host a Web site on 10 Linux boxes instead of one big commercial Unix server, you could react to a 10% increase in traffic with 10% more server for 10% more money. Furthermore, one server dying would only inconvenience the users who were mid-request on that particular box, and they could restart their work on one of the remaining servers immediately. Contrast this with the current norm, a 100% failure for the full duration of a restart in cases where a site is served by a single server. 

The initial RAISs are here in sites like C|NET and ESPN, where round-robin DNS configurations spread the load across multiple boxes. However, these solutions are just the beginning – their version of redundancy is often simply to mirror copies of the Web server. A true RAIS architecture will spread not only copies of the site but also functionality: images, a huge part of network traffic, are ‘read only’ – a server or group of servers optimized to handle only images could serve them from WORM drives and keep the most popular ones in RAM. Incoming CGI data, on the other hand, can potentially be ‘write only’, simply recorded to removable media to be imported into a database at a later date, on another computer, and so on.
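The round-robin scheme those sites use is simple enough to sketch in a few lines (server addresses here are invented, and real round-robin DNS rotates the order of returned records rather than running code like this, but the load-spreading effect is the same):

```python
from itertools import cycle

# A pool of identical, inexpensive servers behind one site name.
servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
rotation = cycle(servers)

def resolve(hostname):
    """Hand each successive lookup of the site to the next server in turn."""
    return next(rotation)

# Six requests land two apiece on each of the three boxes:
assignments = [resolve("www.example.com") for _ in range(6)]
```

Adding a fourth box to `servers` adds roughly a fourth more capacity for a fourth more money – the granularity the single-big-server model can’t offer.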

This kind of development will ultimately dissolve the notion of discrete net servers, and will lead to server networks, where an individual network address does not map to a physical computer but rather to a notional source of data. Requests to and from this IP address will actually be handled not by individual computers, whether singly or grouped into clusters of mirroring machines, but by a single-address network, a kind of ecosystem of networked processors, disks and other devices, each optimized for handling certain aspects of the request – database lookups, image serving, redirects, etc. Think of the part of the site that handles database requests as an organ, specialized to its particular task, rather than as a separate organism pressed into that particular service.

THE CHILD IS FATHER TO THE MAN 

It has long been observed that in the early days of the ARPANet, packet switching started out by piggy-backing on the circuit-switched network, only to overtake it in total traffic (which will happen this year) and, almost certainly, to subsume it completely within a decade. I believe a similar process is happening to computers themselves: the Internet is the first place where we can see that cumulative user need outstrips the power of individual computers, even taking Moore’s Law into account, but it will not be the last. In the early days, computers were turned into networks, with the cumulative power of the net rising with the number of computers added to it.

In a situation similar to the packet/circuit dichotomy, I believe that we are witnessing another such tipping point, where networks are brought into individual computers, where all computing resources, whether cycles, RAM, storage, whatever, are mediated by a network instead of being bundled into discrete boxes. This may have been the decade where the network was the computer, but in the next decade the computer will be the network, and so will everything else.