Playfulness in 3-D Spaces

First published in ACM, 03/98.

“Put the trodes on and [you] were out there, all the data in the world stacked up like a big neon city, so you could cruise around and have a kind of grip on it, visually anyway, because if you didn’t, it was too complicated, trying to find your way to a particular piece of data you needed.” 

Every six months or so, I find a reason to survey the state of VRML (Virtual Reality Modeling Language; more info at http://www.vrml.org/) and 3-D ‘shared worlds.’ I started these semi-annual forays in 1994, while researching a book, “Voices From the Net” (Ziff-Davis, 1994), about online communities. In that book, I concluded that the classic model of cyberspace, quoted above from William Gibson’s “Mona Lisa Overdrive”, was not possible in this century (a bit of a grandiose cheat, since the century amounted to but a half-dozen years by that time), if at all. 

Woe betide the prognosticator who puts a far-off date on computer revolutions — as the visual components of networking grew by leaps and bounds, it looked like my predictions might be off the mark. I kept hearing that VRML was at the edge of usability, and every now and again would go exploring to see for myself. The improvements from year to year were impressive, although they were due in part to the rising tide of clock speed, which lifts all the boats. However, they never seemed any closer to being something that someone would use for anything other than a “Now, instead of just looking up the info you want, you can actually fly through your company’s parts catalog!” trade show demo. It was always better than last time, but never enough of a jump to actually be, in Alan Kay’s famous words, “good enough to criticize.” 

The Streets of San Francisco

VRML applications are the videophones of the Net — something that is possible in theory, but which excites designers far more than it excites potential users. At one point, the height of VRML applications was a 3-D Yellow Pages for San Francisco, the one town whose 3-D shape was uniquely ill-suited to such an already questionable endeavor (“Now, instead of simply looking up your dry cleaner’s phone number, you can search for it over hill and dale!”). Thus, I concluded, my Nostradamus-like powers were in no danger of being upended by unkind reality. Any real example of 3-D space that could be shared by multiple users and didn’t involve an unacceptable trade-off between appearance and speed, any 3-D space that had any sort of visceral effect on the user, was still years away. 

There was only one problem with that conclusion: I was wrong. 

I have now found what I was looking for, and found it, to my chagrin, years late. A networked 3-D engine with excellent shared space capabilities, open source code, a well-published world-building specification, an active development community, and a project showing the kind of rapid improvement year after year that comes from the best massively parallel design efforts, such as the Web. 

It is called Quake. 

I know, I know. As they say on Usenet, “Quake? That’s so 5 minutes ago!” 

Quake (http://www.idsoftware.com), for those of you like me who don’t follow the PC gaming world, is a “First Person Action” game, meaning roughly that the screen is your view of the world, and that you pilot your digital self, or “avatar,” through that world by maneuvering through what you see on the screen. I missed out on Quake when it was launched in the mid-’90s because as a general all-around Serious Person, I don’t play games, in particular games whose ideal customer is less than half my age. I have now seen the error of my ways. Quake solves the shared 3-D space problem far better than any VRML implementation — not despite being a game, but because it is a game. 

The Quake 3-D engine is for something, while VRML is a technology in search of a use. The Quake engine’s original use is immaterial; once you have a world-modeling protocol that lets you wade through a subterranean stream, picking up shotgun shells and blowing up other people who are picking up rocket launchers and trying to blow you up, you’ve got the whole ball of wax: spaces, movement, objects, avatars, even a simple physics. It would be easier to turn that into a tool for shared corporate meeting spaces than it would be to build the same thing in VRML, because once a tool is good for one thing, it’s easy to see how to make it good for another thing. 

On The Cutting Edge

This is one of the first axioms of creativity in this realm: Solve something first, then apply that solution elsewhere. It doesn’t matter if the first stone was sharpened to kill animals or cut wood, because once you have the principle down, making a knife into a spear or a scimitar or a scalpel is just application-level development. I have always been skeptical of any real possibility of viscerally engaging 3-D spaces, but I believe that I see in Quake — and its subsequent imitators and descendants — the sharpened stone, the thing that indicates the beginning of a new way of using the network. 

Quake is the real thing. It’s a local solution to a local problem (how to make a 3-D space realistic enough so that you can accurately aim a double-barreled shotgun at a chainsaw-wielding ogre, while both of you are running) that has global ramifications. I have seen a lot of 3-D spaces over the years, and I have never run into anything that gives as palpable a sense of place as this does. (Before the objection is raised that Quake is software and VRML is a protocol, consider that Quake, like VRML, always runs in client-server mode and reads external data files, even locally, and that the Quake client is one-fifth the size of the benchmark VRML client.) 

After playing with Quake for only a few hours, I found myself in possession of an absolute and rigorous sense of the spaces it contained: There was a moment, while I was off thinking about something else altogether, when I realized that in one particularly sticky part of the game, I could jump off a ramp and swim through some water under a trap located in a hallway. This was as matter-of-fact a thought as if I’d realized I could bring a big package up the service elevator at my office; in other words, I had a large, accurate map of an entire space in my head, not from having studied it, but from having experienced it. I had derived, for the first time ever, a sense of space — not an ‘as if’ sense or a sense of a metaphorical cyberspace, but of real space — from a screen. 

Here’s why I think this matters: Though Quake isn’t necessarily the right solution for the 3-D shared space problem, it feels right in a way that nothing in the VRML world does. Once you’ve seen it in action, much of what’s wrong with VRML snaps into focus: Quake does something well instead of many things poorly. 

This matters more than I can say. Despite all the theoretical underpinnings we now use as a basis for our work as computing reaches the half-century mark and networking turns 30, world-shifting developments still often come from practical solutions to simple problems. 

The VRML community has failed to come up with anything this compelling, not despite the community’s best intentions but because of them. Every time VRML practitioners approach the problem of how to represent space on the screen, they have no focused reason to make any particular trade-off of detail versus rendering speed, or making objects versus making spaces, because VRML isn’t for anything except itself. Many times, having a particular, near-term problem to solve brings a project’s virtues into sharp focus, and gives it enough clarity to live on its own. Quake puts the tools in the hands of the users. 

id Software, the producer of Quake, started allowing its games to be edited by users, and published a world-building spec so that users could write their own editors as well. As with HTML, one can build a variety of 3-D editors that create a Quake world simply by writing to the spec. This has led to a riot of sophisticated tools for creating Quake “levels,” such as BSP and WorldCraft, that would make any VRML partisan drool, and it has spawned a community of user-engineers who are creating compelling objects, levels and sometimes whole alternate worlds at a pace and with an intensity that should make any VRML adherent faint. VRML is older than Quake, yet it has fewer tools and fewer users, and it still exists at the stage where the best examples of the technology are produced by the companies selling it, not the users using it. Quake, by contrast, has a simple model of compliance. 

The problem of representing space, surfaces, objects and worlds is almost infinitely vast. For some time there was a proposal in the AI community to develop a ‘naive physics,’ in which every aspect of the physical organization of the world would be worked out in a way that would let computers deduce things about their surroundings. For instance, the naive physics of liquids included all sorts of rules about flow and how it was affected by being bounded by floors, walls and ceilings. This kind of elaborate modeling has been inherited in spirit by VRML, which is so complex that rendering a simple piece of furniture can bog down an average desktop computer. 

Quake, on the other hand, adopts a breathtakingly simple attitude toward physics. Quake has shapes like blocks, wedges and columns, and it has materials like stone, metal and water. If you make a freestanding column of water, so be it. The column will appear in any space you create; you can walk into it, swim to the top, walk out at the top and then fall down again. It does this simply because a physics of liquids is too complex to solve right now, and because Quake does not try to save users from themselves by adding layers of interpretation outside the engine itself. If it compiles, it complies. 
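
To make that attitude concrete, here is a toy sketch in Python. It is purely illustrative, and not id’s actual map format, compiler or data structures: a brush is nothing more than a shape plus a material, and the only thing this toy “compiler” checks is geometric sanity, never physical plausibility.

```python
# Toy sketch only: not id's map format or tools, just the attitude it embodies.
# A brush is a shape plus a material; compilation checks geometry, not physics.
from dataclasses import dataclass

@dataclass
class Brush:
    shape: str     # "block", "wedge", "column", ...
    material: str  # "stone", "metal", "water", ...
    height: float
    radius: float

def compile_world(brushes):
    """Reject only degenerate geometry; never second-guess the designer."""
    for b in brushes:
        if b.height <= 0 or b.radius <= 0:
            raise ValueError(f"degenerate brush: {b}")
    return list(brushes)  # if it compiles, it complies

# A freestanding column of water: physically absurd, perfectly legal.
world = compile_world([Brush("column", "water", height=256.0, radius=64.0)])
print(f"compiled {len(world)} brush(es)")
```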

Hacking X for Y 

This willingness to allow users to do stupid things, on the assumption that they will learn more quickly if the software doesn’t try to second-guess them, has given Quake a development curve that looks like HTML in the early days, where a user’s first efforts were often lousy but the progress between lousy and good was not blocked by the software itself. Quake encourages experimentation and incrementalism in the tradition of the best tools out there. 

I acknowledge, however, that the Quake format does not map rigorously onto VRML; the comparison here is not architecture to architecture, but effect to effect. I am not seriously suggesting that Quake or one of its children like Quake II (http://www.idsoftware.com), Half-Life (http://www.valvesoftware.com) or Unreal (http://www.unreal.com) be used as-is as a platform from which to build other 3-D shared spaces (well, maybe I am suggesting that half-seriously). More important, I am suggesting that it is a problem that the 3-D game people and the VRML people aren’t talking to each other enough. Networking has come so far so fast partly because we have been willing to say, over and over again, “Well, we never thought of using X for Y, but now that you mention it, why not?” 

The Net is too young, and the 3-D space problem too tricky, to shoot for perfection right now. What worries me most about my belated discovery of Quake is that in years of looking at 3-D spaces, no one ever mentioned it in the same breath as VRML and I never thought of it myself, simply because of the stigma attached to things that are “merely” games. Many of us, myself very much included, have come to believe our own press about the seriousness of our current endeavors. We have been adversely affected by the commercial applications of the network, and have weaned ourselves off the playfulness that has always produced insights and breakthroughs. 

It will be a terrible loss if those of us doing this work start to take ourselves too seriously. Round the clock and round the world, there are hundreds of people chasing one another through rooms and tunnels and up and down staircases that don’t exist anywhere in the real world, a model of electronic space closer to Gibson’s ideal than anything else that exists out there, and we shouldn’t write those people off. Their game playing may look frivolous to us Serious Persons, but they know a lot about 3-D shared worlds, and it will be a terrible waste of everybody’s time if we don’t find a way to tap into that.

In Praise of Evolvable Systems

(First appeared in the ACM’s net_worker, 1996)

Why something as poorly designed as the Web became The Next Big Thing, and what that means for the future.

If it were April Fool’s Day, the Net’s only official holiday, and you wanted to design a ‘Novelty Protocol’ to slip by the Internet Engineering Task Force as a joke, it might look something like the Web:
  • The server would use neither a persistent connection nor a store-and-forward model, thus giving it all the worst features of both telnet and e-mail.
  • The server’s primary method of extensibility would require spawning external processes, thus ensuring both security risks and unpredictable load.
  • The server would have no built-in mechanism for gracefully apportioning resources, refusing or delaying heavy traffic, or load-balancing. It would, however, be relatively easy to crash.
  • Multiple files traveling together from one server to one client would each incur the entire overhead of a new session call (a sketch of what that costs appears just after this list).
  • The hypertext model would ignore all serious theoretical work on hypertext to date. In particular, all hypertext links would be one-directional, thus making it impossible to move or delete a piece of data without ensuring that some unknown number of pointers around the world would silently fail.
  • The tag set would be absurdly polluted and user-extensible, with no central coordination and no consistency in implementation. As a bonus, many elements would do double duty, functioning as both logical markup and visual layout.
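
Here is the promised sketch of the per-file cost. It is hedged in two ways: the host and file names are hypothetical, and the code is present-day Python rather than anything from the period. But the behavior it demonstrates is HTTP/1.0’s: the server closes the connection after every response, so a page built from several files pays the full connection cost for each one.

```python
# Sketch of the per-file cost described in the list above. Under HTTP/1.0 the
# server closes the connection after each response, so fetching three files
# from one host means three separate TCP connections.
import socket

HOST = "example.com"                      # hypothetical server
FILES = ["/", "/logo.gif", "/style.css"]  # hypothetical files making up one page

for path in FILES:
    # Each file pays for its own connection setup and teardown.
    with socket.create_connection((HOST, 80)) as sock:
        sock.sendall(f"GET {path} HTTP/1.0\r\nHost: {HOST}\r\n\r\n".encode("ascii"))
        response = b""
        while chunk := sock.recv(4096):   # server closes when it is done
            response += chunk
    print(f"{path}: {len(response)} bytes, connection closed")
```
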
HTTP and HTML are the Whoopee Cushion and Joy Buzzer of Internet protocols, only comprehensible as elaborate practical jokes. For anyone who has tried to accomplish anything serious on the Web, it’s pretty obvious that of the various implementations of a worldwide hypertext protocol, we have the worst one possible.

Except, of course, for all the others.

MAMMALS VS. DINOSAURS

The problem with that list of deficiencies is that it is also a list of necessities — the Web has flourished in a way that no other networking protocol has except e-mail, not despite many of these qualities but because of them. The very weaknesses that make the Web so infuriating to serious practitioners also make it possible in the first place. In fact, had the Web been a strong and well-designed entity from its inception, it would have gone nowhere. As it enters its adolescence, showing both flashes of maturity and infuriating unreliability, it is worth recalling what the network was like before the Web.

In the early ’90s, the Internet population was doubling annually, and the most serious work on new protocols was being done to solve the biggest problem of the day: the growth of available information resources at a rate that outstripped anyone’s ability to catalog or index them. The two big meta-indexing efforts of the time were Gopher, the University of Minnesota’s menu-driven system for browsing documents and ftp archives, and the heavy-hitter, Thinking Machines’ Wide Area Information Server (WAIS). Each of these protocols was strong: carefully thought-out, painstakingly implemented, self-consistent and centrally designed. Each had the backing of serious academic research, and each was rapidly gaining adherents.

The electronic world in other quarters was filled with similar visions of strong, well-designed protocols — CD-ROMs, interactive TV, online services. Like Gopher and WAIS, each of these had the backing of significant industry players, including computer manufacturers, media powerhouses and outside investors, as well as a growing user base that seemed to presage a future of different protocols for different functions, particularly when it came to multimedia.

These various protocols and services shared two important characteristics: Each was pursuing a design that was internally cohesive, and each operated in a kind of hermetically sealed environment where it interacted not at all with its neighbors. These characteristics are really flip sides of the same coin — the strong internal cohesion of their design contributed directly to their lack of interoperability. CompuServe and AOL, two of the top online services, couldn’t even share resources with one another, much less somehow interoperate with interactive TV or CD-ROMs.

THE STRENGTH OF WEAKNESS AND EVOLVABILITY

In other words, every contender for becoming an “industry standard” for handling information was too strong and too well-designed to succeed outside its own narrow confines. So how did the Web manage to damage and, in some cases, destroy those contenders for the title of The Next Big Thing? Weakness, coupled with an ability to improve exponentially.

The Web, in its earliest conception, was nothing more than a series of pointers. It grew not out of a desire to be an electronic encyclopedia so much as an electronic Post-it note. The idea of keeping pointers to ftp sites, Gopher indices, Veronica search engines and so forth all in one place doesn’t seem so remarkable now, but in fact it was the one thing missing from the growing welter of different protocols, each of which was too strong to interoperate well with the others.

Considered in this light, the Web’s poorer engineering qualities seem not merely desirable but essential. Although every strong theoretical model of hypertext calls for bi-directional links, in any heterogeneous system links have to be one-directional, because bi-directional links would require massive coordination in a way that would limit the system’s scope. Despite the obvious advantages of persistent connections in terms of state-tracking and lowering overhead, a server designed to connect to various types of network resources can’t require persistent connections, because that would limit the protocols the Web could point to. The server must accommodate external processes, or its extensibility would be limited to whatever the server’s designers could put into any given release, and so on.
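
A small sketch may help here. It is hypothetical (the URLs and the in-memory ‘web’ are invented for illustration), but it shows why one-directional links are the cheap, scalable choice: each page knows only where it points, nothing knows who points at it, and a moved or deleted target only announces itself as a dead link at click time.

```python
# Hypothetical illustration: a tiny one-directional 'web'. Each page lists its
# own outgoing links; there is no registry of inbound links to coordinate.
pages = {
    "http://a.example/essay.html": ["http://b.example/data.html"],
    "http://b.example/data.html": [],
}

def delete(url):
    # The target disappears; nothing exists that could notify its referrers.
    pages.pop(url, None)

def follow(url):
    # A dangling pointer is only discovered when someone follows it.
    return pages.get(url, "404 Not Found")

delete("http://b.example/data.html")
print(follow("http://b.example/data.html"))  # the link in essay.html silently failed
```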

Furthermore, the Web’s almost babyish SGML syntax, so far from any serious computational framework (Where are the conditionals? Why is the Document Type Definition so inconsistent? Why is the browsers’ enforcement of conformity so lax?), made it possible for anyone wanting a Web page to write one. The effects of this ease of implementation, as opposed to the difficulties of launching a Gopher index or making a CD-ROM, are twofold: a huge increase in truly pointless and stupid content soaking up bandwidth; and, as a direct result, a rush to find ways to compete with all the noise through the creation of interesting work. The quality of the best work on the Web today has not happened in spite of the mass of garbage out there, but in part because of it.

In the space of a few years, the Web took over indexing from Gopher, rendered CompuServe irrelevant, undermined CD-ROMs, and now seems poised to take on the features of interactive TV, not because of its initial excellence but because of its consistent evolvability. It’s easy for central planning to outperform weak but evolvable systems in the short run, but in the long run evolution always has the edge. The Web, jujitsu-like, initially took on the power of other network protocols by simply acting as pointers to them, and then slowly subsumed their functions.

Despite the Web’s ability to usurp the advantages of existing services, this is a story of inevitability, not of perfection. Yahoo and Lycos have taken over from Gopher and WAIS as our meta-indices, but the search engines themselves, as has been widely noted, are pretty lousy ways to find things. The problem that Gopher and WAIS set out to solve has not only not been solved by the Web, it has been made worse. Furthermore, this kind of problem is intractable because of the nature of evolvable systems.

THREE RULES FOR EVOLVABLE SYSTEMS

Evolvable systems — those that proceed not under the sole direction of one centralized design authority but by being adapted and extended in a thousand small ways in a thousand places at once — have three main characteristics that are germane to their eventual victories over strong, centrally designed protocols.

  • Only solutions that produce partial results when partially implemented can succeed. The network is littered with ideas that would have worked had everybody adopted them. Evolvable systems begin partially working right away and then grow, rather than needing to be perfected and frozen. Think VMS vs. Unix, cc:Mail vs. RFC-822, Token Ring vs. Ethernet.
  • What is, is wrong. Because evolvable systems have always been adapted to earlier conditions and are always being further adapted to present conditions, they are always behind the times. No evolving protocol is ever perfectly in sync with the challenges it faces.
  • Finally, Orgel’s Rule, named for the chemist Leslie Orgel: “Evolution is cleverer than you are.” As with the list of the Web’s obvious deficiencies above, it is easy to point out what is wrong with any evolvable system at any point in its life. No one seeing Lotus Notes and the NCSA server side-by-side in 1994 could doubt that Lotus had the superior technology; ditto ActiveX vs. Java or Marimba vs. HTTP. However, the ability to understand what is missing at any given moment does not mean that one person or a small central group can design a better system in the long haul.

Centrally designed protocols start out strong and improve logarithmically. Evolvable protocols start out weak and improve exponentially. It’s dinosaurs vs. mammals, and the mammals win every time. The Web is not the perfect hypertext protocol, just the best one that’s also currently practical. Infrastructure built on evolvable protocols will always be partially incomplete, partially wrong and ultimately better designed than its competition.

LESSONS FOR THE FUTURE

And the Web is just a dress rehearsal. In the next five years, three enormous media — telephone, television and movies — are migrating to digital formats: Voice Over IP, High-Definition TV and Digital Video Disc, respectively. As with the Internet of the early ’90s, there is little coordination between these efforts, and a great deal of effort on the part of some of the companies involved to intentionally build in incompatibilities to maintain a cartel-like ability to avoid competition, such as DVD’s mutually incompatible standards for different continents.

And, like the early ’90s, there isn’t going to be any strong meta-protocol that pushes Voice Over IP, HDTV and DVD together. Instead, there will almost certainly be some weak ‘glue’ or ‘scaffold’ protocol, perhaps SMIL (Synchronized Multimedia Integration Language) or another XML variant, to allow anyone to put multimedia elements together and synch them up without asking anyone else’s permission. Think of a Web page with South Park in one window and a chat session in another, or The Horse Whisperer running on top with a simultaneous translation into Serbo-Croatian underneath, or clickable pictures of merchandise integrated with a salesperson using a Voice Over IP connection, ready to offer explanations or take orders.

In those cases, the creator of such a page hasn’t really done anything ‘new’, as all the contents of those pages exist as separate protocols. As with the early Web, the ‘glue’ protocol subsumes the other protocols and produces a kind of weak integration, but weak integration is better than no integration at all, and it is far easier to move from weak integration to strong integration than from none to some. In five years, DVD, HDTV, Voice Over IP and Java will all be able to interoperate because of some new set of protocols which, like HTTP and HTML, will be weak, relatively uncoordinated, imperfectly implemented and, in the end, invincible.