Communities, Audiences, and Scale

April 6, 2002

Prior to the internet, the difference in communication between community and audience was largely enforced by media — telephones were good for one-to-one conversations but bad for reaching large numbers quickly, while TV had the inverse set of characteristics. The internet bridged that divide by providing a single medium that could be used to address either communities or audiences. Email can be used for conversation or broadcast, usenet newsgroups can support either group conversation or the broadcast of common documents, and so on. Most recently, the rise of software for “The Writable Web”, principally weblogs, is adding two-way features to the Web’s largely one-way publishing model.

With such software, the obvious question is “Can we get the best of both worlds? Can we have a medium that spreads messages to a large audience, but also allows all the members of that audience to engage with one another like a single community?” The answer seems to be “No.”

Communities are different than audiences in fundamental human ways, not merely technological ones. You cannot simply transform an audience into a community with technology, because they assume very different relationships between the sender and receiver of messages.

Though both are held together in some way by communication, an audience is typified by a one-way relationship between sender and receiver, and by the disconnection of its members from one another — a one-to-many pattern. In a community, by contrast, people typically send and receive messages, and the members of a community are connected to one another, not just to some central outlet — a many-to-many pattern [1]. The extreme positions for the two patterns might be visualized as a broadcast star where all the interaction is one-way from center to edge, vs. a ring where everyone is directly connected to everyone else without requiring a central hub.

As a result of these differences, communities have strong upper limits on size, while audiences can grow arbitrarily large. Put another way, the larger a group held together by communication grows, the more it must become like an audience — largely disconnected and held together by communication traveling from center to edge — because increasing the number of people in a group weakens communal connection. 

The characteristics we associate with mass media are as much a product of the mass as the media. Because growth in group size alone is enough to turn a community into an audience, social software, no matter what its design, will never be able to create a group that is both large and densely interconnected. 

Community Topology

This barrier to the growth of a single community is caused by the collision of social limits with the math of large groups: As group size grows, the number of connections required between people in the group exceeds human capacity to make or keep track of them all.

A community’s members are interconnected, and a community in its extreme position is a “complete” network, where every connection that can be made is made. (Bob knows Carol, Ted, and Alice; Carol knows Bob, Ted, and Alice; and so on.) Dense interconnection is obviously the source of a community’s value, but it also increases the effort that must be expended as the group grows. You can’t join a community without entering into some sort of mutual relationship with at least some of its members, but because more members require more connections, these coordination costs increase with group size.

For a new member to connect to an existing group in a complete fashion requires as many new connections as there are group members, so joining a community that has 5 members is much simpler than joining a community that has 50 members. Furthermore, this tradeoff between size and the ease of adding new members exists even if the group is not completely interconnected; maintaining any given density of connectedness becomes much harder as group size grows. Each new member either creates more effort, or lowers the density of connectedness, or both, thus jeopardizing the interconnection that makes for community. [2]

As group size grows past any individual’s ability to maintain connections to all members of a group, the density shrinks, and as the group grows very large (>10,000) the number of actual connections drops to less than 1% of the potential connections, even if each member of the group knows dozens of other members: in a group of 10,000 where each member knows 50 others, the quarter-million or so actual relationships amount to roughly half a percent of the nearly 50 million possible ones. Thus growth in size alone is enough to alter the fabric of connection that makes a community work. (Anyone who has seen a discussion group or mailing list grow quickly is familiar with this phenomenon.)

An audience, by contrast, has a very sparse set of connections, and requires no mutuality between members. Thus an audience has no coordination costs associated with growth, because each new member of an audience creates only a single one-way connection. You need to know Yahoo’s address to join the Yahoo audience, but neither Yahoo nor any of its other users need to know anything about you. It is this disconnected quality that makes it possible for an audience to grow much (much) larger than a connected community can, because an audience can always exist at the minimum number of required connections (N connections for N users).

The Emergence of Audiences in Two-way Media

Prior to the internet, the outbound quality of mass media could be ascribed to technical limits — TV had a one-way relationship to its audience because TV was a one-way medium. The growth of two-way media, however, shows that the audience pattern re-establishes itself in one way or another — large mailing lists become read-only, online communities (e.g. LambdaMOO, the WELL, ECHO) eventually see their members agitate to stem the tide of newcomers, users of sites like slashdot see fewer of their posts accepted. [3]

If real group engagement is limited to groups numbering in the hundreds or even the thousands [4], then the asymmetry and disconnection that characterizes an audience will automatically appear as a group of people grows in size, as many-to-many becomes few-to-many and most of the communication passes from center to edge, not edge to center or edge to edge. Furthermore, the larger the group, the more significant this asymmetry and disconnection will become: any mailing list or weblog with 10,000 readers will be very sparsely connected, no matter how it is organized. (This sparse organization of the larger group can of course encompass smaller, more densely clustered communities.)

More Is Different

Meanwhile, there are 500 million people on the net, and the population is still growing. Anyone who wants to reach even ten thousand of those people will not know most of them, nor will most of them know one another. The community model is good for spreading messages through a relatively small and tight-knit group, but bad for reaching a large and dispersed group, because the tradeoff between size and connectedness dampens message spread well below the numbers that can be addressed as an audience.

It’s significant that the only two examples we have of truly massive community spread of messages on the internet — email hoaxes and Outlook viruses — rely on overriding users’ disinclination to forward widely, whether by a social or a technological trick. When something like All Your Base or OddTodd bursts on the scene, the moment of its arrival comes not when it spreads laterally from community to community, but when that lateral spread attracts the attention of a media outlet [5].

No matter what the technology, large groups are different than small groups, because they create social pressures against community organization that can’t be trivially overcome. This is a pattern we have seen often, with mailing lists, BBSes, MUDs, usenet, and most recently with weblogs, the majority of which reach small and tightly knit groups, while a handful reach audiences numbering in the tens or even hundreds of thousands (e.g. andrewsullivan.com).

The inability of a single engaged community to grow past a certain size, irrespective of the technology, means that over time we will see a separation between media outlets that embrace the community model and stay small, and those that adopt the publishing model in order to accommodate growth. This is not to say that all media that address ten thousand or more people at once are identical; having a Letters to the Editor column changes a newspaper’s relationship to its audience, even though most readers never write, most letters don’t get published, and most readers don’t read every letter.

Though it is tempting to think that we can somehow do away with the effects of mass media with new technology, the difficulty of reaching millions or even tens of thousands of people one community at a time is as much about human wiring as it is about network wiring. No matter how community-minded a media outlet is, needing to reach a large group of people creates asymmetry and disconnection among that group — turns them into an audience, in other words — and there is no easy technological fix for that problem.

Like the leavening effects of Letters to the Editor, one of the design challenges for social software is in allowing groups to grow past the limitations of a single, densely interconnected community while preserving some possibility of shared purpose or participation, even though most members of that group will never actually interact with one another.


Footnotes

1. Defining community as a communicating group risks circularity by ignoring other, more passive uses of the term, as with “the community of retirees.” Though there are several valid definitions of community that point to shared but latent characteristics, there is really no other word that describes a group of people actively engaged in some shared conversation or task, and infelicitous turns of phrase like ‘engaged communicative group’ are more narrowly accurate, but fail to capture the communal feeling that arises out of such engagement. For this analysis, ‘community’ is used as a term of art to refer to groups whose members actively communicate with one another. [Return]

2. The total number of possible connections in a group grows quadratically, because each member of a group must connect to every other member. In general, therefore, a group with N members has N x (N-1) connections, which is the same as N² – N. If Carol and Ted knowing one another counts as a single relationship, there are half as many relationships as connections, so the relevant number is (N² – N)/2.

Because these numbers grow quadratically, every 10-fold increase in group size creates a 100-fold increase in possible connections; a group of ten has about a hundred possible connections (and half as many two-way relationships), a group of a hundred has about ten thousand connections, a thousand has about a million, and so on. The number of potential connections in a group passes a billion as group size grows past thirty thousand. [Return]
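To make the arithmetic concrete, here is a small sketch in Java that computes these figures; the code and the choice of group sizes are mine, added for illustration rather than taken from the original footnote.

```java
// Connections and relationships for a group of n members, using the
// formulas above: n * (n - 1) one-way connections, and half that many
// mutual relationships. The group sizes echo the examples in the footnote.
public class GroupMath {
    public static void main(String[] args) {
        long[] sizes = {10, 100, 1000, 10000, 30000};
        for (long n : sizes) {
            long connections = n * (n - 1);        // each member linked to every other
            long relationships = connections / 2;  // Bob-Carol counted once, not twice
            System.out.println("group of " + n + ": " + connections
                    + " connections, " + relationships + " relationships");
        }
    }
}
```

Running it shows the quadratic curve described above: a group of ten has 90 connections, a group of ten thousand has roughly a hundred million, and the count approaches a billion as the group nears thirty thousand members.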

3. Slashdot is suffering from one of the common effects of community growth — the uprising of users objecting to the control the editors exert over the site. Much of the commentary on this issue, both at slashdot and on similar sites such as kuro5hin, revolves around twin themes: an acknowledgment that the owners and operators of slashdot can do whatever they like with the site, coupled with a surprisingly emotional sense of betrayal that community control, in the form of moderation, has been overridden.

(More at kuro5hin and slashdot.) [Return]

4. In Grooming, Gossip, and the Evolution of Language (ISBN 0674363361), the primatologist Robin Dunbar argues that humans are adapted for social group sizes of around 150 or less, a size that shows up in a number of traditional societies, as well as in present-day groups such as the Hutterite religious communities. Dunbar argues that the human brain is optimized for keeping track of social relationships in groups smaller than 150, but not larger. [Return]

5. In The Tipping Point (ISBN 0316346624), Malcolm Gladwell detailed the surprising spread of Hush Puppies shoes in the mid-’90s, from their adoption by a group of cool kids in the East Village to a national phenomenon. The breakout moment came when Hush Puppies were adopted by fashion designers, with one designer going so far as to place a 25-foot inflatable Hush Puppy mascot on the roof of his boutique in LA. The cool kids got the attention of the fashion designers, but it was the fashion designers who got the attention of the world, by taking Hush Puppies beyond the communities in which they started and spreading them outwards to an audience that looked to the designers. [Return]

The Java Renaissance

06/12/2001

Java, the programming language created by Sun Microsystems to run on any operating system, was supposed to make it possible to write a program once and post it online for PC users to download and run instantly. Java was supposed to mean computer users wouldn’t have to choose between the Macintosh and Microsoft versions of a program, and upgrading would be as simple as a mouse click. The idea, called “write once, run anywhere,” was a promise Java has not lived up to.

Java never ran as smoothly on PCs as Microsoft-haters hoped. Buggy versions of the Java engine in Netscape and Microsoft’s Internet Explorer, the difficulty of writing a good user interface in Java, and Microsoft’s efforts to deflect the threat of platform-independent software all contributed. Consequently, only a limited number of PC programs were written in Java. The current wisdom: Java is a great language for application and database servers, where it’s terrific at integrating functions across several different computers, but it’s dead on the desktop.

Which makes the current renaissance of Java programming for the PC all the more surprising.

A number of peer-to-peer companies, such as Roku Technologies (file synching and sharing), Parabon Computation (distributed computing), and OpenCola (content searching, bandwidth optimization), are writing applications in Java. These are young companies, and it is not clear whether they will be able to overcome either Java’s earlier limitations on the PC or Microsoft’s inevitable resistance. But their willingness to try tells us much about software engineering, and about the PC’s place in the computing ecology.

The most obvious explanation for this renaissance is the growing quality of Java itself. Sun made a big bet on Java and stuck with it even when Java failed to live up to its advance billing. The current implementation, Java 1.3, is a huge step in maturity for the language, and third parties such as IBM are making Java faster and more reliable.

This is not to say that all of Java’s weaknesses have been overcome. Writing an interface in Java is still a wretched experience. Many programmers simply bypass Java and write interfaces in HTML, a maneuver that allows them to change the interface without altering the underlying engineering.

Java is mainly returning to the PC, though, because the PC itself is becoming a server. The companies coding in Java are all creating distributed, network-aware applications, and Java’s value as a server language makes it an obvious choice for the PC’s new role. Java is unparalleled as a language for distributed applications because it was built around Internet protocols, rather than bolting them on, and is more secure than firewalls alone when a networked machine needs to access remote resources or “share” resources remotely.
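As a small illustration of what “built around Internet protocols” means in practice, here is a sketch of a network fetch written with nothing but Java’s core libraries; the URL is a placeholder, and the example is mine rather than anything drawn from the companies mentioned above.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// Fetch a page over HTTP using only the standard java.net and java.io
// packages: no third-party networking code required.
public class Fetch {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.example.com/"); // placeholder address
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```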

For all its problems, Java is still the leader in cross-device interoperability, running on everything from servers to cell phones and set-tops. If a programmer wants to write code to run on multiple devices, the only other choice on the horizon is Microsoft’s promised .NET architecture, which is still a long way off.

It’s too early to handicap the success of Java for PC-as-server applications. Microsoft could stop distributing Java with Internet Explorer, cross-device code may turn out to be less important than cross-device data formats, and the improvements in Java’s speed and stability may not be enough to please users.

Nevertheless, the return of Java is more evidence that the difference between client and server is increasingly blurry. You can get server-class hardware under your desk for $1,000 and high-speed access for 50 bucks a month, and as Napster and SETI@home have shown, users will eagerly sign up for services that put those capabilities to use.

Furthermore, all applications are now network applications; Microsoft is even rewriting Office to be network aware via the .NET initiative. In this environment, anyone who can offer ways to write distributed applications that can operate over the network while remaining secure will earn the respect of the developer community.

It’s not clear whether Java will finally fulfill its promise. But its surprising return to the PC shows that developers are hungry for a language that helps them deal with the opportunities and problems the Internet is creating. For all its faults, Java is still the best attempt at creating a cross-platform framework, and the success or failure of these young companies will tell us a lot about the future of software in our increasingly networked world.

Enter the Decentralized Zone

Digital security is a trade-off. If securing digital data were the only concern a business had, users would have no control over their own computing environment at all: the Web would be forbidden territory; every disk drive would be welded shut. That doesn’t happen, of course, because workers also need the flexibility to communicate with one another and with the outside world. The current compromise between security and flexibility is a sort of intranet-plus-firewall sandbox, where the IT department sets the security policies that workers live within. This allows workers a measure of freedom and flexibility while giving their companies heightened security.

That was the idea, anyway. In practice, the sandbox model is broken. Some of the problem is technological, of course, but most of the problem is human. The model is broken because the IT department isn’t rewarded for helping workers do new things, but for keeping existing things from breaking. Workers who want to do new things are slowly taking control of networking, and this movement toward decentralized control cannot be reversed.

The most obvious evidence of the gap between the workers’ view of the world and the IT department’s is in the proliferation of email viruses. When faced with the I Love You virus and its cousins, the information technology department lectures users against opening attachments. Making such an absurd suggestion only underlines how out of touch the IT group is: if you’re not going to open attachments, you may as well not show up for work. Email viruses are plaguing the workplace because users must open attachments to get their jobs done; the IT department has not given them another way to exchange files. For all the talk of intranets and extranets, the only simple, general-purpose tool for moving files between users, especially users outside the corporation, is email. Faced with an IT department that thinks not opening attachments is a reasonable option, end users have done the only sensible thing: ignore the IT department.

Email was just the beginning. The Web has created an ever-widening hole in the sandbox. Once firewalls were opened up to the Web, other kinds of services like streaming media began arriving through the same hole, called port 80. Now that workers have won access to the Web through port 80, it has become the front door to a whole host of services, including file sharing.

And now there’s ICQ. At least the IT folks knew the Web was coming; in many cases, they even installed the browsers themselves. ICQ (and its instant messaging brethren) is something else entirely: the first widely adopted piece of business software that no CTO evaluated and no administrator installed. Any worker who would ever have gone to the boss and asked for something that allowed them to trade real-time messages with anyone on the Net would have been turned down flat. So they didn’t ask, they just did it, and now it can’t be undone. Shutting off instant messaging is not an option.

The flood is coming. And those three holes (email for file transfer, port 80 drilled through the firewall, and business applications that workers can download and install themselves) are still only cracks in the dike. The real flood is coming, with companies such as Groove Networks, Roku Technologies, and Aimster lining up to offer workers groupware solutions that don’t require centralized servers, and don’t make users ask the IT department for either help or permission to set them up.
The IT workers of any organization larger than 50 people are now in an impossible situation: they are rewarded for negative events (no crashes, no breaches) even as the workers around them inexorably erode their ability to build or manage a corporate sandbox. The obvious parallel here is with the PC itself; 20 years ago, the mainframe guys laughed at the toy computers workers were bringing into the workplace, because they knew that computation was too complex to be handled by anyone other than a centralized group of trained professionals. Today, we take it for granted that workers can manage their own computers. But we still regard network access and configuration as something that needs to be centrally managed by trained professionals, even as workers take network configuration under their control.

There is no one right answer; digital security is a trade-off. But no solution that requires centralized control over what network users do will succeed. It’s too early to know what the new compromise between security and flexibility will look like, but it’s not too early to know that the old compromise is over.

Hailstorm: Open Web Services Controlled by Microsoft

First published on O’Reilly’s OpenP2P on May 30, 2001.

So many ideas and so many technologies are swirling around P2P — decentralization, distributed computing, web services, JXTA, UDDI, SOAP — that it’s getting hard to tell whether something is or isn’t P2P, and it’s unclear that there is much point in trying to do so just for the sake of a label.

What there is some point in doing is evaluating new technologies to see how they fit into or depart from the traditional client-server model of computing, especially as exemplified in recent years by the browser-and-web-server model. In this category, Microsoft’s HailStorm is an audacious, if presently ill-defined, entrant. Rather than subject HailStorm to some sort of P2P litmus test, it is more illuminating to examine where it embraces the centralization of the client-server model and where it departs by decentralizing functions to devices at the network’s edge.

The design and implementation of HailStorm is still in flux, but the tension that exists within HailStorm between centralization and decentralization is already quite vivid.

Background

HailStorm, which launched in March with a public announcement and a white paper, is Microsoft’s bid to put some meat on the bones of its .NET initiative. It is a set of Web services whose data is contained in a set of XML documents, and which is accessed from various clients (or “HailStorm endpoints”) via SOAP (the Simple Object Access Protocol). These services are organized around user identity, and will include standard functions such as myAddress (electronic and geographic address for an identity), myProfile (name, nickname, special dates, picture), myCalendar, myWallet, and so on.
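To give a sense of what “accessed via SOAP” looks like from an endpoint’s side, here is a rough sketch in Java using only the standard library. The endpoint URL, the envelope contents, and the credential header are hypothetical placeholders; Microsoft had not published HailStorm’s actual wire format, so this illustrates the general SOAP pattern rather than HailStorm itself.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical sketch: POSTing a SOAP 1.1 envelope to an imagined
// "myAddress" service. The URL, the XML vocabulary, and the credential
// header are placeholders, not HailStorm's real schema.
public class SoapSketch {
    public static void main(String[] args) throws Exception {
        String envelope =
            "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">"
            + "<soap:Header><ticket>PLACEHOLDER-PASSPORT-TICKET</ticket></soap:Header>"
            + "<soap:Body><queryRequest service=\"myAddress\"/></soap:Body>"
            + "</soap:Envelope>";

        URL url = new URL("https://hailstorm.example.net/myAddress"); // placeholder endpoint
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
        conn.setRequestProperty("SOAPAction", "\"queryRequest\""); // SOAP 1.1 convention
        try (OutputStream out = conn.getOutputStream()) {
            out.write(envelope.getBytes("UTF-8"));
        }
        System.out.println("HTTP status: " + conn.getResponseCode());
    }
}
```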

HailStorm can best be thought of as an attempt to revisit the original MS-DOS strategy: Microsoft writes and owns the basic framework, and third-party developers write applications to run on top of that framework.

Three critical things differentiate the networked version of this strategy, as exemplified by HailStorm, from the earlier MS-DOS strategy:

  • First, the Internet has gone mainstream. This means that Microsoft can exploit both looser and tighter coupling within HailStorm — looser in that applications can have different parts existing on different clients and servers anywhere in the world; tighter because all software can phone home to Microsoft to authenticate users and transactions in real time.
  • Second, Microsoft has come to the conclusion that its monopoly on PC operating systems is not going to be quickly transferable to other kinds of devices (such as PDAs and servers); for the next few years at least, any truly ubiquitous software will have to run on non-MS devices. This conclusion is reflected in HailStorm’s embrace of SOAP and XML, allowing HailStorm to be accessed from any minimally connected device.
  • Third, the world has shifted from “software as product” to “software as service,” where software can be accessed remotely and paid for in per-use or per-time-period licenses. HailStorm asks both developers and users to pay for access to HailStorm, though the nature and size of these fees are far from worked out.

Authentication-Centric

The key to shifting from a machine-centric application model to a distributed computing model is to shift the central unit away from the computer and towards the user. In a machine-centric system, the software license was the core attribute — a software license meant a certain piece of software could be legally run on a certain machine. Without such a license, that software could not be installed or run, or could only be installed and run illegally.

In a distributed model, it is the user and not the hardware that needs to be validated, so user authentication becomes the core attribute — not “Is this software licensed to run on this machine?” but “Is this software licensed to run for this user?” To accomplish this requires a system that first validates users, and then maintains a list of attributes in order to determine what they are and are not allowed to do within the system.
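A toy sketch may make the contrast clearer; the class and method names below are invented for illustration and have nothing to do with HailStorm’s actual interfaces.

```java
import java.util.Map;
import java.util.Set;

// Toy contrast between the two models described above. In the
// machine-centric model the check is "is this machine licensed?";
// in the user-centric model it is "does this authenticated user
// carry the right attribute?" All names here are invented.
public class LicenseModels {
    // Machine-centric: rights are bound to a machine identifier.
    static boolean machineMayRun(Set<String> licensedMachines, String machineId) {
        return licensedMachines.contains(machineId);
    }

    // User-centric: an authenticated identity carries attributes that
    // say what the user may do, regardless of which device is in use.
    static boolean userMayRun(Map<String, Set<String>> userAttributes,
                              String userId, String requiredAttribute) {
        Set<String> attrs = userAttributes.get(userId);
        return attrs != null && attrs.contains(requiredAttribute);
    }
}
```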

HailStorm is thus authentication-centric, and is organized around Passport. HailStorm is designed to create a common set of services which can be accessed globally by authenticated users, and to this end it provides common definitions for:

  • Identity
  • Security
  • Definitions and Descriptions

or as Microsoft puts it:

From a technical perspective, HailStorm is based on Microsoft Passport as the basic user credential. The HailStorm architecture defines identity, security, and data models that are common to all HailStorm services and ensure consistency of development and operation.

Decentralization

The decentralized portion of HailStorm is a remarkable departure for Microsoft: they have made accessing HailStorm services on non-Microsoft clients a core part of the proposition. As the white paper puts it:

The HailStorm platform uses an open access model, which means it can be used with any device, application or services, regardless of the underlying platform, operating system, object model, programming language or network provider. All HailStorm services are XML Web services accessed via SOAP; no Microsoft runtime or tool is required to call them.

To underscore the point at the press conference, they demonstrated HailStorm services running on a Palm, a Macintosh, and a Linux box.

While Microsoft stresses the wide support for HailStorm clients, the relationship of HailStorm to the Web’s servers is less clear. In the presentation, they suggested that servers running non-Microsoft operating systems like Linux or Solaris can nevertheless “participate” in HailStorm, though they didn’t spell out how that participation would be defined.

This decentralization of the client is designed to allow HailStorm applications to spread as quickly as possible. Despite their monopoly in desktop operating systems, Microsoft does not have a majority market share for any of the universe of non-PC devices — PDAs, set-tops, pagers, game consoles, cell phones. This is not to say that they don’t have some notable successes — NT has over a third of the server market, the iPaq running the PocketPC operating system is becoming increasingly popular, and the XBox has captured the interest of the gaming community. Nevertheless, hardware upgrade cycles are long, so there is no way Microsoft can achieve market dominance in these categories as quickly as it did on the desktop.

Enter HailStorm. HailStorm offers a way for Microsoft to sell software and services on devices that aren’t using Microsoft operating systems. This is a big change — Microsoft typically links its software and operating systems (SQL Server won’t run outside an MS environment; Office is only ported to the Mac). By tying HailStorm to SOAP and XML rather than specific client environments, Microsoft says it is giving up its ability to control (or even predict) what software, running on which kinds of devices, will be accessing HailStorm services.

The embrace of SOAP is particularly significant, as it seems to put HailStorm out of reach of many of Microsoft’s other business battles — vs. Java, vs. Linux, vs. PalmOS, and so on — because, according to Microsoft, any device using SOAP will be able to participate in HailStorm without prejudice; “no Microsoft runtime or tool” will be required. The full effect of this client-insensitivity, however, will be determined by how much Microsoft alters Kerberos or SOAP in ways that limit or prevent other companies from writing HailStorm-compliant applications.

HailStorm is Microsoft’s most serious attempt to date to move from competing on unit sales to selling software as a service, and the announced intention to allow any sort of client to access HailStorm represents a remarkable decentralization for Microsoft.

It is not, however, a total decentralization by any means. In decentralizing their control over the client, Microsoft seeks to gain control over a much larger set of functions, for a much larger group of devices, than they have now. The functions that HailStorm centralizes are in many ways more significant than the functions it decentralizes.

Centralization

In the press surrounding HailStorm, Microsoft refers to its “massively distributed” nature, its “user-centric” model, and even makes reference to its tracking of user presence as “peer-to-peer.” Despite this rhetoric, however, HailStorm as described is a mega-service, and may be the largest client-server installation ever conceived.

Microsoft addressed the requirements for running such a mega-service, saying:

Reliability will be critical to the success of the HailStorm services, and good operations are a core competency required to ensure that reliability. […] Microsoft is also making significant operational investments to provide the level of service and reliability that will be required for HailStorm services. These investments include such things as physically redundant data centers and common best practices across services.

This kind of server installation is necessary for HailStorm, because Microsoft’s ambitions for this service are large: they would like to create the world’s largest address registry, not only of machines but of people as well. In particular, they would like to host the identity of every person on the Internet, and mediate every transaction in the consumer economy. They will fail at such vast goals of course, but succeeding at even a small subset of such large ambitions would be a huge victory.

Because they have decentralized their support of the client, they must necessarily make large parts of HailStorm open, but always with a caveat: while HailStorm is open for developers to use, it is not open for developers to build on or revise. Microsoft calls this an “Open Access” model — you can access it freely, but not alter it freely.

This does not mean that HailStorm cannot be updated or revised by the developer community; it simply means that any changes made to HailStorm must be approved by Microsoft, a procedure they call “Open Process Extensibility.” This process is not defined within the white paper, though it seems to mean revising and validating proposals from HailStorm developers, which is to say, developers who have paid to participate in HailStorm.

With HailStorm, Microsoft is shifting from a strategy of controlling software to controlling transactions. Instead of selling units of licensed software, HailStorm will allow them to offer services to other developers, even those working on non-Microsoft platforms, while owning the intellectual property which underlies the authentications and transactions, a kind of “describe and defend” strategy.

“Describe and defend” is a move away from “software as unit” to “software as service,” and means that their control of the HailStorm universe will rely less on software licenses and more on patented or copyrighted methods, procedures, and database schema.

While decentralizing client-code, Microsoft centralizes the three core aspects of the service:

  • Identity (using Passport)
  • Security (using Kerberos)
  • Definitions and Descriptions (using HailStorm’s globally standardized schema)

Identity: The goal with Passport is simple — ubiquity. As Bill Gates put it at the press conference: “So it’s our goal to have virtually everybody who uses the Internet to have one of these Passport connections.”

HailStorm provides a set of globally useful services which, because they are authentication-centric, require all users to participate in its Passport program. This allows Microsoft to be a gatekeeper at the level of individual participation — an Internet user without a Passport will not exist within the system, and will not be able to access or use HailStorm services. Because users pay to participate in the HailStorm system, in practice this means that Microsoft will control a user’s identity, leasing it to them for use within HailStorm for a recurring fee.

It’s not clear how open the Passport system will be. Microsoft has a history of launching web initiatives with restrictive conditions, and then dropping the restrictions that limit growth: the original deployment of Passport required users to get a Hotmail account, a restriction that was later dropped when this adversely affected the potential size of the Passport program. You can now get a Passport with any email address, and since an email address is guaranteed to be globally unique, any issuer of email addresses is also issuing potentially valid Passport addresses.

The metaphor of a passport suggests that several different entities agree to both issue and honor passports, as national governments presently do with real passports. There are several entities who have issued email addresses to millions or tens of millions of users — AOL, Yahoo, ATT, British Telecom, et al. Microsoft has not spelled out how or whether these entities will be allowed to participate in HailStorm, but it appears that all issuing and validation of Passports will be centralized under Microsoft’s control.

Security: Authentication of a HailStorm user is provided via Kerberos, a secure method developed at MIT for authenticating a request for a service in a computer network. Last year, Microsoft added its own proprietary extension to Kerberos, which creates potential incompatibilities between clients running non-Microsoft versions of Kerberos and servers running Microsoft’s versions.

Microsoft has published the details of its version of Kerberos, but it is not clear if interoperability with the Microsoft version of Kerberos is required to participate in HailStorm, or if there are any licensing restrictions for developers who want to write SOAP clients that use Kerberos to access HailStorm services.

Definitions and Descriptions: This is the most audacious aspect of HailStorm, and the core of the describe-and-defend strategy. Microsoft wants to create a schema which describes all possible user transactions, and then copyright that schema, in order to create and manage the ontology of life on the Internet. In HailStorm as it was described, all entities, methods, and transactions will be defined and mediated by Microsoft or Microsoft-licensed developers, with Microsoft acting as a kind of arbiter of descriptions of electronic reality:

The initial release of HailStorm provides a basic set of possible services users and developers might need. Beyond that, new services (for example, myPhotos or myPortfolio) and extensions will be defined via the Microsoft Open Process with developer community involvement. There will be a single schema for each area to avoid conflicts that are detrimental to users (like having both myTV and myFavoriteTVShows) and to ensure a consistent architectural approach around attributes like security model and data manipulation. Microsoft’s involvement in HailStorm extensions will be based on our expertise in a given area.

The business difficulties with such a system are obvious. Will the airline industry help define myFrequentFlierMiles, copyright Microsoft, when Microsoft also runs the Expedia travel service? Will the automotive industry sign up to help the owner of CarPoint develop myDealerRebate?

Less obvious but potentially more dangerous are the engineering risks in a single, global schema, because there are significant areas where developers might legitimately disagree about how resources should be arranged. Should business users record the corporate credit card as a part of myWallet, alongside their personal credit card, or as part of myBusinessPayments, alongside their EDI and purchase order information? Should a family’s individual myCalendars be a subset of ourCalendar, or should they be synched manually? Is it really so obvious that there is no useful distinction between myTV (the box, through which you might also access DVDs and even WebTV) and myFavoriteTVShows (the list of programs to be piped to the TiVo)?

Microsoft proposes to take over all the work of defining the conceptual entities of the system, promising that this will free developers to concentrate their efforts elsewhere:

By taking advantage of Microsoft’s significant investment in HailStorm, developers will be able to create user-centric solutions while focusing on their core value proposition instead of the plumbing.

Unmentioned is what developers whose core value proposition is the plumbing are to do with HailStorm’s global schema. With HailStorm, Microsoft proposes to divide the world into plumbers and application developers, and to take over the plumbing for itself. This is analogous to the split early in Microsoft’s history, when it wrote the DOS operating system and let other groups write the software that ran on top of DOS.

Unlike DOS, which could be tied to a single reference platform — the “IBM compatible” PC — HailStorm is launching into a far more heterogeneous environment. However, this also means that the competition is far more fragmented, and given the usefulness of HailStorm to developers who want to offer Web services without rethinking identity or authentication from the ground up (one of the biggest hurdles to widespread use of Sun’s JXTA), and the possible network effects that a global credentials schema could create, HailStorm could quickly account for a plurality of Internet users. Even a 20% share of every transaction made by every Internet user would make Microsoft by far the dominant player in the world of Web services.

Non-Microsoft Participation

With HailStorm, Microsoft has abandoned tying its major software offerings to its client operating systems. Even if every operating system it has — NT/Win2k, PocketPC, Stinger, et al — spreads like kudzu, the majority of the world’s non-PC devices will still not be controlled by Microsoft in any short-term future. By adopting open standards such as XML and SOAP, Microsoft hopes to attract the world’s application developers to write for the HailStorm system now or soon, and by owning the authentication and schema of the system, they hope to be the mediator of all HailStorm users and transactions, or the licenser of all members of the HailStorm federation.

Given the decentralization on the client side, where a Java program running on a Linux box could access HailStorm, the obvious question is “Can a HailStorm transaction take place without talking to Microsoft owned or licensed servers?”

The answer seems to be no, for two, and possibly three, reasons.

First, you cannot use a non-Passport identity within HailStorm, and at least for now, that means that using HailStorm requires a Microsoft-hosted identity.

Second, you cannot use a non-Microsoft copyrighted schema to broker transactions within HailStorm, nor can you alter or build on existing schema without Microsoft’s permission.

Third, developers might not be able to write HailStorm services or clients without using the Microsoft-extended version of Kerberos.

At three critical points in HailStorm, Microsoft is using an open standard (email address, Kerberos, SOAP) and putting it into a system it controls, not through software licensing but through copyright (Passport, Kerberos MS, HailStorm schema). By making the system transparent to developers but not freely extensible, Microsoft hopes to gain the growth that comes with openness, while avoiding the erosion of control that also comes with openness.

This is a strategy many companies have tried before — sometimes it works and sometimes it doesn’t. CompuServe collapsed while pursuing a partly open/partly closed strategy, while AOL flourished. Linux has spread remarkably with a completely open strategy, but many Linux vendors have suffered. Sun and Apple are both wrestling with “open enough to attract developers, but closed enough to stave off competitors” strategies with Solaris and OS X respectively.

HailStorm will not be launching in any real way until 2002, so it is too early to handicap Microsoft’s newest entrant in the “open for users but closed for competitors” category. But if it succeeds at even a fraction of its stated goals, HailStorm will mark the full-scale arrival of Web services and set the terms of both competition and cooperation within the rest of the industry.

P2P Backlash!

First published on O’Reilly’s OpenP2P.

The peer-to-peer backlash has begun. On the same day, the Wall St. Journal ran an article by Lee Gomes entitled “Is P2P plunging off the deep end?”, while Slashdot’s resident commentator, Jon Katz, ran a review of O’Reilly’s Peer-to-Peer book under the title “Does peer-to-peer suck?”

It’s tempting to write this off as part of the Great Wheel of Hype we’ve been living with for years:

New Thing happens; someone thinks up catchy label for New Thing; press picks up on New Thing story; pundits line up to declare New Thing “Greatest Since Sliced Bread.” Whole world not transformed in matter of months; press investigates further; New Thing turns out to be only best thing since soda in cans; pundits (often the same ones) line up to say they never believed it anyway.

This quick reversal is certainly part of the story here. The Journal quoted entrepreneurs and investors recently associated with peer-to-peer who are now distancing themselves from the phrase in order to avoid getting caught in the backlash. There is more to these critiques than business people simply repositioning themselves when the story crescendos, however, because each of the articles captures something important and true about peer-to-peer.

Where’s the money?

The Wall St. Journal’s take on peer-to-peer is simple and direct: it’s not making investors any money right now. Mr. Gomes notes that many of the companies set up to take advantage of file sharing in the wake of Napster’s successes have fallen on tough times, and that Napster’s serious legal difficulties have taken the bloom off the file-sharing rose. Meanwhile, the distributed computing companies have found it hard to get either customers or investors, as the closing of Popular Power and the difficulties the remaining field has had in finding customers have highlighted.

Furthermore, Gomes notes that P2P as a label has been taken on by many companies eager to seem cutting edge, even those whose technologies have architectures that differ scarcely at all from traditional client-server models. The principal critiques Gomes makes — P2P isn’t a well-defined business sector, nor a well-defined technology — are both sensible. From a venture capitalist’s point of view, P2P is too broad a category to be a real investment sector.

Is P2P even relevant?

Jon Katz’s complaints about peer-to-peer are somewhat more discursive, but seem to center on its lack of a coherent definition. Like Gomes, he laments the hype surrounding peer-to-peer, riffing off a book jacket blurb that overstates peer-to-peer’s importance, and goes on to note that the applications grouped together under the label peer-to-peer differ from one another in architecture and effect, often quite radically.

Katz goes on to suggest that interest in P2P is restricted to a kind of techno-elite, and is unlikely to affect the lives of “Harry and Martha in Dubuque.” While Katz’s writing is not as focused as Gomes’, he touches on the same points: there is no simple definition for what makes something peer-to-peer, and its application in people’s lives is unclear.

The unspoken premise of both articles is this: if peer-to-peer is neither a technology nor a business model, then it must just be hot air. There is, however, a third possibility besides “technology” and “business.” The third way is simply this: Peer-to-peer is an idea.

Revolution convergence

As Jon Orwant noted recently in these pages, “Peer-to-peer is not a technology, it’s a mindset.” Put another way, peer-to-peer is a related group of ideas about network architecture, ideas about how to achieve better integration between the Internet and the personal computer — the two computing revolutions of the last 15 years.

The history of the Internet has been told often — from the late ’60s to the mid-’80s, the DARPA agency in the Department of Defense commissioned work on a distributed computer network that used packet switching as a way to preserve the fabric of the network, even if any given node failed.

The history of the PC has likewise been often told, with the rise of DIY kits and early manufacturers of computers for home use — Osborne, Sinclair, the famous Z-80-based machines, and then the familiar IBM PC and with it Microsoft’s DOS.

In an accident of history, both of those movements were transformed in January 1984, and began having parallel but increasingly important effects on the world. That month, a new plan for handling DARPA net addresses was launched. Dreamed up by Vint Cerf, this plan was called the Internet Protocol, and required changing the addresses of every node on the network over to one of the new IP addresses, a unique, global, and numerical address. This was the birth of the Internet we have today.

Meanwhile, over at Apple Computer, January 1984 saw the launch of the first Macintosh, the computer that popularized the graphic user interface (GUI), with its now familiar point-and-click interactions and desktop metaphor. The GUI revolutionized the personal computer and made it accessible to the masses.

For the next decade, roughly 1984 to 1994, both the Internet and the PC grew by leaps and bounds, the Internet as a highly connected but very exclusive technology, and the PC as a highly dispersed but very inclusive technology, with the two hardly intersecting at all. One revolution for the engineers, another for the masses.

The thing that changed all of this was the Web. The invention of the image tag, as part of the Mosaic browser (ancestor of Netscape), brought a GUI to the previously text-only Internet in exactly the same way that, a decade earlier, Apple brought a GUI to the previously text-only operating system. The browser made the Internet point-and-click easy, and with that in place, there was suddenly pressure to fuse the parallel revolutions, to connect PCs to the Internet.

Which is how we got the mess we have today.

First and second-class citizens

In 1994, the browser created sudden pressure to wire the world’s PCs, in order to take advantage of the browser’s ability to make the network easy to use. The way the wiring happened, though — slow modems, intermittent connections, dynamic or even dummy IP addresses — meant that the world’s PCs weren’t being really connected to the Internet, so much as they were being hung off its edges, with the PC acting as no more than a life-support system for the browser. Locked behind their slow modems and impermanent addresses, the world’s PC owners have for the last half-dozen years been the second-class citizens of the Internet.

Anyone who wanted to share anything with the world had to find space on a “real” computer, which is to say a server. Servers are the net’s first-class citizens, with real connectivity and a real address. This is how the Geocities and Tripods of the world made their name, arbitraging the distinction between the PCs that were (barely) attached to the network’s edge and the servers that were fully woven into the fabric of the Internet.

Big, sloppy ideas

Rejection of this gap between client and server is the heart of P2P. As both Gomes and Katz noted, P2P means many things to many people: PC users don’t have to be second-class citizens. Personal computers can be woven directly into the Internet. Content can be provided from the edges of the network just as surely as from the center. Millions of small computers, with overlapping bits of content, can be more reliable than one giant server. Millions of small CPUs, loosely coupled, can do the work of a supercomputer.

These are sloppy ideas. It’s not clear when something stops being “file sharing” and starts being “groupware.” It’s not clear where the border between client-server and peer-to-peer is, since the two-way Web moves power to the edges of the network while Napster and ICQ bootstrap connections from a big server farm. It’s not clear how ICQ and SETI@Home are related, other than deriving their power from the network’s edge.

No matter. These may be sloppy ideas, ideas that don’t describe a technology or a business model, but they are also big ideas, and they are also good ideas. The world’s Net-connected PCs host, both individually and in aggregate, an astonishing amount of power — computing power, collaborative power, communicative power.

Our first shot at wiring PCs to the Internet was a half-measure — second-class citizenship wasn’t good enough. Peer-to-peer is an attempt to rectify that situation, to really integrate personal devices into the Internet. Someday we will not need a blanket phrase like peer-to-peer, because we will have a clearer picture of what is really possible, in the same way the arrival of the Palm dispensed with any need to talk about “pen-based computing.” 

In the meantime, something important is happening, and peer-to-peer is the phrase we’ve got to describe it. The challenge now is to take all these big sloppy ideas and actually do something with them, or, as Michael Tanne of XDegrees put it at the end of the Journal article:

“P2P is going to be used very broadly, but by itself, it’s not going to create new companies. …[T]he companies that will become successful are those that solve a problem.”

Time-Warner and ILOVEYOU

First published in FEED, 05/00.

Content may not be king, but it was certainly making headlines last week. From the “content that should have been distributed but wasn’t” department, Time Warner’s spectacularly ill-fated removal of ABC from its cable delivery lineup ended up cutting off content essential to the orderly workings of America — Who Wants to Be A Millionaire? Meanwhile, from the “content that shouldn’t have been distributed but was” department, Spyder’s use of a loosely controlled medium spread content damaging to the orderly workings of America and everywhere else — the ILOVEYOU virus. Taken together, these events are making one message increasingly obvious: The power of corporations to make decisions about distribution is falling, and the power of individuals as media channels in their own right is rising.

The week started off with Time Warner’s effort to show Disney who was the boss, by dropping ABC from its cable lineup. The boss turned out to be Disney, because owning the delivery channel doesn’t give Time Warner half the negotiating leverage the cable owners at Time Warner thought it did. Time Warner was foolish to cut off ABC during sweeps month, when Disney had legal recourse, but their real miscalculation was assuming that owning the cable meant owning the customer. What had ABC back on the air and Time Warner bribing its customers with a thirty-day rebate was the fact that Americans resent any attempt to interfere with the delivery of content, legal issues or no. Indeed, the aftermath saw Peter Vallone of the NY City Council holding forth on the right of Americans to watch television. It is easy to mock this attitude, but Vallone has a point: People have become accustomed to constantly rising media access, from three channels to 150 in a generation, with the attendant rise in user access to new kinds of content. Any attempt to reintroduce artificial scarcity by limiting this access now creates so much blind fury that television might as well be ranked alongside water and electricity as utilities. The week ended as badly for Time Warner as it began, because even though their executives glumly refused to promise never to hold their viewers hostage as a negotiating tactic, their inability to face the wrath of their own paying customers had been exposed for all the world to see.

Meanwhile, halfway round the world, further proof of individual leverage over media distribution was mounting. The ILOVEYOU virus struck Thursday morning, and in less than twenty-four hours had spread further than the Melissa virus had in its entire life. The press immediately began looking for the human culprit, but largely missed the back story: the real difference between ILOVEYOU and Melissa was not the ability of Outlook to launch programs from within e-mail, a security hole unchanged since last year. The real difference was the delivery channel itself — the number and interconnectedness of e-mail users — which makes ILOVEYOU more of a media virus than a computer virus. The lesson of a virus that starts in the Philippines and ends up flooding desktops from London to Los Angeles in a few hours is that while e-mail may not be a mass medium that reaches millions at the same time, it has become a massive one, reaching tens of millions in mere hours, one user at a time. With even a handful of globally superconnected individuals, the transmission rates for e-mail are growing exponentially, with no end in sight, either for viruses or for legitimate material. The humble practice of forwarding e-mail, which has anointed The Onion, Mahir, and the Dancing Baby as pop-culture icons, has now crossed one of those invisible thresholds that makes it a new kind of force — e-mail as a media channel more global than CNN. As the world grows more connected, the idea that individuals are simply media consumers looks increasingly absurd — anyone with an e-mail address is in fact a media channel, and in light of ILOVEYOU’s success as a distribution medium, we may have to revise that six degrees of separation thing downwards a little.
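A back-of-the-envelope sketch shows why this kind of forwarding becomes massive so fast; the fan-out figure below is an assumption chosen for illustration, not a measurement of ILOVEYOU’s actual spread.

```java
// Toy model of e-mail forwarding: if each newly reached user's copy goes
// out to a modest number of further addresses per "hop", cumulative reach
// passes tens of millions within a handful of hops. The fan-out of 50 is
// an illustrative assumption, and the model ignores overlapping address books.
public class ForwardingReach {
    public static void main(String[] args) {
        long fanOut = 50;        // assumed addresses contacted per newly reached user
        long reached = 1;        // the original sender
        long newlyReached = 1;
        for (int hop = 1; hop <= 5; hop++) {
            newlyReached *= fanOut;
            reached += newlyReached;
            System.out.println("after hop " + hop + ": roughly " + reached + " users");
        }
    }
}
```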

Both Time Warner’s failure and ILOVEYOU’s success spread the bad news to several parties: TV cable companies, of course, but also cable ISPs, who hope to use their leverage over delivery to hold Internet content hostage; the creators of WAP, who hope to erect permanent tollbooths between the Internet and the mobile phone without enraging their subscribers; governments who hoped to control their citizens’ access to “the media” before e-mail turned out to be a media channel as well; and everyone who owns copyrighted material, for whom e-mail attachments threaten to create hundreds of millions of small leaks in copyright protection. (At least Napster has a business address.) There is a fear, shared by all these parties, that decisions about distribution — who gets to see what, when — will pass out of the hands of governments and corporations and into the hands of individuals. Given the scale of the vested interests at stake, this scenario is still at the outside edges of the imaginable. But when companies that own the pipes can’t get any leverage over their users, and when users with access to e-mail can participate in a system whose ubiquity has been so dramatically illustrated, the scenario goes from unthinkable to merely unlikely.

The Real Wireless Innovators

First published on Biz2, 04/09/2001.

There is a song making the rounds in the wireless world right now that goes a little something like this: “WAP was overhyped by the media, but we never expected it to be a big deal. The real wireless action is coming in the future, from things such as 3G and m-commerce. There is nothing wrong with what we are doing; wireless is simply taking a while to develop, just as the Web did.”

Don’t believe it. The comparison between the early days of the Web and wireless is useful, but it is anything but favorable to wireless. The comparison actually highlights what has gone wrong with wireless data services so far, and how much ground the traditional wireless players are giving up to new competitors, who have a much better idea of what users want and a much longer history of giving it to them.

As anyone who was around in 1993 can tell you, the Web was useful right out of the box. Even in the days of the text-only Internet, Tim Berners-Lee’s original Web browser blew the other text-only search tools such as Archie and Gopher right out of the water. Unlike WAP, the Web got to where it is today by being useful when it launched, and staying useful every single day since.

Contrast the early user experiences with wireless data. When makers of wireless phones first turned their efforts to data services, they proposed uses for the wireless Web that ranged from the unimaginative (weather forecasts) to the downright ghastly (ads that ring your phone when you walk by a store).

Because the phone companies thought they owned their customers, it never occurred to them that a numeric keypad and a tiny screen might not be adequate for email. They seem to have actually believed that they had all the time in the world to develop their wireless data offerings; after all, who could possibly challenge them? So they have allowed companies that understand flexible devices, such as Motorola and Research in Motion, to walk away with the wireless email market, the once and future killer app.

Wireless telcos would like you to believe that these are all just growing pains, but there is another explanation for the current difficulties of the wireless sector: telephone companies are not very good at producing anything but telephones. Everything about the telcos (makers of inflexible hardware, with a form factor optimized for voice, and notoriously bad customer service) suggests that they would be the last people on Earth you would trust to create a good experience with things such as wireless email or portable computing.

As always, the great exception here is NTT DoCoMo, which had the sense to embrace HTML (actually, a subset called Compact HTML) and let anyone build content that its i-mode devices could read. And NTT DoCoMo also made sure the services it provided did something its customers were interested in, and in many cases were willing to pay for.

The technology is not the difficult part of making useful wireless devices. The companies creating good wireless customer experiences (Research in Motion with its BlackBerry, Apple Computer with its AirPort wireless networking technology, and Motorola with its Talkabout) are companies that know how to create good customer experiences, period. If you know what customers want and how to give it to them, it is easier to go wireless than if you know only wireless technology and have to figure out what customers want.

Own worst enemies
The difficulties in the early days of wireless data had nothing to do with telcos needing time to develop their services. Instead, those difficulties were caused by the telcos’ determination to maintain a white-knuckled grip on their customers, a determination that made them unwilling to embrace existing standards or share revenue with potential affiliates. Ironically, this grip has made it easier, not more difficult, for competitors to muscle in, because the gap between what users want and what the telcos were providing was so large.

The wireless sector is slowly melting, becoming part of lots of other businesses. If you want to know who will create a good wireless shopping experience, bet on Amazon.com, not Ericsson. If you want to know who will create the best m-commerce infrastructure, look to Citibank, not Nokia. Contrary to the suggestion that the wireless sector will live apart from the rest of the technology landscape, wireless is an adjective: the things that make a good wireless personal digital assistant or a good wireless computer are very different from those that make a good wireless phone.

This is not to say there isn’t a fortune to be made in supplying wireless phones. Nor is being a wireless network for BlackBerrys and Talkabouts a bad business; as I write this column, GoAmerica Communications is doing quite well.

But the real breakout wireless services are being launched not by the telcos but by innovative device and service companies who think of wireless as a feature, not as an end in itself.

Interoperability, Not Standards

First published on O’Reilly’s OpenP2P.

“Whatever else you think about, think about interoperability. Don’t think about standards yet.”

Nothing I said at the O’Reilly P2P conference in San Francisco has netted me more flak than that statement. To advocate interoperability while advising caution on standards seems oxymoronic — surely standards and interoperability are inextricably linked?

Indeed, the coupling of standards and interoperability is the default for any widely dispersed technology. However, there is one critical period where interoperability is not synonymous with standardization, and that is in the earliest phases of work, when it is not entirely clear what, if anything, should be standardized.

If you are working with hardware, where Pin 5 had better carry voltage on all plugs from the get-go, you need a body creating a priori standards. In the squishier field of software, however, the history of RFCs demonstrates a successful model where standards don’t get created out of whole cloth, but ratify existing practice. “We reject kings, presidents and voting. We believe in rough consensus and running code,” as David Clark put it. Standardization of software can’t proceed in a single giant hop, but requires some practical solution to point to first.

I take standardization to be an almost recursive phenomenon: a standard is any official designation of a protocol that is to be adopted by any group wanting to comply with the standard. Interoperability, meanwhile, is much looser: two systems are interoperable if a user of one system can access even some resources or functions of the other system.

Because standardization requires a large enough body of existing practice to be worth arguing over, and because P2P engineering is in its early phases, I believe that a focus on standardization creates two particular dangers: risk of premature group definition and damage to meaningful work. Focusing on the more modest goals of interoperability offers a more productive alternative, one that will postpone but improve the eventual standards that do arise.

Standardization and Group Definition

A standard implies group adoption, which presupposes the existence of a group, but no real P2P group exists yet. (The P2P Working Group is an obvious but problematic candidate for such a group.) The only two things that genuinely exist in the P2P world right now are software and conversations, which can be thought of as overlapping circles:

  • There is a small set of applications that almost anyone thinking about P2P regards as foundational — Napster, ICQ and SETI@Home seem to be as close to canonical as we’re likely to get.
  • There is a much larger set of applications that combine or extend these functions, often with a view to creating a general purpose framework, like Gnutella, Jabber, Aimster, Bitzi, Allcast, Groove, Improv, and on and on.
  • There is a still larger set of protocols and concepts that seem to address the same problems as these applications, but from different angles — on the protocol front, there are attempts to standardize addressing and grid computing with things like UDDI, XNS, XML-RPC, and SOAP, and conceptually there are things like the two-way Web, reputation management and P2P journalism.
  • And covering all of these things is a wide-ranging conversation about something called P2P that, depending on your outlook, embraces some but probably not all of these things.

What is clear about this hodge-podge of concepts is that there are some powerful ideas at work here, ideas about unlocking resources at the edges of the Internet and about democratizing the Internet as a media channel.

What is not clear is which of these things constitute any sort of group amenable to standards. Should content networks use a standard format for hashing their content for identification by search tools? Probably. Would the distributed computation projects benefit from having a standard client engine to run code? Maybe. Should the people who care about P2P journalism create standards for all P2P journalists to follow? No.

P2P is a big tent right now, and it’s not at all clear that there is any one thing that constitutes membership in a P2P group, nor is there any reason to believe (and many reasons to disbelieve) that there is any one standard, other than eventually resolving to IP addresses for nodes, that could be adopted by even a large subset of companies who describe themselves as “P2P” companies.

Standardization and Damage to Meaningful Work

Even if, at this point, P2P had a crystal-clear definition, one within which it was clear which sub-groups should be adopting standards, premature standardization would still risk destroying meaningful work.

This is the biggest single risk with premature standardization — the loss of that critical period of conceptualization and testing that any protocol should undergo before it is declared superior to its competitors. It’s tempting to believe that standards are good simply because they are standard, but to have a good standard, you first need a good protocol, and to have a good protocol, you need to test it in real-world conditions.

Imagine two P2P companies working on separate metadata schemes; call them A and B. For these two companies to standardize, there are only two options: one standard gets adopted by both groups, or some hybrid standard is created.

Now if both A and B are in their 1.0 versions, simply dropping B in favor of A for the sole purpose of having a standard sacrifices any interesting or innovative work done on B, while the idea of merging A and B could muddy both protocols, especially if they have different design maxims, like “lightweight” vs. “complete.”

This is roughly the position of RSS and ICE, or XML-RPC and SOAP. Everyone who has looked at these pairs of protocols has had some sense that they solve similar problems, but since it is not immediately obvious which one is better (and better here can mean “most lightweight” or “most complete,” “most widely implemented” or “most easily extensible,” and so on), work continues on both of them.

This could also describe things like Gnutella vs. Freenet, or, further up the stack, BearShare vs. ToadNode vs. LimeWire. What will decide these things in the end is user adoption — faced with more than one choice, the balance of user favor will either tip decisively in one direction, as with the fight over whether HTML should include visual elements, or else each will become useful for particular kinds of tasks, as with Perl and C++.

Premature standardization is a special case of premature optimization, the root of all evil, and in many cases standardization will have to wait until something more organic happens: interoperability.

Interoperability Can Proceed by Pairwise Cooperation

Standardization requires group definition — interoperability can proceed with just a handshake between two teams or even two individuals — and by allowing this kind of pairwise cooperation, interoperability is more peer-to-peer in spirit than standardization is. By growing out of a shared conversation, two projects can pursue their own design goals, while working out between themselves only those aspects of interoperability both consider important.

This approach is often criticized because it creates the N² problem, but the N² problem is only a problem for large values of N. Even the largest P2P category in the O’Reilly P2P directory — file sharing — contains only 50 entries, and it’s obvious that many of these companies, like Publius, are not appropriate targets for standardization now, and may not even be P2P.

For small numbers of parallel engineering efforts, pairwise cooperation maximizes the participation of each member of the collaboration, while minimizing bureaucratic overhead.
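
To put rough numbers on that claim (the arithmetic below is my own illustration, not anything from the projects or directories named above): the count of possible pairings grows as N * (N - 1) / 2, which stays manageable as long as the set of projects that actually want to interoperate stays small.

    # A quick sketch of why pairwise cooperation is workable for small N:
    # the number of possible pairings among N projects is N * (N - 1) / 2.
    def pairwise_connections(n: int) -> int:
        return n * (n - 1) // 2

    for n in (3, 6, 10, 50):
        print(f"{n:>2} projects -> {pairwise_connections(n):>4} possible pairings")

    # Prints 3, 15, 45 and 1225 respectively: a handful of genuinely
    # overlapping projects can coordinate by hand, and only at directory
    # scale does the N-squared cost start to bite.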

Interoperability Can Proceed Without Pairwise Cooperation

If a protocol or format is well-documented and published, you can also create interoperability without pairwise cooperation. The OpenNAP servers adopted the Napster protocol without having to coordinate with Napster; Gnutella was reverse-engineered from the protocol used by the original binary; and after Jabber published its messaging protocol, Roku adopted it and built a working product without ever having to get Jabber’s sign-off or help.

Likewise, in what is probably the picture-perfect test case of the way interoperability may grow into standards in P2P, the P2P conference in San Francisco was the site of a group conversation about adopting SHA1 instead of MD5 as the appropriate hash for digital content. This came about not because of a SHA1 vs MD5 Committee, but because Bitzi and OpenCOLA thought it was a good idea, and talked it up to Freenet, and to Gnutella, and so on. It’s not clear how many groups will eventually adopt SHA1, but it is clear that interoperability is growing, all without standards being sent down from a standards body.
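
As a concrete sketch of what agreeing on a hash buys (my own illustration using Python’s standard hashlib module, not code from Bitzi, OpenCOLA or any of the other projects named above): two networks that hash the same bytes with the same algorithm get the same identifier, and so can recognize each other’s content whatever their other differences.

    import hashlib

    # The same bytes hashed with two different algorithms yield two
    # unrelated identifiers; networks that settle on one algorithm
    # (say, SHA-1) can refer to the same content by the same name.
    data = b"the bytes of some shared file"

    md5_id = hashlib.md5(data).hexdigest()
    sha1_id = hashlib.sha1(data).hexdigest()

    print("MD5:  ", md5_id)
    print("SHA-1:", sha1_id)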

Even in an industry as young as ours, there is a tradition of alternative interfaces to file-sharing networks, such as Mac, Linux and Java clients, being created by groups who have nothing more than access to publicly published protocols. There is widespread interoperability for the Napster protocol, which is a standard in all but name, and it has approached this state of de facto standardhood without any official body to nominate it.

Interoperability Preserves Meaningful Work

The biggest advantage of pursuing interoperability is that it allows for partial or three-layer solutions, where interested parties agree to overlap in some but not all places, or where an intermediate layer that speaks to both protocols is created. In the early days, when no one is sure what will work, and user adoption has not yet settled any battles, the kludgy aspects of translation layers can, if done right, be more than offset by the fact that two protocols can be made interoperable to some degree without having to adjust the core protocols themselves. 
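
A minimal sketch of that intermediate-layer idea, with invented field names rather than anything from a real P2P metadata scheme: a thin piece of translation code lets two projects interoperate on the fields they both care about while each keeps its own native format.

    # Hypothetical metadata records from two projects, A and B, that use
    # different field names. A small translation layer maps the overlap
    # and leaves the rest of each scheme untouched.
    record_a = {"title": "some song", "content_hash": "ab12cd", "bitrate": 128}
    record_b = {"name": "some song", "hash": "ab12cd", "license": "unknown"}

    def a_to_common(rec: dict) -> dict:
        return {"name": rec["title"], "hash": rec["content_hash"]}

    def b_to_common(rec: dict) -> dict:
        return {"name": rec["name"], "hash": rec["hash"]}

    # Both records resolve to the same shared subset, so either side can
    # recognize the other's content without changing its core protocol.
    print(a_to_common(record_a) == b_to_common(record_b))  # True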

What Is Needed

To have standards, you need a standards body. To have interoperability, you just need software and conversations, which is good news, since that’s all we have right now.

The bad news is that the conversations are still so fragmented and so dispersed. 

There are only a handful of steady sources for P2P news and opinion: this site, Peerprofits.com, the decentralization@yahoogroups.com mailing list, the P2P Working Group, and a handful of people who have been consistently smart and public about this stuff — Dan Bricklin, Doc Searls, Dan Gillmor, Dave Winer and Jon Udell. While each of these sources is interesting, the conversation carried on in and between them is far from being spread widely enough to get the appropriate parties talking about interoperability.

As a quick sampling, Openp2p.com’s P2P directory and Peerprofit.com’s P2P directory list about 125 projects, but only 50 groups appear on both lists. Likewise, the Members List at the P2P Working Group is heavy on participating technology companies, but does not include Freenet, Gnutella, OpenCola or AIMster.

The P2P Working Group is one logical place to begin public conversations about interoperability, but it may be so compromised by its heritage as a corporate PR site that it can never perform this function. That in itself is a conversation we need to have, because while it may be premature to have a “Standards Body,” it is probably not premature to have a place where people are trying to hammer out rough consensus about running code.

The decentralization list is the other obvious candidate, but with 400 messages a month recently, it may be too much for people wanting to work out specific interoperability issues.

But whatever the difficulties in finding a suitable place or places to have these conversations, now is the time for it. The industry is too young for standards, but old enough for interoperability. So don’t think about standards yet, but whatever else you think about, think about interoperability.

P2P Smuggled In Under Cover of Darkness

First published on O’Reilly’s OpenP2P, 2/14/2001

2001 is the year peer-to-peer will make its real appearance in the enterprise, but most of it isn’t going to come in the front door. Just as workers took control of computing 20 years ago by smuggling PCs into businesses behind the backs of the people running the mainframes, workers are now taking control of networking by downloading P2P applications under the noses of the IT department.

Although it’s hard to remember, the PC started as a hobbyist’s toy in the late ’70s, and personal computers appeared in the business world not because management decided to embrace them, but because individual workers brought them in on their own. At the time, PCs were slow and prone to crashing, while the mainframes and minis that ran businesses were expensive but powerful. This quality gap made it almost impossible for businesses to take early PCs seriously.

However, workers weren’t bringing in PCs because of some sober-minded judgment about quality, but because they wanted to be in control. Whatever workers thought about the PC’s computational abilities relative to Big Iron, the motivating factor was that a PC was your own computer.

Today, networking — the ability to configure and alter the ways those PCs connect — is as centralized a function as computation was in the early ’80s, and thanks to P2P, this central control is just as surely and subtly being eroded. The driving force of this erosion is the same as it was with the PC: Workers want, and will agitate for, control over anything that affects their lives.

This smuggling in of P2P applications isn’t being driven solely by the human desire for control of the environment. There is another, more proximate cause of the change.

You Hate the IT Department, and They Hate You Right Back

The mutual enmity between the average IT department and the average end user is the key feature driving P2P adoption in the business setting.

The situation now is all but intolerable: No matter who you are, unless you are the CTO, the IT department does not work for you, so your interests and their interests are not aligned.

The IT department is rewarded for its ability to keep bad things from happening, and that means there is pressure to create and then preserve stability. Meanwhile, you are rewarded for your ability to make good things happen, meaning that a certain amount of risk-taking is a necessary condition of your job.

Risk-taking undermines stability. Stability deflects risk-taking. You think your IT department are jerks for not helping you do what you want to do; they consider you an idiot for installing software without their permission. Also, because of the way your interests are (mis)aligned, you are both right.

Thought Experiment

Imagine that you marched into your IT department and explained that you wanted the capability to have real-time conversations with Internet users directly from your PC, that you wanted this set up within the hour, and that you had no budget for it.

Now imagine being laughed out of the room.

Yet consider ICQ. Those are exactly its characteristics, and it is second only to e-mail, and well ahead of things such as Usenet and Web bulletin boards, as the tool of choice for text messaging in the workplace. Furthermore, chat is a “ratchet” technology: Once workers start using chat, they will never go back to being disconnected, even if the IT department objects.

And all this happened in less than four years, with absolutely no involvement from the IT department. Chat was offered directly to individual users as a new function, and since the business users among them knew (even if only unconsciously) that the chances of getting the IT department to help them were approximately “forget it,” their only option was to install and configure the application themselves, which they promptly did.

So chat became the first corporate networking software never approved by the majority of the corporations whose employees use it. It will not be the last.

Chat Is Just the Beginning

ICQ was the first application that made creating a public network address effortless. Because ICQ simply ignored the idea that anyone else had any say over how you use your computer, you never had to ask the IT department about IP addresses, domain name servers or hosting facilities. You could give your PC a network address, and that PC could talk to any other PC with an address in the ICQ name space, all on your own.
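
As a simplified sketch of what an address in a chat system’s name space amounts to (the directory, names, address and port below are invented for illustration; this is the general pattern, not ICQ’s actual protocol): a central directory maps a user-chosen name to wherever that user’s PC currently is, so two PCs can find each other without anyone touching DNS.

    # A toy directory of the kind a chat network maintains: each client
    # registers its current location under a name of its choosing, and
    # other clients look that name up to open a direct conversation.
    directory = {}

    def register(screen_name: str, host: str, port: int) -> None:
        directory[screen_name] = (host, port)

    def locate(screen_name: str):
        return directory.get(screen_name)

    register("bob-on-his-desk", "203.0.113.7", 4000)  # made-up address and port
    print(locate("bob-on-his-desk"))  # another client can now reach Bob directly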

More recently, Napster has made sharing files as easy as ICQ made chat. Before Napster, if you wanted to serve files from your PC, you needed a permanent IP address, a domain name, registration with domain name servers and properly configured Web server software on the PC. With Napster, you could be serving files within 5 minutes of having downloaded the software. Napster is so simple that it is easy to forget that it performs all of the functions of a Web server with none of the hassle.

Napster is optimized for MP3s, but there is no reason general purpose file sharing can’t make the same leap. File sharing is especially ripe for a P2P solution, as the current norm for file sharing in the workplace — e-mail attachments — notoriously falls victim to arbitrary limits on file sizes, mangled MIME headers and simple failure of users to attach the documents they meant to attach. (How many times have you received otherwise empty “here’s that file” mail?)

Though there are several systems vying for the title of general file-sharing network, the primary thing holding back systems such as Gnutella is their focus on purity of decentralization rather than ease of use. The thing that brought chat and Napster into the workplace is the same thing that brought PCs into the workplace two decades ago: they were easy enough to use that non-technical workers felt comfortable setting them up themselves.

Necessity Is the Mother of Adoption

Workers’ desire for something to replace the e-mail attachment system of file sharing is so great that some system or systems will be adopted. Perhaps it could be Aimster, which links chat with file sharing; perhaps Groove, which is designed to set up an extensible group work environment without a server; perhaps Roku, OpenCola or Globus, all of which are trying to create general purpose P2P computing solutions; and there are many others.

The first workplace P2P solution may also be a specific tool for a specific set of workers. One can easily imagine a P2P environment for programmers, in which the version control system reverses its usual course (instead of workers checking out files stored centrally, it checks in files stored on individual desktops) and in which the compiler knows where the source files are, even if they are spread across a dozen PCs.
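
A deliberately hypothetical sketch of the bookkeeping such a system would need (the peer names, paths and index structure are invented for illustration): an index mapping each source file to the desktop that holds it, which a build tool could consult instead of a central repository.

    # An invented index from source files to the desktops holding them.
    source_index = {
        "parser.py":   ("alice-desktop", "/home/alice/project/parser.py"),
        "renderer.py": ("bob-laptop",    "/home/bob/src/renderer.py"),
    }

    def locate(filename: str) -> str:
        peer, path = source_index[filename]
        return f"fetch {path} from {peer}"

    # A compiler or build tool that consulted an index like this could
    # assemble a program from files spread across a dozen PCs.
    print(locate("renderer.py"))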

And as with chat, once a system like this exists and crosses some threshold of ease of use, users will adopt it without asking or even informing the IT department.

End-to-End

As both Jon Udell and Larry Lessig have pointed out from different points of view, the fundamental promise of the Internet is end-to-end communications, where any node can get to any other node on its own. Things such as firewalls, NAT and dynamic IP addresses violate that promise both at the protocol level, by breaking the implicit contract of TCP/IP (two nodes can always contact each other), and at the social level (the Internet has no second-class citizens).

Business users have been second-class citizens for some time. Not only do systems such as ICQ and Napster undo this by allowing users to create their own hosted network applications, but systems such as Mojo Nation are creating connection brokers that allow two machines — both behind firewalls — to talk to each other by taking the e-mail concept of store and forward, and using it to broker requests for files and other resources.
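
A rough sketch of the brokering idea (my own simplification of store and forward, not Mojo Nation’s actual design): if both machines can make outbound connections to a third party, that third party can hold requests and responses for them, much as a mail server holds messages.

    from collections import defaultdict, deque

    # An in-memory model of a store-and-forward broker: peers behind
    # firewalls never accept inbound connections; they only post messages
    # to the broker and poll it for messages addressed to them.
    class Broker:
        def __init__(self):
            self.mailboxes = defaultdict(deque)

        def post(self, to_peer: str, message: str) -> None:
            self.mailboxes[to_peer].append(message)

        def poll(self, peer: str):
            box = self.mailboxes[peer]
            return box.popleft() if box else None

    broker = Broker()
    broker.post("peer-B", "request: budget.xls")        # A asks for a file
    print(broker.poll("peer-B"))                        # B picks up the request
    broker.post("peer-A", "response: <file contents>")  # B answers
    print(broker.poll("peer-A"))                        # A picks up the response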

The breaking of firewalls by the general adoption of port 80 as a front door is nothing compared to letting users create network identities for themselves without having to ask for either permission or help.

Security, Freedom and the Pendulum

Thus P2P represents a swing of the pendulum back toward user control. Twenty years ago, the issue was control over the center of a business where the mainframes sat. Today, it is over the edges of a business, where the firewalls sit. However, the tension between the user’s interests and corporate policy is the same.

The security-minded will always complain about the dangers of users controlling their own network access, just like the mainframe support staff worried that users of PCs were going to destroy their tidy environments with their copies of VisiCalc. And, like the mainframe guys, they will be right. Security is only half the story, however.

Everyone knows that the easiest way to secure a PC is to disconnect it from the Internet, but no one outside of the NSA seriously suggests running a business where the staff has no Internet access. Security, in other words, always necessitates a tradeoff with convenience, and there are times when security can go too far. What the widespread adoption of chat software is telling us is that security concerns have gone too far, and that workers not only want more control over how and when their computers connect to the network, but that when someone offers them this control, they will take it.

This is likely to make for a showdown over P2P technologies in the workplace, pitting the freedom of individual workers against the advantages of centralized control, and security against flexibility. Adoption of some form of P2P addressing, addressing that bypasses DNS to give individual PCs externally contactable addresses, now numbers in the tens of millions of users thanks to Napster and ICQ.

By the time general adoption of serverless intranets begins, workers will have woven P2P functions too deeply into their day for IT departments to simply ban them. As with the integration of the PC, expect the workers to win more control over the machines on their desks, and the IT departments to accept this change as the new norm over time.

Peak Performance Pricing

First published at Biz2, February 2001.

Of all the columns I have written, none has netted as much contentious mail as “Moving from Units to Eunuchs” (October 10, 2000, p114, and at Business2.com). That column argued that Napster was the death knell for unit pricing of online music. By allowing users to copy songs from one another with no per-unit costs, Napster introduced the possibility of “all you can eat” pricing for music, in the same way that America Online moved to “all you can eat” pricing for email.

Most of the mail I received disputed the idea that Napster had no per-unit costs. That idea, said many readers, violates every bit of common sense about the economics of resource allotment. If more resources are being used, the users must be paying more for them somewhere, right?

Wrong. The notion that Napster must generate per-unit costs fails the most obvious test: reality. Download Napster, download a few popular songs, and then let other Napster users download those songs from you. Now scan your credit-card bills to see where the extra costs for those 10 or 100 or 1,000 downloads come in.

You can perform this experiment month after month, and the per-unit costs will never show up; you are not charged per byte for bandwidth. Even Napster’s plan to charge a subscription doesn’t change this math, because the charge is for access to the system, not for individual songs.

‘Pay as you go’
Napster and other peer-to-peer file-sharing systems take advantage of the curious way individual users pay for computers and bandwidth. While common sense suggests using a “pay as you go” system, the average PC user actually pays for peak performance, not overall resources, and it is peak pricing that produces the excess resources that let Napster and its cousins piggyback for free.

Pay as you go is the way we pay for everything from groceries to gasoline. Use some, pay some. Use more, pay more. At the center of the Internet, resources like bandwidth are indeed paid for in this way. If you host a Web server that sees a sudden spike in demand, your hosting company will simply deliver more bandwidth, and then charge you more for it on next month’s bill.

The average PC user, on the other hand, does not buy resources on a pay-as-you-go basis. First of all, the average PC is not operating 24 hours a day. Furthermore, individual users prize predictability in pricing. (This is why AOL was forced to drop its per-hour pricing in favor of the now-standard flat rate.) Finally, what users pay for when they buy a PC is not steady performance but peak performance. PC buyers don’t choose a faster chip because it will give them more total cycles; they choose a faster chip because they want Microsoft Excel to run faster. Without even doing the math, users understand that programs that don’t use up all of the available millions of instructions per second will be more responsive, while those that use all the CPU cycles (to perform complicated rendering or calculations) will finish sooner.

Likewise, they choose faster DSL so that the line will be idle more often, not less. Paying for peak performance sets a threshold between a user’s impatience and the size of their wallet, without exposing them to extra charges later.

A side effect of buying peak cycles and bandwidth is that resources that don’t get used have nevertheless been paid for. People who understand the economics of money but not of time don’t understand why peak pricing works. But anyone who has ever paid for a faster chip to improve peak performance knows instinctively that paying for resources upfront, no matter what you end up using, saves enough hassles to be worth the money.

The Napster trick
The genius of Napster was to find a way to piggyback on these already-paid-up resources in order to create new copies of songs with no more per-unit cost than new pieces of email, a trick now being tried in several other arenas. The SETI@home project creates a virtual supercomputer out of otherwise unused CPU time, as do Popular Power, DataSynapse, and United Devices.

The flagship application of openCola combines two of the most talked-about trends on the Internet, peer-to-peer networking and expert communities, letting users share knowledge instead of songs. It turns the unused resources at the edge of the network into a collaborative platform on which other developers can build peer-to-peer applications, as does Groove Networks.

As more users connect to the Internet every day, and as both their personal computers and their bandwidth get faster, the amount of pre-paid but unused resources at the edges of the network is growing to staggering proportions.

By cleverly using those resources in a way that allowed it to sidestep per-unit pricing, Napster demonstrated the value of the world’s Net-connected PCs. The race is now on to capitalize on them in a more general fashion.