| TOC |
Peer-to-peer technologies have been thrust into the public's awareness with recent controversies over Napster and similar programs. However, peer-to-peer communications have been in the foundations of a number of important Internet services for a long time, including electronic mail, routing, clock synchronization protocols, and domain name services.
Even client-server Internet applications, such as browsing web pages or looking up records in a database, are peer-to-peer when examined at a lower level of detail. Almost all such protocols rely on TCP, which is a peer-to-peer protocol. Web proxies, such as those found in firewalls, act as both client and server in passing requests between web browsers and web servers. Indeed, nothing in the basic Internet architecture makes a distinction between client and server, except for the obvious restriction that an application must be listening for messages before it can hear them. Hence, a client-server architecture is really just a peer-to-peer architecture with restrictions on what each end is allowed to do: "clients" may only make requests and "servers" may only reply to requests. In contrast, a peer-to-peer connection allows the application at either end to issue a request or to reply to one.
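The point that an established connection has no built-in "client" or "server" end can be illustrated with a minimal sketch. It assumes a local pair of connected stream sockets from Python's standard library; the helper name `ask` and the message contents are invented for illustration.

```python
import socket

# socketpair() returns two already-connected endpoints; neither one is
# inherently a "client" or a "server".
left, right = socket.socketpair()

def ask(requester, responder, question):
    """One request/reply exchange; the caller decides who plays which role."""
    requester.sendall(question)
    received = responder.recv(1024)
    responder.sendall(b"re: " + received)
    return requester.recv(1024)

# Either end can issue the request over the same connection.
first = ask(left, right, b"ping from left")
second = ask(right, left, b"ping from right")

left.close()
right.close()
```

A client-server protocol would simply forbid the second call; the transport itself does not.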
In 1998, one of the most prolific creators of Internet protocols, Dr. Marshall T. Rose, realized that the design of new Internet protocols had reached a critical point. Marshall Rose teamed up with Carl Malamud (a pioneer in sending voice and FAX over the Internet) to invent BXXP, the Blocks Extensible Exchange Protocol, now also known as "BEEP".
Many new protocols were solving the same problems over and over, such as negotiating encryption, logging in users, separating each request from the next, matching responses to requests, reporting errors, and so on. At the same time, administrative problems such as network address translation, firewalls, and dynamic IP address assignment were making it progressively harder to make sure multiple requests were going to the same machine.
BEEP specifies a reusable solution to these problems, instead of requiring the same decisions to be made over again for each new application. BEEP incorporates the knowledge and experience of leading Internet protocol architects into a framework that connects existing Internet standards for encryption and authentication and new standards for connection management. A new networked application needs merely to supply the "important" part, the part that distinguishes it from other applications, as a BEEP "profile". The mundane part is the same for all protocols, and therefore can be coded as a library, freeing the application designer to focus on the interesting bits.
BEEP is inherently a peer-to-peer, point-to-point protocol. Naturally, one process must be "listening" while the other process is "initiating" a connection, but once the TCP connection is established, the two processes are completely symmetric in their use of the protocol. Each peer starts by offering a list of supported profiles to the other. Either peer can create a new channel (i.e., a new encapsulated sub-connection) at any time. Either peer can issue a request on any channel at any time the particular profile on that channel allows it. Either peer can close a channel or an entire connection. Either peer can request to encrypt the connection. Either peer can "log in" to the other.
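The symmetry described above can be sketched as a toy model. This is not a BEEP library and not the BEEP wire protocol: the `Peer` class, its methods, and the channel numbering scheme are all assumptions made for illustration.

```python
class Peer:
    """Toy model of one end of a session (hypothetical, not a BEEP library)."""

    def __init__(self, name, profiles):
        self.name = name
        self.profiles = set(profiles)   # profiles this peer offers to the other
        self.other = None
        self.channels = {}              # channel number -> profile

    def connect(self, other):
        # After connecting, both ends hold the same kind of reference
        # and share one channel table for the session.
        self.other, other.other = other, self
        other.channels = self.channels

    def start_channel(self, profile):
        """Either peer may ask for a channel using a profile the other offered."""
        if profile not in self.other.profiles:
            raise ValueError(f"{self.other.name} does not offer {profile!r}")
        number = len(self.channels) + 1   # simplistic numbering, for the demo only
        self.channels[number] = profile
        return number


alice = Peer("alice", {"chat", "search"})
bob = Peer("bob", {"chat", "download"})
alice.connect(bob)                      # alice initiated; bob was listening

chat = alice.start_channel("chat")      # the initiator opens a channel
search = bob.start_channel("search")    # and so does the listener
```

Once `connect` has run, nothing in the model distinguishes the peer that initiated from the peer that listened.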
One can contrast a peer-to-peer protocol like BEEP with popular client-server protocols such as HTTP, the protocol of the World Wide Web. The original HTTP protocol was designed to be very simple to implement. It was intended to be used for downloading technical documentation in a research campus environment. Hence, a very lightweight and simple design made for the most efficient mechanism. However, with the addition of images to web pages, active server pages, shopping carts, and so on, these simplifying choices started to be less efficient. For example, the original HTTP specification simplified implementation of servers by requiring that every transaction establish a new connection, which was a perfectly reasonable simplification when a single web page might take half an hour to read. That is, with HTTP, every request must perform a 3-way TCP handshake to establish the connection, must include identity information and authentication information in each request, must determine the optimal TCP window size and IP packet size, must renegotiate encryption, and must use "session cookies" or similar technology for each request to route the request to the same machine and process that handled the previous request. This extra overhead becomes especially onerous when applications that would be better served by a new protocol are layered on top of HTTP instead, in an attempt to bypass firewall restrictions. Indeed, later versions of HTTP have grown in complexity in an attempt to reduce this overhead, while still remaining client-server in nature.
Peer-to-peer protocols require a different approach. Typically, either application can initiate a transaction, and any number of related transactions are passed back and forth during the course of a conversation. Because of the former, a protocol that disconnects between requests would be problematic, as it may not be possible to connect in both directions due to firewalls. It is also inefficient to do so, for all the reasons enumerated above.
In contrast, BEEP-based applications can be sure that multiple requests are all being handled by the same peer, eliminating the need for "session cookies" and other such state-management information. An application supporting search, download, and chat (just as an example) can use three profiles over the same connection, with the multiplexing and prioritization handled by BEEP libraries, considerably simplifying the programming task. Fetching a web page with multiple embedded images using BEEP instead of HTTP simply uses multiple requests over the same channel, or multiple channels if interleaving of downloads is desired. Since only one connection is established for the lifetime of the conversation, the overhead of three-way handshakes, MTU determination, authentication, encryption negotiation, and TCP window "slow start" is incurred only once, saving the bandwidth and processing power that go with all of that.
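Interleaving several downloads over one connection can be sketched as follows. The frame layout here, a `(channel, sequence, payload)` tuple, is an assumption chosen for readability and is not the BEEP frame format; the function names and sample payloads are likewise hypothetical.

```python
from collections import defaultdict
from itertools import zip_longest

def interleave(channels):
    """Round-robin frames from several channels onto one ordered stream.

    `channels` maps a channel number to its list of payload chunks.
    Each emitted frame is a (channel, sequence, payload) tuple.
    """
    tagged = [
        [(chan, seq, chunk) for seq, chunk in enumerate(chunks)]
        for chan, chunks in sorted(channels.items())
    ]
    stream = []
    for round_of_frames in zip_longest(*tagged):
        stream.extend(f for f in round_of_frames if f is not None)
    return stream

def reassemble(stream):
    """Rebuild each channel's payload from the shared stream."""
    parts = defaultdict(list)
    for chan, _seq, chunk in stream:
        parts[chan].append(chunk)
    return {chan: b"".join(chunks) for chan, chunks in parts.items()}

outgoing = {
    1: [b"<html>", b"...", b"</html>"],   # a page on channel 1
    3: [b"PNG-a", b"PNG-b"],              # an image on channel 3
}
wire = interleave(outgoing)
received = reassemble(wire)
```

Because every frame carries its channel number, the receiver can separate the page from the image even though their frames alternate on the wire.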
There are, however, a number of common functions not directly supported by BEEP. One of these functions is service discovery: that is, how does an application learn the address of the server to which it is interested in talking? Another of these functions is authorization: while BEEP handles authentication, it leaves authorization up to each protocol. In other words, BEEP helps you identify to whom you are connected, but does not specify how to determine what that identity is allowed to do. Both of these functions are addressed by APEX.
APEX is a BEEP profile, also defined by Marshall Rose, along with Dave Crocker (who defined the formats for Internet e-mail in 1982) and Graham Klyne (who pioneered the use of Internet e-mail to interface to telephone services and defined format and content negotiation for it). On top of BEEP, APEX adds service discovery, application-level addressing, presence information, and permission management. In other words, it defines how to find a person with whom you are interested in communicating, regardless of where they are, much as an electronic mail address does. It also defines how one application can choose which other application to communicate with, allowing a chat program and a shared whiteboard (for example) to both be available. It allows discovery of presence information as well, meaning that one can be notified (for example) when a coworker logs in or becomes available for a video conference. Finally, it provides a standard mechanism for permissions, specifying a service and database for defining permissions associated with user names, applications, and transactions.
At the protocol level, APEX allows an application to communicate with a peer by sending an arbitrary MIME message into a "mesh" of interconnected relays, which in turn pass the message to the appropriate endpoint. All the routing is taken care of without concerning the application: the proper peer is found, regardless of where it is connected. An endpoint can be a service, which makes APEX suitable for client-server applications, since such services are not associated with any particular person. For example, retrieving presence information and permission information is implemented via messages to the presence and permission services, each of which is addressed as an APEX endpoint. One needs the presence service available even if the person being sought is not connected at the time; otherwise, one could not tell that the person is not connected.
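The idea of handing a message to a mesh and letting the relays find the endpoint can be sketched with a toy model. It assumes one relay per domain and addresses of the form `local@domain`; the `Relay` class, its methods, and the example addresses are hypothetical and do not reproduce APEX's actual message formats.

```python
class Relay:
    """Toy relay serving one administrative domain (hypothetical model)."""

    def __init__(self, domain, mesh):
        self.domain = domain
        self.mesh = mesh            # shared dict: domain -> Relay
        self.attached = {}          # local part -> list used as an inbox
        mesh[domain] = self

    def attach(self, local):
        """Register an endpoint with this relay and return its inbox."""
        inbox = self.attached[local] = []
        return inbox

    def send(self, recipient, content):
        """Deliver locally, or hand off to the relay for the recipient's domain."""
        local, _, domain = recipient.partition("@")
        if domain != self.domain:
            relay = self.mesh.get(domain)
            return relay.send(recipient, content) if relay else False
        if local not in self.attached:
            return False            # known domain, but nobody attached
        self.attached[local].append(content)
        return True


mesh = {}
com = Relay("example.com", mesh)
org = Relay("example.org", mesh)
fred_inbox = org.attach("fred")

delivered = com.send("fred@example.org", "hello")    # routed through the mesh
missing = com.send("wilma@example.org", "anyone?")   # endpoint not attached
```

The sending application talks only to its own relay; where the recipient is attached is the mesh's concern.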
Future application-level services could also be implemented on top of APEX. Users could take advantage of the distributed power of the Internet by launching requests into a mesh of servers that then use peer-to-peer mechanisms to locate a file, ray-trace a computer-generated video, or analyze genetic information, just to name a few popular examples. APEX provides a foundation for building a variety of such systems, once again taking care of the details common to all such applications.
An APEX endpoint can also be a person, identified by something that looks like an e-mail address. In most cases, it actually will be the person's e-mail address, since that would make the person easy to find. However, for those concerned about such matters, the protocol does not require any special connection between a person and his APEX identity. Knowing this base APEX endpoint name allows the discovery of other endpoints for this person, such as voice and FAX telephone numbers, e-mail addresses, and other APEX-enabled applications.
Finally, an APEX endpoint can address a specific APEX-enabled application being run by a specific person. Part of the presence information maintained by APEX is a list of applications currently available for communication. For example, starting up a video conferencing application might announce to everyone else in the company that this user is available to accept calls. Other applications subscribe to receive updates of this information, and applications can also see who is subscribed to get such information. As people log in and log out, a list of coworkers available for conferencing could be automatically updated in each application.
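The subscribe-and-notify behaviour described here can be sketched in a few lines. The `PresenceService` class, its method names, and the addresses are assumptions for illustration only; they are not APEX's presence service interface.

```python
class PresenceService:
    """Toy presence service (hypothetical, not the APEX presence format)."""

    def __init__(self):
        self.available = {}     # user -> set of applications ready to talk
        self.watchers = {}      # user -> {watcher name: callback}

    def subscribe(self, user, watcher, callback):
        self.watchers.setdefault(user, {})[watcher] = callback

    def watchers_of(self, user):
        """Let a user see who is subscribed to their presence."""
        return sorted(self.watchers.get(user, {}))

    def publish(self, user, applications):
        """Record the user's available applications and notify subscribers."""
        self.available[user] = set(applications)
        for notify in self.watchers.get(user, {}).values():
            notify(user, sorted(applications))


events = []
presence = PresenceService()
presence.subscribe(
    "pat@example.com", "lee@example.com",
    lambda user, apps: events.append((user, apps)),
)
presence.publish("pat@example.com", ["videoconf"])   # pat starts the application
presence.publish("pat@example.com", [])              # pat logs out
```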
Of course, few people want to be available to everyone in the world. APEX defines a permission management service, which allows users or administrators to define who is allowed to send various types of messages to various applications for each user. One might be willing to accept conference calls from anyone in one's company, but only share files with those in one's department, while letting anyone in the world leave virtual sticky-note messages on one's screen. All these permissions can be specified with a simple utility using a standard format specified by APEX, again solving the problem just once.
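The policy in that example can be written down as a small table. The sketch below assumes shell-style wildcard patterns and invented application names and domains; it is not the access format that APEX specifies.

```python
from fnmatch import fnmatchcase

# Hypothetical access list for one user: (who may send, to which application).
ACCESS = [
    ("*@example.com", "conference"),       # anyone in the company
    ("*@eng.example.com", "fileshare"),    # only one's own department
    ("*", "stickynote"),                   # anyone in the world
]

def allowed(sender, application, access=ACCESS):
    """Return True if some entry grants `sender` access to `application`."""
    return any(
        fnmatchcase(sender, who) and application == app
        for who, app in access
    )
```

For instance, `allowed("bob@example.com", "conference")` is true under this table, while `allowed("bob@example.com", "fileshare")` is not, because the sender is outside the department pattern.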
There are a number of benefits to using a single foundation and framework (like BEEP) for multiple protocols, particularly when the foundation supports multiplexing, privacy, and authentication. Some of the benefits come from the usual "N plus M" rather than "N times M" structure of such a framework. That is, each foundation functionality (such as encryption, authentication, concurrency, and so on) works easily with each application (such as file sharing, chat, database search, and so on). Hence, improving the encryption portion of the framework improves every application implemented on that framework, and any new application automatically makes use of the best features of the framework. That's the "N plus M" part: "N" foundation implementations and "M" application implementations on top. When every application reinvents everything it needs, one ends up with "N" implementations for each of "M" applications, most of which are incompatible.
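As a small worked count, using the three foundation functions and three applications named above as sample values:

```python
foundation = ["encryption", "authentication", "concurrency"]       # N = 3
applications = ["file sharing", "chat", "database search"]         # M = 3

# Shared framework: each piece is written once.
with_framework = len(foundation) + len(applications)

# No framework: every application reimplements every foundation function.
without_framework = len(foundation) * len(applications)
```

With these sample lists the shared framework needs 6 implementations and the reinvent-everything approach needs 9, and the gap widens as either list grows.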
As an example of this problem, one can look at FTP and HTTP. Both are designed as client-server applications for transferring files. Both support user name and password authentication, but they are incompatible with each other. Indeed, in modern web browsers that handle both HTTP and FTP, the code must implement both methods for authenticating the client to the server. HTTP supports SSL encryption, while FTP normally does not, again even when both are coded in the same client program. Caching of open connections is also done in completely different ways, as is pipelining of requests, framing, and so on.
A unified framework allows one library not only to service multiple protocols in the same client, but also to service multiple clients. Indeed, with BEEP it is possible to serve completely different applications over the same connection at the same time. For example, reconfiguring an Apache server to add a new BEEP profile would integrate an entirely new application into the server, making use of all the configuration and authentication code already present.
A less obvious benefit of separating the privacy and authentication mechanisms from the application protocol can be demonstrated with an example. In HTTP and FTP, user authentication is based on user names and passwords, and this is coded directly into the protocol. If the operating system underlying the implementation does not use user names and passwords for authentication, the protocols must be perverted to normalize user credentials to fit into the model the protocol presents. For example, if a security rating needs to be attached to a request, or a security token generating single-use passwords is required, there is no place in HTTP or FTP to supply that information. Instead, some "misuse" of the mechanism must be sanctioned, such as requiring different user names for the same user, depending on the security rating desired.
Contrast this with BEEP, in which authentication is based on SASL, a standardized interface that can support a literally unlimited number of authentication schemes, including mechanisms not yet invented. For example, a sufficiently uptight organization could supply employees with a new SASL mechanism that would handle retina scanners, and all their BEEP applications would automatically be able to use this. Similar benefits arise from the privacy functionality being in the framework rather than the application.
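The benefit of a pluggable authentication layer can be sketched with a toy registry. This is not the SASL API: the `mechanism` decorator, the mechanism names, and the demo credentials are all invented for illustration. What it shows is that the calling code (`authenticate`) does not change when a new mechanism is installed.

```python
MECHANISMS = {}   # mechanism name -> function that checks credentials

def mechanism(name):
    """Decorator that installs a credential check under `name`."""
    def register(func):
        MECHANISMS[name] = func
        return func
    return register

@mechanism("password")
def check_password(credentials):
    return credentials.get("password") == "s3cret"   # demo secret only

def authenticate(offered, credentials):
    """Use the first offered mechanism that is installed locally."""
    for name in offered:
        if name in MECHANISMS:
            return name, MECHANISMS[name](credentials)
    return None, False

# Before the new mechanism exists, negotiation falls back to passwords.
before = authenticate(["retina", "password"], {"password": "s3cret"})

@mechanism("retina")
def check_retina(credentials):
    return credentials.get("scan") == "demo-scan"    # stand-in for a scanner

# Same application code, now using the newly installed mechanism.
after = authenticate(["retina", "password"], {"scan": "demo-scan"})
```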
Having the concurrency mechanisms in the framework enables a number of sophisticated applications that would otherwise likely be just too difficult. Just as most modern sophisticated programs rely on the underlying operating system to schedule multiple processes and threads, creating sophisticated and efficient peer-to-peer software is simplified when one uses a tested and stable framework for managing bandwidth and framing. Once the framework is available, the kinds of simplifications that early versions of HTTP used are no longer necessary. When starting from scratch, one must slowly build up functionality as various needs become pressing, striving to maintain both efficiency and compatibility. However, with an effective framework supplying the common and tedious parts of a network-enabled application, one can concentrate on making the actual protocol the best possible, rather than worrying about minutiae that are nevertheless vital to get right.
There is a fundamental difference between an application framework and a single-application protocol. An application framework provides a set of functions that can be used in a variety of applications, saving implementation and testing effort and promoting interoperability. Frameworks (using the term in a broader sense) include systems like BEEP and TCP (data transfer frameworks), MIME and XML (data description frameworks), relational database theory (a data management framework), and so on. Each of these was designed from the start to be independent of any given application. Each was designed with extensibility in mind, making the framework useful for a wide variety of current and future applications, including application areas completely unforeseen today.
In contrast, specific application protocols, such as HTTP, SMTP, and SNMP, are each designed for a specific purpose. HTTP was originally designed to transport research papers, SMTP for electronic mail, and SNMP for network management data. Because underlying infrastructure for these systems already exists and is widely deployed, some application engineers use these systems for new applications. For example, some people run shared whiteboard applications over the web, while others clear credit-card charges via electronic mail. In each case, using a protocol designed for one purpose as the substrate for a completely different application can lead to innumerable practical problems. In the first example, getting second-by-second updates to web pages through caching proxies is problematic. In the second example, the vast number of different electronic mail client applications makes any sort of automation difficult to make reliable. In general, the lack of designed-in flexibility is a practical impediment to making a new application with different characteristics use an old application's protocol. Using a client-server protocol as the underlying transport for a peer-to-peer application generally results in a fragile and inefficient system.
There is also often a temptation to gateway one application to another. Sometimes this is simply to make the original application more accessible, such as a gateway from network newsgroups to web pages. Others, such as Java and other "plug-ins", use HTTP as a gateway protocol for distributing executable code, which predictably leads to all kinds of security problems.
A gateway between one system and another leads to a situation where the combination as a whole has all the problems of each component, with few (if any) of the benefits of the whole. This is called "impedance mismatch", in an analogy with electrical engineering, where part of a signal "bounces back" from a connection between two wires of different sizes. As an example in protocols, consider online banking served over the web. The bank is certainly using a database system with sophisticated logging, authentication, and authorization components. It is virtually certain that every action taken by any employee of the bank can be traced back to that employee individually. Yet, when banking services are provided to customers over the web, the database most likely sees every user as "a web customer", leaving the details of logging, authentication, and authorization to the scripts on the web server. It is up to the web server software, not the database where all the information is stored, to prevent one user from seeing another user's accounts. This naturally leads to two potential failure points, either of which could allow a break-in.
One of the most common tendencies is to use HTTP as a carrier for inappropriate applications because there is already a "hole" in corporate firewalls allowing HTTP connections to pass. However, for actual business use, such an end-run is unwise. If there is a legitimate reason to be using the protocol, it should be easy to have the appropriate port opened in the firewall. One would expect to give a legitimate office key to the cleaning staff, rather than having one of the employees sneak down to the hardware store and make a copy to save on paperwork. Similarly, the ease of getting through a corporate "internet door" without telling the security staff should not dictate the form of an application critical to a business's success. The loss of efficiency and security is just not worth the savings of using a back door. And since HTTP is a client-server protocol with a short connection lifetime, mapping a peer-to-peer protocol with extensive state information onto it is difficult to implement well, inefficient, usually less secure, and generally not worth the effort compared to doing it right in the first place.
Even though peer-to-peer applications are becoming more popular, developing new peer-to-peer application protocols outside of the context of reusable protocol frameworks is wasteful, inefficient, and error-prone. Using reusable frameworks will reduce complexity, increase efficiency and interoperability, speed standardization and implementation, and mitigate risk. BEEP and APEX can be the scrubbing bubbles that clear out the layers of accumulated waxy build-up of inappropriate protocol use, allowing the natural beauty of the Internet to shine through.
The author appreciates the valuable comments of Marshall Rose, Wen New, Marco Gazzetta, and Kris Magnusson.