RESEARCH INTERESTS
I am mainly interested in building large-scale distributed systems, with a special focus on peer-to-peer (P2P) and Grid systems. These systems are by nature decentralized at multiple levels, e.g., ownership, trust, and resource management. I believe that decentralization poses a challenging set of problems and that solutions in this space have the potential for high social impact, as the rapid adoption of recent P2P and Grid research indicates.
PAST AND CURRENT PROJECTS
I am an experimentalist. My approach is (1) to characterize existing systems to understand their usage patterns, limitations, and the factors contributing to their success; (2) to build new systems optimized for common usage patterns while trying to capitalize on existing successful designs; and (3) to maintain close contact with users and application groups to ensure that my research remains relevant.
Large-scale system characterization. Spurred by widespread Internet connectivity, P2P systems have gained popularity at an unprecedented rate: few, if any, other human-built systems have grown this large this fast. I believe this dynamic makes P2P systems interesting artifacts to study, as they anticipate both the requirements of and potential solutions for a new generation of distributed systems. My work has focused on understanding the macroscopic properties of these large-scale, self-organizing systems [4, 5, 6, 9, 16, 18], on studying incentives for participation and fair sharing [10, 18, 22], and on investigating mechanisms to improve resource usage efficiency [2, 7, 8, 12, 13, 20, 21, 22]. I studied the Gnutella network, estimated the traffic it generates, and determined that its naïve mapping of the overlay onto the underlying Internet topology is the main barrier to its growth [4]. The data collected during this project have been used by numerous external groups to model realistic unstructured P2P topologies. I analyzed another P2P file-sharing network, Kazaa, using traces collected at a commercial ISP, with the goal of understanding Kazaa's content, user behavior, and node availability, and of quantifying the potential benefits of introducing transparent caches at ISPs to reduce the traffic it generates [9, 13]. I also examined data-sharing patterns in three large user communities: the Web, Kazaa, and a scientific collaboration. This effort uncovered the small-world structure of these sharing patterns [16] and proposed solutions that exploit this structure for data location and placement [6].
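The small-world property can be checked with two standard graph metrics. A small-world graph has a clustering coefficient much higher than that of a random graph of the same size and density, while keeping a comparably short average path length. The Python sketch below builds a data-sharing graph from a hypothetical trace format (a dict mapping each user to the set of files they requested) and assumes, for illustration only, that two users are linked when they requested at least `min_common` common files. The function names are hypothetical; this is not the analysis code used in [16].

```python
from collections import deque
from itertools import combinations


def data_sharing_graph(requests, min_common=1):
    """Link two users if they requested at least `min_common` of the same files.

    `requests` maps user -> set of requested file names (illustrative format).
    """
    adj = {u: set() for u in requests}
    for a, b in combinations(requests, 2):
        if len(requests[a] & requests[b]) >= min_common:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def clustering_coefficient(adj):
    """Average local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count links among u's neighbors.
        links = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def random_graph_clustering(adj):
    """Expected clustering of a random graph with the same node and edge counts."""
    n = len(adj)
    edges = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 2 * edges / (n * (n - 1)) if n > 1 else 0.0


def average_path_length(adj):
    """Mean BFS hop distance over all ordered pairs of connected users."""
    dist_sum, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        dist_sum += sum(dist.values())
        pairs += len(dist) - 1
    return dist_sum / pairs if pairs else 0.0


# Toy trace: alice, bob, and carol share files; dave shares nothing.
requests = {
    "alice": {"f1", "f2"},
    "bob": {"f1", "f3"},
    "carol": {"f1", "f2"},
    "dave": {"f4"},
}
g = data_sharing_graph(requests)
print(clustering_coefficient(g))   # 0.75
print(random_graph_clustering(g))  # 0.5
print(average_path_length(g))      # 1.0
```

On a real trace the comparison that matters is the ratio between the measured clustering coefficient and its random-graph counterpart, together with path lengths of similar magnitude.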
Building new systems. An efficient index service is a critical component of any distributed system or application. I built a replica location service [7] targeted at environments where location queries dominate updates and where the system's dynamics (e.g., node failures) cannot be neglected. This solution is based on three mechanisms: probabilistic representations of location information, soft-state protocols, and a flat overlay network to disseminate location information. These choices offer important benefits: low query latency, support for finding co-located sets of replicas, and adaptability to dynamic settings. An implementation incorporating some of these ideas is distributed with the Globus Toolkit and is used in numerous deployments [8].
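The first two mechanisms can be illustrated with a minimal sketch, assuming Bloom filters as the probabilistic representation and a fixed time-to-live for soft state. Both are illustrative choices, and `BloomFilter`, `SoftStateIndex`, and their parameters are hypothetical names, not the interface of the service described above. Each node summarizes its replica names in a compact filter; the index keeps a filter only as long as the node keeps refreshing it, so failed nodes disappear without explicit deregistration. Lookups may return false positives but never miss a registered, unexpired replica.

```python
import hashlib
import time


class BloomFilter:
    """Compact probabilistic set: no false negatives, tunable false positives."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


class SoftStateIndex:
    """Hypothetical index of per-node digests that expire unless refreshed."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.digests = {}  # node -> (BloomFilter, time of last refresh)

    def refresh(self, node, replica_names):
        """Replace the node's digest; nodes call this periodically."""
        bf = BloomFilter()
        for name in replica_names:
            bf.add(name)
        self.digests[node] = (bf, self.clock())

    def locate(self, replica_name):
        """Return nodes that may hold the replica, after dropping stale digests."""
        now = self.clock()
        stale = [n for n, (_, t) in self.digests.items() if now - t > self.ttl]
        for node in stale:
            del self.digests[node]
        return sorted(n for n, (bf, _) in self.digests.items()
                      if bf.might_contain(replica_name))


now = [0.0]  # fake clock so the example is deterministic
index = SoftStateIndex(ttl_seconds=30.0, clock=lambda: now[0])
index.refresh("siteA", ["run42/data.h5"])
index.refresh("siteB", ["run7/calib.dat"])
print(index.locate("run42/data.h5"))  # includes 'siteA'; 'siteB' only on a false positive
now[0] = 31.0  # neither site refreshed within the TTL, so both digests expire
print(index.locate("run42/data.h5"))  # []
```

The trade-off is deliberate: a false positive costs one extra remote check, while compact digests keep dissemination traffic low and let queries be answered from local state.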
This effort is in part a vehicle to explore self-organization and the limits of solutions based on unstructured overlays. Structured overlays offer an elegant way to aggregate a large number of resources, at the cost of creating and maintaining a regular overlay structure. Unstructured overlays, though less efficient for some applications, are inherently less expensive to create and maintain: they map more naturally onto heterogeneous sets of end nodes linked by a physical topology (the Internet) that displays scale-free and small-world characteristics. It is therefore interesting to contrast the two approaches across application domains. As a first step in this direction, I have built a multi-source multicast layer [25] on top of an unstructured overlay; it produces data dissemination trees with better performance, as measured by the traditional metrics of stress and stretch, than state-of-the-art solutions based on structured overlays.
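The two metrics can be made concrete with a small sketch. Following their common definitions in the end-system multicast literature, the stress of a physical link is the number of copies of the same message that cross it, and the stretch of a receiver is the ratio between its delay along the overlay tree and its direct unicast delay from the source. The code assumes a connected physical topology given as a dict of link delays and an overlay tree given as (parent, child) hops, each sent over the shortest unicast path; all names are illustrative, not taken from [25].

```python
import heapq
from collections import Counter


def shortest_path(graph, src, dst):
    """Dijkstra over {node: {neighbor: delay}}; returns (delay, node list).

    Assumes dst is reachable from src.
    """
    dist, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]


def stress_and_stretch(graph, tree_edges, source):
    """Per-link stress and per-receiver stretch of an overlay multicast tree."""
    children = {}
    for parent, child in tree_edges:
        children.setdefault(parent, []).append(child)
    stress = Counter()
    overlay_delay = {source: 0.0}
    stack = [source]
    while stack:
        parent = stack.pop()
        for child in children.get(parent, []):
            delay, path = shortest_path(graph, parent, child)
            for a, b in zip(path, path[1:]):
                stress[frozenset((a, b))] += 1  # one more copy on this link
            overlay_delay[child] = overlay_delay[parent] + delay
            stack.append(child)
    stretch = {r: overlay_delay[r] / shortest_path(graph, source, r)[0]
               for r in overlay_delay if r != source}
    return stress, stretch


# Three end hosts attached to one router R, unit delay per link.
topology = {
    "S": {"R": 1.0}, "A": {"R": 1.0}, "B": {"R": 1.0},
    "R": {"S": 1.0, "A": 1.0, "B": 1.0},
}
stress, stretch = stress_and_stretch(topology, [("S", "A"), ("A", "B")], "S")
print(max(stress.values()))  # 2: link R-A carries both overlay hops
print(stretch)               # {'A': 1.0, 'B': 2.0}
```

Lower is better for both metrics: a value of 1 for every link and receiver would match native IP multicast.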
Support for applications. I used Grid middleware, detailed performance modeling, and application-level techniques to support efficient execution of parallel scientific applications on geographically distributed resources. The challenges were, first, to coordinate resource usage across multiple administrative domains and, second, to mask latency and improve execution efficiency for these tightly coupled parallel applications through adaptive, application-specific customizations. This effort [1, 2, 3] led to winning the Gordon Bell Prize at the SC2001 conference.
Service deployment with performance guarantees. Together with A. Dan (IBM), I have designed a framework that bridges the gap between high-level service performance objectives and the resource management infrastructure used to allocate resources to these services [15, 21]. We have identified the key functions for managing service-level agreements and demonstrated their feasibility through a prototype implementation.
Utility services for Grid virtual organizations. Today, "virtual organizations" are forming around new Grid deployments at a high rate. Yet, in spite of remarkable progress, significant human and hardware resources are still needed to keep each of these organizations functioning. One factor that makes the process complex is the large number of functionally independent services that have to be deployed, configured, and maintained: information services, services to orchestrate data transfers and processing, planning and scheduling services, archival services, etc. Moreover, this effort is duplicated in each virtual organization. One way to reduce this deployment and operations burden is to build an infrastructure that allows virtual organizations to outsource non-critical services to utility-like service providers. The potential benefits are similar to those of outsourcing in the corporate world: specialization and economies of scale. Additional benefits come from a level playing field for competing service implementations and simplified integration of community contributions (as local changes at a service provider). Finally, this approach would free human and hardware resources in each virtual organization to focus on critical, application-specific services. This move towards utility-like service providers is only an intermediate step towards the vision of Grid computing as a ubiquitous utility. However, even this intermediate step brings significant challenges. In addition to the traditional concerns of service scalability, robustness, and security, services need to implement two new properties: contextualization and isolation. All requests must be processed at the utility in the context of the virtual organization that issues them. Additionally, the utility must isolate virtual organizations from one another, both to ensure confidentiality and to make sure that load bursts in one organization do not degrade the quality of service perceived by users in other organizations.
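Contextualization and isolation can be illustrated with a toy multi-VO service. This sketch assumes that every request carries the name of its virtual organization, keeps private state per VO (contextualization), and applies a per-VO token bucket for admission control (isolation), so a burst from one VO is throttled without consuming another VO's capacity. `UtilityService`, `TokenBucket`, and the VO names are hypothetical, and authentication is omitted.

```python
class TokenBucket:
    """Admits `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = 0.0

    def admit(self, now):
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class UtilityService:
    """Hypothetical shared service: per-VO state and per-VO admission control."""

    def __init__(self):
        self.contexts = {}  # vo -> private state, e.g. a replica catalog
        self.buckets = {}   # vo -> TokenBucket

    def register_vo(self, vo, rate, burst):
        self.contexts[vo] = {}
        self.buckets[vo] = TokenBucket(rate, burst)

    def handle(self, vo, key, value, now):
        """Store key -> value in the caller's VO context if its quota allows."""
        if vo not in self.contexts:
            raise PermissionError(f"unknown virtual organization: {vo}")
        if not self.buckets[vo].admit(now):
            return "throttled"  # only this VO's excess requests are rejected
        self.contexts[vo][key] = value  # never visible to other VOs
        return "ok"


svc = UtilityService()
svc.register_vo("physics", rate=1.0, burst=2)
svc.register_vo("climate", rate=1.0, burst=2)
burst = [svc.handle("physics", f"f{i}", "siteA", now=0.0) for i in range(5)]
print(burst)  # ['ok', 'ok', 'throttled', 'throttled', 'throttled']
print(svc.handle("climate", "f0", "siteB", now=0.0))  # ok
```

The same key `f0` lives independently in each VO's context, and the physics burst leaves the climate quota untouched.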
Congestion control for structured overlays. In the past few years, Distributed Hash Tables (DHTs) and their supporting structured overlays have gained widespread attention as an elegant and scalable way to aggregate a large number of resources. Distributed indexing and search services, distributed file systems, request indirection infrastructures, and publish-subscribe systems are only a few of the applications that attempt to capitalize on the strengths of the structured overlay substrate. Yet there has been surprisingly little effort to design and implement congestion control mechanisms for structured overlays. The current state of the art is perhaps similar to that of the Internet in the early 1980s, before a series of congestion collapse events focused efforts on incorporating congestion control mechanisms. Structured overlays present challenges similar to those of large store-and-forward networks: on one hand, to obtain high throughput, enough data must be kept "in flight" at any time; on the other hand, physical network links have limited capacity, and the overlay must have congestion control mechanisms to avoid overflowing buffers and losing messages unnecessarily. I plan to use the accumulated Internet experience to design congestion control mechanisms for structured overlays. One example is window-based control at the sender, which worked well for the Internet and can be adapted to the one-to-many usage pattern of structured overlays. A second example is the choice of congestion indicator: experience has shown that packet loss is a coarse and unreliable congestion indicator (relying on it is one of today's obstacles to achieving high throughput on high-delay, high-speed TCP networks). Instead, messages traversing a structured overlay could collect congestion indications at each node they pass through and carry them back to the original message source.
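Both ideas can be combined in a minimal sketch, assuming AIMD (additive-increase, multiplicative-decrease) windows kept per next-hop neighbor, and queue occupancy as the congestion indication collected along the route. Each overlay hop records the fullest queue seen so far in the message; the acknowledgment carries that value back, and the sender grows or shrinks the window for that neighbor. The threshold, `Message`, `forward`, and `OverlaySender` are hypothetical names, not a specified protocol.

```python
from dataclasses import dataclass


@dataclass
class Message:
    payload: bytes
    max_queue_fraction: float = 0.0  # fullest queue seen along the route


def forward(msg, node_queue_len, node_queue_cap):
    """Called at each overlay hop: record how full this node's queue is."""
    msg.max_queue_fraction = max(msg.max_queue_fraction,
                                 node_queue_len / node_queue_cap)
    return msg


class OverlaySender:
    """Per-next-hop AIMD windows driven by explicit, path-collected feedback."""

    def __init__(self, threshold=0.8, initial_window=1.0):
        self.threshold = threshold  # queue fraction treated as congestion
        self.initial_window = initial_window
        self.window = {}     # next hop -> congestion window (messages)
        self.in_flight = {}  # next hop -> unacknowledged messages

    def _window(self, next_hop):
        return self.window.setdefault(next_hop, self.initial_window)

    def can_send(self, next_hop):
        return self.in_flight.get(next_hop, 0) < self._window(next_hop)

    def on_send(self, next_hop):
        self.in_flight[next_hop] = self.in_flight.get(next_hop, 0) + 1

    def on_ack(self, next_hop, max_queue_fraction):
        self.in_flight[next_hop] -= 1
        w = self._window(next_hop)
        if max_queue_fraction >= self.threshold:
            # Multiplicative decrease. A fuller design would typically react
            # at most once per round-trip time, as TCP does.
            self.window[next_hop] = max(1.0, w / 2)
        else:
            # Additive increase: roughly one extra message per round trip.
            self.window[next_hop] = w + 1.0 / w


sender = OverlaySender()
for _ in range(3):  # uncongested route: the window opens
    assert sender.can_send("peerB")
    sender.on_send("peerB")
    msg = forward(Message(b"data"), node_queue_len=10, node_queue_cap=100)
    sender.on_ack("peerB", msg.max_queue_fraction)
print(round(sender.window["peerB"], 2))  # 2.9

# A two-hop route whose second node is nearly full signals congestion.
sender.on_send("peerB")
msg = forward(forward(Message(b"data"), 10, 100), 90, 100)
sender.on_ack("peerB", msg.max_queue_fraction)
print(round(sender.window["peerB"], 2))  # 1.45
```

Keeping one window per neighbor is one way to fit the one-to-many pattern: a congested next hop slows down only the traffic routed through it.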
LONG TERM VISION: SELF ORGANIZING LARGE-SCALE SYSTEMS
The traditional approach to understanding large, complex systems (like the Internet, a P2P file-sharing network, or an ecosystem) is to gather a macroscopic view of the system and/or to deconstruct it into components and attempt to understand them and their relationships. This approach, however, offers little insight into building new systems in which desired system properties emerge from local interactions between components that make independent decisions based on incomplete information. Understanding how to build such self-organizing systems is a necessary step if we are to cope, without constant guidance and supervision, with the increasing scale, volatility, and complexity of the large distributed applications we envision.