Peer-to-Peer Services for Robust Grid Computing
Principal Investigators
Alan Sussman, Ph.D.
Pete Keleher, Ph.D.
Bobby Bhattacharjee, Ph.D.
Michael Marsh, Ph.D.
Jik-Soo Kim, M.S.
Derek C. Richardson, Ph.D.
Dennis Wellnitz, Ph.D.
Overall Project Description
Desktop grids are an increasingly common platform for performing large computations. They use opportunistic sharing to exploit large collections of personal computers and workstations across the Internet and can achieve tremendous computing power at low cost. However, current systems are typically based on a traditional client-server architecture, which has inherent shortcomings with respect to robustness, reliability, and scalability. The goal of this project is to design and build a massively scalable infrastructure for executing grid applications on a widely distributed set of resources. Such infrastructure must be decentralized, robust, highly available, and secure, while effectively mapping grid application instances to available resources throughout the system. Fortunately, these are precisely the characteristics promised by new techniques and approaches in Peer-to-Peer (P2P) systems.
We are targeting a system composed of a relatively loosely coupled set of distributed, cooperating users (peers). Our goal is to use P2P services to allow users to submit jobs to the system and to run jobs submitted by other users on any available resources that meet the minimum job requirements (e.g., memory and disk space). From a user's point of view, the overall system can be thought of as a combination of a centralized, Condor-like Grid system for submitting and running arbitrary jobs, and a system such as SETI@Home or BOINC for farming out jobs from a server to a potentially very large collection of machines, all operating in a completely distributed environment. Using P2P services can provide a robust, reliable, scalable job submission and execution system that efficiently uses widely distributed computational resources. Such a confluence of peer-to-peer and distributed computing is a natural step in the progression of Grid computing. However, as such a system scales to large configurations, matching jobs with different levels of resource requirements to the set of available heterogeneous computational resources becomes a challenging problem. Our project therefore develops a set of distributed and decentralized algorithms for submitting jobs and matching them to available resources, and uses P2P techniques both for balancing load and for resilience. We expect our scheme to scale well with system size and to be robust against component failures and peer departures and joins.
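To make the matching problem concrete, the following is a minimal sketch of its core step: filter the resources that meet a job's minimum requirements, then prefer the least loaded one. It is an illustration only and not the project's algorithm. The names `Job`, `Node`, `satisfies`, and `match` are hypothetical, and the sketch assumes a single local list of nodes, whereas in a decentralized system no peer would hold such a global view and the candidate set would have to be discovered through the P2P overlay.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Job:
    """A job with minimum resource requirements (hypothetical fields)."""
    name: str
    min_memory_mb: int
    min_disk_gb: int


@dataclass
class Node:
    """A peer offering resources; queued_jobs is a simple load measure."""
    name: str
    memory_mb: int
    disk_gb: int
    queued_jobs: int = 0


def satisfies(node: Node, job: Job) -> bool:
    """Return True if the node meets every minimum requirement of the job."""
    return node.memory_mb >= job.min_memory_mb and node.disk_gb >= job.min_disk_gb


def match(job: Job, nodes: Sequence[Node]) -> Optional[Node]:
    """Assign the job to the least-loaded node that satisfies it.

    Assumption: `nodes` stands in for whatever candidate set a real
    decentralized lookup would return; here it is just a local list.
    """
    candidates = [n for n in nodes if satisfies(n, job)]
    if not candidates:
        return None  # no resource currently meets the requirements
    chosen = min(candidates, key=lambda n: n.queued_jobs)
    chosen.queued_jobs += 1  # record the new load on the chosen node
    return chosen


if __name__ == "__main__":
    nodes = [
        Node("a", memory_mb=2048, disk_gb=50, queued_jobs=3),
        Node("b", memory_mb=4096, disk_gb=100, queued_jobs=1),
        Node("c", memory_mb=512, disk_gb=10, queued_jobs=0),
    ]
    picked = match(Job("sim", min_memory_mb=1024, min_disk_gb=20), nodes)
    print(picked.name if picked else "no match")
```

In this example node `c` is idle but too small for the job, so the choice falls between `a` and `b`, and the less loaded `b` is selected. Choosing by queue length is one simple load-balancing heuristic among many.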