

Scalable Infrastructure for Wide-area Uploads

Hotspots are a major obstacle to achieving scalability in the Internet; they are usually caused by either high demand for some data or high demand for a certain service. At the application layer, hotspot problems have traditionally been dealt with using some combination of increasing capacity, spreading the load over time and/or space, and changing the workload. Some examples of these are data replication (web caching, ftp mirroring), data replacement (multi-resolution images, video), service replication (DNS lookup, Network Time Protocol), and server push (news or software distribution).

These classes of solutions have been studied in the context of applications using the following types of communication: (a) one-to-many (data travels primarily from a server to multiple clients, e.g., web download, software distribution, video-on-demand); (b) many-to-many (data travels between multiple clients, through either a centralized or a distributed server, e.g., chat rooms, video conferencing); and (c) one-to-one (data travels between two clients, e.g., e-mail, e-talk). However, to the best of our knowledge there is no existing work, except ours [D21, D24, D25], on making applications that use many-to-one communication scalable and efficient; existing solutions, such as web-based uploads, simply use many independent one-to-one transfers. Many-to-one communication underlies an important class of applications: upload applications such as submission of income tax forms to the IRS, conference paper submission, proposal submission through the NSF FastLane, homework and project submission in distance education, Internet-based storage, and many more. The main focus of our work is scalable infrastructure design for wide-area upload applications.

Traditional solutions aimed at downloads are data replication (e.g., caching) and data replacement. These techniques are not applicable to uploads, since all the uploaded data is distinct. Recently [D21] we proposed Bistro, a framework for building scalable wide-area upload applications that uses intermediaries, termed bistros, to improve the efficiency and scalability of uploads. We observed that hotspots in many upload applications are due to approaching deadlines and long transfer times (although here we focus on uploads with deadlines, our framework can provide a scalable solution to other upload applications as well). We also observed that what many upload applications actually require is an assurance that specific data was submitted before a specific time; the transfer of the data needs to be done in a timely fashion, but does not have to complete by that deadline, since the data is often not consumed by the server immediately upon receipt.

Thus, our approach is to break the original deadline-driven upload problem into the following pieces: (a) a real-time timestamp subproblem, where we ensure that the data is timestamped and cannot subsequently be tampered with; (b) a low-latency commit subproblem, where the data goes "somewhere" (to an intermediary) and the user is assured that the data is safely and securely "on its way" to the server; and (c) a timely data transfer subproblem, which can be carefully planned (and coordinated with other uploads) and results in data delivery to the original destination. This means that we have taken a traditionally synchronized client-push solution and replaced it with a non-synchronized solution that uses some combination of client-push and server-pull approaches. Consequently, we eliminate the hotspots by spreading most of the demand on the server over time.
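The three-way decomposition above can be illustrated with a small sketch. This is not the protocol of [D21] or [D24]; it is a minimal toy model under stated assumptions, and every name in it (`Ticket`, `DestinationServer`, `Bistro`, `timestamp`, `commit`, `pull`) is hypothetical. It assumes the timestamp step sends only a SHA-256 digest to the destination server, which returns an HMAC-protected ticket, that an intermediary then stores the full data, and that the server pulls the data later and checks it against the ticket.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    """Hypothetical timestamp ticket issued by the destination server."""
    digest: str       # hex SHA-256 of the upload
    issued_at: float  # server clock when the digest was presented
    tag: str          # HMAC over (digest, issued_at); prevents forgery


class Bistro:
    """Hypothetical intermediary that holds data until the server pulls it."""

    def __init__(self):
        self._store = {}

    def commit(self, data: bytes, ticket: Ticket) -> bool:
        # Step (b): low-latency commit. Accept only data matching the ticket.
        if hashlib.sha256(data).hexdigest() != ticket.digest:
            return False
        self._store[ticket.digest] = data
        return True

    def fetch(self, digest: str):
        return self._store.get(digest)


class DestinationServer:
    """Hypothetical destination server with a submission deadline."""

    def __init__(self, key: bytes, deadline: float):
        self._key = key
        self.deadline = deadline
        self.received = {}

    def _tag(self, digest: str, issued_at: float) -> str:
        msg = f"{digest}|{issued_at!r}".encode()
        return hmac.new(self._key, msg, hashlib.sha256).hexdigest()

    def timestamp(self, digest: str, now: float) -> Ticket:
        # Step (a): only a short digest reaches the server near the deadline.
        return Ticket(digest, now, self._tag(digest, now))

    def pull(self, bistro: Bistro, ticket: Ticket) -> bool:
        # Step (c): transfer at a time of the server's choosing, then verify.
        expected = self._tag(ticket.digest, ticket.issued_at)
        if not hmac.compare_digest(ticket.tag, expected):
            return False  # ticket was forged or altered
        if ticket.issued_at > self.deadline:
            return False  # digest was timestamped after the deadline
        data = bistro.fetch(ticket.digest)
        if data is None or hashlib.sha256(data).hexdigest() != ticket.digest:
            return False  # data missing or tampered with in transit
        self.received[ticket.digest] = data
        return True


# Example: the digest is timestamped before the deadline (t = 99.0), while
# the bulk transfer to the server can happen at any later time.
server = DestinationServer(key=b"demo-key", deadline=100.0)
bistro = Bistro()
paper = b"conference paper bytes"
ticket = server.timestamp(hashlib.sha256(paper).hexdigest(), now=99.0)
committed = bistro.commit(paper, ticket)
delivered = server.pull(bistro, ticket)
```

In this toy model the server handles only a fixed-size digest per client at deadline time, which is the sense in which the bulk of the demand is moved away from the deadline. Confidentiality of the data held at the intermediary is not modeled here.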

Bistro's ability to share an infrastructure, such as an infrastructure of proxies, among a variety of wide-area applications has clear advantages over the more traditional solutions. In [D25], we conducted a performance study that demonstrated the potential performance gains of the Bistro framework and provided insight into the general upload problem. Moreover, Bistro does not rely on the existence of a private infrastructure, although it does not preclude one either. Because confidentiality of data and other security issues are especially important in upload applications, and in our solution, which introduces untrusted (public) intermediaries (i.e., bistros), we also developed [D24] a secure data transfer protocol within the Bistro framework, which not only ensures the privacy and integrity of the data but also takes scalability considerations into account.

[Last updated Sat Jul 29 2000]