SIGMETRICS 2001 / Performance 2001

Analysis and Implementation of Software Rejuvenation in Cluster Systems

Authors
K. Vaidyanathan
Duke University

R. E. Harper
IBM

S. W. Hunter
IBM

K. S. Trivedi
Duke University
 

Abstract
Several recent studies have reported the phenomenon of "software aging", in which the state of a software system degrades over time. Aging may eventually lead to performance degradation, a crash or hang failure, or both. "Software rejuvenation" is a proactive technique that aims to prevent unplanned outages due to aging. The basic idea is to stop the running software, clean its internal state, and restart it. In this paper, we discuss software rejuvenation as applied to cluster systems, where it is both an innovative and an efficient way to improve system availability and productivity. Using Stochastic Reward Nets (SRNs), we model and analyze cluster systems that employ software rejuvenation. For our proposed time-based rejuvenation policy, we determine the optimal rejuvenation interval based on system availability and cost. We also introduce a new rejuvenation policy based on prediction and show that it can dramatically increase system availability and reduce downtime cost. These models are very general and can capture a multitude of cluster system characteristics, failure behaviors, and performability measures, which we are just beginning to explore. We then briefly describe an implementation of a software rejuvenation system that performs periodic and predictive rejuvenation, and show some empirical data from systems that exhibit aging.
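
To illustrate what choosing an optimal interval for time-based rejuvenation involves, the following is a minimal sketch for a single node. It is not the SRN model from the paper. It assumes a Weibull time to failure with shape greater than 1 (a common stand-in for aging), a fixed repair time after a failure, and a shorter fixed rejuvenation time, and it uses a simple renewal argument: availability is expected uptime per cycle divided by expected cycle length. All function names and parameter values are hypothetical.

```python
import math


def survival(t, shape, scale):
    """P(no failure before age t) for a Weibull lifetime (assumed aging model)."""
    return math.exp(-((t / scale) ** shape))


def expected_uptime(delta, shape, scale, steps=2000):
    """E[min(T, delta)]: integral of the survival function over [0, delta],
    approximated with the trapezoid rule."""
    h = delta / steps
    total = 0.5 * (survival(0.0, shape, scale) + survival(delta, shape, scale))
    for i in range(1, steps):
        total += survival(i * h, shape, scale)
    return total * h


def availability(delta, shape, scale, repair_time, rejuv_time):
    """Long-run availability when the node is rejuvenated at age delta
    unless it fails first (renewal-reward ratio)."""
    up = expected_uptime(delta, shape, scale)
    p_fail = 1.0 - survival(delta, shape, scale)
    # A cycle ends either in an unplanned repair or a planned rejuvenation.
    down = p_fail * repair_time + (1.0 - p_fail) * rejuv_time
    return up / (up + down)


def best_interval(candidates, **params):
    """Grid search: the candidate interval with the highest availability."""
    return max(candidates, key=lambda d: availability(d, **params))


# Hypothetical parameters, in hours: failures cluster around 240 h of uptime,
# a repair takes 4 h, a planned rejuvenation takes 10 minutes.
params = dict(shape=2.0, scale=240.0, repair_time=4.0, rejuv_time=1.0 / 6.0)
candidates = list(range(10, 1000, 10))
best = best_interval(candidates, **params)
print("best interval (h):", best)
print("availability at best interval:", availability(best, **params))
print("availability with rare rejuvenation:", availability(990, **params))
```

Under these assumptions, rejuvenating too often wastes time on planned restarts, while rejuvenating too rarely lets aging failures and their longer repairs dominate, so the best interval lies between the two extremes. A cost-based variant would weight the two kinds of downtime differently instead of treating an hour of each as equal.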

[Last updated Fri Mar 23 2001]