Discussions about Software at UMCP

October 9, 2006
Sandro Fouche, Adaptive Sampling for Classification of Software Executions.

There is an increasing interest in techniques that support measurement and analysis of fielded software systems. Previously we developed a lightweight technique for automatically classifying execution data as belonging to one of several classes. This technique produced accurate classification models, while greatly reducing the data collection burdens needed by previous approaches. To do this we first used a sequential data collection strategy in which each program instance collects data from only a small weighted random sample (say 5%) of all potential measurements. Initially, the sampling probabilities (weights) are uniform across all measurements, but as each instance returns its observed execution data we adapt the weights to favor measurements with demonstrated predictive power. Next, after all data has been collected, we fed them to our newly invented learning algorithm, association trees. We specially designed this technique for use with the very sparse data sets created by our sampling approach. Our empirical evaluation suggested that this approach was nearly as good as traditional techniques, while collecting only a fraction of the data. One key limitation of this technique, however, is that, while it limits the absolute number of measurement probes enabled in each program instance, this may not translate to large reductions in runtime overhead when probe costs are non-uniform. Here, we present refinements to our technique to correct this situation, as well as, empirical results evaluating the effectiveness of our new approach and discuss tradeoffs between the previous and current techniques.

Web Accessibility