Valentina Bayer
bayer@cs.orst.edu
www.cs.orst.edu/~bayer
Affiliation: Department of Computer Science, Oregon State University
Advisor: Prof. Tom Dietterich
Title: Machine Learning for Diagnosis: Initial Results
Abstract
Consider the problem of diagnosing a patient (or a device).
The diagnostician iteratively selects a test to perform, analyzes the
results of the test, and decides whether there is enough information
to make a diagnosis or whether to perform another test. Each test has
a cost, and there are also costs for false positive and false negative
diagnoses. Given a complete probabilistic model of the possible
states of the system and the possible results of each test, the AO*
algorithm can be applied to compute the optimal diagnostic procedure.
However, if the probabilistic model is constructed from a sample of
data points, then the model is not known with complete certainty. We
show experimentally that if AO* is applied to such an empirical model,
it can seriously overfit the data and produce a diagnostic procedure
whose expected cost (measured using the true probabilistic model) is
far larger than optimal. We then show that this overfitting problem
can be largely eliminated by introducing a form of "statistical
pruning" into the AO* algorithm.
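
To make the cost structure described above concrete, the following is a minimal sketch of the test-or-diagnose recursion on a toy model. It is not the AO* implementation used in this work: it is a plain memoized exhaustive search, which minimizes the same expected cost but without AO*'s heuristic pruning, and it includes no statistical pruning. The two binary tests, the assumption that test results are conditionally independent given the disease, and every number in the model are hypothetical and chosen only for illustration.

```python
from functools import lru_cache

# Hypothetical toy model: every number below is an illustrative assumption.
PRIOR = 0.3                  # P(disease)
SENS = (0.9, 0.7)            # P(test i positive | disease)
FPR = (0.2, 0.1)             # P(test i positive | healthy)
TEST_COST = (1.0, 2.0)       # cost of performing test i
FP_COST, FN_COST = 10.0, 50.0  # costs of false positive / false negative


def joint(state):
    """Return (P(state, disease), P(state, healthy)).

    state[i] is None if test i has not been performed, else its result (0 or 1).
    Assumes test results are conditionally independent given the disease.
    """
    p_d, p_h = PRIOR, 1.0 - PRIOR
    for i, result in enumerate(state):
        if result is None:
            continue
        p_d *= SENS[i] if result else 1.0 - SENS[i]
        p_h *= FPR[i] if result else 1.0 - FPR[i]
    return p_d, p_h


@lru_cache(maxsize=None)
def optimal(state):
    """Return (expected future cost, best action) from the given state."""
    p_d, p_h = joint(state)
    total = p_d + p_h
    post = p_d / total  # posterior probability of disease

    # Option 1: stop and diagnose now, paying the expected misdiagnosis cost.
    best = min(
        ((1.0 - post) * FP_COST, "diagnose disease"),
        (post * FN_COST, "diagnose healthy"),
    )

    # Option 2: pay for one more test, then act optimally on each outcome.
    for i, result in enumerate(state):
        if result is not None:
            continue
        cost = TEST_COST[i]
        for value in (0, 1):
            nxt = state[:i] + (value,) + state[i + 1:]
            p_value = sum(joint(nxt)) / total  # P(test i = value | state)
            if p_value > 0.0:
                cost += p_value * optimal(nxt)[0]
        best = min(best, (cost, f"perform test {i}"))
    return best


if __name__ == "__main__":
    cost, action = optimal((None, None))
    print(f"expected cost {cost:.3f}, first action: {action}")
```

In this sketch the probabilities are treated as known exactly. If SENS, FPR, and PRIOR were instead estimated from a small sample, the same recursion would optimize against the estimates, which is the setting in which the overfitting discussed above can arise.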