Valentina Bayer
bayer@cs.orst.edu
www.cs.orst.edu/~bayer
Affiliation: Department of Computer Science, Oregon State University
Advisor: Prof. Tom Dietterich
Title: Machine Learning for Diagnosis: Initial Results

Abstract

Consider the problem of diagnosing a patient (or a device). The diagnostician iteratively selects a test to perform, analyzes its results, and decides whether there is enough information to make a diagnosis or whether another test is needed. Each test has a cost, and there are also costs for false positive and false negative diagnoses. Given a complete probabilistic model of the possible states of the system and the possible results of each test, the AO* algorithm can be applied to compute the optimal diagnostic procedure. However, if the probabilistic model is constructed from a sample of data points, then the model is not known with complete certainty. We show experimentally that if AO* is applied to such an empirical model, it can seriously overfit the data and produce a diagnostic procedure whose expected cost (measured using the true probabilistic model) is far larger than optimal. We then show that this overfitting problem can be substantially reduced by introducing a form of "statistical pruning" into the AO* algorithm.
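
As a rough illustration of the setting described above, the following Python sketch computes the expected cost of the best diagnostic procedure for a tiny, made-up model by exhaustive recursion over the "diagnose now or perform another test" choices. It is not AO*, which searches the same AND/OR graph but uses heuristic bounds to avoid expanding all of it, and it contains no statistical pruning. The function name optimal_cost, the joint-table representation, the binary state and binary test results, and all numbers are assumptions made for illustration only.

```python
def optimal_cost(joint, test_costs, fp_cost, fn_cost, observed=None):
    """Expected cost of the best diagnostic procedure from the current state.

    joint:      dict mapping (y, x) -> probability, where y in {0, 1} is the
                true state (1 = faulty/sick) and x is a tuple of binary
                test results (hypothetical representation).
    test_costs: list with the cost of each test, indexed like x.
    observed:   dict mapping test index -> result already seen.
    """
    observed = observed or {}

    # Condition the model on the test results observed so far.
    consistent = {
        key: p for key, p in joint.items()
        if all(key[1][t] == r for t, r in observed.items())
    }
    mass = sum(consistent.values())
    p_sick = sum(p for (y, _), p in consistent.items() if y == 1) / mass

    # Option 1: stop and commit to the cheaper of the two diagnoses.
    best = min(
        p_sick * fn_cost,        # diagnose "healthy": risk a false negative
        (1 - p_sick) * fp_cost,  # diagnose "sick": risk a false positive
    )

    # Option 2: pay for one more test, then act optimally on each outcome.
    for t, cost_t in enumerate(test_costs):
        if t in observed:
            continue
        cost = cost_t
        for r in (0, 1):
            p_r = sum(p for (_, x), p in consistent.items() if x[t] == r) / mass
            if p_r > 0:
                cost += p_r * optimal_cost(
                    joint, test_costs, fp_cost, fn_cost, {**observed, t: r}
                )
        best = min(best, cost)
    return best


# Hypothetical model: a single test whose result always equals the true state.
toy_joint = {(1, (1,)): 0.3, (0, (0,)): 0.7}
print(optimal_cost(toy_joint, test_costs=[1.0], fp_cost=10.0, fn_cost=20.0))
```

In this toy model, diagnosing immediately has an expected cost of min(0.3 * 20, 0.7 * 10) = 6, whereas paying 1 for the perfect test removes all misdiagnosis cost, so testing first is the better procedure. If the entries of the joint table were instead estimated from a small sample, the same recursion would optimize against the estimated probabilities, which is the situation in which the abstract reports overfitting.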