A major challenge in data management is how to manage uncertain data. Many reasons for the uncertainty exists: the data may be extracted automatically from text, it may be derived from the physical world such as RFID data, it may be integrated using fuzzy matches, or may be the result of complex stochastic models. Whatever the reason for the uncertainty, a data management system needs to offer predictable performance to queries over such data.
In this talk I will address a fundamental computational problem in probabilistic databases: given a query, what is the complexity of evaluating it over probabilistic databases? Probabilistic inference is known to be hard in general, but once we fix a query, it becomes a specialized problem. I will show that Unions of Conjunctive Queries (also known as non-recursive datalog rules) admit a dichotomy: every query is either provably #P hard, or can be evaluated in PTIME. For practicaly purposes, the most interesting part of this dichotomy is the PTIME algorithm. It uses in a fundamental way the Mobius' inversion formula on finite lattices (which is the inclusion-exclusion formula plus term cancellation), and, because of that, it can perform probabilistic inference in PTIME on classes of Boolean expressions where other established methods fail, including OBDDs, FBDDs, inference based on bounded tree widths, or d-DNNF's.