Mining Reliable Information from Crowdsourced Data
With the proliferation of mobile devices and social media platforms, any person can publicize observations about any activity, event or object anywhere and at any time. Taken together, this enormous volume of crowdsourced data can support an inexpensive, sustainable and large-scale decision support system that has never been possible before. The main obstacle in building such a system is information veracity, i.e., it is hard to distinguish true or accurate information from false or inaccurate information. In this talk, I will present our efforts towards solving the information veracity challenge when crowdsourced data are ubiquitous but their reliability is suspect. When no supervision is available, we model the task as an optimization problem that jointly searches for source reliability and true facts. I will show how our proposed models handle different kinds of data, including data with long-tail distributions, data of heterogeneous types, spatial-temporal data, and streaming and distributed data, and how they can support a wide range of applications, including crowdsourced question answering, knowledge base construction and environmental monitoring. When a small set of annotated samples (training data) is available, we model the reliability assessment problem as a binary classification task. Using fake news detection on social media as an example, I will introduce our proposed framework, which incorporates adversarial learning and reinforcement learning to extract event-invariant features and leverage user feedback for improved detection performance. At the end of the talk, I will briefly introduce my other work on integrating complementary views for improved inference in the healthcare and transportation domains.
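
To make the unsupervised setting concrete, the idea of jointly estimating source reliability and true facts can be sketched as an alternating procedure: fix the source weights and pick the best-supported value for each item, then fix those values and re-estimate each source's weight from how often it disagrees. The sketch below is only an illustration of that general idea, not the models presented in the talk; the function name `discover_truths`, the log-based weight update, the smoothing constant and the toy claims are all assumptions made for this example.

```python
import math
from collections import defaultdict


def discover_truths(claims, iterations=10, eps=1e-6):
    """Illustrative sketch: claims is a list of (source, item, value) triples
    with categorical values. Returns (truths, weights)."""
    sources = {s for s, _, _ in claims}
    weights = {s: 1.0 for s in sources}  # start by trusting every source equally
    truths = {}
    for _ in range(iterations):
        # Step 1: with weights fixed, take the value with the largest
        # weighted vote for each item (ties broken alphabetically).
        votes = defaultdict(lambda: defaultdict(float))
        for s, item, value in claims:
            votes[item][value] += weights[s]
        truths = {item: max(sorted(v), key=v.get) for item, v in votes.items()}

        # Step 2: with truths fixed, count each source's disagreements and
        # give sources with a smaller share of the total error a larger weight.
        errors = {s: 0.0 for s in sources}
        for s, item, value in claims:
            if truths[item] != value:
                errors[s] += 1.0
        total = sum(errors.values()) + eps * len(sources)
        weights = {s: -math.log((errors[s] + eps) / total) for s in sources}
    return truths, weights


# Hypothetical toy data: source "c" is wrong on every item it reports.
claims = [
    ("a", "capital_fr", "Paris"), ("b", "capital_fr", "Paris"), ("c", "capital_fr", "Lyon"),
    ("a", "capital_de", "Berlin"), ("b", "capital_de", "Berlin"), ("c", "capital_de", "Munich"),
    ("a", "capital_it", "Rome"), ("c", "capital_it", "Milan"),
]
truths, weights = discover_truths(claims)
print(truths)
print(weights)
```

In this toy example, plain voting cannot decide the third item because each candidate value has exactly one supporter. Once source "c" has been down-weighted for disagreeing with the other sources elsewhere, the weighted vote settles on the value reported by source "a".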