Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes

by Srijan Kumar, Robert West and Jure Leskovec

While information on the web has a tremendous positive effect on the lives of billions of people worldwide, false information can have many dangerous and harmful impacts. Hoaxes are deliberately fabricated falsehoods made to masquerade as truth. In this work, we conduct a thorough study of all 20,000+ hoaxes created on Wikipedia throughout its history, and examine their impact, characteristics, and detection.

Impact of Wikipedia Hoaxes:


We measure the impact of hoaxes by quantifying (i) how long they last, (ii) how much traffic they receive (shown on left), and (iii) how heavily they are cited on the Web.
We find that most hoaxes have negligible impact along all three of these dimensions, but that 1% of hoaxes survive for over a year, 1% receive significant attention (more than 100 pageviews a day) even before being uncovered, and some are heavily referenced within Wikipedia and across the web.
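
As a rough illustration of the first two impact measures, the sketch below computes survival time and average daily pageviews from a few made-up records. The record layout, the sample values, and the helper names (survival_days, daily_pageviews, fraction) are assumptions for illustration only; they are not the data or code used in this work.

```python
from datetime import datetime

# Hypothetical records: (created, flagged as hoax, pageviews before flagging).
# These values are invented for illustration, not taken from the dataset.
hoaxes = [
    (datetime(2010, 1, 1), datetime(2010, 1, 1, 6), 3),
    (datetime(2011, 3, 5), datetime(2011, 3, 7), 40),
    (datetime(2008, 6, 1), datetime(2010, 2, 1), 70000),
]


def survival_days(created, flagged):
    """Time between creation and flagging, in days."""
    return (flagged - created).total_seconds() / 86400.0


def daily_pageviews(created, flagged, views):
    """Average pageviews per day; lifetimes under one day count as one day."""
    days = max(survival_days(created, flagged), 1.0)
    return views / days


def fraction(values, predicate):
    """Share of values that satisfy the predicate."""
    return sum(1 for v in values if predicate(v)) / len(values)


survival = [survival_days(c, f) for c, f, _ in hoaxes]
traffic = [daily_pageviews(c, f, v) for c, f, v in hoaxes]

# Thresholds follow the text: over a year, and more than 100 pageviews a day.
long_lived = fraction(survival, lambda d: d > 365)
high_traffic = fraction(traffic, lambda v: v > 100)
print(long_lived, high_traffic)
```

Treating sub-day lifetimes as one full day is a simplification chosen here to avoid inflating the daily rate of hoaxes that were caught within hours.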

Characteristics of Wikipedia Hoaxes:

We find typical characteristics of hoaxes by comparing them to non-hoax articles.
We study the characteristics along four dimensions:

  • Appearance: What the article looks like.
  • Link network: How the articles hyperlinked from an article are connected to each other (shown on right).
  • Support: How other articles refer to this article.
  • Editor: Experience of the article creator.
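
To make the link-network dimension concrete, here is a minimal sketch of one possible measure: the fraction of pairs of articles linked from a given article that are themselves linked to each other. The toy graph and the function name outlink_coherence are hypothetical, and this is only one way to quantify the idea, not necessarily the feature used in this work.

```python
from itertools import combinations

# Hypothetical link graph: article title -> set of articles it links to.
links = {
    "Target": {"A", "B", "C"},
    "A": {"B"},
    "B": {"A", "C"},
    "C": set(),
}


def outlink_coherence(article, links):
    """Fraction of pairs of outlinked articles joined by a link in either direction."""
    neighbors = sorted(links.get(article, set()))
    pairs = list(combinations(neighbors, 2))
    if not pairs:
        # Fewer than two outlinks: no pairs to compare.
        return 0.0
    connected = sum(
        1
        for u, v in pairs
        if v in links.get(u, set()) or u in links.get(v, set())
    )
    return connected / len(pairs)


print(outlink_coherence("Target", links))
```

Under this measure, a high value means the article links to topics that are related to one another, while a low value means its outlinks are scattered across unrelated articles.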

Detection of Wikipedia Hoaxes:


We build machine learning classifiers for various tasks, most notably to identify whether a given article is a hoax or not. Our algorithm achieves very high performance (shown on left). Training on appearance features alone does no better than random, but adding editor properties and link features boosts the performance. This means that faking the content of an article is easy, but faking its relation to other Wikipedia articles is not!
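
To illustrate what "no better than random" means for a classifier, the sketch below computes the area under the ROC curve (AUC), assuming that is the performance measure of interest; the page itself does not name the metric. AUC is the probability that a randomly chosen hoax receives a higher score than a randomly chosen non-hoax, so an uninformative scorer lands at 0.5. The scores and labels are made up for illustration.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one hoax and one non-hoax example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = hoax, 0 = non-hoax) and classifier scores.
labels = [1, 1, 0, 0]
uninformative = [0.5, 0.5, 0.5, 0.5]  # same score for every article
informative = [0.9, 0.4, 0.6, 0.1]    # tends to rank hoaxes higher

print(auc(uninformative, labels))
print(auc(informative, labels))
```

The pairwise loop is quadratic in the number of examples, which is fine for a small illustration; a rank-based computation would be preferable on a full dataset.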


The publicly available hoax articles and similar non-hoax articles can be downloaded below.


Newer hoaxes can be found at: Speedy Deletion Wikia and Deletionpedia.