The master webpage for this project is hosted at the University of Maryland:
http://cs.umd.edu/~srijan/hoax/
Wikipedia has over 35,000,000 articles in over 290 languages. However, not all of these articles are genuine. Hoax articles are pure fabrications created to mislead readers.
In the paper cited below, we study all actual and wrongly suspected hoaxes ever identified in the English version of Wikipedia. Most of them have been permanently deleted from Wikipedia's version history, so we had access to them only under a non-disclosure agreement and are unable to publish the full dataset used in the paper. Instead, we publish a smaller dataset of hoaxes that are also publicly available (on websites such as Speedy Deletion Wiki or Deletionpedia), alongside an equally sized set of non-hoaxes.
This dataset contains 64 publicly available hoax articles with the following properties:
The dataset contains four folders:
File | Description | Size |
---|---|---|
wiki-hoaxes.zip | Content of hoax and non-hoax articles | 1.0 MB |
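
To check the folder layout after downloading, the archive can be inspected without extracting it. The following is a minimal sketch, assuming the file has been saved locally as `wiki-hoaxes.zip` in the working directory; the function name `count_files_per_folder` is illustrative and not part of the dataset.

```python
import zipfile
from collections import Counter
from pathlib import Path, PurePosixPath


def count_files_per_folder(zip_source):
    """Return a Counter mapping each top-level folder in the archive to its file count.

    Files stored directly at the archive root are grouped under ".".
    """
    counts = Counter()
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, count files only
            # Zip member names always use forward slashes.
            parts = PurePosixPath(info.filename).parts
            top = parts[0] if len(parts) > 1 else "."
            counts[top] += 1
    return counts


if __name__ == "__main__":
    # Assumed local path; adjust to wherever the archive was saved.
    archive_path = Path("wiki-hoaxes.zip")
    if archive_path.exists():
        for folder, n_files in sorted(count_files_per_folder(archive_path).items()):
            print(f"{folder}: {n_files} files")
    else:
        print(f"{archive_path} not found; download it first.")
```

If the archive wraps everything in a single top-level directory, the output will show only that one entry; in that case, grouping on `parts[1]` instead would reveal the folders beneath it.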