96 million memes from Memetracker

Dataset information

This dataset contains 96 million memes from Memetracker. Memetracker tracks the quotes and phrases that appear most frequently over time across the entire online news spectrum. This makes it possible to see how different stories compete for news and blog coverage each day, and how certain stories persist while others fade quickly.

Overall, Memetracker tracks more than 17 million different phrases. About 54% of the total phrase/quote mentions appear on blogs and 46% in news media.

For each document (blog post or news media article):


Dataset statistics
Number of documents 96,608,034
Number of memes 210,999,824
Number of links 418,237,269

Source (citation)


Files

File                      Description
quotes_2008-08.txt.gz     Memes and links from Aug 2008
quotes_2008-09.txt.gz     Memes and links from Sep 2008
quotes_2008-10.txt.gz     Memes and links from Oct 2008
quotes_2008-11.txt.gz     Memes and links from Nov 2008
quotes_2008-12.txt.gz     Memes and links from Dec 2008
quotes_2009-01.txt.gz     Memes and links from Jan 2009
quotes_2009-02.txt.gz     Memes and links from Feb 2009
quotes_2009-03.txt.gz     Memes and links from Mar 2009
quotes_2009-04.txt.gz     Memes and links from Apr 2009

Data format

P http://blogs.abcnews.com/politicalpunch/2008/09/obama-says-mc-1.html
T 2008-09-09 22:35:24
Q that's not change
Q you know you can put lipstick on a pig
Q what's the difference between a hockey mom and a pit bull lipstick
Q you can wrap an old fish in a piece of paper called change
L http://reuters.com/article/politicsnews/idusn2944356420080901?pagenumber=1&virtualbrandchannel=10112
L http://cbn.com/cbnnews/436448.aspx
L http://voices.washingtonpost.com/thefix/2008/09/bristol_palin_is_pregnant.html?hpid=topnews

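where the first letter of each line encodes:

P: URL of the document
T: time of the post (timestamp)
Q: phrase extracted from the text of the document
L: hyperlink in the document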

Note that some documents have zero phrases or zero links.
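
The record layout above can be read with a small streaming parser. The following is a minimal sketch, not an official loader. It assumes that each document starts with a P line, that the tag is separated from its value by whitespace, and that T, Q and L carry the timestamp, a phrase and a link, as the example record suggests. The names Document and parse_documents are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional


@dataclass
class Document:
    """One blog post or news article (hypothetical container type)."""
    url: str
    timestamp: Optional[str] = None
    phrases: List[str] = field(default_factory=list)
    links: List[str] = field(default_factory=list)


def parse_documents(lines: Iterable[str]) -> Iterator[Document]:
    """Yield one Document per record; a P line is assumed to start a record."""
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines carry no data
        parts = line.split(None, 1)  # tag, then the rest of the line
        tag = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        if tag == "P":
            if current is not None:
                yield current
            current = Document(url=value)
        elif current is None:
            continue  # ignore anything that precedes the first P line
        elif tag == "T":
            current.timestamp = value
        elif tag == "Q":
            current.phrases.append(value)
        elif tag == "L":
            current.links.append(value)
    if current is not None:
        yield current
```

To read one of the monthly files without decompressing it to disk, the lines can be supplied by Python's standard gzip module, for example gzip.open("quotes_2008-08.txt.gz", "rt"), and passed directly to parse_documents. Because the parser is a generator, only one document is held in memory at a time, and documents with zero phrases or zero links simply come back with empty lists.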