MemeTracker is an approach for extracting short textual phrases from web documents (news articles and blog posts) and then tracking how those phrases spread across the Web and how they change as they spread.
MemeTracker data contains two datasets:
A collection of raw blog posts and news media articles collected by Spinn3r and released as part of the International Conference on Weblogs and Social Media 2009.
A collection of web crawls from Stanford InfoLab. The crawls go back almost 10 years.