We crawled a URL dataset parsed from the Alexa "Top Sites in United States" list on January 7, 2013. The dataset is available in two forms: Top 500 (plaintext), Top 500 HTML crawl scripts

After a series of crashes caused by websites containing non-Western character sets, we decided to focus on the Top US Sites instead of the Top Global Sites. The crawl scripts were generated with the CrawlGenerator program described in the "Setting up an automatic multi-website crawl" section of the How-To Appendix.
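
For readers who want to work with the plaintext Top 500 list directly, the following is a minimal sketch of loading it into a clean list of URLs. It is not the CrawlGenerator program and does not reproduce its output. The file layout (one site per line, optionally without a scheme), the file name top500.txt, and the helper name load_urls are all assumptions made for illustration.

```python
def load_urls(lines):
    """Return de-duplicated URLs from an iterable of text lines.

    Assumed input format (not confirmed by the dataset): one site per
    line, blank lines and lines starting with '#' ignored.
    """
    seen = set()
    urls = []
    for raw in lines:
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue
        # Assumption: entries without a scheme are plain domains.
        if "://" not in entry:
            entry = "http://" + entry
        if entry not in seen:
            seen.add(entry)
            urls.append(entry)
    return urls


if __name__ == "__main__":
    # "top500.txt" is a hypothetical local copy of the plaintext list.
    with open("top500.txt", encoding="utf-8") as handle:
        for url in load_urls(handle):
            print(url)
```

The function takes any iterable of lines rather than a path, so it can be tried on a short in-memory list before pointing it at the downloaded file.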


We have crawl results for six devices. All crawls were conducted between January 7 and January 10, 2013:

Desktop (Ubuntu 12.04, Firefox 11.0) - download

Asus Transformer Pad TF300T (10.1-inch Tablet) - download

Samsung Galaxy Tab 2 (7.0-inch Tablet) - download

Emulated Nexus 7 (7.0-inch Tablet) - download

HTC Evo 4G (4.8-inch Smartphone) - download

Emulated Nexus S (4.0-inch Smartphone) - download

data.txt · Last modified: 2013/02/12 19:26 (external edit)