We crawled a URL dataset parsed from the Alexa "Top Sites in United States" list on January 7, 2013. The dataset is available as Top 500 (plaintext) and Top 500 HTML crawl scripts.
After a series of crashes caused by websites containing non-Western character sets, we decided to focus on the top US sites instead of the top global sites. The crawl scripts were generated with the CrawlGenerator program described in the "Setting up an automatic multi-website crawl" section of the How-To Appendix.
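To illustrate how a plaintext site list can be turned into an HTML crawl script, here is a minimal sketch. It is not the CrawlGenerator program itself, whose implementation is not shown here. The function names (parse_site_list, build_crawl_page), the one-site-per-line input format, the window name crawl_target, and the 15-second dwell time are all assumptions made for illustration.

```python
import json


def parse_site_list(text):
    """Turn a plaintext list (assumed: one site per line) into URLs.

    Blank lines and lines starting with '#' are skipped. Entries without
    a scheme get an 'http://' prefix.
    """
    urls = []
    for line in text.splitlines():
        site = line.strip()
        if not site or site.startswith("#"):
            continue
        if "://" not in site:
            site = "http://" + site
        urls.append(site)
    return urls


def build_crawl_page(urls, dwell_ms=15000):
    """Return an HTML page that opens each URL in turn in one named window.

    dwell_ms is a hypothetical per-site waiting time in milliseconds.
    """
    # Embed the URL list as a JavaScript array literal; escape "</" so a
    # URL can never terminate the surrounding <script> element early.
    url_array = json.dumps(urls).replace("</", "<\\/")
    script = (
        "var urls = %s;\n"
        "var i = 0;\n"
        "function next() {\n"
        "  if (i >= urls.length) { return; }\n"
        "  window.open(urls[i++], 'crawl_target');\n"
        "  setTimeout(next, %d);\n"
        "}\n"
        "next();\n"
    ) % (url_array, dwell_ms)
    return (
        "<!DOCTYPE html>\n"
        "<html><head><title>Crawl script (%d sites)</title></head>\n"
        "<body><script>\n%s</script></body></html>\n"
    ) % (len(urls), script)


# Example with placeholder sites, not entries from the actual dataset.
example_urls = parse_site_list("example.com\nexample.org\n")
example_page = build_crawl_page(example_urls, dwell_ms=5000)
```

Writing example_page to a file and opening it in the browser under test would then step through the sites one by one. A fixed dwell time is the simplest choice, but it does not wait for slow pages to finish loading, and browsers with a pop-up blocker may need to allow the page to open windows.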
All crawls were conducted between January 7 and January 10, 2013. We have crawl results for six devices:
Desktop (Ubuntu 12.04, Firefox 11.0) - download
Asus Transformer Pad TF300T (10.1-inch Tablet) - download
Samsung Galaxy Tab 2 (7.0-inch Tablet) - download
Emulated Nexus 7 (7.0-inch Tablet) - download
HTC Evo 4G (4.8-inch Smartphone) - download
Emulated Nexus S (4.0-inch Smartphone) - download