Download the FourthPartyMobile project report here
FourthParty is an open-source platform that automates the measurement of dynamic web content (e.g. cookies and JavaScript calls) by instrumenting Mozilla Firefox. It runs on virtually every modern desktop operating system. The FourthParty codebase is at the core of FourthPartyMobile, so definitely check out their website!
FourthPartyMobile is a modified version of FourthParty that supports Android-based mobile devices, such as smartphones and tablets. It is implemented in Java and JavaScript, leveraging both the Android SDK and the Mozilla Add-on SDK. Persistent storage is fully compliant with FourthParty's SQLite database schema, so traditional and mobile crawls share a standardized representation, which facilitates data analysis.
Below is a UML diagram of the FourthParty database schema (click on it to enlarge):
FourthPartyMobile was developed as part of a final project for the Fall 2012 offering of Arvind Narayanan's COS 597D - Advanced Topics in Computer Science: Privacy Technologies at Princeton University. We wished to automate the detection of third-party tracking mechanisms while browsing the web on a mobile device. To this end, we decided to adopt the FourthParty project’s approach and instrument a popular open-source mobile browser (i.e. Firefox Mobile) to be used as an enhanced web crawler. This enabled us to log realistic end-user interactions (e.g. execution of embedded scripts) as opposed to just downloading each web page’s static content, which is what traditional web crawlers do.
Mobile application development poses a variety of challenges that need to be addressed before a mobile web crawler can be built:
FourthPartyMobile's architecture delegates most of the computation and storage to a supporting server, limiting the mobile device’s responsibilities to fetching one website at a time and generating a log of its latest interactions (e.g. cookies, JavaScript, embedded HTTP objects). The crawling plugin running on the mobile device sends the interaction log for the website being visited, in the form of SQL statements, to the crawling backend running on a server. This way, the amount of state kept in the mobile device’s main memory is minimal, and the crawl database, which can be several megabytes in size, is generated on the server side.
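The division of labor described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not taken from the FourthPartyMobile codebase, which is implemented in Java and JavaScript. The `cookies` table, its columns, and the helper names `build_log` and `apply_log` are hypothetical simplifications; the real FourthParty schema has more tables and columns, and the network transport between device and server is omitted.

```python
import sqlite3

# Hypothetical, simplified table (an assumption for illustration only);
# the real FourthParty schema defines more tables and columns.
SCHEMA = "CREATE TABLE IF NOT EXISTS cookies (page_url TEXT, name TEXT, value TEXT)"


def sql_literal(value):
    """Quote a value as a SQL string literal by doubling single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def build_log(page_url, cookies):
    """Device side: turn the cookies seen on one page into SQL statements."""
    return [
        "INSERT INTO cookies (page_url, name, value) VALUES ({}, {}, {})".format(
            sql_literal(page_url), sql_literal(name), sql_literal(value)
        )
        for name, value in cookies
    ]


def apply_log(conn, statements):
    """Server side: replay a received log inside a single transaction."""
    with conn:  # commits on success, rolls back if a statement fails
        for stmt in statements:
            conn.execute(stmt)


# The server owns the crawl database; the device only produces short logs.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# One page visit yields one small log, which would be sent over the network.
log = build_log("http://example.com/", [("id", "abc'123"), ("lang", "en")])
apply_log(conn, log)

rows = conn.execute("SELECT name, value FROM cookies ORDER BY name").fetchall()
print(rows)
```

In this sketch the device holds only the statements for the page currently being visited, while the database grows on the server. Replaying raw SQL received over the network assumes the device is trusted; a backend exposed to untrusted clients would presumably need to authenticate senders or validate the statements first.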