Bug Fixes, Improvements, ..., and Privacy Leaks

A Longitudinal Study of PII Leaks Across Android App Versions


One task we do almost weekly is updating the apps on our phones. How does that affect our privacy? Is mobile privacy getting better or worse over time? In this paper, we address this question by studying privacy leaks from historical and current versions of 512 popular Android apps, covering 7,665 app releases over 8 years of app version history. Through automated and scripted interaction with apps and analysis of the network traffic they generate on real mobile devices, we identify how privacy changes over time for individual apps and in aggregate. We find several trends, including increased collection of personally identifiable information (PII) over time across most apps, slow adoption of HTTPS to secure the information sent to other parties, and a large number of third parties being able to link user activity and locations across apps. Interestingly, while privacy is getting worse in aggregate, we find that the privacy risk of individual apps varies greatly over time, and a substantial fraction of apps see little change or even improvement in privacy. Given these trends, we propose metrics for quantifying privacy risk and for providing this risk assessment proactively to help users balance the risks and benefits of installing new versions of apps.

Tool. Use our App Versions tool to determine which versions have better privacy based on your privacy preferences.

Anonymized Dataset

The anonymized data for the NDSS 2018 paper "app-versions" is here. Log files are named {package_name}/logs/{version_code}_http(s).log, where:

package_name: the package name used by the Google Play Store to uniquely identify an app
version_code: an integer used internally to track versions, which increases monotonically with each release
Under each {package_name}/ folder, two files summarize the traffic and the leaked PII:
{package_name}/raw_traffic.txt: package_name,version_code,time_stamp,unique_flow_id,second_level_domain,host_name,protocol(http|https)
{package_name}/raw_pii.txt: package_name,version_code,time_stamp,unique_flow_id,pii_type
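As an illustration of how the two summary files relate, the following minimal Python sketch reads them and groups the leaked PII types by app version and protocol. It assumes the files are comma-separated with no header row and no commas inside fields, and that a flow can be matched across the two files by the pair (version_code, unique_flow_id); the page does not state these details, so treat them as assumptions. The helper names read_rows and pii_by_version_and_protocol are hypothetical and not part of the dataset.

```python
import csv
import io
from collections import defaultdict

# Column order as listed for raw_traffic.txt and raw_pii.txt.
TRAFFIC_FIELDS = ["package_name", "version_code", "time_stamp", "unique_flow_id",
                  "second_level_domain", "host_name", "protocol"]
PII_FIELDS = ["package_name", "version_code", "time_stamp", "unique_flow_id",
              "pii_type"]


def read_rows(handle, fields):
    """Parse comma-separated lines into dicts; skip lines with the wrong column count."""
    rows = []
    for record in csv.reader(handle):
        if len(record) != len(fields):
            continue  # assumption: malformed or empty lines can be ignored
        rows.append(dict(zip(fields, (value.strip() for value in record))))
    return rows


def pii_by_version_and_protocol(traffic_rows, pii_rows):
    """Map (version_code, protocol) to the set of PII types seen in those flows."""
    # Assumption: (version_code, unique_flow_id) identifies a flow in both files.
    protocol_of = {(r["version_code"], r["unique_flow_id"]): r["protocol"]
                   for r in traffic_rows}
    summary = defaultdict(set)
    for r in pii_rows:
        key = (r["version_code"], r["unique_flow_id"])
        protocol = protocol_of.get(key, "unknown")
        summary[(int(r["version_code"]), protocol)].add(r["pii_type"])
    return dict(summary)


if __name__ == "__main__":
    # Invented sample rows for demonstration only; not taken from the dataset.
    traffic = read_rows(io.StringIO(
        "com.example.app,10,1500000000,f1,example.com,api.example.com,http\n"
        "com.example.app,11,1500000100,f2,example.com,api.example.com,https\n"),
        TRAFFIC_FIELDS)
    pii = read_rows(io.StringIO(
        "com.example.app,10,1500000000,f1,location\n"
        "com.example.app,11,1500000100,f2,email\n"),
        PII_FIELDS)
    for (version, protocol), types in sorted(pii_by_version_and_protocol(traffic, pii).items()):
        print(version, protocol, sorted(types))
```

To use it on the real data, open {package_name}/raw_traffic.txt and {package_name}/raw_pii.txt with open(path, newline="") and pass the file handles to read_rows. PII types that appear under the http protocol for a given version are the ones sent without encryption.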

Each test consisted of about 5,000 automated Monkey interactions with a given app version. We collected the network traffic generated during each experiment using Meddle, and used Mitmproxy to capture both HTTP flows and the plaintext content of HTTPS flows. For each app requiring a login, we recorded the login events in one version and replayed them in the other versions using the RERAN tool.
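For readers who want to approximate the automated interaction step, the sketch below builds an adb command line that asks the Android Monkey tool to send a fixed number of pseudo-random events to one package. The exact Monkey options used in the study are not given on this page, so the event count default and the function name build_monkey_command are illustrative assumptions; only the standard adb -s, monkey -p and -v options are used.

```python
import subprocess


def build_monkey_command(package_name, event_count=5000, device_serial=None):
    """Return the argv list for running Monkey against one app via adb."""
    command = ["adb"]
    if device_serial is not None:
        # -s selects a specific device when several are attached.
        command += ["-s", device_serial]
    # -p restricts events to the given package; the final argument is the event count.
    command += ["shell", "monkey", "-p", package_name, "-v", str(event_count)]
    return command


def run_monkey(package_name, event_count=5000, device_serial=None):
    """Run Monkey and return its exit code (requires adb and an attached device)."""
    command = build_monkey_command(package_name, event_count, device_serial)
    return subprocess.run(command, check=False).returncode
```

Calling run_monkey("com.example.app") would exercise a hypothetical app with roughly the number of interactions described above. Capturing traffic and replaying logins are separate steps that this sketch does not cover.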

We used five Android devices: one Nexus 6P phone and one Nexus 5X phone, both with Android 6.0.0; and three Nexus 5 phones with Android 4.4.4. All phones were factory reset before our experiments, and included no apps beyond the stock services and the apps evaluated in this work.

Publication

For more details, please check out the NDSS paper:
  • Jingjing Ren, Martina Lindorfer, Daniel Dubois, Ashwin Rao, David Choffnes, Narseo Vallina-Rodriguez. Bug Fixes, Improvements, ... and Privacy Leaks - A Longitudinal Study of PII Leaks Across Android App Versions. In Proceedings of Network and Distributed System Security Symposium (NDSS'18), February, 2018. [PDF]
To cite:
  • @inproceedings{ren2018appversions,
    title={Bug Fixes, Improvements, ... and Privacy Leaks: A Longitudinal Study of PII Leaks Across Android App Versions},
    author={Ren, Jingjing and Lindorfer, Martina and Dubois, Daniel J. and Rao, Ashwin and Choffnes, David R. and Vallina-Rodriguez, Narseo},
    booktitle={Proceedings of Network and Distributed System Security Symposium},
    year={2018}
    }

The following are complete versions of the tables referred to in the paper: