Our key observation is that personal information leaks must occur over the network, so we implement our system in the network using a software middlebox built atop the Meddle platform. ReCon analyzes your network traffic to identify and potentially block personal information leaks according to your preferences. Instead of learning how any one specific app or tracker gathers information from users, our approach uses machine learning to infer when privacy leaks occur based on contextual clues. This means that we can detect when your personal information is shared with other parties without needing you to tell us what your personal information is.
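To make "contextual clues" concrete, here is a minimal sketch; it is not ReCon's actual implementation. It assumes flows are available as HTTP request strings, splits each one into a bag of tokens on common delimiters, and picks the token whose presence best separates leaking flows from non-leaking ones by information gain (a simpler relative of the gain-ratio criterion used by C4.5). The sample flows, the delimiter set, and the function names are hypothetical and chosen only for illustration.

```python
import math
import re

# Hypothetical delimiter set, chosen for illustration only.
DELIMITERS = re.compile(r"[?&=/:;,\s\"{}]+")

def tokenize(flow):
    """Split a flow's text into a set of lowercase tokens (bag of words)."""
    return {t.lower() for t in DELIMITERS.split(flow) if t}

def entropy(labels):
    """Shannon entropy, in bits, of a list of boolean labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def best_split_token(flows, labels):
    """Return (token, gain) for the token with the highest information gain."""
    token_sets = [tokenize(f) for f in flows]
    vocabulary = set().union(*token_sets)
    base = entropy(labels)
    best_token, best_gain = None, -1.0
    for token in sorted(vocabulary):  # sorted for deterministic tie-breaking
        present = [l for s, l in zip(token_sets, labels) if token in s]
        absent = [l for s, l in zip(token_sets, labels) if token not in s]
        remainder = (len(present) * entropy(present)
                     + len(absent) * entropy(absent)) / len(labels)
        gain = base - remainder
        if gain > best_gain:
            best_token, best_gain = token, gain
    return best_token, best_gain

# Made-up labeled flows: True means the flow leaks personal information.
flows = [
    "GET /track?email=alice%40example.com&v=2",
    "GET /collect?email=bob%40example.org&v=2",
    "GET /img/logo.png?v=2",
    "GET /api/news?page=3&v=2",
]
labels = [True, True, False, False]

token, gain = best_split_token(flows, labels)
print(token, gain)
```

In this toy data the most informative token is the key `email`, not either user's actual address, which is why a classifier built on such features does not need to be told what the personal information is.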
We trained our system on more than 1000 popular iOS and Android apps, and we continuously update ReCon to adapt to the ever-changing privacy risks of using mobile devices. Our system is accurate, identifying 98.2% of leaks for the vast majority of flows in our dataset using a C4.5 Decision Tree (DT) classifier. It is also efficient: it takes less than one millisecond to identify information leakage for the vast majority of flows. We display personal information leaks via a visualization tool and let the user decide how the system should act on them (e.g., blocking, modifying, or ignoring).
ReCon currently uses VPNs as a portable mechanism to tunnel data traffic from mobile devices to a machine where users can exert control over network flows. VPNs also lower the barrier to entry for deploying Meddle because Android, iOS, and Windows, which represent the vast majority of the mobile device market, have native VPN support. Our current deployment runs in the cloud, and we are also working on in-network deployments, such as a Raspberry Pi plugged into your home network.
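The three user actions mentioned above (blocking, modifying, or ignoring) could be applied to a flagged request roughly as follows. This is a sketch under stated assumptions, not ReCon's code: it assumes the leak sits in a URL query parameter whose key is already known, and the function name `apply_policy`, the action strings, and the placeholder value are all hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def apply_policy(url, leaking_key, action, replacement="redacted"):
    """Return the URL to forward, or None if the request should be dropped."""
    if action == "ignore":
        return url  # forward the request unchanged
    if action == "block":
        return None  # caller drops the request
    if action == "modify":
        parts = urlsplit(url)
        # Overwrite only the value of the leaking key; keep other parameters.
        query = [(k, replacement if k == leaking_key else v)
                 for k, v in parse_qsl(parts.query, keep_blank_values=True)]
        return urlunsplit(parts._replace(query=urlencode(query)))
    raise ValueError(f"unknown action: {action}")

# Made-up request for illustration.
url = "http://tracker.example/collect?email=alice%40example.com&v=2"
print(apply_policy(url, "email", "modify"))
```

Returning `None` for a blocked flow keeps the decision separate from the mechanism that actually drops the connection, which would live in the middlebox itself.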
Please see the technical report below for more details.