Identifying User-Input Privacy in Mobile Applications at a Large Scale
2005-2012 IEEE. Identifying sensitive user inputs is a prerequisite for privacy protection in mobile applications. Today's program analysis systems, however, can automatically label only data that pass through well-defined system Application Program Interfaces (system-controlled resources). In this paper, we show that this conventional approach is far from adequate, as most sensitive inputs are actually entered by the user at an app's runtime. We inspect 13,072 top apps from Google Play and find that 38.69% of them involve sensitive user inputs. Like system-controlled resources, these data are exposed to a range of privacy leakage threats. Marking sensitive user inputs manually requires considerable effort, which impedes large-scale, automated analysis of apps to defend against potential privacy leakage. To address this issue, we present UIPicker, an adaptable framework that automatically identifies sensitive user inputs as a first step. UIPicker detects semantic information in an application's layout resources and program code, then analyzes it to find the locations where security-critical information may appear. This approach can support a variety of existing security analyses of mobile apps. We evaluate our approach on randomly selected popular apps from Google Play. UIPicker labels sensitive user inputs with 94.0% precision and 96.0% recall.
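To make the idea of reading semantic cues from layout resources concrete, the following is a minimal sketch only, not UIPicker's actual method. It assumes a simple keyword match over the `android:id`, `android:hint`, `android:text`, and `android:inputType` attributes of `EditText` elements in an Android layout file. The keyword list, the function name `find_sensitive_inputs`, and the sample layout are all hypothetical and chosen purely for illustration.

```python
import re
import xml.etree.ElementTree as ET

# XML namespace used by Android layout attributes such as android:hint.
ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Hypothetical keyword list (an assumption for this sketch, not from the paper).
SENSITIVE_KEYWORDS = ("password", "credit card", "ssn", "email", "phone", "address")

# Hypothetical layout used only to exercise the sketch.
SAMPLE_LAYOUT = """
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android">
  <EditText android:id="@+id/card_number" android:hint="Credit card number"/>
  <EditText android:id="@+id/search_box" android:hint="Search"/>
  <EditText android:id="@+id/pwd" android:inputType="textPassword"/>
</LinearLayout>
"""


def _attr(elem, name):
    """Return the android:<name> attribute of elem, or '' if it is absent."""
    return elem.get(f"{{{ANDROID_NS}}}{name}", "")


def find_sensitive_inputs(layout_xml):
    """Return (id, matched_keywords) for EditText elements that look sensitive."""
    root = ET.fromstring(layout_xml)
    hits = []
    for elem in root.iter():
        # Match both plain EditText and fully qualified subclasses ending in EditText.
        if not elem.tag.endswith("EditText"):
            continue
        text = " ".join([_attr(elem, "id"), _attr(elem, "hint"), _attr(elem, "text")])
        # Turn resource-id punctuation into spaces so "card_number" reads as words.
        normalized = re.sub(r"[_/@+:]", " ", text).lower()
        matched = [kw for kw in SENSITIVE_KEYWORDS if kw in normalized]
        # Password input types (e.g. textPassword) are treated as sensitive too.
        if matched or "Password" in _attr(elem, "inputType"):
            hits.append((_attr(elem, "id"), matched))
    return hits


if __name__ == "__main__":
    for element_id, keywords in find_sensitive_inputs(SAMPLE_LAYOUT):
        print(element_id, keywords)
```

A plain keyword match like this would miss inputs whose meaning is only apparent from nearby labels or from the program code, which is the kind of additional context the abstract says UIPicker draws on.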