The Usable Privacy Policy Project

Towards Effective Web Privacy Notice and Choice

Natural language privacy policies have become the de facto standard to address expectations of “notice and choice” on the Web. However, users generally do not read these policies, and those who do struggle to understand them. Initiatives to overcome this problem with machine-readable privacy policies, or with other solutions that hold website operators to more stringent requirements, have run into obstacles: website operators have been reluctant to commit to anything more than what they currently do.
Our project combines machine learning, natural language processing and crowdsourcing to semi-automatically annotate privacy policies. To focus the annotation process and provide users with succinct privacy notices, we build models of the issues that people are least likely to be aware of and most likely to care about.

Explore some of our project’s data and analysis results on a dedicated website.

Bank Privacy Website

We automatically collected and analyzed 6,324 standardized privacy notices from financial institutions. See how your bank stacks up.

Privacy Day 2016

Data Privacy Day at CMU featured many events, including a keynote by Ed Felten, the Deputy U.S. Chief Technology Officer. Learn more.

ACL/COLING 2014 Dataset

We created a corpus of 1,010 privacy policies from the top websites ranked on Get the dataset.

This project is funded by the National Science Foundation under its Secure and Trustworthy Computing initiative (CNS-1330596).