Helping Developers with
Privacy
University of Wisconsin, Madison
Dec 2023
Jason Hong
Computer Human Interaction: Mobility Privacy Security
Helping
Developers
with
Privacy:
2
New Kinds of Guidelines and Regulations
European Union General Data Protection Regulation (GDPR)
California Consumer Privacy Act (CCPA)
Platform requirements
(Google Play)
A Long View on Privacy
• I’ve been working on mobility, privacy, security
for ~20 years
• My previous work focused on end-users
– Better user interfaces for sharing / configuration
– But this focus kept the burden entirely on end-users
– Cookies, VPNs, trackers, permissions… it’s too much!
• Current stance: foster ecosystem of privacy
– Analogy with email spam
– Shift burden onto HW, OS, service providers, auditors
– Devs are a major (and overlooked) point of leverage
Why is Privacy Hard?
There’s a lot to know and do
• On the one hand, things have gotten better
– 20+ years ago, we hadn’t gotten much beyond
“privacy is the right to be let alone”
– Few tools, little guidance, few best practices for devs
• Today, more clarity, but a lot of work for devs
– Users have right to see / export their data
– Data retention (and right to be forgotten)
– Privacy nutrition labels (more on this later)
– In-app notifications for smartphone apps
– Keeping up to date with specific privacy APIs in OS
– And many more… again, this is a lot of work!
Why is Privacy Hard?
Devs lack Awareness, Motivation, Ability
• Security & Privacy Acceptance Framework (SPAF)
– Foundations & Trends in Security and Privacy
– Awareness: understands threats and protections
– Motivation: wants to employ best practices
– Ability: capable of converting intention into action
• Awareness of privacy issues is low
– A summary from our surveys, interviews, user studies
– Low awareness of privacy regulations in general
– Low awareness of what to do about privacy
– /r/AndroidDev subreddit, devs rarely talk about privacy
Why is Privacy Hard?
Devs lack Awareness, Motivation, Ability
• Motivation is uneven
– /r/AndroidDev, privacy is sticks with no carrots
• Lots of work, but unclear benefit for the app or devs
– Platform requirements are a strong point of leverage
• Platform requirements >> legal requirements
• Ability is low
– StackOverflow survey about how devs learned to code
https://insights.stackoverflow.com/survey/2021
Why is Privacy Hard?
Devs lack Awareness, Motivation, Ability
• Ability
– Most devs lack formal CS education
• And even then, few CS programs require security, let alone privacy
• So broadly, almost everyone is unprepared for privacy
– Most devs are C programmers (and I’m not referring
to the programming language 😊)
Today’s Talk
Whirlwind Tour of Past Decade of Our Team’s Work
• Why is Privacy Important?
• Why is Privacy Hard?
• Our Studies on App Developers and Privacy
– Survey + interviews w/ smartphone app developers
– Analysis of /r/AndroidDev subreddit
– User studies on privacy nutrition labels
• Our Tools for Developers and Privacy
– Source code privacy annotations
– Peekaboo software architecture for smart homes
• Reflections on Privacy and What’s Next
Study 1 – Interviews and Survey
What Do App Developers Know about Privacy?
• What knowledge do devs have? What tools?
What incentives?
• Are there potential points of leverage?
• Interviewed 13 smartphone app developers
• Surveyed 228 smartphone app developers
– Got a good mix of experiences and size of orgs
Balebako et al, The Privacy and Security Behaviors
of Smartphone App Developers. USEC 2014.
Study 1 Summary of Findings
Third-party Libraries Problematic
• Devs often use ads and analytics libraries
• But hard to understand library behaviors
– A few didn’t know they were using libraries
(based on inconsistent answers)
– Some didn’t know these libraries collected data
• We’ll see issue of libraries repeated quite often
– In a later study, 40% of Android apps used sensitive data
only because of libraries [Chitkara 2017]
Study 1 Summary of Findings
Devs Don’t Know What to Do
• Low awareness of existing privacy guidelines
– Had little knowledge of Fair Information Practices,
FTC guidelines, Google requirements
– Often just ask others around them
• Now also ask StackOverflow, Reddit, etc
– Note that this study was before GDPR and CCPA
• Low perceived value of privacy policies
– Mostly protection from lawsuits
– “I haven’t even read [our privacy policy]. I mean, it’s
just legal stuff that’s required, so I just put in there.”
Study 2 – Interviews
How do developers address privacy when coding?
• Semi-structured interview of 9 Android devs
asking about their three most recent apps
– What data collected in app and how used
• Ex. Libraries used?
• Ex. Was data sent to cloud server?
• Ex. How and where data stored?
Li et al, Coconut: An IDE plugin for developing
privacy-friendly apps. IMWUT / Ubicomp 2018.
Study 2 Findings
Inaccurate Understanding of Their Own Apps
• We compared responses against the actual app when it was on an app store
– Some data practices they claimed didn’t match app
behaviors
• Three major reasons for mismatch
– Lacked knowledge of library behaviors
– Fast iterations led to changes in data collection and
data use, hard to keep up to date
– Team dynamics
• Don’t know what other devs on team doing
• Dev turnover, use of sensitive data not documented
Study 2 Findings
Lack of Knowledge of Alternatives and Tradeoffs
• Ex. Many apps use some kind of identifier of who
you are, and different identifiers have tradeoffs
– Hardware identifiers (riskiest since persistent)
– Application-specific identifier (email, hashcode)
– Advertising identifier
• However, devs lacked knowledge of alternatives
and tradeoffs
– Devs often just went with the first solution they found
online (e.g. StackOverflow)
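The identifier tradeoff above can be made concrete with a minimal sketch in plain Java, assuming the app only needs to recognize its own installation. The class name `InstallId` and the use of `Properties` as a stand-in for app-private storage are assumptions for illustration; they are not taken from the study.

```java
import java.util.Properties;
import java.util.UUID;

/** Hypothetical helper: an app-scoped, resettable identifier instead of a hardware ID. */
final class InstallId {
    private static final String KEY = "install_id";

    private InstallId() {}

    /** Returns the stored ID, generating a random one on first use. */
    static String getOrCreate(Properties appPrivateStore) {
        String id = appPrivateStore.getProperty(KEY);
        if (id == null) {
            id = UUID.randomUUID().toString();
            appPrivateStore.setProperty(KEY, id);
        }
        return id;
    }

    /** Clearing the entry resets the identifier, which a persistent hardware ID cannot offer. */
    static void reset(Properties appPrivateStore) {
        appPrivateStore.remove(KEY);
    }

    public static void main(String[] args) {
        Properties store = new Properties(); // stand-in for app-private storage
        String first = getOrCreate(store);
        System.out.println("stable across calls: " + first.equals(getOrCreate(store)));
        reset(store);
        System.out.println("changes after reset: " + !first.equals(getOrCreate(store)));
    }
}
```

Because the value is random and stored only by the app, it cannot be used to link the user across apps, and it can be reset.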
Study 2 Findings
Lack of Motivation to Address Privacy Issues
• Might ignore privacy issues if not required
– Ex. Might get location permission for one reason
(maps), but also use for other reasons (ads)
– Ex. Might get name and email, but only need email
– Ex. Might get device ID because no permission needed
• Two things that did work: Android permissions
and Play Store requirements
– Platforms and OSes have major role in improving
privacy, but there are tradeoffs
Study 3 – Analyzing Discussion Forum
How Devs Talk about Privacy on /r/AndroidDev
Li et al, How Developers Talk about Personal Data
and What It Means for User Privacy: A Case Study
of a Developer Forum on Reddit. CSCW 2020.
Study 3 Method
How Devs Talk about Privacy on /r/AndroidDev
• At time of our study (early 2020):
– Had over 144,000 subscribed readers
– ~12 new threads and 175 posts per day
• Method
– Crawled 46k threads (666k posts), Mar 2009 to Feb 2020
– Identified list of 44 keywords suggesting personal info
• Ex. ssn, passport number, phone number, street name
• We used this rather than “privacy” since too restrictive
– Resulted in 6827 threads -> manually examined
– Resulted in 329 threads -> manually coded
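The keyword step of this method can be sketched as below. This is only an illustration under assumptions: the class `KeywordFilter`, the case-insensitive substring matching, and the sample thread titles are hypothetical, and only the four keywords shown on the slide are used (the study's full list had 44).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of keyword-based pre-filtering of forum threads. */
final class KeywordFilter {
    private KeywordFilter() {}

    /** True if the text contains any keyword as a case-insensitive substring. */
    static boolean mentionsPersonalData(String text, List<String> keywords) {
        String lower = text.toLowerCase(Locale.ROOT);
        for (String keyword : keywords) {
            if (lower.contains(keyword.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    /** Keeps only the threads that mention at least one keyword. */
    static List<String> filter(List<String> threads, List<String> keywords) {
        List<String> kept = new ArrayList<>();
        for (String thread : threads) {
            if (mentionsPersonalData(thread, keywords)) {
                kept.add(thread);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Example keywords taken from the slide.
        List<String> keywords = Arrays.asList("ssn", "passport number", "phone number", "street name");
        List<String> threads = Arrays.asList(
                "How do I validate a phone number field?",
                "RecyclerView scrolling is janky");
        System.out.println(filter(threads, keywords));
    }
}
```

Substring matching over-selects (a keyword can appear inside an unrelated word), which is one reason a manual pass over the matched threads is still needed.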
Study 3 Findings
How Devs Talk about Privacy on /r/AndroidDev
• Devs frustrated and discontented about privacy
– Often felt like sticks with no carrots
1. Privacy enhancement measures not always
perceived as helpful
– Each new version of Android has new privacy changes
and enforcement mechanisms
– But lacked explanation of why it will help with privacy
– Confusion and skepticism
• “Scoped storage - Hey, wait?! What exact problem they
are trying to solve with this?”
Study 3 Findings
How Devs Talk about Privacy on /r/AndroidDev
2. Privacy restrictions rigid and hurt legitimate uses
– Protects from malicious apps, but breaks legit apps
– Ex. Regarding Android P (v9) disabling foreground
access to location when power-save mode is on:
“Expecting that the user will understand that power
saving needs to be disabled for navigation to work is
ridiculous. And when navigation doesn’t work, guess
who’s app gonna get 1-star angry review?”
Study 3 Findings
How Devs Talk about Privacy on /r/AndroidDev
3. Lack sufficient support for compliance
– Wanted more support to comply with GDPR + CCPA
• Ex. AdMob too opaque to comply with consent dialogs
– Wanted more support with Android OS pop-ups
• Ex. Just writing any file requires storage permission,
but this pops up a dialog that says “access photo
media and files on your devices” which can be scary
– Unpredictable app review process when using
sensitive permissions led to lots of frustration
Study 4 – User study
How Well Can Devs Fill Out Privacy Nutrition Labels?
Apple App Privacy Details (2020)
Google Data Safety Section (2022)
How Nutrition Labels Are Created Today
• Is the app collecting data?
• If so, select collected data type(s) from 32 data types in 14 categories
• For each data type:
– Select data purpose
– Is the data linked to users?
– Is the data used to track users?
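The questionnaire above amounts to a small data model, sketched here in Java. The class and field names are hypothetical, and the data type and purpose strings in `main` are illustrative values rather than an authoritative list from either store.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the answers a developer gives when filling out a label. */
final class PrivacyLabel {
    /** Answers for one collected data type. */
    static final class Entry {
        final String dataType;
        final List<String> purposes;
        final boolean linkedToUser;
        final boolean usedForTracking;

        Entry(String dataType, List<String> purposes, boolean linkedToUser, boolean usedForTracking) {
            this.dataType = dataType;
            this.purposes = purposes;
            this.linkedToUser = linkedToUser;
            this.usedForTracking = usedForTracking;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void add(Entry entry) {
        entries.add(entry);
    }

    /** First question in the flow: is the app collecting any data at all? */
    boolean collectsData() {
        return !entries.isEmpty();
    }

    /** Data types the developer marked as used to track users. */
    List<String> trackingDataTypes() {
        List<String> result = new ArrayList<>();
        for (Entry entry : entries) {
            if (entry.usedForTracking) {
                result.add(entry.dataType);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PrivacyLabel label = new PrivacyLabel();
        label.add(new Entry("Precise Location", Arrays.asList("App Functionality"), true, false));
        System.out.println("collects data: " + label.collectsData());
        System.out.println("tracking types: " + label.trackingDataTypes());
    }
}
```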
Two Notes about Nutrition Labels
• Currently required for all new / updated apps
– For both Apple App Store and Google Play
• Currently some enforcement of labels
– Heard informally that Apple might call up some devs
and let them know of nutrition label mismatches
– Not clear how they check (probably manually),
and unlikely this approach scales
Pop Quiz (yes, this will be on the final)
• Imagine you are an iOS app developer
– Your health app periodically gets and stores users’
location data (e.g. think Strava)
• Is this location data “data used to track you”?
– Y / N / ?
Study 4
• Conducted study with 12 iOS developers to
understand their experiences and perceptions
• (Re)create nutrition label for their apps, then
semi-structured interviews to identify errors
• Errors were common
– 9/12 made errors that weren’t corrected before
prompted by interviewer
– Among 8 apps that already had a privacy label,
6 were re-created inconsistently (!)
Li et al, Understanding Challenges for Developers to
Create Accurate Privacy Nutrition Labels. CHI 2022.
Study 4 Kinds of Errors
Under-reporting and Over-reporting Errors
• Under-Reporting (didn’t report but should have)
• Over-Reporting (did report but shouldn’t have)
– Ex. some devs thought location tracking was
definitely a type of data used to track users
– Apple’s definition: “track” only refers to tracking for
advertising purposes
• Quiz answer: for Apple, getting location data for health reasons is not tracking
Study 4 Results
Why do Devs Under/Over-Report?
• Unknown unknowns
– Devs don’t realize they don’t know something
– Ex. Preconceptions of what “tracking” means
– Ex. Not realizing documentation about third-party
library privacy behaviors exists (e.g. Google Analytics)
• Known unknowns
– Unfamiliar terms: hashed email address, a latitude
and longitude with three or more decimal places, etc.
– Jargon: data broker, purchase tendencies, etc.
– Cultural: what’s a credit score?
Study 4 Results
Why do Devs Under/Over-Report?
• Complexity in general
[Screenshot: the text developers need to read for selecting data types collected by the app (!)]
Study 4 Results
Why do Devs Under/Over-Report?
• Complexity in general
– Of documentation (previous slide)
– Cross-platform differences
• Google: Collect = transmit the data off the device
• Apple: Collect = transmit the data off the device and
store it
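The cross-platform difference can be stated as a small predicate. This sketch encodes only the one-line summaries on this slide, not either platform's full policy text; the class, enum, and parameter names are hypothetical.

```java
/** Hypothetical sketch of the two "collect" definitions as summarized on the slide. */
final class CollectDefinition {
    enum Platform { GOOGLE_PLAY, APPLE_APP_STORE }

    private CollectDefinition() {}

    /**
     * @param leavesDevice    the data is transmitted off the device
     * @param storedAfterward the transmitted data is then stored
     */
    static boolean countsAsCollected(Platform platform, boolean leavesDevice, boolean storedAfterward) {
        switch (platform) {
            case GOOGLE_PLAY:
                return leavesDevice;
            case APPLE_APP_STORE:
                return leavesDevice && storedAfterward;
            default:
                throw new IllegalArgumentException("Unknown platform: " + platform);
        }
    }

    public static void main(String[] args) {
        // Data sent to a server to answer a request, then discarded.
        System.out.println(countsAsCollected(Platform.GOOGLE_PLAY, true, false));
        System.out.println(countsAsCollected(Platform.APPLE_APP_STORE, true, false));
    }
}
```

The same data flow can therefore need reporting on one store and not the other, which is one source of the over- and under-reporting errors.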
Recap of Studies and Some Reflections
• Awareness of privacy in general is low
– Of privacy regulations, best practices, and sometimes
even their own app
– Latter due to third-party libraries, team dynamics
• Motivation is uneven
– Not great, but must comply with platform and OS
• Ability is low
– Lots of errors / misconceptions in nutrition labels
– Complexity high (team dynamics, documentation)
– Few tools to help
Today’s Talk
• Why is Privacy Important?
• Why is Privacy Hard?
• Our Studies on App Developers and Privacy
– Survey + interviews w/ smartphone app developers
– Analysis of /r/AndroidDev subreddit
– User studies on privacy nutrition labels
• Our Tools for Developers and Privacy
– Source code privacy annotations
– Peekaboo software architecture for smart homes
• Reflections on Privacy and What’s Next
Systems Our Team has Built
• Static and dynamic analysis tools to infer purpose
• PrivacyProxy VPN to find likely PII
• PrivacyGrade.org
• Privacy Enhancements
for Android (DARPA)
• PrivacyStreams
• Privacy Annotations
• Peekaboo software architecture for smart homes
Privacy Annotations in Source Code
• Problem: Hard to correctly infer what data an
app is accessing and its purpose of use
– Today, can say app uses “location” or “contact list”
– We want something like app uses “location for maps”
or “contact list for backups”
• Our idea: have devs declare purposes in source
code using privacy annotations
– Hints are a common idea in systems research
– Annotations exist in Java and many other languages
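To show what declaring a purpose in source code could look like, here is a minimal sketch using standard Java annotations and reflection. The `@DataUse` annotation, its fields, and the `MapScreen` class are hypothetical names invented for this example; they are not the annotation set used by the tools in this talk.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

/** Hypothetical sketch: declare data type and purpose next to the code that uses the data. */
final class AnnotationSketch {
    @Retention(RetentionPolicy.RUNTIME) // kept in compiled code so tools can read it
    @Target(ElementType.METHOD)
    @interface DataUse {
        String dataType();
        String purpose();
    }

    /** Example app code carrying a declaration. */
    static final class MapScreen {
        @DataUse(dataType = "location", purpose = "maps")
        static String showNearbyMap() {
            return "map";
        }
    }

    private AnnotationSketch() {}

    /** Lists every declared data use in a class, e.g. for an audit report. */
    static String describe(Class<?> cls) {
        StringBuilder report = new StringBuilder();
        for (Method method : cls.getDeclaredMethods()) {
            DataUse use = method.getAnnotation(DataUse.class);
            if (use != null) {
                report.append(method.getName()).append(": ")
                      .append(use.dataType()).append(" for ").append(use.purpose()).append('\n');
            }
        }
        return report.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe(MapScreen.class));
    }
}
```

A declaration like this is what lets a tool say "location for maps" rather than just "location".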
Example Privacy Annotation for Android
• Our long-term vision for annotations
– If developers do a little extra work adding annotations,
we can greatly improve entire privacy ecosystem
– Give devs feedback about APIs (e.g. identifiers)
– Facilitate auditing internally and externally
– Help generate UIs for privacy (less work for devs!)
– Help generate privacy nutrition labels (less work again)
Our Work in Privacy Annotations
• Three different plug-ins for Android Studio
• Coconut
– Use annotations for API feedback and basic auditing
[Li et al, Ubicomp 2018]
• Honeysuckle
– Use annotations to help auto-generate privacy user
interfaces [Li et al, Ubicomp 2021]
• Matcha
– Use annotations to help auto-generate privacy
nutrition labels [under review]
Matcha Plug-In
Help Generate Privacy Nutrition Labels
• As we previously saw, lots of errors in labels
• Idea: Use privacy annotations facilitated by
(simple) static analysis
• Available for Android Studio/IntelliJ IDEA users
– https://matcha-ide.github.io
Matcha Plug-In
Help Generate Privacy Nutrition Labels
• Matcha looks for use of sensitive data
– API calls that access user data / send user data out
– Keywords too (e.g. “email”, “ssn”)
• Prompts dev to add annotations
– On Data access / Data egress
• Matcha detects most popular libraries
– Devs fill out XML file saying how a library is used
(since not all functionality in library might be used)
• Matcha translates annotations into a label
– Google Developer Console supports CSV upload
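The last step, translating declarations into a label, can be sketched as a simple aggregation. Everything here is an assumption for illustration: the `Declaration` fields and the CSV columns are made up and do not reproduce Matcha's internals or the upload format the Google developer console expects.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: turn per-call-site declarations into de-duplicated label rows. */
final class LabelRows {
    /** One developer declaration at a data access or data egress point. */
    static final class Declaration {
        final String dataType;
        final String purpose;
        final boolean leavesDevice;

        Declaration(String dataType, String purpose, boolean leavesDevice) {
            this.dataType = dataType;
            this.purpose = purpose;
            this.leavesDevice = leavesDevice;
        }
    }

    private LabelRows() {}

    /** Returns CSV text with a header and one row per distinct declaration (values assumed comma-free). */
    static String toCsv(List<Declaration> declarations) {
        Set<String> rows = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        for (Declaration d : declarations) {
            rows.add(d.dataType + "," + d.purpose + "," + d.leavesDevice);
        }
        StringBuilder csv = new StringBuilder("data_type,purpose,leaves_device\n");
        for (String row : rows) {
            csv.append(row).append('\n');
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        System.out.print(toCsv(Arrays.asList(
                new Declaration("location", "maps", false),
                new Declaration("email", "account", true),
                new Declaration("location", "maps", false))));
    }
}
```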
Matcha Plug-In
Help Generate Privacy Nutrition Labels
• User study with eight Android devs on their apps
– Create label using Google Play dev console (baseline)
– Then create label using Matcha
– Review any discrepancies between the two versions
• Matcha was able to correct a large number of
errors over baseline condition
– Matcha labels reported 5.2x as many data types collected or
shared as their baseline counterparts
• Matcha took longer, but all devs preferred it
• Lead author currently at Google Checks team
Peekaboo Smart Home Infrastructure
• Imagine you’re trying to buy a smart TV
– Box says “only sends summary viewing data to us”
– How can we know it’s really doing what it claims?
• This is a fundamental challenge for CompSci
– Ex. Verifier or proof of correctness
– Ex. Some kind of hypervisor or sandbox
– Ex. Digital signature (so you know who to blame)
– Ex. Trusted computing base
– Ex. Multi-party computation / Federated learning
– Ex. Differential privacy
Peekaboo Smart Home Infrastructure
• What if every app and device had a manifest?
– Short, human readable, computationally enforceable
description of what the app can do (whitelist)
– Ex. Android manifest (location data, contact list, etc)
– Ex. Manufacturer Usage Descriptions (IP addresses)
• This is a good start, but “all or nothing” access
– Entire contact list, or nothing
– All SMS messages, or nothing
Jin et al, Peekaboo: A Hub-Based Approach to Enable Transparency
in Data Processing within Smart Homes. Oakland 2022.
Peekaboo Smart Home Infrastructure
• Overaccess is the mismatch between what apps need
and what the underlying permission system supports
– Ex. Sleep monitor app uses mic but only needs loudness
– Ex. App wants “SMS messages” but really only checks a few
phone#s and for 2FA codes
– Ex. Zoom wants access to full calendar, but only needs to
add calendar events
Peekaboo Smart Home Infrastructure
• Overaccess is also true for smart home scenarios
– Analyzed 200+ smart home scenarios, 77% didn’t need
raw data, instead some processed form of data
– Ex. Smart TV needs “most popular channels viewed”
– Ex. Sleep monitor needs microphone “loudness”
– Ex. Face recognition needs just the face
• Can’t have a unique permission for every case,
doesn’t scale
– Too many permissions, hard for developers to learn
and choose
– Inflexible, hard to adapt for new use cases
Peekaboo Smart Home Infrastructure
• Idea: Three interlocking parts
– Devs must declare access to all sensitive data in a
manifest, consisting of what data + transformations
– Transformations via fixed set of operators (like Unix pipes)
• Microphone -> loudness -> sleep.com
• @1 week -> get TV logs -> sort -> filter -> hdtv.com
– Trusted hub enforces manifest and runs all operators
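The three parts can be sketched together in a few lines: a manifest names operators, and a hub runs only operators from its own fixed set before anything leaves the home. This is a simplified illustration under assumptions; `HubSketch`, the operator name `loudness` as a map key, and the use of `double[]` for audio samples are hypothetical and not Peekaboo's actual manifest format or API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of a hub that runs a manifest's operator chain over raw data. */
final class HubSketch {
    // Fixed set of operators provided by the hub, not by the app developer.
    private final Map<String, UnaryOperator<double[]>> operators = new HashMap<>();

    HubSketch() {
        operators.put("loudness", samples -> new double[] { rms(samples) });
    }

    /** Root-mean-square amplitude; 0 for an empty input. */
    static double rms(double[] samples) {
        if (samples.length == 0) {
            return 0.0;
        }
        double sumSquares = 0.0;
        for (double s : samples) {
            sumSquares += s * s;
        }
        return Math.sqrt(sumSquares / samples.length);
    }

    /** Applies the manifest's operators in order; rejects any operator the hub does not provide. */
    double[] run(List<String> manifestOperators, double[] rawData) {
        double[] data = rawData;
        for (String name : manifestOperators) {
            UnaryOperator<double[]> operator = operators.get(name);
            if (operator == null) {
                throw new IllegalArgumentException("Operator not supported by hub: " + name);
            }
            data = operator.apply(data);
        }
        return data; // only this transformed result would be sent to the destination
    }

    public static void main(String[] args) {
        // Corresponds to the slide's "Microphone -> loudness -> sleep.com" flow.
        double[] microphone = {3, -3, 3, -3};
        System.out.println(Arrays.toString(new HubSketch().run(Arrays.asList("loudness"), microphone)));
    }
}
```

Because the developer can only compose operators the hub already knows, the manifest doubles as a readable and enforceable description of what leaves the home.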
Discussion
• Benefits of Peekaboo manifest
– Can see hypothetical data accesses (before install)
– Can monitor actual data flows, and to where
– Can combine manifests together (e.g. house)
– Can convert into human-readable descriptions
– Can insert own operators (e.g. blur all faces)
– Can turn off some flows selectively
– Auto-generate interactive privacy nutrition labels
• But
– Requires trusted and fixed set of operators
– Requires operators to be “complete”
– Good for dataflow architectures, unclear about others
Some Reflections
• Use privacy annotations throughout entire
codebase to help with ecosystem of privacy
– Check what’s collected, where, how used
(frontend + storage + network traffic + backend)
– Enforce (e.g. when uploading app to app store)
– Audit (e.g. for devs, for non-technical people)
– A little extra work for devs, but big benefits for all
• Auditing is under-researched but has big potential
– Unlikely can prove correctness, so make auditing easier
– Journalists, FTC, third-parties can have large effect
– FTC in particular is major potential point of leverage
Some Reflections
• Annotations for X
– Developers need to know a lot to be effective
– Can annotations help with other non-functional
requirements like security, accessibility, usability?
• Manifests are like high level abstraction of code
– Are there other ways of specifying behaviors?
– Can they also help us build larger systems?
• Ex. Isolate Log4j along with developer manifest only
allowing access to writing logs (and not LDAP and JNDI)
– Can they be applied elsewhere?
• Plugins (e.g. for web browsers), docker instances
Some Reflections
New book chapter on
mobile sensing + privacy
Similarities with privacy +
AI Bias in CACM Aug 2023
Closing Thoughts
• Whirlwind tour of some of our research on devs
and privacy over the past decade
– Libraries are problematic
– Platforms+OS and journalists+FTC are major points
of leverage for improving ecosystem
– Privacy annotations + Manifests
• Stepping back, even though privacy challenges
are bigger now, I’m more hopeful than ever
– We’re at an inflection point for privacy
– But, still lots more to do, and could use your help!
Thanks!
Many, many collaborators:
• Yuvraj Agarwal
• Shah Amini
• Rebecca Balebako
• Deyuan Chen
• Fanglin Chen
• Lorrie Cranor
• Mike Czapik
• Kevan Dodhia
• Matt Fredrikson
• Bill Guo
• Yao Guo
• Yuanchun Li
• Shawn Hanna
• Gang Huang
• David Hwang
• Haojian Jin
• Swarun Kumar
• Tianshi Li
• Toby Li
• Yucheng Li
• Jialiu Lin
• Gram Liu
• Minyi Liu
• Elizabeth Louie
• Song Luan
• Abby Marsh
• Elijah Neundorfer
• Kayla Reiman
• Ritu Roychoudhury
• Norman Sadeh
• Swarup Sahoo
• Gaurav Srivastava
• Mike Villena
• Haoyu Wang
• Jason Wiese
• Yaxing Yao
• Alex Yu
• And many more…
Special thanks to:
• DARPA Brandeis
• Google
• NSF
• CMU Cylab
• NQ Mobile
• Alfred P. Sloan
• Cisco
• Intel
Some Smartphone Apps Use Your Data in
Unexpected Ways
Shared your location,
gender, unique phone ID,
phone# with advertisers
Uploaded your entire
contact list to their server
(including phone #s)
More Unexpected Uses of Your Data
Location Data
Unique device ID
Location Data
Network Access
Unique device ID
Location Data
Microphone
Unique device ID
PrivacyGrade.org
• Improve transparency
• Assign privacy grades to all
1M+ Android apps
• Does not help devs directly
Expectations vs Reality
Privacy as Expectations
Use crowdsourcing to compare what people expect
an app to do vs what an app actually does
App Behavior
(What an app
actually does)
User Expectations
(What people think
the app does)
How PrivacyGrade Works
• We crowdsourced people’s expectations of
core set of 837 apps
– Ex. “How comfortable are you with
Drag Racing using your location for ads?”
• We generated purposes by examining which
third-party libraries the app used
• Created a model to predict people’s likely
privacy concerns and applied to 1M Android apps
How PrivacyGrade Works
• Long tail distribution of libraries
• We focused on top 400 libraries, which covers
vast majority of cases
Impact of PrivacyGrade
• Popular Press
– NYTimes, CNN, BBC, CBS, more
• Government
– Earlier work helped lead to FTC fines
• Google
– Google has something like PrivacyGrade internally
• Developers
Market Failure for Privacy
• Let’s say you want to purchase a web cam
– Go into store, can compare price, color, features
– But can’t easily compare security (hidden feature)
– So, security does not influence customer purchases
– So, devs not incentivized to improve
• Same is true for privacy
– This is where things like PrivacyGrade can help
– Improve transparency, address market failures
– More broadly, what other ways to incentivize?
How to Get People to Change Behaviors?
Security Sensitivity Stack
• Awareness
– Does person know of existing threat?
– Can person identify attack / problem?
• Knowledge
– Does person know tools, behaviors, strategies to protect?
– Can person use tools, behaviors, strategies?
• Motivation
– Does person care?
Security Sensitivity Stack Adapted for
Developers and Privacy
• Awareness
– Are devs aware of privacy problem?
– Ex. Identifier tradeoffs, library behavior
• Knowledge
– Do devs know how to address?
– Ex. Might not know right API call
• Motivation
– Do devs care?
– Ex. Sometimes ignore issues if not required
Coconut IDE Plug-In Evaluation
• Lab study of Coconut
– Lab studies: 9 + 9 developers (w/ and w/o plug-in)
– Tasks: build a weather app, use 3rd party library for ad
monetization, store ID and location locally (analytics)
• Ideally: coarse-grained location for weather and ads,
private storage for local data, not hardware ID
– Participants were told that privacy was important here
– Could also use any resource (e.g. search engine)
– Interview, surveys, answer questions about app
behavior, write a 1 paragraph privacy policy for app
Coconut IDE Plug-In Evaluation Results
• Participants with plug-in
– Better privacy practices (more likely to follow ideal case)
– Better at answering questions about their app
• Ex. Granularity of location used, frequency, sent
• Participants w/o plug-in
– Many didn’t realize ad library was sending data
• Had two judges evaluate privacy policies
– Coconut avg = 5.8, control = 2.8 (out of 10)
• Perceived as not too disruptive, also very useful
– Median for “Disruptive” & “Time consuming” = 2 out of 7
Opportunities with Annotations
• Use annotations to help other aspects of privacy
– Annotations can be embedded into compiled code
• Can be used to help with checking
• Ex. App says it only uses location for maps, verify that
– Use annotations to help generate privacy policies
– Use annotations to generate good UIs
• Ex. Runtime UIs
• Ex. Better explanations
• Stepping back: the more value annotations provide,
the more likely they are to be adopted
PrivacyStreams Programming Model
Observation 1: Many Apps Don’t Need Raw Data
[Figure: number of apps that need coarse-grained data vs. fine-grained data, by data type (location, microphone, contacts, messages)]
Based on a manual examination of 99 popular apps in Google Play and 20
apps in research papers.
Li et al. PrivacyStreams: Enabling Transparency in Personal Data
Processing for Mobile Apps. PACM on Interactive, Mobile,
Wearable, and Ubiquitous Technologies (IMWUT) 1(3). 2017.
PrivacyStreams Programming Model
Observation 2: Difficult for Devs to Get Sensitive Data
    // Deal with encoding, format, etc.
    int sampleRate = 8000;
    int bufferSize = AudioRecord.getMinBufferSize(sampleRate,
            AudioFormat.CHANNEL_IN_DEFAULT, AudioFormat.ENCODING_PCM_16BIT);
    AudioRecord audioRecord = new AudioRecord(MediaRecorder.AudioSource.MIC, sampleRate,
            AudioFormat.CHANNEL_IN_DEFAULT, AudioFormat.ENCODING_PCM_16BIT, bufferSize);

    // Process raw data: accumulate scaled squared amplitudes for DURATION ms.
    // (A loudness value would still need a mean and a square root.)
    audioRecord.startRecording();
    long startTime = System.currentTimeMillis();
    double rmsAmplitude = 0;
    long bufferTotalLen = 0;
    short[] buffer = new short[bufferSize]; // allocated once and reused
    while (true) {
        int bufferLen = audioRecord.read(buffer, 0, bufferSize);
        for (int i = 0; i < bufferLen; i++) {
            rmsAmplitude += (double) buffer[i] * buffer[i] / 10000;
        }
        bufferTotalLen += bufferLen;
        long currentTime = System.currentTimeMillis();
        if (currentTime - startTime > DURATION) {
            break;
        }
    }

    // Handle threads: repeat the measurement every INTERVAL ms
    while (true) {
        // …
        try {
            Thread.sleep(INTERVAL);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    // Handle permissions: request the same permission that was checked
    if (ContextCompat.checkSelfPermission(this.context,
            Manifest.permission.RECORD_AUDIO)
            != PackageManager.PERMISSION_GRANTED) {
        Log.d("Task0", "Permission denied.");
        ActivityCompat.requestPermissions(thisActivity,
                new String[]{Manifest.permission.RECORD_AUDIO}, 1);
        return;
    }
UQI.getData(Audio.recordPeriodic(DURATION, INTERVAL),
Purpose.HEALTH("monitor sleep"))
.setField("loudness", calcLoudness(Audio.AUDIO_DATA))
.forEach("loudness", callback);
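[Figure: the pipeline above shown as a diagram (Audio, calcLoudness, loudness, callback, app) for three audiences: developers, auditors, end-users. End-user description: “This app will only get access to the microphone loudness.”]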
PrivacyStreams Makes Privacy a Side
Effect of Helping Developers
See tutorials and code at privacystreams.github.io
User Study
• Goal
– Is PrivacyStreams easy to use and liked?
– Can we correctly analyze apps?
• Study 1: Lab study
– 10 Android devs, 5 programming tasks
– Use both PrivacyStreams and Android standard APIs
• Study 2: Field study
– 5 experienced Android devs, 5 real apps (2 weeks)
– Write or rewrite an app with PrivacyStreams
• Study 3: Privacy analysis
– Analyze the 5 apps developed in the field study
Study 1 Results
Devs More Efficient Using PrivacyStreams
[Figure: bar chart of average time (minutes) per task for Contact, Location, SMS, Image, and Geofence; bars are labeled with sample sizes ranging from N=1 to N=6]
Study 3 Results
Analyzing Developed Apps
App, analysis time (s), and generated description:
• Speedometer, 12.17 s: “This app requests LOCATION permission to get the speed continuously.”
• Lockscreen app, 2.94 s: “This app requests CALL_LOG permission to get the last missed call.”
• Weather app, 14.72 s: “This app requests LOCATION permission to get the city-level location.”
• Sleep monitor, 13.03 s: “This app requests MICROPHONE permission to get how loud it is.”
• Album app, 14.36 s: “This app requests STORAGE permission to get all local images.”
Opportunities for PrivacyStreams
• We think this could be a new and general way to
manage third-party access to sensitive data
– Ex. Browser plug-ins, IoT, databases of sensitive data
• Looking at how to incorporate machine learning
into pipeline (combining multiple streams)
• Looking to integrate this into Privacy-Enhanced
Android, DARPA Brandeis project on privacy
– And then convince Google, Apple, others that this is
the way to go for third-party APIs
Some Reflections on Privacy,
and a Call to Action
• Smartphone privacy is just one slice of privacy
• Devs need privacy help for web, IoT, cloud,
backend database processing, and more
– Third-party libraries too (both creating and using)
• Devs also need help with entire lifecycle of data
– Collection, storage, inferencing, usage, sharing,
presentation to end-users, auditing, documentation
– Distributed teams, turnover, versioning
• Close with two frameworks for thinking about
research in this space
Allen Newell’s Time Bands of Cognition
Applied to Developers and Privacy
Stratum, scale (sec), and examples:
• Social (10^5 to 10^7): sharing best practices, defining privacy policies, code reviews
• Rational (10^2 to 10^4, Task): understanding a library, design patterns, code documentation
• Cognitive (10^-1 Deliberate Act, 10^0 Operations, 10^1 Unit Task): annotations, API usage, quick fixes
• Consider how to link your idea across time scales; a single point
solution might not have enough value to be adopted
How Can We Help Developers Do Better
with Respect to Privacy?
• Why devs? Shouldn’t lawyers and management
be handling privacy issues?
• Lots of privacy decisions will be made by devs
– Google, Facebook, etc can afford privacy teams,
but still require devs to design, implement, check
– For long tail of small and medium businesses,
devs will have to make a lot of decisions
• Also, privacy is a lot of work for devs and hard to
get right!
DARPA Brandeis
• There are all these amazing things we could do if
we can legitimately address privacy concerns
• Four-year program seeking to advance privacy
– Enterprise privacy
– IoT privacy
– Smartphone Privacy -> Privacy-enhanced Android
• Note: some work I’ll present done before this
program, but easier to understand in this context
• Also, not presenting in chronological order
DARPA Brandeis Smartphone Privacy
• Our approach: have devs declare in their apps the
purpose for which sensitive data is being used
– Devs select from a small set of defined purposes
• Today: “This app uses location”
• Ours: “This app uses location for advertising”
– Use these purposes throughout ecosystem
• Ex. IDE support for purposes
• Ex. New ways of checking purposes
• Ex. Use in GUIs to help end-users
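The purpose declaration idea can be sketched in a few lines of Python. This is only an illustration: the `Purpose` values, the `DataUse` record, and both `describe` functions are hypothetical names, and the slides do not list the actual set of defined purposes.

```python
from dataclasses import dataclass
from enum import Enum


class Purpose(Enum):
    # Hypothetical purpose set; the talk only says devs pick from
    # "a small set of defined purposes".
    ADVERTISING = "advertising"
    MAPS = "maps"
    BACKUPS = "backups"


@dataclass(frozen=True)
class DataUse:
    data_type: str    # e.g. "location"
    purpose: Purpose  # why the app uses that data


def describe_today(use: DataUse) -> str:
    """Permission-style description: the data type only."""
    return f"This app uses {use.data_type}"


def describe_with_purpose(use: DataUse) -> str:
    """Purpose-aware description that a GUI or checker could consume."""
    return f"This app uses {use.data_type} for {use.purpose.value}"
```

For example, `describe_with_purpose(DataUse("location", Purpose.ADVERTISING))` yields the "Ours" wording from the slide, while `describe_today` yields the "Today" wording. The same record could plausibly feed IDE checks and end-user interfaces, which is the point of declaring the purpose once.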
People Won’t Adopt Apps & Services
• Pew Research Center survey in 2015 found:
– 60% of people chose not to install an app when they
discovered how much personal info it required
– 43% uninstalled app after downloading it for the
same reason
– http://www.pewinternet.org/2015/11/10/apps-permissions-in-the-google-play-store/
• So pragmatically, if we don’t address privacy,
people won’t adopt the new tech we create
Study 3 Findings
In-depth Discussions of Privacy Were Rare
• Ex. Android API updates, app store policy updates,
privacy law updates
• One interpretation: devs not proactive
Pop Quiz 1 (yes this will be graded)
• Imagine you are an app developer
– You are storing a user ID and date of last login
• Is “date of last login” “data linked to you”?
– Y / N / ?
Study 4 Results
• Developers’ reactions were positive overall
– “I really like that Apple has done this, even though it
might be a pain for a little bit for developers to get
used to.” (P7)
– “I think the positive thing is, it forces the developer to
think about all the data that they’re capturing” (P6)
Study 4 Kinds of Errors
Under-reporting and Over-reporting Errors
• Under-Reporting (didn’t report but should have)
– Many devs thought “linked data” means data is
identifiable on its own
– Apple’s official docs say “linked data” includes data
stored with other identifiable data
Since “date of last login” is stored with an ID, yes, this is
linked data
So this nutrition label would be incorrect
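The "stored with other identifiable data" reading can be written as a tiny check. This is a deliberately simplified sketch, not Apple's actual rule set: `IDENTIFYING_FIELDS` and `linked_fields` are hypothetical names, and the field names are made up for the quiz scenario.

```python
# Hypothetical set of fields that identify a user on their own.
IDENTIFYING_FIELDS = {"user_id", "email", "device_id"}


def linked_fields(record_fields):
    """Return the fields that count as "linked" under the simplified rule.

    Simplified reading of the slide: if a record contains any identifying
    field, every field stored alongside it is linked to the user.
    """
    fields = set(record_fields)
    if fields & IDENTIFYING_FIELDS:
        return fields
    return set()
```

Under this rule, a record holding `user_id` and `last_login_date` makes `last_login_date` linked, which matches the quiz answer. The under-reporting error on the slide corresponds to checking only whether a field is identifiable by itself.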
Coconut Plug-In
Detect Potential Privacy Issues in Code
• Simple static analysis to detect certain APIs
• Offers suggestions for alternatives
– Devs have limited knowledge of APIs
– Devs copy-paste first solution found on StackOverflow
– Help devs understand design options
– Offer “quick fix” functionality
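A rough sketch of what "simple static analysis to detect certain APIs" could look like, assuming a line-by-line regex scan. This is not Coconut's implementation: the `RULES` table, the suggestion wording, and the `scan` function are assumptions made for illustration. The two method names are real Android calls, `getDeviceId` on `TelephonyManager` and `getLastKnownLocation` on `LocationManager`.

```python
import re
from typing import NamedTuple

# Hypothetical rule table: regex for an API call -> suggested alternative.
RULES = {
    r"\bgetDeviceId\s*\(": (
        "hardware IDs are persistent; consider an app-specific "
        "or advertising identifier"
    ),
    r"\bgetLastKnownLocation\s*\(": (
        "check whether a coarser location is enough for this feature"
    ),
}


class Finding(NamedTuple):
    line_no: int     # 1-based line number in the scanned source
    api: str         # name of the matched API call
    suggestion: str  # alternative to show the developer


def scan(source: str) -> list[Finding]:
    """Scan source text line by line and report matches for each rule."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for pattern, suggestion in RULES.items():
            match = re.search(pattern, line)
            if match:
                # Strip the trailing "(" and whitespace to get the bare name.
                api = match.group(0).rstrip("( \t")
                findings.append(Finding(line_no, api, suggestion))
    return findings
```

Calling `scan` on a file that contains `tm.getDeviceId()` returns one finding with the line number and the suggested alternative. An IDE plug-in could surface that as a warning with a quick fix, which targets the copy-paste-from-StackOverflow habit listed above.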
Coconut Plug-In
Aggregate Sensitive Data Usage in One Place
• All annotations gathered and categorized in one
tool window called PrivacyChecker
– See all uses of data in a single place
– Helps with multiple team members and versioning
– Helps with auditing by other team members
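As a sketch of the "one place" idea, annotations collected across a codebase can be grouped by data type so a teammate or auditor sees every use together. The `Annotation` record and `group_by_data_type` function are hypothetical; the slides do not describe PrivacyChecker's internal data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    file: str       # source file where the annotation appears
    line: int       # line number of the annotated statement
    data_type: str  # e.g. "location"
    purpose: str    # e.g. "maps"


def group_by_data_type(annotations):
    """Group annotations by data type, ordered by file and line."""
    grouped = defaultdict(list)
    for ann in sorted(annotations, key=lambda a: (a.file, a.line)):
        grouped[ann.data_type].append(ann)
    return dict(grouped)
```

A view built this way would show, for instance, that location is used for maps in one file and for ads in another, even when different team members wrote those files.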
Honeysuckle Plug-In
Help Generate Privacy User Interfaces
Matcha Plug-In
Help Generate Privacy Nutrition Labels
• Ex. Matcha detects access to pictures / videos
• Gives skeleton annotation, dev fills it out

Helping Developers with Privacy, Distinguished Lecture at University of Wisconsin-Madison, Dec 2023

  • 1. 1 Helping Developers with Privacy University of Wisconsin, Madison Dec 2023 Jason Hong Computer Human Interaction: Mobility Privacy Security
  • 5. Helping Developers with Privacy: 5 New Kinds of Guidelines and Regulations European Union General Data Protection California Consumer Privacy Act Platform requirements (Google Play)
  • 6. Helping Developers with Privacy: 6 A Long View on Privacy • I’ve been working on mobility, privacy, security for ~20 years • My previous work focused on end-users – Better user interfaces for sharing / configuration – But this focus kept the burden entirely on end-users – Cookies, VPNs, trackers, permissions… it’s too much! • Current stance: foster ecosystem of privacy – Analogy with email spam – Shift burden onto HW, OS, service providers, auditors – Devs are a major (and underlooked) point of leverage
  • 7. Helping Developers with Privacy: 7 Why is Privacy Hard? There’s a lot to know and do • On the one hand, things have gotten better – 20+ years ago, we weren’t much further beyond “privacy is the right to be let alone” – Few tools, little guidance, best practices for devs • Today, more clarity, but a lot of work for devs – Users have right to see / export their data – Data retention (and right to be forgotten) – Privacy nutrition labels (more on this later) – In-app notifications for smartphone apps – Keeping up to date with specific privacy APIs in OS – And many more… again, this is a lot of work!
  • 8. Helping Developers with Privacy: 8 Why is Privacy Hard? Devs lack Awareness, Motivation, Ability • Security & Privacy Acceptance Framework (SPAF) – Foundations & Trends in Security and Privacy
  • 9. Helping Developers with Privacy: 9 Why is Privacy Hard? Devs lack Awareness, Motivation, Ability • Security & Privacy Acceptance Framework (SPAF) – Awareness: understands threats and protections – Motivation: wants to employ best practices – Ability: capable of converting intention into action • Awareness of privacy issues is low – A summary from our surveys, interviews, user studies – Low awareness of privacy regulations in general – Low awareness of what to do about privacy – /r/AndroidDev subreddit, devs rarely talk about privacy
  • 10. Helping Developers with Privacy: 10 Why is Privacy Hard? Devs lack Awareness, Motivation, Ability • Motivation is uneven – /r/AndroidDev, privacy is sticks with no carrots • Lots of work, but unclear benefit for the app or devs – Platform requirements are a strong point of leverage • Platform requirements >> legal requirements • Ability is low – StackOverflow survey about how devs learned to code https://insights.stackoverflow.com/survey/2021
  • 12. Helping Developers with Privacy: 12 Why is Privacy Hard? Devs lack Awareness, Motivation, Ability • Ability – Most devs lack formal CS education • And even then, few require security, let alone privacy • So broadly, almost everyone unprepared for privacy – Most devs are C programmers (and I’m not referring to the programming language 😊)
  • 13. Helping Developers with Privacy: 13 Today’s Talk Whirlwind Tour of Past Decade of Our Team’s Work • Why is Privacy Important? • Why is Privacy Hard? • Our Studies on App Developers and Privacy – Survey + interviews w/ smartphone app developers – Analysis of /r/AndroidDev subreddit – User studies on privacy nutrition labels • Our Tools for Developers and Privacy – Source code privacy annotations – Peekaboo software architecture for smart homes • Reflections on Privacy and What’s Next
  • 14. Helping Developers with Privacy: 14 Study 1 – Interviews and Survey What Do App Developers Know about Privacy? • What knowledge do devs have? What tools? What incentives? • Are there potential points of leverage? • Interviewed 13 smartphone app developers • Surveyed 228 smartphone app developers – Got a good mix of experiences and size of orgs Balebako et al, The Privacy and Security Behaviors of Smartphone App Developers. USEC 2014.
  • 15. Helping Developers with Privacy: 15 Study 1 Summary of Findings Third-party Libraries Problematic • Devs often use ads and analytics libraries
  • 16. Helping Developers with Privacy: 16 Study 1 Summary of Findings Third-party Libraries Problematic • Devs often use ads and analytics libraries • But hard to understand library behaviors – A few didn’t know they were using libraries (based on inconsistent answers) – Some didn’t know these libraries collected data • We’ll see issue of libraries repeated quite often – In a later study, 40% Android apps used sensitive data only b/c of libraries [Chitkara 2017]
  • 17. Helping Developers with Privacy: 17 Study 1 Summary of Findings Devs Don’t Know What to Do • Low awareness of existing privacy guidelines – Had little knowledge of Fair Information Practices, FTC guidelines, Google requirements – Often just ask others around them • Now also ask StackOverflow, Reddit, etc – Note that this study was before GDPR and CCPA • Low perceived value of privacy policies – Mostly protection from lawsuits – “I haven’t even read [our privacy policy]. I mean, it’s just legal stuff that’s required, so I just put in there.”
  • 18. Helping Developers with Privacy: 18 Study 2 – Interviews How do developers address privacy when coding? • Semi-structured interview of 9 Android devs asking about their three most recent apps – What data collected in app and how used • Ex. Libraries used? • Ex. Was data sent to cloud server? • Ex. How and where data stored? Li et al, Coconut: An IDE plugin for developing privacy-friendly apps. IMWUT / Ubicomp 2018.
  • 19. Helping Developers with Privacy: 19 Study 2 Findings Inaccurate Understanding of Their Own Apps • We compared responses if app on app store – Some data practices they claimed didn’t match app behaviors • Three major reasons for mismatch – Lacked knowledge of library behaviors – Fast iterations led to changes in data collection and data use, hard to keep up to date – Team dynamics • Don’t know what other devs on team doing • Dev turnover, use of sensitive data not documented
  • 20. Helping Developers with Privacy: 20 Study 2 Findings Lack of Knowledge of Alternatives and Tradeoffs • Ex. Many apps use some kind of identifier of who you are, and different identifiers have tradeoffs – Hardware identifiers (riskiest since persistent) – Application-specific identifier (email, hashcode) – Advertising identifier • However, devs lacked knowledge of alternatives and tradeoffs – Devs often just went with first solution they found online (i.e. StackOverflow)
  • 21. Helping Developers with Privacy: 21 Study 2 Findings Lack of Motivation to Address Privacy Issues • Might ignore privacy issues if not required – Ex. Might get location permission for one reason (maps), but also use for other reasons (ads) – Ex. Might get name and email, but only need email – Ex. Might get device ID because no permission needed • Two things that did work: Android permissions and Play Store requirements – Platforms and OSes have major role in improving privacy, but there are tradeoffs
  • 22. Helping Developers with Privacy: 22 Study 3 – Analyzing Discussion Forum How Devs Talk about Privacy on /r/AndroidDev Li et al, How Developers Talk about Personal Data and What It Means for User Privacy: A Case Study of a Developer Forum on Reddit. CSCW 2020.
  • 23. Helping Developers with Privacy: 23 Study 3 Method How Devs Talk about Privacy on /r/AndroidDev • At time of our study (early 2020): – Had over 144,000 subscribed readers – ~12 new threads and 175 posts per day • Method – Crawled 46k threads (666k posts) Mar2009-Feb2020 – Identified list of 44 keywords suggesting personal info • Ex. ssn, passport number, phone number, street name • We used this rather than “privacy” since too restrictive – Resulted in 6827 threads -> manually examined – Resulted in 329 threads -> manually coded
  • 24. Helping Developers with Privacy: 24 Study 3 Findings How Devs Talk about Privacy on /r/AndroidDev • Devs frustrated and discontented about privacy – Often felt like sticks with no carrots 1. Privacy enhancement measures not always perceived as helpful – Each new version of Android has new privacy changes and enforcement mechanisms – But lacked explanation of why it will help with privacy – Confusion and skepticism • “Scoped storage - Hey, wait?! What exact problem they are trying to solve with this?”
  • 25. Helping Developers with Privacy: 25 Study 3 Findings How Devs Talk about Privacy on /r/AndroidDev 2. Privacy restrictions rigid and hurt legitimate uses – Protects from malicious apps, but breaks legit apps – Ex. Regarding Android P (v9) disabling foreground access to location when power-save mode is on: “Expecting that the user will understand that power saving needs to be disabled for navigation to work is ridiculous. And when navigation doesn’t work, guess who’s app gonna get 1-star angry review?”
  • 26. Helping Developers with Privacy: 26 Study 3 Findings How Devs Talk about Privacy on /r/AndroidDev 3. Lack sufficient support for compliance – Wanted more support to comply with GDPR + CCPA • Ex. AdMob too opaque to comply with consent dialogs – Wanted more support with Android OS pop-ups • Ex. Just writing any file requires storage permission, but this pops up a dialog that says “access photo media and files on your devices” which can be scary – Unpredictable app review process when using sensitive permissions led to lots of frustration
  • 27. Helping Developers with Privacy: 27 Study 4 – User study How Well Can Devs Fill Out Privacy Nutrition Labels? Apple App Privacy Details (2020) Google Safety Section (2022)
  • 28. Helping Developers with Privacy: 28 How Nutrition Labels Are Created Today Is the app collecting data? For each data type: Select collected data type(s) from 32 data types in 14 categories Select data purpose Is the data linked to users? Is the data used to track users?
  • 29. Helping Developers with Privacy: 29 Two Notes about Nutrition Labels • Currently required for all new / updated apps – For both Apple App Store and Google Play • Currently some enforcement of labels – Heard informally that Apple might call up some devs and let them know of nutrition label mismatches – Not clear how they check (probably manually), and unlikely this approach scales
  • 30. Helping Developers with Privacy: 30 Pop Quiz (yes this will be on final  ) • Imagine you are an iOS app developer – Your health app periodically gets and stores users’ location data (e.g. think Strava) • Is this location data “data used to track you”? – Y / N / ?
  • 31. Helping Developers with Privacy: 31 Study 4 • Conducted study with 12 iOS developers to understand their experiences and perceptions • (Re)create nutrition label for their apps, then semi-structured interviews to identify errors • Errors were common – 9/12 made errors that weren’t corrected before prompted by interviewer – Among 8 apps that already had a privacy label, 6 were re-created inconsistently (!) Li et al, Understanding Challenges for Developers to Create Accurate Privacy Nutrition Labels. CHI 2022.
  • 32. Helping Developers with Privacy: 32 Study 4 Kinds of Errors Under-reporting and Over-reporting Errors • Under-Reporting (didn’t report but should have) • Over-Reporting (did report but shouldn’t have) – Ex. some devs thought location tracking was definitely a type of data used to track users – Apple’s definition: “track” only refers to tracking for advertising purposes For quiz, getting location data for health reasons is not tracking for Apple
  • 33. Helping Developers with Privacy: 33 Study 4 Results Why do Devs Under/Over-Report? • Unknown unknowns – Devs don’t realize they don’t know something – Ex. Preconceptions of what “tracking” means – Ex. Not realizing documentation about third-party library privacy behaviors exist (e.g. Google Analytics) • Known unknowns – Unfamiliar terms: hashed email address, a latitude and longitude with three or more decimal places, etc. – Jargon: data broker, purchase tendencies, etc. – Cultural: what’s a credit score?
  • 34. Helping Developers with Privacy: 34 Study 4 Results Why do Devs Under/Over-Report? • Complexity in general The text developers need to read for selecting data types collected by the app (!)
  • 35. Helping Developers with Privacy: 35 Study 4 Results Why do Devs Under/Over-Report? • Complexity in general – Of documentation (previous slide) – Cross-platform differences • Google: Collect = transmit the data off the device • Apple: Collect = transmit the data off the device and store it
  • 36. Helping Developers with Privacy: 36 Recap of Studies and Some Reflections • Awareness of privacy in general is low – Of privacy regulations, best practices, and sometimes even their own app – Latter due to third-party libraries, team dynamics • Motivation is uneven – Not great, but must comply with platform and OS • Ability is low – Lots of errors / misconceptions in nutrition labels – Complexity high (team dynamics, documentation) – Few tools to help
  • 37. Helping Developers with Privacy: 37 Today’s Talk • Why is Privacy Important? • Why is Privacy Hard? • Our Studies on App Developers and Privacy – Survey + interviews w/ smartphone app developers – Analysis of /r/AndroidDev subreddit – User studies on privacy nutrition labels • Our Tools for Developers and Privacy – Source code privacy annotations – Peekaboo software architecture for smart homes • Reflections on Privacy and What’s Next
  • 38. Helping Developers with Privacy: 38 Systems Our Team has Built • Static and dynamic analysis tools to infer purpose • PrivacyProxy VPN to find likely PII • PrivacyGrade.org • Privacy Enhancements for Android (DARPA) • PrivacyStreams • Privacy Annotations • Peekaboo software architecture for smart homes
  • 39. Helping Developers with Privacy: 39 Privacy Annotations in Source Code • Problem: Hard to correctly infer what data an app is accessing and its purpose of use – Today, can say app uses “location” or “contact list” – We want something like app uses “location for maps” or “contact list for backups” • Our idea: have devs declare purposes in source code using privacy annotations – Hints are a common idea in systems research – Annotations exist in Java and many other languages
  • 40. Helping Developers with Privacy: 40 Example Privacy Annotation for Android • Our long-term vision for annotations – If developers do a little extra work adding annotations, we can greatly improve entire privacy ecosystem – Give devs feedback about APIs (e.g. identifiers) – Facilitate auditing internally and externally – Help generate UIs for privacy (less work for devs!) – Help generate privacy nutrition labels (less work again)
  • 41. Helping Developers with Privacy: 41 Our Work in Privacy Annotations • Three different plug-ins for Android Studio • Coconut – Use annotations for API feedback and basic auditing [Li et al, Ubicomp 2018] • Honeysuckle – Use annotations to help auto-generate privacy user interfaces [Li et al, Ubicomp 2021] • Matcha – Use annotations to help auto-generate privacy nutrition labels [under review]
  • 42. Helping Developers with Privacy: 42 Matcha Plug-In Help Generate Privacy Nutrition Labels • As we previously saw, lots of errors in labels • Idea: Use privacy annotations facilitated by (simple) static analysis • Available for Android Studio/IntelliJ IDEA users – https://matcha-ide.github.io
  • 43. Helping Developers with Privacy: 43 Matcha Plug-In Help Generate Privacy Nutrition Labels • Matcha looks for use of sensitive data – API calls that access user data / send user data out – Keywords too (e.g. “email”, “ssn”) • Prompts dev to add annotations – On Data access / Data egress • Matcha detects most popular libraries – Devs fill out XML file saying how a library is used (since not all functionality in library might be used) • Matcha translates annotations into a label – Google Developer Console supports CSV upload
  • 44. Helping Developers with Privacy: 44 Matcha Plug-In Help Generate Privacy Nutrition Labels • User study with eight Android devs on their apps – Create label using Google Play dev console (baseline) – Then create label using Matcha – Review any discrepancies between the two versions • Matcha was able to correct a large number of errors over baseline condition – Matcha labels reported 5.2x data types collected or shared than their baseline counterparts • Matcha took longer, but all devs preferred it • Lead author currently at Google Checks team
  • 45. Helping Developers with Privacy: 45 Peekaboo Smart Home Infrastructure • Imagine you’re trying to buy a smart TV – Box says “only sends summary viewing data to us” – How can we know it’s really doing what it claims?
  • 46. Helping Developers with Privacy: 46 Peekaboo Smart Home Infrastructure • Imagine you’re trying to buy a smart TV – Box says “only sends summary viewing data to us” – How can we know it’s really doing what it claims? • This is a fundamental challenge for CompSci – Ex. Verifier or proof of correctness – Ex. Some kind of hypervisor or sandbox – Ex. Digital signature (so you know who to blame) – Ex. Trusted computing base – Ex. Multi-party computation / Federated learning – Ex. Differential privacy
  • 47. Helping Developers with Privacy: 47 Peekaboo Smart Home Infrastructure • What if every app and device had a manifest? – Short, human readable, computationally enforceable description of what the app can do (whitelist) – Ex. Android manifest (location data, contact list, etc) – Ex. Manufacturer Usage Descriptions (IP addresses) • This is a good start, but “all or nothing” access – Entire contact list, or nothing – All SMS messages, or nothing Jin et al, Peekaboo: A Hub-Based Approach to Enable Transparency in Data Processing within Smart Homes. Oakland 2022.
  • 48. Helping Developers with Privacy: 48 Peekaboo Smart Home Infrastructure • Overaccess is the mismatch between what apps need and what underlying permission system supports – Ex. Sleep monitor app uses mic but only needs loudness – Ex. App wants “SMS messages” but really only checks a few phone#s and for 2FA codes – Ex. Zoom wants access to full calendar, but only needs to add calendar events
  • 49. Helping Developers with Privacy: 49 Peekaboo Smart Home Infrastructure • Overaccess is also true for smart home scenarios – Analyzed 200+ smart home scenarios, 77% didn’t need raw data, instead some processed form of data – Ex. Smart TV needs “most popular channels viewed” – Ex. Sleep monitor needs microphone “loudness” – Ex. Face recognition needs just the face • Can’t have a unique permission for every case, doesn’t scale – Too many permissions, hard for developers to learn and choose – Inflexible, hard to adapt for new use cases
  • 50. Helping Developers with Privacy: 50 Peekaboo Smart Home Infrastructure • Idea: Three interlocking parts – Devs must declare access to all sensitive data in a manifest, consisting of what data + transformations – Transformations via fixed set of operators (like Unix pipes) • Microphone -> loudness -> sleep.com • @1 week -> get TV logs -> sort -> filter -> hdtv.com – Trusted hub enforces manifest and runs all operators
  • 51. Helping Developers with Privacy: 51 Discussion • Benefits of Peekaboo manifest – Can see hypothetical data accesses (before install) – Can monitor actual data flows, and to where – Can combine manifests together (e.g. house) – Can convert into human-readable descriptions – Can insert own operators (e.g. blur all faces) – Can turn off some flows selectively – Auto-generate interactive privacy nutrition labels • But – Requires trusted and fixed set of operators – Requires operators to be “complete” – Good for dataflow architectures, unclear about others
  • 52. Helping Developers with Privacy: 52 Some Reflections • Use privacy annotations throughout entire codebase to help with ecosystem of privacy – Check what’s collected, where, how used (frontend + storage + network traffic + backend) – Enforce (e.g. when uploading app to app store) – Audit (e.g. for devs, for non-technical people) – A little extra work for devs, but big benefits for all • Auditing is under-researched but has big potential – Unlikely can prove correctness, so make auditing easier – Journalists, FTC, third-parties can have large effect – FTC in particular is major potential point of leverage
  • 53. Helping Developers with Privacy: 53 Some Reflections • Annotations for X – Developers need to know a lot to be effective – Can annotations help with other non-functional requirements like security, accessibility, usability? • Manifests are like high level abstraction of code – Are there other ways of specifying behaviors? – Can they also help us build larger systems? • Ex. Isolate Log4j along with developer manifest only allowing access to writing logs (and not LDAP and JNDI) – Can they be applied elsewhere? • Plugins (e.g. for web browsers), docker instances
  • 54. Helping Developers with Privacy: 54 Some Reflections New book chapter on mobile sensing + privacy Similarities with privacy + AI Bias in CACM Aug 2023
  • 55. Helping Developers with Privacy: 55 Closing Thoughts • Whirlwind tour of some of our research on devs and privacy over the past decade – Libraries are problematic – Platforms+OS and journalists+FTC are major points of leverage for improving ecosystem – Privacy annotations + Manifests • Stepping back, even though privacy challenges are bigger now, I’m more hopeful than ever – We’re at an inflection point for privacy – But, still lots more to do, and could use your help!
  • 56. Helping Developers with Privacy: 56 Thanks! Many, many collaborators: Special thanks to: • DARPA Brandeis • Google • NSF • Yuvraj Agarwal • Shah Amini • Rebecca Balebako • Deyuan Chen • Fanglin Chen • Lorrie Cranor • Mike Czapik • Kevan Dodhia • Matt Fredrikson • Bill Guo • Yao Guo • Yuanchun Li • Shawn Hanna • Gang Huang • David Hwang • Haojian Jin • Swarun Kumar • Tianshi Li • Toby Li • Yucheng Li • Jialiu Lin • Gram Liu • Minyi Liu • Elizabeth Louie • Song Luan • Abby Marsh • CMU Cylab • NQ Mobile • Alfred P. Sloan • Elijah Neundorfer • Kayla Reiman • Ritu Roychoudhury • Norman Sadeh • Swarup Sahoo • Gaurav Srivastava • Mike Villena • Haoyu Wang • Jason Wiese • Yaxing Yao • Alex Yu • And many more… • Cisco • Intel
  • 59. Helping Developers with Privacy: 59 Some Smartphone Apps Use Your Data in Unexpected Ways Shared your location, gender, unique phone ID, phone# with advertisers Uploaded your entire contact list to their server (including phone #s)
  • 60. Helping Developers with Privacy: 60 More Unexpected Uses of Your Data Location Data Unique device ID Location Data Network Access Unique device ID Location Data Microphone Unique device ID
  • 61. Helping Developers with Privacy: 61 PrivacyGrade.org • Improve transparency • Assign privacy grades to all 1M+ Android apps • Does not help devs directly
  • 67. Helping Developers with Privacy: 67 Privacy as Expectations Use crowdsourcing to compare what people expect an app to do vs what an app actually does App Behavior (What an app actually does) User Expectations (What people think the app does)
  • 68. Helping Developers with Privacy: 68 How PrivacyGrade Works • We crowdsourced people’s expectations of a core set of 837 apps – Ex. “How comfortable are you with Drag Racing using your location for ads?” • We generated purposes by examining which third-party libraries each app used • Created a model to predict people’s likely privacy concerns and applied it to 1M Android apps
  • 70. Helping Developers with Privacy: 70 How PrivacyGrade Works • Long tail distribution of libraries • We focused on top 400 libraries, which covers vast majority of cases
  • 71. Helping Developers with Privacy: 71 Impact of PrivacyGrade • Popular Press – NYTimes, CNN, BBC, CBS, more • Government – Earlier work helped lead to FTC fines • Google – Google has something like PrivacyGrade internally • Developers
  • 72. Helping Developers with Privacy: 72 Market Failure for Privacy • Let’s say you want to purchase a web cam – Go into store, can compare price, color, features – But can’t easily compare security (hidden feature) – So, security does not influence customer purchases – So, devs not incentivized to improve • Same is true for privacy – This is where things like PrivacyGrade can help – Improve transparency, address market failures – More broadly, what other ways to incentivize?
  • 73. Helping Developers with Privacy: 73 How to Get People to Change Behaviors? Security Sensitivity Stack • Awareness: Does person know of existing threat? Can person identify attack / problem? • Knowledge: Does person know tools, behaviors, strategies to protect? Can person use tools, behaviors, strategies? • Motivation: Does person care?
  • 74. Helping Developers with Privacy: 74 Security Sensitivity Stack Adapted for Developers and Privacy • Awareness: Are devs aware of privacy problem? Ex. Identifier tradeoffs, library behavior • Knowledge: Do devs know how to address? Ex. Might not know right API call • Motivation: Do devs care? Ex. Sometimes ignore issues if not required
  • 75. Helping Developers with Privacy: 75 Coconut IDE Plug-In Evaluation • Lab study of Coconut – Lab studies: 9 + 9 developers (w/ and w/o plug-in) – Tasks: build a weather app, use 3rd party library for ad monetization, store ID and location locally (analytics) • Ideally: coarse-grained location for weather and ads, private storage for local data, not hardware ID – Participants were informed privacy important here – Could also use any resource (e.g. search engine) – Interview, surveys, answer questions about app behavior, write a 1 paragraph privacy policy for app
  • 76. Helping Developers with Privacy: 76 Coconut IDE Plug-In Evaluation Results • Participants with plug-in – Better privacy practices (more likely to follow ideal case) – Better at answering questions about their app • Ex. Granularity of location used, frequency, sent • Participants w/o plug-in – Many didn’t realize ad library was sending data • Had two judges evaluate privacy policies – Coconut avg = 5.8, control = 2.8 (out of 10) • Perceived as not too disruptive, also very useful – Med. for “Disruptive” & “Time consuming” = 2 out of 7
  • 77. Helping Developers with Privacy: 77 Opportunities with Annotations • Use annotations to help other aspects of privacy – Annotations can be embedded into compiled code • Can be used to help with checking • Ex. App says it only uses location for maps, verify that – Use annotations to help generate privacy policies – Use annotations to generate good UIs • Ex. Runtime UIs • Ex. Better explanations • Stepping back: the more value to annotations, more likely to be adopted
  • 78. Helping Developers with Privacy: 78 PrivacyStreams Programming Model Observation 1: Many Apps Don’t Need Raw Data # apps need coarse-grained data # apps need fine-grained data Based on a manual examination of 99 popular apps in Google Play and 20 apps in research papers. location microphone contacts messages Li et al. PrivacyStreams: Enabling Transparency in Personal Data Processing for Mobile Apps. PACM on Interactive, Mobile, Wearable, and Ubiquitous Technologies (IMWUT) 1(3). 2017.
  • 79. Helping Developers with Privacy: 79 PrivacyStreams Programming Model Observation 2: Difficult for Devs to Get Sensitive Data /* Deal with encoding, format, etc. */ int sampleRate = 8000; int bufferSize = AudioRecord.getMinBufferSize(sampleRate, AudioFormat.CHANNEL_IN_DEFAULT, AudioFormat.ENCODING_PCM_16BIT); AudioRecord audioRecord = new AudioRecord(MediaRecorder.AudioSource.MIC, sampleRate, AudioFormat.CHANNEL_IN_DEFAULT, AudioFormat.ENCODING_PCM_16BIT, bufferSize); /* Process raw data */ audioRecord.startRecording(); long startTime = System.currentTimeMillis(); double rmsAmplitude = 0; long bufferTotalLen = 0; while (true) { short[] buffer = new short[bufferSize]; int bufferLen = audioRecord.read(buffer, 0, bufferSize); for (int i=0; i < bufferLen; i++) { rmsAmplitude += (double) buffer[i] * buffer[i] / 10000; } bufferTotalLen += bufferLen; long currentTime = System.currentTimeMillis(); if (currentTime - startTime > DURATION) { break; } } /* Handle threads */ while (true) { /* … */ try { Thread.sleep(INTERVAL); } catch (InterruptedException e) { e.printStackTrace(); } } /* Handle permissions: request the same permission that was checked */ if (ContextCompat.checkSelfPermission(this.context, Manifest.permission.RECORD_AUDIO) != PackageManager.PERMISSION_GRANTED) { Log.d("Task0", "Permission denied."); ActivityCompat.requestPermissions(thisActivity, new String[]{Manifest.permission.RECORD_AUDIO}, 1); return; }
  • 80. 80 UQI.getData(Audio.recordPeriodic(DURATION, INTERVAL), Purpose.HEALTH("monitor sleep")) .setField("loudness", calcLoudness(Audio.AUDIO_DATA)) .forEach("loudness", callback); Developers Auditors End-users Audio loudness app calcLoudness callback “This app will only get access to the microphone loudness.” PrivacyStreams Makes Privacy a Side Effect of Helping Developers See tutorials and code at privacystreams.github.io
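To make the "only loudness reaches the app" idea concrete, here is a minimal plain-Java sketch of the pattern. It is not the PrivacyStreams implementation: `LoudnessSketch`, `rmsLoudness`, and `deliverLoudness` are hypothetical names, and the loudness measure is assumed to be a simple root-mean-square over 16-bit PCM samples.

```java
import java.util.function.DoubleConsumer;

// Hypothetical sketch: the raw audio buffer stays inside the pipeline,
// and the app's callback only ever receives one derived number.
class LoudnessSketch {

    // Root-mean-square amplitude of a buffer of 16-bit PCM samples.
    static double rmsLoudness(short[] samples) {
        if (samples.length == 0) {
            return 0.0; // avoid dividing by zero on an empty buffer
        }
        double sumSquares = 0.0;
        for (short s : samples) {
            sumSquares += (double) s * s;
        }
        return Math.sqrt(sumSquares / samples.length);
    }

    // The callback sees only the loudness value, never rawSamples.
    static void deliverLoudness(short[] rawSamples, DoubleConsumer callback) {
        callback.accept(rmsLoudness(rawSamples));
    }

    public static void main(String[] args) {
        short[] fakeRecording = {3, -4, 3, -4}; // stand-in for microphone data
        deliverLoudness(fakeRecording,
                loudness -> System.out.println("loudness = " + loudness));
    }
}
```

Because the callback's type is `DoubleConsumer`, an auditor reading the signature alone can see that no raw audio is handed to app code, which is the property the slide's generated description relies on.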
  • 81. Helping Developers with Privacy: 81 User Study • Goal – Is PrivacyStreams easy to use and liked? – Can we correctly analyze apps? • Study 1: Lab study – 10 Android devs, 5 programming tasks – Use both PrivacyStreams and Android standard APIs • Study 2: Field study – 5 experienced Android devs, 5 real apps (2 weeks) – Write/rewrite an app with PrivacyStreams • Study 3: Privacy analysis – Analyze the 5 apps developed in the field study
  • 82. 82 Study 1 Results Devs More Efficient Using PrivacyStreams [Bar chart: average time (minutes) per task for Contact, Location, SMS, Image, Geofence; bar labels in order: N=2, N=2, N=2, N=1, N=2, N=4, N=4, N=3, N=6, N=3]
  • 83. 83 Study 3 Results Analyzing Developed Apps (app, analysis time in seconds, generated description) • Speedometer, 12.17 s: This app requests LOCATION permission to get the speed continuously. • Lockscreen app, 2.94 s: This app requests CALL_LOG permission to get the last missed call. • Weather app, 14.72 s: This app requests LOCATION permission to get the city-level location. • Sleep monitor, 13.03 s: This app requests MICROPHONE permission to get how loud it is. • Album app, 14.36 s: This app requests STORAGE permission to get all local images.
  • 84. Helping Developers with Privacy: 84 Opportunities for PrivacyStreams • We think this could be a new and general way to manage third-party access to sensitive data – Ex. Browser plug-ins, IoT, databases of sensitive data • Looking at how to incorporate machine learning into pipeline (combining multiple streams) • Looking to integrate this into Privacy-Enhanced Android, DARPA Brandeis project on privacy – And then convince Google, Apple, others that this is the way to go for third-party APIs
  • 85. Helping Developers with Privacy: 85 Some Reflections on Privacy, and a Call to Action • Smartphone privacy is just one slice of privacy • Devs need privacy help for web, IoT, cloud, backend database processing, and more – Third-party libraries too (both creating and using) • Devs also need help with entire lifecycle of data – Collection, storage, inferencing, usage, sharing, presentation to end-users, auditing, documentation – Distributed teams, turnover, versioning • Close with two frameworks for thinking about research in this space
  • 86. Helping Developers with Privacy: 86 Allen Newell’s Time Bands of Cognition Applied to Developers and Privacy Scale (sec) by stratum: • Social: 10^7, 10^6, 10^5 • Rational: 10^4 Task, 10^3 Task, 10^2 Task • Cognitive: 10^1 Unit Task, 10^0 Operations, 10^-1 Deliberate Act Examples: Annotations, API usage, Quick fixes, Understanding a library, Design Patterns, Code documentation, Sharing best practices, Defining privacy policies, Code reviews
  • 87. Helping Developers with Privacy: 87 Allen Newell’s Time Bands of Cognition Applied to Developers and Privacy Scale (sec) by stratum: • Social: 10^7, 10^6, 10^5 • Rational: 10^4 Task, 10^3 Task, 10^2 Task • Cognitive: 10^1 Unit Task, 10^0 Operations, 10^-1 Deliberate Act Examples: Annotations, API usage, Quick fixes, Understanding a library, Design Patterns, Code documentation, Sharing best practices, Defining privacy policies, Code reviews • Consider how to link your idea across time scales; a single point solution might not have enough value to be adopted
  • 88. Helping Developers with Privacy: 88 How Can We Help Developers Do Better with Respect to Privacy? • Why devs? Shouldn’t lawyers and management be handling privacy issues? • Lots of privacy decisions will be made by devs – Google, Facebook, etc can afford privacy teams, but still require devs to design, implement, check – For long tail of small and medium businesses, devs will have to make a lot of decisions • Also, privacy is a lot of work for devs and hard to get right!
  • 89. Helping Developers with Privacy: 89 DARPA Brandeis • There are all these amazing things we could do if we can legitimately address privacy concerns • Four year program seeking to advance privacy – Enterprise privacy – IoT privacy – Smartphone Privacy -> Privacy-enhanced Android • Note: some work I’ll present done before this program, but easier to understand in this context • Also, not presenting in chronological order
  • 90. Helping Developers with Privacy: 90 DARPA Brandeis Smartphone Privacy • Our approach: have devs declare in their apps the purpose for which sensitive data is being used – Devs select from a small set of defined purposes • Today: “This app uses location” • Ours: “This app uses location for advertising” – Use these purposes throughout ecosystem • Ex. IDE support for purposes • Ex. New ways of checking purposes • Ex. Use in GUIs to help end-users
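As a rough illustration of declaring purposes from a small fixed set, the sketch below pairs a data type with an enum value and renders the two styles of disclosure quoted on the slide. The class name, the enum, and its three purposes are assumptions made for illustration; they are not the actual purpose taxonomy used in the Brandeis work.

```java
// Hypothetical sketch of "devs select from a small set of defined purposes".
class PurposeSketch {

    // An illustrative (invented) set of purposes a developer could choose from.
    enum Purpose {
        ADVERTISING("advertising"),
        MAPS("maps and navigation"),
        ANALYTICS("analytics");

        final String label;

        Purpose(String label) {
            this.label = label;
        }
    }

    // Today's style of disclosure: data type only.
    static String describe(String dataType) {
        return "This app uses " + dataType;
    }

    // Purpose-aware disclosure: data type plus the declared purpose.
    static String describe(String dataType, Purpose purpose) {
        return "This app uses " + dataType + " for " + purpose.label;
    }

    public static void main(String[] args) {
        System.out.println(describe("location"));
        System.out.println(describe("location", Purpose.ADVERTISING));
    }
}
```

Restricting the choice to an enum, rather than free text, is what would let IDE tooling, checkers, and end-user GUIs all consume the same declaration.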
  • 91. Helping Developers with Privacy: 91 People Won’t Adopt Apps & Services • Pew Research Center survey in 2015 found: – 60% of people chose not to install an app when they discovered how much personal info it required – 43% uninstalled app after downloading it for the same reason – http://www.pewinternet.org/2015/11/10/apps-permissions-in-the-google-play-store/ • So pragmatically, if we don’t address privacy, people won’t adopt the new tech we create
  • 92. Helping Developers with Privacy: 92 Study 3 Findings In-depth Discussions of Privacy Were Rare • Ex. Android API updates, app store policy updates, privacy law updates • One interpretation: devs not proactive
  • 93. Helping Developers with Privacy: 93 Pop Quiz 1 (yes this will be graded) • Imagine you are an app developer – You are storing a user ID and date of last login • Is “date of last login” “data linked to you”? – Y / N / ?
  • 94. Helping Developers with Privacy: 94 Study 4 Results • Developers’ reactions were positive overall – “I really like that Apple has done this, even though it might be a pain for a little bit for developers to get used to.” (P7) – “I think the positive thing is, it forces the developer to think about all the data that they’re capturing" (P6)
  • 95. Helping Developers with Privacy: 95 Study 4 Kinds of Errors Under-reporting and Over-reporting Errors • Under-Reporting (didn’t report but should have) – Many devs thought “linked data” means data is identifiable on its own – Apple’s official docs say “linked data” includes data stored with other identifiable data • Since “date of last login” is stored with ID, yes, this is linked data • So this nutrition label would be incorrect
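The rule that tripped up developers can be sketched as a check over a stored record: a field counts as linked if any field stored alongside it is an identifier. This is a simplified reading of the rule as paraphrased on the slide, not Apple's full definition; the class name, the field names, and the `IDENTIFIERS` set are hypothetical.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the "stored with other identifiable data" rule.
class LinkedDataSketch {

    // Assumed set of field names that identify a user on their own.
    static final Set<String> IDENTIFIERS = Set.of("user_id", "email", "device_id");

    // A record's fields are all "linked" if the record contains any identifier,
    // even when the field in question is not identifying by itself.
    static boolean isLinked(List<String> recordFields) {
        for (String field : recordFields) {
            if (IDENTIFIERS.contains(field)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The pop-quiz case: date of last login stored next to a user ID.
        System.out.println(isLinked(List.of("user_id", "last_login_date")));
        // The same field stored with no identifier alongside it.
        System.out.println(isLinked(List.of("last_login_date")));
    }
}
```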
  • 96. Helping Developers with Privacy: 96 Coconut Plug-In Detect Potential Privacy Issues in Code • Simple static analysis to detect certain APIs • Offers suggestions for alternatives – Devs have limited knowledge of APIs – Devs copy-paste first solution found on StackOverflow – Help devs understand design options – Offer “quick fix” functionality
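A very rough sketch of "simple static analysis to detect certain APIs" is a textual scan that flags known sensitive calls and attaches a suggestion. This is not how Coconut is implemented (an IDE plug-in would typically work on the parsed syntax tree rather than raw text); `ApiScanSketch`, its rule table, and the suggestion wording are assumptions, though `getDeviceId` and `ACCESS_FINE_LOCATION` are real Android names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: flag sensitive API names in source lines
// and pair each hit with a privacy-friendlier alternative.
class ApiScanSketch {

    // Illustrative rules: sensitive token -> suggested alternative.
    static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("getDeviceId",
                "consider an app-scoped, resettable ID instead of a hardware ID");
        RULES.put("ACCESS_FINE_LOCATION",
                "consider ACCESS_COARSE_LOCATION if city-level location is enough");
    }

    // Returns one warning per (line, rule) match, with 1-based line numbers.
    static List<String> scan(List<String> sourceLines) {
        List<String> warnings = new ArrayList<>();
        for (int i = 0; i < sourceLines.size(); i++) {
            for (Map.Entry<String, String> rule : RULES.entrySet()) {
                if (sourceLines.get(i).contains(rule.getKey())) {
                    warnings.add("line " + (i + 1) + ": " + rule.getKey()
                            + " -> " + rule.getValue());
                }
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        List<String> source = List.of(
                "String id = telephonyManager.getDeviceId();",
                "int retries = 3;");
        scan(source).forEach(System.out::println);
    }
}
```

Plain substring matching will also flag comments and string literals, which is one reason a real tool would match on resolved method calls instead.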
  • 97. Helping Developers with Privacy: 97 Coconut Plug-In Aggregate Sensitive Data Usage in One Place • All annotations gathered and categorized in one tool window called PrivacyChecker – See all uses of data in a single place – Helps with multiple team members and versioning – Helps with auditing by other team members
  • 99. Helping Developers with Privacy: 99 Matcha Plug-In Help Generate Privacy Nutrition Labels • Ex. Matcha detects access to pictures / videos • Gives skeleton annotation, dev fills it out

Editor's Notes

  • #3: https://commons.wikimedia.org/wiki/File:Dell_Desktop_Computer_in_school_classroom.jpg When I first started in computer science about 30 years ago, this is what computers looked like. Primarily large boxes that came with a monitor, keyboard, and mouse. Takes up the entire desk.
  • #4: Today, computers come in all kinds of form factors Smartphones, tablets, glasses, cars, watches, clothes, fitness trackers, health monitoring devices, parking meters, electronic locks, smart mirrors, drones, and yes, even smart toilets.
  • #5: These technologies offer tremendous potential for improving healthcare, sustainability, physical safety, and more. However, every week, we’re seeing more and more news articles like these. The AirBnB story is actually about a professor at CMU, his young child was running around naked too The Grindr story, it turns out Grindr sells a lot of location data of their users, and some organization was able to use this to find a priest that was using Grindr
  • #6: There are also a growing number of guidelines and regulations about how these technologies should be designed and operated. So even if you don’t personally believe privacy is an issue, it’s still something that has to be addressed in the design and operation of systems we build. Otherwise, major fines (or app rejected from app stores). https://www.ftc.gov/sites/default/files/documents/reports/mobile-privacy-disclosures-building-trust-through-transparency-federal-trade-commission-staff-report/130201mobileprivacyreport.pdf https://oag.ca.gov/sites/all/files/agweb/pdfs/privacy/privacy_on_the_go.pdf
  • #7: Want to take a step back here Previously, I looked at end-users, better user interfaces, configuring privacy preferences, etc. But this kept the burden of privacy on end-users Current view: let’s look at how to shift burden onto rest of ecosystem Email spam analogy: in 1990s, everyone was inundated with spam, huge burden. But eventually better spam filters, email service providers put in better protocols and started blocking bad actors, law enforcement shut down large botnets Want the same for privacy, looking at how other parts of ecosystem can help shoulder burden of privacy, so that end-users don’t have to make every decision
  • #8: When I first started looking at privacy 20+ years ago, it was in the context of ubicomp Things were really fuzzy. We knew that privacy was a problem, but it was unclear what the salient issues were and how to tackle them There were also few tools or guidance or best practices for devs Today, a lot more clarity, but a lot of work too
  • #9: Another reason why privacy is hard is that devs lack awareness, motivation, and ability These are three factors in what my colleagues and I call the Security and Privacy Acceptance Framework (SPAF). Yes, the acronym is conveniently the same as the nickname for Eugene Spafford, a well-known security researcher Tip for junior researchers, name your papers after someone famous, so they’ll help advertise for you
  • #10: Another reason why privacy is hard is that devs lack awareness, motivation, and ability These are three factors in what my colleagues and I call the Security and Privacy Acceptance Framework (SPAF). Yes, the acronym is conveniently the same as the nickname for Eugene Spafford, a well-known security researcher
  • #11: Every year, Stack Overflow does a survey of developers
  • #12: https://insights.stackoverflow.com/survey/2021#experience-learn-code
  • #13: A lot of devs out there who probably don’t have strong foundation in CS Of those who do, most undergrad programs in USA don’t require security course (or privacy) to graduate Lack of tools, best practices, design patterns, etc We hang around too many smart people, makes us forget what “average” is
  • #15: http://www.cmuchimps.org/publications/the_privacy_and_security_behaviors_of_smartphone_app_developers_2014/pub_download
  • #16: We knew this already, but was based on our experiences and not really systematically probed
  • #17: “If either Facebook or Flurry had a privacy policy that was short and concise and condensed into real English rather than legalese, we definitely would have read it.” Separate study is Chitkara, S., N. Gothoskar, S. Harish, J.I. Hong, Y. Agarwal. Does this App Really Need My Location? Context-Aware Privacy Management for Smartphones. PACM on Interactive, Mobile, Wearable, and Ubiquitous Technologies (IMWUT) 1(3). 2017. http://www.cmuchimps.org/publications/does_this_app_really_need_my_location_context-aware_privacy_management_for_smartphones_2017
  • #19: Their understanding of privacy Any privacy training they received too
  • #23: Popular discussion forum for Android developers Screenshot from Oct 31 2022 204,354 subscribers as of this date
  • #26: Side note, Android P (Android 9) was last “letter” Android, next release was Android 10. Ex. Android 10 put new restrictions on getting location data in background Protects from malicious apps, but breaks legit apps
  • #27: Privacy requirements may break compatibility Ex. Code might run on Android P but not Android 10, or vice versa
  • #28: Idea that researchers first investigated about a decade ago. Finally deployed in practice, first by Apple and later by Google. Summarize privacy behaviors of an app using a standard format. Our focus for this next study is on iOS. Can see an example on the left, “Data Used to Track You” and “Data Linked to You”. These nutrition labels are currently filled out manually using a separate web site on the app store.
  • #30: Tangent: If interested in stats on adoption of iOS nutrition labels, see our CHI 2022 short paper Li et al, Understanding iOS Privacy Nutrition Labels: An Exploratory Large-Scale Analysis of App Store Data. CHI 2022.
  • #32: From 7 different countries App downloads from <1K to 500K-1M Half were also the backend developer
  • #36: Of app updates “you have a six month gap where you can collect data without telling people,” because “upgrading it, or at least reviewing it on every update would be tiresome.”
  • #40: Decompiling apps and looking at names of classes, methods, variables (~90% accuracy) [Wang et al 2015] Network data analysis of destination and payload (~85% accuracy) [Jin et al 2018]
  • #41: To some extent, asking devs to document intentions Can give devs feedback about APIs (or other design decisions) as they code Since we now have a major hint as to intended behavior of app, greatly facilitates auditing internally by team or by other third parties Can auto-generate UIs Can auto-generate privacy nutrition labels
  • #51: Smart TV example Once a week, get log of data, summarize it, and send to www.abc.com
  • #61: Moto Racing / https://play.google.com/store/apps/details?id=com.motogames.supermoto
  • #67: On the left is Nissan Maxima gear shift. It turns out my brother was driving in 3rd gear for over a year before I pointed out to him that 3 and D are separate. The older Nissan Maxima gear shift on the right makes it hard to make this mistake.
  • #69: Lin et al, Modeling Users’ Mobile App Privacy Preferences: Restoring Usability in a Sea of Permission Settings. SOUPS 2014. INTERNET, READ_PHONE_STATE, ACCESS_COARSE_LOCATION, ACCESS_FINE_LOCATION, CAMERA, GET_ACCOUNTS, SEND_SMS, READ_SMS, RECORD_AUDIO, BLUETOOTH and READ_CONTACTS
  • #71: INTERNET, READ_PHONE_STATE, ACCESS_COARSE_LOCATION, ACCESS_FINE_LOCATION, CAMERA, GET_ACCOUNTS, SEND_SMS, READ_SMS, RECORD_AUDIO, BLUETOOTH and READ_CONTACTS
  • #76: Surprisingly, some devs couldn’t finish warm-up task Coconut enhances developer knowledge about privacy Coconut nudges developers towards better privacy choices Coconut helps developers improve their privacy notices Developers like Coconut! (would use it, find it useful, …)
  • #77: Surprisingly, some devs couldn’t finish warm-up task Coconut enhances developer knowledge about privacy Coconut nudges developers towards better privacy choices Coconut helps developers improve their privacy notices Developers like Coconut! (would use it, find it useful, …)
  • #79: iOS and Android offer all-or-nothing access
  • #81: Here is an example of using PrivacyStreams to access microphone loudness. Developers’ lives can be much easier, as they only need three lines of code. I will talk about the API later, but this piece of code is easy to understand. The first line gets a stream of audio records, the second line calculates loudness based on the audio records, and the third line outputs the loudness value through callbacks. Because the code is greatly simplified, it is also easy for auditors or markets to analyze. In this example, auditors are able to extract a data flow from the code, and the data flow can be used to generate a privacy description for end-users: we can tell the user that only the loudness value reaches the app.
  • #83: Here is the result of the lab study. The blue bars show the number of completions and the average time of completion for each task using the Android standard API, while the red bars are for PrivacyStreams. As we can see, developers completed the tasks in less time with PrivacyStreams. Since this was the first time any participant had used PrivacyStreams and we only gave them a short tutorial, it is a very positive result that they were more efficient with it. Short description of tasks
  • #84: In the field study, each of the five participants developed an application using PrivacyStreams, giving us 5 apps. We then used the static analysis algorithm described before to extract the data flow from each app and generate a sentence describing its privacy behavior. The result shows that we are able to analyze the data flow and generate a privacy description for all the apps, and the analysis took between about 3 and 15 seconds per app.
  • #87: A lot of
  • #89: So this leads to key theme: how to help developers do better with respect to privacy But why developers?
  • #92: http://www.pewinternet.org/2015/11/10/apps-permissions-in-the-google-play-store/
  • #93: A lot of different findings, will focus on just two of them. How often do developers talk about privacy? Not very often. The y-axis shows some of the codes, looking at what led to a discussion of privacy