(Ab)using Identifiers: Indiscernibility of Identity


Ben Gross
BayCHI
2009-11-10

University of Illinois Urbana Champaign
Library and Information Science
bgross@acm.org
http://bengross.com/

Why I am interested


•bengross@gmail.com
•bgross@uiuc.edu
•@bengross
•bgross@acm.org
•bgross@bgross.com
•http://bengross.com
•http://flickr.com/bengross
•bgross@ischool.berkeley.edu
•http://facebook.com/bengross
•bgross@messagingnews.com

How many


•Email addresses
•Web site logins
•Instant messenger IDs
•Social network profiles
•Domain names
•Phone numbers

Do you have?

All your @’s are belong to us

Why you might care

 •Usability implications
 •Productivity implications
 •Security implications
 •Employee satisfaction
How did I get here?

•“I only have one email address...”
•“Well, except that one I only use for...”
•“And that other one I use with...”

Half a million users
“... average user has 6.5 passwords, each of
which is shared across 3.9 different sites.
Each user has about 25 accounts that require
passwords, and types an average of 8
passwords per day.”

Dinei Florêncio and Cormac Herley. A Large-
Scale Study of Web Password Habits. WWW ’07

Population

•Qualitative in-depth interview study
•44 people across two Bay Area firms
•Financial services firm (regulated)
•Design firm (unregulated)
Data
•Financial services
  Email addresses: average 1.8 (min 1 / max 4)
  IM: average 1.8 (min 1 / max 4)

•Design firm
  Email addresses: average 3.6 (min 1 / max 10)
  IM: average 1.7 (min 1 / max 3)

•Combined total: average = 3.3

“The individual in ordinary work situations
presents himself and his activity to others, the
ways in which he guides and controls the
impression they form of him and the kinds of
things he may and may not do while sustaining
his performance before them.”

Erving Goffman
Presentation of Self in Everyday Life, 1959.


Why more than one?




Social factors
•“I knew that my college one wasn't
  forever, so I wanted something more
  permanent after I graduated.”
 •“...I didn't like the name that I
   picked when it was my first email.”
•“...you just say oh my first name and
  last name at gmail.com ... something
  easy to remember.”

Technical factors
•Namespace saturation AKA the
 jimsm1th77@hotmail.com problem

 •Firewalls and VPNs AKA “They
   don’t let me use Hotmail at work...”

•Configuration problems AKA “What
 does SMTP-AUTH with MD5
 checksums on port 567 mean?”

Regulatory factors




It’s Just Data...
    “We’re an information economy. They
  teach you that in school. What they don't
  tell you is that it's impossible to move, to
live, to operate at any level without leaving
     traces, bits, seemingly meaningless
       fragments that can be retrieved
                  amplified...”

    William Gibson, Johnny Mnemonic

What’s Underneath?

•Developer Tools
•FireBug/FireCookie
•Safari Web Inspector
•Charles Proxy/HTTP Analyzer
•Forensic Tools
Cookies




More detail




Bake Your Own




Managing Flash Cookies




http://www.macromedia.com/support/documentation/en/flashplayer/help/settings_manager07.html
Referer (sic)

•adsl-75-18-132-43.dsl.pltn13.sbcglobal.net - -
  [10/Nov/2009:14:50:56 -0800]
  "GET /wireless.html HTTP/1.1" 200 29149
  "http://bengross.com/voip.html"
  "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2;
  en-us) AppleWebKit/531.9 (KHTML, like Gecko)
  Version/4.0.3 Safari/531.9"
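
The log entry above appears to follow the common "combined" access-log layout, in which the Referer and User-Agent are the last two quoted fields. A minimal sketch of pulling those fields out, assuming that layout (the regular expression and the names `COMBINED` and `parse_combined` are illustrative, not part of the talk):

```python
import re

# Assumed layout: host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_combined(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None

# The entry from the slide, joined back into a single line.
LINE = (
    'adsl-75-18-132-43.dsl.pltn13.sbcglobal.net - - '
    '[10/Nov/2009:14:50:56 -0800] "GET /wireless.html HTTP/1.1" 200 29149 '
    '"http://bengross.com/voip.html" '
    '"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2; en-us) '
    'AppleWebKit/531.9 (KHTML, like Gecko) Version/4.0.3 Safari/531.9"'
)

entry = parse_combined(LINE)
print(entry["referer"])  # the page the visitor came from
print(entry["agent"])    # browser, OS, and locale in one string
```

A single request already reveals the visitor's ISP-assigned hostname, the previous page, and the browser, OS version, and language.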


Leaky Headers




 On the Leakage of Personally Identifiable
  Information Via Online Social Networks
Balachander Krishnamurthy and Craig Wills
More Options

•URL Munging and Session IDs in URL
•Flash Cookies/Local Shared Object
•Silverlight Cookies
 •Virtual Page Views, Event (Google
   Analytics) User Defined Values
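
The first option, a session ID carried in the URL, can be sketched as follows. This is only an illustration of the idea, not any particular site's implementation; the function name `add_session_id`, the parameter name `sid`, and the `example.com` address are assumptions made for the example.

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

def add_session_id(url, session_id, param="sid"):
    """Rewrite a link so that it carries the visitor's session ID in its query string."""
    parts = urlparse(url)
    # Drop any stale copy of the parameter, then append the current one.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    query.append((param, session_id))
    return urlunparse(parts._replace(query=urlencode(query)))

tagged = add_session_id("http://example.com/wireless.html?page=2", "abc123")
print(tagged)
```

Because the identifier is now part of the address, it also travels in the Referer header to whatever site the visitor clicks through to next.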


Synthetic IDs
•Everything in the Referer header can
  be used to form a synthetic identifier.
•The User Agent is a good source
•IP addresses if you have them
•Screen dimensions, user agent
•Hash of IP address/remote ports
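
A minimal sketch of the idea in the list above: several weak signals are concatenated and hashed into one stable identifier, with no cookie involved. The function name, the choice of SHA-256, the 16-character truncation, and all sample values are assumptions for illustration; real trackers may combine different signals.

```python
import hashlib

def synthetic_id(ip, user_agent, screen, accept_language=""):
    """Hash several individually weak signals into one stable identifier."""
    # Join with a separator that cannot occur in the fields, so that
    # ("ab", "c") and ("a", "bc") never produce the same input string.
    material = "\x1f".join([ip, user_agent, screen, accept_language])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]

# Hypothetical visitor values, for illustration only.
UA = "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2; en-us) Safari/531.9"
first_visit = synthetic_id("192.0.2.43", UA, "1440x900", "en-us")
second_visit = synthetic_id("192.0.2.43", UA, "1440x900", "en-us")
other_machine = synthetic_id("192.0.2.43", UA, "1280x800", "en-us")

print(first_visit == second_visit)   # same signals, same ID
print(first_visit == other_machine)  # one changed signal, different ID
```

No single field identifies a person, but the combination is often distinctive enough to recognize a returning browser.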
Other Sources of Bits

•Last Modified and ETag headers
•HTTP Keepalive
•SSL Session IDs
•TCP Timestamps
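
As one example from this list, an ETag can act as a cookie substitute: the server hands each browser a unique validator, and the browser replays it in `If-None-Match` on its next request. The sketch below models only that server-side decision as a plain function. The names `respond` and `known_visitors` are hypothetical, and no real web framework is implied.

```python
import uuid

def respond(request_headers, known_visitors):
    """Return (status, response_headers, visitor_id) for one request to a cacheable resource."""
    tag = request_headers.get("If-None-Match", "").strip('"')
    if tag in known_visitors:
        # The browser replayed the validator it was given earlier:
        # the same visitor is recognized without any cookie.
        known_visitors[tag] += 1
        return 304, {"ETag": f'"{tag}"'}, tag
    # First visit, or the cache was cleared: mint a unique validator.
    tag = uuid.uuid4().hex
    known_visitors[tag] = 1
    headers = {
        "ETag": f'"{tag}"',
        "Cache-Control": "private, max-age=0, must-revalidate",
    }
    return 200, headers, tag

visitors = {}
status1, headers1, id1 = respond({}, visitors)
status2, headers2, id2 = respond({"If-None-Match": headers1["ETag"]}, visitors)
print(status1, status2, id1 == id2)
```

Clearing cookies does not reset this identifier; clearing the browser cache does.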

The Art of Being Lost

•“We do not collect personal contact
  information from visitors to your
  website. Personal contact information
  means billing address, physical
  address, individual name, email
  address, etc.” (OpenTracker.com)


Netflix Data Released
•Dataset contains 100,480,507 movie
  ratings, created by 480,189 Netflix
  subscribers between December 1999 and
  December 2005.
•“...all customer identifying information
  has been removed; all that remains are
  ratings and dates. This follows our
  privacy policy...”
•No unique identifiers or quasi-identifiers
You Only Need Two
•Robust De-anonymization of Large Sparse
  Datasets by Arvind Narayanan and Vitaly
  Shmatikov
•IMDb as a source of entropy
•“With 8 movie ratings (of which 2 may be
  completely wrong) and dates that may have
  a 14-day error, 99% of records can be
  uniquely identified in the dataset.”
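
The kind of matching described in the quote can be illustrated with a toy example. This is a deliberately simplified sketch, not the scoring algorithm from the Narayanan and Shmatikov paper: it only counts how many publicly known ratings agree with each "anonymous" record, tolerating a date error and a fixed number of wrong items. All record IDs, movie names, ratings, and dates below are made up.

```python
from datetime import date

def matches(aux_item, record, max_days=14):
    """True if one known (movie, rating, date) agrees with a record, within a date tolerance."""
    movie, rating, when = aux_item
    if movie not in record:
        return False
    rec_rating, rec_when = record[movie]
    return rec_rating == rating and abs((rec_when - when).days) <= max_days

def candidates(aux, dataset, allowed_misses=2):
    """Record IDs that agree with all but `allowed_misses` of the known ratings."""
    needed = len(aux) - allowed_misses
    return [rid for rid, record in dataset.items()
            if sum(matches(item, record) for item in aux) >= needed]

# Hypothetical "anonymized" records: movie -> (rating, date rated).
dataset = {
    "r1": {"movie_a": (5, date(2004, 1, 10)), "movie_b": (3, date(2004, 2, 1)),
           "movie_c": (4, date(2004, 3, 5)), "movie_d": (2, date(2004, 4, 1))},
    "r2": {"movie_a": (5, date(2004, 1, 12)), "movie_e": (1, date(2004, 2, 1))},
    "r3": {"movie_b": (3, date(2005, 2, 1)), "movie_c": (1, date(2004, 3, 5))},
}

# What an attacker might learn from public reviews; the last item is wrong.
aux = [
    ("movie_a", 5, date(2004, 1, 15)),
    ("movie_b", 3, date(2004, 2, 10)),
    ("movie_c", 4, date(2004, 3, 1)),
    ("movie_d", 5, date(2004, 4, 2)),
]

found = candidates(aux, dataset, allowed_misses=1)
print(found)
```

When exactly one record survives the filter, the "anonymous" row has been linked to a named public profile.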

It comes down to this
“Q: If you don't publicly rate movies on IMDb and similar
         forums, there is nothing to worry about.

   A: ...you should not ever mention any movies you
   watched prior to 2005 on a public blog or website.
 Everybody who was a Netflix subscriber prior to 2005
  should restrain themselves from these activities...

  We do not think this is a feasible privacy policy.”

                      FAQ
“How to Break Anonymity of the Netflix Prize Dataset”
Guessing Your SSN
•Predicting Social Security Numbers
  from Public Data by Alessandro Acquisti
  and Ralph Gross

•...I’ll just need the last 4 of your SSN for
  verification purposes...

•“...we accurately predicted the first 5
  digits of 2% of California records with
  1980 birthdays, and 90% of Vermont
  records with 1995 birthdays.”
Disclosure and UI
•“Facebook Beacon is a way for you to
  bring actions you take online into
  Facebook. Beacon works by allowing
  affiliate websites to send stories about
  actions you take to Facebook.”
•Launched November 2007
•Class action lawsuit August 2008
•Shut down September 2009
Opt Out: First Try




Opt Out: Second Try




Evasion
•Ghostery
•Opt Out Tools
•Ad Blockers/Flash Blockers
•HTTP Cookie/LSO Managers
•Header Modification Tools
•Proxies/Tor
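
What a header modification tool does can be sketched in a few lines: drop the request headers that replay an identifier and replace the User-Agent with a generic value. The header set, the function name `scrub_headers`, and the generic agent string are assumptions for illustration, not the behavior of any specific tool.

```python
# Request headers that can replay an identifier to the server (illustrative list).
TRACKING_HEADERS = {"referer", "cookie", "if-none-match", "if-modified-since"}

def scrub_headers(headers, generic_agent="Mozilla/5.0"):
    """Return a copy of the request headers with identifying values removed or flattened."""
    cleaned = {}
    for name, value in headers.items():
        lname = name.lower()
        if lname in TRACKING_HEADERS:
            continue  # drop the header entirely
        cleaned[name] = generic_agent if lname == "user-agent" else value
    return cleaned

request = {
    "Host": "example.com",
    "User-Agent": "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2; en-us) Safari/531.9",
    "Referer": "http://example.com/voip.html",
    "If-None-Match": '"3f2a"',
}
scrubbed = scrub_headers(request)
print(scrubbed)
```

The trade-off is functional: dropping cookies logs the user out, and dropping cache validators forces every resource to be downloaded again.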
What’s Next?

•Geolocation
•Roll up for more large collections
•More of the additional bits needed for
  de-anonymization are available via
  social networks



Ben Gross
University of Illinois Urbana Champaign
Library and Information Science
bgross@acm.org
http://bengross.com/
