Exploring New Methods for Protecting and Distributing Confidential Research Data
Bryan Beecher and Felicia LeClere, ICPSR/University of Michigan
Today’s Talk: What’s ICPSR? How do organizations distribute confidential research data today? What are the problems? What can we improve?
What’s ICPSR? The Inter-university Consortium for Political and Social Research: JSTOR for social science data, serving billions since 1962.
Who does ICPSR serve? Research universities (discover and download data); teaching universities and colleges (online analysis); federal agencies (data management, preservation, and dissemination).
Distributing data
Distributing data: Most of our content is public-use (anonymized public opinion data, aggregate government data) and carries little risk of disclosure. But what about the good stuff?
Distributing sensitive data
Distributing sensitive data: There is a higher risk of a breach of confidentiality, for example from variables with geographic information that could be combined with other data sources to identify a respondent. Such data require special handling.
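One common way to quantify this linkage risk is k-anonymity: group respondents by the variables an outsider could match against other sources (quasi-identifiers) and find the size of the smallest group. The Python sketch below illustrates the idea on made-up records; the field names and the choice of quasi-identifiers are hypothetical, and this is not presented as the disclosure review that ICPSR or any data provider actually performs.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_anonymity(records, quasi_identifiers):
    """Return k, the size of the smallest group; k == 1 means someone is unique."""
    return min(group_sizes(records, quasi_identifiers).values())


def unique_respondents(records, quasi_identifiers):
    """Return the ids of respondents who are alone in their group."""
    sizes = group_sizes(records, quasi_identifiers)
    return [r["id"] for r in records
            if sizes[tuple(r[q] for q in quasi_identifiers)] == 1]


# Made-up records; "county" plays the role of a geographic variable.
respondents = [
    {"id": 1, "county": "A", "grade": 9, "sex": "F"},
    {"id": 2, "county": "A", "grade": 9, "sex": "F"},
    {"id": 3, "county": "A", "grade": 11, "sex": "M"},
    {"id": 4, "county": "B", "grade": 11, "sex": "M"},
]

print(k_anonymity(respondents, ["grade", "sex"]))                   # 2
print(k_anonymity(respondents, ["county", "grade", "sex"]))         # 1
print(unique_respondents(respondents, ["county", "grade", "sex"]))  # [3, 4]
```

Adding the geographic variable drops k from 2 to 1: respondents 3 and 4 become unique on county, grade, and sex, which is the kind of linkage that calls for special handling.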
Distributing sensitive data: The researcher agrees to protect the data and the respondents’ identities. The data are delivered securely. Harsh penalties serve as a deterrent. (Photo: http://www.flickr.com/photos/lwr/521394398)
National Longitudinal Study of Adolescent Health (Add Health): a highly used and cited study with very frank questions, asked of kids in 7th through 12th grade. Carolina Population Center: the gold standard in data protection.
Traditional Approach (photos: http://www.flickr.com/photos/videolux/2389320345/ and http://www.flickr.com/photos/curiousexpeditions/3767246490/)
Traditional Approach: For confidential research data, apply for access, write a security plan, and repeat for the next data set.
Can we improve upon it? Paperwork: how do we speed up the application process? Security: how do we ensure the data are going to a good home?
Paperwork: A Web portal collects the research plan, IRB approval, CVs, and confidentiality agreements.
Paperwork: The Web portal also provides a behavioral questionnaire, an electronic copy of the contract (HTML, PDF), and a database back-end to drive workflow systems.
Restricted Data Contracting System: Integrated with ICPSR’s existing Web download mechanism. Collects the information that would ordinarily be provided on paper. A “tickler” system sends reminders and nags about missing items.
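The tickler behavior described above amounts to comparing each application against a checklist and nagging only when items are missing and no reminder went out recently. The Python sketch below is a hypothetical illustration, not the Restricted Data Contracting System’s code: the checklist is assembled from the items the portal slides mention, and the class, function names, and seven-day interval are assumptions. Sending the actual email is left out.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional, Set, Tuple

# Hypothetical checklist drawn from the items the portal collects.
REQUIRED_ITEMS = (
    "research plan",
    "IRB approval",
    "CVs",
    "confidentiality agreements",
    "behavioral questionnaire",
    "signed contract",
)


@dataclass
class Application:
    researcher_email: str
    submitted: Set[str] = field(default_factory=set)
    last_reminder: Optional[date] = None


def missing_items(app: Application) -> List[str]:
    """Required items the researcher has not yet provided, in checklist order."""
    return [item for item in REQUIRED_ITEMS if item not in app.submitted]


def reminders_due(apps: List[Application], today: date,
                  interval: timedelta = timedelta(days=7)) -> List[Tuple[str, List[str]]]:
    """Return (email, missing items) for incomplete applications that have not
    been reminded within `interval`."""
    due = []
    for app in apps:
        missing = missing_items(app)
        if not missing:
            continue  # complete: nothing to nag about
        if app.last_reminder is not None and today - app.last_reminder < interval:
            continue  # reminded recently: wait before nagging again
        due.append((app.researcher_email, missing))
    return due


apps = [
    Application("complete@example.edu", set(REQUIRED_ITEMS)),
    Application("slow@example.edu", {"research plan", "CVs"}, date(2010, 3, 1)),
    Application("recent@example.edu", {"research plan"}, date(2010, 3, 10)),
]
for email, missing in reminders_due(apps, today=date(2010, 3, 12)):
    print(email, "is missing:", ", ".join(missing))
```

Only the second applicant is nagged: the first is complete, and the third was reminded two days earlier.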
Security: The current system relies on the data provider to maintain security templates, the researcher to write an IT security plan, the data provider to read and understand the plan, and the researcher to execute the plan.
Current access model (diagram): ICPSR delivers the data to the researcher’s workstation.
A new access model? (diagram): ICPSR places the data in a secure area, which the researcher reaches from a workstation.
Secure area = the cloud? Cloud-based access is convenient, scalable, and economical. Perfect? (Photo: http://www.flickr.com/photos/docbudie/2240764187/)
What could go wrong?
Almost everything: Is the cloud reliable? Will the data be safe? We are building an analytic environment for a researcher; how will we know what to provide? Will it perform well for the researcher? This is the main story…
Cloud reliability: We have been using the cloud for disaster recovery (DR) since January 2009. The Merit Network Operations Center monitors all of our systems with a ping and an HTTP GET every minute, 24 x 7. Results?
Local vs. cloud reliability, calendar year 2009 (chart not reproduced)
Conclusion: The cloud has been more reliable than our local environment. If local power had been better, the cloud would still have been more reliable, but only slightly. It certainly seems to be good enough.
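The once-a-minute monitoring behind these results reduces to a simple availability calculation. The Python sketch below shows one way to compute it; it is not the Merit Network Operations Center’s tooling, it covers only the HTTP GET check (a ping needs raw ICMP sockets or the system `ping` command), and the sample results are made up.

```python
import urllib.error
import urllib.request
from typing import Sequence


def http_get_ok(url: str, timeout: float = 10.0) -> bool:
    """One HTTP GET check: True if the server answers with a non-error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError):
        # HTTPError (4xx/5xx), DNS failures, refused connections, timeouts
        return False


def availability_percent(results: Sequence[bool]) -> float:
    """Share of successful checks, as a percentage."""
    if not results:
        raise ValueError("no check results")
    return 100.0 * sum(results) / len(results)


def downtime_minutes(results: Sequence[bool], interval_minutes: int = 1) -> int:
    """Estimated downtime, counting each failed check as one full interval."""
    return sum(1 for ok in results if not ok) * interval_minutes


# A year of once-a-minute checks is 365 * 24 * 60 = 525,600 results per check type.
one_day = [True] * 1435 + [False] * 5   # made-up day with 5 failed checks
print(round(availability_percent(one_day), 2))  # 99.65
print(downtime_minutes(one_day))                # 5
```

Counting each failed check as a full minute of outage is a simplifying assumption; a real comparison would also need to account for planned maintenance and for outages of the monitoring system itself.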
Cloud security: Absolute security? Who cares? More secure than the typical WinTel desktop of a social science researcher? That’s the goal. (Photo: http://www.flickr.com/photos/amagill/235453953)
Current practice: The data archive maintains per-platform IT security guidelines. The researcher downloads a template and writes his or her own IT security plan. The data provider reviews the plan and either approves it or iterates with the researcher until it is approved or rejected.
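The review cycle in this slide can be modeled as a small state machine, which makes it clear that every revision round costs another human review. This Python sketch assumes a plan is approved, rejected, or sent back for revision after each review; the status names and the `advance` function are hypothetical, not taken from any ICPSR system.

```python
from enum import Enum


class PlanStatus(Enum):
    SUBMITTED = "submitted"
    REVISIONS_REQUESTED = "revisions requested"
    APPROVED = "approved"
    REJECTED = "rejected"


# Allowed moves in the review loop; approval and rejection are final.
TRANSITIONS = {
    PlanStatus.SUBMITTED: {PlanStatus.APPROVED,
                           PlanStatus.REVISIONS_REQUESTED,
                           PlanStatus.REJECTED},
    PlanStatus.REVISIONS_REQUESTED: {PlanStatus.SUBMITTED},
    PlanStatus.APPROVED: set(),
    PlanStatus.REJECTED: set(),
}


def advance(current: PlanStatus, new: PlanStatus) -> PlanStatus:
    """Move a security plan to a new status, refusing moves the loop does not allow."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value!r} to {new.value!r}")
    return new


# One plan that needs a round of revisions before approval.
status = PlanStatus.SUBMITTED
for step in (PlanStatus.REVISIONS_REQUESTED, PlanStatus.SUBMITTED, PlanStatus.APPROVED):
    status = advance(status, step)
print(status.value)  # approved
```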
Sample items from a security plan template: “I secured the computer on which the Add Health data resides in a locked room, or secured the computer to a table with a lock and cable (locking the case so the battery cannot be removed).” “I turned off all unneeded services and disabled unneeded network protocols.”
Brutal facts: Data providers are not IT experts. Researchers are not experts in IT security. Even if the system is secure on Day One, what assurance is there that it stays secure? (Photo: http://www.flickr.com/photos/42dreams/1878611309)
Our approach to security: Leverage tools from the cloud provider (AWS access control lists). Leverage tools from UMich (regular Retina and Nessus scans). Engage a white-hat hacker to probe and evaluate the system.
Conclusion: Expecting researchers to build and maintain secure IT environments is not reasonable. We think we can build something at least as secure in the cloud, and we will have outside evaluators assess it.
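Unlike a written security plan, cloud access rules can be audited automatically and repeatedly, which speaks to the “Day One” concern. The Python sketch below assumes the “AWS access control lists” are EC2 security group rules, represented as dictionaries shaped like the `SecurityGroups` entries returned by EC2’s DescribeSecurityGroups call (fetching them, e.g. with boto3, is left out). The allow-list, group ID, and rules are hypothetical, and a check like this complements rather than replaces the Retina/Nessus scans and outside evaluation.

```python
import ipaddress

# Hypothetical allow-list: networks permitted to reach an analysis instance.
# 192.0.2.0/24 is a reserved documentation range standing in for a campus network.
ALLOWED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def overly_open_rules(security_groups, allowed=ALLOWED_NETWORKS):
    """Return (group id, (from port, to port), source CIDR) for each IPv4 ingress
    rule whose source range is not inside one of the allowed networks."""
    findings = []
    for group in security_groups:
        for perm in group.get("IpPermissions", []):
            ports = (perm.get("FromPort"), perm.get("ToPort"))
            for ip_range in perm.get("IpRanges", []):
                source = ipaddress.ip_network(ip_range["CidrIp"])
                if not any(source.subnet_of(net) for net in allowed):
                    findings.append((group["GroupId"], ports, str(source)))
    return findings


# Made-up rules in the shape of EC2's DescribeSecurityGroups output.
groups = [{
    "GroupId": "sg-example",
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 3389, "ToPort": 3389,
         "IpRanges": [{"CidrIp": "192.0.2.0/25"}]},  # remote desktop, allowed range only
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},     # SSH open to the whole Internet
    ],
}]
print(overly_open_rules(groups))  # [('sg-example', (22, 22), '0.0.0.0/0')]
```

Run on a schedule, a check like this flags configuration drift that a one-time plan review would never see.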
What to deploy? This model means we need to distribute a working analytic environment, not just the data. It also gives the researcher the opportunity to limit access to only a subset of the contractees.
May I Take Your Order? Operating system? Analysis software? Who’s allowed to use the system? Anything else? (Photo: http://www.flickr.com/photos/stephenpougas/2267503544)
The ACI Chooser: ACI = Analytic Cloud Instance. Cumulus. The ACI Chooser takes your order and brings your ACI to your table (in the cloud).
Conclusion: We’re building this now. Issues to resolve: How do we get passwords to people? Which remote access mechanism: Citrix? Terminal Services? Should we encrypt the data?
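The chooser’s questions (operating system, analysis software, authorized users) map naturally onto a small order record that can be checked before an instance is built. The Python sketch below is hypothetical: the class, field names, and menus are illustrative assumptions, not the ACI Chooser’s design. It enforces the one rule the slides state explicitly, that access can be limited to a subset of the contractees.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class ACIOrder:
    """One hypothetical order placed through the ACI Chooser."""
    operating_system: str
    analysis_software: Tuple[str, ...]
    authorized_users: FrozenSet[str]


def validate_order(order: ACIOrder, contractees: FrozenSet[str],
                   supported_os: FrozenSet[str],
                   supported_software: FrozenSet[str]) -> List[str]:
    """Return a list of problems; an empty list means the order can be built."""
    problems = []
    if order.operating_system not in supported_os:
        problems.append(f"unsupported operating system: {order.operating_system}")
    for package in order.analysis_software:
        if package not in supported_software:
            problems.append(f"unsupported analysis software: {package}")
    if not order.authorized_users:
        problems.append("no authorized users")
    # Users of the instance must be a subset of the people on the contract.
    outsiders = order.authorized_users - contractees
    if outsiders:
        problems.append(f"not on the contract: {', '.join(sorted(outsiders))}")
    return problems


# Hypothetical menu and contract.
OS_MENU = frozenset({"Windows", "Linux"})
SOFTWARE_MENU = frozenset({"SAS", "SPSS", "Stata", "R"})
contract = frozenset({"pi", "analyst1", "analyst2"})

ok = ACIOrder("Windows", ("Stata", "SAS"), frozenset({"pi", "analyst1"}))
bad = ACIOrder("Solaris", ("Stata",), frozenset({"pi", "visitor"}))
print(validate_order(ok, contract, OS_MENU, SOFTWARE_MENU))   # []
print(validate_order(bad, contract, OS_MENU, SOFTWARE_MENU))
```

Returning a list of problems rather than raising on the first one lets a web form show the researcher everything that needs fixing at once.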
Performance: Will a cloud-based analysis system meet a researcher’s expectations? Will one size fit all?
Amazon EC2 instance types (hourly prices as shown on the slide):
Family       Size   CPU  Memory  Price/hour
Regular      S        1  2 GB    $0.12
Regular      L        4  7 GB    $0.48
Regular      XL       8  15 GB   $0.96
High memory  XXL     13  34 GB   $1.44
High memory  XXXXL   26  68 GB   $2.88
High CPU     M        5  2 GB    $0.29
High CPU     XL      20  7 GB    $1.16
Strategy: Balance cost and performance. Start small, but leave room to grow; it is easy to move an image from one instance size to another. Measure performance through the researcher’s experience.
Conclusion: Partners are the Panel Study of Income Dynamics (PSID) and the Los Angeles Family and Neighborhood Study (LA FANS). Start small; re-launch larger; ask how well it works.
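With the EC2 instance menu from the earlier slide, “start small; re-launch larger” becomes a simple lookup. This Python sketch uses the slide’s figures (2010-era hourly prices; current EC2 instance types and pricing differ). The selection rules, function names, and the 176-hour month (8 hours a day for 22 working days) are illustrative assumptions, not the authors’ procedure.

```python
from typing import List, Optional, Tuple

# (name, CPU, memory in GB, USD per hour), transcribed from the EC2 slide.
Instance = Tuple[str, int, int, float]
INSTANCE_MENU: List[Instance] = [
    ("Regular S", 1, 2, 0.12),
    ("Regular L", 4, 7, 0.48),
    ("Regular XL", 8, 15, 0.96),
    ("High memory XXL", 13, 34, 1.44),
    ("High memory XXXXL", 26, 68, 2.88),
    ("High CPU M", 5, 2, 0.29),
    ("High CPU XL", 20, 7, 1.16),
]


def cheapest_fit(min_cpu: int, min_memory_gb: int,
                 menu: List[Instance] = INSTANCE_MENU) -> Instance:
    """Cheapest instance with at least the requested CPU and memory."""
    fits = [i for i in menu if i[1] >= min_cpu and i[2] >= min_memory_gb]
    if not fits:
        raise ValueError("no instance type is large enough")
    return min(fits, key=lambda i: i[3])


def next_size_up(current: str,
                 menu: List[Instance] = INSTANCE_MENU) -> Optional[Instance]:
    """Cheapest pricier instance with no less CPU or memory than the current one,
    i.e. the smallest step for a re-launch; None if already at the top."""
    _, cpu, mem, price = next(i for i in menu if i[0] == current)
    bigger = [i for i in menu if i[1] >= cpu and i[2] >= mem and i[3] > price]
    return min(bigger, key=lambda i: i[3]) if bigger else None


def cost(name: str, hours: float, menu: List[Instance] = INSTANCE_MENU) -> float:
    """Cost in USD of running one instance for the given number of hours."""
    return next(i for i in menu if i[0] == name)[3] * hours


print(cheapest_fit(2, 4)[0])             # Regular L
print(next_size_up("Regular L")[0])      # Regular XL
print(round(cost("Regular L", 176), 2))  # 84.48
```

Because instances are billed by the hour, running the analysis environment only during working hours keeps the cost of starting small low, and a re-launch on the next size up changes the hourly rate rather than requiring new hardware.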
Thanks and Final Thoughts: We could preserve the machine image + data + software + “program” for replication purposes. Our blog, enclavecloud.blogspot.com, charts our adventures. Cloud-related work is sponsored by a recent NIH Challenge Grant.
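Preserving the machine image, data, software, and program together is only useful for replication if someone can later confirm that nothing has changed. One way to do that, sketched here in Python as an assumption rather than anything the authors describe, is a manifest of SHA-256 checksums; the role names and function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 digest of a file, read in chunks so large machine images
    need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(components: Dict[str, Path]) -> Dict[str, dict]:
    """Record file name, size, and digest for each preserved component
    (e.g. 'machine image', 'data', 'software', 'program')."""
    return {
        role: {"file": path.name, "bytes": path.stat().st_size, "sha256": sha256_of(path)}
        for role, path in components.items()
    }


def changed_components(manifest: Dict[str, dict], components: Dict[str, Path]) -> List[str]:
    """Roles whose file no longer matches the digest recorded in the manifest."""
    return [role for role, path in components.items()
            if sha256_of(path) != manifest[role]["sha256"]]
```

The manifest could be saved (for example with `json.dump`) next to the preserved image and `changed_components` rerun before a replication attempt; an empty list means every component is byte-for-byte what was archived.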