Managing and sharing confidential data in Australian social science

05 May 2017
Dr. Steven McEachern
Director, Australian Data Archive
ANU Centre for Social Research and Methods
Australian National University
Overview
• The “problem” of “sensitive data” - the 5 Safes model
• The “problem” of open and transparent research – the FAIR principles
• From problems to solutions
– Access to sensitive data in Australia
– ADA as a model for a journal data access system
Sensitive data
• “Sensitive data are data that can be used to identify an individual, species, object, process, or location that introduces a risk of discrimination, harm, or unwanted attention.”
– ANDS Guide on Publishing and Sharing Sensitive Data, p.7
– http://www.ands.org.au/__data/assets/pdf_file/0010/489187/Sensitive-data.pdf
The 5 safes
1. Safe people: Can the researchers be trusted to do the right thing?
2. Safe projects: Is the data to be used for an appropriate purpose?
3. Safe settings: Is the environment in which the analysis takes place safe?
4. Safe data: Is the data appropriately protected?
5. Safe output: Is there a low risk of disclosure in research outputs?

Desai, T., F. Ritchie and R. Welpton (2016) Five Safes: designing data access for research. Economics Working Paper Series 1601, University of the West of England.
http://www2.uwe.ac.uk/faculties/BBS/Documents/1601.pdf
Open and transparent research
• “It is the responsibility of everyone involved to ensure that the published record is an unbiased, accurate representation of research”
– PLoS Medicine Editors. (2009) An unbiased scientific record should be everyone’s agenda. PLoS Medicine, 6:e1000038. DOI: 10.1371/journal.pmed.1000038
• “Improving the reliability and efficiency of scientific research will increase the credibility of the published scientific literature and accelerate discovery.”
– Munafo et al. (2017) A manifesto for reproducible science. Nature Human Behaviour, 1, Article No. 0021. doi: 10.1038/s41562-016-0021
Data transparency: DA-RT, FAIR principles
DA-RT: Data Access and Research Transparency
– https://www.dartstatement.org/
FAIR: Data should be:
• Findable, Accessible, Interoperable, Reusable
– https://www.force11.org/group/fairgroup/fairprinciples
What do secondary users “need”?
(or What is wanted? :-)
• “We emphasize that direct access to micro-data is critical for success. Alternatives such as access to synthetic data or submission of computer programs to agency employees will not address the key problem of restoring US leadership with cutting-edge policy-relevant research.”
• Card, Chetty, Feldstein and Saez, 2010 (emphasis in original)
– http://rajchetty.com/chettyfiles/NSFdataaccess.pdf
What is expected?
• “Here's what you need to do if you want an anonymised 1% sample of the US Census
– Go to Google and type US Census 1% sample, click on link to the Census.
– Download each of the state files from the FTP site and merge them yourself. Or just check things out for one of the states. Whatever you like.
– Start mucking about to test whether your pet theory is plausible.
• Here's what you need to do if you want an anonymised sample of the NZ Census, or a Confidentialised Unit Record File (CURF) of any of big Stats series:
– Go to Stats NZ's site, here.
– Follow the instructions below: …”
• (Followed by several pages of instructions, Application Process, Assessment Criteria, Methods of Access, …)

Eric Crampton, the New Zealand Initiative, Wellington, formerly University of Canterbury.
http://offsettingbehaviour.blogspot.com.au/2015/10/curf-and-turf.html
What do depositors want?
• Is my data safe and secure?
• Can others identify my participants?
• Is my data being used “appropriately”?
• Can my data really be “understood” by others?
• Will someone else get there before me?
• Will I be “gazumped”?
• Will someone else find out what I did wrong?
• Can we bridge depositor and user expectations?
– Remember: these are often THE SAME PEOPLE
Frameworks for research practice
• There are existing Australian frameworks for researcher accountability and responsibilities:
– the Australian Code for the Responsible Conduct of Research (ACRCR), which sets out institutional and researcher responsibilities for conduct of research
– (Note that this is currently under review)
– Human Research Ethics Committees (HREC)
• Increasingly, professional and journal requirements for data sharing:
– E.g. PLOS One, AEA, DA-RT (political science)
– https://www.aeaweb.org/aer/data.php
– http://journals.plos.org/plosone/s/data-availability
Relevant content from ACRCR
• S.2: Management of research data and primary materials
– E.g. 2.7 Maintain confidentiality of research data and primary materials
– Researchers given access to confidential information must maintain that confidentiality. Primary materials and confidential research data must be kept in secure storage. Confidential information must only be used in ways agreed with those who provided it. Particular care must be exercised when confidential data are made available for discussion.
• S.4: Publication and dissemination of research findings
– E.g. 4.2.3 Institutions must ensure that the sponsors of research understand the importance of publication in research and do not delay publication beyond the time needed to protect intellectual property and other relevant interests.
• S.9: Breaches of the Code and misconduct in research
Current models in Australia
ABS:
• Confidentialised Unit Record Files (CURFs)
• RADL
• ABS Remote Data Lab
• TableBuilder
ADA:
• Confidentialised Unit Record Files (CURFs)
Shared (often remote access) infrastructure:
• AURIN
• SURE (PHRN)
• Data linkage facilities
Ad hoc arrangements:
• “Secure rooms”
• Departmental arrangements
Applying the 5 Safes
                          People  Projects  Settings  Data  Output
CURFs                     Yes?    Yes?      Yes?      YES   YES
TableBuilder              No      No        YES       YES   YES
RADL                      Yes?    Yes?      YES       YES   YES
ABSDL                     Yes?    Yes?      YES       YES   YES
ABS Remote Data Lab       Yes     Yes?      YES       YES   YES
ADA                       Yes?    No        No        YES   No
AURIN                     No      No        YES       YES   Yes?
SURE (PHRN)               Yes     Yes       YES       No?   ???
Data Linkage facilities   No?     YES       Yes?      YES   ???
Secure rooms              Yes?    Yes?      YES       No?   ???
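
The matrix above can also be read as a small lookup table. The following is a minimal Python sketch, not part of the original slides: the ratings are copied from the table, while the names `SAFES`, `RATINGS`, `applies` and `models_applying` are hypothetical, and treating only "YES"/"Yes" (not "Yes?", "No?" or "???") as "applied" is an assumption made for illustration.

```python
# Order of the five safes, matching the table columns.
SAFES = ("people", "projects", "settings", "data", "output")

# Ratings copied from the slide's table (People, Projects, Settings, Data, Output).
RATINGS = {
    "CURFs": ("Yes?", "Yes?", "Yes?", "YES", "YES"),
    "TableBuilder": ("No", "No", "YES", "YES", "YES"),
    "RADL": ("Yes?", "Yes?", "YES", "YES", "YES"),
    "ABSDL": ("Yes?", "Yes?", "YES", "YES", "YES"),
    "ABS Remote Data Lab": ("Yes", "Yes?", "YES", "YES", "YES"),
    "ADA": ("Yes?", "No", "No", "YES", "No"),
    "AURIN": ("No", "No", "YES", "YES", "Yes?"),
    "SURE (PHRN)": ("Yes", "Yes", "YES", "No?", "???"),
    "Data linkage facilities": ("No?", "YES", "Yes?", "YES", "???"),
    "Secure rooms": ("Yes?", "Yes?", "YES", "No?", "???"),
}


def applies(rating: str) -> bool:
    """Assumption: only an unqualified 'YES'/'Yes' counts as applied."""
    return rating.lower() == "yes"


def models_applying(*safes: str) -> list[str]:
    """Return the access models that clearly apply every requested safe."""
    indexes = [SAFES.index(s) for s in safes]
    return [
        model
        for model, row in RATINGS.items()
        if all(applies(row[i]) for i in indexes)
    ]


if __name__ == "__main__":
    print(models_applying("settings", "data", "output"))
    print(models_applying("people"))
```

A stricter or looser reading of the hedged ratings ("Yes?", "No?") would change the results, which is why `applies` is kept as a separate, easily replaced function.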
The principles should enable the right mix of “safes” for a given data source
• Different existing models (each a mix of the 5 safes) all have their place

Source: http://www.shinyshiny.tv/2009/12/easymix_-_a_mix.html
A solution for journals:
The ADA Dataverse
ADA model
• Safe data: data is anonymised/de-identified/“confidentialised” either prior to deposit or by ADA archivists
• Safe people: data access can be mediated, and users must be identified and provide contact and supervisor details
• Safe projects: users provide a project description
• Safe settings: NOT applied (but could be)
• Safe outputs: terms of use
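
As an illustration only, the access conditions listed above could be checked along these lines. This is a hypothetical Python sketch, not ADA's actual system: the `AccessRequest` fields and the `unmet_conditions` function are invented names, and the mapping of each check to a "safe" simply follows the bullets on this slide.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    """Hypothetical record of what a user supplies when requesting data."""
    user_name: str
    contact_email: str
    supervisor: str
    project_description: str
    accepted_terms_of_use: bool


def unmet_conditions(req: AccessRequest) -> list[str]:
    """List which of the slide's conditions a request does not yet meet."""
    problems = []
    # Safe people: users must be identified and give contact and supervisor details.
    if not (req.user_name.strip() and req.contact_email.strip() and req.supervisor.strip()):
        problems.append("safe people: name, contact and supervisor details required")
    # Safe projects: users provide a project description.
    if not req.project_description.strip():
        problems.append("safe projects: project description required")
    # Safe outputs: handled through terms of use.
    if not req.accepted_terms_of_use:
        problems.append("safe outputs: terms of use must be accepted")
    return problems


if __name__ == "__main__":
    request = AccessRequest(
        user_name="Example User",
        contact_email="user@example.org",
        supervisor="Example Supervisor",
        project_description="",
        accepted_terms_of_use=False,
    )
    for problem in unmet_conditions(request):
        print(problem)
```

Safe data and safe settings are deliberately absent from the sketch: per the slide, the first is handled before or at deposit, and the second is not applied.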
ADA Dataverse
• Dataverse is an open source web application to share, preserve, cite, explore, and analyze research data. It facilitates making data available to others, and allows you to replicate others' work more easily. Researchers, data authors, publishers, data distributors, and affiliated institutions all receive academic credit and web visibility.
• A Dataverse repository is the software installation, which then hosts multiple dataverses. Each dataverse contains datasets, and each dataset contains descriptive metadata and data files (including documentation and code that accompany the data). As an organizing method, dataverses may also contain other dataverses.
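
The containment structure described above (a repository hosts dataverses, dataverses hold datasets and other dataverses, datasets hold metadata and files) can be sketched as a small recursive data model. This is a conceptual Python illustration only; the class and field names are hypothetical and do not reflect the Dataverse software's internal schema or API.

```python
from __future__ import annotations

from collections.abc import Iterator
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A dataset: descriptive metadata plus data, documentation and code files."""
    title: str
    metadata: dict[str, str] = field(default_factory=dict)
    files: list[str] = field(default_factory=list)


@dataclass
class Dataverse:
    """A dataverse: holds datasets and, optionally, nested dataverses."""
    name: str
    datasets: list[Dataset] = field(default_factory=list)
    children: list[Dataverse] = field(default_factory=list)

    def all_datasets(self) -> Iterator[Dataset]:
        """Yield this dataverse's datasets, then those of nested dataverses."""
        yield from self.datasets
        for child in self.children:
            yield from child.all_datasets()


# Hypothetical example: a repository root containing one journal dataverse.
journal = Dataverse(
    name="Example Journal Dataverse",
    datasets=[
        Dataset(
            title="Replication data for article A",
            metadata={"author": "Example Author"},
            files=["data.csv", "analysis.R", "README.txt"],
        )
    ],
)
root = Dataverse(
    name="Example repository root",
    datasets=[Dataset(title="Standalone survey dataset")],
    children=[journal],
)

if __name__ == "__main__":
    for dataset in root.all_datasets():
        print(dataset.title, dataset.files)
```

Modelling the root of the installation as just another `Dataverse` keeps the traversal uniform; whether a real installation treats its root this way is an assumption of the sketch.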
Incentives for Dataverse
• Developed under the direction of Gary King (head of the Institute for Quantitative Social Science) at Harvard
• “The idea is to facilitate the public distribution of persistent, authorized, and verifiable data, with powerful but easy-to-use technology, even when the data are confidential or proprietary.
• We intend to solve some of the sociological problems of data sharing via technological means, with the result intended to benefit both the scientific community and the sometimes apparently contradictory goals of individual researchers.”
– Gary King. 2007. “An Introduction to the Dataverse Network as an Infrastructure for Data Sharing.” Sociological Methods and Research, 36: 173–199. DOI: 10.1177/0049124107306660.
– Copy at http://j.mp/2owjuRr
The ADA Dataverse
(coming June 2017)
https://dataverse.ada.edu.au
American Journal of Political Science Dataverse
https://dataverse.harvard.edu/dataverse/ajps
What can you put in there?
Gustavo Durand
Technical Lead
IQSS, Harvard University
August 31, 2016
CONNECTING JOURNALS
TO THE DATAVERSE
WHAT IS DATAVERSE?
Open Source Software
Since 2006
https://github.com/IQSS/dataverse
18+ data repositories
worldwide
Example: Harvard Dataverse open to
researchers, journals and research
institutions worldwide to deposit data.
http://dataverse.org
https://dataverse.harvard.edu
STEP 1 – RECOMMEND A
DATAVERSE REPOSITORY
Some examples of Journals that recommend Dataverse:
Journals can:
• Recommend Dataverse in
their data policy for
authors to deposit data
• Ask authors to send them
the data citation to
include in article
STEP 2 – JOURNAL DATAVERSE
Journals set up a Dataverse to:
• Organize all datasets per article / issue
• Review datasets before publishing them
• Get a data citation w/ PID and link to the article
Over 90 journals in Harvard Dataverse.
Some examples of Journals/Publishers w/ Dataverses in Harvard Dataverse:
STEP 3 – DATA
CURATION + REPLICATION
VERIFICATION
Journals currently using this service from UNC Odum:
For Journals w/ Dataverses:
• That would like independent
third-party curation and
verification of replication data
Contact Odum Archive:
odumarchive@unc.edu
STEP 4 – AUTOMATED
INTEGRATION (SWORD API)
For Journals with a Dataverse that would like:
• Authors to submit data at the same time as the article in the journal system.
• Data to be automatically archived in Dataverse, with a data citation added to the article.
Currently available for journals using OJS.
Otherwise contact support@dataverse.org so we can work with you!
SWORD
FEATURES MADE FOR JOURNALS
Data File Widget to insert in article (Fall 2016)
Private URL For Dataset Review (Summer 2016)
Questions?
Steven McEachern
steven.mceachern@anu.edu.au
ada@anu.edu.au
http://ada.edu.au
