Data accessibility and challenges
Jyoti Khadake
24th October 2016
EMBL-ABR workshop
  
Data life cycle
The life cycle of data depends on project aims and purpose.
Planning / project design
Finding / creating the data
Extracting, transforming and loading
Processing
Analyzing data - information - publication
Data associated with a study can be reused
  
Planning
Generating/
Reliability
Ownership
Metadata
Versioning
Standardisation
Quality
Publishing
  
Data access and data sharing
•  What do we expect when we access data?
•  What do we expect when we share data?
•  These are two sides of the same coin
  
Open access data policy
•  Data created from research are valuable resources that can be used and reused for future scientific and educational purposes. Sharing data facilitates new scientific inquiry, avoids duplicate data collection and provides real-life resources for education and training.
OR
•  Publicly funded research data should, as far as possible, be openly available to the scientific community.
  
What does this achieve?
•  Encourages scientific enquiry and debate
•  Promotes innovation and potential new data uses
•  Leads to new collaborations between users and creators of data
•  Maximises transparency and accountability
•  Enables scrutiny of research findings
•  Encourages improvement and validation of research findings
•  Reduces the cost of duplicating data collection
•  Increases visibility of research
•  Provides direct credit to the researcher
•  Makes research outcomes available for education and training
  
Encouraged by
•  Research funders: under guidance from the OECD, funders have developed data sharing policies that allow researchers exclusive use of data for a limited time, with a mandate to publish at the end of the agreed period. This can be done via repositories or data centers. The funders also require a data management and sharing plan.
•  Journals: data that form the basis of a publication need to be shared or deposited within an accessible database or repository.
•  Initiatives like the DataCite registry assign unique digital object identifiers (DOIs) to research data, helping scientists make data discoverable, citable and traceable, so that research data, as well as the publications based on those data, form part of the scientific output.
•  Use of metadata-dependent URIs to identify and share data
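As a rough illustration of how a DOI makes a dataset citable, the sketch below turns a DOI into a link through the public doi.org resolver and builds a simple data citation. The DOI shown, the function names and the citation layout are assumptions made for this example; this is not DataCite's API or a prescribed citation style.

```python
def doi_to_url(doi: str) -> str:
    """Return a resolvable https://doi.org/ link for a DOI string."""
    doi = doi.strip()
    # Accept common ways of writing a DOI and reduce them to the bare form.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Every DOI starts with the directory indicator "10." and has a suffix.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + doi


def format_citation(creator, year, title, repository, doi):
    """Build a plain-text data citation (layout is illustrative only)."""
    return f"{creator} ({year}). {title}. {repository}. {doi_to_url(doi)}"


# Hypothetical dataset, used only to show the call.
citation = format_citation(
    "A. Researcher", 2016, "Example expression dataset",
    "Example Repository", "doi:10.1234/example.dataset",
)
print(citation)
```

Because the link goes through the resolver rather than to a fixed web address, the citation can stay valid if the repository later moves the files.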
  
How to share / access data
•  Specialist data centers, archives or data banks
•  Journal, to support a publication
•  Institutional repository
•  Online via a project or institutional website
•  Informally between researchers on a peer-to-peer basis
URI identifies data
  
Advantages of depositing data with a data center or repository
•  Assurance that data meet set standards
•  Long-term preservation in a standardised, accessible data format, with format conversion when software is upgraded
•  Safe keeping with attribution in a secure environment
•  Regular data backup
•  Online resource discovery through catalogues
•  Access in popular formats
•  Licensing arrangements to acknowledge data rights
•  Standardised citation mechanism to acknowledge data ownership
•  Promotion of data to many users
•  Monitoring of secondary usage of data
•  Management of access to data and user queries on behalf of the data owner

So we need to share data
&
Shared data is available to us
  
What affects sharing / accessing data
Size of data and compute
Community-developed data standards
Existing repositories or storage facilities
Nature of data
Appropriate data tracking and governance
Key management points
Metadata
  
Size of data
Decides what kind of storage / archival is used
Cloud storage
  OK for data that does not go into terabytes or does not have restrictions
  Cost implications
  Available as DaaS, SaaS, PaaS, IaaS
Static storage: cluster-based computing/storage
  Geographical restrictions
  Provides compute for analysis, since big data does not move
  Good access control?
  
Compute for analysis
•  Once there is data access, a decision needs to be made on how much compute is required for analysis.
•  Cloud-based solutions are available for small-scale data
•  Data centers like Aimes allow for compute on clusters
•  The institute/repository may provide HPC as well as software for analysis
  
Community developed data standards
An active collaborative community is essential for the development of community standards.
The standards are required for
  format/s for data storage/exchange
  vocabulary for data representation
Absence of community standards?
Catalogues can be found at:
  http://www.ebi.ac.uk/ols/index
  http://bioportal.bioontology.org/
  
Existing data repositories/storage
•  Topic-specific repositories will give maximum exposure to the data / access to relevant data
•  Issue with multiple repositories: collaborative approaches to repositories, e.g. RCSB for structure data
•  Absence of repositories?
•  http://datacite.org/repolist
•  http://databib.org
  
Nature of data
•  This decides whether the data can be open access or controlled access.
•  There may be further geographical restrictions on the data.
•  If controlled access is required, Data Access Agreements and Application Forms need to be developed.
•  Management of the access control
  
Approaches to secure access
•  DAC-controlled access, with or without monitoring
•  Highly controlled access where only analysis results can be taken away, e.g. DataSHIELD
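The idea of highly controlled access, where only analysis results leave the secure environment, can be sketched as below. This is a minimal illustration under stated assumptions: the function name and the minimum group size of 5 are hypothetical, and the code is not DataSHIELD's interface, only the general pattern of returning aggregates and refusing requests that could expose individuals.

```python
from statistics import mean

MIN_GROUP_SIZE = 5  # hypothetical disclosure threshold, set by the data custodian


def safe_summary(values, min_group_size=MIN_GROUP_SIZE):
    """Return only aggregate results; individual records never leave."""
    values = list(values)
    # Refuse to summarise very small groups, where an aggregate
    # could reveal something about a single participant.
    if len(values) < min_group_size:
        raise PermissionError("too few records to release a summary")
    return {"n": len(values), "mean": mean(values)}


# The analyst sees the summary, not the underlying measurements.
print(safe_summary([4.1, 5.0, 4.7, 5.3, 4.9, 5.1]))
```

The threshold check is the design choice that matters here: the analyst's request is evaluated next to the data, and what comes back is decided by the custodian's rules rather than by the analyst.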
  
Roles and responsibilities
Particularly important where sensitive data, personal data or patent data are involved.
Appropriate consents and ethics need to be in place.
Sometimes only processed, anonymized data can be used.
•  Requires the establishment of a DAC and MC
– Manages applications
– Approves applications
– Manages access
– Manages destruction of data if required
  
Data governance
  
Data management planning
•  Plan ahead to create high-quality and sustainable data that can be shared
•  The plan will need checking periodically to see that it still meets requirements
Available resources:
https://dmponline.dcc.ac.uk
http://www.mrc.ac.uk/documents/doc/data-management-plan-template/
  
Data cycle
  
Metadata
•  What is metadata?
– Documentation and description associated with data
– Required to make sense of the data, e.g. description of variables, classification scheme, dates and project
There are metadata standards,
e.g. Dublin Core, Darwin Core, OECD minimal data set, AGROVOC
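To make the idea of a metadata standard concrete, here is a minimal sketch of a record restricted to the fifteen elements of the Dublin Core element set, written out as JSON. The dataset described, the helper name and the choice of JSON are assumptions for illustration; a repository will usually prescribe its own schema and serialisation.

```python
import json

# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def make_record(**fields):
    """Build a metadata record, rejecting fields outside Dublin Core."""
    unknown = set(fields) - set(DUBLIN_CORE_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    # Keep the elements in the standard's order for readability.
    return {name: fields[name] for name in DUBLIN_CORE_ELEMENTS if name in fields}


# Hypothetical dataset description.
record = make_record(
    title="Example expression dataset",
    creator="A. Researcher",
    date="2016-10-24",
    format="text/csv",
    description="Variables: sample_id, gene, expression level",
    rights="CC BY 4.0",
)
print(json.dumps(record, indent=2))
```

Rejecting unknown field names is what turns free-form notes into metadata that a catalogue can index consistently.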
  
Formatting your data
•  Different formats are good for different purposes
•  Open formats adopted by the community are more sustainable, e.g. rtf, tif, wav, xml, csv
•  Proprietary and/or compressed formats that have widespread use, e.g. doc, jpg, mp3, gzip
•  Organising files and folders
•  Quality assurance
•  Version control and authenticity, transcription
  
Storing your data
•  Keep your digital data safe, secure and recoverable
•  Make at least 2 backups
•  Institutional back-up policies
•  Manage backups: snapshots, integrity, recoverability
•  Data storage strategy
•  Data security
•  Security of personal data
•  Data destruction / disposal
•  Data transmission and encryption
•  File sharing and collaborative environments: email, Dropbox, ftp, encrypted media, file store, VREs ..
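One common way to check backup integrity is to compare checksums of the original file and its copy. The sketch below uses SHA-256 from Python's standard library; the function names and file paths are hypothetical and chosen for this example.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Chunked reading keeps memory use flat for large data files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(original, backup):
    """True when the backup is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(backup)
```

A call such as backup_is_intact("results.csv", "/backup/results.csv") can be run after each backup. Storing the digest alongside the data also lets a later reader confirm that a file has not changed since it was deposited.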
  
Institutional backup/storage
Institutes are required to provide storage of data.
Make sure you allocate funds for this when you write the proposal.
  
Planning
Generating/
Reliability
Ownership
Metadata
Versioning
Standardisation
Quality
Publishing
Archiving
Destroy
  
Resources for archiving data
•  Dryad: an international repository of data underlying peer-reviewed articles in the basic and applied biosciences.
•  The Dataverse Network: an open source application to publish, share, reference, extract and analyze research data. (Harvard)
  
Destroy data
•  Physical destruction
•  Overwriting
•  Demagnetising the storage
•  Disc destruction
•  Purging the printers and other devices
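The overwriting option can be sketched as follows: the file's bytes are replaced with random data before the file is deleted. This is only an illustration with a hypothetical function name. On SSDs, copy-on-write or journaling file systems, and anywhere backups or snapshots exist, overwriting in place does not guarantee the old data are gone, so institutional disposal policy should take precedence.

```python
import os


def overwrite_and_delete(path, passes=1):
    """Overwrite a file with random bytes, then remove it."""
    size = os.path.getsize(path)
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            remaining = size
            while remaining > 0:
                # Write in chunks of at most 1 MiB to bound memory use.
                chunk = min(remaining, 1 << 20)
                handle.write(os.urandom(chunk))
                remaining -= chunk
            # Push the new bytes out of Python's and the OS's buffers.
            handle.flush()
            os.fsync(handle.fileno())
    os.remove(path)
```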
  
Best practices
•  Make a DMP
•  Use standard vocabulary
•  Use a standardised format
•  Check institutional policy for data storage and exchange
•  Check funder's policy for data exchange
•  Check legal constraints and requirements
•  Make data available under a DAA
•  Have a written policy for retention and disposal of data
•  Share data safely and securely
  
Strategies for centers
•  Provide a management framework for researchers
Some sources are:
UK Data Archive
Boston University
Melbourne
Data Curation Center
  
Improve Data Access
  
