Will We Command Our Data? From the Petascale to the Personal
Richard Akerman, NRC-CISTI
Presented at Access 2009, Oct. 1, 2009

Overview
  • Definitions / Assumptions
  • How Big is Data?
  • Four Sources of Data
  • Drivers
  • Activities

Definitions / Assumptions
  • Petabyte = 1000 Terabytes
  • data = datasets
  • “data is”
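
To give the petabyte definition above a sense of scale, here is a minimal sketch of the arithmetic. It assumes decimal (SI) units, as the slide does, and a 1 Gbit/s link running at full speed; the link speed and the function name `transfer_days` are illustrative assumptions, not figures from the talk.

```python
# Decimal (SI) units, matching the slide's "Petabyte = 1000 Terabytes".
TERABYTE = 10**12            # bytes
PETABYTE = 1000 * TERABYTE   # bytes


def transfer_days(size_bytes: int, link_bits_per_second: float) -> float:
    """Days needed to move size_bytes over a link running at full speed."""
    seconds = size_bytes * 8 / link_bits_per_second
    return seconds / 86_400  # seconds per day


# Hypothetical link speed, chosen only for illustration.
ONE_GIGABIT = 1e9

print(f"{transfer_days(PETABYTE, ONE_GIGABIT):.1f} days")
```

Under these assumptions a single petabyte takes about three months to move over the network, which is one reason physically shipping storage media stays competitive at this scale.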

How Big is Data?
  • http://www.instructables.com/file/FA9N61CF54HJ6GG/

How Big is Data?
  • http://www.flickr.com/photos/doctorow/2731870631/

How Big is Data?
  • http://en.wikipedia.org/wiki/File:Postduif.jpg

Four Sources of Data
  • Research data
  • Government data
  • Library data
  • Personal data

General Drivers
Since 2000, a convergence of factors:
  • Value of sharing
  • Ease of sharing
  • Level of sharing (machine level)

Specific Drivers: Research Data
  • OECD Principles and Guidelines for Access to Research Data from Public Funding (April 2007)
  • The Toronto Statement on prepublication data sharing (September 2009)

OECD Principles
  • “Open access to research data from public funding should be easy, timely, user-friendly and preferably Internet-based.”
  • http://www.flickr.com/photos/ben-zvan-photography/468487548/

Specific Drivers: Open Government Data
  • US Memorandum on Transparency and Open Government (January 2009)
  • US Memorandum on the Freedom of Information Act (January 2009)

Specific Drivers: Open Government Data
  • UK Power of Information Task Force Report (March 2009)
  • Modernise data publishing and reuse
  • http://poit.cabinetoffice.gov.uk/poit/category/data-final/
  • “public information held by for example the police, health bodies and local authorities is often not available. This is bad for democratic expression, the economy and citizen customers.”
  • Data.gov (May 2009)
  • UK PM Brown meets with Sir Berners-Lee (Sept. 2009)

Specific Drivers: Library Data
  • ILS Customer Bill-of-Rights, John Blyberg (November 2005)
  • “Berkeley Accord” (March 2008)

Specific Drivers: Personal Data
  • Wired cover feature “Living by numbers” (July 2009)
  • “Know Thyself: Tracking Every Facet of Life, from Sleep to Mood to Pain, 24/7/365”
  • “Numbers are making their way into the smallest crevices of our lives. We have pedometers in the soles of our shoes and phones that can post our location as we move around town. We can tweet what we eat into a database and subscribe to Web services that track our finances. There are sites and programs for monitoring mood, pain, blood sugar, blood pressure, heart rate, … and prayers.”

Why Libraries
  • Advocates
  • Exemplars
  • Experts

Research Data: DataCite
  • http://www.datacite.org/
  • “DOIs for data”
  • “The long term vision of the partnership is to support researchers by providing methods for them to locate, identify, and cite research datasets with confidence.”

Research Data: Gateway to Data Sets
  • NRC-CISTI, Gateway to (Canadian) Scientific Data Sets
  • http://cisti-icist.nrc-cnrc.gc.ca/eng/services/cisti/scientific-data/data-sets/
  • e.g. Canadian Astronomy Data Centre (CADC), Large Synoptic Survey Telescope (LSST)

Government Data: Canada - Federal
  • http://geogratis.cgdi.gc.ca/
  • StatsCan
  • Data Liberation Initiative (DLI)
  • Ontario Data Documentation, Extraction Service and Infrastructure Initiative (ODESI)
  • “The project will target Statistics Canada datasets... The files will be marked-up using DDI, an international, XML-based metadata tagging system which allows data resource discovery, distributed access, extraction and analysis.”

Government Data: Municipal - Vancouver
  • http://data.vancouver.ca/

Government Data: Municipal - SF
  • San Francisco http://datasf.org/

Library Data
  • A million free covers from LibraryThing
  • Open Library http://openlibrary.org/dev/docs/data
  • Talis Connected Commons
  • MESUR – Services
  • http://id.loc.gov/ (LCSH)

APIs vs raw data
  • APIs
      • Always serve up latest data
      • Control over access
      • Tracking/stats
      • Advanced/complex functionality on top of the data
  • Raw data
      • Unconstrained / can do things never imagined by API
      • Hard to track / version
      • Can lose metadata
      • Allows choice of computing
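
The trade-offs listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any real service: the `DatasetAPI` class, the `raw_dump` function, the `demo-key` API key and the sample records are all hypothetical names invented for the example.

```python
import csv
import io

# Hypothetical catalogue records; the field names are illustrative only.
RECORDS = [
    {"id": "1", "title": "Sky survey", "year": "2008"},
    {"id": "2", "title": "Census tables", "year": "2006"},
]


class DatasetAPI:
    """API-style access: always current, access-controlled, and counted."""

    def __init__(self, records, allowed_keys):
        self._records = records            # live reference, so callers see updates
        self._allowed_keys = set(allowed_keys)
        self.request_count = 0             # tracking/stats

    def get(self, api_key, record_id):
        if api_key not in self._allowed_keys:   # control over access
            raise PermissionError("unknown API key")
        self.request_count += 1
        for record in self._records:
            if record["id"] == record_id:
                return dict(record)
        return None


def raw_dump(records):
    """Raw-data access: a CSV snapshot the publisher can no longer track."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "title", "year"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


api = DatasetAPI(RECORDS, allowed_keys=["demo-key"])
snapshot = raw_dump(RECORDS)

# The publisher corrects a record after the dump was taken.
RECORDS[0]["title"] = "Sky survey (revised)"

# The API reflects the correction; the downloaded snapshot does not.
latest = api.get("demo-key", "1")
stale = list(csv.DictReader(io.StringIO(snapshot)))[0]

# The raw copy supports a question this API never offered: records by year.
by_year = sorted(csv.DictReader(io.StringIO(snapshot)), key=lambda r: r["year"])
```

Here `latest` carries the revised title and `api.request_count` records the call, while `stale` still holds the old title with no indication of which version it is. In exchange, the holder of the raw copy can sort, join or reprocess the data on whatever computing they choose.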

Personal Data: Daytum
  • http://www.daytum.com/

Personal Data: Total Recall
  • http://totalrecallbook.com/ (Sept. 2009)

Thank You
  • Richard Akerman
  • © 2009 Government of Canada
  • Licensed in the Creative Commons
  • http://creativecommons.org/licenses/by-nc-sa/2.5/ca/



Editor's Notes

  • #7: http://en.wikipedia.org/wiki/File:Postduif.jpg (public domain)
  • #14: http://www.flickr.com/photos/rakerman/2907065239/