Data 101:
A Gentle Introduction
Presented by

Kimberly Silk, MLS,
Data Librarian, Martin Prosperity Institute,
Rotman School of Management, University of Toronto

23 October 2013
Our Agenda
• Defining data librarianship
• Basic terminology
• Data sources
• Big Data, Open Data
• Data analysis tools
• Our challenge: data management, preservation,
discovery and access
• What are data visualizations?
• Sources
• Q&A
Warnings:
• Data librarianship involves LOTS of acronyms
• There is NO MATH in data librarianship
– (well, almost none)

Defining Data Librarianship
• Data librarianship is a relatively new area of practice,
emerging with the growth of digital media since the
1970s;
• Data librarians are professional library staff engaged in
managing research data as a resource, and supporting
researchers in these activities;
• We support our institutions and researchers in the
areas of data management, metadata management,
and teaching how to use data as a resource;
• Many of us work in the social sciences, but there is
growth in the natural sciences and humanities as well.
Basic Terminology
• Data – plural!

Think: Squirrels!! 

• Microdata – raw data, individual records consisting of rows of
numbers (Excel spreadsheet);
• Statistics – summarized tables and cross-tabulations that have
been formulated from the raw data;
• Aggregate data – statistical summaries organized in a data file
structure (Excel) that permits further analysis;
• PUMF – Public Use Microdata File – raw data that are available for
public use; some data may be filtered and geographies suppressed
to ensure personal privacy;
• Variables – a set of factors, traits or conditions that describes a
unit of analysis; for instance, sex, age, marital status, etc.
• Frequencies – the number of times an observation occurs in the
data;
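The terms above can be illustrated with a minimal sketch. The records below are hypothetical microdata invented for illustration (not drawn from any real survey); the sketch counts frequencies for one variable and builds a simple cross-tabulation of two variables.

```python
from collections import Counter

# Hypothetical microdata: one record (row) per respondent.
# Each key is a variable describing the unit of analysis.
microdata = [
    {"sex": "F", "age": 34, "marital_status": "married"},
    {"sex": "M", "age": 51, "marital_status": "single"},
    {"sex": "F", "age": 27, "marital_status": "single"},
    {"sex": "M", "age": 45, "marital_status": "married"},
    {"sex": "F", "age": 62, "marital_status": "married"},
]

# Frequencies: the number of times each value of a variable occurs.
sex_frequencies = Counter(record["sex"] for record in microdata)

# Cross-tabulation: counts for each combination of two variables.
crosstab = Counter(
    (record["sex"], record["marital_status"]) for record in microdata
)

print(dict(sex_frequencies))
for (sex, status), count in sorted(crosstab.items()):
    print(sex, status, count)
```

The cross-tabulation is a statistic derived from the microdata; stored in a file structure that permits further analysis, the same counts would be aggregate data.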
Common Data Sources
• Gov’t-collected surveys
– Statistics Canada – public data and the Data Liberation Initiative
– US Census (American FactFinder)
– Bureau of Labor Statistics, Bureau of Economic Analysis
– Roper (Public Opinion Polls)
– ICPSR (Inter-University Consortium for Political and Social Research)
– International sources such as UK Data Archive, Swedish National Data
Service, Australian Data Archive, etc.
– OECD iLibrary
– World Bank Open Data
– Pew Research Center
– Gallup
– Thomson
Other International Data Sources
• Some countries do not gather data, have not
been gathering data for very long, or else limit
or filter available data
• For instance, developing countries may not
gather, preserve or release their data;
• The BRICs (Brazil, Russia, India, China) will
struggle with this issue as their economies
grow.
Uncommon Data Sources
• Data can come from everywhere;
• Occasionally, the MPI acquires data from
unusual sources, such as:
– Billboard magazine
– MySpace social media site for bands
– CrunchBase database of technology companies

Open Data
• Open data are data that are openly available, free of charge and
copyright, and available in non-proprietary formats
• Can be used and re-used
• Public money (taxes) funds data creation and collection, so
the public owns the data
• Many governments are moving toward open data, but it takes time,
management, and caution
• Issues: privacy, transparency, maintenance
• Examples:
– Toronto
– Vancouver
– U.S.

Big Data
• Big Data are data that are too large for the
average database management tool (Access and
Excel, for instance).
• Examples come from meteorology, genomics and
physics. At MPI we wrestle with large GIS data
sets (maps and satellite data), and deal with data
at the terabyte (1 trillion bytes) level.
• Larger data sets deal with petabytes (1
quadrillion bytes) and exabytes (1 quintillion
bytes).
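The byte counts above can be checked with a little arithmetic. This is a minimal sketch using the decimal definitions given on this slide; the helper name to_bytes is made up for illustration.

```python
# Decimal (SI) unit sizes, matching the figures on the slide.
# Binary units (tebibyte = 2**40 bytes, etc.) are slightly larger.
BYTES_PER_UNIT = {
    "terabyte": 10**12,  # 1 trillion bytes
    "petabyte": 10**15,  # 1 quadrillion bytes
    "exabyte": 10**18,   # 1 quintillion bytes
}


def to_bytes(amount, unit):
    """Convert an amount in the named unit to a number of bytes."""
    return amount * BYTES_PER_UNIT[unit]


# How many terabytes fit in one petabyte, and petabytes in one exabyte?
terabytes_per_petabyte = to_bytes(1, "petabyte") // to_bytes(1, "terabyte")
petabytes_per_exabyte = to_bytes(1, "exabyte") // to_bytes(1, "petabyte")

print(terabytes_per_petabyte, petabytes_per_exabyte)
```

Each step up the scale is a factor of one thousand.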
Data Discovery Platforms
• Nesstar – developed in Norway by Norwegian Social
Science Data Services, used by Statistics Canada, UK
Data Archive, NORC at the University of Chicago
• SDA – developed at Berkeley, used by the University of Toronto
and ICPSR
• Equinox – used at Western
• ODESI – proprietary system developed and used by
Scholars Portal
• Dataverse – Open source system developed by the
Institute for Quantitative Social Science (IQSS) at
Harvard, used by NBER and ICPSR
Data Analysis Packages
• SPSS – great for beginners, easy to use. Point-and-click
interface; power users will want more.
• SAS – preferred by power users; steep
learning curve.
• Stata – easy to learn, powerful.
• R – open source, free, powerful, no GUI.
• Use what your colleagues are using.
Data Management,
Preservation, Discovery &
Access
• We’ve conquered print collections,
but data present a new challenge;
• As with all digital files, metadata are
necessary to describe data assets;
• Like images, a single data set can
mean many things to many people;
• How do we manage these data to
make sure they are discoverable,
accessible, and preserved?
• Traditionally, data files have been
stored on network drives, and shared
or restricted according to the groups
who need to use them;
• Network drives are difficult to search,
can be hard to share and restrict, and
don’t deal with metadata well;
• Web pages with links have been a
common way to distribute data sets;
• We needed new tools – a new kind of
catalogue that is designed for the
specialized needs of data.
Dataverse
• We installed an instance
of Dataverse at the
University of Toronto, in
our “cloud”, and I manage
my data collections myself;
• As an open source
solution, it’s cost-effective
and my colleagues at
Scholars Portal support it
for me and other Ontario
universities.
• The data are associated
with studies; several data
sets can be associated
with a single study;
• The world can see the
metadata for each data
collection, but access to
the data sets themselves
is restricted to those
who contact me to get
permission.
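The arrangement described above (studies that group several data sets, public metadata, restricted data) can be sketched as a small data structure. This is an illustrative sketch only, not Dataverse's actual data model or API; the Study class, its fields, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Study:
    """Hypothetical catalogue entry: public metadata plus restricted files."""
    title: str
    keywords: list[str]
    data_files: list[str] = field(default_factory=list)
    approved_users: set[str] = field(default_factory=set)


def search(catalogue, term):
    """Anyone can search the metadata (title and keywords)."""
    term = term.lower()
    return [
        study for study in catalogue
        if term in study.title.lower()
        or any(term in keyword.lower() for keyword in study.keywords)
    ]


def files_for(study, user):
    """Only users who have been granted permission can reach the data."""
    if user not in study.approved_users:
        raise PermissionError(f"{user} must request access to '{study.title}'")
    return list(study.data_files)


# Made-up example: one study associated with two data sets.
catalogue = [
    Study(
        title="Example Music Industry Study",
        keywords=["music", "cities"],
        data_files=["bands_2010.csv", "venues_2010.csv"],
        approved_users={"approved_researcher"},
    )
]

hits = search(catalogue, "music")
print([study.title for study in hits])
print(files_for(hits[0], "approved_researcher"))
```

Keeping the metadata searchable while gating the files is what makes the data discoverable without making them freely downloadable.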
Data Visualizations
• The visual representation of data – literally,
a picture can say a thousand [numbers]
• Edward Tufte is a key pioneer:
http://www.edwardtufte.com/tufte/
• Fantastic examples at Flowing Data:
http://flowingdata.com/
• RSA Animate: http://www.thersa.org/

Sources
• International Association for Social Science
Information Services & Technology (IASSIST) - http://www.iassistdata.org/
• OECD iLibrary - http://www.oecd-ilibrary.org/
• World Bank Data - http://data.worldbank.org/
• UK Data Archive - http://data-archive.ac.uk/
• Nesstar - http://www.nesstar.com/
• Dataverse - http://thedata.org/
Q&A
(and, Thank You!)

Kimberly Silk, MLS, Data Librarian,
Martin Prosperity Institute, University of Toronto
kimberly.silk@martinprosperity.org
