Defining Datasets and Statistics
What is Data?
da·ta noun plural but singular or plural in construction, often
attributive ˈdā-tə, ˈda- also ˈdä-
1: factual information (as measurements or statistics) used as a
basis for reasoning, discussion, or calculation
2: information output by a sensing device or organ that includes
both useful and irrelevant or redundant information and must be
processed to be meaningful
3: information in numerical form
that can be digitally transmitted
or processed
Merriam-Webster (http://www.merriam-webster.com/dictionary/data)
Data can be
• Observational: Captured in real-time,
typically outside the lab
– Examples: Sensor readings, survey results,
images, audio, video
• Experimental: Typically generated in the
lab or under controlled conditions
– Examples: test results
• Simulation: Machine generated from test
models
– Examples: climate models, economic models
• Derived/Compiled: Generated from
existing datasets
– Examples: text and data mining, compiled
database, 3D models
Data can be
• Text: field or laboratory notes,
survey responses
• Numeric: tables, counts,
measurements
• Audiovisual: images, sound
recordings, video
• Models, computer code,
geospatial data
• Discipline-specific: FITS in
astronomy, CIF in chemistry
• Instrument-specific: equipment
outputs
Microdata
• Data directly observed or collected from a
specific unit of observation.
• Contain individual cases, usually individual
people, or in the case of Census data,
individual households
Examples:
• Census: the unit of observation is typically an
individual, a household, or a family.
• Survey or poll: the responses of a single respondent
Aggregate Data
Higher-level data that have been compiled
from smaller units of data.
Examples: inflation rate, consumer price index,
demographic data for city or state
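The step from microdata to aggregate data can be sketched in a few lines of Python. This is only an illustration: the household records, field names, and values below are invented, and real census microdata would be far larger and come with weights and documentation.

```python
from collections import defaultdict

# Hypothetical microdata: one record per household (the unit of observation).
households = [
    {"state": "CA", "household_size": 3, "income": 72000},
    {"state": "CA", "household_size": 1, "income": 41000},
    {"state": "NY", "household_size": 4, "income": 88000},
    {"state": "NY", "household_size": 2, "income": 56000},
    {"state": "NY", "household_size": 2, "income": 63000},
]


def aggregate_by_state(records):
    """Compile household-level records into state-level aggregate data."""
    groups = defaultdict(list)
    for record in records:
        groups[record["state"]].append(record)

    summary = {}
    for state, rows in groups.items():
        summary[state] = {
            "households": len(rows),
            "population": sum(r["household_size"] for r in rows),
            "mean_income": sum(r["income"] for r in rows) / len(rows),
        }
    return summary


state_totals = aggregate_by_state(households)
for state, values in sorted(state_totals.items()):
    print(state, values)
```

Once the records are compiled this way, the individual households can no longer be recovered from the state-level figures, which is the practical difference between the two kinds of data.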
Statistics
are numerical data that have
been organized and
interpreted, usually
displayed in tables.
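As a rough sketch of what "organized and interpreted" can mean in practice, the Python below turns a handful of raw values into a small table of statistics. The ages are made up for illustration, and the choice of summary measures is an assumption, not something the slides prescribe.

```python
import statistics

# Hypothetical raw data: ages reported by seven survey respondents.
ages = [23, 35, 31, 47, 35, 52, 29]


def summarize(values):
    """Organize raw values into a few common summary statistics."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


summary = summarize(ages)

# Display the statistics as a simple two-column table.
print(f"{'statistic':<10}{'value':>8}")
for name, value in summary.items():
    print(f"{name:<10}{value:>8}")
```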
Datasets
• A dataset or study is
made up of the raw data
file and any related files,
usually the codebook
and setup files.
• Most datasets require at
least a basic statistical
package (Stata, SPSS, R,
etc.) or a spreadsheet
program (Excel) to use.
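To show why the codebook travels with the raw data file, here is a minimal Python sketch in which a codebook is used to decode numeric response codes. The variable names, codes, and labels are all hypothetical; real codebooks are usually separate documents whose contents have to be entered or parsed first.

```python
import csv
import io

# Hypothetical raw data file: responses stored as numeric codes.
RAW_DATA = """respondent_id,q1_employment,q2_region
1,1,3
2,2,1
3,1,2
"""

# Hypothetical codebook: maps each variable's codes to their meanings.
CODEBOOK = {
    "q1_employment": {"1": "Employed", "2": "Unemployed"},
    "q2_region": {"1": "Northeast", "2": "South", "3": "West"},
}


def decode(raw_text, codebook):
    """Replace coded values with the labels given in the codebook."""
    decoded = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        for variable, labels in codebook.items():
            # Leave the value unchanged if the codebook has no label for it.
            row[variable] = labels.get(row[variable], row[variable])
        decoded.append(row)
    return decoded


rows = decode(RAW_DATA, CODEBOOK)
for row in rows:
    print(row)
```

Without the codebook, a value such as 3 in q2_region cannot be interpreted, which is why a dataset is more than the raw data file alone.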
Repositories
• A data repository is a collection of
datasets that have been
deposited for storage and
findability.
• They are often
– discipline specific and/or
– affiliated with a research institution
• Examples
– ICPSR
– Harvard Dataverse Network
– University institutional repositories
To recap:
• Data are raw ingredients
from which statistics are
created.
• Statistical analysis can be
performed on data to
show relationships among
the variables collected.
• Through secondary data
analysis, many different
researchers can re-use the
same data set for
different purposes.
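One common way to show a relationship between two collected variables is a correlation coefficient. The sketch below computes Pearson's r by hand in Python; the study-hours and test-score values are invented for illustration, and correlation is only one of many analyses a researcher might choose.

```python
import math

# Hypothetical data: two variables collected for the same five students.
hours_studied = [1, 2, 3, 4, 5]
test_scores = [52, 55, 61, 64, 68]


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(hours_studied, test_scores)
print(f"Pearson r = {r:.4f}")
```

A value close to 1 indicates a strong positive linear relationship in this sample; it does not by itself show that one variable causes the other.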
Finding Datasets
1. Think about who might
collect the data.
• Could it have been collected by a government
agency?
• A nonprofit or nongovernmental organization?
• A private business or industry group?
• Academic researchers?
2. Look for publications that use
the kind of data you’re looking for
and that cite the dataset.
In other words, is the data you
want mentioned in scholarly
articles or government reports
or some other source?
3. Once you know that what you want
exists, it's time to hunt it down.
• Is it freely available on the web?
• Or part of a package to which the
library already subscribes?
• Is it something we can buy? (And is
it within the library's budget and
can the purchase be made quickly
enough to fit your timeframe?)
• Can it be requested directly from the
researcher?
Credits: adapted and used with permission from the UC San Diego Library, “What is
Data,” August 2018. http://ucsd.libguides.com/data-statistics/whatisdata.

