Part 2:
What are data?
[Hands-on exercise]
What are Data?
Data are one part of scholarly capital, along with
human capital and instrumentation.
Data have become essential scholarly objects to be
captured, mined, used and reused.
Research in all academic fields relies on data.
Research Data
Christine Borgman
Presidential Chair & Professor of Information Studies,
University of California, Los Angeles
Lays out a nice definition of data and how they vary in different disciplines:
The Digital Future is Now: A Call to Action for the Humanities
(please read sections 25-44).
[http://www.digitalhumanities.org/dhq/vol/3/4/000077/000077.html]
Definition of data
Technical definition
Definitions associated with archival information systems offer a
useful starting point:
A reinterpretable representation of
information in a formalized manner suitable
for communication, interpretation, or
processing.
Examples of data include a sequence of bits, a
table of numbers, the characters on a page, the
recording of sounds made by a person speaking,
or a moon rock specimen.
Source: Reference model for an open archival information system 2002, 1-9.
[http://public.ccsds.org/publications/archive/650x0b1s.pdf]
Definition of data
Socio-technical definition
In Buckland’s terms, data are
“alleged evidence”
Source: Buckland, M. K. (1991). “Information as thing.” Journal of the American Society for Information Science, 42 (5): 351-360.
What are data?
Think about data by their origin.
In the context of cyberinfrastructure, the four categories of data identified in an influential
U.S. policy report, Long-lived Data Collections (2005), and incorporated in the National Science
Foundation strategy Cyberinfrastructure Vision for 21st Century Discovery (2007), are now
widely accepted.
1. Observational data include weather measurements and
attitude surveys...
2. Computational data result from executing a computer model
or simulation, whether for physics or cultural virtual reality.
3. Experimental data include results from laboratory studies,
such as measurements of chemical reactions …
4. Records of government, business, and public and private life
yield useful data for scientific, social scientific, and humanistic
research.
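For those who like to see a classification written down, the four categories can be sketched as a small lookup. This is only an illustration: the names DataOrigin, EXAMPLES and group_by_origin are hypothetical and do not come from the reports cited above, and the example items are limited to the ones named on this slide.

```python
from enum import Enum


class DataOrigin(Enum):
    """The four categories of data by origin listed on the slide."""
    OBSERVATIONAL = "observational"
    COMPUTATIONAL = "computational"
    EXPERIMENTAL = "experimental"
    RECORDS = "records"


# Example items taken from the slide text; the mapping itself is illustrative.
EXAMPLES = {
    "weather measurements": DataOrigin.OBSERVATIONAL,
    "attitude surveys": DataOrigin.OBSERVATIONAL,
    "output of a computer model or simulation": DataOrigin.COMPUTATIONAL,
    "measurements of chemical reactions": DataOrigin.EXPERIMENTAL,
    "government records": DataOrigin.RECORDS,
    "business records": DataOrigin.RECORDS,
}


def group_by_origin(examples):
    """Group example names under their category, keeping input order."""
    grouped = {origin: [] for origin in DataOrigin}
    for name, origin in examples.items():
        grouped[origin].append(name)
    return grouped


if __name__ == "__main__":
    for origin, names in group_by_origin(EXAMPLES).items():
        print(f"{origin.value}: {', '.join(names)}")
```

The same item can arguably belong to more than one category in practice, so a single label per item is a simplification made for the sketch.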
Example 1
Audio analyser
Frequency analyser
Intelligent Speech Analyser
MS Excel spreadsheet
Audio clips
Text reports
Certain parts of the content for Example 1 have been removed due to sensitive content
and copyright issues.
Please contact WY for more information.
Video recorders
Voice recorders
Diary
Video clips
Audio clips
Diary entries
Data Variety
To give you a better idea of what can be data, Christine Borgman
later expands on her examples and sources of data and how they
vary by branch of research.
Table: Examples and sources of data from the major research branches. (Borgman)

Scientific data
Examples:
- Ecology: weather, ground water, sensor readings, historical record
- Medicine: X-rays
- Chemistry: protein structures
- Astronomy: spectral surveys
- Biology: specimens
- Physics: events, objects
- Documentation: lab and field notebooks, spreadsheets
Sources:
- Generate own data
- Acquire from collaborators, other scientists
- Data repository

Social scientific data
Examples:
- Opinion polls
- Surveys, interviews
- Mass media
- Laboratory experiments
- Field experiments
- Demographic records
- Census records
- Voting records
- Economic indicators
Sources:
- Generate own data
- Acquire from other scholars
- Data repositories: Social Surveys
- Government records
- Corporate records

Humanities and arts data
Examples:
- Newspapers
- Photographs
- Letters
- Diaries
- Books, articles
- Birth, death, marriage records
- Church records
- Court records
- School and college yearbooks
- Maps…
Sources:
- Libraries, archives, museums
- Public records
- Corporate records, mass media
- Acquire from other scholars
- Data repositories: Beazley, Arts & Humanities Data Service (UK)
Example 2
Example 2 has been removed due to sensitive content.
Please contact WY for more information.
Exercise
Instructions:
1. Form a group based on subject or discipline.
[Those without a subject role can join any group]
2. Hands-on exercise for librarians (please work in groups)
- use OneSearch/ Databases/ DR-NTU/ Google to find an article published by
one of your faculty members or researchers.
- quickly go through the research paper, particularly the methodology section.
3. Librarians in the group ask and answer the following questions.
[see next slide]
4. Post the findings (title of the research article, questions and answers) to the PD blog.
1. Who are they? What research community do they belong to?
What larger discipline is that community a part of?
2. What data are they creating (i.e., data types, formats, etc)?
How are they creating these data?
3. What are the roles of data in their research?
Example sharing
Title: Librarian Class Attendance: Methods, Outcomes and Opportunities
http://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1757&context=iatul
http://www.iatul.org/doclibrary/public/Conf_Proceedings/2006/CmorMarshallpaper.pdf
Example: [Before the interview]
1. Who are they? What research community do they belong to? What larger discipline is that
community a part of?
Dianne Cmor and Victoria Marshall. Library science research community. Information Science.
2. What data are they creating (i.e., data types, formats, etc)? How are they creating these data?
1. Diary entries
2. Qualitative data from Ethnograph and SPSS
3. RefTracker report
4. Interview notes
5. Survey feedback
The data are mostly text and numeric, and fall under social scientific and humanities & arts data.
3. What are the roles of data in their research?
The information collected was converted/ translated into data. The researchers analysed the
data and drew their findings from it. They examined and evaluated the outcomes/
findings and then built convincing evidence to answer the questions they had posed
earlier for their research.
Example: [After the interview]
[For reference only]
Who are they? What research community do they belong to? What larger discipline is that community a part of?
Dianne Cmor is the lead researcher for a research project. Victoria Marshall is another member of the research project. Dianne
and Victoria are both librarians in a university library.
The project is library-related research and the topic of their research is "Librarian class attendance: methods, outcomes and
opportunities". [Library science research community]
The research project belongs to the discipline of Information Science.
What data are they creating (i.e., data types, formats, etc)? How are they creating these data?
The researchers attended a number of seminars called “Journal club” for about 9 weeks. They jotted down all their
observations from the seminars in a diary. The diary entries were typed out in MS Word and eventually converted to
qualitative data using the Ethnograph and SPSS software.
RefTracker was used each week to document time spent and associated outcomes in relation to meetings with students,
students’ attendance, and the creation of course support content.
The researchers conducted a few interviews with students and faculty members to collect information. A paper survey
form was also created to collect feedback from the students and some faculty members. The researchers typed out all the
notes collected from the interviews and survey in MS Word.
The hard copies of the diaries and survey forms were scanned and saved in PDF format.
The data are mostly text and numeric, and fall under social scientific and humanities & arts data.
What are the roles of data in their research?
The information collected through observation at various university lectures and seminars/ tutorials, and through the interviews and
survey conducted with students and faculty members, was translated into data. The team analysed the data and drew their findings
from it. They examined and evaluated the outcomes/ findings and then built convincing evidence to answer
the questions they had posed earlier for their research.
Example: Data curation profile
The data table [For reference only. You don’t have to do this]

Columns: Data Stage; Output; # of Files / Typical Size; Format; Other / Notes

Primary Data

Raw
- Output: Diary, interview notes and survey forms
- # of Files / Typical Size: 25 files/ unknown
- Format: Handwritten hard copy

Processed
- Output: Diary and survey forms
- # of Files / Typical Size: 2 files/ < 3MB
- Format: PDF
- Other / Notes: Scanned copy of the diary (1 file) and survey forms (1 file).

- Output: Original data from the diary, interview notes and survey forms
- # of Files / Typical Size: 3 files/ < 3MB
- Format: .doc [MS Word]
- Other / Notes: All entries in the diary, notes & feedback from the interview and survey were typed out in MS Word.

Analyzed
- Output: Qualitative and quantitative data
- # of Files / Typical Size: 2 files/ < 500KB
- Format: .CHN [Ethnograph], .csv [MS Excel]
- Other / Notes: The researchers used Ethnograph software and SPSS software to generate qualitative data. A report generated from RefTracker.

Finalized
- Output: Report [tables and figures]
- # of Files / Typical Size: <100KB
- Format: .csv [MS Excel]

Note: The data specifically designated by the scientist to make publicly available are indicated by the
rows shaded in gray (the “Analyzed” row is shaded here as an example). Empty cells represent cases in
which information was not collected or the scientist could not provide a response.
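If a group wants to keep this kind of data table in a form that can be sorted or counted, one possible way is to hold each row as a small record. The sketch below is only an illustration in Python: DataOutput, INVENTORY, files_per_stage and shared_outputs are hypothetical names, not part of any data curation profile toolkit. It assumes that the .doc row belongs to the Processed stage, stores unknown or missing values as None, and marks the Analyzed row as shared only because the note uses that row as its example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DataOutput:
    """One row of the data table (field names are illustrative)."""
    stage: str
    output: str
    n_files: Optional[int]        # None when the count was not recorded
    typical_size: Optional[str]   # None when the size is unknown
    formats: Tuple[str, ...]
    shared: bool = False          # True for rows shaded in gray


INVENTORY = [
    DataOutput("Raw", "Diary, interview notes and survey forms",
               25, None, ("handwritten hard copy",)),
    DataOutput("Processed", "Diary and survey forms (scanned)",
               2, "< 3MB", ("PDF",)),
    # Assumption: the typed-out Word files are also a Processed output.
    DataOutput("Processed", "Typed diary entries, interview notes and survey feedback",
               3, "< 3MB", (".doc",)),
    DataOutput("Analyzed", "Qualitative and quantitative data",
               2, "< 500KB", (".CHN", ".csv"), shared=True),
    DataOutput("Finalized", "Report [tables and figures]",
               None, "< 100KB", (".csv",)),
]


def files_per_stage(inventory):
    """Total the recorded file counts per stage, skipping unknown counts."""
    totals = {}
    for item in inventory:
        if item.n_files is not None:
            totals[item.stage] = totals.get(item.stage, 0) + item.n_files
    return totals


def shared_outputs(inventory):
    """List the outputs the researchers are willing to make public."""
    return [item.output for item in inventory if item.shared]


if __name__ == "__main__":
    print(files_per_stage(INVENTORY))
    print(shared_outputs(INVENTORY))
```

Keeping missing values as None rather than guessing them mirrors the note above: an empty cell means the information was not collected.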
