Data for the Humanities
February 21, 2017
Rafia Mirza
Digital Humanities Librarian
rafia@uta.edu @librarianrafia
Peace Ossom Williamson
Director of Research Data Services
peace@uta.edu @123POW
Learning Outcomes
• Understand the use of data in answering humanities research
questions
• Understand descriptive metadata and the rationale for its use
• Recognize areas of potential bias and ambiguous or misleading
representation in reporting
What are data?
“All content in digital formats can
be characterized as structured or
unstructured data.”
Introduction to Digital Humanities: Concepts, Methods, and Tutorials
Examples:
•Audio
•Notes
•Geospatial
•Textual
Data are more than numbers
https://www.lib.umn.edu/datamanagement/whatdata
What is data literacy?
the ability to read,
create, utilize,
communicate, and
criticize data.
Data Literacy
data quality
accessibility, usability,
and
understandability on
the basis of context,
provenance, and
metadata
Data Literacy
data structure
of different objects in
a way that works to
evaluate developing
hypotheses
Data Literacy
recognize
Research potential
be aware of
Research methods
understand
Context and provenience
Humanities Data Literacy
“Humanists have data, and
they need data skills.”
Digital Humanities Data Curation
Data in the Humanities
Types of Humanities Data
• Scholarly editions
• Text corpora
• Text with markup
• Thematic research collections
• Data with accompanying analysis or annotation
• Finding aids and other information maps, such as
bibliographies
Digital Humanities Data Curation Introduction
Big Data Digital Humanities vs.
Small Data Digital Humanities
• “Research in Big Data Digital Humanities focuses on large or dense
cultural datasets, which call for new processing and interpretation
methods”
• “..Small Data Digital Humanities regroup more focused works that do
not use massive data processing..”
• A map for big data research in digital humanities, Frédéric Kaplan
1. research the context:
know the data about the data (so meta!)
How to understand data
Data versus Metadata
Big? Smart? Clean? Messy? Data in the Humanities, Christof Schöch
Metadata Metadata Metadata Metadata
data data data data
data data data data
data data data data
data data data data
About this dataset:
Title: Metadata
Date Created: Metadata
Creator: Metadata
Methods Used: Metadata
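A minimal sketch of the data/metadata split shown above, assuming a dataset is kept as a small Python object. The class name `Dataset`, its field names, and every example value are hypothetical and chosen only to mirror the "About this dataset" labels on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Bundle the data rows with the metadata that describes them."""
    title: str
    date_created: str
    creator: str
    methods_used: str
    rows: list = field(default_factory=list)  # the data itself

    def describe(self) -> str:
        """Build an 'About this dataset' block from the metadata only."""
        return (
            f"Title: {self.title}\n"
            f"Date Created: {self.date_created}\n"
            f"Creator: {self.creator}\n"
            f"Methods Used: {self.methods_used}\n"
            f"Rows of data: {len(self.rows)}"
        )


# Hypothetical values, invented for illustration.
letters = Dataset(
    title="Sample letter transcriptions",
    date_created="2017-02-21",
    creator="Workshop participant",
    methods_used="Manual transcription from scans",
    rows=[["letter_001", "Dear friend ..."], ["letter_002", "My dear sister ..."]],
)
print(letters.describe())
```

The point of the design is that `describe()` never reads the rows' content: everything a reader needs to judge context and provenance lives in the metadata fields.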
2. research who the data is about
How to understand data
What are historical
contexts around their
language and style?
A note on data ethics.
Zine Librarians Code of Ethics
• “Zines are not like mass-distributed books. They are often self-
published and self-distributed, and sometimes printed in very
small runs, intended for a small audience. In addition, perzines
are by definition “personal”, and zinesters may feel different
about having their zines distributed in print than they would
about having them openly available on the internet or print.
This can be especially true in the case of “historical” zines in
library collections — for example, a teen girl writing a zine for
her close friends in 1994 may not want her zine distributed
online or in print 20 years later.”
• Via Zinelibraries
Ethics
• Choosing tools:
• Omeka CMS vs Mukurtu CMS
• Collecting data:
• Boston College Oral Histories
3. investigate the source
How to understand data
Recognizing uncertainty and bias
Data on killings in the Syrian conflict.
https://responsibledata.io/reflection-stories/uncertainty-statistics/
Let’s investigate the source…
Recognizing uncertainty and bias
Sources include
• Syrian government
• Syrian Center for Statistics and Research
• Syrian Network for Human Rights
• Syrian Observatory for Human Rights
and many more.
https://responsibledata.io/reflection-stories/uncertainty-statistics/
Data for the Humanities
there are lots of human decisions that go into
creating these statistics
without knowing how these deaths have
been coded, it’s difficult to trust in the
figures
4. highlight un/common data entries to gain
rough insights
How to understand data
Descriptive analysis
i.e., description of the
data from a sample
Quick descriptive statistics
•frequency
•rank from lowest to highest
•average (mean, median, mode)
•variability
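The four quick statistics above can be sketched with Python's standard library. The two lists are hypothetical (simile counts and genres for seven invented poems); only the `statistics` and `collections` calls are real.

```python
from collections import Counter
import statistics

# Hypothetical data: similes counted in each of seven poems.
simile_counts = [3, 5, 5, 8, 2, 5, 7]
# Hypothetical data: the genre recorded for each of the same poems.
genres = ["elegy", "ode", "ode", "sonnet", "ode", "sonnet", "elegy"]

# Frequency: how often each value occurs.
frequency = Counter(genres)

# Rank: order the values from lowest to highest.
ranked = sorted(simile_counts)

# Average: three measures of central tendency.
mean = statistics.mean(simile_counts)
median = statistics.median(simile_counts)
mode = statistics.mode(simile_counts)

# Variability: how spread out the values are.
value_range = max(simile_counts) - min(simile_counts)
std_dev = statistics.stdev(simile_counts)  # sample standard deviation

print(frequency.most_common())
print(ranked, mean, median, mode, value_range, round(std_dev, 2))
```

Frequency suits categories such as genre, while the averages and variability only make sense for numeric counts.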
Bivariate descriptive statistics
fancy way of saying
we are looking at two
variables at once
            Hamlet  Macbeth  Total
Similes         50        9     59
Metaphors       20       38     58
Total           70       47    117
Evaluating Comparison Methods
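A rough sketch of how the table above can be rebuilt from its cell counts. It uses the Hamlet and Macbeth counts from the slide and recomputes the marginal totals (59, 58, 70, 47, and 117), which match the table's remaining figures. The variable names are illustrative, and the proportion step at the end is an added assumption about what a reader might compare next.

```python
# Cell counts copied from the slide: rows are figures of speech, columns are plays.
counts = {
    "Similes": {"Hamlet": 50, "Macbeth": 9},
    "Metaphors": {"Hamlet": 20, "Macbeth": 38},
}
plays = ["Hamlet", "Macbeth"]

# Marginal totals: sum across each row, then down each column.
row_totals = {figure: sum(by_play.values()) for figure, by_play in counts.items()}
column_totals = {
    play: sum(counts[figure][play] for figure in counts) for play in plays
}
grand_total = sum(row_totals.values())

# Assumed follow-up: the share of each play's counted figures that are similes.
simile_share = {
    play: counts["Similes"][play] / column_totals[play] for play in plays
}

print(row_totals, column_totals, grand_total)
print({play: round(share, 2) for play, share in simile_share.items()})
```

Comparing shares rather than raw counts matters here because the two plays contribute different totals (70 versus 47).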
Correlation
most common way to
describe a relationship
between two measures
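The slide does not name a specific coefficient. As one common choice, here is a minimal sketch of Pearson's r written out by hand so each step is visible. The function name `pearson_r` and the page and footnote figures are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How the two measures vary together, and how each varies on its own.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    if variation_x == 0 or variation_y == 0:
        raise ValueError("correlation is undefined when one measure never varies")
    return covariation / math.sqrt(variation_x * variation_y)


# Hypothetical data: pages and footnotes in five chapters of one book.
pages = [10, 20, 30, 40, 50]
footnotes = [12, 25, 29, 41, 53]
r = pearson_r(pages, footnotes)
print(round(r, 3))
```

A value near 1 or -1 indicates a strong linear relationship and a value near 0 a weak one. In neither case does the coefficient say anything about why the two measures move together.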
Finding Data
What type of data are you looking for?
List of Data Repositories
DH Toychest: Data Collections and Datasets
• Texts: HathiTrust Digital Library
• Spatial or numeric datasets: Data.gov
• Images: British Library Images
• Hybrid data sets: Digital Public Library of America
Via
What if the dataset you need
does not exist?
How to data
1. Determine what to say
2. Find/collect/create the data
you need
3. Wrangle!
4. Clean!
5. Do it many more times.
ID Religion Income Age Q1 Q2 Q3
26371 Jewish <$10K 19 Yes 6 20
26372 Atheist $50-75K 24 - 4 21
26373 Catholic $75-100K 56 Yes 3 21
26374 Withheld $75-100K 33 No 6 21
26375 Pentecostal withheld 49 Yes 8 20
26376 Jewish $40-50K 29 Yes 5 19
26377 Catholic $20-30K 37 No 4 22
http://vita.had.co.nz/papers/tidy-data.pdf
Tidy Data
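A small sketch of one cleaning pass over the survey table above. It assumes that "-", "Withheld", and "withheld" all mean the same thing (no answer recorded). That reading is an assumption and would need checking against the survey's documentation before any real analysis.

```python
# Assumed spellings of "no answer"; compared case-insensitively.
MISSING_MARKERS = {"", "-", "withheld"}

columns = ["ID", "Religion", "Income", "Age", "Q1", "Q2", "Q3"]
raw_rows = [
    ["26371", "Jewish", "<$10K", "19", "Yes", "6", "20"],
    ["26372", "Atheist", "$50-75K", "24", "-", "4", "21"],
    ["26373", "Catholic", "$75-100K", "56", "Yes", "3", "21"],
    ["26374", "Withheld", "$75-100K", "33", "No", "6", "21"],
    ["26375", "Pentecostal", "withheld", "49", "Yes", "8", "20"],
    ["26376", "Jewish", "$40-50K", "29", "Yes", "5", "19"],
    ["26377", "Catholic", "$20-30K", "37", "No", "4", "22"],
]


def clean_value(value):
    """Return None for any spelling of a missing value, else the trimmed text."""
    text = value.strip()
    return None if text.lower() in MISSING_MARKERS else text


# One dictionary per respondent, with missing answers made explicit.
cleaned = [dict(zip(columns, (clean_value(v) for v in row))) for row in raw_rows]

# Count the gaps per column so they can be reported alongside any result.
missing_per_column = {
    column: sum(1 for row in cleaned if row[column] is None) for column in columns
}
print(missing_per_column)
```

Collapsing the three spellings into a single `None` keeps "Withheld" from being counted as if it were a religion or an income bracket.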
Most common problems
• Column headers are values, not variable names.
• Multiple variables are stored in one column.
• Variables are stored in both rows and columns.
• Multiple types of observational units are stored in the same table.
• A single observational unit is stored in multiple tables
http://vita.had.co.nz/papers/tidy-data.pdf
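The first problem in the list (column headers that are values, not variable names) can be sketched as follows. The helper `melt` and the counts are illustrative only. The sketch turns each income-bracket column into its own row so that every row holds one observation.

```python
# Untidy: the income brackets are values, but they appear as column headers.
wide = [
    {"religion": "Agnostic", "<$10K": 27, "$10-20K": 34},
    {"religion": "Atheist", "<$10K": 12, "$10-20K": 27},
]


def melt(rows, id_column, variable_name, value_name):
    """Turn every non-identifier column into its own (variable, value) row."""
    tidy_rows = []
    for row in rows:
        for header, value in row.items():
            if header == id_column:
                continue  # the identifier is repeated on each new row instead
            tidy_rows.append(
                {id_column: row[id_column], variable_name: header, value_name: value}
            )
    return tidy_rows


# Tidy: one row per religion and income bracket, one column per variable.
tidy = melt(wide, id_column="religion", variable_name="income", value_name="count")
for row in tidy:
    print(row)
```

After reshaping, "income" is an ordinary column that can be filtered, grouped, or plotted like any other variable.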
if you torture data long enough,
it will confess to anything
How can a
visualization be
misleading?
What’s wrong?
A little less
dramatic
than you thought.
http://www.visualisingdata.com/2014/04/the-fine-line-between-confusion-and-deception/
https://thesyriacampaign.org/
Open Data: Things to Consider
http://www.slideshare.net/libereurope/humanities-data-literacy-student-perspective-on-digital-cultural-heritage-collections?qid=70bd86f2-10c5-43a6-b053-56d264ca28ab&v=&b=&from_search=1
Recommended Reading / Viewing
“Numbers are Only Human” – Brian Root
“Ethical Principles of Psychologists and Code of Conduct” –
American Psychological Association
“On Not Looking: Ethics and Access in the Digital Humanities” –
Kimberly Christen Withey
Upcoming Workshops and Events
library.uta.edu/scholcomm
Rafia Mirza
rafia@uta.edu @librarianrafia
Peace Ossom Williamson
peace@uta.edu @123POW


Editor's Notes

  • #4: RAFIA
  • #5: Data are anything which is used or created to generate new knowledge and interpretations. “Anything” may be objective or subjective; physical or emotional; persistent or ephemeral; personal or public; explicit or tacit; and is consciously or unconsciously referenced by the researcher at some point during the course of their research. Research data may or may not lead to a research output, which regardless of method of presentation, is a planned public statement of new knowledge or interpretation. (Garrett, 2012)
  • #8: Involves knowledge of quantitative (statistical) methods, metadata standards, and the data curation lifecycle. But also the understanding of
  • #10: Identifying problems that a dataset can answer
  • #11: Recognizing research potential of an existing heritage collection, or identifying ways to answer questions or problems. Develop a hypothesis based on the data. Becoming aware of the data features that enable new quantitative methods (including access, format, systematicity, and metadata) Understanding the context and provenience of a collection (including extension, representativeness, openness, and copyright and privacy issues)
  • #12: “As the materials and analytical practices of research become increasingly digital, the theoretical knowledge and practical skills of information science, librarianship, and archival science will become ever more vital to humanists and to anyone working with cultural heritage.”
  • #13: Heritage collections
  • #16: “Another important distinction is between data and metadata. Here, the term “data” refers to the part of a file or dataset which contains the actual representation of an object of inquiry, while the term “metadata” refers to data about that data: metadata explicitly describes selected aspects of a dataset, such as the time of its creation, or the way it was collected, or what entity external to the dataset it is supposed to represent.” Read papers and study accompanying documentation.
  • #17: Need to know your data. What is the background of the person, time period, or language you are studying? What elements are represented and how were they obtained? What elements are missing or misrepresented? What are other questions you can ask or ways you can find out answers?
  • #18: How does this reflect itself in writing? How does one show their background? How does one signify?
  • #19: The present moment is filled with DH practitioners creating visualizations of ‘big data,’ mapping connections between people and ancient cities, and building archives dedicated to long-dead authors. These worthwhile academic and practical pursuits point us to the center of the digital humanities landscape. But, if we move to the margins and begin to look at the projects and tools that emerge from indigenous communities, archivists and cultural specialists, we see a different pattern: images are purposely removed, archives are not ‘open to the public,’ maps of sacred sites are consciously not created, defined or linked to. How do we integrate these varied practices and philosophies into the possibilities offered by digital humanities scholars? It is one thing to call attention to difference, it is another to alter our display practices, question access parameters, and redefine our own ways of knowing based on systems of accountability that define an ethical field of visually based on not looking. If seeing is believing and a picture is worth a thousand words, what can we learn from the act of not looking, or perhaps, more specifically, not seeing? 
  • #22: PEACE. Talk to subject experts, read papers, and study accompanying documentation.
  • #23: More often than not, it is not the writer that is twisting the numbers but the numbers themselves twisting up the writer. Manipulation of the facts or of the reader is usually not intentional.
  • #24: More often than not, it is not the writer that is twisting the numbers but the numbers themselves twisting up the writer. Manipulation of the facts or of the reader is usually not intentional.
  • #25: Estimations from the Syrian Observatory for Human Rights. Imagine the decision that might have to be made to categorize a typical citizen with no military training, who has picked up a gun shortly before his death. Perhaps the coder might have a bias to continue calling this person a civilian. But this person took up arms against the government, did they not? How would you code a Syrian army defector now fighting with an opposition group? … Without some sort of standard protocol, rigorously followed, the coding of affiliation allows for a degree of subjectivity
  • #27: Talk to subject experts, read papers, and study accompanying documentation
  • #28: What is a numerical way to describe data?
  • #29: Frequency: how often a value occurs (e.g., a name or gender), such as 50 women, 38 men, 2 other, 10 unknown. Rank from lowest to highest: list albums’ use of personal pronouns in order from high to low. Average: a measure of central tendency. Variability: how different or similar scores are to each other (range, standard deviation).
  • #30: Contingency table, a type of summary table. This shows frequency distribution.
  • #31: Contingency table, a type of summary table. This shows frequency distribution.
  • #32: HathiTrust: more than 2 million volumes are in the public domain and freely viewable on the Web. More information about obtaining the texts can be found here. British Library Images: millions of images from the pages of 17th-, 18th-, and 19th-century books, digitized. DPLA: "brings together the riches of America’s libraries, archives, and museums, and makes them freely available to the world"; API enabled.
  • #33: Data literacy is important so that we can • Tell compelling stories that others are more likely to repeat, remember, and act on • Determine when someone else is trying to mislead using visualizations
  • #34: Clean: OpenRefine
  • #35: Identifier; independent information / fixed variables; dependent variables / measured variables
  • #39: This is similar to when you got that mosquito bite and were sure you were getting Zika Virus or West Nile, only to realize your arm itched for 24 hours.
  • #41: Business Insider published the chart but sought permission to reverse it, keeping the same design but turning it upside down for their readers.
  • #42: It’s true though, that images such as the visualisation above draw attention to some important issues. Though they state their data source (the Syria Network for Human Rights) what we’ve explored here so far makes it clear that this data has flaws. We can’t know for sure the extent of those flaws, though, and some might argue that as long as the main message is transmitted, the details don’t matter so much.