R in the Humanities: Text Analysis
Dr Leah Henrickson
Lecturer in Digital Media
School of Media and Communication
University of Leeds
L.R.Henrickson@leeds.ac.uk
twitter.com/leahhenrickson
Who am I?
• A Lecturer in Digital Media
• A book historian
• A digital humanist
• Canadian 🍁
Session 1:
Gettin’ to Grips with R
CC Image: https://en.wikipedia.org/wiki/File:Piratey,_vector_version.svg
Overview
This course is a gentle introduction to R for text analysis. Over the course of two sessions you will be taught the basics of the
powerful programming language before being provided with hands-on experience analysing long-form text in the RStudio
development environment.
By the end of the course, you will be able to:
• Navigate the RStudio development environment
• Prepare long-form prose texts for computational analysis using R
• Conduct basic computational analyses of long-form prose texts
• Construct and explain visualisations of computed results
• Critically apply computational text analysis to complement other analytical methods
To complete this course you will need to install:
• R version 3.6 or higher (download at https://www.r-project.org)
• RStudio Desktop: Open Source Edition 1.2 or higher (download at https://www.rstudio.com/products/rstudio)
Session 1 Agenda
1. What are R and RStudio?
2. What can R help you do?
3. A quick note about Computational Literary Studies
4. Getting started with R
5. Cleaning text
CC Image: https://pixabay.com/photos/dog-laptop-computer-glasses-2983021
What are R and RStudio?
R is:
• a programming language
• a software environment
• a really fancy calculator
• free/open source
Download: https://cran.r-project.org/mirrors.html
RStudio is:
• an integrated development environment (IDE)
• a great way to make your coding experiences easier, more colourful,
and more fun!
Download: https://www.rstudio.com/products/rstudio/download
What can R help you do?
• Count words
• Find linguistic patterns within and across texts
• Compare texts
• Make pretty pictures
But it’s still up to you to explain results.
Also, is R always the most appropriate tool?
CC Image: https://pixabay.com/photos/letters-tiles-word-game-crossword-4938486
A quick note about Computational Literary Studies (CLS)
CLS has a long history (for example, Father Roberto Busa, ~1940s),
but has been criticised for:
• Misinterpretation of statistical data (Da)
• Unchecked enthusiasm for technological ‘hype’ (Kirsch)
• Turning literature into data and neglecting reception of works
(Marche)
Da, Nan Z. “The Computational Case against Computational Literary Studies.” Critical Inquiry, vol. 45, 2019, pp. 601-639.
Kirsch, Adam. “Technology Is Taking Over English Departments.” The New Republic, 2014, https://newrepublic.com/article/117428/limits-digital-humanities-adam-kirsch. Accessed 21 December 2020.
Marche, Stephen. “Literature Is not Data: Against Digital Humanities.” The Los Angeles Review of Books, 2012, https://lareviewofbooks.org/article/literature-is-not-data-against-digital-humanities. Accessed 21 December 2020.
CC Image: https://melissaterras.org/2013/10/15/for-ada-lovelace-day-father-busas-female-punch-card-operatives
Let’s get started!
Double click ‘Terminal’.
Source pane (write your script)
Console (run your script)
Environment (your data)
Everything else!
The Basics (1/2)
Calculating
• 10 + 2 (spaces optional)
• 10 - 2
• 10 * 2
• 10 / 2
Strings and Things
• 1:50
• print("Hello world!!")
• [variable name] <- c(1, 2, 3)
• [variable name][2]
Meme: https://knowyourmeme.com/memes/math-lady-confused-lady
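The commands on this slide can be tried together as one short script. This is a minimal sketch: `one_to_fifty`, `my_numbers`, and `second` are hypothetical names standing in for `[variable name]`.

```r
# Arithmetic: R evaluates each line and prints the result.
10 + 2   # addition
10 - 2   # subtraction (type a plain hyphen, not a dash)
10 * 2   # multiplication
10 / 2   # division

# The colon operator builds a sequence of whole numbers.
one_to_fifty <- 1:50
length(one_to_fifty)

# print() displays a character string.
print("Hello world!!")

# c() combines values into a vector; <- assigns the vector to a name.
my_numbers <- c(1, 2, 3)   # hypothetical variable name

# Square brackets pick out elements by position, counting from 1.
second <- my_numbers[2]
second
```

Quotation marks in R code must be straight (" "), so it is safer to type commands than to paste them from slides or word processors.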
The Basics (2/2)
• Data types: character, numeric, integer, logical, complex
• Data structures: vector, list, matrix, data frame, factors
• Keep notes using #
• Need help?
• ?____________
• help()
• install.packages("[name of package]")
Meme: https://www.reddit.com/r/ProgrammerHumor/comments/8w54mx/code_comments_be_like
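To see the data types and data structures listed above in action, here is a small sketch in base R. Every name and value in it is made up for illustration; none of it comes from the course corpus.

```r
# One value of each basic data type; class() reports the type.
class("suffrage")   # "character"
class(3.14)         # "numeric"
class(42L)          # "integer" (the L suffix marks a whole number)
class(TRUE)         # "logical"
class(2+3i)         # "complex"

# Common data structures (all names and values are illustrative).
word_vector  <- c("vote", "women", "senate")                 # vector
count_matrix <- matrix(1:6, nrow = 2)                        # matrix
mixed_list   <- list(title = "A Title", pages = 300, read = TRUE)  # list
country      <- factor(c("US", "UK", "US"))                  # factor
word_table   <- data.frame(word = word_vector, freq = c(10, 25, 7))  # data frame

dim(count_matrix)    # number of rows, then number of columns
levels(country)      # the distinct categories in a factor
word_table$freq[2]   # column `freq`, second row

# Lines starting with # are notes to yourself; R ignores them.
```

A vector holds values of one type, while a list can mix types; a data frame is a table whose columns are vectors of equal length.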
Tools > Global Options > Appearance
(You will need to restart RStudio to apply these changes.)
Let’s clean some text!
CC Image: https://thenounproject.com/term/cleaning/199037
You can use whatever corpus you’d like for this course.
However, I have prepared a corpus of six texts for you. You may download the corpus at http://tinyurl.com/n8texts.
This corpus includes six public domain texts (1870-1914) about the women’s suffrage movement in the United States and the
United Kingdom:
• debate: Debate on Woman Suffrage in the Senate of the United States (https://www.gutenberg.org/ebooks/11114)
• femalesuffrage: Female Suffrage: A Letter to the Christian Women of America, Susan Fenimore Cooper (https://www.gutenberg.org/ebooks/2157)
• myownstory: My Own Story, Emmeline Pankhurst (https://www.gutenberg.org/ebooks/34856)
• republic: Woman and the Republic, Helen Kendrick Johnson (https://www.gutenberg.org/ebooks/7300)
• unexpurgated: The Unexpurgated Case Against Woman Suffrage, Almroth Wright (https://www.gutenberg.org/ebooks/5183)
First, set your working directory: Session > Set Working Directory > Choose Directory > [folder]
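install.packages("tm")   # install once per computer
library(tm)              # load at the start of every session
getwd()                  # check which folder R is working in
texts <- Corpus(DirSource("[path to working directory]"))   # read every file in the folder
writeLines(as.character(texts[[4]]))   # print the fourth text as it stands
?tm_map
getTransformations()     # list the built-in cleaning functions
texts1 <- tm_map(texts, removePunctuation)
texts2 <- tm_map(texts1, removeNumbers)
texts3 <- tm_map(texts2, content_transformer(tolower))
texts4 <- tm_map(texts3, removeWords, stopwords("english"))
texts_final <- tm_map(texts4, stripWhitespace)
writeLines(as.character(texts_final[[4]]))   # print the fourth text after cleaning
dtm <- DocumentTermMatrix(texts_final)       # one row per text, one column per word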
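To see roughly what each cleaning step does, the sketch below imitates the five steps on a single sentence using only base R. It is not how tm works internally: the sentence and the three-word stop list are invented, and the real `stopwords("english")` list is much longer.

```r
# Illustrative sentence (not taken from the course corpus).
sample_text <- "In 1913, the Women's   Social and Political Union marched!"

# 1. Remove punctuation (compare removePunctuation).
step1 <- gsub("[[:punct:]]", "", sample_text)

# 2. Remove digits (compare removeNumbers).
step2 <- gsub("[[:digit:]]", "", step1)

# 3. Convert to lower case (compare tolower).
step3 <- tolower(step2)

# 4. Remove stop words as whole words (compare removeWords).
my_stopwords <- c("in", "the", "and")   # made-up short list
pattern <- paste0("\\b(", paste(my_stopwords, collapse = "|"), ")\\b")
step4 <- gsub(pattern, "", step3, perl = TRUE)

# 5. Collapse runs of spaces (compare stripWhitespace), then trim the ends.
cleaned <- trimws(gsub("\\s+", " ", step4))
cleaned
```

Note that removing punctuation turns "Women's" into "womens". Choices like this shape every later count, so it is worth printing a text before and after cleaning, as the slide does with `writeLines()`.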
Help me! (1/3)
R Communities
#rstats (Twitter): https://twitter.com/hashtag/rstats
Forwards: https://forwards.github.io
R-Bloggers: https://www.r-bloggers.com
R-Ladies: https://rladies.org
r/rstats: https://www.reddit.com/r/rstats
RStudio Community: https://community.rstudio.com
Stack Overflow: https://stackoverflow.com/questions/tagged/r
Help me! (2/3)
R Resources
Matthew Jockers, Text Analysis with R for Students of Literature (New York: Springer, 2014)
https://www.matthewjockers.net/text-analysis-with-r-for-students-of-literature/
LinkedIn Learning: R: https://www.linkedin.com/learning/topics/r
Emmanuel Paradis, R for Beginners (2005): https://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf
Emma Rand, ‘Reproducible Analyses in R’, N8 CIR (2020): https://n8cir.org.uk/events/event-resource/analyses-r
W. N. Venables, D. M. Smith, and the R Core Team, An Introduction to R (2021): https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf
Help me! (3/3)
R Packages for Text Analysis
corpustools (tokenised text analysis): https://cran.r-project.org/web/packages/corpustools
gutenbergr (searching/downloading Project Gutenberg): https://cran.r-project.org/web/packages/gutenbergr
quanteda (quantitative text analysis): https://cran.r-project.org/web/packages/quanteda/index.html
stylo (stylometry): https://cran.r-project.org/web/packages/stylo
syuzhet (sentiment analysis): https://cran.r-project.org/web/packages/syuzhet/index.html
tidytext (a bit of everything!): https://cran.r-project.org/web/packages/tidytext
tm (text mining – what we’ve done here): https://cran.r-project.org/web/packages/tm/index.html
If you’re interested in stylometry specifically…
The Digital Humanities Summer Institute is offering its annual ‘Stylometry with R’
workshop FREE and ASYNCHRONOUSLY this year (14-18 June 2021)!
Details and registration at https://dhsi.org/dhsi-2021-online-edition/dhsi-2021-online-edition-workshops.
Session 2:
Charts, Clouds, and Confidence
Image: https://pixabay.com/illustrations/rainbow-cloud-sunset-colorful-sky-5389074/
Session 2 Agenda
1. Any questions from last week?
2. Review of last week’s session (i.e. cleaning text)
3. Counting words
4. Plotting results
5. Making word clouds
6. Wrapping up
CC Images: https://thenounproject.com/term/graph/21394; https://thenounproject.com/term/word-cloud/195993
First, set your working directory: Session > Set Working Directory > Choose Directory > [folder]
install.packages("tm")   # skip if tm is already installed
library(tm)
getwd()
texts <- Corpus(DirSource("[path to working directory]"))
writeLines(as.character(texts[[4]]))
?tm_map
getTransformations()
texts1 <- tm_map(texts, removePunctuation)
texts2 <- tm_map(texts1, removeNumbers)
texts3 <- tm_map(texts2, content_transformer(tolower))
texts4 <- tm_map(texts3, removeWords, stopwords("english"))
texts_final <- tm_map(texts4, stripWhitespace)
writeLines(as.character(texts_final[[4]]))
dtm <- DocumentTermMatrix(texts_final)
Getting word frequencies and associations:
freq <- colSums(as.matrix(dtm))
freq[1:10]
freq_d <- sort(freq, decreasing=TRUE)
freq_d[1:10]
findFreqTerms(dtm, lowfreq=100)
findAssocs(dtm, "women", 0.95)
?findAssocs
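The logic behind these commands can be followed on a tiny hand-made matrix without loading tm. This is a sketch with invented counts. The last line assumes, as I read the tm documentation, that `findAssocs()` reports correlations between term columns; check `?findAssocs` for the details.

```r
# A toy document-term matrix: rows are documents, columns are terms.
# All counts are invented for illustration.
toy_dtm <- matrix(
  c(4, 2, 0,
    8, 4, 1,
    2, 1, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"), c("women", "vote", "senate"))
)

# Summing each column gives a term's total count across all documents.
toy_freq <- colSums(toy_dtm)

# Sorting in decreasing order puts the most frequent terms first.
toy_freq_d <- sort(toy_freq, decreasing = TRUE)
toy_freq_d

# Correlation between two term columns: values near 1 mean the two terms
# tend to rise and fall together from document to document.
cor(toy_dtm[, "women"], toy_dtm[, "vote"])
```

With only a handful of documents, correlations are easily extreme, so a high value from `findAssocs()` on a small corpus is a prompt for closer reading rather than a finding in itself.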
Making a bar chart (and then making it look nice):
barplot(freq_d[1:10])
?barplot
install.packages("RColorBrewer")
library(RColorBrewer)
?RColorBrewer
display.brewer.all()
cols <- brewer.pal(8, "Spectral")
barplot(freq_d[1:10], col=cols, main="My Cool Plot", xlab="Word", ylab="Instances")
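If you want to keep a chart for an essay or slide, one option is to draw it into a file instead of the Plots pane. A minimal sketch in base R: the counts, colours, and file location are placeholders, and `toy_counts` stands in for `freq_d[1:10]`.

```r
# Invented counts standing in for freq_d[1:10].
toy_counts <- c(women = 14, vote = 7, senate = 6)

# Open a PDF file as the drawing surface (here, a temporary file).
# Width and height are in inches.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 8, height = 6)

# las = 2 turns the axis labels perpendicular to the axes,
# which helps when the words are long.
barplot(toy_counts, col = c("steelblue", "orange", "grey50"),
        main = "My Cool Plot", xlab = "Word", ylab = "Instances", las = 2)

# Close the file so that the chart is written to disk.
dev.off()
out_file
```

Replace `tempfile(fileext = ".pdf")` with a file name of your own, such as "myplot.pdf", to save the chart in your working directory.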
Making a word cloud (and then making it look nice):
install.packages("wordcloud")
library(wordcloud)
m <- as.matrix(dtm)
words <- sort(colSums(m), decreasing=TRUE)
df <- data.frame(word=names(words), freq=words)
?data.frame
wordcloud(words=df$word, freq=df$freq, max.words=100, random.order=FALSE, colors=cols)
?wordcloud
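The data frame step above pairs each word with its count. The sketch below shows the same idea on invented counts, then drops rare words before plotting; the names and the threshold of 2 are arbitrary choices for illustration.

```r
# Invented, already-sorted word counts standing in for `words`.
toy_words <- c(women = 14, vote = 7, senate = 6, hereby = 1)

# names() supplies the words; the vector itself supplies the counts.
toy_df <- data.frame(word = names(toy_words), freq = toy_words)

# Keep only the rows for words that occur at least twice.
toy_common <- toy_df[toy_df$freq >= 2, ]
toy_common$word
nrow(toy_common)
```

`wordcloud()` also has a `min.freq` argument that applies a similar cut-off at plotting time.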
Thank you!
