Recap
Getting data in R
Do it yourself!
Plotting using ggplot2
Examining data and importing data in R
Richard L. Zijdeman
May 29, 2015
Richard L. Zijdeman Examining data and importing data in R
1 Recap
2 Getting data in R
3 Do it yourself!
4 Plotting using ggplot2
Recap
The structure of objects
Objects: store just about anything in R (numbers, sentences, datasets)
Study the structure of an object with str(), which shows:
the type of object
the features of the object
ships <- data.frame(year = c(1850, 1860, 1870, 1880),
inbound = c(215, 237, 237, NA),
outbound = c(212, 239, 260, 265))
Study the structure of object “ships”
str(ships)
## 'data.frame': 4 obs. of 3 variables:
## $ year : num 1850 1860 1870 1880
## $ inbound : num 215 237 237 NA
## $ outbound: num 212 239 260 265
Characteristics of objects
Class: class()
Length: length() (for a data.frame: the number of columns)
Dimensions: dim()
class(ships)
## [1] "data.frame"
length(ships)
## [1] 3
dim(ships) # rows, columns
## [1] 4 3
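Because length() of a data.frame counts columns rather than rows, it can help to compare it with nrow() and ncol(). A minimal sketch in base R, recreating the ships data.frame from the slides above:

```r
# Recreate the example data.frame from the slides
ships <- data.frame(year = c(1850, 1860, 1870, 1880),
                    inbound = c(215, 237, 237, NA),
                    outbound = c(212, 239, 260, 265))

nrow(ships)    # number of rows (observations): 4
ncol(ships)    # number of columns (variables): 3
length(ships)  # for a data.frame, the same as ncol(): 3
dim(ships)     # rows and columns at once: 4 3

# length() of a single column (a vector) counts its elements, i.e. the rows
length(ships$year)  # 4
```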
Closer inspection of data.frames
names of columns (variables): names()
top/bottom rows: head(), tail()
missing data: is.na()
names(ships)
## [1] "year" "inbound" "outbound"
is.na(ships)
## year inbound outbound
## [1,] FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE
## [3,] FALSE FALSE FALSE
## [4,] FALSE TRUE FALSE
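The slide lists head() and tail() without showing them. A short base R sketch of both, plus one way to count missing values per column (colSums() treats TRUE as 1 and FALSE as 0):

```r
ships <- data.frame(year = c(1850, 1860, 1870, 1880),
                    inbound = c(215, 237, 237, NA),
                    outbound = c(212, 239, 260, 265))

head(ships, n = 2)  # first two rows (1850 and 1860)
tail(ships, n = 1)  # last row (1880)

# Number of missing values in each column
colSums(is.na(ships))  # year: 0, inbound: 1, outbound: 0
```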
Summarizing data in data.frames
descriptive statistics: summary()
calculations: e.g. min(), mean(), sum()
frequency tables: table()
summary(ships)
## year inbound outbound
## Min. :1850 Min. :215.0 Min. :212.0
## 1st Qu.:1858 1st Qu.:226.0 1st Qu.:232.2
## Median :1865 Median :237.0 Median :249.5
## Mean :1865 Mean :229.7 Mean :244.0
## 3rd Qu.:1872 3rd Qu.:237.0 3rd Qu.:261.2
## Max. :1880 Max. :237.0 Max. :265.0
## NA's :1
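Unlike summary(), functions such as min() and mean() return NA as soon as a single value is missing. A minimal sketch of the base R argument na.rm, which drops missing values before calculating:

```r
ships <- data.frame(year = c(1850, 1860, 1870, 1880),
                    inbound = c(215, 237, 237, NA),
                    outbound = c(212, 239, 260, 265))

min(ships$inbound)                 # NA, because one value is missing
min(ships$inbound, na.rm = TRUE)   # 215
mean(ships$inbound, na.rm = TRUE)  # 229.6667, the mean of 215, 237 and 237
sum(ships$outbound)                # 976; no missing values, so na.rm is not needed
```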
is.na(ships)
## year inbound outbound
## [1,] FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE
## [3,] FALSE FALSE FALSE
## [4,] FALSE TRUE FALSE
table(is.na(ships))
##
## FALSE TRUE
## 11 1
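table(is.na(ships)) counts missing cells across the whole data.frame but does not say which rows they are in. A sketch using base R's complete.cases(), which flags rows without any missing value:

```r
ships <- data.frame(year = c(1850, 1860, 1870, 1880),
                    inbound = c(215, 237, 237, NA),
                    outbound = c(212, 239, 260, 265))

sum(is.na(ships))       # total number of missing cells: 1
complete.cases(ships)   # TRUE TRUE TRUE FALSE: only the 1880 row has a gap

# Keep only the rows without missing values (na.omit(ships) does the same)
ships_complete <- ships[complete.cases(ships), ]
ships_complete
```

Whether dropping incomplete rows is appropriate depends on the research question; for historical sources it is often worth checking first why values are missing.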
Visualizing your data
Not just for analyses, but also to check data quality:
representativeness
missing data
plot(ships)
[Figure: scatterplot matrix of year, inbound and outbound]
Getting data in R
Data already in R
The “datasets” package
small datasets
specific example data
To obtain a list of the datasets, type:
library(help = "datasets")
To obtain information on a specific dataset, type:
help(swiss) # thus: help(name_of_dataset)
or to just see the data:
swiss
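The functions from the Recap work on these example datasets too. A short sketch applying them to swiss, which is part of the datasets package and therefore available without a library() call:

```r
# swiss: fertility and socio-economic indicators for Swiss provinces (datasets package)
dim(swiss)                 # 47 rows (provinces), 6 columns (variables)
names(swiss)               # variable names
head(swiss)                # first six rows
summary(swiss$Education)   # descriptive statistics for one variable
```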
Reading in data
Different functions for different file types:
Base R: read.table() (and its wrapper read.csv())
foreign package: read.spss(), read.dta(), read.dbf()
openxlsx package: read.xlsx()
alternative packages for Excel files:
xlsx (Java required)
gdata (Perl required)
read.xlsx() from the openxlsx package
xlsxFile: your file, including the directory
sheet: name (or number) of the sheet
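A self-contained sketch of read.xlsx(), assuming the openxlsx package is installed. Because no Excel file ships with this example, it first writes the ships data to a temporary file with openxlsx's write.xlsx(); in practice you would point read.xlsx() at an existing file such as HSN_basic.xlsx instead.

```r
# Assumes openxlsx is installed: install.packages("openxlsx")
library(openxlsx)

ships <- data.frame(year = c(1850, 1860, 1870, 1880),
                    inbound = c(215, 237, 237, NA),
                    outbound = c(212, 239, 260, 265))

# Temporary file, only so that this example has something to read
xlsx_path <- tempfile(fileext = ".xlsx")
write.xlsx(ships, xlsx_path)

# Read it back; sheet can be given by name or by position
ships_xlsx <- read.xlsx(xlsx_path, sheet = 1)
str(ships_xlsx)
```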
read.csv()
file: your file, including the directory
header: does the first row contain the variable names? (TRUE/FALSE)
sep: separator
read.csv default: “,”
read.csv2 default: “;”
skip: number of rows to skip at the top of the file
nrows: maximum number of rows to read
stringsAsFactors: convert text columns to factors? (TRUE/FALSE)
encoding (e.g. “latin1” or “UTF-8”)
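To see how these arguments interact, here is a small sketch. It writes a hypothetical CSV file (a comment line followed by the ships data) to a temporary location, so it runs without any external file, and then reads part of it back with base R's read.csv():

```r
# Hypothetical example file: one comment line, a header row and four data rows
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Harbour records (hypothetical example file)",
             "year,inbound,outbound",
             "1850,215,212",
             "1860,237,239",
             "1870,237,260",
             "1880,,265"),
           csv_path)

ships_csv <- read.csv(csv_path,
                      header = TRUE,            # first row read holds the variable names
                      sep = ",",                # read.csv's default separator
                      skip = 1,                 # skip the comment line at the top
                      nrows = 3,                # read only the first 3 data rows
                      stringsAsFactors = FALSE,
                      encoding = "UTF-8")
ships_csv  # years 1850, 1860 and 1870
```

nrows counts data rows, not the header row, which is why the exercise below can read "100 lines" of a large file this way.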
Do it yourself!
Read in the following files as data.frames:
HSN_basic.xlsx
check the data.frame using dim() and length()
check the variables using summary(), min() and table()
Repeat for HSN_marriages.csv:
read in only the first 100 rows
Plotting using ggplot2
ggplot2
Package by Hadley Wickham
A general plotting system for a wide range of plots
ggplot2 website: http://ggplot2.org
excellent tutorial:
https://jofrhwld.github.io/avml2012/#Section_1.1
Building your graph
Each plot consists of multiple layers
Think of a canvas on which you ‘paint’
data layer
geometries layer
statistics layer
Data layer
data.frame and aesthetics
ggplot(data.frame, aes(x= ..., y = ...))
geometries layer
ggplot(..., aes(x= ..., y = ...)) +
geom_...() # e.g. geom_line
statistics layer
ggplot(..., aes(x= ..., y = ...)) +
geom_...() +
stat_...() # e.g. stat_smooth
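The templates above use "..." as placeholders. A concrete sketch that fills in all three layers with the ships data from the Recap; choosing outbound against year, and a linear trend via stat_smooth(method = "lm"), is purely illustrative:

```r
library(ggplot2)

ships <- data.frame(year = c(1850, 1860, 1870, 1880),
                    inbound = c(215, 237, 237, NA),
                    outbound = c(212, 239, 260, 265))

p <- ggplot(ships, aes(x = year, y = outbound)) +  # data layer: data.frame and aesthetics
  geom_line() +                                    # geometries layer: connect the observations
  geom_point() +                                   # a second geometry drawn on top
  stat_smooth(method = "lm", se = FALSE)           # statistics layer: linear trend line
p  # printing the object draws the plot
```

Storing the plot in an object (here p) makes it easy to add layers one at a time and inspect the result after each step.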
An example
Reading in the data
hmar <- read.csv("./../data/derived/HSN_marriages.csv",
stringsAsFactors = FALSE,
encoding = "latin1",
header = TRUE,
nrows = 100)
Plotting the data
install.packages("ggplot2") # only needed once; note the quotes
library(ggplot2)
ggplot(hmar, aes(x= M_year, y = Age_bride)) +
geom_point()
[Figure: scatterplot of Age_bride (y axis) against M_year (x axis)]
Improving the plot
Specify characteristics of the geom layer
ggplot(hmar, aes(x= M_year, y = Age_bride)) +
geom_point(colour = "blue", size = 3, shape = 18)
See http://www.cookbook-r.com/Graphs/Shapes_and_line_types/ for the available point shapes and line types
Specify characteristics of the geom layer
[Figure: the same scatterplot of Age_bride against M_year, drawn with larger blue points (shape 18)]
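A common pitfall: characteristics such as colour behave differently inside and outside aes(). A sketch contrasting the two, using the ships data from the Recap since the HSN file is not bundled here:

```r
library(ggplot2)

ships <- data.frame(year = c(1850, 1860, 1870, 1880),
                    inbound = c(215, 237, 237, NA),
                    outbound = c(212, 239, 260, 265))

# Setting: a constant outside aes() applies to every point; no legend
p_set <- ggplot(ships, aes(x = year, y = outbound)) +
  geom_point(colour = "blue", size = 3, shape = 18)

# Mapping: inside aes(), "blue" is treated as a data value, not a colour,
# so ggplot2 uses a default palette colour and adds a legend labelled "blue"
p_map <- ggplot(ships, aes(x = year, y = outbound)) +
  geom_point(aes(colour = "blue"), size = 3, shape = 18)
```

Rule of thumb: put a characteristic inside aes() only when it should vary with a variable in the data, as with factor(SIgn_bride) below.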
A PTE example
Does age at marriage depend on educational attainment?
To marry, you need resources
the higher the attainment, the longer it takes to acquire resources
ergo: brides with higher educational attainment marry later in life
Not a statistical test, but let’s graph this
A request from yesterday
Can I plot labels?
ggplot(hmar, aes(x= M_year, y = Age_bride,
label = SIgn_bride)) +
geom_text()
Yes you can!
Not really useful, though...
[Figure: Age_bride against M_year, each observation drawn as its SIgn_bride label (“a” or “h”)]
Let’s try with colours...
ggplot(hmar, aes(x= M_year, y = Age_bride)) +
geom_point(aes(colour = factor(SIgn_bride)),
size = 3, shape = 18)
[Figure: Age_bride against M_year, points coloured by factor(SIgn_bride) (a, h)]
No real pattern, though...
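Mapping colour to a factor works for any categorical variable. The ships data has none, so this sketch first reshapes it into a "long" data.frame in which a new variable says whether a count is inbound or outbound (the names ships_long, direction and n_ships are hypothetical, chosen for illustration):

```r
library(ggplot2)

ships <- data.frame(year = c(1850, 1860, 1870, 1880),
                    inbound = c(215, 237, 237, NA),
                    outbound = c(212, 239, 260, 265))

# Stack inbound and outbound into one column, with a label for each value
ships_long <- data.frame(
  year      = rep(ships$year, times = 2),
  direction = rep(c("inbound", "outbound"), each = nrow(ships)),
  n_ships   = c(ships$inbound, ships$outbound)
)

p <- ggplot(ships_long, aes(x = year, y = n_ships)) +
  geom_point(aes(colour = factor(direction)), size = 3, shape = 18)
```

When the plot is drawn, ggplot2 drops the row with the missing inbound value for 1880 and prints a warning about it.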
Finalizing the graph
ggplot(hmar, aes(x= M_year, y = Age_bride)) +
geom_point(aes(colour = factor(SIgn_bride)),
size = 3,
shape = 18) +
labs(title = "Age of marriage over time",
x = "time (years since A.D.)",
y = "age of bride (years)",
colour = "Signature")
# the legend title is set via 'colour' because the legend shows the colour mapping
[Figure: “Age of marriage over time”; x axis: time (years since A.D.); y axis: age of bride (years); legend: Signature (a, h)]
Satisfied?
Actually, no: many points are plotted on top of each other...
Solution: geom_jitter
ggplot(hmar, aes(x= M_year, y = Age_bride)) +
geom_jitter(aes(colour = factor(SIgn_bride)),
size = 3,
shape = 18) +
labs(title = "Age of marriage over time",
x = "time (years since A.D.)",
y = "age of bride (years)",
colour = "Signature")
# the legend title is set via 'colour' because the legend shows the colour mapping
[Figure: the same plot, “Age of marriage over time”, with jittered points so that overlapping observations become visible]
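By default geom_jitter() shifts points both horizontally and vertically, so the plotted ages are no longer the recorded ones. A sketch that jitters only along the time axis via position_jitter(), using a small hypothetical data.frame (brides) with identical observations; seed makes the random shift reproducible:

```r
library(ggplot2)

# Hypothetical data: several brides married in the same year at the same age
brides <- data.frame(M_year    = c(1850, 1850, 1850, 1860, 1860),
                     Age_bride = c(25, 25, 25, 30, 30))

p <- ggplot(brides, aes(x = M_year, y = Age_bride)) +
  geom_point(position = position_jitter(width = 0.4,   # spread points within +/- 0.4 years
                                        height = 0,    # keep ages exactly as recorded
                                        seed = 1),     # same jitter every time
             size = 3, shape = 18)
```

geom_jitter(width = 0.4, height = 0) is a shorter way to write the same jittering; the explicit position_jitter() form is needed only for the seed.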
Final remarks on ggplot2
We have just scratched the surface of ggplot2
Build your graph slowly
start with the basics
add complexity step-wise
Now it’s your turn!
A small PTE project
Look at the variables in the HSN files
Think of a research question
Provide a general mechanism and hypothesis
Plot your results

Introduction into R for historians (part 3: examine and import data)
