Advanced Data Analytics:
  Basic Graphics in R

         Jeffrey Stanton
  School of Information Studies
      Syracuse University
Movie Data Set from JSE
• From McLaren and DePaolo’s article in the Journal of
  Statistics Education
• Daily per theater box office receipts in dollars for 49 movies
• A variable number of entries for each movie depending
  upon how long it ran
• About 2500 observations altogether
• DAILY_PER_THEATER: Amount in Dollars
• DATE: mm/dd/yyyy of when the observation was made
• DAY_NUM: Which day in the run, number from 1 up
• MOVIE: The title of the movie
• NUMBER: Index number of the movie
Movie Dataset from JSE
http://www.amstat.org/publications/jse/datasets/moviedaily.dat
http://www.amstat.org/publications/jse/datasets/moviedaily.txt

> moviedaily <-
   read.delim("Z:/DataScience/AdvancedAnalytics/moviedaily.dat")
> View(moviedaily) # Display data in R-Studio in a separate pane (note the capital V)
> attach(moviedaily) # Make the new data the active data frame
> class(moviedaily) # Make sure the dataset is a dataframe
[1] "data.frame"
> ls(moviedaily) # Show the variable names in the dataframe
[1] "DAILY_PER_THEATER" "DATE"              "DAY_NUM"
[4] "MOVIE"             "NUMBER"
> hist(DAY_NUM)
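
Before plotting, it can help to confirm which type each column received during import. The sketch below is only an illustration: it reads a tiny inline table (hypothetical rows, not the real moviedaily.dat) and inspects the column classes. The name `txt` and the sample values are assumptions made for this example.

```r
# Hypothetical three-column sample in tab-delimited form
txt <- "MOVIE\tDAY_NUM\tDAILY_PER_THEATER\nA\t1\t509\nA\t2\tNo daily data\n"

# read.delim() accepts inline text through the 'text' argument
d <- read.delim(text = txt)

# str() lists each column with its type; sapply(d, class) gives just the classes
str(d)
classes <- sapply(d, class)
classes

# Columns can be reached without attach() by using the $ operator
d$DAY_NUM
```

A column that contains any non-numeric entry, such as "No daily data", is not imported as numeric, which is why DAILY_PER_THEATER needs conversion before hist() will accept it.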




Histogram of DAY_NUM
[Figure: histogram of DAY_NUM. x-axis: DAY_NUM, 0 to 200; y-axis: Frequency, 0 to 600]
About Histograms
• A basic type of diagnostic display that shows how frequently each
  value occurs in the data set
• In R, works on numeric data only; getting counts on other
  modes of data requires another approach
• Works fine with continuous data (e.g., 3.1, 3.2, 3.25, etc.)
  because it can cluster together nearby values and count them
  in a single frequency category (representing a range)
• Try hist(NUMBER) and hist(DAILY_PER_THEATER)
• Even though these look like numeric variables, in the data
  importing process, R has made them into “factors” – factors
  are stored as integers with “category labels” and are used in
  various procedures to divide the data into groups
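
The points above can be tried on a small scale. The following sketch uses made-up receipts (not values from the movie data) to show hist() binning numeric data, refusing a factor, and table() as the other approach for counting non-numeric values. In R 4.0.0 and later, read.delim() no longer converts strings to factors by default, so such columns arrive as character vectors; hist() rejects those in the same way.

```r
# Hypothetical daily receipts in dollars
receipts <- c(120, 250, 310, 95, 4800, 560, 230, 180)

# plot = FALSE returns the bin counts without drawing the chart
h <- hist(receipts, breaks = seq(0, 5000, by = 1000), plot = FALSE)
h$counts

# A factor is not numeric, so hist() stops with an error;
# tryCatch() captures the message instead of halting the script
f <- factor(c("121", "16", "No daily data", "16"))
msg <- tryCatch(hist(f), error = function(e) conditionMessage(e))
msg

# table() counts how often each label occurs
counts <- table(f)
counts
```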
Convert a factor into numbers
•   Recall that a factor is stored as integers with character labels: It is the labels that
    we want to convert into numbers (we can’t control how R assigned the integers,
    so we don’t know exactly what they contain, only that they are unique)
•   Try as.character(DAILY_PER_THEATER) – See how we get lots of
    numbers in quotes, plus some occasional other stuff that is not numbers
•   Then try: as.numeric(as.character(DAILY_PER_THEATER))
•    Note the warning messages: “Warning message: NAs introduced by coercion” –
     This is exactly what we want: “NA” is R’s way of coding missing data; all of
     the unusable string values (like: "No daily data") have been turned into NAs
     because they are missing values
>   detach(moviedaily)
>   moviedaily$dailyper <-
     as.numeric(as.character(moviedaily$DAILY_PER_THEATER))
#   Adds a new numeric variable converted from the factor
#   (the column is referenced through moviedaily$ because the data frame is detached)
>   attach(moviedaily)
>   class(dailyper)
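
To see why the conversion goes through as.character(), the sketch below compares the two routes on a small factor. The values are hypothetical stand-ins for DAILY_PER_THEATER entries.

```r
# Hypothetical values standing in for DAILY_PER_THEATER
f <- factor(c("509", "16", "No daily data", "121"))

# as.numeric() on the factor returns the internal integer codes,
# which reflect the ordering of the labels, not the dollar amounts
codes <- as.numeric(f)

# Converting through character recovers the labels as numbers;
# the non-numeric label becomes NA (the coercion warning is
# suppressed here only to keep the example quiet)
values <- suppressWarnings(as.numeric(as.character(f)))

codes
values

# NA entries must be skipped explicitly when summarizing
total <- sum(values, na.rm = TRUE)
total
```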
On most days, movies make a few hundred dollars
[Figure: histogram of dailyper. x-axis: dailyper, 0 to 20000; y-axis: Frequency, 0 to 2000]
Which Movie Made the Most $$$
• First, we need to aggregate the data, by summing the daily
  takes for each movie:
aggdata <- aggregate(dailyper,by=list(MOVIE),FUN=sum, na.rm=TRUE)
# Aggregates by MOVIE, which is a factor with the movie names
# Uses the sum function on the variable dailyper

• Next, let’s organize the data in descending order:
sortdata<-aggdata[order(-aggdata$x),]
# The minus sign means decreasing order
• Remove the items that had no data (the sums ended up as
  zero):
sortdata<-sortdata[sortdata$x>1,]
# Takes the subset of rows where the agg $ value > 1
• Finally, create a barplot showing the totals for each movie:
barplot(sortdata$x,names.arg=as.character(sortdata$Group.1),las=2)
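
The same aggregate, sort, and filter sequence can be sketched on a toy data frame without attach(). This is an illustrative variant, not the slides' exact code: it uses the formula interface of aggregate(), and the data frame `toy` with its titles and amounts is invented for the example.

```r
# Toy data frame (hypothetical titles and amounts)
toy <- data.frame(
  MOVIE    = c("A", "A", "B", "B", "C", "C"),
  dailyper = c(100, 200, 50, NA, NA, NA)
)

# Formula interface; na.action = na.pass keeps the NA rows so that
# sum(..., na.rm = TRUE) handles them, giving 0 for movie C
agg <- aggregate(dailyper ~ MOVIE, data = toy, FUN = sum,
                 na.rm = TRUE, na.action = na.pass)

# Sort in decreasing order, then drop movies with no usable data
agg <- agg[order(agg$dailyper, decreasing = TRUE), ]
agg <- agg[agg$dailyper > 0, ]
agg

# Draw the chart (commented out so the sketch runs without a graphics device)
# barplot(agg$dailyper, names.arg = agg$MOVIE, las = 2)
```

With the formula interface the result columns keep the names MOVIE and dailyper, rather than Group.1 and x.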

Barplot of Movie Total Daily Take




Let’s Do the Same Thing With Rcmdr
• The input data file has some anomalies that we had to clear
  up: the Rcmdr data loader is not as forgiving as R-Studio
[3] ERROR:
  line 423 did not have 5 elements
[4] ERROR:
  line 1990 did not have 5 elements
moviedaily <-
   read.table("Z:/DataScience/AdvancedAnalytics/moviedaily.dat",
   header=TRUE, sep="\t", na.strings="NA", dec=".",
   strip.white=TRUE)
[5] NOTE: The dataset moviedaily has 2378 rows and 5
   columns.
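
One way to locate lines like the ones reported in the errors above is count.fields(), which reports how many fields each line contains. The sketch below is an illustration on a small temporary file with one deliberately short line; the file contents are hypothetical, not the real moviedaily.dat.

```r
# Write a small tab-delimited file with one malformed line
path <- tempfile(fileext = ".dat")
writeLines(c("MOVIE\tDAY_NUM\tDAILY_PER_THEATER",
             "A\t1\t509",
             "A\t2",            # this line is missing a field
             "B\t1\t121"),
           path)

# count.fields() returns the number of fields found on each line
n <- count.fields(path, sep = "\t")
bad_lines <- which(n != n[1])
bad_lines

# fill = TRUE pads short lines with NA instead of stopping with an error
d <- read.table(path, header = TRUE, sep = "\t", fill = TRUE)
nrow(d)
```

Whether padding with fill = TRUE is acceptable depends on why the line is short; inspecting the flagged lines in the file first is the safer route.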



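Obviously We Need to Tweak It
[Figure: Rcmdr chart of DAILY_PER_THEATER. y-axis: Frequency, 0 to 10; x-axis labels 121, 16, 212, 29, 379, 509, 65, 81, 99, which are in character sort order rather than numeric order]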
We Still Need to Coerce




     as.numeric(as.character(DAILY_PER_THEATER))




Aggregate is Under the Menu:
Data -> Active Data Set




Remove Cases with Missing Data
Subset the Data for Nonzero Values
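
In plain R, these two menu steps correspond roughly to na.omit() and subset(). The sketch below is an assumption about an equivalent, not the code Rcmdr itself generates, and the data frame is invented for illustration.

```r
# Hypothetical aggregated totals, including a zero and a missing value
agg <- data.frame(MOVIE    = c("A", "B", "C", "D"),
                  dailyper = c(300, 0, NA, 50))

# Drop rows that contain any NA
complete <- na.omit(agg)

# Keep only the rows with a nonzero total
nonzero <- subset(complete, dailyper > 0)
nonzero$MOVIE
```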




Rcmdr has no Sort Function
  …And the Barplot is Troubled
• We can use the sorting capability we learned before:
   aggdata<-aggdata[order(-aggdata$dailyper),]
• The Barplot menu choice in Rcmdr produces this code:
   barplot(table(aggdata$MOVIE), xlab="MOVIE", ylab="Frequency")
   – This creates a frequency table based on MOVIE, which is not really what
      we want
   – The resulting chart shows how often each title occurs, as a histogram would,
      rather than a barchart with heights based on dailyper
• We can run our own barchart command using the Rcmdr data:
   barplot(aggdata$dailyper,names.arg=as.character(aggdata$MOVIE),las=2)
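
The difference between the two barplot calls can be checked on a toy aggregated data frame. In this sketch (hypothetical titles and totals), table() yields a count of one per title, while the bar heights we actually want come from the dailyper column.

```r
# Hypothetical aggregated data: one row per movie
agg <- data.frame(MOVIE    = c("A", "B", "C"),
                  dailyper = c(300, 50, 120))

# table() counts rows per title, so every count is 1 after aggregation
counts <- table(agg$MOVIE)
as.vector(counts)

# Heights taken from the totals, with the titles attached as names
heights <- setNames(agg$dailyper, agg$MOVIE)
heights

# barplot() uses the names of a named vector as bar labels
# (commented out so the sketch runs without a graphics device)
# barplot(heights, las = 2)
```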




But Some Things are Still Messed Up!
• There are too many entries: there should be 49 or fewer, so it looks like the aggregation did not work correctly
• It has not discarded the zeroes as we asked
[Figure: barplot from the Rcmdr data. y-axis: 0 to 200000; far more than 49 bars, with overlapping labels that mix movie titles with numeric values and dates]
Demonstrating Mastery
• Locate a data set in a CSV or Tab-Delimited file and read it
  into R
• Check the data to ensure that the process of reading in the
  data worked properly
• Run a histogram on any numeric variable
• Aggregate the data based on any grouping variable; use a
  sum function, a mean function, or some other function as
  appropriate
• Display the aggregated data in a barchart or another type of
  graph as appropriate
• Describe the difference between a histogram and a barchart
