Plotting 1.0
Selene Fernandez-Valverde
 Lab Meeting 26-08-09
Your scientific graphing options




Others...?
Why not only Excel?

Excel is relatively limited in its support for scientific graphing

Its options provide limited control over the output

Limited selection of graph types

Limited number of data points that can be plotted (or it dies)
What plots can you do with R?
      Type in your R terminal:

> demo(graphics)

> demo(persp)

> library(lattice)
> demo(lattice)



      Now, that is something you can’t make in
      Excel or Prism (you actually can in
      Matlab).
Cool, but...
Steep learning curve
Plotting is step by step
Prettifying a graph takes quite a lot of
effort
I don’t want to script in R, I just want to
plot my results
How do we avoid that?
 We use a package made by someone who
 encountered these problems before
 “ggplot2 is a plotting system for R, based on the grammar
 of graphics, which tries to take the good parts of base and
 lattice graphics and (almost) none of the bad parts. It takes
 care of many of the fiddly details that make plotting a
 hassle (like drawing legends) as well as providing a
 powerful model of graphics that makes it easy to produce
 complex multi-layered graphics.”


 In summary: R graphs made easy
How do I start?
First format the data into a table that looks like this:

carat  cut        color  clarity  depth  table  price  x     y     z
0.23   Ideal      E      SI2      61.5   55     326    3.95  3.98  2.43
0.21   Premium    E      SI1      59.8   61     326    3.89  3.84  2.31
0.23   Good       E      VS1      56.9   65     327    4.05  4.07  2.31
0.29   Premium    I      VS2      62.4   58     334    4.20  4.23  2.63
0.31   Good       J      SI2      63.3   58     335    4.34  4.35  2.75
0.24   Very Good  J      VVS2     62.8   57     336    3.94  3.96  2.48

Wait! This looks like an Excel table! Well... it is (it can also be a tab-delimited file).
Make sure your variables (columns) are meaningful and allow you to retrieve the
information that you want to plot.
Read the table into R
    Set your working directory:
> setwd("./Documents/MyUsername/FolderWhereMyExcelFileIs/")
> getwd()
> install.packages("ggplot2", dependencies=TRUE)
> library(ggplot2)


    If your file is an Excel file:
> install.packages("gdata")
> library(gdata)
> table <- read.xls("MyExcelFile.xls")


    If your file is a tab-delimited file:
> table <- read.delim("MyExcelFile.txt")
> summary(table)
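
As a minimal sketch of the tab-delimited route, the snippet below writes the six example rows from the "How do I start?" slide to a temporary file and reads them back with read.delim(). The temporary file and the object names diamonds_sample, tab_file and my_table are made up for illustration; with your own data you would pass your own file name instead.

```r
# Rebuild the six example rows shown on the "How do I start?" slide.
diamonds_sample <- data.frame(
  carat   = c(0.23, 0.21, 0.23, 0.29, 0.31, 0.24),
  cut     = c("Ideal", "Premium", "Good", "Premium", "Good", "Very Good"),
  color   = c("E", "E", "E", "I", "J", "J"),
  clarity = c("SI2", "SI1", "VS1", "VS2", "SI2", "VVS2"),
  depth   = c(61.5, 59.8, 56.9, 62.4, 63.3, 62.8),
  table   = c(55, 61, 65, 58, 58, 57),
  price   = c(326, 326, 327, 334, 335, 336),
  x       = c(3.95, 3.89, 4.05, 4.20, 4.34, 3.94),
  y       = c(3.98, 3.84, 4.07, 4.23, 4.35, 3.96),
  z       = c(2.43, 2.31, 2.31, 2.63, 2.75, 2.48)
)

# Write it out as a tab-delimited text file (a throwaway temp file, for illustration only).
tab_file <- tempfile(fileext = ".txt")
write.table(diamonds_sample, tab_file, sep = "\t", quote = FALSE, row.names = FALSE)

# read.delim() expects a header row and tab separators by default.
my_table <- read.delim(tab_file)

str(my_table)      # column names and types
summary(my_table)  # quick overview of every column
```

The slide stores the data in an object called table; that works, but it shares its name with base R's table() function, so a different name is used here to avoid confusion.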


    ggplot2 comes with an example dataset named “diamonds”, available once the package is loaded
> summary(diamonds)


    Start plotting!
Your first plot(s)
> ggplot(diamonds, aes(color)) + geom_bar()
> ggplot(diamonds, aes(color, fill=cut)) + geom_bar()
> ggplot(diamonds, aes(color, fill=cut)) + geom_bar(position="dodge")
> ggplot(diamonds, aes(color, fill=cut)) + geom_bar(position="dodge") +
scale_y_log10()
> ggplot(diamonds, aes(color, fill=cut)) + geom_bar(position="fill")
> ggplot(diamonds, aes(color, depth)) + geom_point()
> ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly()
> ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly() +
xlab("Diamond Color")
> ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly() +
xlab("Diamond Color") + coord_flip()
> ggplot(diamonds, aes(clarity, fill=color)) + geom_bar() + facet_wrap(~ cut)
> ggplot(diamonds, aes(clarity, fill=color)) + geom_bar() + facet_grid(. ~ cut)
> ggplot(diamonds, aes(color, depth, color=cut)) + geom_point()
> ggplot(diamonds, aes(color, depth, color=cut)) + geom_jitter()
> ggplot(diamonds, aes(color, depth, color=cut)) + geom_jitter() + ylim(53,70)
> ggplot(diamonds, aes(color, depth, color=cut)) + geom_boxplot()
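
The geom_freqpoly() calls above were written for the 2009 release of ggplot2. As far as I know, current releases bin a continuous x variable by default and stop with an error when x is a discrete variable such as color. A hedged sketch of an equivalent for recent versions is to draw lines over counts with stat = "count"; the object name p_counts is only for illustration and this is not the code used in the talk.

```r
library(ggplot2)

# One line per cut, showing how many diamonds fall in each color grade.
# stat = "count" tallies rows per discrete x value instead of binning.
p_counts <- ggplot(diamonds, aes(x = color, colour = cut, group = cut)) +
  geom_line(stat = "count") +
  xlab("Diamond Color")

print(p_counts)
```

Storing the plot in an object also makes the step-by-step style of the slide easier to follow: further layers can be added with p_counts + coord_flip() and so on, without retyping the whole call.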
Making the graph prettier
> ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly() +
xlab("Diamond Color")
> ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly() + theme_bw() +
labs(x="Diamond color")
> ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly() + theme_bw() +
labs(x="Diamond color") + scale_y_continuous("Counts")
> ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly() + theme_bw() +
labs(x="Diamond color") + scale_y_continuous("Counts") + scale_color_hue("Cut")
> ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly() + theme_bw() +
labs(x="Diamond color") + scale_y_continuous("Counts") + scale_color_brewer("Cut")
> ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly() + theme_bw() +
labs(x="Diamond color") + scale_y_continuous("Counts") + scale_color_brewer("Cut",
palette="Set1")
> ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly() + theme_bw() +
labs(x="Diamond color") + scale_y_continuous("Counts", formatter="comma") +
scale_color_brewer("Cut", palette="Set1")
> ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly() + theme_bw() +
labs(x="Diamond color") + scale_y_continuous("Counts") + scale_color_brewer("Cut",
palette="Set1") + facet_wrap(~ clarity)
> ggplot(diamonds, aes(color, color=cut, group=cut)) + geom_freqpoly() + theme_bw() +
labs(x="Diamond color") + scale_y_continuous("Counts") + scale_color_brewer("Cut",
palette="Set1") + facet_wrap(~ clarity, scale="free_y")
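
The formatter="comma" argument in one of the calls above comes from the 2009 ggplot2 API. In current releases, to my knowledge, axis labels are formatted through the labels argument, typically with a helper from the scales package, and the facet argument is spelled scales. The sketch below is an assumed modern equivalent of the last plot, not the code used in the talk; p_pretty is a made-up name.

```r
library(ggplot2)

p_pretty <- ggplot(diamonds, aes(x = color, colour = cut, group = cut)) +
  geom_line(stat = "count") +                             # counts per color grade, one line per cut
  theme_bw() +                                            # white background theme
  labs(x = "Diamond color") +
  scale_y_continuous("Counts", labels = scales::comma) +  # 1,000 instead of 1000
  scale_colour_brewer("Cut", palette = "Set1") +          # ColorBrewer palette, legend titled "Cut"
  facet_wrap(~ clarity, scales = "free_y")                # each clarity panel gets its own y range

print(p_pretty)
```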
One-liner graph example

ggplot(NHANES, aes(TIBC, Hemoglobin)) + geom_hex() + facet_grid(~Sex) +
opts(aspect.ratio = 0.8)
Our version
[Figure: hexagonal-bin plot of Hemoglobin (y-axis) against TIBC (x-axis), in two panels by Sex (F and M); the fill legend shows the number of patients per bin]

ggplot(NHANES, aes(TIBC, Hemoglobin)) + geom_hex() + facet_grid(~Sex) +
opts(aspect.ratio = 0.8) + theme_bw() + scale_fill_gradient("Number of Patients")
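
opts() belongs to the ggplot2 API of the time; in current releases, as far as I know, the same setting goes through theme(). The sketch below assumes that the NHANES data frame is the one shipped with the hexbin package (its description matches the last slide) and that hexbin is installed, which geom_hex() needs anyway. The names aspect_theme and p_hex are only for illustration.

```r
library(ggplot2)

# theme() takes over the role opts() played in early ggplot2 releases.
aspect_theme <- theme(aspect.ratio = 0.8)

# Assumption: NHANES comes from the hexbin package, which geom_hex() also requires.
if (requireNamespace("hexbin", quietly = TRUE)) {
  data(NHANES, package = "hexbin")

  p_hex <- ggplot(NHANES, aes(TIBC, Hemoglobin)) +
    geom_hex() +                                  # count observations per hexagonal bin
    facet_grid(~ Sex) +                           # one panel per sex
    theme_bw() +                                  # complete theme first...
    aspect_theme +                                # ...then the aspect ratio, so it is not reset
    scale_fill_gradient("Number of Patients")     # legend title for the bin counts

  print(p_hex)
}
```

Adding the aspect ratio after theme_bw() is a deliberate choice in this sketch: a complete theme added later would replace the earlier setting.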
Some things ggplot2 can’t do
You can’t click on your graph to change
the labels; you have to rerun the
program (workaround: edit them
in Illustrator, or use opts)
When stacking data and setting a new
limit, you’ll lose all the data in any
group that goes over that range
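
The second limitation can be illustrated with a boxplot. Setting limits on the scale with ylim() discards out-of-range rows before any statistics are computed, whereas coord_cartesian() only zooms the view. This is a sketch of that general behaviour as I understand it in current ggplot2 releases; the 60 to 63 range and the object names are arbitrary choices for illustration.

```r
library(ggplot2)

base_plot <- ggplot(diamonds, aes(color, depth)) + geom_boxplot()

# Scale limits: rows with depth outside 60-63 are dropped (with a warning)
# before the boxplot statistics are computed.
p_clipped <- base_plot + ylim(60, 63)

# Coordinate limits: all rows are kept; only the visible window changes.
p_zoomed <- base_plot + coord_cartesian(ylim = c(60, 63))

# Compare the upper whisker ends computed for each version.
clipped_stats <- layer_data(p_clipped)
zoomed_stats  <- layer_data(p_zoomed)
data.frame(color   = levels(diamonds$color),
           clipped = clipped_stats$ymax,
           zoomed  = zoomed_stats$ymax)
```

If the whiskers differ between the two columns, the scale limits changed the statistics and not just the view, which is the data loss the slide warns about.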
Last thoughts
Can handle millions of data points

It’s free

It’s good for having a quick look at your data and changing the display
easily

Works on all platforms (Windows, Mac and Linux [server] like it)

It’s pretty (and did I mention it’s fast?)

I think it saves you a bit of Illustrator time

If you already have your scheme and it works for you, it’s not worth it, but
if you are looking for a new plotting strategy I think it’s a good place to
start

If you get into it, you can start doing statistical analysis of your data
and plotting it all together
For more info


http://had.co.nz/ggplot2/
http://learnr.wordpress.com/
NHANES Data : National Health and Nutrition Examination Survey

Description

This is a somewhat large, interesting dataset: a data frame of 15 variables (columns) on 9575 persons (rows). This data frame
contains the following columns:

Cancer.Incidence
binary factor with levels No and Yes.
Cancer.Death
binary factor with levels No and Yes.
Age
numeric vector giving age of the person in years.
Smoke
a factor with levels Current, Past, Nonsmoker, and Unknown.
Ed
numeric vector of {0,1} codes giving the education level.
Race
numeric vector of {0,1} codes giving the person's race.
Weight
numeric vector giving the weight in kilograms.
BMI
numeric vector giving Body Mass Index, i.e., Weight/Height^2 where Height is in meters; missing values (61%!) were
originally coded as 0.
Diet.Iron
numeric giving Dietary iron.
Albumin
numeric giving albumin level in g/l.
Serum.Iron
numeric giving Serum iron in ug/l.
TIBC
numeric giving Total Iron Binding Capacity in ug/l.
Transferin
numeric giving Transferin Saturation which is just 100*serum.iron/TIBC.
Hemoglobin
numeric giving Hemoglobin level.
Sex
a factor with levels F (female) and M (male).
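
Two of the columns above are defined by simple formulas, which can be checked by hand. The values below are hypothetical, chosen only to make the arithmetic easy to follow, and are not taken from the dataset.

```r
# Hypothetical measurements for one person (not real NHANES values).
weight_kg  <- 70
height_m   <- 1.75
serum_iron <- 100   # ug/l
tibc       <- 400   # ug/l

# BMI = Weight / Height^2, with height in meters.
bmi <- weight_kg / height_m^2

# Transferrin saturation = 100 * serum iron / TIBC.
transferrin_saturation <- 100 * serum_iron / tibc

round(bmi, 2)            # 22.86
transferrin_saturation   # 25
```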
