SBSM035 - Stats/ 
Bioinformatics/ 
Programming 
y.wurm@qmul.ac.uk 
http://yannick.poulet.org
© Alex Wild & others
2014 11-12 sbsm032rstatsprogramming.key
Atta leaf-cutter ants 
© National Geographic
Oecophylla Weaver ants 
© ameisenforum.de
© wynnie@flickr © forestryimages.org
Forelius pusillus 
Forelius pusillus hides the nest entrance at night 
Workers staying outside die: « preventive self-sacrifice » 
Tofilski et al 2008
Dorylus driver ants: ants with no home 
© BBC
Animal biomass (Brazilian rainforest), from Fittkau & Klinge 1973 
[Chart. Categories: Mammals; Birds; Reptiles; Amphibians; Other insects; Earthworms; Spiders; Soil fauna excluding earthworms, ants & termites; Ants & termites]
We use modern technologies to 
understand insect societies. 
• evolution of social behaviour 
• molecules involved in social behaviour 
• consequences of environmental change
Big data is invading biology
454, Illumina, SOLiD... 
This changes everything. 
Any lab can sequence anything!
Big data is invading biology 
• Genomics 
• Biodiversity assessments 
• Stool microbiome sequencing 
• Personalized medicine 
• Cancer genomics 
• Sensor networks - e.g. tracking microclimates, recording sounds 
• Aerial surveys (Drones) - e.g. crop productivity; rainforest cover 
• Camera traps
Choosing a programming language 
• Excel. Good: quick & dirty. Bad: easy to make mistakes; doesn’t scale. 
• R. Good: numbers, stats, genomics. Bad: programming. 
• Unix command-line (== shell == bash). Good: can’t escape it; quick & dirty; HPC. Bad: programming, complicated things. 
• Java. Good: 1990s user interfaces. Bad: overcomplicated. 
• Perl. 1980s. Everything. 
• Python. Good: scripting, text. Bad: ugly. 
• Ruby. Good: scripting, text. 
• Javascript/Node. Good: scripting, flexibility (web & client), community. Bad: only little bio-stuff.
First steps towards data handling 
• Basic stats - done! 
• Programming in R 
• UNIX command-line 
bioinformaticians
Practicals 
• Aim: get relevant data handling skills 
• Doing things by hand: 
• impossible? 
• slow, 
• error-prone, 
• Automate! 
• Basic programming 
• in R 
• no stats!
Practicals: contents 
• Done: 
• data accessing/subsetting 
• New: 
• search/replace 
• regular expressions: text search on steroids 
• New: 
• functions: reusable pieces of work 
• loops: repeating the same thing many times
• creating a vector 
• give me a vector containing numbers from 5 to 11 (3 variants) 
> myvector <- 5:11 
> myvector <- seq(from=5, to=11, by=1) 
> myvector <- c(5, 6, 7, 8, 9, 10, 11) 
> myvector 
[1]  5  6  7  8  9 10 11 
• accessing a subset of a vector 
> bigvector <- 150:100 
> bigvector 
 [1] 150 149 148 147 146 145 144 143 142 141 140 139 138 137 136 135 134 133 132
[20] 131 130 129 128 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113
[39] 112 111 110 109 108 107 106 105 104 103 102 101 100
> mysubset <- bigvector[myvector] 
> mysubset 
[1] 146 145 144 143 142 141 140 
> subset(bigvector, bigvector > 120) 
 [1] 150 149 148 147 146 145 144 143 142 141 140 139 138 137 136 135 134 133 132
[20] 131 130 129 128 127 126 125 124 123 122 121
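The same subsetting can be written with square brackets and a logical test. This is a minimal sketch that reuses myvector and bigvector from the slide; the names by_position, by_condition and by_subset are only illustrative.

```r
myvector  <- 5:11
bigvector <- 150:100

# Positions 5 to 11 of bigvector
by_position <- bigvector[myvector]

# Logical indexing keeps the elements for which the test is TRUE;
# for a plain vector like this one it gives the same result as subset()
by_condition <- bigvector[bigvector > 120]
by_subset    <- subset(bigvector, bigvector > 120)

print(by_position)
print(length(by_condition))
print(identical(by_condition, by_subset))
```

Square brackets accept either positions or TRUE/FALSE values, which is why both forms work on the same vector.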
Regular expressions (regex): 
Text search on steroids. 
Regular expression                          Finds 
David                                       David 
Dav(e|(id))                                 David, Dave 
Dav(e|(id)|(ide)|o)                         David, Dave, Davide, Davo 
At{1,2}enborough                            Attenborough, Atenborough 
Atte[nm]borough                             Attenborough, Attemborough 
At{1,2}[ei][nm]bo{0,1}ro((ugh)|w){0,1}      Atimbro, attenbrough, atenborow (when ignoring case) 
Easy counting, replacing all with “Sir David Attenborough”
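As a rough illustration of the counting and replacing mentioned above, here is a sketch using base R's grepl() and gsub(). The spellings vector is made up for the example, and ignore.case = TRUE is an assumption added so that lower-case spellings also match.

```r
# Hypothetical spellings, e.g. collected from free-text answers
spellings <- c("Atimbro", "attenbrough", "atenborow", "Attenborough", "Darwin")
pattern   <- "At{1,2}[ei][nm]bo{0,1}ro((ugh)|w){0,1}"

# Which entries match? ignore.case = TRUE also accepts a lower-case "a"
is_match <- grepl(pattern, spellings, ignore.case = TRUE)

# Easy counting
n_matches <- sum(is_match)

# Replace every matching spelling with the standard one
cleaned <- gsub(pattern, "Attenborough", spellings, ignore.case = TRUE)

print(is_match)
print(n_matches)
print(cleaned)
```

grepl() answers "does it match?" for each element, while gsub() rewrites the matched text and leaves non-matching entries such as "Darwin" untouched.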
Regex Special symbols 
Regular expression    Finds                             Example 
[aeiou]               any single vowel                  “e” 
[aeiou]*              between 0 and infinity vowels     “eeooouuu” 
[aeoiu]{1,3}          between 1 and 3 vowels            “oui” 
a|i                   one of the 2 characters           “a” 
((win)|(fail))        one of the two words in ()        “fail”
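These symbols can be tried directly with grepl(). The sketch below uses a made-up words vector; the anchors ^ and $ (start and end of the string) are not in the table above and are added here so that the whole word has to match.

```r
words <- c("e", "oui", "eeooouuu", "win", "fail", "draw")

# Whole word made of 1 to 3 vowels
one_to_three_vowels <- grepl("^[aeiou]{1,3}$", words)

# Whole word is either "win" or "fail"
win_or_fail <- grepl("^((win)|(fail))$", words)

# Without anchors, [aeiou]* matches every string,
# because "between 0 and infinity" includes zero vowels
zero_or_more <- grepl("[aeiou]*", words)

print(one_to_three_vowels)
print(win_or_fail)
print(zero_or_more)
```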
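More Regex Special symbols 
Regular expression    Synonymous with / finds 
[:digit:]             [0-9] 
[:alpha:]             [A-Za-z] (not [A-z], which in ASCII also matches [ \ ] ^ _ `) 
\s                    whitespace 
.                     any single character 
.+                    one to many of anything 
b*                    between 0 and infinity letter ‘b’ 
[^abc]                any character other than a, b or c 
\(                    ( 
[:punct:]             any of these: ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~ 
• In R, write classes such as [:digit:] inside a bracket expression ([[:digit:]]) and double the backslashes inside strings ("\\s", "\\(") 
• Google “Regular expression cheat sheet” 
• ?regexp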
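A short sketch of how these symbols look in actual R code, where character classes go inside an extra pair of brackets and backslashes are doubled inside strings. The samples vector is invented for the example.

```r
samples <- c("ant 42", "bee", "wasp(3)")

# [:digit:] must sit inside a bracket expression: [[:digit:]]
has_digit <- grepl("[[:digit:]]", samples)

# \s and \( need a doubled backslash inside an R string
has_space <- grepl("\\s", samples)
has_paren <- grepl("\\(", samples)

# [^...] negates: remove everything that is not a letter
letters_only <- gsub("[^[:alpha:]]", "", samples)

print(has_digit)
print(has_space)
print(has_paren)
print(letters_only)
```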
Your turn 
Make a regular expression 
• matching “LMTSOMIKTIP” and “LMVSONYYIKTIP” but not “LMVSQMIKTIP” 
• matching all variants of “ok” (e.g., “O.K.”, “Okay”…)
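One possible answer to each exercise, offered as a sketch rather than the intended solution. The pattern names are illustrative, the list of “ok” variants is an assumption (many more spellings exist), and the ^ and $ anchors are added so that the whole string must match.

```r
# Exercise 1: T or V at position 3, an O (not Q) at position 5,
# M or N after it, and an optional "YY"
peptides        <- c("LMTSOMIKTIP", "LMVSONYYIKTIP", "LMVSQMIKTIP")
peptide_pattern <- "^LM[TV]SO[MN](YY){0,1}IKTIP$"
peptide_hits    <- grepl(peptide_pattern, peptides)

# Exercise 2: upper or lower case, optional dots, optional "ay"
ok_variants <- c("ok", "OK", "O.K.", "Okay", "okra")
ok_pattern  <- "^[Oo]\\.{0,1}[Kk](\\.|ay){0,1}$"
ok_hits     <- grepl(ok_pattern, ok_variants)

print(peptide_hits)
print(ok_hits)
```

Including a string that should not match ("LMVSQMIKTIP", "okra") is a quick way to check that a pattern is not too permissive.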
Functions
Functions 
• R has many, e.g.: plot(), t.test() 
• Making your own: 
tree_age_estimate <- function(diameter, species) { 
  # ...do the magic... maybe something like: 
  # (growth.rates is assumed to be an existing named vector, one growth rate per species) 
  growth.rate  <- growth.rates[[species]] 
  age.estimate <- diameter / growth.rate 
  return(age.estimate) 
} 
> tree_age_estimate(25, "White Oak") 
[1] 66 
> tree_age_estimate(60, "Carya ovata") 
[1] 190
Your turn 
• Create a function that takes as input a length in centimetres 
and returns the length in feet+inches.
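One way this exercise could be approached, as a sketch only. The function name cm_to_feet_inches and the choice to return a named vector are assumptions; the conversion uses 2.54 cm per inch and 12 inches per foot.

```r
cm_to_feet_inches <- function(cm) {
  total_inches <- cm / 2.54        # 1 inch = 2.54 cm
  feet   <- total_inches %/% 12    # whole feet (integer division)
  inches <- total_inches %% 12     # remaining inches
  return(c(feet = feet, inches = inches))
}

height <- cm_to_feet_inches(180)
print(height)
```

Returning both values from one function keeps the conversion reusable: the caller can round or format the inches as needed.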
Loops
“for” Loop 
> possible_colours <- c('blue', 'cyan', 'sky-blue', 'navy blue', 
+     'steel blue', 'royal blue', 'slate blue', 'light blue', 
+     'dark blue', 'prussian blue', 'indigo', 'baby blue', 'electric blue') 
> possible_colours 
 [1] "blue"          "cyan"          "sky-blue"      "navy blue"
 [5] "steel blue"    "royal blue"    "slate blue"    "light blue"
 [9] "dark blue"     "prussian blue" "indigo"        "baby blue"
[13] "electric blue"
> for (colour in possible_colours) { 
+     print(paste("The sky is so, oh so", colour)) 
+ } 
[1] "The sky is so, oh so blue" 
[1] "The sky is so, oh so cyan" 
[1] "The sky is so, oh so sky-blue" 
[1] "The sky is so, oh so navy blue" 
[1] "The sky is so, oh so steel blue" 
[1] "The sky is so, oh so royal blue" 
[1] "The sky is so, oh so slate blue" 
[1] "The sky is so, oh so light blue" 
[1] "The sky is so, oh so dark blue" 
[1] "The sky is so, oh so prussian blue" 
[1] "The sky is so, oh so indigo" 
[1] "The sky is so, oh so baby blue" 
[1] "The sky is so, oh so electric blue"
Your turn 
• What does the following code do? (Work through it with pen and paper.)
Your turn 
• Create a loop that multiplies the numbers from ‘x’ to ‘y’
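One possible solution, as a sketch: the wrapper function multiply_range and its argument names are assumptions made so the loop can be reused and checked, and the example values 3 and 5 are arbitrary.

```r
multiply_range <- function(x, y) {
  product <- 1                  # start from the neutral element of multiplication
  for (number in x:y) {         # note: x:y counts downwards if x > y
    product <- product * number
  }
  return(product)
}

result <- multiply_range(3, 5)  # 3 * 4 * 5
print(result)
```

The built-in prod(3:5) gives the same value and is a handy cross-check for the loop.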