Regular expressions (regex):
Text search on steroids.
Regular expression                          Finds
David                                       David
Dav(e|(id))                                 David, Dave
Dav(e|(id)|(ide)|o)                         David, Dave, Davide, Davo
At{1,2}enborough                            Attenborough, Atenborough
Atte[nm]borough                             Attenborough, Attemborough
At{1,2}[ei][nm]bo{0,1}ro((ugh)|w){0,1}      Atimbro, Attenbrough, Atenborow
Regex makes it easy to count every variant, or to replace them all with “Sir David Attenborough”.
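As a minimal sketch of the counting and replacing mentioned above, R's grepl() and gsub() can apply the most permissive pattern to a vector of names. The spellings vector is made up for illustration and is not part of the slides.

```r
# Hypothetical input, invented for this example
spellings <- c("Attenborough", "Atenborough", "Attemborough",
               "Atimbro", "Attenbrough", "Smith")

pattern <- "At{1,2}[ei][nm]bo{0,1}ro((ugh)|w){0,1}"

# TRUE for every element that contains a match
is_variant <- grepl(pattern, spellings)

# Count the variants
n_variants <- sum(is_variant)
print(n_variants)

# Replace every matched variant with the canonical name
cleaned <- gsub(pattern, "Sir David Attenborough", spellings)
print(cleaned)
```

Matching is case-sensitive by default, so a lowercase "attenborough" would need ignore.case = TRUE or an [Aa] at the start of the pattern.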
Regex Special symbols
Regular expression    Finds                          Example
[aeiou]               any single vowel               “e”
[aeiou]*              zero or more vowels            “eeooouuu”
[aeoiu]{1,3}          between 1 and 3 vowels         “oui”
a|i                   one of the 2 characters        “a”
((win)|(fail))        one of the two words in ()     “fail”
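The symbols above can be tried out with grepl(). This is a small sketch on invented test strings; the ^ and $ anchors are an addition (they tie the pattern to the start and end of the string, so the whole string has to match).

```r
# Hypothetical test strings, invented for this example
words <- c("e", "eeooouuu", "oui", "win", "fail", "xyz")

# ^ and $ anchor the pattern to the whole string
single_vowel <- grepl("^[aeiou]$", words)
only_vowels  <- grepl("^[aeiou]*$", words)
one_to_three <- grepl("^[aeiou]{1,3}$", words)
win_or_fail  <- grepl("^((win)|(fail))$", words)

print(words[single_vowel])  # strings made of exactly one vowel
print(words[only_vowels])   # strings made only of vowels
print(words[one_to_three])  # strings made of 1 to 3 vowels
print(words[win_or_fail])   # "win" or "fail"
```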
More Regex Special symbols
• Google “Regular expression cheat sheet”
• ?regexp
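Regular expression    Synonymous with / finds
[[:digit:]]           [0-9]
[[:alpha:]]           letters, i.e. [A-Za-z] in ASCII (not [A-z], which also matches [ \ ] ^ _ `)
\s                    whitespace
.                     any single character
.+                    one or more of any character
b*                    zero or more of the letter ‘b’
[^abc]                any character other than a, b or c
\(                    a literal (
[[:punct:]]           any of these: ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
In an R string, write each backslash twice: "\\s", "\\(".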
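A short sketch of these symbols in R, assuming an invented piece of text. Note that character classes such as [:digit:] go inside a second pair of brackets, and that backslashes are doubled inside R strings.

```r
# Hypothetical text, invented for this example
x <- "Call (020) 7946 0000, now!"

digits_masked  <- gsub("[[:digit:]]", "#", x)  # every digit becomes #
spaces_marked  <- gsub("\\s", "_", x)          # every whitespace becomes _
no_punctuation <- gsub("[[:punct:]]", "", x)   # punctuation removed
has_paren      <- grepl("\\(", x)              # is there a literal "(" ?

# [^abc] matches anything that is not a, b or c
only_abc <- gsub("[^abc]", "", "abracadabra")

print(digits_masked)
print(spaces_marked)
print(no_punctuation)
print(has_paren)
print(only_abc)
```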
2015 10-7-9am regex-functions-loops.key
You want to scan a protein sequence database for a
particular binding site. Type a single regular expression that
will match the first two of the following peptide sequences,
but NOT the last one:
"HATSOMIKTIP"
"HAVSONYYIKTIP"
"HAVSQMIKTIP"
(You can test your answer on Rubular.)
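One possible answer, sketched in R. It is only one of many patterns that satisfy the exercise; the variable names are illustrative.

```r
peptides <- c("HATSOMIKTIP", "HAVSONYYIKTIP", "HAVSQMIKTIP")

# T or V in position 3, an O after the S, M or N next,
# then an optional run of up to two Y before IKTIP
binding_site <- "HA[TV]SO[MN]Y{0,2}IKTIP"

matched <- grepl(binding_site, peptides)
print(matched)
```

The third peptide fails because it has a Q where the pattern requires an O.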
Variants of a microsatellite sequence are responsible for
differential expression of vasopressin receptor, and in turn for
differences in social behaviour in voles & others. Create a regular
expression that finds AGAGAGAGAGAGAGAG dinucleotide
microsatellite repeats with lengths of 5 to 500
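A possible answer in R, assuming that "lengths of 5 to 500" means 5 to 500 copies of the AG unit. Only the first test sequence comes from the slide; the other two are invented. perl = TRUE is used here because R's default regex engine may reject repeat counts this large.

```r
sequences <- c("AGAGAGAGAGAGAGAG",  # 8 copies of AG (from the slide)
               "AGAGAG",            # 3 copies: too short (hypothetical)
               "TTAGAGAGAGAGTT")    # 5 copies inside a longer sequence (hypothetical)

# A group followed by {min,max} repeats the whole group
microsatellite <- "(AG){5,500}"

has_repeat <- grepl(microsatellite, sequences, perl = TRUE)
print(has_repeat)
```

Because the pattern is not anchored, a run longer than 500 copies would still be reported as containing a match.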
Again
Make a regular expression
• matching “LMTSOMIKTIP” and “LMVSONYYIKTIP” but not
“LMVSQMIKTIP”
• matching all variants of “ok” (e.g., “O.K.”,“Okay”…)
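For the “ok” exercise, here is a sketch that covers a handful of spellings. The candidate list is invented, and "all variants" is open-ended, so treat the pattern as a starting point rather than a complete answer.

```r
# Hypothetical candidates, invented for this example
candidates <- c("ok", "OK", "O.K.", "Okay", "okay", "oak", "joke")

# [Oo] and [Kk] accept either case, \\. is a literal full stop,
# {0,1} makes the preceding item optional, ^ and $ match the whole string
ok_pattern <- "^[Oo]\\.{0,1}[Kk]\\.{0,1}(ay){0,1}$"

is_ok <- grepl(ok_pattern, candidates)
print(candidates[is_ok])
```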
Ok… so how do we use this?
• ?grep
• ?gsub
ants <- read.table("https://goo.gl/3Ek1dL")
colnames(ants) <- c("genus", "species")
• Which species names include ‘y’?
• Create a vector with only species names, but replace all ‘y’ with ‘Y’.
• Remove all vowels.
• Replace all vowels with ‘o’.
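A sketch of how grep() and gsub() could be used for these tasks. To stay runnable without the download, it uses a short, made-up vector of species names in place of ants$species.

```r
# Illustrative values, not taken from the ants dataset
species <- c("atratulus", "scutellaris", "pygmaea", "yessensis")

# grep() returns the positions of matching elements,
# or the elements themselves with value = TRUE
with_y_index <- grep("y", species)
with_y       <- grep("y", species, value = TRUE)

# gsub() replaces every match within every element
capital_y    <- gsub("y", "Y", species)
no_vowels    <- gsub("[aeiou]", "", species)
vowels_to_o  <- gsub("[aeiou]", "o", species)

print(with_y_index)
print(with_y)
print(capital_y)
print(no_vowels)
print(vowels_to_o)
```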
Functions
• R has many, e.g. plot(), t.test()
• Making your own:
# growth_rates is assumed to be a named vector defined elsewhere (not shown on the slide)
tree_age_estimate <- function(diameter, species) {
  growth_rate <- growth_rates[species]
  age_estimate <- diameter / growth_rate
  return(age_estimate)
}
> tree_age_estimate(25, "White Oak")
66
> tree_age_estimate(60, "Carya ovata")
190
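The slide does not show growth_rates, so the following is a self-contained sketch with invented growth rates. The numbers are hypothetical and will not reproduce the ages on the slide.

```r
# Hypothetical growth rates (diameter units per year), invented for this example
growth_rates <- c("White Oak" = 0.5, "Carya ovata" = 0.25)

tree_age_estimate <- function(diameter, species) {
  # Look up the growth rate by species name
  growth_rate <- growth_rates[species]
  age_estimate <- diameter / growth_rate
  # unname() drops the species label so a plain number is returned
  return(unname(age_estimate))
}

oak_age <- tree_age_estimate(25, "White Oak")
hickory_age <- tree_age_estimate(60, "Carya ovata")
print(oak_age)      # 50
print(hickory_age)  # 240
```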
Make a function
• That converts fahrenheit to celsius
(subtract 32 then divide the result by 1.8)
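One way to write this function, as a minimal sketch; the function and argument names are a free choice.

```r
fahrenheit_to_celsius <- function(fahrenheit) {
  # Subtract 32, then divide the result by 1.8
  celsius <- (fahrenheit - 32) / 1.8
  return(celsius)
}

freezing <- fahrenheit_to_celsius(32)
boiling <- fahrenheit_to_celsius(212)
print(freezing)
print(boiling)

# Arithmetic in R is vectorised, so a whole vector can be converted at once
several <- fahrenheit_to_celsius(c(32, 50, 212))
print(several)
```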
Loops
“for” loop
> possible_colours <- c('blue', 'cyan', 'sky-blue', 'navy blue',
+   'steel blue', 'royal blue', 'slate blue', 'light blue',
+   'dark blue', 'prussian blue', 'indigo', 'baby blue', 'electric blue')
> possible_colours
 [1] "blue" "cyan" "sky-blue" "navy blue"
 [5] "steel blue" "royal blue" "slate blue" "light blue"
 [9] "dark blue" "prussian blue" "indigo" "baby blue"
[13] "electric blue"
> for (colour in possible_colours) {
+   print(paste("The sky is so, oh so", colour))
+ }
[1] "The sky is so, oh so blue"
[1] "The sky is so, oh so cyan"
[1] "The sky is so, oh so sky-blue"
[1] "The sky is so, oh so navy blue"
[1] "The sky is so, oh so steel blue"
[1] "The sky is so, oh so royal blue"
[1] "The sky is so, oh so slate blue"
[1] "The sky is so, oh so light blue"
[1] "The sky is so, oh so dark blue"
[1] "The sky is so, oh so prussian blue"
[1] "The sky is so, oh so indigo"
[1] "The sky is so, oh so baby blue"
[1] "The sky is so, oh so electric blue"
What does this loop do?
for (index in 10:1) {
  print(paste(index, "mins before lunch"))
}
Again
• What does the following code do? (Work through it with pen and paper.)
for (letter in LETTERS) {
  begins_with <- paste("^", letter, sep = "")
  matches <- grep(pattern = begins_with,
                  x = ants$genus)
  print(paste(length(matches), "begin with", letter))
}
> LETTERS
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "T" "U" "V" "W" "X" "Y" "Z"


> ants <- read.table("https://goo.gl/3Ek1dL")
> colnames(ants) <- c("genus", "species")


> head(ants)
          genus     species
1     Anergates   atratulus
2    Camponotus         sp.
3 Crematogaster scutellaris
4       Formica   aquilonia
5       Formica cunicularia
6       Formica     exsecta
What does this loop do?
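To check a pen-and-paper answer without downloading the data, here is a sketch that rebuilds only the six rows shown by head(ants) and stores the counts in a vector instead of printing them. The counts vector is an addition, and the full dataset would give different numbers.

```r
# Only the six rows printed by head(ants); the real table is longer
ants <- data.frame(
  genus   = c("Anergates", "Camponotus", "Crematogaster",
              "Formica", "Formica", "Formica"),
  species = c("atratulus", "sp.", "scutellaris",
              "aquilonia", "cunicularia", "exsecta"),
  stringsAsFactors = FALSE
)

counts <- integer(0)
for (letter in LETTERS) {
  # "^" anchors the pattern to the start of the string, e.g. "^F"
  begins_with <- paste("^", letter, sep = "")
  # Positions of the genus names that start with this letter
  matches <- grep(pattern = begins_with, x = ants$genus)
  # Store the number of matches under the letter's name
  counts[letter] <- length(matches)
}

# Show only the letters that occur at least once
print(counts[counts > 0])
```

With these six rows, one genus name begins with A, two with C and three with F.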
Jasmin Zohren
Kim Warren
Bruno Vieira
Rodrigo Pracana
Leandro Santiago
James Wright
Jingyuan Zhu
Hernani Oliveira
Andrea Hatlen
Programming in R
?
If/else
Logical Operators
going further
