^[Rr]egular [Ee]xpressions$ Introduction
Vocabulary
  Regular expression / Regex / Regexp
  “Regex” is pronounced Reg (as in register) Ex (as in FedEx)
  Matching
    Saying a regex “matches a string” means it matches somewhere in the string, not necessarily the whole string

Regular Expressions
  Composed of two types of characters
    Metacharacters / special characters: * ? ^ $ . [ ]
    Literal characters: a b c d

Egrep tool
  Lets you use regular expressions to find words that match
  Available for Macs, PCs and Linux
  cat /usr/share/dict/words | egrep '…'
  See http://regex.info/egrep.html if you don’t have it preinstalled

My first regex
  cat /usr/share/dict/words | egrep 'cat'
  Matches any word containing a ‘c’ followed by an ‘a’ followed by a ‘t’
    bobcat, cat, catwalk, scatter
  A simple regex: it uses only literal characters

Metacharacters: ^ and $
  ^ matches the beginning of a line
  $ matches the end of a line
  ^cat (start of line followed by ‘c’ then ‘a’ then ‘t’)
    cat, catwalk
  cat$ (‘c’ followed by ‘a’ then ‘t’ followed by end of line)
    bobcat, cat
  ^cat$ (start of line followed by ‘c’ then ‘a’ then ‘t’ then end of line)
    cat

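The anchor examples above can be tried out with a short script. This is only a sketch: it assumes Python's `re` module in place of egrep, and the four-word list is a hypothetical stand-in for /usr/share/dict/words.

```python
import re

# Hypothetical stand-in for /usr/share/dict/words.
words = ["bobcat", "cat", "catwalk", "scatter"]

def matching(pattern, candidates):
    """Return the candidates in which the pattern is found anywhere."""
    return [w for w in candidates if re.search(pattern, w)]

print(matching(r"cat", words))    # ['bobcat', 'cat', 'catwalk', 'scatter']
print(matching(r"^cat", words))   # ['cat', 'catwalk']
print(matching(r"cat$", words))   # ['bobcat', 'cat']
print(matching(r"^cat$", words))  # ['cat']
```

`re.search` looks for the pattern anywhere in the string, which is the closest analogue to how egrep scans each line.
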
How to read regex
  Read each character one at a time
  ^bat
    Start of line followed by ‘b’ then ‘a’ then ‘t’
  rat$
    ‘r’ then ‘a’ then ‘t’ followed by end of line
  ^dog$
    Start of line followed by ‘d’ then ‘o’ then ‘g’ then end of line

More simple regexes
  ^
    Start of line
  ^$
    Start of line followed by end of line (an empty line)
  $
    End of line
  ^foot$
    Start of line followed by ‘f’ then ‘o’ then ‘o’ then ‘t’ followed by end of line

Character Classes [ ]
  Matches one of the characters listed inside the [ ]
  [ae]
    Matches ‘a’ or ‘e’
  [aeiouy]
    Matches any lowercase vowel (counting ‘y’)
  ^gr[ae]y$
    Start of line followed by ‘g’ then ‘r’ then ‘a’ or ‘e’ then ‘y’ followed by end of line
    grey or gray

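A character class stands for exactly one character, which the following sketch tries to make visible. It assumes Python's `re` module rather than egrep, and the test words are made up for illustration.

```python
import re

grey_or_gray = re.compile(r"^gr[ae]y$")

# "graey" fails: [ae] consumes a single character, not both.
candidates = ["grey", "gray", "griy", "graey"]
spellings = [w for w in candidates if grey_or_gray.search(w)]
print(spellings)  # ['grey', 'gray']

# Two classes in a row: one letter from [Hh], then one digit from [123456].
heading = re.compile(r"[Hh][123456]")
tags = [t for t in ["H1", "h2", "h6", "H7", "h0", "x3"] if heading.fullmatch(t)]
print(tags)  # ['H1', 'h2', 'h6']
```
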
Character Classes cont.
  [Ss]
    Matches upper or lower case ‘S’
  [123456]
    Matches any of the digits listed
  [Hh][123456]
    Matches H1, h2, h3, H4, etc.

Special characters in [ ]’s
  - (dash) denotes a range
    [1-6] is the same as [123456]
    [a-f] is the same as [abcdef]
  Ranges can be mixed with literals
    [0-9a-fA-F_!.?]
    Any digit; upper or lower case ‘a’, ‘b’, ‘c’, ‘d’, ‘e’, ‘f’; underscore, exclamation mark, period or question mark

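The range equivalences above can be checked mechanically. The sketch below assumes Python's `re` module instead of egrep; `same_class` is a hypothetical helper that compares two classes over every printable ASCII character.

```python
import re
import string

def same_class(class_a, class_b):
    """True if both classes accept exactly the same printable ASCII characters."""
    return all(
        bool(re.fullmatch(class_a, ch)) == bool(re.fullmatch(class_b, ch))
        for ch in string.printable
    )

print(same_class(r"[1-6]", r"[123456]"))  # True
print(same_class(r"[a-f]", r"[abcdef]"))  # True

# Ranges mixed with literals: digits, hex letters, and a few punctuation marks.
mixed = re.compile(r"[0-9a-fA-F_!.?]")
accepted = [ch for ch in "7cF_!.?gZ-" if mixed.fullmatch(ch)]
print(accepted)  # ['7', 'c', 'F', '_', '!', '.', '?']
```
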
Negated character class [^ ]
  ^ inside of [ ] means “not any of these”
  [^1-6]
    Any character other than 1, 2, 3, 4, 5, 6
  [^a-fA-F]
    Any character other than a-f (upper or lower case)
  The ^ must be the first character inside the [ ]
    [^c] (matches any character except ‘c’)
    [c^] (matches a ‘c’ or a ‘^’)

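One subtlety is easy to miss: a negated class still has to match some character. A rough illustration, assuming Python's `re` module and a made-up word list:

```python
import re

words = ["Iraq", "Iraqi", "qat", "quit", "queue"]

# "Iraq" is rejected: nothing follows the 'q', so [^u] has no character to match.
q_not_u = [w for w in words if re.search(r"q[^u]", w)]
print(q_not_u)  # ['Iraqi', 'qat']

# Position of ^ matters: first means "not", anywhere else it is a literal.
print(re.findall(r"[^c]", "c^a"))  # ['^', 'a']
print(re.findall(r"[c^]", "c^a"))  # ['c', '^']
```
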
Translating regex practice
  List of words that have ‘q’ followed by a character other than ‘u’
    q[^u]
  List of words with ‘f’ followed by an ‘i’ or ‘o’ followed by ‘r’ then ‘e’
    f[io]re
  Line starts with ‘Qu’ or ‘qu’ followed by an ‘e’ followed by any letter between ‘p’ and ‘t’
    ^[Qq]ue[p-t]

Metacharacter: . (dot)
  Matches any character
  c.t
    ‘c’ followed by any character followed by ‘t’
    cat, cot, c8t
  A period inside [ ]’s matches a literal period
    [a.c] matches ‘a’, ‘.’ or ‘c’

Periods cont.
  03.19.76
    Matches ‘03’ followed by any character, then ‘19’, then any character, then ‘76’
    03-19-76
    03/19/76
    03.19.76
    03 19 76
    03319876

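The date example can be replayed as follows. This is a sketch assuming Python's `re` module; the last candidate, `031976`, is an extra case added here to show that each dot must consume one character.

```python
import re

date_like = re.compile(r"03.19.76")

candidates = ["03-19-76", "03/19/76", "03.19.76", "03 19 76", "03319876", "031976"]
hits = [s for s in candidates if date_like.fullmatch(s)]
print(hits)  # every candidate except '031976'

# Inside a character class the period is literal.
print(re.findall(r"[a.c]", "a.b.c"))  # ['a', '.', '.', 'c']
```
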
Alternatives: | (pipe)
  Pipes allow you to specify alternatives
    grey|gray matches grey or gray
  Use parentheses to constrain alternatives
    gr(e|a)y
  Within [ ]’s, | is a normal character
    [a|b] matches ‘a’ or ‘|’ or ‘b’

Pipes (cont.)
  Use parentheses to constrain alternatives
    gre|ay matches ‘gre’ or ‘ay’
    gr(e|a)y matches ‘gr’ followed by ‘e’ or ‘a’ then ‘y’

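To see how far an unparenthesized alternation reaches, the two patterns can be run side by side. The sketch assumes Python's `re` module; the word "today" is an illustrative extra.

```python
import re

words = ["grey", "gray", "today"]

# Without parentheses the alternatives are the whole strings 'gre' and 'ay'.
loose = [w for w in words if re.search(r"gre|ay", w)]
print(loose)  # ['grey', 'gray', 'today']

# Parentheses limit the choice to the single letter between 'gr' and 'y'.
grouped = [w for w in words if re.search(r"gr(e|a)y", w)]
print(grouped)  # ['grey', 'gray']

# Inside a character class the pipe is just another character.
print(re.findall(r"[a|b]", "a|b-c"))  # ['a', '|', 'b']
```
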
Regex practice
  Match “First Street” or “1st street”
    (First|1st) [Ss]treet
    (Fir|1)st [Ss]treet
    These are equivalent; which is better?
  Match “toothbrush” or “hairbrush”
    (tooth|hair)brush

^ or $ and alternation
  Be careful when using ^ or $ with alternation
  ^From|Subject|Date:
    Matches ‘From’ at the start of a line, OR ‘Subject’ anywhere, OR ‘Date:’ anywhere
  ^(From|Subject|Date):
    Start of line followed by ‘From’ or ‘Subject’ or ‘Date’, followed by ‘:’
  Safer to use ( ) to group your alternatives

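The difference between the two header patterns shows up as soon as a keyword appears mid-line. A minimal sketch, assuming Python's `re` module and an invented set of sample lines:

```python
import re

# Hypothetical sample lines; only the first three are real mail headers.
lines = [
    "From: alice@example.com",
    "Subject: hello",
    "Date: Monday",
    "Re: Subject line missing",
    "Due Date: Friday",
    "Fromage is cheese",
]

# ^ binds only to 'From'; 'Subject' and 'Date:' may match anywhere in the line.
loose = [ln for ln in lines if re.search(r"^From|Subject|Date:", ln)]

# Grouping applies both the ^ and the ':' to all three alternatives.
strict = [ln for ln in lines if re.search(r"^(From|Subject|Date):", ln)]

print(len(loose))  # 6
print(strict)      # the first three lines only
```
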
Case insensitive match
  Matches are case sensitive by default
    [Ff]rom will match From but not FRom
  Use egrep’s -i option to do a case insensitive match
  Most programming languages offer a case insensitive option as well

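As one example of a language-level option, Python's `re` module provides the `re.IGNORECASE` flag. The sketch below contrasts it with the hand-written `[Ff]` class; the sample strings are illustrative.

```python
import re

samples = ["From", "from", "FRom", "FROM"]

# A character class only covers the positions you spell out.
class_hits = [s for s in samples if re.search(r"[Ff]rom", s)]
print(class_hits)  # ['From', 'from']

# The flag makes every letter in the pattern case insensitive.
flag_hits = [s for s in samples if re.search(r"from", s, re.IGNORECASE)]
print(flag_hits)   # ['From', 'from', 'FRom', 'FROM']
```
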
Quantifiers: ?
  The ? metacharacter means “optional”
  colou?r matches color or colour
    ‘c’ then ‘o’ then ‘l’ then ‘o’ then optionally ‘u’ then ‘r’
  Match July or Jul, and fourth, 4th or 4
    (July|Jul) (fourth|4th|4)
    July? (fourth|4th|4)
    July? (fourth|4(th)?)

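The optional-item patterns can be exercised like this. It is a sketch assuming Python's `re` module; `fullmatch` is used so that partial matches do not hide failures, and the rejected strings are made-up counterexamples.

```python
import re

colour = re.compile(r"colou?r")
date = re.compile(r"July? (fourth|4(th)?)")

spellings = [s for s in ["color", "colour", "colouur", "colr"] if colour.fullmatch(s)]
print(spellings)  # ['color', 'colour']

candidates = ["July 4", "Jul 4th", "July fourth", "Jul fourth", "Ju 4", "July 4t"]
dates = [s for s in candidates if date.fullmatch(s)]
print(dates)  # ['July 4', 'Jul 4th', 'July fourth', 'Jul fourth']
```

Note that `?` applies only to the item immediately before it: the `y` in `July?`, and the whole group in `(th)?`.
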
Quantifiers: + and *
  + (plus)
    One or more of the previous item
  * (star)
    Zero or more of the previous item
  b[0-9]*a matches
    ba
    b9999a
    b999999999999999a

Summary of Quantifiers
       Minimum required   Maximum to try   Meaning
  ?    none               1                zero or one occurrence
  *    none               no limit         zero or more occurrences
  +    1                  no limit         one or more occurrences

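The summary can be reproduced by swapping the quantifier in one pattern and seeing which strings survive. A sketch assuming Python's `re` module, with `fullmatch` so the whole string must fit:

```python
import re

candidates = ["ba", "b9a", "b9999a"]
patterns = {"?": r"b[0-9]?a", "*": r"b[0-9]*a", "+": r"b[0-9]+a"}

# For each quantifier, list the candidates the pattern accepts in full.
accepted = {
    quantifier: [s for s in candidates if re.fullmatch(pattern, s)]
    for quantifier, pattern in patterns.items()
}

for quantifier, hits in accepted.items():
    print(quantifier, hits)
# ? ['ba', 'b9a']
# * ['ba', 'b9a', 'b9999a']
# + ['b9a', 'b9999a']
```
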
Escaping metacharacters
  Use \ (backslash) to escape metacharacters
    \. matches ‘.’
    . matches any character
  c.t matches cat
  c\.t does not match cat
  \(cat\) matches ‘(cat)’, not ‘cat’

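The escaping rules above, sketched with Python's `re` module as a stand-in for egrep. The `re.escape` call at the end is a Python convenience that is not part of the slides; it builds the escaped form of a literal string for you.

```python
import re

print(bool(re.search(r"c.t", "cat")))        # True: dot matches any character
print(bool(re.search(r"c\.t", "cat")))       # False: escaped dot needs a real '.'
print(bool(re.search(r"c\.t", "c.t")))       # True
print(bool(re.search(r"\(cat\)", "(cat)")))  # True: literal parentheses
print(bool(re.search(r"\(cat\)", "cat")))    # False

# re.escape backslash-escapes the metacharacters in a literal string.
escaped = re.escape("c.t")
print(escaped)  # c\.t
```
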
More practice
  Match chat, cat, chart
    ch?ar?t
    c[h]?a[r]?t
  Start of line then ‘M’ then one or more ‘a’ followed by ‘st’ and zero or more ‘b’
    ^M[a]+st[b]*
  Lines ending with one or more ‘c’ followed by a ‘t’ then zero or one ‘e’
    [c]+t[e]?$

More practice
  ^[Mm][^a-np-z]ney$
    Start of line then ‘M’ or ‘m’, then any character other than a-n or p-z, then ‘ney’ followed by end of line
    Money, money, m3ney
  ^be.*(bob|ted)$
    Start of line followed by ‘be’ followed by zero or more characters followed by ‘bob’ or ‘ted’ followed by end of line

More practice
  Match truck, firetruck but not dumptruck
    ^(fire)?truck$
  Match $0.99, $599.95, $1000.45, $5000
    \$[0-9]+(\.[0-9][0-9])?$
  Match 404-555-1212, 404.555.1212, (404) 555-1212
    ^[()0-9]+.[0-9]+.[0-9]+$

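The money and phone patterns can be checked against the listed examples. This is a sketch assuming Python's `re` module; the rejected strings (`$5.5`, `5000`) and the odd-separator phone number are extra test cases invented here.

```python
import re

money = re.compile(r"\$[0-9]+(\.[0-9][0-9])?$")
phone = re.compile(r"^[()0-9]+.[0-9]+.[0-9]+$")

prices = ["$0.99", "$599.95", "$1000.45", "$5000", "$5.5", "5000"]
price_hits = [p for p in prices if money.search(p)]
print(price_hits)  # ['$0.99', '$599.95', '$1000.45', '$5000']

numbers = ["404-555-1212", "404.555.1212", "(404) 555-1212", "404x555y1212"]
phone_hits = [n for n in numbers if phone.search(n)]
print(phone_hits)  # all four, including the one with letter separators
```

Because the dots in the phone pattern are unescaped, any character is accepted as a separator. A class such as `[-. ]` would be stricter, at the cost of a longer pattern.
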

Regex Intro, uploaded by Jason Noble
