PYTHON REGULAR EXPRESSIONS
John Zhang
Tuesday, December 11, 2012
Regular Expressions
• Regular expressions are a powerful string
manipulation tool
• All modern languages have similar library
packages for regular expressions
• Use regular expressions to:
– Search a string (search and match)
– Replace parts of a string (sub)
– Break strings into smaller pieces (split)
Regular Expression Python Syntax
• Literal characters match themselves
Example: the regular expression “test” only
matches the string ‘test’
• [x] matches any one of a list of characters
Example: “[abc]” matches ‘a’, ‘b’, or ‘c’
• [^x] matches any one character that is not
included in x
Example: “[^abc]” matches any single character except
‘a’, ‘b’, or ‘c’
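The three rules above can be checked directly. This is a minimal sketch, assuming Python 3.4 or later for re.fullmatch; the variable names are illustrative and not from the slides.

```python
import re

# re.fullmatch succeeds only when the whole string matches the pattern.
literal_ok = re.fullmatch(r"test", "test") is not None
literal_bad = re.fullmatch(r"test", "tests") is not None

# [abc] accepts exactly one of the listed characters.
in_class = [ch for ch in "abcd" if re.fullmatch(r"[abc]", ch)]

# [^abc] accepts exactly one character that is NOT listed.
not_in_class = [ch for ch in "abcd" if re.fullmatch(r"[^abc]", ch)]

print(literal_ok, literal_bad)   # True False
print(in_class)                  # ['a', 'b', 'c']
print(not_in_class)              # ['d']
```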
Regular Expressions Syntax
• “.” matches any single character except a newline
• Parentheses () group part of a pattern
Example: “(abc)+” matches ‘abc’, ‘abcabc’,
‘abcabcabc’, etc.
• x|y matches x or y
Example: “this|that” matches ‘this’ and ‘that’,
but not ‘thisthat’.
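A short sketch of the dot, grouping, and alternation rules, again using re.fullmatch so that each test string must match as a whole. The test strings are made up for illustration.

```python
import re

# "." matches any one character except a newline (default flags).
dot_hits = [s for s in ["a", "7", "\n", "ab"] if re.fullmatch(r".", s)]

# (abc)+ repeats the whole group, not just the last character.
group_hits = [s for s in ["abc", "abcabc", "abcc", ""]
              if re.fullmatch(r"(abc)+", s)]

# this|that accepts either alternative, but not both glued together.
alt_hits = [s for s in ["this", "that", "thisthat"]
            if re.fullmatch(r"this|that", s)]

print(dot_hits)    # ['a', '7']
print(group_hits)  # ['abc', 'abcabc']
print(alt_hits)    # ['this', 'that']
```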
Regular Expression Syntax
• x* matches zero or more x’s
“a*” matches ‘’, ‘a’, ‘aa’, etc.
• x+ matches one or more x’s
“a+” matches ‘a’, ‘aa’, ‘aaa’, etc.
• x? matches zero or one x
“a?” matches ‘’ or ‘a’.
• x{m,n} matches i x’s, where m ≤ i ≤ n
“a{2,3}” matches ‘aa’ or ‘aaa’
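To see the four quantifiers side by side, the sketch below runs each pattern against the same candidate strings. The helper accepted() is a hypothetical name introduced only for this example.

```python
import re

candidates = ["", "a", "aa", "aaa", "aaaa"]

def accepted(pattern):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

star = accepted(r"a*")         # zero or more
plus = accepted(r"a+")         # one or more
optional = accepted(r"a?")     # zero or one
bounded = accepted(r"a{2,3}")  # between two and three, inclusive

print(star)      # ['', 'a', 'aa', 'aaa', 'aaaa']
print(plus)      # ['a', 'aa', 'aaa', 'aaaa']
print(optional)  # ['', 'a']
print(bounded)   # ['aa', 'aaa']
```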
Regular Expression Syntax
• “\d” matches any digit; “\D” matches any non-digit
• “\s” matches any whitespace character; “\S”
matches any non-whitespace character
• “\w” matches any alphanumeric character or underscore; “\W”
matches any other character
• “^” matches the beginning of the string; “$”
matches the end of the string
• “\b” matches a word boundary; “\B” matches a
position that is not a word boundary
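The escapes above are easiest to write as raw strings (r"..."), so that Python does not interpret the backslash itself. A minimal sketch with made-up sample text:

```python
import re

text = "Room 42, floor 7"

digits = re.findall(r"\d+", text)                  # runs of digits
words = re.findall(r"\w+", text)                   # runs of word characters
starts_with_room = bool(re.search(r"^Room", text)) # anchored at the start
ends_with_digit = bool(re.search(r"\d$", text))    # anchored at the end

# \b keeps "cat" from matching inside "concatenate".
whole_word = re.findall(r"\bcat\b", "cat concatenate cat.")

print(digits)      # ['42', '7']
print(words)       # ['Room', '42', 'floor', '7']
print(whole_word)  # ['cat', 'cat']
```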
Search and Match
• The two basic functions are re.search and re.match
– Search looks for a pattern anywhere in a string
– Match looks for a match starting at the beginning
• Both return None if the pattern is not found (logical false)
and a “match object” if it is
import re
pat = "a*b"
matchObj = re.search(pat, "fooaaabcde")
if matchObj:
    print("matched successfully: %s" % matchObj.group(0))
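The difference between the two functions shows up when the pattern occurs only in the middle of the string. A small sketch using the same pattern and text as the slide; the variable names are illustrative.

```python
import re

pat = r"a*b"
text = "fooaaabcde"

found_anywhere = re.search(pat, text)  # scans forward through the string
found_at_start = re.match(pat, text)   # tries index 0 only

print(found_anywhere.group())  # aaab
print(found_at_start)          # None
```

Because None is falsy and a match object is truthy, either result can be used directly in an if statement.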
Q: What’s a match object?
• A: an instance of the match class with the details of the match
result
>>> import re
>>> pat = "a*b"
>>> r1 = re.search(pat,"fooaaabcde")
>>> r1.group() # group returns string matched
'aaab'
>>> r1.start() # index of the match start
3
>>> r1.end() # index just past the match end
7
>>> r1.span() # tuple of (start, end)
(3, 7)
What got matched?
• Here’s a pattern to match simple email addresses
\w+@(\w+\.)+(com|org|net|edu)
>>> pat1 = r"\w+@(\w+\.)+(com|org|net|edu)"
>>> r1 = re.match(pat1, "qzhang@pku.cn.edu")
>>> r1.group()
'qzhang@pku.cn.edu'

• We might want to extract the pattern parts, like the
email name and host
What got matched?
• We can put parentheses around groups we want to be
able to reference
>>> pat2 = r"(\w+)@((\w+\.)+(com|org|net|edu))"
>>> r2 = re.match(pat2, "qzhang@pku.cn.edu")
>>> r2.group(1)
'qzhang'
>>> r2.group(2)
'pku.cn.edu'
>>> r2.groups()
('qzhang', 'pku.cn.edu', 'cn.', 'edu')

• Note that the groups are numbered by the position of their
opening parenthesis, left to right (a preorder traversal of
the nested groups)
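The numbering can be inspected by listing every group in order. This sketch reuses pat2 from the slide; note that a repeated group such as (\w+\.)+ keeps only its last repetition, which is why group 3 is 'cn.' rather than 'pku.'.

```python
import re

pat2 = r"(\w+)@((\w+\.)+(com|org|net|edu))"
m = re.match(pat2, "qzhang@pku.cn.edu")

# group(0) is the whole match; groups 1..n follow the order of the "(".
# m.re.groups is the number of capturing groups in the pattern.
numbered = {i: m.group(i) for i in range(m.re.groups + 1)}

for index, value in numbered.items():
    print(index, value)
# 0 qzhang@pku.cn.edu
# 1 qzhang
# 2 pku.cn.edu
# 3 cn.
# 4 edu
```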
What got matched?
• We can ‘label’ the groups as well…
>>> pat3 = r"(?P<name>\w+)@(?P<host>(\w+\.)+(com|org|net|edu))"
>>> r3 = re.match(pat3, "qzhang@pku.cn.edu")
>>> r3.group('name')
'qzhang'
>>> r3.group('host')
'pku.cn.edu'

• And reference the matching parts by the labels
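When several labels are needed at once, the match object's groupdict() method returns all named groups as a dictionary. A minimal sketch using the same pattern; unnamed groups are not included in the result.

```python
import re

pat3 = r"(?P<name>\w+)@(?P<host>(\w+\.)+(com|org|net|edu))"
m = re.match(pat3, "qzhang@pku.cn.edu")

parts = m.groupdict()

# Named groups keep their numbers too, so both lookups agree.
same_group = m.group(1) == m.group("name")

print(parts)       # {'name': 'qzhang', 'host': 'pku.cn.edu'}
print(same_group)  # True
```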
More re functions
• re.split() is like str.split() but can use patterns
>>> re.split(r"\W+", "This... is a test, short and sweet, of split().")
['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']

• re.sub() replaces each match of a pattern with a string
>>> re.sub('(blue|white|red)', 'black', 'blue socks and red shoes')
'black socks and black shoes'

• re.findall() finds all matches
>>> re.findall(r"\d+", "12 dogs,11 cats, 1 egg")
['12', '11', '1']
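Both re.split() and re.sub() accept an optional limit, passed here as the keyword arguments maxsplit and count. The sketch below also shows one way to drop the empty trailing string seen in the split example; the variable names are illustrative.

```python
import re

sentence = "This... is a test, short and sweet, of split()."

# Filter out the empty string produced by the trailing punctuation.
words = [w for w in re.split(r"\W+", sentence) if w]

# maxsplit limits how many splits are made; the rest stays in one piece.
limited = re.split(r"\W+", "This... is a test", maxsplit=2)

# count limits how many matches are replaced.
first_only = re.sub(r"(blue|white|red)", "black",
                    "blue socks and red shoes", count=1)

print(words)       # ['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split']
print(limited)     # ['This', 'is', 'a test']
print(first_only)  # black socks and red shoes
```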
Compiling regular expressions
• If you plan to use a re pattern more than once,
compile it to a pattern object
• Python produces a special data structure that
speeds up matching
>>> cpat3 = re.compile(pat3)
>>> cpat3
<_sre.SRE_Pattern object at 0x2d9c0>
>>> r3 = cpat3.search("qzhang@pku.cn.edu")
>>> r3
<_sre.SRE_Match object at 0x895a0>
>>> r3.group()
'qzhang@pku.cn.edu'
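A typical reason to compile is to apply one pattern to many strings. This is a sketch under that assumption; the second and third addresses are made-up test data.

```python
import re

cpat3 = re.compile(r"(?P<name>\w+)@(?P<host>(\w+\.)+(com|org|net|edu))")

# Hypothetical inputs: only the first comes from the slides.
addresses = ["qzhang@pku.cn.edu", "not-an-address", "alice@example.org"]

# Compile once, then reuse the pattern object for every input.
hosts = [m.group("host") for m in map(cpat3.match, addresses) if m]

print(hosts)  # ['pku.cn.edu', 'example.org']
```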
Pattern object methods
• There are methods defined for a pattern object that
parallel the regular expression functions, e.g.,
– match
– search
– split
– findall
– sub
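Each method takes the same arguments as the module-level function, minus the pattern. A minimal sketch that calls all five on one compiled pattern, reusing the text from the findall example; the name number is illustrative.

```python
import re

number = re.compile(r"\d+")
text = "12 dogs,11 cats, 1 egg"

at_start = number.match(text).group()   # match at index 0
first = number.search(text).group()     # first match anywhere
pieces = number.split(text)             # text between the matches
all_numbers = number.findall(text)      # every match
masked = number.sub("N", text)          # replace every match

print(at_start, first)  # 12 12
print(pieces)           # ['', ' dogs,', ' cats, ', ' egg']
print(all_numbers)      # ['12', '11', '1']
print(masked)           # N dogs,N cats, N egg
```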
