Regular Expressions
A Regular Expression (RegEx) is a sequence of characters that defines a search pattern. For example,
^a...s$
The above code defines a RegEx pattern. The pattern is: any five-character string starting with a and
ending with s.
A pattern defined using RegEx can be used to match against a string.
Expression   String      Matched?
^a...s$      abs         No match
             alias       Match
             abyss       Match
             Alias       No match
             An abacus   No match
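The table above can be reproduced with Python's re module, which is covered later in this document. The following is a minimal sketch; the names test_strings and results are illustrative and not part of the original material.

```python
import re

pattern = r"^a...s$"
test_strings = ["abs", "alias", "abyss", "Alias", "An abacus"]

# re.search() returns a match object on success and None otherwise,
# so bool() turns the result into True (match) or False (no match).
results = {s: bool(re.search(pattern, s)) for s in test_strings}

for s, matched in results.items():
    print(f"{s!r}: {'Match' if matched else 'No match'}")
```

Note that matching is case-sensitive by default, which is why Alias does not match.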
Specify Pattern Using RegEx
To specify regular expressions, metacharacters are used. In the above example, ^ and $ are
metacharacters.
MetaCharacters
Metacharacters are characters that are interpreted in a special way by a RegEx engine. Here's a list of
metacharacters:
[] . ^ $ * + ? {} () \ |
[] - Square brackets
Square brackets specify a set of characters you wish to match.
Expression   String      Matched?
[abc]        a           1 match
             ac          2 matches
             Hey Jude    No match
             abc de ca   5 matches
Here, [abc] matches any single a, b or c in the string, so every occurrence counts as a separate match.
You can also specify a range of characters using - inside square brackets.
• [a-e] is the same as [abcde].
• [1-4] is the same as [1234].
• [0-39] is the same as [01239].
You can complement (invert) the character set by placing the caret symbol ^ right after the opening
square bracket.
• [^abc] means any character except a or b or c.
• [^0-9] means any non-digit character.
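As a rough illustration of character sets, ranges and negation, the sketch below uses re.findall(), which is described later in this document. The sample string "a1b22c" is an assumption chosen for the example.

```python
import re

text = "abc de ca"

# Each character that belongs to the set is a separate match.
set_matches = re.findall(r"[abc]", text)

# A range: [a-e] lists the same characters as [abcde].
range_matches = re.findall(r"[a-e]", text)

# A negated set: every character that is not a digit.
non_digits = re.findall(r"[^0-9]", "a1b22c")

print(set_matches)    # 5 matches, as in the table above
print(range_matches)
print(non_digits)
```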
. - Period
A period matches any single character (except newline '\n').
Expression   String   Matched?
..           a        No match
             ac       1 match
             acd      1 match
             acde     2 matches (contains 4 characters)
^ - Caret
The caret symbol ^ is used to check if a string starts with a certain character.
Expression   String   Matched?
^a           a        1 match
             abc      1 match
             bac      No match
^ab          abc      1 match
             acb      No match (starts with a but not followed by b)
$ - Dollar
The dollar symbol $ is used to check if a string ends with a certain character.
Expression   String    Matched?
a$           a         1 match
             formula   1 match
             cab       No match
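The period, caret and dollar tables can be checked in a few lines. This is only a sketch; the variable names are illustrative.

```python
import re

# ^a succeeds only when the string starts with a.
starts_with_a = [bool(re.search(r"^a", s)) for s in ["a", "abc", "bac"]]

# a$ succeeds only when the string ends with a.
ends_with_a = [bool(re.search(r"a$", s)) for s in ["a", "formula", "cab"]]

# .. consumes two characters per match, so "acde" yields two matches.
two_chars = re.findall(r"..", "acde")

print(starts_with_a, ends_with_a, two_chars)
```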
* - Star
The star symbol * matches zero or more occurrences of the pattern to its left.
Expression   String   Matched?
ma*n         mn       1 match
             man      1 match
             maaan    1 match
             main     No match (a is not followed by n)
             woman    1 match
+ - Plus
The plus symbol + matches one or more occurrences of the pattern to its left.
Expression   String   Matched?
ma+n         mn       No match (no a character)
             man      1 match
             maaan    1 match
             main     No match (a is not followed by n)
             woman    1 match
? - Question Mark
The question mark symbol ? matches zero or one occurrence of the pattern to its left.
Expression   String   Matched?
ma?n         mn       1 match
             man      1 match
             maaan    No match (more than one a character)
             main     No match (a is not followed by n)
             woman    1 match
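To see the three quantifiers side by side, the sketch below runs each pattern over the same word list. The helper matching_words is hypothetical and exists only for this comparison.

```python
import re

words = ["mn", "man", "maaan", "main", "woman"]

def matching_words(pattern):
    """Return the words in which the pattern is found at least once."""
    return [w for w in words if re.search(pattern, w)]

star = matching_words(r"ma*n")      # zero or more a
plus = matching_words(r"ma+n")      # one or more a
optional = matching_words(r"ma?n")  # zero or one a

print(star)
print(plus)
print(optional)
```

The only differences are mn (rejected by +) and maaan (rejected by ?).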
{} - Braces
Consider this code: {n,m}. This means at least n, and at most m repetitions of the pattern to its left.
Expression   String        Matched?
a{2,3}       abc dat       No match
             abc daat      1 match (aa in daat)
             aabc daaat    2 matches (aa in aabc and aaa in daaat)
             aabc daaaat   2 matches (aa in aabc and aaa in daaaat)
Let's try one more example. The RegEx [0-9]{2,4} matches at least 2 digits but not more than 4 digits.
Expression   String          Matched?
[0-9]{2,4}   ab123csde       1 match (123)
             12 and 345673   3 matches (12, 3456, 73)
             1 and 2         No match
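The brace examples can be sketched as follows. The last line illustrates a common pitfall in Python's re module: a space inside the braces stops them from being read as a quantifier, so the braces are matched as literal text.

```python
import re

# Runs of two or three a characters.
a_runs = re.findall(r"a{2,3}", "aabc daaaat")

# Runs of two to four digits; matching is greedy, so 345673 splits as 3456 + 73.
digit_runs = re.findall(r"[0-9]{2,4}", "12 and 345673")

# With a space, "{2, 4}" is literal text rather than a quantifier.
with_space = re.findall(r"[0-9]{2, 4}", "12 and 345673")

print(a_runs)
print(digit_runs)
print(with_space)
```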
| - Alternation
Vertical bar | is used for alternation (or operator).
Expression   String   Matched?
a|b          cde      No match
             ade      1 match (a)
             acdbea   3 matches (a, b, a)
Here, a|b matches any string that contains either a or b.
() - Group
Parentheses () are used to group sub-patterns. For example, (a|b|c)xz matches any string that contains
either a or b or c followed by xz.
Expression   String      Matched?
(a|b|c)xz    ab xz       No match
             abxz        1 match (bxz)
             axz cabxz   2 matches (axz and bxz)
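A short sketch of alternation and grouping in Python follows. One detail worth knowing: when a pattern contains a single capturing group, re.findall() returns the text captured by the group rather than the whole match, so re.finditer() is used here to show both.

```python
import re

text = "axz cabxz"

# group() is the whole match; group(1) is what the parentheses captured.
whole = [m.group() for m in re.finditer(r"(a|b|c)xz", text)]
captured = [m.group(1) for m in re.finditer(r"(a|b|c)xz", text)]

# With one capturing group, findall returns only the captured text.
found = re.findall(r"(a|b|c)xz", text)

# Plain alternation without a group returns the whole matches.
alternation = re.findall(r"a|b", "acdbea")

print(whole, captured, found, alternation)
```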
\ - Backslash
Backslash \ is used to escape various characters, including all metacharacters. For example,
\$a matches if a string contains $ followed by a. Here, $ is not interpreted by a RegEx engine in a special
way.
If you are unsure whether a punctuation character has a special meaning, you can put \ in front of it. This makes sure
the character is not treated in a special way. (A backslash before a letter or digit is different: it can form a special sequence such as \d.)
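The effect of escaping can be sketched in Python as below. The sample strings are assumptions made for the example. Raw strings (the r prefix) keep the backslash intact for the RegEx engine, and re.escape() is a standard-library function that adds the necessary backslashes to arbitrary text.

```python
import re

text = "price: $a or $b"

# Escaped: \$ is a literal dollar sign, so this finds "$a".
escaped = re.findall(r"\$a", text)

# Unescaped: $ means "end of string", so "$a" can never match.
unescaped = re.findall(r"$a", text)

# re.escape() inserts the backslashes needed to match text literally.
literal = "1+1=2?"
found = re.search(re.escape(literal), "is 1+1=2? yes")

print(escaped, unescaped, found.group())
```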
Special Sequences
Special sequences make commonly used patterns easier to write. Here's a list of special sequences:
\A - Matches if the specified characters are at the start of a string.
Expression   String       Matched?
\Athe        the sun      Match
             In the sun   No match
\b - Matches if the specified characters are at the beginning or end of a word.
Expression   String          Matched?
\bfoo        football        Match
             a football      Match
             afootball       No match
foo\b        the foo         Match
             the afoo test   Match
             the afootest    No match
\B - Opposite of \b. Matches if the specified characters are not at the beginning or end of a word.
Expression   String          Matched?
\Bfoo        football        No match
             a football      No match
             afootball       Match
foo\B        the foo         No match
             the afoo test   No match
             the afootest    Match
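The word-boundary tables can be checked with the sketch below. It also shows why raw strings matter here: in an ordinary Python string literal, "\b" is the backspace character, so the pattern no longer contains a word boundary.

```python
import re

samples = ["football", "a football", "afootball"]

# \bfoo: foo at the beginning of a word.
at_word_start = [bool(re.search(r"\bfoo", s)) for s in samples]

# \Bfoo: foo anywhere except the beginning of a word.
not_at_word_start = [bool(re.search(r"\Bfoo", s)) for s in samples]

# Without the r prefix, "\b" is a backspace character, not a word boundary.
without_raw = bool(re.search("\bfoo", "football"))

print(at_word_start, not_at_word_start, without_raw)
```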
\d - Matches any decimal digit. Equivalent to [0-9] for ASCII text.
Expression   String   Matched?
\d           12abc3   3 matches (1, 2, 3)
             Python   No match
\D - Matches any character that is not a decimal digit. Equivalent to [^0-9] for ASCII text.
Expression   String     Matched?
\D           1ab34"50   3 matches (a, b, ")
             1345       No match
\s - Matches where a string contains any whitespace character. Equivalent to [ \t\n\r\f\v] for ASCII text.
Expression   String         Matched?
\s           Python RegEx   1 match
             PythonRegEx    No match
\S - Matches where a string contains any non-whitespace character. Equivalent to [^ \t\n\r\f\v] for ASCII text.
Expression   String          Matched?
\S           a b             2 matches (a, b)
             (spaces only)   No match
\w - Matches any alphanumeric character (digits and letters). Equivalent to [a-zA-Z0-9_] for ASCII text. By the
way, underscore _ is also considered an alphanumeric character.
Expression   String     Matched?
\w           12&": ;c   3 matches (1, 2, c)
             %"> !      No match
\W - Matches any non-alphanumeric character. Equivalent to [^a-zA-Z0-9_] for ASCII text.
Expression   String   Matched?
\W           1a2%c    1 match (%)
             Python   No match
\Z - Matches if the specified characters are at the end of a string.
Expression   String                      Matched?
Python\Z     I like Python               1 match
             I like Python Programming   No match
             Python is fun.              No match
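The character-class sequences above can be tried out with re.findall(), described later in this document. This is a sketch using the same sample strings as the tables.

```python
import re

digits = re.findall(r"\d", "12abc3")        # decimal digits
non_digits = re.findall(r"\D", '1ab34"50')  # everything except digits
word_chars = re.findall(r"\w", '12&": ;c')  # letters, digits, underscore
non_word = re.findall(r"\W", "1a2%c")       # everything except word characters
spaces = re.findall(r"\s", "Python RegEx")  # whitespace characters

# \Z anchors the match to the very end of the string.
sentences = ["I like Python", "I like Python Programming", "Python is fun."]
at_end = [bool(re.search(r"Python\Z", s)) for s in sentences]

print(digits, non_digits, word_chars, non_word, spaces, at_end)
```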
Now that we understand the basics of RegEx, let's discuss how to use RegEx in your Python code.
Python RegEx
Python has a module named re to work with regular expressions. To use it, we need to import the
module.
import re
The module defines several functions and constants to work with RegEx.
re.search()
The re.search() function takes two arguments: a pattern and a string. It looks for the first
location where the RegEx pattern produces a match with the string.
If the search is successful, re.search() returns a match object; if not, it returns None.
Syntax of the function:
s = re.search(pattern, string)
Write a Python program to perform pattern matching using the search()
function.
import re
string = "Python is fun"
s = re.search('Python', string)
if s:
    print("pattern found inside the string")
else:
    print("pattern not found")
Here, s contains a match object.
s.start(), s.end() and s.span()
The start() method returns the index of the start of the matched substring. Similarly, end() returns
the index just past the end of the matched substring. The span() method returns a tuple containing the start and end
indexes of the matched part, and group() returns the matched text itself.
>>> s.start()
0
>>> s.end()
6
>>> s.span()
(0, 6)
>>> s.group()
'Python'
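Because end() points just past the match, the span can be used directly as a slice. A minimal sketch, searching for the word is in the same sample string (the variable names are illustrative):

```python
import re

string = "Python is fun"
m = re.search(r"is", string)

# span() returns (start, end); end is one past the last matched character.
start, end = m.span()

# Slicing the original string with the span gives back the matched text.
matched_text = string[start:end]

print(m.span(), m.group(), matched_text)
```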
re.match()
The re.match() function takes two arguments: a pattern and a string. If the pattern is found at the
start of the string, the function returns a match object. If not, it returns None.
Write a Python program to perform pattern matching using the match()
function.
import re
pattern = '^a...s$'
test_string = 'abyss'
result = re.match(pattern, test_string)
if result:
    print("Search successful.")
else:
    print("Search unsuccessful.")
Here, we used the re.match() function to test whether pattern matches at the beginning of test_string.
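The difference between re.match() and re.search() shows up when the pattern occurs later in the string. The sketch below also uses re.fullmatch(), a standard-library function not covered in the text above, which requires the whole string to match and so makes explicit ^ and $ anchors unnecessary.

```python
import re

text = "I like Python"

# match() only tries at the start of the string.
match_result = re.match(r"Python", text)

# search() scans the whole string for the first match.
search_result = re.search(r"Python", text)

# fullmatch() requires the entire string to match the pattern.
full_result = re.fullmatch(r"a...s", "abyss")
partial_result = re.fullmatch(r"a...s", "abyss!")

print(match_result, search_result.span(), full_result.group(), partial_result)
```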
re.sub()
The syntax of re.sub() is:
re.sub(pattern, replace, string)
The function returns a string where matched occurrences are replaced with the content of the replace
variable.
If the pattern is not found, re.sub() returns the original string.
You can pass count as a fourth parameter to re.sub(). If omitted, it defaults to 0, which
replaces all occurrences. Passing it by keyword (count=2) is clearer, and Python 3.13 and later deprecate passing it positionally.
Example1:
re.sub('^a','b','aaa')
Output:
'baa'
Example2:
s=re.sub('a','b','aaa')
print(s)
Output:
bbb
Example3:
s=re.sub('a','b','aaa',count=2)
print(s)
Output:
bba
re.subn()
The re.subn() function is similar to re.sub() except that it returns a tuple of 2 items containing the new string and
the number of substitutions made.
Example1:
s=re.subn('a','b','aaa')
print(s)
Output:
('bbb', 3)
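The behaviour of re.sub() and re.subn() described above can be summarised in one sketch. The whitespace example at the end is an added illustration, not from the original text.

```python
import re

replaced_all = re.sub(r"a", "b", "aaa")            # every occurrence
replaced_two = re.sub(r"a", "b", "aaa", count=2)   # at most two replacements
unchanged = re.sub(r"z", "b", "aaa")               # no match: string returned as is

# subn() also reports how many substitutions were made.
new_string, n_subs = re.subn(r"a", "b", "aaa")

# A pattern with a quantifier: collapse runs of whitespace to one space.
collapsed = re.sub(r"\s+", " ", "too   many    spaces")

print(replaced_all, replaced_two, unchanged, new_string, n_subs, collapsed)
```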
re.findall()
The re.findall() function returns a list of strings containing all non-overlapping matches.
If the pattern is not found, re.findall() returns an empty list.
Syntax:
re.findall(pattern, string)
Example1:
s=re.findall('a','abab')
print(s)
Output:
['a', 'a']
re.split()
The re.split() function splits the string where there is a match and returns a list of the pieces between
the matches.
If the pattern is not found, re.split() returns a list containing the original string.
You can pass the maxsplit argument to re.split(). It is the maximum number of splits
that will occur.
By the way, the default value of maxsplit is 0, meaning all possible splits.
Syntax:
re.split(pattern, string)
Example1:
s=re.split('a','abab')
print(s)
Output:
['', 'b', 'b']
Example2:
s=re.split('a','aababa',maxsplit=3)
print(s)
Output:
['', '', 'b', 'ba']
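The examples above use single-character patterns. The sketch below applies re.findall() and re.split() to patterns with quantifiers; the sample strings are assumptions made for illustration.

```python
import re

# findall with a quantifier: whole numbers instead of single digits.
numbers = re.findall(r"\d+", "12 drummers, 11 pipers, 10 lords")

# split on a comma with optional surrounding whitespace.
parts = re.split(r"\s*,\s*", "red , green,blue ,  yellow")

# maxsplit limits the number of splits; the remainder stays in the last item.
limited = re.split(r"a", "aababa", maxsplit=3)

# No match: the original string comes back as the only list item.
no_match = re.split(r"x", "abab")

print(numbers, parts, limited, no_match)
```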
CASE STUDY
Street Addresses: In this case study, we take a street address as input and apply string methods and
re functions to it.
Example:
str1='100 NORTH MAIN ROAD'
str1.replace('ROAD','RD')
Output:
'100 NORTH MAIN RD'
str1.replace('NORTH','NRTH')
Output:
'100 NRTH MAIN ROAD'
re.sub('ROAD','RD',str1)
Output:
'100 NORTH MAIN RD'
re.sub('NORTH','NRTH',str1)
Output:
'100 NRTH MAIN ROAD'
re.split('A',str1)
Output:
['100 NORTH M', 'IN RO', 'D']
re.findall('O',str1)
Output:
['O', 'O']
re.sub('^1','2',str1)
Output:
'200 NORTH MAIN ROAD'
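For the address above, str.replace() and re.sub() give the same results. A difference appears when the text to replace also occurs inside another word. The sketch below assumes a hypothetical address, 100 BROAD ROAD, that is not part of the original case study.

```python
import re

address = "100 BROAD ROAD"

# Plain replacement also changes the ROAD inside BROAD.
naive = address.replace("ROAD", "RD")

# Anchoring with $ restricts the replacement to the end of the string.
anchored = re.sub(r"ROAD$", "RD", address)

# \b restricts the replacement to ROAD as a whole word, wherever it occurs.
whole_word = re.sub(r"\bROAD\b", "RD", "100 BROAD ROAD APT 3")

print(naive)
print(anchored)
print(whole_word)
```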
Roman Numerals
I = 1
V = 5
X = 10
L = 50
C = 100
D = 500
M = 1000
Subtractive pairs are used for 4 and 9 at each place value: 4 is written as IV and 9 as IX,
40 as XL and 90 as XC, 400 as CD and 900 as CM.
Let us write the Roman numeral representation for a few numbers.
Ex1:
1940
MCMXL
Ex2:
1946
MCMXLVI
Ex3:
1888
MDCCCLXXXVIII
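The worked examples above can be double-checked with a small converter. This is a sketch only: to_roman and ROMAN_VALUES are hypothetical names introduced here, and the function assumes the usual range of 1 to 3999.

```python
# Value/symbol pairs from largest to smallest, including the subtractive pairs.
ROMAN_VALUES = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]

def to_roman(number):
    """Convert an integer from 1 to 3999 to a Roman numeral."""
    if not 1 <= number <= 3999:
        raise ValueError("number must be between 1 and 3999")
    result = []
    for value, symbol in ROMAN_VALUES:
        # Take as many copies of this symbol as fit, then continue with the rest.
        count, number = divmod(number, value)
        result.append(symbol * count)
    return "".join(result)

print(to_roman(1940), to_roman(1946), to_roman(1888))
```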
Checking for thousands:
1000=M
2000=MM
3000=MMM
A possible pattern is up to three M characters, each of them optional.
Example:
pattern = '^M?M?M?$'
re.search(pattern, 'M')
Output:
<re.Match object; span=(0, 1), match='M'>
re.search(pattern, 'MM')
Output:
<re.Match object; span=(0, 2), match='MM'>
re.search(pattern, 'MMM')
Output:
<re.Match object; span=(0, 3), match='MMM'>
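re.search(pattern, 'ML')
re.search(pattern, 'MX')
re.search(pattern, 'MI')
re.search(pattern, 'MMMM')
Each of these four calls returns None (the interpreter displays nothing): L, X and I are not part of the pattern, and MMMM has one M too many.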
Checking for Hundreds:
100=C
200=CC
300=CCC
400=CD
500=D
600=DC
700=DCC
800=DCCC
900=CM
Example:
pattern = '^M?M?M?(CM|CD|D?C?C?C?)$'
re.search(pattern,'MCM')
Output:
<re.Match object; span=(0, 3), match='MCM'>
re.search(pattern,'MD')
Output:
<re.Match object; span=(0, 2), match='MD'>
re.search(pattern,'MMMCCC')
Output:
<re.Match object; span=(0, 6), match='MMMCCC'>
re.search(pattern,'MCMLXX')
This call returns None, because the pattern does not yet cover the tens (LXX).
Using the {n,m} syntax
The {n,m} syntax checks that the pattern to its left occurs at least n times and at most
m times.
Example:
pattern='^M{0,3}$'
re.search(pattern,'MM')
Output:
<re.Match object; span=(0, 2), match='MM'>
re.search(pattern,'M')
Output:
<re.Match object; span=(0, 1), match='M'>
re.search(pattern,'MMM')
Output:
<re.Match object; span=(0, 3), match='MMM'>
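The two ways of writing the thousands check should accept exactly the same strings. A quick sketch to confirm this on a few candidates (the candidate list is an assumption made for the example):

```python
import re

candidates = ["", "M", "MM", "MMM", "MMMM", "MX"]

# Three optional M characters written out one by one.
optional_form = [bool(re.search(r"^M?M?M?$", s)) for s in candidates]

# The same constraint written with a counted quantifier.
counted_form = [bool(re.search(r"^M{0,3}$", s)) for s in candidates]

print(optional_form)
print(counted_form)
```

Both forms also accept the empty string, since zero M characters are allowed.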
Checking for Tens and Ones:
1=I
2=II
3=III
4=IV
5=V
6=VI
7=VII
8=VIII
9=IX
10=X
20=XX
30=XXX
40=XL
50=L
60=LX
70=LXX
80=LXXX
90=XC
Example:
pattern='^M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$'
re.search(pattern,'MDLVI')
Output:
<re.Match object; span=(0, 5), match='MDLVI'>
re.search(pattern,'MCMXLVI')
Output:
<re.Match object; span=(0, 7), match='MCMXLVI'>
re.search(pattern,'MMMCCCXLV')
Output:
<re.Match object; span=(0, 9), match='MMMCCCXLV'>
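Putting the pieces together, the final pattern can be wrapped in a small validator. This is a sketch under a few assumptions: is_roman_numeral and ROMAN_PATTERN are hypothetical names, the {n,m} syntax replaces the repeated ? symbols, re.VERBOSE is used so the pattern can carry comments, and re.fullmatch() stands in for the ^ and $ anchors. Since every part of the pattern is optional, the empty string would match, so it is rejected explicitly.

```python
import re

ROMAN_PATTERN = re.compile(
    r"""
    M{0,3}              # thousands: 0 to 3 M characters
    (CM|CD|D?C{0,3})    # hundreds: 900, 400, or 0 to 300 with an optional D
    (XC|XL|L?X{0,3})    # tens: 90, 40, or 0 to 30 with an optional L
    (IX|IV|V?I{0,3})    # ones: 9, 4, or 0 to 3 with an optional V
    """,
    re.VERBOSE,
)

def is_roman_numeral(text):
    """Return True if text is a well-formed Roman numeral from I to MMMCMXCIX."""
    # Reject the empty string, which the pattern alone would accept.
    return bool(text) and ROMAN_PATTERN.fullmatch(text) is not None

for candidate in ["MDLVI", "MCMXLVI", "MMMCCCXLV", "MMMM", "IIII", ""]:
    print(repr(candidate), is_roman_numeral(candidate))
```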

Python (regular expression)
