PYTHON APPLICATION
PROGRAMMING -18EC646
MODULE-3
REGULAR EXPRESSIONS
PROF. KRISHNANANDA L
DEPARTMENT OF ECE
GSKSJTI, BENGALURU
WHAT IS MEANT BY
REGULAR EXPRESSION?
We have already seen string and file slicing, searching and parsing, along
with built-in methods such as split() and find().
Searching and extracting text has applications in
email classification, web searching, etc.
Python has a very powerful regular expression library, the re module,
that handles many of these tasks quite elegantly.
Regular expressions are like a small but powerful programming
language for matching text patterns; they provide a
standardized way of searching, replacing, and parsing text
with complex patterns of characters.
A regular expression can be defined as a sequence of
characters used to search for a pattern in a string.
FEATURES OF REGEX
Hundreds of lines of code can often be reduced to a few lines with regular
expressions
Used to construct compilers, interpreters and text editors
Used to search and match text patterns
The power of regular expressions comes when we add special
characters to the search string that allow us to do sophisticated
matching and extraction with very little code.
Used to validate text data formats, especially input data
A Regular Expression (or Regex) is a pattern (or filter) that describes
the set of strings that match it. A regex consists of a
sequence of ordinary characters, metacharacters and special sequences
(such as . , \d , \W etc.) and operators (such as + , * , ? , | , ^ ).
Popular programming languages like Python, Perl, JavaScript, Ruby,
Tcl, C# etc. have regex capabilities
GENERAL USES OF REGULAR
EXPRESSIONS
Search a string (search and match)
Replace parts of a string (sub)
Break a string into smaller pieces (split)
Find all occurrences of a pattern (findall)
The re module provides regex support in a
Python program. The re module raises an exception (re.error) if the
regular expression itself is invalid.
Before using regular expressions in a program, we have to
import the library using "import re"
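As a minimal sketch of the exception behaviour mentioned above: compiling an invalid pattern makes the re module raise re.error. The pattern "[a-z" (a set that is never closed) and the variable names are assumptions chosen only for illustration.

```python
import re

bad_pattern = "[a-z"   # hypothetical invalid pattern: the set is never closed

try:
    re.compile(bad_pattern)
    compiled_ok = True
except re.error as exc:
    # re.error is the exception class the re module uses for invalid patterns
    compiled_ok = False
    print("Invalid regular expression:", exc)
```

Catching re.error is mainly useful when the pattern comes from user input rather than from the program text.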
REGEX FUNCTIONS
The re module offers a set of functions
FUNCTION   DESCRIPTION
findall    Returns a list containing all matches of a pattern in the string
search     Returns a Match object if there is a match anywhere in the string
split      Returns a list where the string has been split at each match
sub        Replaces one or more matches in a string (substitutes another string)
match      Checks for a match only at the beginning of the string (flags are
           optional). Returns a Match object if found, otherwise None.
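The five functions in the table can be tried on a single string. This is only a sketch: the sample string and the variable names are assumptions, not part of the slides.

```python
import re

text = "How are you. How is everything?"   # assumed sample string

all_hits = re.findall("How", text)        # every match, as a list
first_hit = re.search("is", text)         # Match object for the first match
pieces = re.split(r"\. ", text)           # split at a full stop followed by a space
replaced = re.sub("How", "Where", text)   # substitute every match
at_start = re.match("are", text)          # None: "are" is not at the start

print(all_hits)          # ['How', 'How']
print(first_hit.span())  # (17, 19)
print(pieces)            # ['How are you', 'How is everything?']
print(replaced)          # Where are you. Where is everything?
print(at_start)          # None
```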
EXAMPLE PROGRAM
• We open the file, loop through each line, and use the regular
expression function re.search() to print only the lines that contain
the string "hello". (The same can be done using line.find().)
# Search for lines that contain 'hello'
import re
fp = open('d:/18ec646/demo1.txt')
for line in fp:
    line = line.rstrip()
    if re.search('hello', line):
        print(line)
Output:
hello and welcome to python class
hello how are you?
# Search for lines that contain 'hello'
import re
fp = open('d:/18ec646/demo2.txt')
for line in fp:
    line = line.rstrip()
    if re.search('hello', line):
        print(line)
Output:
friends,hello and welcome
hello,goodmorning
EXAMPLE PROGRAM
• To get the full power of regex, we need to use special
characters called 'metacharacters'
# Search for lines that start with 'hello'
import re
fp = open('d:/18ec646/demo1.txt')
for line in fp:
    line = line.rstrip()
    if re.search('^hello', line):  ## note the 'caret' metacharacter before hello
        print(line)
Output:
hello and welcome to python class
hello how are you?
# Search for lines that start with 'hello'
import re
fp = open('d:/18ec646/demo2.txt')
for line in fp:
    line = line.rstrip()
    if re.search('^hello', line):  ## note the 'caret' metacharacter before hello
        print(line)
Output:
hello, goodmorning
METACHARACTERS
Metacharacters are characters that are interpreted in a
special way by a RegEx engine.
Metacharacters are very helpful for parsing/extraction
from the given file/string
Metacharacters allow us to build more powerful regular
expressions.
Table-1 provides a summary of metacharacters and their
meaning in RegEx
Here's a list of metacharacters:
[ ] . ^ $ * + ? { } ( ) \ |
Metacharacter   Description                                               Example
[ ]             Represents a set of characters.                           "[a-z]"
\               Starts a special sequence (can also be used to            "\r"
                escape special characters).
.               Any single character at that position                     "Ja...v."
                (except the newline character).
^               Pattern present at the beginning of the string            "^python"
                (indicates "starts with").
$               Pattern present at the end of the string                  "world$"
                (indicates "ends with").
*               Zero or more occurrences of the preceding item.           "hello*"
+               One or more occurrences of the preceding item.            "hello+"
{}              The specified number of occurrences of the                "hello{2}"
                preceding item.
|               Either the pattern on the left or the pattern on          "hello|hi"
                the right is present.
()              Capture and group
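A few of the example patterns from the table can be tried directly. The test strings below are assumptions chosen for illustration. Note that a quantifier such as * applies only to the item immediately before it, so "hello*" means "hell" followed by zero or more "o".

```python
import re

starts = bool(re.search("^python", "python is fun"))      # pattern at the start
ends = bool(re.search("world$", "hello world"))           # pattern at the end
star_hits = re.findall("hello*", "hell hello hellooo")    # * binds to the final "o"
either = re.findall("hello|hi", "hi, hello")              # alternation

print(starts)      # True
print(ends)        # True
print(star_hits)   # ['hell', 'hello', 'hellooo']
print(either)      # ['hi', 'hello']
```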
[ ] - SQUARE BRACKETS
• Square brackets specify a set of characters you wish to match.
• A set is a group of characters given inside a pair of square brackets; the set
matches any single character it contains.
Set          Meaning
[abc]        Matches if the string contains any of the specified characters (a, b or c).
[a-n]        Matches if the string contains any character from a to n.
[^arn]       Matches if the string contains any character other than a, r and n.
[0123]       Matches if the string contains any of the specified digits.
[0-9]        Matches if the string contains any digit from 0 to 9.
[0-5][0-9]   Matches if the string contains any two-digit number from 00 to 59.
[a-zA-Z]     Matches if the string contains any letter (lower-case or upper-case).
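The sets above can be checked with re.findall(), which lists each matching piece. This is a sketch only; the sample string and variable names are assumptions.

```python
import re

sample = "Room 42 opens at 09:45"   # assumed sample string

digits = re.findall("[0-9]", sample)              # one digit at a time
two_digit = re.findall("[0-5][0-9]", sample)      # two-digit values from 00 to 59
not_letters = re.findall("[^a-zA-Z ]", sample)    # anything except letters and space

print(digits)        # ['4', '2', '0', '9', '4', '5']
print(two_digit)     # ['42', '09', '45']
print(not_letters)   # ['4', '2', '0', '9', ':', '4', '5']
```

A caret has its "negation" meaning only as the first character inside the brackets; elsewhere in a set it is an ordinary character.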
CONTD..
### illustrating square brackets
## search for all the lines where w is present and display them
import re
fh = open('d:/18ec646/demo5.txt')
for line in fh:
    line = line.rstrip()
    if re.search("[w]", line):
        print(line)
Output:
Hello and welcome
@abhishek,how are you
### illustrating square brackets
### search for lines containing g or e (or both) and display them
import re
fh = open('d:/18ec646/demo3.txt')
for line in fh:
    line = line.rstrip()
    if re.search("[ge]", line):
        print(line)
Output:
Hello and welcome
This is Bangalore
CONTD…
### illustrating square brackets
### search for lines containing t or h
import re
fh = open('d:/18ec646/demo3.txt')
for line in fh:
    line = line.rstrip()
    if re.search("[th]", line):
        print(line)
Output:
This is Bangalore
This is Paris
This is London
### illustrating square brackets
import re
fh = open('d:/18ec646/demo7.txt')
for line in fh:
    line = line.rstrip()
    if re.search("[y]", line):
        print(line)
Output:
johny johny yes papa
open your mouth
### illustrating square brackets
### search for lines containing x, y or z
import re
fh = open('d:/18ec646/demo5.txt')
for line in fh:
    line = line.rstrip()
    if re.search("[x-z]", line):
        print(line)
Output:
to:abhishek@yahoo.com
@abhishek,how are you
. PERIOD (DOT)
A period matches any single character (except the newline character '\n')
Expression             String   Matched?
..                     a        No match
(any two characters)   ac       1 match
                       acd      1 match
                       acde     2 matches (contains 4 characters)
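The match counts in the table can be reproduced with re.findall(), which returns non-overlapping matches from left to right. A small sketch (the variable names are assumptions):

```python
import re

strings = ["a", "ac", "acd", "acde"]

# Count how many non-overlapping two-character matches each string contains
dot_counts = {s: len(re.findall("..", s)) for s in strings}

for s in strings:
    print(s, re.findall("..", s))
# a []
# ac ['ac']
# acd ['ac']
# acde ['ac', 'de']
```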
### illustrating dot metacharacter
## y followed by any one character
import re
fh = open('d:/18ec646/demo5.txt')
for line in fh:
    line = line.rstrip()
    if re.search("y.", line):
        print(line)
Output:
to: abhishek@yahoo.com
@abhishek,how are you
CONTD..
### illustrating dot metacharacter
import re
fh = open('d:/18ec646/demo3.txt')
for line in fh:
    line = line.rstrip()
    if re.search("P.", line):
        print(line)
Output:
This is Paris
### illustrating dot metacharacter
## any two characters between T and s
import re
fh = open('d:/18ec646/demo6.txt')
for line in fh:
    line = line.rstrip()
    if re.search("T..s", line):
        print(line)
Output:
This is London
These are beautiful flowers
Thus we see the great London bridge
### illustrating dot metacharacter
## any two characters between L and d
import re
fh = open('d:/18ec646/demo6.txt')
for line in fh:
    line = line.rstrip()
    if re.search("L..d", line):
        print(line)
Output:
This is London
Thus we see the great London bridge
^ - CARET
The caret symbol ^ is used to check whether a string starts with a certain
character or pattern
Expression   String   Matched?
^a           a        1 match
             abc      1 match
             bac      No match
^ab          abc      1 match
             acb      No match (starts with a but is not followed by b)
### illustrating caret
import re
fh = open('d:/18ec646/demo2.txt')
for line in fh:
    line = line.rstrip()
    if re.search("^h", line):
        print(line)
Output:
hello, goodmorning
### illustrating caret
import re
fh = open('d:/18ec646/demo5.txt')
for line in fh:
    line = line.rstrip()
    if re.search("^f", line):
        print(line)
Output:
from:krishna.sksj@gmail.com
$ - DOLLAR
The dollar symbol $ is used to check whether a string ends with a certain
character or pattern.
Expression   String    Matched?
a$           a         1 match
             formula   1 match
             cab       No match
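The rows of the caret and dollar tables can be checked in a couple of lines. This is a sketch; the list names are assumptions.

```python
import re

# Strings from the ^a table that start with "a"
caret_hits = [s for s in ["a", "abc", "bac"] if re.search("^a", s)]

# Strings from the a$ table that end with "a"
dollar_hits = [s for s in ["a", "formula", "cab"] if re.search("a$", s)]

print(caret_hits)    # ['a', 'abc']
print(dollar_hits)   # ['a', 'formula']
```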
### illustrating metacharacters
## lines that end with m
import re
fh = open('d:/18ec646/demo5.txt')
for line in fh:
    line = line.rstrip()
    if re.search("m$", line):
        print(line)
Output:
from:krishna.sksj@gmail.com
to: abhishek@yahoo.com
### illustrating metacharacters
## lines that end with papa
import re
fh = open('d:/18ec646/demo7.txt')
for line in fh:
    line = line.rstrip()
    if re.search("papa$", line):
        print(line)
Output:
johny johny yes papa
eating sugar no papa
* - STAR
The star symbol * matches zero or more occurrences of the pattern to its
left.
Expression   String   Matched?
ma*n         mn       1 match
             man      1 match
             maaan    1 match
             main     No match (a is not followed by n)
### illustrating metacharacters
## "London*" means "Londo" followed by zero or more n
import re
fh = open('d:/18ec646/demo6.txt')
for line in fh:
    line = line.rstrip()
    if re.search("London*", line):
        print(line)
Output:
This is London
Thus we see the great London bridge
+ - PLUS
The plus symbol + matches one or more occurrences of the pattern to its
left.
Expression   String   Matched?
ma+n         mn       No match (no a character)
             man      1 match
             maaan    1 match
             main     No match (a is not followed by n)
### illustrating metacharacters
## "see+" means "se" followed by one or more e
import re
fh = open('d:/18ec646/demo6.txt')
for line in fh:
    line = line.rstrip()
    if re.search("see+", line):
        print(line)
Output:
Thus we see the great London bridge
### illustrating metacharacters
## "ar+" means a followed by one or more r
import re
fh = open('d:/18ec646/demo6.txt')
for line in fh:
    line = line.rstrip()
    if re.search("ar+", line):
        print(line)
Output:
These are beautiful flowers
? - QUESTION MARK
The question mark symbol ? matches zero or one occurrence of the pattern to its
left.
Expression   String   Matched?
ma?n         mn       1 match
             man      1 match
             maaan    No match (more than one a character)
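The three quantifiers *, + and ? can be compared side by side on the strings used in the tables. A minimal sketch (the dictionary name is an assumption):

```python
import re

strings = ["mn", "man", "maaan", "main"]

# For each pattern, collect the strings in which it finds a match
quantifier_results = {
    pattern: [s for s in strings if re.search(pattern, s)]
    for pattern in ["ma*n", "ma+n", "ma?n"]
}

for pattern, matched in quantifier_results.items():
    print(pattern, matched)
# ma*n ['mn', 'man', 'maaan']
# ma+n ['man', 'maaan']
# ma?n ['mn', 'man']
```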
### illustrating metacharacters
## "@gmail?" means "@gmai" followed by an optional l
import re
fh = open('d:/18ec646/demo5.txt')
for line in fh:
    line = line.rstrip()
    if re.search("@gmail?", line):
        print(line)
Output:
from:krishna.sksj@gmail.com
### illustrating metacharacters
## "you?" means "yo" followed by an optional u
import re
fh = open('d:/18ec646/demo5.txt')
for line in fh:
    line = line.rstrip()
    if re.search("you?", line):
        print(line)
Output:
@abhishek,how are you
{} - BRACES
Finds the specified number of occurrences of a pattern. {n,m} means
at least n and at most m repetitions of the pattern to its left.
{n} means exactly n repetitions: a{2} matches a repeated exactly twice.
Expression   String        Matched?
a{2,3}       abc dat       No match
             abc daat      1 match (aa in daat)
             aabc daaat    2 matches (aa in aabc and aaa in daaat)
             aabc daaaat   2 matches (aa in aabc and aaa in daaaat)
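The table rows can be reproduced with re.findall(). This is a sketch; note that {2,3} is greedy, so in "daaaat" it takes three a characters and the single a left over is too short to match again.

```python
import re

strings = ["abc dat", "abc daat", "aabc daaat", "aabc daaaat"]

brace_hits = {s: re.findall("a{2,3}", s) for s in strings}

for s, hits in brace_hits.items():
    print(repr(s), hits)
# 'abc dat' []
# 'abc daat' ['aa']
# 'aabc daaat' ['aa', 'aaa']
# 'aabc daaaat' ['aa', 'aaa']
```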
| - ALTERNATION
The vertical bar | is used for alternation (the "or" operator).
Expression   String   Matched?
a|b          cde      No match
             ade      1 match (a in ade)
             acdbea   3 matches (a, b and a in acdbea)
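A short sketch that lists the individual matches for each row of the table (the dictionary name is an assumption):

```python
import re

alternation_hits = {s: re.findall("a|b", s) for s in ["cde", "ade", "acdbea"]}

for s, hits in alternation_hits.items():
    print(s, hits)
# cde []
# ade ['a']
# acdbea ['a', 'b', 'a']
```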
### illustrating metacharacters
## lines that contain yes or no
import re
fh = open('d:/18ec646/demo7.txt')
for line in fh:
    line = line.rstrip()
    if re.search("yes|no", line):
        print(line)
Output:
johny johny yes papa
eating sugar no papa
### illustrating metacharacters
## lines that contain hello or how
import re
fh = open('d:/18ec646/demo2.txt')
for line in fh:
    line = line.rstrip()
    if re.search("hello|how", line):
        print(line)
Output:
friends,hello and welcome
hello,goodmorning
() - GROUP
Parentheses () are used to group sub-patterns.
For example, (a|b|c)xz matches any string that contains
a, b or c followed by xz
Expression   String      Matched?
(a|b|c)xz    ab xz       No match
             abxz        1 match (bxz in abxz)
             axz cabxz   2 matches (axz, and bxz in cabxz)
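A sketch of the grouping example. One detail worth knowing: when a pattern contains a group, re.findall() returns the text captured by the group rather than the whole match, so re.finditer() with group(0) is used here to see the full matches. The variable names are assumptions.

```python
import re

pattern = "(a|b|c)xz"

# Whole matches, taken from each Match object
whole_matches = [m.group(0) for m in re.finditer(pattern, "axz cabxz")]

# With one group in the pattern, findall returns only the captured part
captured = re.findall(pattern, "axz cabxz")

# The space between "ab" and "xz" prevents a match
no_match = re.search(pattern, "ab xz")

print(whole_matches)   # ['axz', 'bxz']
print(captured)        # ['a', 'b']
print(no_match)        # None
```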
### illustrating metacharacters
## hello or how, followed by " are"
import re
fh = open('d:/18ec646/demo5.txt')
for line in fh:
    line = line.rstrip()
    if re.search("(hello|how) are", line):
        print(line)
Output:
@abhishek,how are you
### illustrating metacharacters
import re
fh = open('d:/18ec646/demo2.txt')
for line in fh:
    line = line.rstrip()
    if re.search("(hello and)", line):
        print(line)
Output:
friends,hello and welcome
\ - BACKSLASH
The backslash \ is used to escape various characters, including all
metacharacters.
For example, \$a matches if a string contains $ followed by a.
Here, $ is not interpreted by the RegEx engine in a special way.
If you are unsure whether a character has a special meaning or not, you
can put \ in front of it. This makes sure the character is not treated
in a special way.
NOTE: Another way of doing this is to put the special
character inside square brackets [ ]
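A sketch of both escaping approaches, plus re.escape(), which adds the backslashes for you. The sample string is an assumption chosen for illustration.

```python
import re

price = "cost: $a and $b"   # assumed sample string

escaped = re.findall(r"\$a", price)    # backslash makes $ an ordinary character
in_set = re.findall("[$]a", price)     # inside [ ] the $ is also ordinary
unescaped = re.findall("$a", price)    # here $ means "end of string": no match
auto_escaped = re.escape("$a")         # builds the escaped pattern text

print(escaped)        # ['$a']
print(in_set)         # ['$a']
print(unescaped)      # []
print(auto_escaped)   # \$a
```

Writing patterns as raw strings (r"...") avoids having to double each backslash in Python source.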
SPECIAL SEQUENCES
A special sequence is a \ followed by one of the characters
(see Table) and has a special meaning
Special sequences make commonly used patterns easier to
write.
SPECIAL SEQUENCES
Sequence   Description                                                  Example
\A         Matches if the specified characters are at the               r"\AThe"
           beginning of the string.
\b         Matches if the specified characters are at the               r"\bain"
           beginning or the end of a word.                              r"ain\b"
\B         Matches if the specified characters are present but          r"\Bain"
           NOT at the beginning (or the end) of a word.                 r"ain\B"
\d         Matches any digit (0-9).                                     r"\d"
\D         Matches any character that is not a digit.                   r"\D"
\s         Matches any whitespace character.                            r"\s"
\S         Matches any non-whitespace character.                        r"\S"
\w         Matches any word character (A to Z, a to z, 0 to 9           r"\w"
           and underscore).
\W         Matches any non-word character.                              r"\W"
\A - Matches if the specified characters are at the start of a string.
Expression   String       Matched?
\Athe        the sun      Match
             In the sun   No match
\b - Matches if the specified characters are at the beginning or end of a word
Expression   String          Matched?
\bfoo        football        Match
             a football      Match
             afootball       No match
foo\b        football        No match
             the afoo test   Match
             the afootest    No match
\B - Opposite of \b. Matches if the specified characters
are not at the beginning or end of a word.
Expression   String          Matched?
\Bfoo        football        No match
             a football      No match
             afootball       Match
foo\B        the foo         No match
             the afoo test   No match
             the afootest    Match
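A sketch that checks the \bfoo and \Bfoo rows. The patterns are written as raw strings because in an ordinary Python string literal "\b" means the backspace character, not a word boundary. The list names are assumptions.

```python
import re

strings = ["football", "a football", "afootball"]

# "foo" at the start of a word
word_start = [s for s in strings if re.search(r"\bfoo", s)]

# "foo" that is NOT at the start of a word
inside_word = [s for s in strings if re.search(r"\Bfoo", s)]

print(word_start)    # ['football', 'a football']
print(inside_word)   # ['afootball']
```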
\d - Matches any decimal digit. Equivalent to [0-9] for ASCII text
\D - Matches any character that is not a decimal digit. Equivalent to [^0-9]
Expression   String     Matched?
\d           12abc3     3 matches (1, 2 and 3)
             Python     No match
Expression   String     Matched?
\D           1ab34"50   3 matches (a, b and ")
             1345       No match
\s - Matches where a string contains any whitespace
character. Equivalent to [ \t\n\r\f\v].
\S - Matches where a string contains any non-whitespace
character. Equivalent to [^ \t\n\r\f\v].
Expression   String              Matched?
\s           Python RegEx        1 match
             PythonRegEx         No match
Expression   String              Matched?
\S           a b                 2 matches (a and b)
             (whitespace only)   No match
\w - Matches any alphanumeric character. Equivalent to [a-zA-Z0-9_] for
ASCII text. Underscore is also considered an alphanumeric character
\W - Matches any non-alphanumeric character. Equivalent
to [^a-zA-Z0-9_]
Expression   String    Matched?
\w           12&":;c   3 matches (1, 2 and c)
             %"> !     No match
Expression   String    Matched?
\W           1a2%c     1 match (%)
             Python    No match
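The match counts in the \d, \D, \s, \w and \W tables can be confirmed with re.findall(). A minimal sketch using the strings from the tables (the variable names are assumptions):

```python
import re

digit_hits = re.findall(r"\d", "12abc3")         # digits
non_digit_hits = re.findall(r"\D", '1ab34"50')   # everything except digits
space_hits = re.findall(r"\s", "Python RegEx")   # whitespace
word_hits = re.findall(r"\w", '12&":;c')         # letters, digits, underscore
non_word_hits = re.findall(r"\W", "1a2%c")       # everything else

print(digit_hits)       # ['1', '2', '3']
print(non_digit_hits)   # ['a', 'b', '"']
print(space_hits)       # [' ']
print(word_hits)        # ['1', '2', 'c']
print(non_word_hits)    # ['%']
```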
\Z - Matches if the specified characters are at the end of a
string.
Expression   String                      Matched?
Python\Z     I like Python               1 match
             I like Python Programming   No match
             Python is fun.              No match
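A sketch of the \Z table, with one extra comparison against $: the dollar sign also matches just before a trailing newline, while \Z matches only at the very end of the string. The variable names and the newline example are assumptions added for illustration.

```python
import re

strings = ["I like Python", "I like Python Programming", "Python is fun."]
ends_with_python = [s for s in strings if re.search(r"Python\Z", s)]

with_newline = "I like Python\n"
dollar_matches = re.search("Python$", with_newline) is not None    # before the final \n
z_matches = re.search(r"Python\Z", with_newline) is not None       # must be the very end

print(ends_with_python)   # ['I like Python']
print(dollar_matches)     # True
print(z_matches)          # False
```

This is why the file examples in these slides call rstrip() on each line before testing the end of the line.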
# check whether the specified
# characters are at the end of the string
import re
fp = open('d:/18ec646/demo5.txt')
for x in fp:
    x = x.rstrip()
    if re.findall(r"com\Z", x):
        print(x)
Output:
from:krishna.sksj@gmail.com
to: abhishek@yahoo.com
THE FINDALL() FUNCTION
The findall() function returns a list containing all matches.
The list contains the matches in the order they are found.
If no matches are found, an empty list is returned.
Here is the syntax for this function:
re.findall(pattern, string, flags=0)
import re
text = "How are you. How is everything?"
matches = re.findall("How", text)
print(matches)
Output:
['How', 'How']
# check whether the string starts with How
import re
text = "How are you. How is everything?"
x = re.findall("^How", text)
print(text)
print(x)
if x:
    print("string starts with 'How'")
else:
    print("string does not start with 'How'")
Output:
How are you. How is everything?
['How']
string starts with 'How'
# match all lines that start with 'hello'
import re
fp = open('d:/18ec646/demo1.txt')
for x in fp:
    x = x.rstrip()
    if re.findall('^hello', x):  ## note the 'caret' metacharacter
        print(x)
Output:
hello and welcome to python class
hello how are you?
# match all lines that start with '@'
import re
fp = open('d:/18ec646/demo5.txt')
for x in fp:
    x = x.rstrip()
    if re.findall('^@', x):  ## note the 'caret' metacharacter
        print(x)
Output:
@abhishek,how are you
# check whether the string contains
# non-digit characters
import re
fp = open('d:/18ec646/demo5.txt')
for x in fp:
    x = x.rstrip()
    if re.findall(r"\D", x):  ## special sequence
        print(x)
Output:
from:krishna.sksj@gmail.com
to:abhishek@yahoo.com
Hello and welcome
@abhishek,how are you
THE SEARCH() FUNCTION
The search() function searches the string for a match, and
returns a Match object if there is a match.
If there is more than one match, only the first occurrence
of the match will be returned
If no matches are found, the value None is returned
Here is the syntax for this function −
re.search(pattern, string, flags=0)
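A minimal sketch of the behaviour described above: search() reports only the first occurrence and returns None when nothing matches. The sample string and the variable names are assumptions.

```python
import re

text = "How are you. How is everything?"   # assumed sample string

first = re.search("How", text)     # two occurrences exist; only the first is returned
missing = re.search("Why", text)   # no occurrence at all

print(first.group(), first.span())   # How (0, 3)
print(missing)                       # None

# Because None is falsy, the result can be tested directly
if missing:
    print("found")
else:
    print("not found")
```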
THE SPLIT() FUNCTION
The re.split method splits the string where there is a match
and returns a list of strings where the splits have occurred.
You can pass maxsplit argument to the re.split() method. It's
the maximum number of splits that will occur.
If the pattern is not found, re.split() returns a list containing
the original string.
Here is the syntax for this function −
re.split(pattern, string, maxsplit=0, flags=0)
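A sketch of the maxsplit argument and of the no-match case, using a line that appears in the demo7 outputs of these slides. The variable names are assumptions.

```python
import re

line = "johny johny yes papa"

all_parts = re.split(" ", line)               # split at every space
two_parts = re.split(" ", line, maxsplit=1)   # at most one split
unsplit = re.split("@", line)                 # pattern not found

print(all_parts)   # ['johny', 'johny', 'yes', 'papa']
print(two_parts)   # ['johny', 'johny yes papa']
print(unsplit)     # ['johny johny yes papa']
```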
EXAMPLES on split() function:
# split function
import re
fp = open('d:/18ec646/demo5.txt')
for x in fp:
    x = x.rstrip()
    x = re.split("@", x)
    print(x)
Output:
['from:krishna.sksj', 'gmail.com']
['to: abhishek', 'yahoo.com']
['Hello and welcome']
['', 'abhishek,how are you']
# split function
import re
fp = open('d:/18ec646/demo7.txt')
for x in fp:
    x = x.rstrip()
    x = re.split("e", x)
    print(x)
Output:
['johny johny y', 's papa']
['', 'ating sugar no papa']
['t', 'lling li', 's']
['op', 'n your mouth']
# split function
import re
fp = open('d:/18ec646/demo7.txt')
for x in fp:
    x = x.rstrip()
    x = re.split("papa", x)
    print(x)
Output:
['johny johny yes ', '']
['eating sugar no ', '']
['telling lies']
['open your mouth']
# split function
import re
fp = open('d:/18ec646/demo3.txt')
for x in fp:
    x = x.rstrip()
    x = re.split("is", x)
    print(x)
Output:
['Hello and welcome']
['Th', ' ', ' Bangalore']
['Th', ' ', ' Par', '']
['Th', ' ', ' London']
THE SUB() FUNCTION
The sub() function replaces the matches with the text of your
choice
You can control the number of replacements by specifying
the count parameter
If the pattern is not found, re.sub() returns the original string
Here is the syntax for this function −
re.sub(pattern, repl, string, count=0, flags=0)
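A sketch of the count parameter and of the no-match case, reusing the string from the slides' sub() example. The variable names are assumptions.

```python
import re

text = "How are you.How is everything?"

first_only = re.sub("How", "where", text, count=1)   # replace only the first match
unchanged = re.sub("Why", "where", text)             # pattern not found

print(first_only)   # where are you.How is everything?
print(unchanged)    # How are you.How is everything?
```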
EXAMPLES on sub() function:
### illustration of substitute (replace)
import re
text = "How are you.How is everything?"
x = re.sub("How", "where", text)
print(x)
Output:
where are you.where is everything?
# sub function
import re
fp = open('d:/18ec646/demo3.txt')
for x in fp:
    x = x.rstrip()
    x = re.sub("This", "Where", x)
    print(x)
Output:
Hello and welcome
Where is Bangalore
Where is Paris
Where is London
THE MATCH() FUNCTION
re.match() checks for a match only at the beginning of the string: if zero
or more characters at the beginning of the string match the regular
expression, it returns a corresponding Match object.
It returns None if the string does not match the pattern.
Here is the syntax for this function:
re.match(pattern, string, flags=0)
For a compiled pattern object, the equivalent method is
Pattern.match(string[, pos[, endpos]]); the optional pos and endpos
parameters have the same meaning as for the Pattern.search() method.
search() Vs match()
Python offers two different primitive operations based on
regular expressions:
• re.match() checks for a match only at the beginning of the string,
while re.search() checks for a match anywhere in the string
Eg:
# match function
import re
fp = open('d:/18ec646/demo3.txt')
for x in fp:
    x = x.rstrip()
    if re.match("This", x):
        print(x)
Output:
This is Bangalore
This is Paris
This is London
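To see the difference between match() and search() on one string, here is a sketch using a line that appears in the demo2 outputs of these slides. The variable names are assumptions.

```python
import re

line = "friends,hello and welcome"

at_beginning = re.match("hello", line)   # "hello" is not at the start
anywhere = re.search("hello", line)      # found later in the string
anchored = re.search("^hello", line)     # ^ makes search behave like match here

print(at_beginning)      # None
print(anywhere.span())   # (8, 13)
print(anchored)          # None
```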
MATCH OBJECT
A Match Object is an object containing information about the
search and the result
If there is no match, the value None will be returned, instead
of the Match Object
Some of the commonly used methods and attributes of match
objects are:
match.group(), match.start(), match.end(), match.span(),
match.string
match.group()
The group() method returns the part of the string where
there is a match
match.start(), match.end()
The start() method returns the index of the start of the
matched substring.
Similarly, end() returns the index just past the end of the matched
substring.
match.string
The string attribute returns the string passed to the function.
match.span()
The span() function returns a tuple containing start
and end index of the matched part.
Eg:-
OUTPUT:
(12,17)
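A sketch of the Match object methods that yields a span of (12, 17). The sample string "The rain in Spain" and the pattern are assumptions chosen to produce that span; they are not taken from the slides.

```python
import re

text = "The rain in Spain"        # assumed sample string
m = re.search(r"\bS\w+", text)    # a word that starts with "S"

print(m.group())            # Spain
print(m.start(), m.end())   # 12 17
print(m.span())             # (12, 17)
print(m.string)             # The rain in Spain
```

The end index is exclusive, so text[m.start():m.end()] gives back the same text as m.group().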
48

More Related Content

PPTX
Regular expressions in Python
PPT
Adv. python regular expression by Rj
PPT
Intermediate code generation (Compiler Design)
PPTX
Function C programming
PPTX
Variables in python
PDF
Operators in python
PPTX
Database Access With JDBC
Regular expressions in Python
Adv. python regular expression by Rj
Intermediate code generation (Compiler Design)
Function C programming
Variables in python
Operators in python
Database Access With JDBC

What's hot (20)

PPTX
Python variables and data types.pptx
PPTX
Python-Classes.pptx
PPTX
Polymorphism presentation in java
PPT
Functions in C++
PPT
PPT
Strings
PPTX
What is identifier c programming
PDF
C++ references
PPTX
Function in C Programming
PDF
Expression trees
PDF
RMM CD LECTURE NOTES UNIT-3 ALL.pdf
PPTX
Hash table in java
PDF
Function overloading ppt
PPT
C++ classes tutorials
ODP
Perl Introduction
PPTX
Python-Polymorphism.pptx
PDF
Type Checking
PPTX
PL/SQL - CURSORS
PPTX
Interfaces c#
Python variables and data types.pptx
Python-Classes.pptx
Polymorphism presentation in java
Functions in C++
Strings
What is identifier c programming
C++ references
Function in C Programming
Expression trees
RMM CD LECTURE NOTES UNIT-3 ALL.pdf
Hash table in java
Function overloading ppt
C++ classes tutorials
Perl Introduction
Python-Polymorphism.pptx
Type Checking
PL/SQL - CURSORS
Interfaces c#
Ad

Similar to Python regular expressions (20)

PPTX
Regular_Expressions.pptx
PDF
Maxbox starter20
DOCX
Python - Regular Expressions
PPTX
Strings,patterns and regular expressions in perl
PPTX
Unit 1-strings,patterns and regular expressions
ODP
PHP Web Programming
PPTX
chapte_6_String_python_bca_2005_computer
PDF
Module 3 - Regular Expressions, Dictionaries.pdf
PDF
regular-expression.pdf
PPT
Php String And Regular Expressions
PPT
Regular expressions
PPSX
Regular expressions in oracle
PPTX
Processing Regex Python
PPT
Introduction to perl scripting______.ppt
PPTX
Regular Expression
PPTX
Unit 1-array,lists and hashes
PPTX
FAL(2022-23)_FRESHERS_CSE1012_ETH_AP2022234000166_Reference_Material_I_06-Dec...
PPT
Perl Basics with Examples
PPTX
Chapter 3: Introduction to Regular Expression
PPT
Java căn bản - Chapter9
Regular_Expressions.pptx
Maxbox starter20
Python - Regular Expressions
Strings,patterns and regular expressions in perl
Unit 1-strings,patterns and regular expressions
PHP Web Programming
chapte_6_String_python_bca_2005_computer
Module 3 - Regular Expressions, Dictionaries.pdf
regular-expression.pdf
Php String And Regular Expressions
Regular expressions
Regular expressions in oracle
Processing Regex Python
Introduction to perl scripting______.ppt
Regular Expression
Unit 1-array,lists and hashes
FAL(2022-23)_FRESHERS_CSE1012_ETH_AP2022234000166_Reference_Material_I_06-Dec...
Perl Basics with Examples
Chapter 3: Introduction to Regular Expression
Java căn bản - Chapter9
Ad

More from Krishna Nanda (16)

PDF
Python dictionaries
PDF
Python lists
PDF
Python-Tuples
PDF
Python- strings
PDF
Python-files
PDF
Computer Communication Networks- Introduction to Transport layer
PDF
Computer Communication Networks- TRANSPORT LAYER PROTOCOLS
PDF
COMPUTER COMMUNICATION NETWORKS -IPv4
PDF
COMPUTER COMMUNICATION NETWORKS-R-Routing protocols 2
PDF
Computer Communication Networks-Routing protocols 1
PDF
Computer Communication Networks-Wireless LAN
PDF
Computer Communication Networks-Network Layer
PDF
Lk module3
PDF
Lk module4 structures
PDF
Lk module4 file
PDF
Lk module5 pointers
Python dictionaries
Python lists
Python-Tuples
Python- strings
Python-files
Computer Communication Networks- Introduction to Transport layer
Computer Communication Networks- TRANSPORT LAYER PROTOCOLS
COMPUTER COMMUNICATION NETWORKS -IPv4
COMPUTER COMMUNICATION NETWORKS-R-Routing protocols 2
Computer Communication Networks-Routing protocols 1
Computer Communication Networks-Wireless LAN
Computer Communication Networks-Network Layer
Lk module3
Lk module4 structures
Lk module4 file
Lk module5 pointers

Recently uploaded (20)

PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PPT
Mechanical Engineering MATERIALS Selection
PDF
Operating System & Kernel Study Guide-1 - converted.pdf
PPT
Project quality management in manufacturing
PPTX
bas. eng. economics group 4 presentation 1.pptx
PDF
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
PDF
Arduino robotics embedded978-1-4302-3184-4.pdf
PDF
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PDF
Model Code of Practice - Construction Work - 21102022 .pdf
PPTX
Geodesy 1.pptx...............................................
PPTX
Welding lecture in detail for understanding
PPTX
Lesson 3_Tessellation.pptx finite Mathematics
PDF
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
DOCX
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
PPTX
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PPTX
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
PPTX
Internet of Things (IOT) - A guide to understanding
PDF
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
Mechanical Engineering MATERIALS Selection
Operating System & Kernel Study Guide-1 - converted.pdf
Project quality management in manufacturing
bas. eng. economics group 4 presentation 1.pptx
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
Arduino robotics embedded978-1-4302-3184-4.pdf
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
Model Code of Practice - Construction Work - 21102022 .pdf
Geodesy 1.pptx...............................................
Welding lecture in detail for understanding
Lesson 3_Tessellation.pptx finite Mathematics
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
CYBER-CRIMES AND SECURITY A guide to understanding
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
Internet of Things (IOT) - A guide to understanding
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf

Python regular expressions

  • 1. PYTHON APPLICATION PROGRAMMING -18EC646 MODULE-3 REGULAR EXPRESSIONS PROF. KRISHNANANDA L DEPARTMEN T OF ECE GSKSJTI, BENGALURU
  • 2. WHAT IS MEANT BY REGULAR EXPRESSION? We have seen string/file slicing, searching, parsing etc and built-in methods like split, find etc. This task of searching and extracting finds applications in Email classification, Web searching etc. Python has a very powerful library called regularexpressions that handles many of these tasks quite elegantly Regular expressions are like small but powerful programming language, for matching text patterns and provide a standardized way of searching, replacing, and parsing text with complex patterns of characters. Regular expressions can be defined as the sequence of characters which are used to search for a pattern in a string. 2
  • 3. FEATURES OF REGEX Hundreds of lines of code could be reduced to few lines with regular expressions Used to construct compilers, interpreters and text editors Used to search and match text patterns The power of the regular expressions comes when we add special characters to the search string that allow us to do sophisticated matching and extraction with very little code. Used to validate text data formats especially input data ARegular Expression (or Regex) is a pattern (or filter) that describes a set of strings that matches the pattern. A regex consists of a sequence of characters, metacharacters (such as . , d , ?, W etc ) and operators (such as + , * , ? , | , ^ ). Popular programming languages like Python, Perl, JavaScript, Ruby, Tcl, C# etc have Regex capabilities 3
  • 4. GENERAL USES OF REGULAR EXPRESSIONS Search a string (search and match) Replace parts of a string(sub) Break string into small pieces(split) Finding a string (findall) The module re provides the support to use regex in the python program. The re module throws an exception if there is some error while using the regular expression. Before using the regular expressions in program, we have to import the library using “import re” 4
  • 5. REGEX FUNCTIONS The re module offers a set of functions FUNCTION DESCRIPTION findall Returns a list containing all matches of a pattern in the string search Returns a match Object if there is a match anywhere in the string split Returns a list where the string has been split at each match sub Replaces one or more matches in a string (substitute with another string) match This method matches the regex pattern in the string with the optional flag. It returns true if a match is found in the string, otherwise it returns false. 5
  • 6. EXAMPLE PROGRAM • We open the file, loop through each line, and use the regular expression search() to only print out lines that contain the string “hello”. (same can be done using “line.find()” also) # Search for lines that contain ‘hello' import re fp = open('d:/18ec646/demo1.txt') for line in fp: line = line.rstrip() if re.search('hello', line): print(line) Output: hello and welcome to python class hello how are you? # Search for lines that contain ‘hello' import re fp = open('d:/18ec646/demo2.txt') for line in fp: line = line.rstrip() if re.search('hello', line): print(line) Output: friends,hello and welcome hello,goodmorning 6
  • 7. EXAMPLE PROGRAM • To get the optimum performance from Regex, we need to use special characters called ‘metacharacters’ # Search for lines that starts with 'hello' import re fp = open('d:/18ec646/demo1.txt') for line in fp: line = line.rstrip() if re.search('^hello', line): ## note 'caret' metacharacter print(line) ## before hello Output: hello and welcome to python class hello how are you? # Search for lines that starts with 'hello' import re fp = open('d:/18ec646/demo2.txt') for line in fp: line = line.rstrip() if re.search('^hello', line): ## note 'caret' metacharacter print(line) ## before hello Output: hello, goodmorning 7
  • 8. METACHARACTERS
    Metacharacters are characters that are interpreted in a special way by a RegEx engine.
    They are very helpful for parsing and extracting data from a given file or string, and they allow us to build more powerful regular expressions.
    Table-1 provides a summary of metacharacters and their meaning in RegEx.
    Here is the list of metacharacters: [ ] . ^ $ * + ? { } ( ) \ |
  • 9. Metacharacter   Description   Example
    [ ]   Represents a set of characters. Example: "[a-z]"
    \     Signals a special sequence (can also be used to escape special characters). Example: "\r"
    .     Matches any single character at that position (except the newline character). Example: "Ja...v."
    ^     Matches the pattern at the beginning of the string ("starts with"). Example: "^python"
    $     Matches the pattern at the end of the string ("ends with"). Example: "world$"
    *     Zero or more occurrences of the preceding pattern. Example: "hello*"
    +     One or more occurrences of the preceding pattern. Example: "hello+"
    {}    The specified number of occurrences of the preceding pattern. Example: "hello{2}"
    |     Either the pattern on the left or the pattern on the right. Example: "hello|hi"
    ()    Capture and group.
  • 10. [ ] - SQUARE BRACKETS
    Square brackets specify a set of characters you wish to match. A set is a group of characters given inside a pair of square brackets; it matches any one character from that group.
    [abc]        Matches if the string contains any of the specified characters (a, b or c).
    [a-n]        Matches if the string contains any character from a to n.
    [^arn]       Matches if the string contains any character other than a, r and n.
    [0123]       Matches if the string contains any of the specified digits.
    [0-9]        Matches if the string contains any digit from 0 to 9.
    [0-5][0-9]   Matches if the string contains any two-digit number from 00 to 59.
    [a-zA-Z]     Matches if the string contains any letter of the alphabet (lower-case or upper-case).
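    The set patterns above can be tried on short in-memory strings. This is a small sketch; the sample strings and variable names are assumptions chosen for illustration.

```python
import re

# [a-n]: any single character from a to n
in_range = re.findall("[a-n]", "python")
print(in_range)                      # ['h', 'n']

# [^arn]: any single character except a, r and n
first_other = re.search("[^arn]", "banana").group()
only_excluded = re.search("[^arn]", "rrr")
print(first_other)                   # b
print(only_excluded)                 # None

# [0-5][0-9]: a two-digit number from 00 to 59
minutes = re.findall("[0-5][0-9]", "07:45 and 99")
print(minutes)                       # ['07', '45']
```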
  • 11. CONTD..

        ### illustrating square brackets
        import re
        fh = open('d:/18ec646/demo5.txt')
        for line in fh:
            line = line.rstrip()
            if re.search("[w]", line):   ## search all the lines where w is present and display
                print(line)

    Output:
        Hello and welcome
        @abhishek,how are you

    The same program with re.search("[ge]", line) on 'd:/18ec646/demo3.txt' searches for lines that contain g or e or both:
        Hello and welcome
        This is Bangalore
  • 12. CONTD…

        ### illustrating square brackets
        import re
        fh = open('d:/18ec646/demo3.txt')
        for line in fh:
            line = line.rstrip()
            if re.search("[th]", line):
                print(line)

    Output:
        This is Bangalore
        This is Paris
        This is London

    The same program with re.search("[y]", line) on 'd:/18ec646/demo7.txt' gives:
        johny johny yes papa
        open your mouth

    The same program with re.search("[x-z]", line) on 'd:/18ec646/demo5.txt' gives:
        to:abhishek@yahoo.com
        @abhishek,how are you
  • 13. . - PERIOD (DOT)
    A period matches any single character (except the newline '\n').
    Expression   String   Matched?
    ..           a        No match
                 ac       1 match
                 acd      1 match
                 acde     2 matches (contains 4 characters)

        ### illustrating dot metacharacter
        import re
        fh = open('d:/18ec646/demo5.txt')
        for line in fh:
            line = line.rstrip()
            if re.search("y.", line):
                print(line)

    Output:
        to: abhishek@yahoo.com
        @abhishek,how are you
  • 14. CONTD..

        ### illustrating dot metacharacter
        import re
        fh = open('d:/18ec646/demo3.txt')
        for line in fh:
            line = line.rstrip()
            if re.search("P.", line):
                print(line)

    Output:
        This is Paris

    The same program with re.search("T..s", line) on 'd:/18ec646/demo6.txt' (any two characters between T and s) gives:
        This is London
        These are beautiful flowers
        Thus we see the great London bridge

    The same program with re.search("L..d", line) on 'd:/18ec646/demo6.txt' gives:
        This is London
        Thus we see the great London bridge
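    The dot examples can also be checked without the demo files. The sketch below uses made-up strings (an assumption) and findall() so that the matched text itself is visible.

```python
import re

# ".." consumes two characters at a time, so "acde" yields two matches
pairs = re.findall("..", "acde")
print(pairs)      # ['ac', 'de']

# "T..s": a T, any two characters, then an s
words = re.findall("T..s", "This is London. These are flowers. Thus we see")
print(words)      # ['This', 'Thes', 'Thus']

# By default the dot does not match a newline
newline_hit = re.search(".", "\n")
print(newline_hit)   # None
```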
  • 15. ^ - CARET
    The caret symbol ^ is used to check if a string starts with a certain character or pattern.
    Expression   String   Matched?
    ^a           a        1 match
                 abc      1 match
                 bac      No match
    ^ab          abc      1 match
                 acb      No match (starts with a, but a is not followed by b)

        ### illustrating caret
        import re
        fh = open('d:/18ec646/demo2.txt')
        for line in fh:
            line = line.rstrip()
            if re.search("^h", line):
                print(line)

    Output:
        hello, goodmorning

    The same program with re.search("^f", line) on 'd:/18ec646/demo5.txt' gives:
        from:krishna.sksj@gmail.com
  • 16. $ - DOLLAR
    The dollar symbol $ is used to check if a string ends with a certain character or pattern.
    Expression   String    Matched?
    a$           a         1 match
                 formula   1 match
                 cab       No match

        ### illustrating metacharacters
        import re
        fh = open('d:/18ec646/demo5.txt')
        for line in fh:
            line = line.rstrip()
            if re.search("m$", line):
                print(line)

    Output:
        from:krishna.sksj@gmail.com
        to: abhishek@yahoo.com

    The same program with re.search("papa$", line) on 'd:/18ec646/demo7.txt' gives:
        johny johny yes papa
        eating sugar no papa
  • 17. * - STAR
    The star symbol * matches zero or more occurrences of the character (or group) immediately to its left.
    Expression   String   Matched?
    ma*n         mn       1 match
                 man      1 match
                 maaan    1 match
                 main     No match (a is not followed by n)

        ### illustrating metacharacters
        import re
        fh = open('d:/18ec646/demo6.txt')
        for line in fh:
            line = line.rstrip()
            if re.search("London*", line):   # the star applies only to the final n
                print(line)

    Output:
        This is London
        Thus we see the great London bridge
  • 18. + - PLUS
    The plus symbol + matches one or more occurrences of the character (or group) immediately to its left.
    Expression   String   Matched?
    ma+n         mn       No match (no a character)
                 man      1 match
                 maaan    1 match
                 main     No match (a is not followed by n)

        ### illustrating metacharacters
        import re
        fh = open('d:/18ec646/demo6.txt')
        for line in fh:
            line = line.rstrip()
            if re.search("see+", line):
                print(line)

    Output:
        Thus we see the great London bridge

    The same program with re.search("ar+", line) gives:
        These are beautiful flowers
  • 19. ? - QUESTION MARK
    The question mark symbol ? matches zero or one occurrence of the character (or group) immediately to its left.
    Expression   String   Matched?
    ma?n         mn       1 match
                 man      1 match
                 maaan    No match (more than one a character)

        ### illustrating metacharacters
        import re
        fh = open('d:/18ec646/demo5.txt')
        for line in fh:
            line = line.rstrip()
            if re.search("@gmail?", line):
                print(line)

    Output:
        from:krishna.sksj@gmail.com

    The same program with re.search("you?", line) gives:
        @abhishek,how are you
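    The three repetition operators *, + and ? can be compared side by side on the strings used in the tables. This is a small sketch; the loop and the results dictionary are illustrative names, not part of the slides.

```python
import re

strings = ["mn", "man", "maaan", "main"]
results = {}

# For each pattern, collect the strings in which re.search() finds a match
for pattern in ["ma*n", "ma+n", "ma?n"]:
    results[pattern] = [s for s in strings if re.search(pattern, s)]
    print(pattern, results[pattern])

# ma*n ['mn', 'man', 'maaan']   zero or more a
# ma+n ['man', 'maaan']         one or more a
# ma?n ['mn', 'man']            zero or one a
```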
  • 20. {} - BRACES
    Finds the specified number of occurrences of a pattern. Consider {n,m}: this means at least n and at most m repetitions of the character (or group) immediately to its left. If a{2} is given, a must be repeated exactly twice.
    Expression   String        Matched?
    a{2,3}       abc dat       No match
                 abc daat      1 match (aa in daat)
                 aabc daaat    2 matches (aa in aabc and aaa in daaat)
                 aabc daaaat   2 matches (aa in aabc and aaa in daaaat)
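    The braces slide has no program, so here is a short sketch that reproduces the rows of the table with findall(). The variable names are illustrative assumptions.

```python
import re

no_hit = re.findall("a{2,3}", "abc dat")
one_hit = re.findall("a{2,3}", "abc daat")
two_hits = re.findall("a{2,3}", "aabc daaaat")
exact = re.findall("a{2}", "daat")

print(no_hit)     # []
print(one_hit)    # ['aa']
print(two_hits)   # ['aa', 'aaa']  (matching is greedy: three a's are taken, one is left over)
print(exact)      # ['aa']
```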
  • 21. | - ALTERNATION
    The vertical bar | is used for alternation (the "or" operator).
    Expression   String   Matched?
    a|b          cde      No match
                 ade      1 match (a in ade)
                 acdbea   3 matches (a, b and a in acdbea)

        ### illustrating metacharacters
        import re
        fh = open('d:/18ec646/demo7.txt')
        for line in fh:
            line = line.rstrip()
            if re.search("yes|no", line):
                print(line)

    Output:
        johny johny yes papa
        eating sugar no papa

    The same program with re.search("hello|how", line) on 'd:/18ec646/demo2.txt' gives:
        friends,hello and welcome
        hello,goodmorning
  • 22. () - GROUP
    Parentheses () are used to group sub-patterns. For example, (a|b|c)xz matches any string that contains a or b or c followed by xz.
    Expression   String      Matched?
    (a|b|c)xz    ab xz       No match
                 abxz        1 match (bxz in abxz)
                 axz cabxz   2 matches (axz, and bxz in cabxz)

        ### illustrating metacharacters
        import re
        fh = open('d:/18ec646/demo5.txt')
        for line in fh:
            line = line.rstrip()
            if re.search("(hello|how) are", line):
                print(line)

    Output:
        @abhishek,how are you

    The same program with re.search("(hello and)", line) on 'd:/18ec646/demo2.txt' gives:
        friends,hello and welcome
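    Besides grouping, parentheses also capture the text they match. The sketch below, on in-memory strings chosen for illustration, shows how the captured part can be read back, and how findall() returns only the captured group when the pattern contains one.

```python
import re

m = re.search("(hello|how) are", "@abhishek,how are you")
whole = m.group()     # the full match
first = m.group(1)    # only the part captured by the parentheses
print(whole)          # how are
print(first)          # how

# With one capturing group, findall() returns the group, not the whole match
captured = re.findall("(a|b|c)xz", "axz cabxz")
# (?:...) groups without capturing, so the whole match is returned
full = re.findall("(?:a|b|c)xz", "axz cabxz")
print(captured)       # ['a', 'b']
print(full)           # ['axz', 'bxz']
```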
  • 23. \ - BACKSLASH
    The backslash is used to escape various characters, including all metacharacters. For example, \$a matches if a string contains $ followed by a. Here, $ is not interpreted by the RegEx engine in a special way.
    If you are unsure whether a character has a special meaning or not, you can put \ in front of it. This makes sure the character is not treated in a special way.
    NOTE: Another way of doing this is to put the special character inside square brackets [ ].
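    A short sketch of both escaping styles, using a made-up string. re.escape() is a standard re function that inserts the backslashes for you; the variable names are illustrative.

```python
import re

price = "cost: $a and $b"

escaped = re.findall(r"\$a", price)    # backslash makes $ a literal character
in_set = re.findall("[$]a", price)     # inside [ ] the $ is also literal
unescaped = re.findall("$a", price)    # here $ means "end of string", so nothing matches

print(escaped)           # ['$a']
print(in_set)            # ['$a']
print(unescaped)         # []
print(re.escape("$a"))   # \$a
```

    The pattern is written as a raw string (r"...") so that Python passes the backslash to the regex engine unchanged.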
  • 24. SPECIAL SEQUENCES
    A special sequence is a \ followed by one of the characters in the table below, and it has a special meaning.
    Special sequences make commonly used patterns easier to write.
  • 25. SPECIAL SEQUENCES
    Write these patterns as raw strings (r"...") so that Python itself does not interpret the backslash.
    Character   Description   Example
    \A   Matches if the specified characters are at the beginning of the string. Example: r"\AThe"
    \b   Matches if the specified characters are at the beginning or at the end of a word. Example: r"\bain", r"ain\b"
    \B   Matches if the specified characters are present but NOT at the beginning (or at the end) of a word. Example: r"\Bain", r"ain\B"
    \d   Matches if the string contains a digit [0-9]. Example: r"\d"
    \D   Matches if the string contains a character that is not a digit. Example: r"\D"
    \s   Matches if the string contains a white space character. Example: r"\s"
    \S   Matches if the string contains a character that is not white space. Example: r"\S"
    \w   Matches if the string contains a word character (A to Z, a to z, 0 to 9 and underscore). Example: r"\w"
    \W   Matches if the string contains a character that is not a word character. Example: r"\W"
  • 26. \A - Matches if the specified characters are at the start of a string.
    Expression   String       Matched?
    \Athe        the sun      Match
                 In the sun   No match

    \b - Matches if the specified characters are at the beginning or end of a word.
    Expression   String          Matched?
    \bfoo        football        Match
                 a football      Match
                 afootball       No match
    foo\b        football        No match
                 the afoo test   Match
                 the afootest    No match
  • 27. \B - Opposite of \b. Matches if the specified characters are not at the beginning or end of a word.
    Expression   String          Matched?
    \Bfoo        football        No match
                 a football      No match
                 afootball       Match
    foo\B        the foo         No match
                 the afoo test   No match
                 the afootest    Match
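    The word-boundary rows can be checked with a few calls to re.search(). This is a sketch on the same strings as the tables; the helper name found is an assumption introduced only for this example.

```python
import re

def found(pattern, s):
    """Return True if pattern matches anywhere in s (hypothetical helper)."""
    return re.search(pattern, s) is not None

print(found(r"\bfoo", "football"))        # True:  foo starts a word
print(found(r"\bfoo", "afootball"))       # False: foo is inside a word
print(found(r"foo\b", "the afoo test"))   # True:  foo ends a word
print(found(r"\Bfoo", "afootball"))       # True:  foo is not at a word start
print(found(r"foo\B", "the foo"))         # False: foo ends a word

# Without the r prefix, "\b" is a backspace character, not a word boundary
print(found("\bfoo", "football"))         # False
```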
  • 28. \d - Matches any decimal digit. Equivalent to [0-9].
    \D - Matches any non-digit character. Equivalent to [^0-9].
    Expression   String     Matched?
    \d           12abc3     3 matches (1, 2 and 3)
                 Python     No match
    \D           1ab34"50   3 matches (a, b and ")
                 1345       No match
  • 29. \s - Matches where a string contains any whitespace character. Equivalent to [ \t\n\r\f\v].
    \S - Matches where a string contains any non-whitespace character. Equivalent to [^ \t\n\r\f\v].
    Expression   String              Matched?
    \s           Python RegEx        1 match
                 PythonRegEx         No match
    \S           a b                 2 matches (a and b)
                 (whitespace only)   No match
  • 30. \w - Matches any alphanumeric character. Equivalent to [a-zA-Z0-9_]. The underscore is also considered an alphanumeric character.
    \W - Matches any non-alphanumeric character. Equivalent to [^a-zA-Z0-9_].
    Expression   String    Matched?
    \w           12&":;c   3 matches (1, 2 and c)
                 %"> !     No match
    \W           1a2%c     1 match (%)
                 Python    No match
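    The character-class sequences can be verified against the table rows with findall(). This is a small sketch; the variable names are illustrative. The [0-9]-style equivalents on the slides describe ASCII text; for Python 3 str patterns these classes also accept other Unicode digits, letters and spaces.

```python
import re

digits = re.findall(r"\d", "12abc3")
non_digits = re.findall(r"\D", '1ab34"50')
spaces = re.findall(r"\s", "Python RegEx")
non_spaces = re.findall(r"\S", "a b")
word_chars = re.findall(r"\w", '12&":;c')
non_word = re.findall(r"\W", "1a2%c")

print(digits)       # ['1', '2', '3']
print(non_digits)   # ['a', 'b', '"']
print(spaces)       # [' ']
print(non_spaces)   # ['a', 'b']
print(word_chars)   # ['1', '2', 'c']
print(non_word)     # ['%']
```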
  • 31. \Z - Matches if the specified characters are at the end of a string.
    Expression   String                       Matched?
    Python\Z     I like Python                1 match
                 I like Python Programming    No match
                 Python is fun.               No match

        # check whether the specified characters are at the end of the string
        import re
        fp = open('d:/18ec646/demo5.txt')
        for x in fp:
            x = x.rstrip()
            if re.findall(r"com\Z", x):
                print(x)

    Output:
        from:krishna.sksj@gmail.com
        to: abhishek@yahoo.com
  • 32. REGEX FUNCTIONS
    The re module offers a set of functions:
    FUNCTION   DESCRIPTION
    findall    Returns a list containing all matches of a pattern in the string
    search     Returns a Match object if there is a match anywhere in the string
    split      Returns a list where the string has been split at each match
    sub        Replaces one or more matches in a string (substitutes another string)
    match      Matches the regex pattern only at the beginning of the string, with optional flags. Returns a Match object if a match is found, otherwise None.
  • 33. THE FINDALL() FUNCTION
    The findall() function returns a list containing all matches, in the order they are found. If no matches are found, an empty list is returned.
    Syntax: re.findall(pattern, string, flags=0)

        import re
        text = "How are you. How is everything?"   # named text to avoid shadowing the built-in str
        matches = re.findall("How", text)
        print(matches)

    Output:
        ['How', 'How']
  • 35. CONTD..

        # check whether the string starts with How
        import re
        text = "How are you. How is everything?"
        x = re.findall("^How", text)
        print(text)
        print(x)
        if x:
            print("string starts with 'How'")
        else:
            print("string does not start with 'How'")

    Output:
        How are you. How is everything?
        ['How']
        string starts with 'How'
  • 36. CONTD…

        # match all lines that start with 'hello'
        import re
        fp = open('d:/18ec646/demo1.txt')
        for x in fp:
            x = x.rstrip()
            if re.findall('^hello', x):   ## note the 'caret' metacharacter
                print(x)

    Output:
        hello and welcome to python class
        hello how are you?

    The same program with re.findall('^@', x) on 'd:/18ec646/demo5.txt' matches all lines that start with '@':
        @abhishek,how are you

    The same program with re.findall(r"\D", x) on 'd:/18ec646/demo5.txt' uses a special sequence to check whether the line contains non-digit characters:
        from:krishna.sksj@gmail.com
        to:abhishek@yahoo.com
        Hello and welcome
        @abhishek,how are you
  • 37. THE SEARCH() FUNCTION
    The search() function searches the string for a match and returns a Match object if there is a match.
    If there is more than one match, only the first occurrence is returned.
    If no match is found, the value None is returned.
    Syntax: re.search(pattern, string, flags=0)
  • 38. EXAMPLES on search() function [example code and outputs appear as images on the slide and are not part of this transcript]
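    Since the slide's own examples are not available in the transcript, here is an illustrative sketch of search() instead. It is not the author's original example; the sample sentence is reused from the findall() slides and the variable names are assumptions.

```python
import re

text = "How are you. How is everything?"

first_how = re.search("How", text)    # two occurrences exist, only the first is reported
word_is = re.search("is", text)
missing = re.search("Hello", text)    # no match anywhere in the string

print(first_how.span())   # (0, 3)
print(word_is.span())     # (17, 19)
print(missing)            # None

# Typical usage: test the result before using it
if word_is:
    print("found", word_is.group(), "at index", word_is.start())
```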
  • 39. THE SPLIT() FUNCTION
    The re.split() method splits the string wherever there is a match and returns a list of the resulting strings.
    You can pass the maxsplit argument to re.split(). It is the maximum number of splits that will occur.
    If the pattern is not found, re.split() returns a list containing the original string.
    Syntax: re.split(pattern, string, maxsplit=0, flags=0)
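    A small sketch of the three behaviours described above (a pattern as separator, maxsplit, and a pattern that is not found), using in-memory strings chosen for illustration:

```python
import re

# Split on either ':' or '@' using a character set
parts = re.split("[:@]", "to:abhishek@yahoo.com")
print(parts)       # ['to', 'abhishek', 'yahoo.com']

# maxsplit=1 stops after the first split
limited = re.split(r"\s", "johny johny yes papa", maxsplit=1)
print(limited)     # ['johny', 'johny yes papa']

# Pattern not found: the original string comes back as a one-element list
unchanged = re.split("@", "Hello and welcome")
print(unchanged)   # ['Hello and welcome']
```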
  • 40. EXAMPLES on split() function:-

        # split function
        import re
        fp = open('d:/18ec646/demo5.txt')
        for x in fp:
            x = x.rstrip()
            x = re.split("@", x)
            print(x)

    Output:
        ['from:krishna.sksj', 'gmail.com']
        ['to: abhishek', 'yahoo.com']
        ['Hello and welcome']
        ['', 'abhishek,how are you']
  • 41. CONTD..

        # split function
        import re
        fp = open('d:/18ec646/demo7.txt')
        for x in fp:
            x = x.rstrip()
            x = re.split("e", x)
            print(x)

    Output:
        ['johny johny y', 's papa']
        ['', 'ating sugar no papa']
        ['t', 'lling li', 's']
        ['op', 'n your mouth']

    The same program with re.split("papa", x) on 'd:/18ec646/demo7.txt' gives:
        ['johny johny yes ', '']
        ['eating sugar no ', '']
        ['telling lies']
        ['open your mouth']

    The same program with re.split("is", x) on 'd:/18ec646/demo3.txt' gives:
        ['Hello and welcome']
        ['Th', ' ', ' Bangalore']
        ['Th', ' ', ' Par', '']
        ['Th', ' ', ' London']
  • 42. THE SUB() FUNCTION
    The sub() function replaces the matches with the text of your choice.
    You can control the number of replacements by specifying the count parameter.
    If the pattern is not found, re.sub() returns the original string.
    Syntax: re.sub(pattern, repl, string, count=0, flags=0)
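    The count parameter and the "pattern not found" case can be sketched as follows. The sample sentence is reused from the findall() slides; the variable names are illustrative.

```python
import re

text = "How are you. How is everything?"

all_replaced = re.sub("How", "Where", text)            # count=0 replaces every match
one_replaced = re.sub("How", "Where", text, count=1)   # only the first match
not_found = re.sub("Hello", "Hi", text)                # no match: string unchanged

print(all_replaced)   # Where are you. Where is everything?
print(one_replaced)   # Where are you. How is everything?
print(not_found)      # How are you. How is everything?
```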
  • 43. EXAMPLES on sub() function:-

        ### illustration of substitute (replace)
        import re
        text = "How are you.How is everything?"
        x = re.sub("How", "where", text)
        print(x)

    Output:
        where are you.where is everything?

        # sub function
        import re
        fp = open('d:/18ec646/demo3.txt')
        for x in fp:
            x = x.rstrip()
            x = re.sub("This", "Where", x)
            print(x)

    Output:
        Hello and welcome
        Where is Bangalore
        Where is Paris
        Where is London
  • 44. THE MATCH() FUNCTION
    If zero or more characters at the beginning of the string match the regular expression, match() returns a corresponding Match object. It returns None if the string does not match the pattern.
    Syntax: re.match(pattern, string, flags=0) or, for a compiled pattern, Pattern.match(string[, pos[, endpos]])
    The optional pos and endpos parameters have the same meaning as for the search() method.
  • 45. search() Vs match()
    Python offers two different primitive operations based on regular expressions: re.match() checks for a match only at the beginning of the string, while re.search() checks for a match anywhere in the string.
    Eg:-

        # match function
        import re
        fp = open('d:/18ec646/demo3.txt')
        for x in fp:
            x = x.rstrip()
            if re.match("This", x):
                print(x)

    Output:
        This is Bangalore
        This is Paris
        This is London
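    The difference shows up most clearly when the pattern occurs in the string but not at its start. This is a minimal sketch on one sample line (typed in here rather than read from a file):

```python
import re

line = "friends,hello and welcome"

by_match = re.match("hello", line)      # None: the line does not begin with hello
by_search = re.search("hello", line)    # found at index 8
anchored = re.search("^hello", line)    # None: with ^, search() is also tied to the start

print(by_match)            # None
print(by_search.start())   # 8
print(anchored)            # None
```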
  • 46. MATCH OBJECT
    A Match object is an object containing information about the search and the result.
    If there is no match, the value None is returned instead of a Match object.
    Some of the commonly used methods and attributes of Match objects are: match.group(), match.start(), match.end(), match.span(), match.string
  • 47. match.group()
    The group() method returns the part of the string where there is a match.
    match.start(), match.end()
    The start() method returns the index of the start of the matched substring. Similarly, end() returns the end index of the matched substring.
    match.string
    The string attribute returns the string that was passed to the function.
  • 48. match.span()
    The span() method returns a tuple containing the start and end index of the matched part.
    Eg:- [example code appears as an image on the slide and is not part of this transcript]
    OUTPUT: (12, 17)
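    The slide's own example code is not available in the transcript. The sketch below is one possible input that produces the same (12, 17) output; the sentence and the pattern are assumptions chosen for illustration, not necessarily what the slide used. It also exercises the other Match object members listed earlier.

```python
import re

text = "The rain in Spain"          # assumed sample sentence
m = re.search(r"\bS\w+", text)      # a word that starts with upper-case S

print(m.group())   # Spain
print(m.start())   # 12
print(m.end())     # 17
print(m.span())    # (12, 17)
print(m.string)    # The rain in Spain
```

    The end index is exclusive, so text[m.start():m.end()] gives back the matched substring.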