Regular Expressions for Beginners
Srikanth Modegunta
Introduction

Also referred to as Regex or RegExp

Used to match patterns in text
− Ex: both “maven” and “maeven” are matched by the
regex “mae?ven”

A regular expression is processed by a piece
of software called a “regular expression
engine”

Most languages support regex
− Ex: Perl, Java, C#, etc.
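As a minimal sketch of the “mae?ven” example, here is how it might look in Python's `re` module. Python is an assumption made for illustration (the slides only name Perl, Java and C#), and the word list is hypothetical.

```python
import re

# "?" makes the preceding "e" optional, so both spellings are accepted.
pattern = re.compile(r"mae?ven")

# Hypothetical sample words; fullmatch requires the whole word to match.
words = ["maven", "maeven", "maeeven", "mvn"]
matched = [w for w in words if pattern.fullmatch(w)]
print(matched)  # ['maven', 'maeven']
```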
Introduction (Contd..)

Used wherever text processing is required.

XML or HTML tags can be located with regex, because
tags follow a fixed pattern.
− We will see how to match an XML or HTML tag.

Automation of tasks
− Ex: if a mail subject contains “<operation> <some task
name> <command>”, then start processing the task.

Text editors adding comments to functions
automatically (replacing a pattern with some text)
− Ex: replace
“sub subroutine(parameters){<statements>}” by
/* this is a sample subroutine*/
sub subroutine(parameters){<statements>}
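The comment-insertion example could be sketched as a search-and-replace. This assumes Python's `re.sub`; the pattern for recognising a subroutine header is a hypothetical one written for this illustration, not something taken from the slides.

```python
import re

source = "sub subroutine(parameters){<statements>}"

# Hypothetical header pattern: "sub", a name, a parameter list, then "{".
# The header is captured so the replacement can put it back unchanged.
header = re.compile(r"^(sub\s+\w+\([^)]*\)\{)", re.MULTILINE)

# In the replacement, \n becomes a newline and \1 is the captured header.
commented = header.sub(r"/* this is a sample subroutine*/\n\1", source)
print(commented)
```

The captured group is what lets the original text survive the replacement; only the comment line is new.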
Meta Characters
The following are the meta characters:
\ | ( ) [ { ^ $ * + ? .
Meta Characters (Contd..)
Character   Meaning
*           0 or more
+           1 or more
?           0 or 1 (optional)
.           Any character except new-line
^           Start of line (start of string unless multiline
            mode is on). But [^abc] means any character
            other than 'a', 'b' or 'c'
$           End of line (end of string unless multiline
            mode is on)
\A          Start of string
\Z          End of string
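The difference between the line anchors and the string anchors is easiest to see on a two-line string. The following is a small sketch in Python's `re` module (an assumption for illustration); other engines treat `\Z` slightly differently around a trailing new-line.

```python
import re

text = "first line\nsecond line"

# Without MULTILINE, ^ matches only at the start of the string.
starts_default = re.findall(r"^\w+", text)
# With MULTILINE, ^ also matches right after each new-line.
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)
# \A and \Z ignore MULTILINE: they anchor to the whole string.
first_word = re.findall(r"\A\w+", text, re.MULTILINE)
last_word = re.findall(r"\w+\Z", text, re.MULTILINE)
# "." does not match a new-line by default, so ".+" stops at the line end.
dot_run = re.match(r".+", text).group()

print(starts_default, starts_multiline, first_word, last_word, dot_run)
```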
Meta Characters (Contd..)
Character   Meaning
{ }         Repeats the preceding pattern a known number
            of times
            Ex: a{2,5} matches 'a' repeated at least 2
            and at most 5 times (no space inside the braces).
|           Alternation ('or') between patterns
            Ex: cat|dog|mouse
( )         Groups a sub-pattern and captures the text it matches
[ ]         Matches exactly one character from the set
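A short sketch of these four constructs, assuming Python's `re` module; all sample strings are hypothetical.

```python
import re

# {2,5}: a run of 2 to 5 'a's; the single 'a' is too short to match.
runs = re.findall(r"a{2,5}", "a aa aaaa")
# |: any one of the alternatives.
pets = re.findall(r"cat|dog|mouse", "a dog chased a cat")
# [ ]: exactly one character from the set per match.
vowels = re.findall(r"[aeiou]", "regex")
# ( ): the parenthesized part is captured as group 1.
prefix = re.match(r"(\d{3})-\d{4}", "555-1234").group(1)

print(runs, pets, vowels, prefix)
```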
Quantifiers

Quantifiers specify how many times a pattern repeats
− Ex: in “ear” and “eaaaar” the letter 'a' occurs 1 and 4
times respectively.

If part of a pattern can repeat, a quantifier is needed
to match the repetition.

Both words above are matched by the following
regex
− ea+r means 'a' can occur 1 or more times
Quantifiers (Contd..)
Quantifier  Meaning
*           0 or more times (greedy, or “hungry”, matching)
            Ex: ca* matches c, ca, caa, caaa etc.
            Matches even if the character is absent, and
            otherwise takes as many 'a's as it can.
+           1 or more times (greedy)
            Ex: ca+ matches ca, caa, caaa etc.
{n}         Exactly n times
            Ex: ca{4}r matches caaaar
{m,}        At least m times, with no upper limit (greedy)
            Ex: ca{2,}r matches only if 'a' repeats
            2 or more times.
{m,n}       At least m times and at most n times (greedy)
            Ex: ca{2,3}r matches only if 'a' repeats
            2 or 3 times.
Greedy (hungry) matching means that the quantifier matches as much text as possible.
Ex: for ca{0,4} and the text “caaaa”, the whole text matches, i.e. all 4 'a's are matched.
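The greedy behaviour can be checked directly. This is a minimal sketch assuming Python's `re` module, with sample strings taken from or modelled on the examples above.

```python
import re

text = "caaaa"

# Each greedy quantifier takes as many 'a's as it is allowed to.
star = re.match(r"ca*", text).group()         # all four 'a's
plus = re.match(r"ca+", text).group()         # all four 'a's
bounded = re.match(r"ca{0,4}", text).group()  # all four 'a's
capped = re.match(r"ca{2,3}", text).group()   # stops at the limit of 3

# * also matches when there is no 'a' at all.
star_zero = re.match(r"ca*", "c").group()

# {4} needs exactly four 'a's; {2,} needs two or more.
exact = re.fullmatch(r"ca{4}r", "caaaar") is not None
at_least_two = [w for w in ["car", "caar", "caaaar"]
                if re.fullmatch(r"ca{2,}r", w)]

print(star, plus, bounded, capped, star_zero, exact, at_least_two)
```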
Quantifiers (Contd..)
Quantifier  Meaning
*?          Lazy form of *: 0 or more times, but as few
            as possible
            Ex: if the text is “caaaaaa” then “ca*?”
            matches only 'c'.
+?          Lazy form of +: 1 or more times, but as few
            as possible
            Ex: if the text is “caaaaaa” then “ca+?”
            matches only 'ca'.
??          Lazy form of ?: 0 or 1 times, preferring 0
            Ex: if the text is “ca” then “ca??”
            matches only 'c'.
{min,}?     Lazy forms of the counted quantifiers
{n}?        ({n}? behaves the same as {n}, since the
{min,max}?  count is fixed)
Lazy matching means that the quantifier matches as little text as possible.
Ex: for ca{0,4}? and the text “caaaa”, only “c” is matched.
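A minimal sketch of lazy matching, assuming Python's `re` module. The last two lines use a hypothetical HTML snippet to show why lazy quantifiers are useful in practice; that example is not from the slides.

```python
import re

text = "caaaaaa"

# Lazy quantifiers stop as soon as the overall pattern can succeed.
lazy_star = re.match(r"ca*?", text).group()           # zero 'a's
lazy_plus = re.match(r"ca+?", text).group()           # one 'a'
lazy_opt = re.match(r"ca??", "ca").group()            # zero 'a's
lazy_range = re.match(r"ca{0,4}?", "caaaa").group()   # zero 'a's
lazy_min = re.match(r"ca{2,}?", text).group()         # the minimum, two

# A lazy quantifier still grows when the rest of the pattern needs it.
forced = re.match(r"ca*?r", "caaar").group()

# Greedy vs lazy on a hypothetical HTML snippet.
greedy_tags = re.findall(r"<.+>", "<b>bold</b>")
lazy_tags = re.findall(r"<.+?>", "<b>bold</b>")

print(lazy_star, lazy_plus, lazy_opt, lazy_range, lazy_min, forced)
print(greedy_tags, lazy_tags)
```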
Character Sets

A character set matches one character from the set of
characters

[abcd] is the same as [a-d]

[a-di-l] is the same as [abcdijkl]

[^abcd] matches any character other than
a, b, c, d

Quantifiers can be applied to character sets
− [a-z]+ matches the string 'hello' in
'hello1234E'
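These set rules can be sketched as follows, assuming Python's `re` module; the alphabet string is a hypothetical input chosen to make the ranges visible.

```python
import re

# A quantified set: one or more lowercase letters in a row.
word = re.findall(r"[a-z]+", "hello1234E")
# Two ranges in one set: a-d and i-l.
ranges = re.findall(r"[a-di-l]", "abcdefghijklm")
# A negated set: anything except a, b, c, d.
others = re.findall(r"[^abcd]", "abcdef")

print(word, ranges, others)
```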
Characters for Matching
Common character class shorthands
Shorthand   Equivalent set
\w          [a-zA-Z0-9_]
\d          [0-9]
\s          [ \t\n\r]
\W          [^a-zA-Z0-9_]
\D          [^0-9]
\S          [^ \t\n\r]
\b          Word boundary
\B          Other than a word boundary
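A small sketch of the shorthands, assuming Python's `re` module and hypothetical sample strings. One caveat: in Python 3 string patterns, `\w`, `\d` and `\s` are Unicode-aware and so match somewhat more than the ASCII sets listed above unless the `re.ASCII` flag is given.

```python
import re

text = "Order 66 shipped_to Bob\t(ok)"

digits = re.findall(r"\d+", text)        # runs of digits
words = re.findall(r"\w+", text)         # letters, digits and underscore
pieces = re.split(r"\s+", "a b\tc\nd")   # split on any whitespace run

# \b requires a word boundary, so "cat" inside "concat" is skipped.
whole_cats = re.findall(r"\bcat\b", "cat concat cat's")
# \B requires the absence of a boundary, so only the inner "cat" matches.
inner_cats = re.findall(r"\Bcat", "cat concat")

print(digits, words, pieces, whole_cats, inner_cats)
```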
Simple Matching

modegunta.srikanth@gmail.com
− Mail id should not start with a number or special
symbol
− Mail id can start with _
− Mail id can have '.' in the middle
− Should end with @domain.com (or @domain.co.in)

Pattern:
− [a-zA-Z_][a-zA-Z_.]+@\w+\.(com|co\.in)
− Meta characters must be escaped in the
pattern to match them as normal characters
(here '.' is written as \.)
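The pattern can be tried out as below. This is a sketch assuming Python's `re` module; `is_valid` is a hypothetical helper name and all addresses other than the one on the slide are made up. The pattern is a teaching example and is far looser and stricter in places than real mail-address rules.

```python
import re

# The slide's pattern, with the literal dots escaped.
EMAIL = re.compile(r"[a-zA-Z_][a-zA-Z_.]+@\w+\.(com|co\.in)")
# The same pattern with the dots left unescaped, for comparison only.
UNESCAPED = re.compile(r"[a-zA-Z_][a-zA-Z_.]+@\w+.(com|co.in)")

def is_valid(address: str) -> bool:
    """Hypothetical helper: True if the whole address fits the pattern."""
    return EMAIL.fullmatch(address) is not None

print(is_valid("modegunta.srikanth@gmail.com"))  # True
print(is_valid("_user@example.co.in"))           # True
print(is_valid("1user@gmail.com"))               # False: starts with a digit
print(is_valid("user@gmailxcom"))                # False: no literal dot

# Unescaped, "." matches any character, so the bad address slips through.
accepted_by_mistake = UNESCAPED.fullmatch("user@gmailxcom") is not None
print(accepted_by_mistake)                       # True
```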
Modifiers
Modifier    Meaning
i           Case insensitive
g           Global matching (in Perl)
m           Multiline matching
s           Dot all ('.' matches \n also)
x           Extended regex pattern (allows whitespace and
            comments for a readable layout; ref: Perl)
e           (Used when replacing) evaluate the
            replacement as an expression
            (ref: Perl)
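The modifiers above are written in Perl notation. As a rough sketch of how the same ideas might look in Python's `re` module (an assumption for illustration): `i`, `m`, `s` and `x` correspond to flags, `g` to `re.findall`, and `e` to passing a function as the replacement. The sample strings are hypothetical.

```python
import re

# i: case-insensitive matching; findall already returns every match (g).
any_case = re.findall(r"regex", "Regex REGEX regex", re.IGNORECASE)

# m: ^ matches at the start of every line.
line_numbers = re.findall(r"^\d+", "1 a\n2 b", re.MULTILINE)

# s: "." also matches a new-line.
without_s = re.search(r"a.b", "a\nb") is not None
with_s = re.search(r"a.b", "a\nb", re.DOTALL) is not None

# x: whitespace and comments inside the pattern are ignored.
year_month = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
""", re.VERBOSE)
parts = year_month.match("2024-05").groups()

# e: the replacement is computed by a function for each match.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2),
                 "3 apples, 10 pears")

print(any_case, line_numbers, without_s, with_s, parts, doubled)
```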
Grouping

Groups are captured using parentheses
− (<pattern>)
− Saves the text matched by the group so that a
back reference can reuse it (we will see it later)

Groups capture part of the text matched by the
pattern
− Ex: take a simple XML element
<root>test</root>
− <(\w+)>.*?</\1>
− Here \1 is a back reference to group 1

Java provides the method “group(int)” in the
“java.util.regex.Matcher” class to read a captured group.
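The slide names Java's `Matcher.group(int)`; the sketch below uses the analogous `group()` method of Python's match objects instead, purely as an assumption for illustration. The mismatched element is a hypothetical input.

```python
import re

element = re.compile(r"<(\w+)>.*?</\1>")

match = element.search("<root>test</root>")
whole = match.group(0)     # the complete matched text
tag_name = match.group(1)  # the text captured by the first group

# The back reference \1 must repeat the captured name, so this fails.
mismatch = element.search("<root>test</other>")

print(whole, tag_name, mismatch)
```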
Grouping Example

If the command is
− /sbin/service <service-name> <command>
− ([^\s]+)\s+([\w-]+)\s+(start|stop|status)
− Group 0 = the whole matched text
− Group 1 = “/sbin/service”
− Group 2 = <service-name>
− Group 3 = <command>
− Command can be start, stop or status
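The command pattern can be exercised as follows. This is a sketch assuming Python's `re` module; the service name `httpd` is a hypothetical example, and the second character set is written as `[\w-]` because `\w` already includes the underscore.

```python
import re

COMMAND = re.compile(r"([^\s]+)\s+([\w-]+)\s+(start|stop|status)")

match = COMMAND.fullmatch("/sbin/service httpd start")
whole = match.group(0)
program, service, action = match.groups()

# "restart" is not one of the allowed commands, so nothing matches.
rejected = COMMAND.fullmatch("/sbin/service httpd restart")

print(whole)
print(program, service, action, rejected)
```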
Back References

A back reference refers to the part of the string
matched by the part of the regular expression inside
the parentheses

If the same string occurs multiple times
in the input, we can use a back reference to
match the repeated occurrence

Ex: an XML/HTML start-tag should have a matching end-tag

Here, if we capture the start-tag name in the first
group, we can write the end-tag name as a back
reference (\1)
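Another common use of a back reference is spotting text that repeats. The sketch below, assuming Python's `re` module, finds a doubled word in a hypothetical sentence; this example is an illustration and does not come from the slides.

```python
import re

# (\w+) captures a word; \1 requires the same word again after whitespace.
doubled_word = re.compile(r"\b(\w+)\s+\1\b")

repeats = doubled_word.findall("this is is a test")
no_repeats = doubled_word.findall("this is a test")

print(repeats, no_repeats)
```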
Back references example

For example, take the XML element
− <root id="E12">test</root>
− <([\w-]+)\s*([^<>]+)?>\w+</\1> matches the
XML element
− Group 0: <root id="E12">test</root>
− Group 1: root
− Group 2: id="E12"
− \1 in the regex pattern is the back reference to
group 1.
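The groups listed above can be confirmed with a short sketch, assuming Python's `re` module. The tag-name set is written as `[\w-]` since `\w` already covers the underscore. Regex handles a flat element like this one; nested or irregular markup generally needs a real XML parser.

```python
import re

ELEMENT = re.compile(r"<([\w-]+)\s*([^<>]+)?>\w+</\1>")

match = ELEMENT.search('<root id="E12">test</root>')
whole = match.group(0)       # the complete element
tag_name = match.group(1)    # reused by \1 for the end-tag
attributes = match.group(2)  # optional, so it may be None

# Without attributes the optional second group stays empty.
plain = ELEMENT.search("<root>test</root>")

print(whole, tag_name, attributes, plain.group(2))
```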
No grouping with parenthesis

If the parenthesized pattern does not need to be
captured as a group
− Use ?: at the start of the group: (?:<pattern>)
− (text1|text2|text3) matches any one of text1, text2 and
text3 and captures it as a group
− (?:text1|text2|text3) matches the same text but does
not create a group
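The effect on group numbering can be sketched as below, assuming Python's `re` module; the `-42` suffix is a hypothetical addition so that there is a second group to number.

```python
import re

# Capturing: the alternation is group 1 and the digits are group 2.
capturing = re.search(r"(text1|text2|text3)-(\d+)", "text2-42").groups()

# Non-capturing: the alternation matches the same text, creates no group,
# and the digits become group 1.
non_capturing = re.search(r"(?:text1|text2|text3)-(\d+)", "text2-42").groups()

print(capturing, non_capturing)
```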
Look ahead and Look behind

Positive look-ahead
− \w+(?=:) does not select all words; it selects only words that come
directly before ':'

Negative look-ahead
− \w+(?!:) selects word characters that are not followed by ':'
− Because of backtracking it can still match part of a word that
comes before ':'; use \b\w+\b(?!:) to select only whole words

In a look-ahead, the regex engine checks the text after the current
position against the filtering pattern without consuming it.

Positive look-behind
− (?<=a)b selects 'b' that follows 'a'

Negative look-behind
− (?<!a)b selects 'b' that doesn't follow 'a'

In a look-behind, the regex engine checks the text before the current
position against the filtering pattern without consuming it.
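The four look-around forms can be compared on small inputs. This is a sketch assuming Python's `re` module, with hypothetical sample strings. It also shows the backtracking effect of the plain negative look-ahead and a whole-word variant using `\b`.

```python
import re

text = "key: value"

# Positive look-ahead: the ':' is checked but not part of the match.
before_colon = re.findall(r"\w+(?=:)", text)

# Negative look-ahead: backtracking lets "ke" match, since "ke" is
# followed by "y" rather than ":".
naive_negative = re.findall(r"\w+(?!:)", text)
# Word boundaries force whole words, which excludes "key" entirely.
whole_word_negative = re.findall(r"\b\w+\b(?!:)", text)

# Look-behind: report the positions of the matching 'b's in "ab cb".
after_a = [m.start() for m in re.finditer(r"(?<=a)b", "ab cb")]
not_after_a = [m.start() for m in re.finditer(r"(?<!a)b", "ab cb")]

print(before_colon, naive_negative, whole_word_negative)
print(after_a, not_after_a)
```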
References:
1) http://www.regular-expressions.info/tutorial.html
2) Thinking in Java, 4th Edition, Chapter: Strings, page 392
Thank You
