Regular Expressions

   How do they work
Several important Facts
1. Everything in computing was discovered in
one form or another in the 70-80’s and was
probably thought about during the 60’s.
2. The easiest way to become a great computer
engineer in the 80’s was to work for Bell Labs
and have a beard.
Back to the subject at hand
What are regular expressions?
From Wikipedia:
In computing, a regular expression provides a
concise and flexible means to "match" (specify
and recognize) strings of text, such as particular
characters, words, or patterns of characters.
Common abbreviations for "regular expression"
include regex and regexp.
Why do we need regular expressions
         (in programming)
There are many reasons, but most of them boil down to
finding strings in text.
Preferably without reading it.

^(?("")(""[^""]+?""@)|(([0-9a-z]((\.(?!\.))|[-!#$%&'*+/=?^`{}|~\w])*)(?<=[0-9a-z])@))(?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(([0-9a-z][-\w]*[0-9a-z]*\.)+[a-z0-9]{2,17}))$

^(?=.*[^a-zA-Z])(?=.*[a-z])(?=.*[A-Z])\S{8,}$
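To see what the second (password) pattern does, here is a minimal Python sketch. It assumes the pattern ends in \S{8,}, with the backslash lost when the slide was extracted. The helper name is_acceptable_password is made up for illustration. The first (email) pattern uses .NET-only syntax and is not attempted here.

```python
import re

# Password pattern from the slide. Assumption: "\S" is the intended token.
# Three lookaheads require one non-letter, one lowercase and one uppercase
# character; \S{8,} requires at least 8 non-whitespace characters in total.
PASSWORD_RE = re.compile(r"^(?=.*[^a-zA-Z])(?=.*[a-z])(?=.*[A-Z])\S{8,}$")


def is_acceptable_password(candidate: str) -> bool:
    """Hypothetical helper: True if the candidate satisfies the pattern."""
    return PASSWORD_RE.match(candidate) is not None


if __name__ == "__main__":
    for candidate in ["Passw0rd", "password", "Pass w0rd", "Ab1"]:
        print(candidate, is_acceptable_password(candidate))
```

Only the first candidate passes: the second has no uppercase letter or non-letter, the third contains a space, and the fourth is too short.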
Regular Expressions Syntax
            meta characters
• Grouping
    • . - matches any single character (except newline, in most engines)
    • [ ] - character class, matches a single character that is inside the brackets
    • [^ ] - negated character class, matches a single character that is not inside the brackets
    • ( ) - sub-expression, in Perl can be recalled later from special variables
• Quantifier
    • {m,n} - specifies that the preceding character/sub-expression must be matched
      at least m times and no more than n times
    • * - derived from the Kleene star in formal language theory, matches 0 or more of the
      preceding element
    • ? - matches zero or one of the preceding element
    • + - derived from the Kleene plus in formal language theory, matches 1 or more of the
      preceding element
• Location
    • ^ - marks the start of the line
    • $ - marks the end of the line
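A short Python sketch that exercises each metacharacter from the list above. The sample patterns and the helper name matches are made up for illustration. Python's re module is used only as a convenient engine; other engines may differ in details.

```python
import re

# (pattern, text) pairs, one per metacharacter; every pair should match.
EXAMPLES = [
    (r"c.t", "cat"),            # . matches any single character
    (r"[bc]at", "bat"),         # [ ] matches one character from the set
    (r"[^bc]at", "rat"),        # [^ ] matches one character not in the set
    (r"(ab)\1", "abab"),        # ( ) captures; \1 recalls the captured text
    (r"a{2,3}", "aaa"),         # {m,n} between m and n repetitions
    (r"ab*", "a"),              # * zero or more of the preceding element
    (r"colou?r", "color"),      # ? zero or one of the preceding element
    (r"ab+", "abbb"),           # + one or more of the preceding element
    (r"^start", "start here"),  # ^ anchors at the start of the line
    (r"end$", "the end"),       # $ anchors at the end of the line
]


def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the pattern is found anywhere in text."""
    return re.search(pattern, text) is not None


if __name__ == "__main__":
    for pattern, text in EXAMPLES:
        print(f"{pattern!r} on {text!r}: {matches(pattern, text)}")
```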
Regular Expressions Syntax
               Character groups
• [:alpha:] - Any alphabetical character - [A-Za-z]
• [:alnum:] - Any alphanumeric character - [A-Za-z0-9]
• [:ascii:] - Any character in the ASCII character set
• [:blank:] - A GNU extension, equal to a space or a horizontal tab ("\t")
• [:cntrl:] - Any control character
• [:digit:] - Any decimal digit - [0-9], equivalent to "\d"
• [:graph:] - Any printable character, excluding a space
• [:lower:] - Any lowercase character - [a-z]
• [:print:] - Any printable character, including a space
• [:punct:] - Any graphical character excluding "word" characters
• [:space:] - Any whitespace character. "\s" plus the vertical tab ("\cK")
• [:upper:] - Any uppercase character - [A-Z]
• [:word:] - A Perl extension - [A-Za-z0-9_], equivalent to "\w"
• [:xdigit:] - Any hexadecimal digit - [0-9a-fA-F]
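Not every engine understands these names. Python's standard re module, for example, has no [:name:] syntax. The sketch below spells out a few of the groups using the ASCII ranges from the list above. The table POSIX_CLASSES and the helper expand_posix_classes are made up for illustration, and the naive text replacement only handles simple patterns.

```python
import re

# ASCII-only equivalents for some of the groups, taken from the list above.
POSIX_CLASSES = {
    "[:alpha:]": "A-Za-z",
    "[:alnum:]": "A-Za-z0-9",
    "[:digit:]": "0-9",
    "[:lower:]": "a-z",
    "[:upper:]": "A-Z",
    "[:word:]": "A-Za-z0-9_",
    "[:xdigit:]": "0-9a-fA-F",
}


def expand_posix_classes(pattern: str) -> str:
    """Hypothetical helper: replace [:name:] inside bracket expressions."""
    for name, chars in POSIX_CLASSES.items():
        pattern = pattern.replace(name, chars)
    return pattern


if __name__ == "__main__":
    posix_pattern = "^[[:upper:]][[:lower:]]+-[[:xdigit:]]{4}$"
    python_pattern = expand_posix_classes(posix_pattern)
    print(python_pattern)
    print(bool(re.match(python_pattern, "Token-1aF3")))
```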
What is a regular expression engine
A regular expression engine is a program that takes
a set of constraints specified in a mini-language,
and then applies those constraints to a
target string, and determines whether or not the
string satisfies the constraints.

In less grandiose terms, the first part of the job is to
turn a pattern into something the computer can
efficiently use to find the matching point in the
string, and the second part is performing the search
itself.
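The two parts of the job are visible in most regex APIs. A minimal sketch with Python's re module: re.compile turns the pattern into an internal program, and search runs it against a target string. The sample pattern and the helper name find_first are made up for illustration.

```python
import re

# Part 1: compile the pattern once into something the engine can execute.
WORD_RE = re.compile(r"\b\w+ing\b")


def find_first(text: str):
    """Part 2: run the compiled pattern against a target string.

    Returns (start, end, matched_text), or None when nothing matches.
    """
    match = WORD_RE.search(text)
    if match is None:
        return None
    return match.start(), match.end(), match.group()


if __name__ == "__main__":
    print(find_first("keep searching the string"))
    print(find_first("no match here"))
```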
Famous Regex Engines
Part 2
How the Perl Regex engine works
• Unlike the army, only two steps
  – Compilation
     • Parsing (Size, Construction)
     • Peep-hole optimization and analysis
  – Execution
     • Start position and no-match optimizations
     • Program execution
DFA
DFA
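The speaker notes describe a DFA that accepts all binary numbers ending in 10 or 01. The slide's diagram is not reproduced here, so the following is only a sketch of one such automaton in Python. The state names and the function dfa_accepts are made up for illustration.

```python
# Transition table: state -> {input symbol -> next state}.
# A DFA has exactly one next state for every (state, symbol) pair.
TRANSITIONS = {
    "start":  {"0": "last0",  "1": "last1"},
    "last0":  {"0": "last0",  "1": "ends01"},
    "last1":  {"0": "ends10", "1": "last1"},
    "ends01": {"0": "ends10", "1": "last1"},
    "ends10": {"0": "last0",  "1": "ends01"},
}
ACCEPTING = {"ends01", "ends10"}


def dfa_accepts(text: str) -> bool:
    """Run the DFA over text; accept if it stops in an accepting state."""
    state = "start"
    for symbol in text:
        if symbol not in TRANSITIONS[state]:
            return False  # symbol outside the alphabet {0, 1}
        state = TRANSITIONS[state][symbol]
    return state in ACCEPTING


if __name__ == "__main__":
    for sample in ["10", "1101", "11", "100"]:
        print(sample, dfa_accepts(sample))
```

The automaton reads each symbol exactly once and never backtracks, which is what makes DFA matching run in time linear in the input.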
NFA
 Equal in expressive power to a DFA
            Often smaller in size
Ken Thompson
Thompson NFA method
• In 1968 Thompson published an article on how to
  convert a regular expression into an automaton
  (the construction now known as Thompson's NFA)
• The article included code to explain the point
Thompson NFA method
1. Scan the regex and insert an explicit concatenation operator (.)
a(b|c)*d becomes a.(b|c)*.d
2. Convert to reverse Polish notation
abc|*.d.
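The two steps above can be sketched in Python. This is an illustration, not Thompson's original code. It assumes a tiny regex dialect with only literal characters, parentheses, | and *, and the function names insert_concat and to_postfix are made up. Step 2 uses the standard shunting-yard method.

```python
# Precedence of the binary operators; higher binds tighter.
PRECEDENCE = {"|": 1, ".": 2}


def insert_concat(regex: str) -> str:
    """Step 1: make concatenation explicit with '.'."""
    out = []
    for i, ch in enumerate(regex):
        out.append(ch)
        if i + 1 < len(regex):
            nxt = regex[i + 1]
            # Concatenate when the left side ends an operand
            # and the right side starts one.
            if ch not in "(|" and nxt not in ")|*":
                out.append(".")
    return "".join(out)


def to_postfix(regex: str) -> str:
    """Step 2: convert to reverse Polish notation (shunting-yard)."""
    output, stack = [], []
    for ch in insert_concat(regex):
        if ch == "(":
            stack.append(ch)
        elif ch == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the "("
        elif ch in PRECEDENCE:
            while (stack and stack[-1] != "("
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[ch]):
                output.append(stack.pop())
            stack.append(ch)
        else:
            output.append(ch)  # literals and the postfix operator "*"
    while stack:
        output.append(stack.pop())
    return "".join(output)


if __name__ == "__main__":
    print(insert_concat("a(b|c)*d"))
    print(to_postfix("a(b|c)*d"))
```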
Thompson NFA method cont.
[Figure: NFA fragments for checking a single character (char), for OR of two expressions, and for the Kleene star of an expression]
Thompson NFA method cont.
• 3. Build the NFA
[Figure: the resulting NFA for a(b|c)*d, labelled A, B, C and D]
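Building the NFA from the postfix form can be sketched with a stack of fragments, one small fragment per character or operator. The Python below is a simplified illustration in the spirit of Thompson's method, not his original code. The class State and the functions build_nfa, closure and nfa_matches are made-up names, and the postfix input is assumed to use . for concatenation.

```python
class State:
    """NFA state: labelled edges plus epsilon (empty) edges."""

    def __init__(self):
        self.edges = {}    # character -> next State
        self.epsilon = []  # states reachable without consuming input


def build_nfa(postfix: str):
    """Build an NFA from postfix notation; returns (start, accept)."""
    stack = []
    for ch in postfix:
        if ch == ".":    # concatenation: link first fragment to second
            s2, a2 = stack.pop()
            s1, a1 = stack.pop()
            a1.epsilon.append(s2)
            stack.append((s1, a2))
        elif ch == "|":  # OR: new start branches into both fragments
            s2, a2 = stack.pop()
            s1, a1 = stack.pop()
            start, accept = State(), State()
            start.epsilon += [s1, s2]
            a1.epsilon.append(accept)
            a2.epsilon.append(accept)
            stack.append((start, accept))
        elif ch == "*":  # Kleene star: loop back or skip entirely
            s1, a1 = stack.pop()
            start, accept = State(), State()
            start.epsilon += [s1, accept]
            a1.epsilon += [s1, accept]
            stack.append((start, accept))
        else:            # single character
            start, accept = State(), State()
            start.edges[ch] = accept
            stack.append((start, accept))
    return stack.pop()


def closure(states):
    """All states reachable from the given states via epsilon edges."""
    seen, todo = set(states), list(states)
    while todo:
        for nxt in todo.pop().epsilon:
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def nfa_matches(postfix: str, text: str) -> bool:
    """Simulate the NFA on the whole text by tracking a set of states."""
    start, accept = build_nfa(postfix)
    current = closure({start})
    for ch in text:
        current = closure({s.edges[ch] for s in current if ch in s.edges})
    return accept in current


if __name__ == "__main__":
    for sample in ["ad", "abcbd", "abc"]:
        print(sample, nfa_matches("abc|*.d.", sample))
```

Tracking a set of current states, instead of trying one path and backtracking, keeps the simulation from doing exponential work on hard inputs.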
Problems for regex
• NLP

• Unicode vs. ASCII
Some examples of Regex
• ([^\s]+(\.(?i)(jpg|png|gif|bmp))$)
   – Match a file name with specific extensions
• ^(https?://)?([\da-z.-]+)\.([a-z.]{2,6})([/\w .-]*)*/?$
   – Match a URL
• /^#?([a-f0-9]{6}|[a-f0-9]{3})$/
   – Match a hex value
• [ -~]
   – An interesting one: it matches any printable ASCII character (space through tilde).
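A hedged Python sketch of three of these examples. It assumes the backslashes (\s, \.) that the extracted slide text appears to have lost. The helper names are made up for illustration, and the inline (?i) is replaced by the re.IGNORECASE flag because recent Python versions reject a global flag in the middle of a pattern.

```python
import re

# File extension pattern; case-insensitivity is set through the flag.
IMAGE_RE = re.compile(r"([^\s]+(\.(jpg|png|gif|bmp))$)", re.IGNORECASE)

# Hex value pattern, without the /.../ delimiters used by other languages.
HEX_RE = re.compile(r"^#?([a-f0-9]{6}|[a-f0-9]{3})$")

# Printable ASCII: every character from space (0x20) to tilde (0x7E).
PRINTABLE_RE = re.compile(r"[ -~]*")


def is_image_name(name: str) -> bool:
    """Hypothetical helper: file name ends in one of the image extensions."""
    return IMAGE_RE.match(name) is not None


def is_hex_value(value: str) -> bool:
    """Hypothetical helper: 3 or 6 lowercase hex digits, optional '#'."""
    return HEX_RE.match(value) is not None


def is_printable_ascii(text: str) -> bool:
    """Hypothetical helper: text contains only printable ASCII."""
    return PRINTABLE_RE.fullmatch(text) is not None


if __name__ == "__main__":
    print(is_image_name("photo.JPG"), is_image_name("notes.txt"))
    print(is_hex_value("#a3c113"), is_hex_value("#12345"))
    print(is_printable_ascii("Hello, world!"), is_printable_ascii("tab\there"))
```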

Editor's Notes

  • #3: Where they worked on Multics, the project that later led to Unix
  • #4: From XKCD
  • #6: The first is for email, from MSDN; the second is for a password of at least 8 characters with at least one lowercase letter, one uppercase letter and one other symbol
  • #7: This is from the POSIX specification
  • #8: Again, those are the POSIX groups, which are not implemented everywhere; the examples are for Perl regex
  • #10: Camel for Perl, Japanese for Oniguruma (Devil Chariot), Henry Spencer's picture, PCRE logo, Google for RE2 and a gnu for the GNU regex
  • #12: In compiler theory, peephole optimization is a kind of optimization performed over a very small set of instructions in a segment of generated code. The set is called a "peephole" or a "window". It works by recognising sets of instructions that can be replaced by a shorter or faster set of instructions. The execution step includes other algorithms, such as Boyer-Moore, and other ways to shorten execution time. Functions for compilation: reg() for parsing, regbranch(), regpiece(), regatom(), regtail(), and study_chunk() for optimization. Functions for execution: re_intuit_start() for starting locations, regtry(), regmatch().
  • #13: Not the department of foreign affairs
  • #14: Accepts the language of all binary numbers ending with 10 or 01
  • #15: NFA’s transition table for the regex (l|e)*n?(i|e)el*
  • #16: Thompson on the left and Ritchie on the right. Thompson invented B and helped invent C. He also worked on QED and ed, and brought regular expressions into them. He helped develop UTF-8 and the Go language.
  • #17: The original article included ALGOL code for the IBM mainframe, and the method was implemented in the QED editor
  • #18: Reverse Polish notation is used to remove ambiguity and make it easier to work with a stack
  • #21: Natural language processing is becoming common due to computer speed and massive amounts of data, using statistical tools. Unicode vs. ASCII is a problem for developers because characters might be caught as the same while being different
  • #22: Notice it doesn’t use xdigit, which is not supported in all engines