Regular Expressions Minh Hoang TO Portal Team
Agenda
  Finite State Machine
  Pattern Parser
  Java Regex
  Parsers in GateIn
  Advanced Theory
Finite State Machine
State Diagram
JIRA Issue Lifecycle
Java Thread Lifecycle
Java Compilation Flow
Finite State Machine - FSM
  Behavioral model describing the workflow of a system
Finite State Machine - FSM
  Directed graph with labeled edges
Pattern Parser
Classic Problem
  A: a finite character set
    Ex: A = {a, b, c, d, ..., z} or A = {a, b, c, ..., z, public, class, extends, implements, while, if, ...}
  A pattern P and an input sequence INPUT, both made of elements of A
    Ex: P = "a.*b" or P = "class.*extends.*"
        INPUT = "aaabbbcc" or INPUT = a Java source file
  -> The parser reads INPUT character by character and recognizes all subsequences matching pattern P
Classic Problem - Samples
  Split a sequence of characters into an array of subsequences
      String path = "/portal/en/classic/home";
      String[] segments = path.split("/");
  Handle a comment block encountered in a file
  Override readLine() in BufferedReader
  Extract data from a REST response
  Write an XML parser from scratch
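The split sample can be tried as follows. This is a minimal sketch; the class name SplitSample and the helper segments are hypothetical names used only for illustration. Because the path starts with the separator, the first element of the result is an empty string.

```java
import java.util.Arrays;

class SplitSample {
    // String.split treats its argument as a regular expression, here the literal "/".
    static String[] segments(String path) {
        return path.split("/");
    }

    public static void main(String[] args) {
        // The leading '/' produces an empty first element before "portal".
        System.out.println(Arrays.toString(segments("/portal/en/classic/home")));
    }
}
```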
Finite State Machine & Classic Problem
  What is an Acceptor FSM?
  How can the Classic Problem be transformed into a graph traversal problem with a well-known generic solution?
  Finding pattern occurrences ↔ traversing a directed graph with labeled edges
FSM – Word Accepting
  Consider a word W, a sequence of characters from character set A
    W = "abcd...xyz"
  An FSM whose graph edges are labeled with characters from A accepts W if there exists a path connecting the START node to one of the END nodes
    START = S1 -> S2 -> … -> Sn = END
  1. Intermediate nodes may be visited more than once
  2. The transition S_i -> S_(i+1) is determined (labeled) by the i-th character of W
Acceptor FSM
  Given a pattern P, an FSM is called an Acceptor FSM of P if it accepts exactly the words matching pattern P.
  Ex: the Acceptor FSM of "a[0-9]b" accepts every element of the word set
    {"a0b", "a1b", "a2b", "a3b", "a4b", "a5b", "a6b", "a7b", "a8b", "a9b"}
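One possible hand-written acceptor for "a[0-9]b" is sketched below. The state names (SEEN_A, SEEN_DIGIT) and the class DigitAcceptor are assumptions made for this illustration, not names from the slides. Each character of the word selects one labeled edge, and the word is accepted only if the walk ends in END.

```java
class DigitAcceptor {
    // States of a hand-written acceptor for the pattern a[0-9]b.
    enum State { START, SEEN_A, SEEN_DIGIT, END, FAILURE }

    // One transition: the next state depends only on the current state and one character.
    static State next(State state, char c) {
        switch (state) {
            case START:      return c == 'a' ? State.SEEN_A : State.FAILURE;
            case SEEN_A:     return (c >= '0' && c <= '9') ? State.SEEN_DIGIT : State.FAILURE;
            case SEEN_DIGIT: return c == 'b' ? State.END : State.FAILURE;
            default:         return State.FAILURE; // END and FAILURE have no outgoing edges
        }
    }

    // The word is accepted if following its characters leads from START to END.
    static boolean accepts(String word) {
        State state = State.START;
        for (int i = 0; i < word.length(); i++) {
            state = next(state, word.charAt(i));
        }
        return state == State.END;
    }
}
```

For instance, DigitAcceptor.accepts("a7b") is true, while "ab" and "a7bb" are rejected.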
How Does a Pattern Parser Work?
  It traverses the directed graph associated with the Acceptor FSM
  1. Start from the root node
  2. Read the next character from INPUT, then move according to the transition rules
  3. Repeat step 2 until a leaf node is visited or INPUT is exhausted
  4. Return OK if the leaf node represents a successful match
Example One
  Recognize pattern eXo.*er in:
    AAA eXo123er BBB eXoer CCC eXoeXoer DDD
Example One
  Acceptor FSM with 8 states:
    START – start reading the input sequence
    e – encountered e
    eX – encountered eX
    eXo – encountered eXo
    eXo.* – encountered eXo.*
    eXo.*e – encountered eXo.*e
    END – subsequence matching eXo.*er found
    FAILURE
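A hand-rolled scanner in the spirit of this state list might look like the sketch below. It is an assumption-laden illustration, not the presenter's implementation: the class name ExoScanner is hypothetical, the eXo and eXo.* states are merged into one (EXO), END is folded into "record the match and restart", and FAILURE is folded into "go back to START". The scanner stops at the first "er" it sees, so it reports the shortest matches, which corresponds to the reluctant form eXo.*?er rather than the greedy eXo.*er.

```java
import java.util.ArrayList;
import java.util.List;

class ExoScanner {
    enum State { START, E, EX, EXO, EXO_E }

    // Returns the shortest subsequences that start with "eXo" and end with "er".
    static List<String> scan(String input) {
        List<String> matches = new ArrayList<>();
        State state = State.START;
        int matchStart = -1;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (state) {
                case START:
                    if (c == 'e') { state = State.E; matchStart = i; }
                    break;
                case E:
                    if (c == 'X') state = State.EX;
                    else if (c == 'e') matchStart = i;   // a fresh 'e' restarts the candidate
                    else state = State.START;
                    break;
                case EX:
                    if (c == 'o') state = State.EXO;
                    else if (c == 'e') { state = State.E; matchStart = i; }
                    else state = State.START;
                    break;
                case EXO:                                // inside ".*": any character is allowed
                    if (c == 'e') state = State.EXO_E;
                    break;
                case EXO_E:
                    if (c == 'r') {                      // END reached: record the match, restart
                        matches.add(input.substring(matchStart, i + 1));
                        state = State.START;
                    } else if (c != 'e') {
                        state = State.EXO;               // still inside ".*"
                    }
                    break;
            }
        }
        return matches;
    }
}
```

On the input from Example One, scan returns the three subsequences eXo123er, eXoer and eXoeXoer.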
 
Example Two
  Recognize comment block /* */ in:
    /* Don't ask */ final int innerClassVariable;
Example Two
  Acceptor FSM with 5 states:
    START – start reading the input sequence
    OUT – outside any comment block
    ENTERING – at the beginning of a comment block
    IN – inside a comment block
    LEAVING – at the end of a comment block
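The five states can be exercised with a small comment stripper, sketched below under some simplifying assumptions: START is treated as an immediate move to OUT, string literals and // line comments are ignored, and the class name CommentStripper is hypothetical.

```java
class CommentStripper {
    enum State { OUT, ENTERING, IN, LEAVING }

    // Removes /* ... */ blocks and returns everything that lies outside them.
    static String strip(String source) {
        StringBuilder out = new StringBuilder();
        State state = State.OUT;                         // START moves straight to OUT
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            switch (state) {
                case OUT:
                    if (c == '/') state = State.ENTERING;    // possibly the start of "/*"
                    else out.append(c);
                    break;
                case ENTERING:
                    if (c == '*') state = State.IN;
                    else if (c == '/') out.append('/');      // previous '/' was ordinary text
                    else { out.append('/').append(c); state = State.OUT; }
                    break;
                case IN:
                    if (c == '*') state = State.LEAVING;     // possibly the start of "*/"
                    break;
                case LEAVING:
                    if (c == '/') state = State.OUT;
                    else if (c != '*') state = State.IN;
                    break;
            }
        }
        if (state == State.ENTERING) out.append('/');    // a trailing lone '/'
        return out.toString();
    }
}
```

Applied to the input of Example Two, strip drops the comment and keeps the declaration that follows it.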
 
Finite State Machine With Stack
  Example Two is slightly harder than Example One because the transition decision depends on past information
  -> We must keep something in memory
  FSM with Stack = Ordinary FSM + Stack Structure storing past information
  A contextual transition is determined by (next input character, stack state)
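As an illustration that is not taken from the slides, nested brackets show why a stack helps: whether a closing bracket is legal depends on the pair (next input character, top of the stack). The class name BracketChecker and the choice of bracket characters are assumptions of this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class BracketChecker {
    // Accepts input whose (), [] and {} brackets are properly nested.
    static boolean isBalanced(String input) {
        Deque<Character> stack = new ArrayDeque<>();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '(': stack.push(')'); break;        // remember which closer is expected
                case '[': stack.push(']'); break;
                case '{': stack.push('}'); break;
                case ')': case ']': case '}':
                    // The transition depends on both the input character and the stack top.
                    if (stack.isEmpty() || stack.pop() != c) return false;
                    break;
                default:
                    break;                               // other characters do not touch the stack
            }
        }
        return stack.isEmpty();                          // every opener must have been closed
    }
}
```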
Java Regex
Model
  Pattern: Acceptor Finite State Machine
  Matcher: Parser
java.util.regex.Pattern
  Construct an FSM accepting the pattern
      Pattern p = Pattern.compile("a.*b");
  FSM states are instances of java.util.regex.Pattern$Node
  Generate a parser working on the input sequence
      Matcher matcher = p.matcher("aaabbbb");
java.util.regex.Matcher
  Find the next subsequence matching the pattern
      find()
  Get capturing groups from the latest match
      group()
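Put together, Pattern and Matcher are typically used in a find() loop. The sketch below collects every match into a list; the class FindAll and its helper findAll are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAll {
    // Compiles the regex, then collects every subsequence that find() reports.
    static List<String> findAll(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(input);
        List<String> found = new ArrayList<>();
        while (matcher.find()) {
            found.add(matcher.group());   // group() with no argument is the whole match
        }
        return found;
    }
}
```

Because .* is greedy, findAll("a.*b", "aaabbbb") yields a single match covering the whole input.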
Capturing Group
  Two Pattern objects
      Pattern p = Pattern.compile("abcd.*efgh");
      Pattern q = Pattern.compile("abcd(.*)efgh");
      String text = "abcd12345efgh";
      Matcher pM = p.matcher(text);
      Matcher qM = q.matcher(text);
      pM.find() == qM.find();   // both return true
      qM.group(1);              // "12345"
      pM.group(1);              // throws IndexOutOfBoundsException: p defines no group 1
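The difference between the two patterns can be checked with a small helper. This is a sketch: GroupDemo and firstGroupOrNull are hypothetical names, and returning null is just one way to signal that group 1 is unavailable.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns group 1 of the first match, or null when nothing matches
    // or the pattern defines no capturing group.
    static String firstGroupOrNull(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            return null;
        }
        // groupCount() does not include group 0, the whole match.
        return m.groupCount() >= 1 ? m.group(1) : null;
    }
}
```

With the text "abcd12345efgh", the pattern abcd(.*)efgh yields "12345", while abcd.*efgh has no group 1 to return.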
Capturing Group
  Holds additional information on each match
      while (matcher.find()) {
        matcher.group(index);
      }
  Pattern P = (A)(B(C))
      matcher.group(0) = the whole sequence ABC
      matcher.group(1) = A
      matcher.group(2) = BC
      matcher.group(3) = C
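Groups are numbered by the position of their opening parenthesis, counted from left to right. The sketch below lists every group of the first match; NestedGroups and groups are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedGroups {
    // Returns group 0 (the whole match) followed by every capturing group of the first match.
    static String[] groups(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }
}
```

For the pattern (A)(B(C)) and the text "ABC", the result is ABC, A, BC, C in that order.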
Capturing Group
      Pattern.compile("abc(defgh");
      Pattern.compile("abcdef)gh");
  -> PatternSyntaxException
      Pattern.compile("abc\\(defgh");
      Pattern.compile("abcdef\\)gh");
  -> Success thanks to the escape character '\'
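The behavior can be verified by catching PatternSyntaxException. The sketch also uses Pattern.quote, a standard alternative to escaping by hand when a whole string must be matched literally; EscapeDemo and its method names are hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class EscapeDemo {
    // Reports whether the regex is syntactically valid.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Pattern.quote escapes the whole literal, so metacharacters such as '(' lose their meaning.
    static boolean matchesLiterally(String literal, String input) {
        return Pattern.compile(Pattern.quote(literal)).matcher(input).matches();
    }
}
```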
Operators
  Union
      [a-zA-Z0-9]
  Negation
      [^abc]
      [^X]
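A short sketch of these character classes in Java follows. The class CharClassDemo and its constant names are hypothetical; the nested form [a-d[m-p]] is Java's explicit union syntax and is added here as an extra illustration.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Union of three ranges: lowercase letters, uppercase letters and digits.
    static final Pattern ALPHANUMERIC = Pattern.compile("[a-zA-Z0-9]+");
    // Negation: any single character except a, b or c.
    static final Pattern NOT_ABC = Pattern.compile("[^abc]");
    // Explicit union of two nested classes: a through d, or m through p.
    static final Pattern UNION = Pattern.compile("[a-d[m-p]]");

    // matches() requires the whole input to match the pattern.
    static boolean matches(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }
}
```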
Contextual Match
  X(?=Y)    Match X only if Y follows it (positive lookahead)
  X(?!Y)    Match X only if Y does not follow it (negative lookahead)
  (?<=Y)X   Match X only if Y immediately precedes it (positive lookbehind)
  (?<!Y)X   Match X only if Y does not immediately precede it (negative lookbehind)
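The four lookaround forms can be compared on small inputs. The sketch below is illustrative only: the class LookaroundDemo, the helper findAll and the sample strings are assumptions. In every case the lookaround text is checked but is not part of the returned match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // Collects every match of the regex in the input.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Positive lookahead: digits that are followed by "px".
        System.out.println(findAll("\\d+(?=px)", "12px 5em 7px"));
        // Negative lookahead: words starting with "foo" where "bar" does not follow "foo".
        System.out.println(findAll("foo(?!bar)\\w*", "foobar foobaz"));
        // Positive lookbehind: digits that are preceded by '$'.
        System.out.println(findAll("(?<=\\$)\\d+", "$15 and 20"));
        // Negative lookbehind: whole numbers that are not preceded by '$'.
        System.out.println(findAll("(?<!\\$)\\b\\d+", "$15 and 20"));
    }
}
```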
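Tips
  A compiled Pattern is immutable and thread-safe -> Maximize reuse
  We often see:
      static final Pattern p = Pattern.compile("a*b");
  Be careful with String.split: it interprets its argument as a regular expression
  For simple separators, compare String.split with a plain Java loop + String.charAt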
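Both tips are sketched below: a Pattern compiled once and reused, and a regex-free split built on a charAt loop. The class SplitTips and its method names are hypothetical. Note one behavioral difference: the manual loop keeps trailing empty segments, whereas split drops them by default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SplitTips {
    // Compiled once and shared; Pattern instances are safe to reuse across threads.
    static final Pattern SLASH = Pattern.compile("/");

    static String[] splitWithPattern(String input) {
        return SLASH.split(input);
    }

    // Regex-free alternative for a single-character separator.
    static List<String> splitByChar(String input, char separator) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) == separator) {
                parts.add(input.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(input.substring(start));   // the segment after the last separator
        return parts;
    }
}
```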
Parsers in GateIn
Parsers in GateIn
  JavaScript Compressor
  CSS Compressor
  Groovy Template Optimizer
  Navigation Controller
    Extracting URL parameters = regex matching + backtracking algorithm
  StaxNavigator (a nice XML parser based on StAX)
Advanced Theory
Grammar & Language
  Any word matching pattern eXo.*er is produced by a combination of transforms (production rules), starting from S
      S -> eXoQer
      Q -> RQ
      Q -> ''
      R -> {a, b, c, d, ...}
  Language of a Grammar = the set of words generated by finite combinations of transforms, starting from S
  Ex: Any valid Java source file is generated by a finite number of transforms defined in the Java grammar (JLS)
Finite State Machine & Language
  A language accepted by an FSM with Stack can be generated by a context-free grammar
  Conversely, the language of any context-free grammar is accepted by some (possibly nondeterministic) FSM with Stack
  Explicit steps for both constructions come from the standard equivalence between pushdown automata and context-free grammars; Kleene's theorem is the analogous result relating regular expressions to ordinary FSMs

Regular expression made by To Minh Hoang - Portal team
