Regular Expressions
Powerful string validation and extraction
Ignaz Wanders – Architect @ Archimiddle
@ignazw
Topics
• What are regular expressions?
• Patterns
• Character classes
• Quantifiers
• Capturing groups
• Boundaries
• Internationalization
• Regular expressions in Java
• Quiz
• References
What are regular expressions?
• A regex is a string pattern used to search and manipulate text
• A regex has special syntax
• Very powerful for any type of String manipulation ranging from simple to very
complex structures:
– Input validation
– S(ubs)tring replacement
– ...
• Example:
• [A-Z0-9._%-]+@[A-Z0-9._%-]+\.[A-Z0-9._%-]{2,4}
History
• Originates from automata and formal-language theories of computer science
• Stephen Kleene, 1950s: Kleene algebra
• Kenneth Thompson, 1969: Unix: qed, ed
• 1970s to 1990s: Unix: grep, awk, sed, emacs
• Programming languages:
– C, Perl
– JavaScript, Java
Patterns
• Regex is based on pattern matching: Strings are searched for certain patterns
• Simplest regex is a string-literal pattern
• Metacharacters: ([{\^$|)?*+.
– Period means “any character”
– To search for a period as a string literal, escape it with “\”
REGEX: fox
TEXT: The quick brown fox
RESULT: fox
REGEX: fo.
TEXT: The quick brown fox
RESULT: fox
REGEX: .o.
TEXT: The quick brown fox
RESULT: row, fox
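The three searches above can be reproduced with a few lines of Java. This is a minimal sketch: the class name `PatternBasics` and the helper `findAll` are illustrative and not part of the slides. The helper collects every non-overlapping match reported by `Matcher.find()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternBasics {
    // Hypothetical helper: collect every non-overlapping match of regex in text.
    static List<String> findAll(String regex, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox";
        System.out.println(findAll("fox", text));   // string-literal pattern
        System.out.println(findAll("fo.", text));   // period matches any character
        System.out.println(findAll(".o.", text));   // two hits: "row" and "fox"
        System.out.println(findAll("fo\\.", text)); // escaped period: needs a literal "fo."
    }
}
```

Note the doubled backslash in the last call: the Java string literal `"fo\\."` holds the regex `fo\.`.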
Character classes (1/3)
• Syntax: any characters between [ and ]
• A character class matches exactly one character
• Negation: ^
REGEX: [rcb]at
TEXT: bat
RESULT: bat
REGEX: [rcb]at
TEXT: rat
RESULT: rat
REGEX: [rcb]at
TEXT: cat
RESULT: cat
REGEX: [rcb]at
TEXT: hat
RESULT: -
REGEX: [^rcb]at
TEXT: rat
RESULT: -
REGEX: [^rcb]at
TEXT: hat
RESULT: hat
Character classes (2/3)
• Ranges: [a-z], [0-9], [i-n], [a-zA-Z]...
• Unions: [0-4[6-8]], [a-p[r-w]], ...
• Intersections: [a-f&&[efg]], [a-f&&[e-k]], ...
• Subtractions: [a-f&&[^efg]], ...
REGEX: [rcb]at[1-5]
TEXT: bat4 RESULT: bat4
REGEX: [rcb]at[1-5[7-8]]
TEXT: hat7 RESULT: -
REGEX: [rcb]at[1-7&&[78]]
TEXT: rat7 RESULT: rat7
REGEX: [rcb]at[1-5&&[^34]]
TEXT: bat4 RESULT: -
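The range, union, intersection and subtraction forms can be checked directly in Java, which supports this nested character-class syntax. A small sketch follows; the class and method names are made up for illustration, and `Pattern.matches` requires the whole text to match.

```java
import java.util.regex.Pattern;

class CharacterClassDemo {
    // True when the entire text matches the regex.
    static boolean matchesFully(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        System.out.println(matchesFully("[rcb]at[1-5]", "bat4"));        // range 1-5
        System.out.println(matchesFully("[rcb]at[1-5[7-8]]", "cat7"));   // union: 1-5 or 7-8
        System.out.println(matchesFully("[rcb]at[1-5[7-8]]", "hat7"));   // fails on the first letter
        System.out.println(matchesFully("[rcb]at[1-7&&[78]]", "rat7"));  // intersection leaves only 7
        System.out.println(matchesFully("[rcb]at[1-5&&[^34]]", "bat4")); // subtraction removes 3 and 4
    }
}
```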
Character classes (3/3)
Predefined class   Meaning                          Equivalent
.                  any character
\d                 any digit                        [0-9]
\D                 any non-digit                    [^0-9], [^\d]
\s                 any white-space character        [ \t\n\x0B\f\r]
\S                 any non-white-space character    [^\s]
\w                 any word character               [a-zA-Z_0-9]
\W                 any non-word character           [^\w]
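When these classes are written in Java source, each regex backslash has to be escaped once more inside the string literal. The sketch below illustrates this with invented constant names.

```java
import java.util.regex.Pattern;

class PredefinedClassDemo {
    // The regex \d+ is written "\\d+" in a Java string literal.
    static final Pattern DIGITS = Pattern.compile("\\d+");
    static final Pattern WORD = Pattern.compile("\\w+");
    static final Pattern TWO_WORDS = Pattern.compile("\\w+\\s\\w+");

    public static void main(String[] args) {
        System.out.println(DIGITS.matcher("2024").matches());             // digits only
        System.out.println(DIGITS.matcher("12a").matches());              // 'a' is not a digit
        System.out.println(WORD.matcher("snake_case9").matches());        // letters, underscore, digit
        System.out.println(TWO_WORDS.matcher("hello\tworld").matches());  // a tab counts as white space
    }
}
```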
Quantifiers (1/5)
• Quantifiers allow character classes to match more than one character at a time.
Quantifiers for character classes X
X? zero or one time
X* zero or more times
X+ one or more times
X{n} exactly n times
X{n,} at least n times
X{n,m} at least n and at most m times
Quantifiers (2/5)
• Examples of X?, X*, X+
REGEX: “a?”
TEXT: “”
RESULT: “”
REGEX: “a*”
TEXT: “”
RESULT: “”
REGEX: “a+”
TEXT: “”
RESULT: -
REGEX: “a?”
TEXT: “a”
RESULT: “a”
REGEX: “a*”
TEXT: “a”
RESULT: “a”
REGEX: “a+”
TEXT: “a”
RESULT: “a”
REGEX: “a?”
TEXT: “aaa”
RESULT: “a”, “a”, “a”
REGEX: “a*”
TEXT: “aaa”
RESULT: “aaa”
REGEX: “a+”
TEXT: “aaa”
RESULT: “aaa”
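Running these quantifier examples through `Matcher.find()` shows one detail the results above leave out: `a?` and `a*` can match the empty string, so Java also reports a zero-length match at the end of the input. The sketch below uses a hypothetical `findAll` helper to list all matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: every match found by repeated find(), including empty ones.
    static List<String> findAll(String regex, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("a?", "aaa")); // three single a's, then an empty match
        System.out.println(findAll("a*", "aaa")); // "aaa", then an empty match
        System.out.println(findAll("a+", "aaa")); // "aaa" only: a+ needs at least one a
        System.out.println(findAll("a?", ""));    // one empty match
        System.out.println(findAll("a+", ""));    // no match at all
    }
}
```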
Quantifiers (3/5)
REGEX: “[abc]{3}”
TEXT: “abccabaaaccbbbc”
RESULT: “abc”, “cab”, “aaa”, “ccb”, “bbc”
REGEX: “abc{3}”
TEXT: “abccabaaaccbbbc”
RESULT: -
REGEX: “(dog){3}”
TEXT: “dogdogdogdogdogdog”
RESULT: “dogdogdog”, “dogdogdog”
Quantifiers (4/5)
• Greedy quantifiers:
– read complete string
– work backwards until match found
– syntax: X?, X*, X+, ...
• Reluctant quantifiers:
– read one character at a time
– work forward until match found
– syntax: X??, X*?, X+?, ...
• Possessive quantifiers:
– read complete string
– try match only once
– syntax: X?+, X*+, X++, ...
Quantifiers (5/5)
Greedy:
REGEX: “.*foo”
TEXT: “xfooxxxxxxfoo”
RESULT: “xfooxxxxxxfoo”
Reluctant:
REGEX: “.*?foo”
TEXT: “xfooxxxxxxfoo”
RESULT: “xfoo”, “xxxxxxfoo”
Possessive:
REGEX: “.*+foo”
TEXT: “xfooxxxxxxfoo”
RESULT: -
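The difference between the three quantifier modes can be observed by running the same text through each variant. This is a sketch with illustrative names (`GreedinessDemo`, `findAll`), not code from the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedinessDemo {
    // Hypothetical helper: list every match of regex in text.
    static List<String> findAll(String regex, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "xfooxxxxxxfoo";
        System.out.println(findAll(".*foo", text));  // greedy: one match spanning the whole text
        System.out.println(findAll(".*?foo", text)); // reluctant: stops at the first possible "foo"
        System.out.println(findAll(".*+foo", text)); // possessive: .*+ keeps everything, "foo" cannot match
    }
}
```

The possessive form never gives characters back, which is why it finds nothing here. In exchange it avoids backtracking work when a match is impossible anyway.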
Capturing groups (1/2)
• Capturing groups treat multiple characters as a single unit
• Syntax: between parentheses ( and )
• Example: (dog){3}
• Numbering from left to right
– Example: ((A)(B(C)))
• Group 1: ((A)(B(C)))
• Group 2: (A)
• Group 3: (B(C))
• Group 4: (C)
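The numbering rule can be verified with `Matcher.group(int)`, where group 0 is always the entire match. The helper below is a hypothetical illustration that returns all groups as an array.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Returns group 0 (the whole match) followed by groups 1..n, or an empty array if no match.
    static String[] groups(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.matches()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // Groups are numbered by the position of their opening parenthesis.
        System.out.println(Arrays.toString(groups("((A)(B(C)))", "ABC")));
    }
}
```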
Capturing groups (2/2)
• Backreferences to capturing groups are denoted by \i, with i an integer number
REGEX: “(\d\d)\1”
TEXT: “1212”
RESULT: “1212”
REGEX: “(\d\d)\1”
TEXT: “1234”
RESULT: -
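In Java the backreference example reads as follows. The names are illustrative. The second method shows the related `$1` syntax, which refers to a group from inside a replacement string rather than inside the regex.

```java
import java.util.regex.Pattern;

class BackreferenceDemo {
    // Two digits, followed by the same two digits again (\1 refers back to group 1).
    static final Pattern REPEATED_PAIR = Pattern.compile("(\\d\\d)\\1");

    static boolean isRepeatedPair(String text) {
        return REPEATED_PAIR.matcher(text).matches();
    }

    // Collapses each repeated pair to a single copy, using $1 in the replacement.
    static String collapse(String text) {
        return REPEATED_PAIR.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(isRepeatedPair("1212"));
        System.out.println(isRepeatedPair("1234"));
        System.out.println(collapse("1212"));
    }
}
```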
Boundaries (1/2)
Boundary   Meaning
^          beginning of line
$          end of line
\b         a word boundary
\B         a non-word boundary
\A         beginning of input
\G         end of previous match
\z         end of input
\Z         end of input, but before the final terminator, if any
Boundaries (2/2)
• Be aware:
• End-of-line marker is $
– Unix EOL is \n
– Windows EOL is \r\n
– JDK uses any of the following as EOL:
• '\n', '\r\n', '\r', '\u0085', '\u2028', '\u2029'
• Always test your regular expressions on the target OS
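One Java-specific point is worth a sketch here: by default `^` and `$` anchor to the whole input, and they only match at individual line boundaries when the pattern is compiled with `Pattern.MULTILINE`. The example below also contrasts `\b`, `\Z` and `\z`. The class name and `count` helper are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Counts how many times the compiled pattern is found in text.
    static int count(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String lines = "one\ntwo\r\nthree"; // mixed Unix and Windows line endings

        // Default mode: the anchors refer to the whole input, so no single line matches.
        System.out.println(count(Pattern.compile("^\\w+$"), lines));
        // MULTILINE: the anchors also match around every line terminator.
        System.out.println(count(Pattern.compile("^\\w+$", Pattern.MULTILINE), lines));

        // A word boundary exists only where a word character meets a non-word character.
        System.out.println(count(Pattern.compile("\\bcat\\b"), "cat concat cats"));

        // Upper-case Z tolerates one trailing line terminator, lower-case z does not.
        System.out.println(Pattern.compile("end\\Z").matcher("the end\n").find());
        System.out.println(Pattern.compile("end\\z").matcher("the end\n").find());
    }
}
```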
Internationalization (1/2)
• Regular expressions were originally designed for the ASCII (Basic Latin) set of characters.
– Thus “België” is not matched by ^\w+$
• Extension to Unicode character sets is denoted by \p{...}
• Character set: [\p{InCharacterSet}]
– Create character classes from symbols in character sets.
– “België” is matched by ^[\w[\p{InLatin-1Supplement}]]+$
Internationalization (2/2)
• Note that there are non-letters in character sets as well:
– Latin-1 Supplement: ¡¢£¤¥¦§¨©ª«-®¯°±²³´µ·¸¹º»¼½¾¿÷
• Categories:
– Letters: \p{L}
– Uppercase letters: \p{Lu}
– “België” is matched by ^\p{L}+$
• Other categories and POSIX classes:
– Unicode currency symbols: \p{Sc}
– ASCII punctuation characters: \p{Punct}
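A short Java sketch of the “België” case. The non-ASCII letter is written as a Unicode escape so the example does not depend on the source-file encoding. The method names are illustrative. `Pattern.UNICODE_CHARACTER_CLASS` is a flag available since Java 7 that makes `\w` Unicode-aware, shown here as an alternative to `\p{L}`.

```java
import java.util.regex.Pattern;

class UnicodeClassDemo {
    // The word from the slides, with its final letter given as a Unicode escape.
    static final String BELGIE = "Belgi\u00eb";

    // Default \w covers ASCII word characters only.
    static boolean asciiWord(String s) {
        return Pattern.matches("\\w+", s);
    }

    // The Unicode category L covers letters of any script.
    static boolean unicodeLetters(String s) {
        return Pattern.matches("\\p{L}+", s);
    }

    // With this flag the predefined classes follow Unicode definitions.
    static boolean unicodeWord(String s) {
        return Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS).matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(asciiWord(BELGIE));
        System.out.println(unicodeLetters(BELGIE));
        System.out.println(unicodeWord(BELGIE));
        System.out.println(Pattern.matches("\\p{Sc}", "\u20ac")); // the euro sign is a currency symbol
    }
}
```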
Regular expressions in Java
• Since JDK 1.4
• Package java.util.regex
– Pattern class
– Matcher class
• Convenience methods in java.lang.String
• Alternative for JDK 1.3
– Jakarta ORO project
java.util.regex.Pattern
• Wrapper class for regular expressions
• Useful methods:
– compile(String regex): Pattern
– matches(String regex, CharSequence text): boolean
– split(CharSequence text): String[]
String regex = "(\\d\\d)\\1";
Pattern p = Pattern.compile(regex);
java.util.regex.Matcher
• Useful methods:
– matches(): boolean
– find(): boolean
– find(int start): boolean
– group(): String
– replaceFirst(String replace): String
– replaceAll(String replace): String
String regex = "(\\d\\d)\\1";
Pattern p = Pattern.compile(regex);
String text = "1212";
Matcher m = p.matcher(text);
boolean matches = m.matches(); // true: "12" followed by the same "12"
java.lang.String
• Pattern and Matcher methods in String:
– matches(String regex): boolean
– split(String regex): String[]
– replaceFirst(String regex, String replace): String
– replaceAll(String regex, String replace): String
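A brief sketch of the `String` convenience methods in use. The wrapper method names and sample inputs are invented for illustration.

```java
import java.util.Arrays;

class StringRegexDemo {
    // matches(): the whole string must fit the pattern.
    static boolean isIsoDate(String s) {
        return s.matches("\\d{4}-\\d{2}-\\d{2}");
    }

    // split(): commas with optional surrounding white space act as separators.
    static String[] splitList(String s) {
        return s.split("\\s*,\\s*");
    }

    // replaceAll(): every run of white space becomes a single space.
    static String collapseSpaces(String s) {
        return s.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(isIsoDate("2024-01-15"));
        System.out.println(Arrays.toString(splitList("a, b,c ,  d")));
        System.out.println(collapseSpaces("foo  bar   baz"));
        System.out.println("foo  bar   baz".replaceFirst("\\s+", " ")); // only the first run changes
    }
}
```

These methods compile the regex on every call. For a pattern used repeatedly, compiling it once with `Pattern.compile` and reusing the `Pattern` is usually the better choice.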
Examples
• Validation
• Searching text
• Filtering
• Parsing
• Removing duplicate lines
• On-the-fly editing
Examples: validation
• Validate an e-mail address
[A-Z0-9._%-]+@[A-Z0-9._%-]+\.[A-Z0-9._%-]{2,4}
• A URL
(http|https|ftp)://([a-zA-Z0-9](\w+\.)+\w{2,7}|local\w*)(:\d+)?(/(\w+[\w/\-.]*)?)?
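As a sketch, the e-mail pattern can be wrapped in a small validator. The pattern only lists upper-case letters, so the `CASE_INSENSITIVE` flag is added here as an assumption to let lower-case addresses pass. The class and method names are illustrative. The pattern is a deliberate simplification: for instance, it rejects top-level domains longer than four characters.

```java
import java.util.regex.Pattern;

class EmailCheck {
    // The slide's pattern with the period escaped; the flag is an added assumption.
    static final Pattern EMAIL = Pattern.compile(
            "[A-Z0-9._%-]+@[A-Z0-9._%-]+\\.[A-Z0-9._%-]{2,4}",
            Pattern.CASE_INSENSITIVE);

    // True when the whole string has the shape local-part@domain.tld.
    static boolean looksLikeEmail(String s) {
        return EMAIL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeEmail("john.doe@example.com"));
        System.out.println(looksLikeEmail("john.doe@example")); // no dot after the @
        System.out.println(looksLikeEmail("no at sign"));
    }
}
```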
Examples: searching text
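• Write an HttpUnit test to submit an HTML form and check whether the HTTP response is a
confirmation screen containing a generated form number of the form 9xxxxxx-xxxxxx:
9[0-9]{6}-[0-9]{6}
String regexp = "9[0-9]{6}-[0-9]{6}";
Pattern p = Pattern.compile(regexp);
Matcher m = p.matcher(text); // text holds the HTTP response body
boolean ok = m.find();
String nr = ok ? m.group() : null; // group() throws IllegalStateException if nothing matched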
Examples: filtering
• Filter e-mails whose subject is in capitals only, optionally with a leading “Re:”
(R[eE]:)*[^a-z]*$
Examples: parsing
• Matches an opening XML tag and its corresponding closing tag:
– Note the use of the back reference
<([A-Z][A-Z0-9]*)[^>]*>(.*?)</\1>
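A sketch of the tag pattern in Java. Group 1 is the tag name and group 2 the content between the tags. `CASE_INSENSITIVE` is added as an assumption so that lower-case tags match, and the names `TagDemo` and `elements` are illustrative. This works for simple, non-nested markup only. It is not a substitute for an XML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagDemo {
    // \1 forces the closing tag to carry the same name as the opening tag.
    static final Pattern ELEMENT = Pattern.compile(
            "<([A-Z][A-Z0-9]*)[^>]*>(.*?)</\\1>", Pattern.CASE_INSENSITIVE);

    // Returns "name=content" for every element found.
    static List<String> elements(String markup) {
        List<String> found = new ArrayList<>();
        Matcher m = ELEMENT.matcher(markup);
        while (m.find()) {
            found.add(m.group(1) + "=" + m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(elements("<b class=\"x\">bold</b> and <i>italic</i>"));
        System.out.println(elements("<b>text</i>")); // mismatched closing tag: nothing found
    }
}
```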
Examples: duplicate lines
• Suppose you want to remove duplicate lines from a text.
– requirement here is that the lines are sorted alphabetically
^(.*)(\r?\n\1)+$
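A minimal Java sketch of this technique. It assumes the pattern is compiled with `Pattern.MULTILINE`, so that `^` and `$` apply to each line, and that each run of identical lines is replaced by group 1. The names are illustrative.

```java
import java.util.regex.Pattern;

class DedupDemo {
    // A line, followed by one or more line breaks plus the same line again.
    static final Pattern DUPLICATES =
            Pattern.compile("^(.*)(\\r?\\n\\1)+$", Pattern.MULTILINE);

    // Keeps one copy of each run of identical adjacent lines.
    static String removeDuplicateLines(String sortedText) {
        return DUPLICATES.matcher(sortedText).replaceAll("$1");
    }

    public static void main(String[] args) {
        String sorted = "apple\napple\nbanana\ncherry\ncherry\ncherry";
        System.out.println(removeDuplicateLines(sorted));
    }
}
```

Only adjacent duplicates are collapsed, which is why the input has to be sorted first.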
Examples: on-the-fly editing
• Suppose you want to edit a file in batch: all occurrences of a certain string pattern
should be replaced with another string.
• In Unix: use the sed command with a regex
• In Java: use string.replaceAll(regex, "mystring")
• In Ant: use replaceregexp optional task to, e.g., edit deployment descriptors
depending on environment
Quiz
• What are the following regular expressions looking for?
\d+                              at least one digit
[-+]?\d+                         any integer
((\d*\.?)?\d+|\d+(\.?\d*))       any positive decimal
[\p{L}']['\-.\p{L} ]+            a place name
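The quiz answers can be tried out with a few sample inputs. In this sketch the backslashes are written as Java string escapes, the hyphen in the place-name class is escaped so it is read literally, and the sample inputs are invented.

```java
import java.util.regex.Pattern;

class QuizDemo {
    static boolean isInteger(String s) {
        return Pattern.matches("[-+]?\\d+", s);
    }

    // Accepts forms such as 10, 3.14, .5 and 7. (digits on at least one side of the point).
    static boolean isDecimal(String s) {
        return Pattern.matches("((\\d*\\.?)?\\d+|\\d+(\\.?\\d*))", s);
    }

    // A letter or apostrophe, then letters, apostrophes, hyphens, periods or spaces.
    static boolean isPlaceName(String s) {
        return Pattern.matches("[\\p{L}']['\\-.\\p{L} ]+", s);
    }

    public static void main(String[] args) {
        System.out.println(isInteger("-42"));
        System.out.println(isDecimal("3.14"));
        System.out.println(isDecimal("1.2.3")); // two decimal points: rejected
        System.out.println(isPlaceName("St. John's"));
    }
}
```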
Conclusion
• When doing one of the following:
– validating strings
– on-the-fly editing of strings
– searching strings
– filtering strings
• think regex!
References
• http://www.regular-expressions.info/
• http://www.regexlib.com/
• http://developer.java.sun.com/developer/technicalArticles/releases/1.4regex/
• http://java.sun.com/docs/books/tutorial/extra/regex/
• http://www.wellho.net/regex/javare.html
• JDK 1.4 API
• Mastering Regular Expressions
