Looking for Patterns - Finding them with Regular Expressions
Presented by Keith Wright
One Course Source
keith@OneCourseSource.com
From http://xkcd.com/1171/
If this is how you think of regular expressions now…
Regular expressions…
REGULAR EXPRESSIONS ARE…
➢Strings used to search for patterns in text
➢More powerful than wildcards
➢Available in many programming languages and
programs
➢Also known as "regexp", "RegEx", and "RE"
RE DOS AND DON'TS…
Do this…
✔ Input Validation
✔ Data Extraction
✔ Data Elimination
✔ Search/Replace
Don't do this…
✗ Parsing
✗ Allow publicly available searches
✗ Use where better tools exist
✗ Use where a procedure would be better
RE ARE AVAILABLE IN…AND MORE!
• .NET
• C#
• Delphi
• Java
• JavaScript
• Perl
• PCRE
• PHP
• Python
• Ruby
• Tcl
• PowerShell
POSIX PROGRAMS USING RE
awk - pattern scanning and processing language
find - utility to search for files (GNU and BSD find provide a -regex test)
grep - utility to print lines matching a pattern
sed - stream editor for filtering and transforming text
POSIX PROGRAMS SUPPORT RE…
Basic Regular Expressions (BRE)
Character classes [ ]
Named character classes [[:digit:]]
Asterisk *
Dot .
Caret ^
Dollar $
Backslashed braces \{ \}
Backslashed parentheses \( \)
Extended Regular Expressions (ERE)
Question mark ?
Plus sign +
Pipe symbol |
Braces { }
Parentheses ( )
All other BRE features, with braces and parentheses written without backslashes
grep [options] 'pattern' [file…]
grep is a command-line tool for printing lines that match a pattern
It is useful for demonstrating how regular expressions work
By default, grep interprets patterns as BRE
egrep, or grep -E, interprets patterns as ERE (recent GNU grep releases warn that egrep is obsolescent, so prefer grep -E)
• --color=auto highlights the part of the line that matched the pattern
• -i makes grep case-insensitive
• -c makes grep report a count of the matching lines
• -v prints the lines that don't match the pattern
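To experiment with these options when grep is not available, you can mimic them in a few lines of Python. This is a minimal sketch, and `grep_lines` is a hypothetical helper. It uses the Perl-style syntax of Python's `re` module rather than POSIX BRE/ERE, so its backslash rules differ from grep's defaults.

```python
import re
from typing import Iterable, List

def grep_lines(pattern: str, lines: Iterable[str],
               ignore_case: bool = False, invert: bool = False) -> List[str]:
    """Return lines matching pattern (or not matching it, if invert=True).

    ignore_case roughly mimics grep -i, invert roughly mimics grep -v.
    """
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    # Keep a line when "it matched" differs from "we are inverting".
    return [line for line in lines if bool(regex.search(line)) != invert]

text = ["Apple pie", "banana split", "apple tart", "cherry"]
print(grep_lines("apple", text))                          # like grep 'apple'
print(grep_lines("apple", text, ignore_case=True))        # like grep -i
print(len(grep_lines("apple", text, ignore_case=True)))   # like grep -ic
print(grep_lines("apple", text, invert=True))             # like grep -v
```

With these sample lines, the case-sensitive search finds only "apple tart". The -i variant also picks up "Apple pie", so its count is 2.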
BASIC RE LITERALS
Alphanumeric characters and other characters with no special meaning match themselves
Special characters such as . * [ ^ $ and \ match themselves when preceded by a backslash
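Here is a quick check of literal matching, sketched with Python's `re` module. Python follows ERE-like rules here, so `+` is special unless backslashed. `re.escape()` is the standard-library helper that adds the backslashes for you.

```python
import re

# Ordinary letters match themselves.
print(bool(re.search(r"cat", "concatenate")))        # True
# Unescaped, "+" is a repetition operator: "1+1" needs "11" somewhere.
print(bool(re.search(r"1+1", "1+1=2")))              # False
# Backslashed, "+" is just a plus sign.
print(bool(re.search(r"1\+1", "1+1=2")))             # True
# re.escape() inserts the needed backslashes.
print(bool(re.search(re.escape("1+1"), "1+1=2")))    # True
```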
RE DOT (PERIOD)
The dot . matches any single character
To match a literal dot, precede it with a backslash: \.
The RE .* matches any sequence of characters, including an empty one; anchored as ^.*$ it matches an entire line
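The dot rules can be tried in Python's `re` module. One Python-specific caveat: by default its `.` does not match a newline.

```python
import re

# "." matches any one character.
print(bool(re.fullmatch(r"c.t", "cat")))    # True
print(bool(re.fullmatch(r"c.t", "c.t")))    # True: "." also matches a dot
# "\." matches only a literal dot.
print(bool(re.fullmatch(r"c\.t", "cat")))   # False
print(bool(re.fullmatch(r"c\.t", "c.t")))   # True
# ".*" matches any run of characters, even an empty one.
print(repr(re.fullmatch(r".*", "").group()))          # ''
print(re.search(r"^.*$", "the whole line").group())   # the whole line
```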
RE CHARACTER CLASSES
A character class matches a single character from the list or range enclosed in brackets [ ], such as [aeiou] or [0-9]
If the first character enclosed is the caret ^, the list or range is negated
To match a right square bracket ], make it the first character enclosed: []abc]. To exclude it, place it immediately after the caret: [^]abc]
To match a hyphen, make it the first or last character enclosed: [-abc] or [abc-]. To exclude it, place it immediately after the caret (or last): [^-abc]
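Python's `re` module accepts the same bracket placements. A short sketch with sample strings:

```python
import re

print(re.findall(r"[aeiou]", "regex"))    # ['e', 'e']
print(re.findall(r"[^aeiou]", "regex"))   # ['r', 'g', 'x']
print(re.findall(r"[0-9]", "a1b2"))       # ['1', '2']
# "]" placed first is literal; "-" placed last is literal.
print(re.findall(r"[]-]", "a]b-c"))       # [']', '-']
# Right after the caret, both are excluded instead.
print(re.findall(r"[^]-]", "a]b-c"))      # ['a', 'b', 'c']
```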
RE NAMED CHARACTER CLASSES
A named character class must appear inside an outer pair of brackets, as in [[:xdigit:]]
The POSIX classes are [:alnum:], [:alpha:], [:blank:], [:cntrl:], [:digit:], [:graph:], [:lower:], [:print:], [:punct:], [:space:], [:upper:], and [:xdigit:]
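Python's `re` module does not understand the POSIX named classes. One workaround is to translate them into explicit ranges before compiling. This is a sketch only: `POSIX_CLASSES` and `posix_to_python` are hypothetical names, and the ranges assume the ASCII-only C/POSIX locale.

```python
import re

# ASCII-only stand-ins for the POSIX named classes (assumes the C/POSIX locale).
POSIX_CLASSES = {
    "alnum": "0-9A-Za-z",
    "alpha": "A-Za-z",
    "blank": r" \t",
    "cntrl": r"\x00-\x1f\x7f",
    "digit": "0-9",
    "graph": "!-~",
    "lower": "a-z",
    "print": " -~",
    "punct": r"!-/:-@\[-`{-~",
    "space": r" \t\n\r\f\v",
    "upper": "A-Z",
    "xdigit": "0-9A-Fa-f",
}

def posix_to_python(pattern: str) -> str:
    """Replace each [:name:] inside a bracket expression with an ASCII range.

    Unknown class names raise KeyError.
    """
    return re.sub(r"\[:([a-z]+):\]", lambda m: POSIX_CLASSES[m.group(1)], pattern)

hex_pattern = posix_to_python("[[:xdigit:]]+")
print(hex_pattern)                               # [0-9A-Fa-f]+
print(bool(re.fullmatch(hex_pattern, "1aF9")))   # True
print(bool(re.fullmatch(hex_pattern, "1g")))     # False
```

The outer brackets are left in place, so combined forms such as `[^[:space:]]` or `[[:alpha:][:digit:]]` translate as well.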
RE CARET ANCHOR
A caret ^ at the start of a pattern anchors it: what follows must appear at the beginning of the text (for grep, the beginning of the line)
If used as the first character in square brackets, it negates the list or range of characters
If preceded by a backslash, the caret loses its special meaning
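The three roles of the caret can be checked in Python's `re` module. The third line reuses the backslash example from the editor's notes.

```python
import re

print(bool(re.search(r"^ab", "abc")))         # True: "ab" starts the text
print(bool(re.search(r"^ab", "cab")))         # False: "ab" is not at the start
print(bool(re.search(r"\^ab", "xyz^abzzz")))  # True: "\^" is a literal caret
print(re.findall(r"[^ab]", "abc"))            # ['c']: first in brackets, ^ negates
```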
RE DOLLAR SIGN ANCHOR
A dollar sign $ at the end of a pattern anchors it: what precedes it must appear at the end of the text
In BRE, a dollar sign that is not at the end of the expression loses its special meaning
Combined with the caret, as in ^pattern$, the pattern must match the entire text
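Here is a sketch of the dollar anchor in Python's `re` module. It includes one Python-specific detail: `$` also matches just before a final newline, while `re.fullmatch()` requires the exact end of the string.

```python
import re

print(bool(re.search(r"ing$", "testing")))        # True
print(bool(re.search(r"ing$", "testing 1 2 3")))  # False
print(bool(re.search(r"^test$", "test")))         # True: the whole text is "test"
print(bool(re.search(r"^test$", "testing")))      # False
# Python-specific: "$" also matches before a trailing newline.
print(bool(re.search(r"^test$", "test\n")))       # True
print(bool(re.fullmatch(r"test", "test\n")))      # False
```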
RE REPETITION
BRE       ERE      Meaning
*         *        preceding item repeated zero or more times, same as {0,}
\+        +        preceding item repeated one or more times, same as {1,}
\?        ?        preceding item is optional, same as {0,1}
\{n\}     {n}      preceding item repeated exactly n times
\{n,\}    {n,}     preceding item repeated n or more times
\{,m\}    {,m}     preceding item repeated at most m times
\{n,m\}   {n,m}    preceding item repeated at least n times, but not more than m times
The BRE forms \+ and \?, and the {,m} form in both BRE and ERE, are GNU extensions rather than POSIX
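The interval forms behave the same way in Python's `re` module, which accepts the ERE-style spelling (including `{,m}`). In this sketch, `SAMPLES` and `matching` are illustrative names. It tests each anchored pattern against runs of the letter a.

```python
import re

SAMPLES = ["", "a", "aa", "aaa", "aaaa"]

def matching(pattern):
    """Return the sample strings that the anchored pattern matches."""
    return [s for s in SAMPLES if re.search(pattern, s)]

for pattern in [r"^a{3}$", r"^a{2,}$", r"^a{,2}$", r"^a{2,3}$"]:
    print(pattern, matching(pattern))
# ^a{3}$ ['aaa']
# ^a{2,}$ ['aa', 'aaa', 'aaaa']
# ^a{,2}$ ['', 'a', 'aa']
# ^a{2,3}$ ['aa', 'aaa']
```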
RE ASTERISK
The asterisk * matches zero or more of the item that precedes it
The asterisk is equivalent to the BRE \{0,\} and the ERE {0,} expressions for zero or more
Because zero occurrences are allowed, a single item followed by an asterisk, such as x*, matches every line, even one that contains no x
To match a literal asterisk, precede it with a backslash
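The "always matches" point is easy to see in Python's `re` module. `x*` succeeds on a string with no x by matching the empty string at position 0.

```python
import re

m = re.search(r"x*", "abc")
print(repr(m.group()), m.start())            # '' 0: zero x's matched at the start
print(re.findall(r"ab*c", "ac abc abbbc"))   # ['ac', 'abc', 'abbbc']
print(bool(re.search(r"2\*3", "2*3=6")))     # True: "\*" is a literal asterisk
```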
RE PLUS SIGN
In BRE, the backslashed plus sign \+ matches one or more of the item that precedes it
In ERE, the plus sign + matches one or more of the item that precedes it
The plus sign is equivalent to the BRE \{1,\} and the ERE {1,} expressions for one or more
In BRE, an unbackslashed plus sign matches itself. In ERE, precede the plus sign with a backslash to match it literally
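Python's `re` module treats `+` the ERE way, so it needs no backslash to mean "one or more". A short sketch:

```python
import re

print(re.findall(r"[0-9]+", "a1b22c333"))   # ['1', '22', '333']
print(bool(re.search(r"ab+c", "ac")))       # False: at least one b is required
print(bool(re.search(r"ab+c", "abbc")))     # True
print(bool(re.search(r"1\+1", "1+1=2")))    # True: "\+" is a literal plus sign
```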
RE QUESTION MARK
In BRE, the backslashed question mark \? optionally matches the item that precedes it
In ERE, the question mark ? optionally matches the item that precedes it
The question mark is equivalent to the BRE \{0,1\} and the ERE {0,1} expressions for zero or one
In BRE, an unbackslashed question mark matches itself. In ERE, precede the question mark with a backslash to match it literally
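Python's `re` module also uses the ERE spelling for the optional item. The classic spelling-variant check looks like this:

```python
import re

for word in ["color", "colour", "colouur"]:
    print(word, bool(re.fullmatch(r"colou?r", word)))
# color True
# colour True
# colouur False
print(bool(re.search(r"why\?", "why?")))   # True: "\?" is a literal question mark
```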
RE GROUPING
In BRE, the backslashed parentheses \( and \) create a group of characters that a repetition operator applies to as a unit
In ERE, the parentheses ( and ) create a group of characters that a repetition operator applies to as a unit
In BRE, unbackslashed parentheses match themselves; in ERE, backslashed parentheses \( and \) match literal parentheses
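The difference between repeating a group and repeating a single character shows up clearly in Python's `re` module, which uses ERE-style parentheses. As a side effect, groups also capture the text they matched.

```python
import re

print(bool(re.fullmatch(r"(ab)+", "ababab")))  # True: the whole group repeats
print(bool(re.fullmatch(r"ab+", "ababab")))    # False: only the b repeats
print(bool(re.search(r"\(ab\)", "f(ab)")))     # True: "\(" and "\)" are literal
# Each group captures the text it matched.
print(re.fullmatch(r"([0-9]{3})-([0-9]{4})", "555-1234").groups())  # ('555', '1234')
```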
RE ALTERNATION
In ERE, the pipe symbol | performs alternation
Alternation matches any one of two or more alternatives separated by the pipe symbol |, as in cat|dog
In BRE, the pipe symbol | matches itself (GNU grep also accepts \| as alternation in BRE); in ERE, a backslashed pipe \| matches a literal pipe
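Here is alternation in Python's `re` module, which uses the ERE spelling. The last two lines show a detail that often surprises people: alternation binds more loosely than the anchors, so `^cat|dog$` means "cat at the start, or dog at the end".

```python
import re

print(re.findall(r"cat|dog", "cat, dog, bird"))          # ['cat', 'dog']
print(bool(re.fullmatch(r"gr(a|e)y", "grey")))           # True
print(bool(re.search(r"^cat|dog$", "cats and dogs")))    # True: "cat" at the start
print(bool(re.search(r"^(cat|dog)$", "cats and dogs")))  # False: not the whole text
```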
PERL US POSTAL CODE EXAMPLE
^\d{5}((-|\s)?\d{4})?$
^ - starts with
\d{5} - exactly five digits
(…)? - optional group: an optional separator followed by four digits
-|\s - hyphen or whitespace
\d{4} - exactly four digits
$ - ends with
To try patterns interactively in the Perl debugger, type:
perl -d -e1
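The postal-code pattern also works unchanged in Python, whose `re` module accepts `\d` and `\s` like Perl. In this sketch, `US_ZIP` and `is_us_zip` are illustrative names. The `re.ASCII` flag is an added assumption that limits `\d` to 0-9 rather than any Unicode digit.

```python
import re

# The Perl slide's pattern; re.ASCII keeps \d to 0-9 and \s to ASCII whitespace.
US_ZIP = re.compile(r"^\d{5}((-|\s)?\d{4})?$", re.ASCII)

def is_us_zip(text: str) -> bool:
    """Return True if text looks like a US ZIP or ZIP+4 code."""
    return US_ZIP.fullmatch(text) is not None

for candidate in ["12345", "12345-6789", "12345 6789", "1234", "12345-678"]:
    print(candidate, is_us_zip(candidate))
# 12345 True
# 12345-6789 True
# 12345 6789 True
# 1234 False
# 12345-678 False
```

The separator is optional, so a run of nine digits such as 123456789 is accepted too.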
PERL CHARACTER SEQUENCES
\w Alphanumeric characters and _ (word characters)
\W Not word characters
\d Digit characters
\D Not digit characters
\s Whitespace characters
\S Not whitespace characters
\b Word boundaries
• GNU grep supports these sequences in both BRE and ERE, except \d and \D; use [[:digit:]] instead, or grep -P for Perl-compatible patterns
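Python's `re` module supports the same Perl sequences, including `\d` and `\D`. A quick tour on a sample string:

```python
import re

text = "Room 42, floor 7"
print(re.findall(r"\d+", text))                   # ['42', '7']
print(re.findall(r"\w+", text))                   # ['Room', '42', 'floor', '7']
print(re.split(r"\s+", text))                     # ['Room', '42,', 'floor', '7']
print(re.sub(r"\D", "", "555-1234"))              # 5551234
print(re.findall(r"\bfloor\b", "floor floors"))   # ['floor']
```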
PYTHON PROTOCOL EXAMPLE
(mailto:|(news|(ht|f)tp(s?))://){1}
(…){1} - the group occurs exactly once ({1} is redundant, since once is the default)
mailto: - mailto followed by a colon
| - separates alternatives
news|(ht|f)tp - news, http, or ftp
(ht|f)tp(s?) - an optional s may follow http or ftp
:// - follows news, http, https, ftp, or ftps
• To start the Python shell, type:
python
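In the Python shell, the protocol pattern can be applied to a few sample URLs. `PROTOCOL` and `protocol_prefix` are illustrative names. `re.match()` tries the pattern only at the start of the string, which suits a prefix check.

```python
import re

# The slide's pattern, unchanged.
PROTOCOL = re.compile(r"(mailto:|(news|(ht|f)tp(s?))://){1}")

def protocol_prefix(url: str):
    """Return the matched protocol prefix, or None if there is none."""
    m = PROTOCOL.match(url)
    return m.group(0) if m else None

for url in ["https://example.com", "ftp://example.com",
            "mailto:someone@example.com", "gopher://example.com"]:
    print(url, protocol_prefix(url))
# https://example.com https://
# ftp://example.com ftp://
# mailto:someone@example.com mailto:
# gopher://example.com None
```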
USE THE LIBRARY
RegExLib.com
The Regular Expression Library
Comes with a cheat sheet
A Regular Expression tester
Search thousands of rated expressions
You don't have to reinvent the wheel!
From http://xkcd.com/208/
About One Course Source
➢Online public classes (Linux, Programming & Security)
➢Custom corporate classes
➢Develop custom training programs
www.OneCourseSource.com

Editor's Notes

  • #9: In ed (and ex/vi), g/re/p performs a global search for the regular expression and prints the matching lines; grep takes its name from this command
  • #14: Backslash example: echo 'xyz^abzzz' | grep '\^ab'
  • #22:
    # Source: http://neilk.net/blog/2000/06/01/abigails-regex-to-test-for-prime-numbers/
    # Source: Abigail
    perl -wle 'print "Prime" if (1 x shift) !~ /^1?$|^(11+?)\1+$/'
    sub is_prime {
        if ((1 x shift) !~ /^1?$|^(11+?)\1+$/) { return 1; }
        else { return 0; }
    }
  • #26:
    sub is_what {
        if ((1 x shift) !~ /^1?$|^(11+?)\1+$/) { return 1; }
        else { return 0; }
    }
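Abigail's prime-testing regex from the notes can also be run with Python's `re` module, which supports the lazy `+?` and the backreference `\1` that it relies on. This is a sketch for curiosity: `NOT_PRIME` and `is_prime` are illustrative names, and backtracking makes this far slower than ordinary arithmetic.

```python
import re

# A string of n ones matches when n is 0, 1, or composite:
#   ^1?$         -> n is 0 or 1
#   ^(11+?)\1+$  -> the ones split into two or more equal groups of size >= 2
NOT_PRIME = re.compile(r"^1?$|^(11+?)\1+$")

def is_prime(n: int) -> bool:
    """Return True if n is prime, by testing n ones against NOT_PRIME."""
    return NOT_PRIME.match("1" * n) is None

print([n for n in range(30) if is_prime(n)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```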