Beginner Track: Introduction to Regular Expressions (aka “regex”) Bil Corry lasso.pro
What is regex?
“Regular expressions provide a concise and flexible means for identifying strings of text of interest, such as particular characters, words, or patterns of characters.” (Wikipedia: http://en.wikipedia.org/wiki/Regex)
In plain English: Regex is a text-searching “language.”

How regex works
Three components are needed:
A regex engine that uses a regular expression (search string) to search against the text and return results (we're using Lasso).
Some text to search against
A regular expression that defines what to search for (e.g. “\d” to find a digit)

#1 Regex Engine
Lasso provides regex processing via:
[string_findregexp] ← we'll be covering just this
[string_replaceregexp]
[regexp]
[compare_regexp]
[compare_notregexp]
[match_regexp]
[match_notregexp]
#2 Some Text To Search Against
Text should be of type [string] – if you use type [bytes], you may get odd results.
Running a regex against a very large [string] may cause performance and memory problems.

#3 Regular Expressions: The regex “language”
Literals
Dot
White Space
Character Classes
Shorthand Character Classes
Positional Matching
Alternation
Quantifiers
Grouping
Literals
All characters search for their literal selves except for the following: “[\^$.|?*+()” – they require being escaped when searched for as a literal.
Example:
[string_findregexp('LDC is fun!',-find='fun')]
LP8: array: (fun)
L9: array(fun)

Literals (cont)
By default, regex is case-sensitive. Use the (?i) switch to make it case-insensitive.
Examples:
[string_findregexp('ABC abc',-find='abc')]
LP8: array: (abc)
L9: array(abc)
[string_findregexp('ABC abc',-find='(?i)abc')]
LP8: array: (ABC), (abc)
L9: array(ABC, abc)

Escaping Characters
In regular expressions, depending on the context, various characters have special meaning. To specify the literal character, you must escape it with a backslash (“\”). Because the backslash also has special meaning in Lasso, you must double the backslashes in Lasso (“\\”).

Escaping Characters (cont)
Example:
[string_findregexp('[date] returns the date', -find='\\[date\\]')]
LP8: array: ([date])
L9: array([date])
[string_findregexp('[date] returns the date', -find='[date]')]
LP8: array:(d),(a),(t),(e),(e),(t),(t),(e),(d),(a),(t),(e)
L9: array(d, a, t, e, e, t, t, e, d, a, t, e)

Dot
A dot (aka period symbol “.”) will match any single character except line returns. Use the switch “(?s)” to turn on matching line returns too.
Example:
[string_findregexp('LDC is fun! Turn on a fan.', -find='f.n')]
LP8: array: (fun), (fan)
L9: array(fun, fan)

Dot (cont)
[string_findregexp('1\n2\n3',-find='.')]
LP8: array: (1), (2), (3)
L9: array(1, 2, 3)
[string_findregexp('1\n2\n3',-find='(?s).')]
LP8: array: (1), ( ), (2), ( ), (3)
L9: array(1, , 2, , 3)
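To try the dot examples outside Lasso, here is a minimal sketch in Python. It assumes Python's re module treats “.” and “(?s)” the same way as Lasso's engine for this simple case; the variable names are only for illustration.

```python
import re

text = "1\n2\n3"

# Without (?s), the dot skips the newline characters.
without_dotall = re.findall(r".", text)

# With the (?s) switch, the dot matches the newlines as well.
with_dotall = re.findall(r"(?s).", text)

print(without_dotall)  # ['1', '2', '3']
print(with_dotall)     # ['1', '\n', '2', '\n', '3']
```

Printing the list shows the matched newlines explicitly as '\n', which is easier to read than the blank entries in the array output.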
White Space
To find white space, use the Lasso equivalents:
Return = \r
Newline = \n
Tab = \t
Example:
[string_findregexp('1\n2\n3',-find='\n')]
LP8: array: ( ), ( )
L9: array( , )

Character Classes
Used to match against a set of characters contained within square brackets “[ … ]”. Order of characters within the class does not matter (i.e. [abc] == [cba]). Reserved characters are “^-]\”.
Example:
[string_findregexp('New Years Eve is 2009-12-31', -find='[123ae]')]
LP8: array: (e), (e), (a), (e), (2), (1), (2), (3), (1)
L9: array(e, e, a, e, 2, 1, 2, 3, 1)

Character Classes (cont)
Hyphen denotes a range (e.g. “[0-9]” means 0,1,2,..,9 and [a-z] means a,b,c,...,z).
Example:
[string_findregexp('abcdef',-find='[b-d]')]
LP8: array: (b), (c), (d)
L9: array(b, c, d)


Introduction To Regex in Lasso 8.5

Character Classes (cont)
A caret after the opening square bracket denotes characters to omit instead of find.
Example:
[string_findregexp('abcdef',-find='[^b-d]')]
LP8: array: (a), (e), (f)
L9: array(a, e, f)
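The range and caret rules for character classes can be checked side by side. This is a small sketch in Python rather than Lasso, on the assumption that both engines handle these basic classes identically. The last line relies on a general regex rule not shown on the slides: a hyphen placed first in the class is a literal hyphen, not a range.

```python
import re

text = "abcdef"

# A range: any one character from b through d.
in_range = re.findall(r"[b-d]", text)

# A caret right after the opening bracket negates the class.
not_in_range = re.findall(r"[^b-d]", text)

# A hyphen listed first is taken literally, so this class means
# "a hyphen or a digit".
date_parts = re.findall(r"[-0-9]+", "New Years Eve is 2009-12-31")

print(in_range)      # ['b', 'c', 'd']
print(not_in_range)  # ['a', 'e', 'f']
print(date_parts)    # ['2009-12-31']
```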
Shorthand Character Classes
\\d = [0-9]
\\D = [^0-9]
\\w ≈ [a-zA-Z0-9_]
\\W ≈ [^a-zA-Z0-9_]
\\s ≈ [ \r\n\t]
\\S ≈ [^ \r\n\t]
Example:
[string_findregexp('1a2b3c',-find='\\d')]
LP8: array: (1), (2), (3)
L9: array(1, 2, 3)
[string_findregexp('1a2b3c',-find='\\D')]
LP8: array: (a), (b), (c)
L9: array(a, b, c)

Shorthand Character Classes (cont)
Example:
[string_findregexp('1a2b3c',-find='\\w')]
LP8: array: (1), (a), (2), (b), (3), (c)
L9: array(1, a, 2, b, 3, c)
[string_findregexp('1\r2\r3',-find='\\s')]
LP8: array: ( ), ( )
L9: array( , )

Positional Matching
“^” matches beginning of text, “$” matches end of text, and (?m) switch makes ^ and $ match beginning and ending of each line.
Example:
[string_findregexp('1\n2\n3',-find='^\\d')]
LP8: array: (1)
L9: array(1)
[string_findregexp('1\n2\n3',-find='(?m)^\\d')]
LP8: array: (1), (2), (3)
L9: array(1, 2, 3)

Positional Matching (cont)
“\\b” matches a word boundary (the position between a word character and a non-word character or start/end of line).
Example:
[string_findregexp('cape and ape',-find='\\bape')]
LP8: array: (ape)
L9: array(ape)
[string_findregexp('cape and ape',-find='ape')]
LP8: array: (ape), (ape)
L9: array(ape, ape)
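In the word-boundary example both results read “ape”, so it is not obvious which occurrence was matched. The sketch below uses Python (assumed to agree with Lasso on “^”, “(?m)” and “\b”) to report the match positions as well.

```python
import re

lines = "1\n2\n3"

# ^ alone anchors to the start of the whole text.
first_only = re.findall(r"^\d", lines)

# With (?m), ^ also matches at the start of every line.
every_line = re.findall(r"(?m)^\d", lines)

phrase = "cape and ape"

# Start offsets of each match, with and without the word boundary.
bounded = [m.start() for m in re.finditer(r"\bape", phrase)]
unbounded = [m.start() for m in re.finditer(r"ape", phrase)]

print(first_only)  # ['1']
print(every_line)  # ['1', '2', '3']
print(bounded)     # [9]
print(unbounded)   # [1, 9]
```

Offset 1 is the “ape” inside “cape”, which the boundary rules out because a word character precedes it.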
Alternation
Vertical bar (“|”) is the OR operator for regex.
Example:
[string_findregexp('cat and rat',-find='cat|rat')]
LP8: array: (cat), (rat)
L9: array(cat, rat)

Quantifiers
Specifies the number to find:
* = 0 or more
+ = 1 or more
? = 0 or 1
{n} = n times
{n,m} = min n, max m times
{n,} = min n, no max
Example:
[string_findregexp('123aaabbb', -find='0*1+2?3{1}a{1,2}ab{2,}')]
LP8: array: (123aaabbb)
L9: array(123aaabbb)
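The quantifier example matches the whole string in one go, which hides what each piece consumed. As a rough sketch in Python (assumed to share Lasso's quantifier behaviour), each piece is wrapped in round brackets purely so its share of the match can be inspected.

```python
import re

# Same pattern as the example, with every piece in its own group.
pattern = r"(0*)(1+)(2?)(3{1})(a{1,2})(a)(b{2,})"

match = re.fullmatch(pattern, "123aaabbb")

# One entry per group, in order of appearance in the pattern.
pieces = match.groups()

print(pieces)  # ('', '1', '2', '3', 'aa', 'a', 'bbb')
```

The empty first entry shows that “0*” is satisfied by zero characters, and “a{1,2}” takes two of the three a's, leaving one for the plain “a” that follows.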
Grouping
Round brackets “( )” group the regex together, allowing quantifiers to be used on the group or to perform AND/OR with regex. They also create backreferences, which we won't cover in this session, but know that Lasso returns the group match in addition to the overall match.
Example:
[string_findregexp('cat and rat',-find='(c|r)at')]
LP8: array: (cat), (c), (rat), (r)
L9: array(cat, c, rat, r)

Grouping (cont)
There is an option for non-capturing groups: “(?: … regex here...)”
Example:
[string_findregexp('cat and rat',-find='(?:c|r)at')]
LP8: array: (cat), (rat)
L9: array(cat, rat)
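The flat result Lasso returns for capturing groups (overall match, then each group) can be imitated in other engines. Below is a sketch in Python; find_flat is a hypothetical helper written for this illustration, not a Lasso or Python built-in.

```python
import re

def find_flat(pattern, text):
    """Return each overall match followed by its group matches, as one flat list."""
    results = []
    for m in re.finditer(pattern, text):
        results.append(m.group(0))   # the overall match
        results.extend(m.groups())   # then every capturing group
    return results

capturing = find_flat(r"(c|r)at", "cat and rat")
non_capturing = find_flat(r"(?:c|r)at", "cat and rat")

print(capturing)      # ['cat', 'c', 'rat', 'r']
print(non_capturing)  # ['cat', 'rat']
```

Switching to “(?: … )” removes the extra entries because the group no longer captures anything.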
Tips for Regular Expressions
Be sure the text you search is of type [string] – type [bytes] may give odd results
When using regular expressions obtained from outside sources, you'll need to double-up the backslashes (“\”) for Lasso (e.g. “\d+” becomes “\\d+”).
User-input used as part of a regular expression must be encoded (http://tagswap.net/lp_regexp_encode)
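To see why user input needs encoding, here is a sketch in Python. It uses Python's re.escape as a stand-in; that it does the same job as the Lasso tag linked above is an assumption, and user_input is just an illustrative value.

```python
import re

text = "[date] returns the date"
user_input = "[date]"  # pretend this came from a form field

# Used raw, the brackets turn the input into a character class.
unescaped = re.findall(user_input, text)

# Escaped first, the input is searched for as literal text.
escaped = re.findall(re.escape(user_input), text)

print(len(unescaped))  # 12
print(escaped)         # ['[date]']
```

The unescaped search returns twelve single-character matches, the same result as the earlier “[date]” example without backslashes.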
Putting it all together
When building a complex regex, try breaking the regex into smaller pieces and confirm each piece matches correctly
Often, there are several ways to match. If one approach doesn't work, try another.
Great reference and tutorial site: www.regular-expressions.info

Examples
Extract names from comma-delimited list:
[string_findregexp('Abe Smith, Bob Jones, Cindy Hart, Darla King',-find='\\w+\\s+\\w+')]
LP8: array: (Abe Smith), (Bob Jones), (Cindy Hart), (Darla King)
L9: array(Abe Smith, Bob Jones, Cindy Hart, Darla King)

Examples (cont)
Extract phone numbers into a packed format:
[string_findregexp('(213) 555-1212',-find='\\d') ->join('')]
[string_findregexp('213-555-1212',-find='\\d') ->join('')]
[string_findregexp('213 555 1212',-find='\\d') ->join('')]
LP8: 2135551212 2135551212 2135551212
L9: 2135551212 2135551212 2135551212

Examples (cont)
Extract data from HTML:
[string_findregexp('<input type="hidden" name="secret" value="123">',-find='name="secret" value="[^"]+')]
LP8: array: (name="secret" value="123)
L9: array(name="secret" value="123)
[string_findregexp('<input type="hidden" name="secret" value="123">',-find='name="secret" value="([^"]+)')]
LP8: array: (name="secret" value="123), (123)
L9: array(name="secret" value="123, 123)

Examples (cont)
Extract data from HTML:
[string_findregexp('<input type="hidden" name="secret" value="123">',-find='(?:name="secret" value=")[^"]+')]
LP8: array: (name="secret" value="123)
L9: array(name="secret" value="123)
[string_findregexp('<input type="hidden" name="secret" value="123">',-find='(?<=name="secret" value=")[^"]+')]
LP8: array: (123)
L9: array(123)
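The last pattern uses “(?<= … )”, a lookbehind: the text inside must come immediately before the match but is not included in it. The sketch below compares the three ways of reaching the value, using Python as a stand-in for Lasso (an assumption); the prefix variable is only there to avoid repeating the pattern.

```python
import re

html = '<input type="hidden" name="secret" value="123">'
prefix = r'name="secret" value="'

# Non-capturing group: the prefix is still part of the match.
with_prefix = re.findall(r'(?:' + prefix + r')[^"]+', html)

# Lookbehind: the prefix must be there, but is left out of the match.
value_only = re.findall(r'(?<=' + prefix + r')[^"]+', html)

# Capturing group: match everything, then pick out group 1.
captured = re.search(prefix + r'([^"]+)', html).group(1)

print(with_prefix)  # ['name="secret" value="123']
print(value_only)   # ['123']
print(captured)     # 123
```

Lookbehind and a capturing group both get to “123”; which is more convenient depends on whether the calling code can pick a group out of the result.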