Regular Expressions for Data
Management in STATA
John Lunalo
Overview
1. Introduction
2. Definition of terms
3. Regular Expressions Symbols in STATA
4. Regular Expressions Commands in Stata
5. Examples
Introduction
Regular expressions (regex, regexp, or rational expressions) are sequences of
ordinary and special characters that define a search pattern.
Types of Regular Expressions
Extended Regular Expressions: Used by most programming languages, where they
are typically the default.
Perl-Like Regular Expressions: Use the syntax and semantics of the Perl language.
Literal/Fixed Regular Expressions: The most basic type, although they can still be
combined with special characters to form complex patterns.
Definition of Terms
Characters (Metacharacters): \d = a digit, \D = a non-digit, \w = a word
character, etc.
Quantifiers: + = one or more, ? = zero or one, * = zero or more, and { } for
specifying a minimum, a maximum, or both.
Logic: | = or, (...) = capturing group, \1 \2 \3 = contents of groups 1, 2, 3. You can
also group without capturing by using (?:...), e.g. (?:lunalo|John) matches
“lunalo” or “John” without storing the match as a numbered group.
Character Classes: Denoted by [...]; they specify sets or ranges of characters.
Anchors: Declare boundaries, e.g. ^ for start and $ for end.
POSIX Classes: e.g. [:punct:] for punctuation and [:alpha:] for alphabetic characters.
Lookarounds: (?=...) = positive lookahead, (?<=...) = positive lookbehind,
(?!...) = negative lookahead, (?<!...) = negative lookbehind.
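The terms above can be tried out with a short sketch. It assumes Stata 14 or later, where the Unicode functions ustrregexm() and ustrregexs() accept \d, { }, (?:...) and lookarounds; these functions are not named in the slides, and the variable names and values are made up for illustration.

```stata
* Sketch only: assumes Stata 14+ for ustrregexm() and ustrregexs().
* The dataset below is hypothetical.
clear
input str20 code
"AB1234"
"ab12"
"XY9876Z"
end

* Character class, quantifier and anchors: two capitals, then exactly four digits
generate byte ok = ustrregexm(code, "^[A-Z]{2}\d{4}$")

* (?:...) groups the alternatives without capturing, so the digits are group 1
generate str4 digits = ustrregexs(1) if ustrregexm(code, "^(?:AB|XY)(\d+)")

* (?=...) looks ahead without consuming: leading letters followed by a digit
generate str4 prefix = ustrregexs(0) if ustrregexm(code, "^[A-Za-z]+(?=\d)")

list
```

Only the first row satisfies the full pattern in ok; the second row has no AB or XY prefix, so digits stays empty there.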
Regular Expressions Symbols in Stata
Quantifiers symbols:
Metacharacters
* Match zero or more of the preceding expression
+ Match one or more of the preceding expression
? Match either zero or one of the preceding expression
a-z Match a range of characters or numbers. The “a” and “z” are an example; the range could also be 1-9, etc.
A range is used inside square brackets, e.g. [1-9].
. Match any character
\ Used for escaping a metacharacter
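A minimal sketch of these symbols, using regexm() on literal strings (the test strings are made up for illustration). Each display prints 1 when the pattern matches and 0 when it does not.

```stata
* ? makes the preceding "u" optional, so both spellings match
display regexm("colour", "colou?r")
display regexm("color", "colou?r")

* * allows zero or more "b" characters between "a" and "c"
display regexm("ac", "ab*c")

* Ranges inside [ ] combined with +: letters followed by digits
display regexm("abc123", "[a-z]+[1-9]+")

* \. matches a literal dot only; an unescaped . matches any character
display regexm("3.14", "3\.14")
display regexm("3x14", "3\.14")
display regexm("3x14", "3.14")
```

All lines print 1 except the sixth, where the escaped dot cannot match the "x".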
Regular Expressions Symbols in Stata
Anchors
Groups
Logic
^ Match expression at the beginning of the string. E.g. “^[hj]” matches h or j at the beginning of the
string. Be careful: inside brackets, as in “[^hj]”, the caret negates the class and matches any character other than h or j.
$ Match expression at the end of the string. E.g. “hj$” will match hj at the end of the string.
( ) Subexpression (group), e.g. ([1-9]), ([a-z]), etc.
| The vertical bar (pipe) character signifies a logical “or”
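The difference between the two uses of the caret is easy to check with a small sketch (test strings are made up for illustration):

```stata
* ^[hj]: the first character must be h or j
display regexm("john", "^[hj]")
display regexm("ajohn", "^[hj]")

* [^hj]: any single character that is neither h nor j
display regexm("hj", "[^hj]")
display regexm("hjx", "[^hj]")

* hj$: the string must end in hj
display regexm("abhj", "hj$")

* ( ) with |: the whole string is either cat or dog
display regexm("cat", "^(cat|dog)$")
display regexm("cow", "^(cat|dog)$")
```

The second, third and last lines print 0; the others print 1.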
Regular Expressions Commands in Stata
In Stata, three commands (strictly speaking, string functions) use regular expressions in their operations:
1) regexm
2) regexr
3) regexs
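M - Match. It is Boolean: regexm returns 1 if the pattern matches and 0 otherwise.
R - Replace. regexr replaces the first match with a new string.
S - Subexpression. regexs returns a subexpression from the preceding regexm match.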
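A minimal sketch of the three functions working together; the address variable and its values are hypothetical.

```stata
* Hypothetical data for illustration
clear
input str30 address
"45 Moi Avenue"
"Kenyatta Road"
end

* regexm(): 1 if the address starts with one or more digits, 0 otherwise
generate byte starts_num = regexm(address, "^[0-9]+")

* regexs(1): contents of the first ( ) group from the regexm() in the same command
generate str10 house_no = regexs(1) if regexm(address, "^([0-9]+) ")

* regexr(): replace the first match (leading digits and a space) with nothing
generate str30 street = regexr(address, "^[0-9]+ ", "")

list
```

Pairing regexs() with regexm() in the same generate statement, via the if qualifier, keeps the subexpression tied to the observation being evaluated.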
Examples
Example datasets used to experiment with the three commands
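The original datasets are not included in the slides, so the sketch below uses a small made-up dataset (names and phone numbers are hypothetical) to show a typical data-management task: splitting a name into parts and stripping separators from a phone number.

```stata
* Hypothetical dataset; not the one used in the original presentation
clear
input str25 fullname str15 phone
"Lunalo, John"   "0712-345-678"
"Doe, Jane"      "0798 765 432"
end

* Two groups: surname before the comma, first name after it
generate str15 last  = regexs(1) if regexm(fullname, "^([A-Za-z]+), *([A-Za-z]+)$")
generate str15 first = regexs(2) if regexm(fullname, "^([A-Za-z]+), *([A-Za-z]+)$")

* regexr() replaces only the first match, so it is applied twice here
* to remove both separators from the phone number
generate str15 phone_clean = regexr(regexr(phone, "[^0-9]", ""), "[^0-9]", "")

list
```

Nesting regexr() works here because each number has exactly two separators; with an unknown number of separators, a loop or a replace-all function would be a better fit.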
THANKS FOR YOUR ATTENTION



Editor's Notes

  • #5: Metacharacters: characters with special meaning. Quantifiers: the number of times a character, a literal, or a pattern as a whole needs to appear in a regular expression. Logic: defines the arrangement of patterns.