Regular Expression
R.Rajkumar
Asst.Professor
CSE
Lexical analyzer
• Lexical analysis, also called scanning, is the phase of the compilation
process which deals with the actual program being compiled, character by
character. The higher level parts of the compiler will call the lexical
analyzer with the command "get the next word from the input", and it is
the scanner's job to sort through the input characters and find this word.
• The types of "words" commonly found in a program are:
• programming language keywords, such as if, while, struct, int etc.
• operator symbols like =, +, -, &&, !, <= etc.
• other special symbols like: ( ), { }, [ ], ;, & etc.
• constants like 1, 2, 3, 'a', 'b', 'c', "any quoted string" etc.
• variable and function names (called identifiers) such as x, i, t1 etc.
• Some languages (such as C) are case sensitive, in that they differentiate
between, e.g., if and IF; thus the former would be a keyword, the latter a
variable name.
Tokens
• Also, most languages insist that identifiers cannot be any of the keywords, or
contain operator symbols (some versions of Fortran do not, which makes lexical analysis
quite difficult).
• In addition to the basic grouping process, lexical analysis usually performs the
following tasks:
• Since there are only a finite number of types of words, instead of passing the actual
word to the next phase we can save space by passing a suitable representation. This
representation is known as a token.
• If the language isn't case sensitive, we can eliminate differences between case at this
point by using just one token per keyword, irrespective of case; e.g.
#define IF_TOKEN 1
#define WHILE_TOKEN 2
.....
if we meet "IF", "If", "iF", "if" then return IF_TOKEN
if we meet "WHILE", "While", "WHile", ... then return WHILE_TOKEN
• We can pick out mistakes in the lexical syntax of the program, such as using a
character which is not valid in the language. (Note that we do not worry about the
combination of patterns; e.g. the pattern of characters "+*" would be returned
as PLUS_TOKEN, MULT_TOKEN, and it would be up to the next phase to see that
these should not follow in sequence.)
• We can eliminate pieces of the program that are no longer relevant, such as spaces,
tabs, carriage-returns (in most languages), and comments.
• In order to specify the lexical analysis process, what we need is some method of
describing which patterns of characters correspond to which words.
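The tasks listed above can be sketched as a small hand-written scanner. This is only an illustration under stated assumptions: the `Token` names, the `tokenize` function, and the tiny keyword and operator tables are hypothetical and not part of the slides, and the language is assumed to be case insensitive so that "IF", "If" and "if" all map to one token.

```python
import re
from enum import Enum, auto


class Token(Enum):
    # Hypothetical token kinds, chosen only for this sketch.
    IF = auto()
    WHILE = auto()
    IDENT = auto()
    NUMBER = auto()
    PLUS = auto()
    MULT = auto()


KEYWORDS = {"if": Token.IF, "while": Token.WHILE}
OPERATORS = {"+": Token.PLUS, "*": Token.MULT}

_WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_NUMBER = re.compile(r"[0-9]+")


def tokenize(text):
    """Return a list of (token kind, matched text) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        # Spaces, tabs and line breaks carry no meaning here, so drop them.
        if ch in " \t\r\n":
            pos += 1
            continue
        m = _WORD.match(text, pos)
        if m:
            word = m.group()
            # Case-insensitive keyword lookup: one token per keyword.
            kind = KEYWORDS.get(word.lower(), Token.IDENT)
            tokens.append((kind, word))
            pos = m.end()
            continue
        m = _NUMBER.match(text, pos)
        if m:
            tokens.append((Token.NUMBER, m.group()))
            pos = m.end()
            continue
        if ch in OPERATORS:
            tokens.append((OPERATORS[ch], ch))
            pos += 1
            continue
        # A character that is not valid in the language is a lexical error.
        raise ValueError(f"invalid character {ch!r} at position {pos}")
    return tokens
```

Calling `tokenize("IF x + * 12")` yields the kinds IF, IDENT, PLUS, MULT, NUMBER: the scanner reports "+*" as two tokens and leaves it to the next phase to reject the sequence.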
Regular Expressions
• Regular expressions are used to define patterns of characters; they are used in UNIX tools
such as awk, grep, vi and, of course, lex.
• A regular expression is just a form of notation, used for describing sets of words. For any
given set of characters Σ, a regular expression over Σ is defined by:
• The empty string, ε, which denotes a string of length zero, and means "take nothing from
the input". It is most commonly used in conjunction with other regular expressions, e.g. to
denote optionality.
• Any character in Σ may be used in a regular expression. For instance, if we write a as a
regular expression, this means "take the letter a from the input"; i.e. it denotes the
(singleton) set of words {"a"}.
• The union operator, "|", which denotes the union of two sets of words. Thus the regular
expression a|b denotes the set {"a", "b"}, and means "take either the letter a or the
letter b from the input".
• Writing two regular expressions side-by-side is known as concatenation; thus the regular
expression ab denotes the set {"ab"} and means "take the character a followed by the
character b from the input".
• The Kleene closure of a regular expression, denoted by "*", indicates zero or more
occurrences of that expression. Thus a* is the (infinite) set {ε, "a", "aa", "aaa", ...} and
means "take zero or more a's from the input".
• Brackets may be used in a regular expression to enforce precedence or increase clarity.
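Since each regular expression denotes a set of words, the operators above can be tried out directly on Python sets of strings. This is a minimal sketch: the helper names `union`, `concat` and `star` are hypothetical, and because a* is infinite, `star` takes an assumed `max_reps` bound and returns only a finite prefix of the set.

```python
EPSILON = {""}  # the set containing only the empty string


def union(r, s):
    """r|s: every word that is in r or in s."""
    return r | s


def concat(r, s):
    """rs: every word of r followed by every word of s."""
    return {x + y for x in r for y in s}


def star(r, max_reps):
    """r*: zero up to max_reps repetitions of words from r (a finite cut-off)."""
    result = {""}
    current = {""}
    for _ in range(max_reps):
        current = concat(current, r)
        result |= current
    return result


a = {"a"}
b = {"b"}

print(union(a, b))        # the set for a|b
print(concat(a, b))       # the set for ab
print(star(a, 3))         # the first few words of a*
print(union(a, EPSILON))  # an optional a, written with the empty string
```

Brackets correspond to ordinary nesting of calls: (a|b)* becomes `star(union(a, b), n)`, whereas a|b* becomes `union(a, star(b, n))`.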
Thompson Algorithm
for converting RE to NFA
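The slides only name the algorithm, so the following is a hedged sketch of a Thompson-style construction rather than the author's own presentation. It assumes a restricted syntax (literal characters, "|", "*" and brackets), and the names `NFA`, `build`, `closure` and `matches` are hypothetical. Each operator builds a fragment with one start and one accept state; concatenation here links fragments with an ε-edge instead of merging states, which is a common simplification.

```python
class NFA:
    """edges[state] is a list of (label, target); label None is an epsilon move."""

    def __init__(self):
        self.edges = []

    def new_state(self):
        self.edges.append([])
        return len(self.edges) - 1

    def add_edge(self, src, label, dst):
        self.edges[src].append((label, dst))


def build(pattern):
    """Parse pattern and return (nfa, start_state, accept_state)."""
    nfa = NFA()
    pos = 0

    def peek():
        return pattern[pos] if pos < len(pattern) else None

    def parse_union():
        nonlocal pos
        start, accept = parse_concat()
        while peek() == "|":
            pos += 1
            s2, a2 = parse_concat()
            # New start branches to both alternatives; both join a new accept.
            ns, na = nfa.new_state(), nfa.new_state()
            nfa.add_edge(ns, None, start)
            nfa.add_edge(ns, None, s2)
            nfa.add_edge(accept, None, na)
            nfa.add_edge(a2, None, na)
            start, accept = ns, na
        return start, accept

    def parse_concat():
        # Begin with an epsilon fragment, then chain each following piece.
        start, accept = nfa.new_state(), nfa.new_state()
        nfa.add_edge(start, None, accept)
        while peek() is not None and peek() not in "|)":
            s2, a2 = parse_star()
            nfa.add_edge(accept, None, s2)
            accept = a2
        return start, accept

    def parse_star():
        nonlocal pos
        start, accept = parse_atom()
        while peek() == "*":
            pos += 1
            # Allow skipping the fragment entirely or looping back into it.
            ns, na = nfa.new_state(), nfa.new_state()
            nfa.add_edge(ns, None, start)
            nfa.add_edge(ns, None, na)
            nfa.add_edge(accept, None, start)
            nfa.add_edge(accept, None, na)
            start, accept = ns, na
        return start, accept

    def parse_atom():
        nonlocal pos
        ch = peek()
        if ch == "(":
            pos += 1
            frag = parse_union()
            if peek() != ")":
                raise ValueError("missing ')'")
            pos += 1
            return frag
        if ch == "*":
            raise ValueError("nothing to repeat")
        pos += 1
        start, accept = nfa.new_state(), nfa.new_state()
        nfa.add_edge(start, ch, accept)  # a single-character transition
        return start, accept

    start, accept = parse_union()
    if pos != len(pattern):
        raise ValueError(f"unexpected {pattern[pos]!r} at position {pos}")
    return nfa, start, accept


def closure(nfa, states):
    """All states reachable from `states` using epsilon moves only."""
    seen = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for label, dst in nfa.edges[s]:
            if label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def matches(pattern, text):
    """Simulate the NFA on text by tracking the set of active states."""
    nfa, start, accept = build(pattern)
    current = closure(nfa, {start})
    for ch in text:
        moved = {dst for s in current for label, dst in nfa.edges[s] if label == ch}
        current = closure(nfa, moved)
    return accept in current
```

For example, `matches("(a|b)*abb", "babb")` is true and `matches("ab", "a")` is false. Tracking a set of active states avoids backtracking; converting the NFA to a DFA beforehand is the usual next step in a scanner generator, but it is not shown here.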