tokens patterns and lexemes
A lexeme is a sequence of characters in the source program that matches the
pattern for a token and is identified by the lexical analyzer as an instance of
that token.
A token is a pair consisting of a token name and an optional attribute value.
The token name is an abstract symbol representing a kind of lexical unit, e.g.,
a particular keyword, or sequence of input characters denoting an identifier.
The token names are the input symbols that the parser processes.
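As a rough illustration of the "name plus optional attribute" idea, a token could be modelled as a small record. This is only a sketch: the class name Token, the field names, and the sample values are assumptions made for the example, not taken from any particular compiler.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Token:
    """A token as a (name, attribute) pair; all names here are illustrative."""
    name: str                                       # abstract symbol the parser sees
    attribute: Optional[Union[str, float]] = None   # optional attribute value


# A keyword needs no attribute: the name alone identifies it.
if_token = Token("if")

# Identifiers and numbers carry an attribute so that later phases can
# tell one instance from another.
id_token = Token("id", "count")
number_token = Token("number", 3.14)

# The parser works with the token names only.
parser_input = [t.name for t in (if_token, id_token, number_token)]
```

Here parser_input holds just the names, which is what the parser consumes; the attribute values stay available for later phases.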
A pattern is a description of the form that the lexemes of a token may take. In
the case of a keyword as a token, the pattern is just the sequence of characters
that form the keyword. For identifiers and some other tokens, the pattern is a
more complex structure that is matched by many strings.
To better understand how these terms relate to a lexer and a parser, we will
start with the parser and work backwards to the input.
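One common way to write patterns down is as regular expressions. The sketch below assumes a keyword pattern for else and a simple number pattern (digits with an optional fractional part); both patterns and the candidate strings are illustrative choices, not definitions from the text.

```python
import re

# Keyword pattern: just the characters of the keyword itself.
ELSE_PATTERN = re.compile(r"else")

# Number pattern (assumed): digits, optionally followed by a fractional part.
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?")

candidates = ["else", "els", "0", "42", "3.14", "3."]

# fullmatch succeeds only when the whole string fits the pattern.
else_lexemes = [s for s in candidates if ELSE_PATTERN.fullmatch(s)]
number_lexemes = [s for s in candidates if NUMBER_PATTERN.fullmatch(s)]
```

The keyword pattern accepts exactly one string, while the number pattern accepts many, which is the difference the paragraph above describes.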
To make a parser easier to design, the parser does not work with the input
directly but takes in a list of tokens generated by a lexer. Looking at the
token column in Figure 3.2 we see tokens such as if, else, comparison, id,
number and literal; these are names of tokens. In a typical lexer/parser, a
token is a structure that holds not only the name of the token but also the
characters/symbols that make up the token and the start and end positions of
that string of characters. The start and end positions are used for error
reporting, highlighting, etc.
The lexer takes the input characters/symbols and, using its rules, converts
them into tokens. People who work with lexers and parsers have their own words
for things they use often. What you think of as the sequence of
characters/symbols that makes up a token is what they call a lexeme. So when
you see lexeme, just think of a sequence of characters/symbols representing a
token. In the comparison example, the lexeme can be < or >; for other tokens it
can be else or 3.14, etc.
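A minimal sketch of such a lexer follows. It uses a few of the token names mentioned above (if, else, comparison, number, id); the patterns, the TOKEN_SPEC table, and the tokenize function are assumptions made for this example rather than the rules of any specific tool.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    name: str     # token name handed to the parser
    lexeme: str   # characters matched in the input
    start: int    # offset of the first character
    end: int      # offset one past the last character


# Rules are tried in order, so keywords come before the identifier rule.
# The \b keeps "if" from matching the front of a longer word such as "iffy".
TOKEN_SPEC = [
    ("if", re.compile(r"if\b")),
    ("else", re.compile(r"else\b")),
    ("comparison", re.compile(r"<=|>=|==|!=|<|>")),
    ("number", re.compile(r"\d+(?:\.\d+)?")),
    ("id", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
]


def tokenize(text: str) -> Iterator[Token]:
    """Turn a string of characters into a stream of tokens."""
    pos = 0
    while pos < len(text):
        if text[pos].isspace():          # whitespace separates lexemes
            pos += 1
            continue
        for name, pattern in TOKEN_SPEC:
            match = pattern.match(text, pos)
            if match:
                yield Token(name, match.group(), match.start(), match.end())
                pos = match.end()
                break
        else:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")


tokens = list(tokenize("if x1 < 3.14 else y"))
names = [t.name for t in tokens]
lexemes = [t.lexeme for t in tokens]
```

For this input, names is the list of token names the parser would see, and lexemes is the list of character sequences that produced them.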
Another way to think of the relation between the two is that a token is a
programming structure used by the parser that has a property called lexeme,
which holds the characters/symbols from the input. If you look at most
definitions of token in code, however, you may not see lexeme as one of the
properties of the token. This is because a token will more likely hold the
start and end position of the characters/symbols that represent the token; the
lexeme, the sequence of characters/symbols, can be derived from the start and
end position as needed because the input is static.
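A small sketch of that design, assuming a hypothetical SpanToken class that stores only the token name and the two offsets, with the lexeme recovered by slicing the unchanged source text:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanToken:
    """Token that stores positions only; the class name is illustrative."""
    name: str
    start: int   # offset of the first character of the lexeme
    end: int     # offset one past the last character

    def lexeme(self, source: str) -> str:
        # The source never changes, so the lexeme can be sliced out on demand.
        return source[self.start:self.end]


source = "if x1 < 3.14"
span_tokens = [
    SpanToken("if", 0, 2),
    SpanToken("id", 3, 5),
    SpanToken("comparison", 6, 7),
    SpanToken("number", 8, 12),
]
recovered = [t.lexeme(source) for t in span_tokens]
```

Storing offsets instead of copied strings keeps tokens small and still gives error messages and highlighting the exact location they need.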
Tokens, patterns and lexemes
The words generated by the linear analysis may be of different kinds:
identifier,
keyword (if, while, ...),
punctuation character,
multi-character operator (:=, ->, ...).
Such a kind is called a TOKEN and an element of a kind is called a LEXEME.
A word is recognized to be a lexeme for a certain token by PATTERN MATCHING. For
instance, a letter followed by letters and digits is a pattern that matches a
word like x or y to the token id (= identifier).
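The kinds listed above can be told apart with a few membership tests and one pattern. The sketch below is only an illustration: the keyword, operator, and punctuation sets are small assumed samples, and classify is a hypothetical helper name.

```python
import re

# Small assumed samples of each kind; a real language defines its own sets.
KEYWORDS = {"if", "while"}
OPERATORS = {":=", "->"}
PUNCTUATION = set(";,(){}")

# "A letter followed by letters and digits", written as a regular expression.
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def classify(word: str) -> str:
    """Return the token (kind) for which the word is a lexeme."""
    if word in KEYWORDS:            # checked first: "if" also fits ID_PATTERN
        return "keyword"
    if word in OPERATORS:
        return "operator"
    if word in PUNCTUATION:
        return "punctuation"
    if ID_PATTERN.fullmatch(word):
        return "id"
    return "unknown"


kinds = [classify(w) for w in ["x", "y", "while", ":=", ";", "9lives"]]
```

Keywords are tested before the identifier pattern because a keyword such as if would otherwise also match it.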
Token: A token is a sequence of characters that can be treated as a single
logical entity. Typical tokens are:
1) identifiers 2) keywords 3) operators 4) special symbols 5) constants
Pattern: A set of strings in the input for which the same token is produced as
output. This set of strings is described by a rule called a pattern associated
with the token.
Lexeme: A lexeme is a sequence of characters in the source program that is
matched by the pattern for a token.

