Language Processor Lab manual
Prof. Roshan S. Bhanuse – Dept. of Computer Technology –YCCE Nagpur
Practical No.4
Aim: Assignment to understand the basic syntax of LEX specifications, built-in functions
and variables (Study of Lex).
Theory:
LEX helps write programs whose control flow is directed by instances of regular
expressions in the input stream. It is well suited for editor-script type transformations and for
segmenting input in preparation for a parsing routine. LEX is a program generator designed for
lexical processing of character input streams. It accepts a high-level, problem-oriented
specification for character string matching, and produces a program in a general-purpose
language which recognizes regular expressions. The regular expressions are specified by the
user in the source specifications given to LEX. The code generated by LEX recognizes these
expressions in an input stream and partitions the input stream into strings matching the
expressions. At the boundaries between strings, program sections provided by the user are
executed. The LEX source file associates the regular expressions with the program fragments.
As each expression appears in the input to the program written by LEX, the corresponding
fragment is executed.
The user supplies the additional code beyond expression matching needed to complete his
tasks, possibly including code written by other generators. The program that recognizes the
expressions is generated in the general purpose programming language employed for the user's
program fragments. Thus, a high-level expression language is provided to write the string
expressions to be matched while the user's freedom to write actions is unimpaired. This avoids
forcing a user who wishes to use a string manipulation language for input analysis to write
processing programs in the same and often inappropriate string handling language.
LEX is not a complete language, but rather a generator representing a new language
feature which can be added to different programming languages, called "host languages". Just
as general-purpose languages can produce code to run on different computer hardware, LEX
can write code in different host languages. LEX turns the user's expressions and actions (called
source in this memo) into the host general-purpose language; the generated program is named
yylex. The yylex program will recognize expressions in a stream (called input in this memo)
and perform the specified actions for each expression as it is detected.
See Figure 1.
2. LEX Source.
The general format of LEX source is:
{definitions}
%%
{rules}
%%
{user subroutines}
where the definitions and the user subroutines are often omitted. The second %% is optional, but
the first is required to mark the beginning of the rules. The absolute minimum LEX program is thus
%%
(no definitions, no rules) which translates into a program which copies the input to the output
unchanged.
3. LEX Regular Expressions.
A regular expression specifies a set of strings to be matched. It contains text characters (which match
the corresponding characters in the strings being compared) and operator characters (which specify
repetitions, choices, and other features). The letters of the alphabet and the digits are always text
characters.
The operator characters are
" \ [ ] ^ - ? . * + | ( ) $ / { } % < >
and if they are to be used as text characters, an escape should be used. The quotation mark operator (")
indicates that whatever is contained between a pair of quotes is to be taken as text characters.
Character classes. Classes of characters can be specified using the operator pair []. The
construction [abc] matches a single character, which may be a, b, or c. Within square brackets,
most operator meanings are ignored. Only three characters are special: these are \, - and ^. The
- character indicates ranges. For example,
[a-z0-9<>_]
indicates the character class containing all the lower case letters, the digits, the angle brackets, and
underline. Ranges may be given in either order. Using - between any pair of characters which are not
both upper case letters, both lower case letters, or both digits is implementation dependent and will
get a warning message.
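As a rough illustration of what the class [a-z0-9<>_] accepts, the following is a hand-written C sketch of the same membership test. The function name in_class is hypothetical and is not part of LEX; the range comparisons assume a character set such as ASCII in which letters and digits are contiguous.

```c
#include <stdbool.h>

/* Hand-written equivalent of the LEX character class [a-z0-9<>_].
 * in_class is a hypothetical helper name used only for illustration. */
bool in_class(char c)
{
    return (c >= 'a' && c <= 'z')   /* range a-z: lower case letters      */
        || (c >= '0' && c <= '9')   /* range 0-9: digits                  */
        || c == '<' || c == '>'     /* angle brackets are literal in []   */
        || c == '_';                /* underline                          */
}
```

Calling in_class on each input character reproduces, one character at a time, the test that the generated scanner performs for this class.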
Alternation and Grouping. The operator | indicates alternation:
(ab|cd)
matches either ab or cd. Note that parentheses are used for grouping, although they are not necessary
on the outside level;
ab|cd
would have sufficed.
4. LEX Actions.
When an expression written as above is matched, LEX executes the corresponding action. This
section describes some features of LEX which aid in writing actions. Note that there is a default
action, which consists of copying the input to the output. This is performed on all strings not
otherwise matched.
One of the simplest things that can be done is to ignore the input. Specifying a C null statement,
; as an action causes this result. A frequent rule is
[ \t\n] ;
which causes the three spacing characters (blank, tab, and newline) to be ignored.
Another easy way to avoid writing actions is the action character |, which indicates that the
action for this rule is the action for the next rule. The previous example could also have been
written
" " |
"\t" |
"\n" ;
with the same result, although in different style. The quotes around \n and \t are not required.
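The combined effect of a rule that ignores blank, tab and newline and the default action that copies everything else can be approximated by hand in C. This is only a sketch of the behaviour, not the code LEX generates; strip_spacing is a hypothetical name, and the caller is assumed to supply a destination buffer at least as large as the source string plus its terminator.

```c
#include <stddef.h>

/* Copies src to dst while dropping blank, tab and newline.
 * This mimics a rule with a null action for the three spacing
 * characters plus LEX's default copy action for all other input.
 * dst must have room for strlen(src) + 1 bytes.
 * Returns the number of characters written (excluding the '\0'). */
size_t strip_spacing(const char *src, char *dst)
{
    size_t n = 0;
    for (; *src != '\0'; ++src) {
        if (*src == ' ' || *src == '\t' || *src == '\n')
            continue;            /* null action: matched text is discarded */
        dst[n++] = *src;         /* default action: copy unmatched input   */
    }
    dst[n] = '\0';
    return n;
}
```

For instance, passing a string holding the characters a, blank, b, tab, c, newline, d leaves the four letters abcd in dst.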
In more complex actions, the user will often want to know the actual text that matched some
expression like [a-z]+. LEX leaves this text in an external character array named yytext.
Thus, to print the name found, a rule like
[a-z]+ printf("%s", yytext);
will print the string in yytext.
5. Usage.
There are two steps in compiling a LEX source program. First, the LEX source must be turned
into a generated program in the host general-purpose language. Then this program must be
compiled and loaded, usually with a library of LEX subroutines. The generated program is written to
a file named lex.yy.c. The I/O library is defined in terms of the C standard library.
Command:
$ lex a.l
$ gcc lex.yy.c -o op.out -ll
$ ./op.out a.c
[…] system administration where text processing is needed.
6. Lex Specification File
In essence, to use lex or flex we first create a specification file, which gives the
tokenization rules: regular expressions that represent the tokens of the language, each paired
with the code to run when it matches. This file is passed to the lex command, which generates
a C language file known as lex.yy.c (in which yylex() and the other functions given in Table 2
are defined). Compiling lex.yy.c with gcc (with the -lfl option) produces an executable file
which does the required tokenization.
The flex input or specification file consists of three sections, namely definitions, rules and
user code.
%{
%}
definitions
%%
rules
%%
user code
6.1 The Definitions Section:
The definitions section contains declarations of simple name definitions to simplify the
scanner specification, and declarations of start conditions, which are explained in a later
section.
Name definitions have the form:
name definition
The "name" is a word beginning with a letter or an underscore ('_') followed by zero or more
letters, digits, '_', or '-' (dash). The definition is taken to begin at the first non-whitespace
character following the name and to continue to the end of the line. The definition can
subsequently be referred to using "{name}", which will expand to "(definition)".
For example,
DIGIT [0-9]
ID [a-z][a-z0-9]*
defines "DIGIT" to be a regular expression which matches a single digit, and "ID" to be a
regular expression which matches a letter followed by zero or more letters or digits. A
subsequent reference to
{DIGIT}+"."{DIGIT}*
is identical to
([0-9])+"."([0-9])*
and matches one or more digits followed by a '.' followed by zero or more digits.
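To make the meaning of the expanded pattern concrete, here is a minimal hand-written C sketch that recognizes one or more digits, a '.', then zero or more digits at the start of a string. It is an illustration only and not what flex emits; match_decimal is a hypothetical name, and the digit test assumes the digits 0 to 9 are contiguous in the character set, as the C standard guarantees.

```c
#include <stddef.h>

/* Returns the length of the longest prefix of s that matches
 * one or more digits, then '.', then zero or more digits.
 * Returns 0 if no prefix of s matches. */
size_t match_decimal(const char *s)
{
    size_t i = 0;

    while (s[i] >= '0' && s[i] <= '9')   /* one or more digits            */
        ++i;
    if (i == 0 || s[i] != '.')           /* need a digit, then the '.'    */
        return 0;
    ++i;                                 /* consume the '.'               */
    while (s[i] >= '0' && s[i] <= '9')   /* zero or more trailing digits  */
        ++i;
    return i;
}
```

With this sketch, the strings 3.14 and 42. are matched in full, while .5 and 7 are rejected because the pattern requires at least one leading digit and a '.'.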
6.2 The Rules Section:
The rules section of the flex input contains a series of rules of the form:
pattern action
where the pattern must be unindented and the action must begin on the same line.
See below for a further description of patterns and actions.
6.3 The User Code Section:
The user code section is simply copied to 'lex.yy.c' verbatim. It is used for companion routines
which call or are called by the scanner. The presence of this section is optional; if it is missing,
the second '%%' in the input file may be skipped, too. In the definitions and rules sections, any
indented text or text enclosed in '%{' and '%}' is copied to the output (with the '%{' and '%}'
removed). The '%{' and '%}' must appear unindented on lines by themselves.
In the rules section, any indented or %{ %} text appearing before the first rule may be used to
declare variables which are local to the scanning routine and (after the declarations) code which
is to be executed whenever the scanning routine is entered. Other indented or %{ %} text in the
rules section is still copied to the output, but its meaning is not well defined and it may well
cause compile-time errors (this feature is present for POSIX compliance; see below for other
such features). In the definitions section (but not in the rules section), an unindented comment
(i.e., a line beginning with "/*") is also copied verbatim to the output up to the next "*/".
Conclusion: Study of LEX has been completed.
