How to Write a Language “Compiler”



Philip Zhong



© 2011 Cisco and/or its affiliates. All rights reserved.   Cisco Confidential   1
• Language Compilers

• JAVACC

• SQL Parser




• ANTLR
• YACC
• JAVACC




• Another Tool for Language Recognition
• Java/C++/C/C#/Python/Ruby/Objective-C
• BSD




• Yet Another Compiler-Compiler

• C++/C for Unix

• BSD




• Java Compiler Compiler

• Java

• BSD




"\n"          newline
( ... )*      zero or more copies of the enclosed expression
( ... )+      one or more copies of the enclosed expression
( ... )?      zero or one copy of the enclosed expression
|             or (alternation)
[ ... ]       optional (in BNF expansions)
~[]           any single character (the complement of the empty set)
( ... )       grouping: the enclosed expression must appear exactly once
<EOF>         end of input (end of file)
["a"-"z"]     any letter from "a" to "z"
["0"-"9"]     any digit
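
To check a reading of the notation, it can help to transcribe a few JavaCC-style expressions into `java.util.regex` and try them. The sketch below is our own transcription, not something JavaCC produces; the identifier and signed-integer shapes are illustrative assumptions, and `[\s\S]` stands in for `~[]` because Java's `.` does not match line terminators by default.

```java
import java.util.regex.Pattern;

public class NotationDemo {
    // JavaCC: ["a"-"z"] (["a"-"z","0"-"9"])*   (hypothetical lowercase identifier token)
    static final Pattern IDENTIFIER = Pattern.compile("[a-z][a-z0-9]*");

    // JavaCC: (["0"-"9"])+   one or more digits
    static final Pattern INTEGER = Pattern.compile("[0-9]+");

    // JavaCC: (["+","-"])? (["0"-"9"])+   optional sign followed by digits (hypothetical token)
    static final Pattern SIGNED_INTEGER = Pattern.compile("[+-]?[0-9]+");

    // JavaCC: "AND" | "OR"   alternation
    static final Pattern AND_OR = Pattern.compile("AND|OR");

    // JavaCC: ~[]   any single character; [\s\S] also matches line terminators, unlike "."
    static final Pattern ANY_CHAR = Pattern.compile("[\\s\\S]");

    // True only if the pattern matches the entire string.
    static boolean fullMatch(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(fullMatch(IDENTIFIER, "col1"));
        System.out.println(fullMatch(IDENTIFIER, "1col"));
        System.out.println(fullMatch(INTEGER, "007"));
        System.out.println(fullMatch(SIGNED_INTEGER, "-42"));
        System.out.println(fullMatch(AND_OR, "OR"));
        System.out.println(fullMatch(ANY_CHAR, "\n"));
    }
}
```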


• Options

• Program header

• Tokens

• Productions




options {
    JDK_VERSION = "1.6";
    IGNORE_CASE = true;
    JAVA_UNICODE_ESCAPE = true;
    UNICODE_INPUT = true;
    DEBUG_PARSER = false;
    STATIC = false;
}




PARSER_BEGIN(SqlParser)
package com.webex.wddl.engine.parser.sql;

import java.io.ByteArrayInputStream;
import java.io.InputStream;

public class SqlParser implements Parser
{
    final public void setStatement(String sqlStatement) {
        // Wrap the SQL text in an InputStream so the parser can read it.
        InputStream stream = new ByteArrayInputStream(sqlStatement.getBytes());
        ...
    }

    public SqlParser()
    {
    }
}
PARSER_END(SqlParser)

• TOKEN: The regular expressions in a TOKEN production describe the
       tokens of the grammar; each match is passed on to the parser.
• SPECIAL_TOKEN: The regular expressions in a SPECIAL_TOKEN production
       describe special tokens. These are not passed to the parser;
       each one is reachable from the specialToken field of the next
       regular token, which makes them a natural fit for comments.
• SKIP: Matches to regular expressions in a SKIP production are simply
       skipped (ignored) by the token manager.
• MORE: Sometimes it is useful to build up a token gradually before
       passing it on to the parser. Matches to a MORE expression are
       stored in a buffer until the next TOKEN or SPECIAL_TOKEN match,
       and the buffered text becomes the start of that token. If a SKIP
       match comes next instead, the buffer is discarded.
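
The toy model below illustrates how these four kinds interact, assuming the input has already been split into matches and each match tagged with its kind (real JavaCC does the matching itself inside the generated token manager). `Match`, `Token` and `route` are hypothetical names; `specialToken` borrows the name of the field on JavaCC's generated `Token` class, but the chaining here is a simplification.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenKindsDemo {
    enum Kind { TOKEN, SPECIAL_TOKEN, SKIP, MORE }

    // Hypothetical: a regular-expression match that has already been classified.
    static final class Match {
        final Kind kind;
        final String text;
        Match(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    // Simplified token: specialToken points to the special token just before this one.
    static final class Token {
        final String image;
        Token specialToken;
        Token(String image) { this.image = image; }
    }

    // Returns only the tokens the parser would see, in order.
    static List<Token> route(List<Match> matches) {
        List<Token> toParser = new ArrayList<>();
        StringBuilder moreBuffer = new StringBuilder(); // MORE text waiting to be completed
        Token pendingSpecial = null;                     // special tokens not yet attached
        for (Match m : matches) {
            switch (m.kind) {
                case MORE:
                    moreBuffer.append(m.text);
                    break;
                case SKIP:
                    moreBuffer.setLength(0); // a SKIP after MORE discards the buffer
                    break;
                case SPECIAL_TOKEN: {
                    Token special = new Token(moreBuffer + m.text);
                    moreBuffer.setLength(0);
                    special.specialToken = pendingSpecial; // chain to the earlier special token
                    pendingSpecial = special;
                    break;
                }
                case TOKEN: {
                    Token token = new Token(moreBuffer + m.text);
                    moreBuffer.setLength(0);
                    token.specialToken = pendingSpecial; // hidden from the parser, but reachable
                    pendingSpecial = null;
                    toParser.add(token);
                    break;
                }
            }
        }
        return toParser;
    }

    public static void main(String[] args) {
        List<Match> input = new ArrayList<>();
        input.add(new Match(Kind.SPECIAL_TOKEN, "-- fetch all"));
        input.add(new Match(Kind.SKIP, "\n"));
        input.add(new Match(Kind.TOKEN, "SELECT"));
        input.add(new Match(Kind.SKIP, " "));
        input.add(new Match(Kind.MORE, "'ab"));
        input.add(new Match(Kind.TOKEN, "c'"));
        for (Token t : route(input)) {
            String comment = t.specialToken == null ? "none" : t.specialToken.image;
            System.out.println(t.image + " (preceding special token: " + comment + ")");
        }
    }
}
```

Here the MORE match `'ab` and the TOKEN match `c'` come out as one token, `'abc'`, while the comment never reaches the parser but stays attached to `SELECT`.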




TOKEN:
{
    <X_AND:"AND">
  | <X_FROM:"FROM">
  | <X_IN:"IN">
  | <X_LIKE:"LIKE">
  | <X_SELECT:"SELECT">
  | <X_WHERE:"WHERE">
  ...
}

SPECIAL_TOKEN:
{
    <LINE_COMMENT: "--" (~["\r","\n"])*>
  | <MULTI_LINE_COMMENT: "/*" (~["*"])* "*" ("*" | (~["*","/"] (~["*"])* "*"))* "/">
}
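
One way to sanity-check the two comment expressions is to transcribe them into `java.util.regex` syntax and run them against sample input. The transcription below is our own assumption about the equivalent Java patterns, not code that JavaCC generates.

```java
import java.util.regex.Pattern;

public class CommentPatterns {
    // JavaCC: "--" (~["\r","\n"])*   everything up to, but not including, the line end
    static final Pattern LINE_COMMENT = Pattern.compile("--[^\\r\\n]*");

    // JavaCC: "/*" (~["*"])* "*" ("*" | (~["*","/"] (~["*"])* "*"))* "/"
    static final Pattern MULTI_LINE_COMMENT =
            Pattern.compile("/\\*[^*]*\\*(\\*|[^*/][^*]*\\*)*/");

    static boolean isLineComment(String s) {
        return LINE_COMMENT.matcher(s).matches();
    }

    static boolean isMultiLineComment(String s) {
        return MULTI_LINE_COMMENT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isLineComment("-- select everything"));
        System.out.println(isLineComment("-- a\nSELECT"));        // stops at the newline
        System.out.println(isMultiLineComment("/* one\n   two */"));
        System.out.println(isMultiLineComment("/** starred **/"));
        System.out.println(isMultiLineComment("/* a */ b */"));   // the comment ends at the first */
    }
}
```

The last case shows why the multi-line expression is written so carefully: a naive `"/*" (~[])* "*/"` would run on to the last `*/` in the input.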




SKIP:
{
    " "
  | "\t"
  | "\r"
  | "\n"
}
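
The sketch below is a hand-written stand-in for the token manager described by the TOKEN and SKIP blocks: characters listed under SKIP are dropped, and the keyword TOKENs are matched without regard to case, as `IGNORE_CASE=true` requests. The `Tok` class and the `ID`/`NUMBER`/`SYMBOL` kinds are assumptions for illustration; a JavaCC-generated token manager instead picks the longest match among all declared expressions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class MiniSqlLexer {
    // Keyword TOKENs from the TOKEN block (the X_ prefix is dropped here).
    static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "AND", "FROM", "IN", "LIKE", "SELECT", "WHERE"));

    // Hypothetical token record: kind is "KEYWORD", "ID", "NUMBER" or "SYMBOL".
    static final class Tok {
        final String kind;
        final String image; // text exactly as it appeared in the input
        Tok(String kind, String image) { this.kind = kind; this.image = image; }
        @Override public String toString() { return kind + "(" + image + ")"; }
    }

    // The same four characters as the SKIP block: space, tab, carriage return, newline.
    static boolean isSkip(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    static List<Tok> tokenize(String input) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (isSkip(c)) {
                i++; // SKIP: consume the character and emit nothing
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < input.length()
                        && (Character.isLetterOrDigit(input.charAt(i)) || input.charAt(i) == '_')) {
                    i++;
                }
                String word = input.substring(start, i);
                // IGNORE_CASE=true: "select", "Select" and "SELECT" are the same keyword
                boolean keyword = KEYWORDS.contains(word.toUpperCase(Locale.ROOT));
                out.add(new Tok(keyword ? "KEYWORD" : "ID", word));
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                out.add(new Tok("NUMBER", input.substring(start, i)));
            } else {
                out.add(new Tok("SYMBOL", String.valueOf(c))); // any other single character
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("select name\n\tFROM users where id in (1)"));
    }
}
```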




Statement parse(String SQL):
...
{
    ...
    (
        statement = insert()
      | statement = merge()
      ...
      | statement = select()
    )
    ( <EOF> | ";" )
    ...
}
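
In a hand-written recursive-descent parser, the `parse` production corresponds roughly to a method that looks at the first token, calls the matching sub-parser, and then requires either end of input or `;`. This is a sketch under simplifying assumptions: tokens are split on whitespace, and the `Statement` class and the `statementBody` helper are hypothetical stand-ins for the grammar's `insert()`, `merge()` and `select()` productions, which are not shown on the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class StatementDispatch {
    // Hypothetical parse result; the original Statement class is not shown.
    static final class Statement {
        final String type;
        final List<String> body;
        Statement(String type, List<String> body) { this.type = type; this.body = body; }
    }

    private final List<String> tokens;
    private int pos = 0;

    StatementDispatch(String sql) {
        // Simplifying assumption: ';' is its own token, everything else is split on whitespace.
        String spaced = sql.replace(";", " ; ").trim();
        this.tokens = spaced.isEmpty()
                ? new ArrayList<String>()
                : Arrays.asList(spaced.split("\\s+"));
    }

    // Returns null at end of input, playing the role of <EOF>.
    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    // Mirrors: ( insert() | merge() | ... | select() ) ( <EOF> | ";" )
    Statement parse() {
        String first = peek();
        if (first == null) {
            throw new IllegalArgumentException("empty input");
        }
        Statement statement;
        switch (first.toUpperCase(Locale.ROOT)) {
            case "INSERT": statement = statementBody("INSERT"); break;
            case "MERGE":  statement = statementBody("MERGE");  break;
            case "SELECT": statement = statementBody("SELECT"); break;
            default: throw new IllegalArgumentException("unexpected token: " + first);
        }
        String end = peek();
        if (end == null) {
            return statement;          // <EOF>
        }
        if (end.equals(";")) {
            pos++;                     // ";"
            return statement;
        }
        throw new IllegalArgumentException("expected <EOF> or ';' but found: " + end);
    }

    // Hypothetical stand-in for insert()/merge()/select(): keeps every token up to ';' or the end.
    private Statement statementBody(String type) {
        pos++; // the leading keyword
        List<String> body = new ArrayList<>();
        while (peek() != null && !peek().equals(";")) {
            body.add(tokens.get(pos++));
        }
        return new Statement(type, body);
    }

    public static void main(String[] args) {
        Statement s = new StatementDispatch("select id from users;").parse();
        System.out.println(s.type + " " + s.body); // prints: SELECT [id, from, users]
    }
}
```

Because these stub sub-parsers always stop at `;` or the end, the final error branch only matters once real sub-parsers can stop early on malformed input.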

• Define tokens

• Define parse tree classes

• Write parser logic

• Create parser classes
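
As a rough illustration of the four steps, here is a hand-written parser for the tiny subset `SELECT column ("," column)* FROM table`. It is a sketch under assumptions: the subset, the `SelectStatement` tree class and the split-based tokenizer are hypothetical, and with JavaCC the last step would be running the tool on the grammar file to generate the parser classes rather than writing them by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TinySelectParser {
    // Step 2: a parse tree class (hypothetical; the original tree classes are not shown).
    static final class SelectStatement {
        final List<String> columns;
        final String table;
        SelectStatement(List<String> columns, String table) { this.columns = columns; this.table = table; }
    }

    // Step 1: tokens, here produced by a naive split that treats ',' as its own token.
    static List<String> tokenize(String sql) {
        List<String> out = new ArrayList<>();
        for (String t : sql.replace(",", " , ").trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    private final List<String> tokens;
    private int pos = 0;

    TinySelectParser(String sql) { this.tokens = tokenize(sql); }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("unexpected end of input");
        return tokens.get(pos++);
    }

    private void expectKeyword(String keyword) {
        String t = next();
        if (!t.toUpperCase(Locale.ROOT).equals(keyword)) {
            throw new IllegalArgumentException("expected " + keyword + " but found " + t);
        }
    }

    // Step 3: parser logic for  SELECT column ( "," column )* FROM table
    SelectStatement parseSelect() {
        expectKeyword("SELECT");
        List<String> columns = new ArrayList<>();
        columns.add(next());
        while (pos < tokens.size() && tokens.get(pos).equals(",")) {
            pos++; // consume ','
            columns.add(next());
        }
        expectKeyword("FROM");
        String table = next();
        if (pos != tokens.size()) throw new IllegalArgumentException("trailing input: " + tokens.get(pos));
        return new SelectStatement(columns, table);
    }

    public static void main(String[] args) {
        SelectStatement s = new TinySelectParser("select id, name from users").parseSelect();
        System.out.println(s.columns + " from " + s.table); // prints: [id, name] from users
    }
}
```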




Thank you.




Editor's Notes

  • #11: http://javacc.java.net/doc/javaccgrm.html#prod2
    STATIC: This is a boolean option whose default value is true. If true, all methods and class variables are specified as static in the generated parser and token manager. This allows only one parser object to be present, but it improves the performance of the parser. To perform multiple parses during one run of your Java program with a static parser, you have to call the ReInit() method to reinitialize it. If the parser is non-static, you may use the "new" operator to construct as many parsers as you wish; these can all be used simultaneously from different threads.
    DEBUG_PARSER: This is a boolean option whose default value is false. It is used to obtain debugging information from the generated parser: setting it to true causes the parser to generate a trace of its actions. Tracing may be disabled by calling the method disable_tracing() in the generated parser class and subsequently re-enabled by calling enable_tracing().
    JAVA_UNICODE_ESCAPE: This is a boolean option whose default value is false. When set to true, the generated parser uses an input stream object that processes Java Unicode escapes (\u...) before sending characters to the token manager. By default, Java Unicode escapes are not processed. This option is ignored if either of the options USER_TOKEN_MANAGER or USER_CHAR_STREAM is set to true.
    UNICODE_INPUT: This is a boolean option whose default value is false. When set to true, the generated parser uses an input stream object that reads Unicode files. By default, ASCII files are assumed. This option is ignored if either of the options USER_TOKEN_MANAGER or USER_CHAR_STREAM is set to true.
    IGNORE_CASE: This is a boolean option whose default value is false. Setting this option to true causes the generated token manager to ignore case in the token specifications and the input files. This is useful for writing grammars for languages such as HTML. The effect of IGNORE_CASE can also be localized by using an alternate mechanism described in the JavaCC grammar documentation.
  • #17: The token manager starts in the "DEFAULT" lexical state, i.e. the default mode at the start of the program.