Crystal internals
Part 1
Is a compiler a hard thing?
At Manas we usually do webapps
Let’s talk about webapps...
Let’s talk about webapps...
● HTML/CSS/JS
● React/Angular/Knockout
● Ruby/Erlang/Elixir
● Database (mysql/postgres)
● Elasticsearch
● Redis/Sidekiq/Background-jobs
● Docker, capistrano, deploy, servers
Let’s talk about webapps...
● HTML/CSS/JS
● React/Angular/Knockout
● Ruby/Erlang/Elixir
● Database (mysql/postgres)
● Elasticsearch
● Redis/Sidekiq/Background-jobs
● Docker, capistrano, deploy, servers
Easy…?
Let’s talk about compilers...
● HTML/CSS/JS
● React/Angular/Knockout
● Ruby/Erlang/Elixir
● Database (mysql/postgres)
● Elasticsearch
● Redis/Sidekiq/Background-jobs
● Docker, capistrano, deploy, servers
Easy!
Let’s talk about compilers...
Let’s talk about compilers...
No, let’s talk about usual programs
No, let’s talk about usual programs
INPUT -> [PROCESSING…] -> OUTPUT
No, let’s talk about compilers
SOURCE CODE -> [PROCESSING…] -> EXECUTABLE
No, let’s talk about compilers
SOURCE CODE -> [PROCESSING…] -> EXECUTABLE
How do we go from source code to an executable?
Traditional stages of a compiler
class Foo
def bar
1 + 2
end
end
● Lexer: [“class”, “Foo”, “;”, “def”, “bar”, “;”, “1”, “+”, “2”, “;”, “end”, “;”, “end”]
● Parser: ClassDef(“Foo”, body: [Def.new(“bar”)])
● Semantic (a.k.a “type check”): make sure there are no type errors
● Codegen: generate machine code
Let’s start with the codegen phase
Goal: generate efficient assembly code for many architectures (32 bits, 64 bits,
intel, arm, etc.)
● Generating assembly code is hard
● Generating efficient assembly code is harder
● Generating assembly code for many architectures is hard/tedious/boring
Let’s start with the codegen phase
Goal: generate efficient assembly code for many architectures (32 bits, 64 bits,
intel, arm, etc.)
● Generating assembly code is hard
● Generating efficient assembly code is harder
● Generating assembly code for many architectures is hard/tedious/boring
Thus: writing a compiler is HARD! :-(
Let’s start with the codegen phase
Goal: generate efficient assembly code for many architectures (32 bits, 64 bits,
intel, arm, etc.)
● Generating assembly code is hard
● Generating efficient assembly code is harder
● Generating assembly code for many architectures is hard/tedious/boring
Thus: writing a compiler is HARD! :-(
Well, not anymore...
Crystal internals (part 1)
Crystal internals (part 1)
Codegen
With LLVM, we generate LLVM IR (internal representation) instead of assembly,
and LLVM takes care of generating efficient assembly code for us!
The hardest part is solved :-)
define i32 @add(i32 %x, i32 %y) {
%0 = add i32 %x, %y
ret i32 %0
}
Codegen: LLVM (example)
LLVM provides a nice API to generate IR
require "llvm"
mod = LLVM::Module.new("main")
mod.functions.add("add", [LLVM::Int32, LLVM::Int32], LLVM::Int32) do |func|
func.basic_blocks.append do |builder|
res = builder.add(func.params[0], func.params[1])
builder.ret(res)
end
end
puts mod
● Lexer
● Parser
● Semantic
Remaining phases
● Kind of easy: go char by char until we get a keyword, identifier, number, etc.
● We won’t go into implementation details...
Lexer
● Kind of easy: go token by token and create a tree of expressions
● This tree is called AST: Abstract Syntax Tree
● An AST is like a directed, acyclic graph
● We won’t go into implementation details...
Parser
● This is the fundamental piece of the compiler
● It takes an AST as input and analyzes it
● Analysis can result in:
○ Declaring types: for example “class Foo; end” will declare a type Foo
○ Checking methods: for example “Foo.bar” will check that “Foo” is a declared type and that the
method “bar” exists in it, and has the correct arity and types
○ Giving each non-dead expression in the program a type
○ Gathering some info for the codegen phase: for example know the local variables of a method,
and their type
Semantic
● The interesting part of the compiler is the semantic phase
● It’s just about processing an AST
● In Crystal’s compiler you just need to know one language: Crystal!
● No HTML/CSS/JS/JSX/etc.
● No untyped, dynamic languages: no Ruby/Erlang/Elixir. Type safe!
● Stuff is processed in memory
● No databases, no Elasticsearch, no Redis
Semantic
● The interesting part of the compiler is the semantic phase
● It’s just about processing an AST
● In Crystal’s compiler you just need to know one language: Crystal!
● No HTML/CSS/JS/JSX/etc.
● No untyped, dynamic languages: no Ruby/Erlang/Elixir. Type safe!
● Stuff is processed in memory
● No databases, no Elasticsearch, no Redis
Writing a compiler is easier than writing a web app! ^_^
Semantic
● The interesting part of the compiler is the semantic phase
● It’s just about processing an AST
● In Crystal’s compiler you just need to know one language: Crystal!
● No HTML/CSS/JS/JSX/etc.
● No untyped, dynamic languages: no Ruby/Erlang/Elixir. Type safe!
● Stuff is processed in memory
● No databases, no Elasticsearch, no Redis
Writing a compiler is easier than writing a web app! ^_^
(Or at least it’s more fun :-P)
Semantic
Crystal internals (part 1)
Directory layout
● src/compiler/crystal
○ command/
○ syntax/
○ semantic/
○ macros/
○ codegen/
○ tools/
○ compiler.cr
○ types.cr
○ program.cr
Directory layout
● src/compiler/crystal
○ command/ : the command line interface
○ syntax/ : lexer, parser, ast, visitor, transformer
○ semantic/ : type declaration, method lookup, etc.
○ macros/ : macro expansion logic
○ codegen/ : codegen
○ tools/ : doc generator, formatter, init
○ compiler.cr : combines syntax + semantic + codegen
○ types.cr : all possible types in Crystal (Int32, String, unions, custom types, etc.)
○ program.cr : holds definitions of a program (holds Int32, String, etc.)
Directory layout
● src/compiler/crystal : ~43K LOC
○ command/ : ~300LOC
○ syntax/ : ~10K LOC
○ semantic/ : ~12K LOC
○ macros/ : ~2K LOC
○ codegen/ : ~6K LOC
○ tools/ : ~7K LOC
○ compiler.cr : ~300LOC
○ types.cr :~2K LOC
○ program.cr : ~300 LOC
Directory layout
● src/compiler/crystal : ~43K LOC
○ command/ : ~300LOC
○ syntax/ : ~10K LOC
○ semantic/ : ~12K LOC
○ macros/ : ~2K LOC
○ codegen/ : ~6K LOC
○ tools/ : ~7K LOC
○ compiler.cr : ~300LOC
○ types.cr :~2K LOC
○ program.cr : ~300 LOC
About 14K LOC to analyze source code.
Directory layout
● src/compiler/crystal : ~43K LOC
○ command/ : ~300LOC
○ syntax/ : ~10K LOC
○ semantic/ : ~12K LOC
○ macros/ : ~2K LOC
○ codegen/ : ~6K LOC
○ tools/ : ~7K LOC
○ compiler.cr : ~300LOC
○ types.cr :~2K LOC
○ program.cr : ~300 LOC
About 14K LOC to analyze source code.
One big Rails app at Manas has 14K LOC in “./app”
Directory layout
● src/compiler/crystal : ~43K LOC
○ command/ : ~300LOC
○ syntax/ : ~10K LOC
○ semantic/ : ~12K LOC
○ macros/ : ~2K LOC
○ codegen/ : ~6K LOC
○ tools/ : ~7K LOC
○ compiler.cr : ~300LOC
○ types.cr :~2K LOC
○ program.cr : ~300 LOC
About 14K LOC to analyze source code.
One big Rails app at Manas has 14K LOC in “./app”
A compiler can’t be that hard! ;-)
Show me the code
Show me the code
# src/compiler/crystal/compiler.cr
def compile(source : Source | Array(Source), output_filename : String) : Result
source = [source] unless source.is_a?(Array)
program = new_program(source)
node = parse program, source
node = program.semantic node, @stats
codegen program, node, source, output_filename unless @no_codegen
Result.new program, node
end
Show me the code
# src/compiler/crystal/compiler.cr
def compile(source : Source | Array(Source), output_filename : String) : Result
source = [source] unless source.is_a?(Array)
program = new_program(source)
node = parse program, source
node = program.semantic node, @stats
codegen program, node, source, output_filename unless @no_codegen
Result.new program, node
end
Show me the code
# src/compiler/crystal/compiler.cr
def compile(source : Source | Array(Source), output_filename : String) : Result
source = [source] unless source.is_a?(Array)
program = new_program(source)
node = parse program, source
node = program.semantic node, @stats
codegen program, node, source, output_filename unless @no_codegen
Result.new program, node
end
What is a program?
Program
● Holds all types and top-level methods for a given compilation
● For example, if I compile “class Foo; end” and you compile “class Bar; end”,
the first program will have a type named “Foo”, and the second one won’t (but
it will have a type named “Bar”)
● It lets us test the compiler more easily, because we can use different Program
instances for each snippet of code that we want to test
● In contrast of having global variables holding all of a program’s data
● A Program is passed around in all phases of a compilation (except lexing and
parsing, which don’t need semantic info)
Show me the code
# src/compiler/crystal/compiler.cr
def compile(source : Source | Array(Source), output_filename : String) : Result
source = [source] unless source.is_a?(Array)
program = new_program(source)
node = parse program, source # from source to Crystal::ASTNode
node = program.semantic node, @stats
codegen program, node, source, output_filename unless @no_codegen
Result.new program, node
end
What is a program?
Show me the code
# src/compiler/crystal/compiler.cr
def compile(source : Source | Array(Source), output_filename : String) : Result
source = [source] unless source.is_a?(Array)
program = new_program(source)
node = parse program, source
node = program.semantic node, @stats # Semantic! :-)
codegen program, node, source, output_filename unless @no_codegen
Result.new program, node
end
What is a program?
Semantic
● The entry point for semantic analysis is in
src/compiler/crystal/semantic.cr
● Other files are in src/compiler/crystal/semantic/
● The file semantic.cr has comments that explain the overall algorithm :-)
Semantic: overall algorithm
● top level: declare classes, modules, macros, defs and other top-level stuff
● new methods: create `new` methods for every `initialize` method
● type declarations: process type declarations like `@x : Int32`
● check abstract defs: check that abstract defs are implemented
● class_vars_initializers: process initializers like `@@x = 1`
● instance_vars_initializers: process initializers like `@x = 1`
● main: process "main" code, calls and method bodies (the whole program).
● cleanup: remove dead code and other simplifications
● check recursive structs: check that structs are not recursive (impossible to
codegen)
Semantic: overall algorithm
Note!
● This algorithm didn’t come from the Skies
(nor from a textbook, nor from a paper)
● It’s not written in stone!
● It can definitely be improved: readability,
performance, etc.
Note!
● It’s actually more like this…
Semantic: overall algorithm
Semantic
But before looking at each phase, we need to learn about the most useful pattern
for analyzing an AST...
The Visitor pattern
require "compiler/crystal/syntax"
class SumVisitor < Crystal::Visitor
getter sum = 0
def visit(node : Crystal::NumberLiteral)
@sum += node.value.to_i
end
def visit(node : Crystal::ASTNode)
true # true: continue visiting children nodes
end
end
ast = Crystal::Parser.parse("foo(1 + 2, 3, [4])")
visitor = SumVisitor.new
ast.accept(visitor)
puts visitor.sum
The Visitor pattern
● We define a visit method for each node of interest
● We process the nodes
● We return true if we want to process children, false otherwise
● Example: if we only want to process class declarations, we could just define
visit(node : Crystal::ClassDef) and define some logic there (and return true,
because of nested class definitions)
● A visitor abstracts over the way nodes are composed
● ...though in many cases, for semantic purposes, we need and use the way a
node is composed (for example, to analyze a call we need to know the
argument types, so we check the arguments, not all children in a generic way)
Semantic: overall algorithm
● top level: declare classes, modules, macros, defs and other top-level stuff
● new methods
● type declarations
● check abstract defs
● class_vars_initializers
● instance_vars_initializers
● main
● cleanup
● check recursive structs
Top level: declare classes, modules, macros, defs...
# src/compiler/crystal/semantic/top_level_visitor.cr
class Crystal::TopLevelVisitor < Crystal::SemanticVisitor
# ...
end
● Located at semantic_visitor.cr
● This is a base visitor used in most of the phases of the semantic analysis
● It keeps track of the “current type”
● For example in “class Foo; class Bar; baz; end; end”, “current type” starts at
the top-level (the Program). When “class Foo” is found, the current type
becomes “Foo” (we search “Foo” in the current type). When “class Bar” is
found, the current type becomes “Foo::Bar” (we search “Bar” in the current
type). When “baz” is found, it will be looked up inside the current type.
● But initially there’s no “Foo” inside the current type (the Program). Who
defines it? … The top-level visitor!
Crystal::SemanticVisitor
● Located at top_level_visitor.cr
● Defines classes, methods, etc.
● Given “class Foo; class Bar; baz; end; end”...
● current_type starts at Program
● When “class Foo” is found (ClassDef), we check if “Foo” exists in the current
type. If not, we create it. If it exists with a different type (if it’s a module), we
give an error.
● We attach this type “Foo” to the AST node ClassDef. SemnticVisitor will use
this in every subsequent phase.
● … the “baz” call is not analyzed here (unless it’s a macro)
Crystal::TopLevelVisitor
Crystal::TopLevelVisitor
● Many other things done in this visitor: methods and macros are added to
types, aliases and enums are defined, etc.
● Question: why are methods and macros defined at this phase?
● The “inherited” macro hook must be processed as soon as “Bar <
Foo” and “Baz < Foo” are found
● The macro expands to “do_something”, which must expand to
“def foo; 1; end”
● This must happen before we continue processing Baz’s body:
“def foo; 3; end” must win and be the method found when doing
“Baz.new.foo”
● Conclusion: methods, macros and hooks must be defined in the
first pass, when defining types. Additionally, macros might be
looked up in types in this same pass (like “do_something”)
● SemanticVisitor takes care to look up and expand calls that
resolve to macro calls
When should macros be defined and expanded
class Foo
macro inherited
do_something
end
macro do_something
def foo; 1; end
end
end
class Bar < Foo; end
class Baz < Foo
def foo; 3; end
end
puts Bar.new.foo # => 1
puts Baz.new.foo # => 3
Method overloads
● Crystal methods are very powerful! For example: optional type restrictions,
different number of arguments, default arguments, splat, etc.
● When methods are added to types we need to:
○ Know if a method replaces (redefines) an old method
○ Track whether a method is “stricter” than another method, to quickly know, given a call
argument types, in which order they are going to be tested
Method restrictions
def foo(x : Int32)
puts 1
end
def foo(x)
puts 2
end
foo(1)
foo('a')
● Given foo(1), both methods match it. However, the first overload
should be invoked because it has a stronger restriction than the
second overload.
● If we define the methods in a different order, it still works the
same
● This is because an argument with a type restriction is stronger than
one without one. We say that the first one is a restriction of the
second one (we should probably rename this to use stronger)
● This applies to types too: Int32 is stronger than Int32 |
String. And Bar is stronger than Foo, if Bar < Foo.
● Given two methods with the same name, if all arguments of a
method are stronger than the others’, the whole method is stronger
and should come first. Each type stores an ordered list of methods
indexed by method name, with this notion.
● If the methods are both stronger than each other, they have the
same restriction.
Method restrictions
def foo(x : Int32)
puts 1
end
def foo(x)
puts 2
end
foo(1)
foo('a')
● This logic is located at restrictions.cr
● A lot of cases to consider: generics, tuples, splats, etc.
● The code and algorithms could probably use a simpler, unified logic
and a cleanup, but first all of these concepts and definitions must be
defined much more formally
Semantic: overall algorithm
● top level
● new methods: create `new` methods for every `initialize` method
● type declarations
● check abstract defs
● class_vars_initializers
● instance_vars_initializers
● main
● cleanup
● check recursive structs
● Located at new.cr
● TopLevelVisitor creates a `new` class method for every `initialize` method it
finds (the logic for this is also in new.cr)
● Classes that end up without an `initialize` need a default, argless `self.new`
method
● This phase is a bit messy right now because of some missing things related to
generics…
Semantic: new methods
class Foo
def initialize(x : Int32)
@x = x
end
# Generated from the above
def self.new(x : Int32)
instance = allocate
instance.initialize(x)
if instance.responds_to?(:finalize)
::GC.add_finalizer(instance)
end
end
end
Semantic: new methods
Semantic: overall algorithm
● top level
● new methods
● type declarations: process type declarations like `@x : Int32`
● check abstract defs
● class_vars_initializers
● instance_vars_initializers
● main
● cleanup
● check recursive structs
● Located at type_declaration_processor.cr (and
type_declaration_visitor.cr and type_guess_visitor.cr)
● Combines info gathered by these two visitors to declare the type of instance
and class variables.
● TypeDeclarationVisitor deals with explicit type declarations
● TypeGuessVisitor tries to “guess” the type of instance and class variables
without an explicit type annotations (for example @x = 1 and @x =
Foo.new)
Semantic: type declarations
Semantic: overall algorithm
● top level
● new methods
● type declarations
● check abstract defs: check that abstract defs are implemented
● class_vars_initializers
● instance_vars_initializers
● main
● cleanup
● check recursive structs
● Located at abstract_def_checker.cr
● Not a visitor, but traverses all types, and for those that have abstract defs
checks that subclasses or including modules defined those methods
Semantic: check abstract defs

More Related Content

PDF
An Introduction to Generative AI - May 18, 2023
PDF
Democratizing Data Quality Through a Centralized Platform
PPTX
Cassie Kozyrkov. Journey to AI
PPTX
Graph Databases at Netflix
PDF
𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬
PDF
leewayhertz.com-How to build a generative AI solution From prototyping to pro...
PPTX
The Future of AI is Generative not Discriminative 5/26/2021
PDF
Improving Data Literacy Around Data Architecture
An Introduction to Generative AI - May 18, 2023
Democratizing Data Quality Through a Centralized Platform
Cassie Kozyrkov. Journey to AI
Graph Databases at Netflix
𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬
leewayhertz.com-How to build a generative AI solution From prototyping to pro...
The Future of AI is Generative not Discriminative 5/26/2021
Improving Data Literacy Around Data Architecture

What's hot (20)

PDF
Unlocking the Power of ChatGPT and AI in Testing - NextSteps, presented by Ap...
PPTX
ChatGPT, Foundation Models and Web3.pptx
PDF
Building a Data Strategy – Practical Steps for Aligning with Business Goals
PDF
Application Assessment - Executive Summary Report
PDF
The future of AI is hybrid
PDF
Data platform architecture
PDF
Introducing Databricks Delta
PDF
Data Catalog as the Platform for Data Intelligence
PDF
Owning Your Own (Data) Lake House
PDF
Cavalry Ventures | Deep Dive: Generative AI
PDF
Building a Data Strategy – Practical Steps for Aligning with Business Goals
PPTX
Introduction to Data Engineering
PDF
Generative AI at the edge.pdf
PDF
Data engineering
PDF
AI Toolkit for Educators
PPTX
Gaining Data Center Cooling Efficiency Through Airflow Management
PDF
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
PPTX
POWER POINT PRESENTATION ON DATA CENTER
PDF
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
PDF
What Leaders Need To KNOW & DO About Generative AI.pdf
Unlocking the Power of ChatGPT and AI in Testing - NextSteps, presented by Ap...
ChatGPT, Foundation Models and Web3.pptx
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Application Assessment - Executive Summary Report
The future of AI is hybrid
Data platform architecture
Introducing Databricks Delta
Data Catalog as the Platform for Data Intelligence
Owning Your Own (Data) Lake House
Cavalry Ventures | Deep Dive: Generative AI
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Introduction to Data Engineering
Generative AI at the edge.pdf
Data engineering
AI Toolkit for Educators
Gaining Data Center Cooling Efficiency Through Airflow Management
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
POWER POINT PRESENTATION ON DATA CENTER
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
What Leaders Need To KNOW & DO About Generative AI.pdf
Ad

Similar to Crystal internals (part 1) (20)

PDF
Os Keysholistic
PDF
The Holistic Programmer
PDF
Programming Languages #devcon2013
PDF
Programming Languages: some news for the last N years
PPTX
Cs419 Compiler lec1&2 introduction
PDF
Cimplementation
PPT
introduction of compiler unit 1 phases of compiler
PPTX
Compilers Are Databases
PPTX
PPTX
Presentation 1(Compiler Construction).pptx
KEY
U Xml Defense presentation
PDF
COSC3054 Lec 05 - Semantic Analysis and Type checking B.pdf
PPT
Chap01-Intro.ppt
PDF
JRuby, Not Just For Hard-Headed Pragmatists Anymore
PDF
robert-kovacsics-part-ii-dissertation
PPT
Lecture 21 22
PDF
Compiler design
PPT
compiler construvtion aaaaaaaaaaaaaaaaaads
PDF
Compiler design Introduction
PPT
CC Week 11.ppt
Os Keysholistic
The Holistic Programmer
Programming Languages #devcon2013
Programming Languages: some news for the last N years
Cs419 Compiler lec1&2 introduction
Cimplementation
introduction of compiler unit 1 phases of compiler
Compilers Are Databases
Presentation 1(Compiler Construction).pptx
U Xml Defense presentation
COSC3054 Lec 05 - Semantic Analysis and Type checking B.pdf
Chap01-Intro.ppt
JRuby, Not Just For Hard-Headed Pragmatists Anymore
robert-kovacsics-part-ii-dissertation
Lecture 21 22
Compiler design
compiler construvtion aaaaaaaaaaaaaaaaaads
Compiler design Introduction
CC Week 11.ppt
Ad

Recently uploaded (20)

PDF
Wondershare Recoverit Full Crack New Version (Latest 2025)
PPTX
Cybersecurity-and-Fraud-Protecting-Your-Digital-Life.pptx
PPTX
Introduction to Windows Operating System
DOC
UTEP毕业证学历认证,宾夕法尼亚克拉里恩大学毕业证未毕业
PDF
AI/ML Infra Meetup | LLM Agents and Implementation Challenges
PDF
Topaz Photo AI Crack New Download (Latest 2025)
PDF
AI Guide for Business Growth - Arna Softech
PDF
The Dynamic Duo Transforming Financial Accounting Systems Through Modern Expe...
PDF
How AI/LLM recommend to you ? GDG meetup 16 Aug by Fariman Guliev
PDF
Practical Indispensable Project Management Tips for Delivering Successful Exp...
PPTX
CNN LeNet5 Architecture: Neural Networks
PPTX
Tech Workshop Escape Room Tech Workshop
PPTX
Full-Stack Developer Courses That Actually Land You Jobs
PDF
Microsoft Office 365 Crack Download Free
PDF
AI/ML Infra Meetup | Beyond S3's Basics: Architecting for AI-Native Data Access
PDF
Multiverse AI Review 2025: Access All TOP AI Model-Versions!
PDF
Guide to Food Delivery App Development.pdf
PPTX
Advanced SystemCare Ultimate Crack + Portable (2025)
PDF
DNT Brochure 2025 – ISV Solutions @ D365
PDF
MCP Security Tutorial - Beginner to Advanced
Wondershare Recoverit Full Crack New Version (Latest 2025)
Cybersecurity-and-Fraud-Protecting-Your-Digital-Life.pptx
Introduction to Windows Operating System
UTEP毕业证学历认证,宾夕法尼亚克拉里恩大学毕业证未毕业
AI/ML Infra Meetup | LLM Agents and Implementation Challenges
Topaz Photo AI Crack New Download (Latest 2025)
AI Guide for Business Growth - Arna Softech
The Dynamic Duo Transforming Financial Accounting Systems Through Modern Expe...
How AI/LLM recommend to you ? GDG meetup 16 Aug by Fariman Guliev
Practical Indispensable Project Management Tips for Delivering Successful Exp...
CNN LeNet5 Architecture: Neural Networks
Tech Workshop Escape Room Tech Workshop
Full-Stack Developer Courses That Actually Land You Jobs
Microsoft Office 365 Crack Download Free
AI/ML Infra Meetup | Beyond S3's Basics: Architecting for AI-Native Data Access
Multiverse AI Review 2025: Access All TOP AI Model-Versions!
Guide to Food Delivery App Development.pdf
Advanced SystemCare Ultimate Crack + Portable (2025)
DNT Brochure 2025 – ISV Solutions @ D365
MCP Security Tutorial - Beginner to Advanced

Crystal internals (part 1)

  • 2. Is a compiler a hard thing?
  • 3. At Manas we usually do webapps
  • 4. Let’s talk about webapps...
  • 5. Let’s talk about webapps... ● HTML/CSS/JS ● React/Angular/Knockout ● Ruby/Erlang/Elixir ● Database (mysql/postgres) ● Elasticsearch ● Redis/Sidekiq/Background-jobs ● Docker, capistrano, deploy, servers
  • 6. Let’s talk about webapps... ● HTML/CSS/JS ● React/Angular/Knockout ● Ruby/Erlang/Elixir ● Database (mysql/postgres) ● Elasticsearch ● Redis/Sidekiq/Background-jobs ● Docker, capistrano, deploy, servers Easy…?
  • 7. Let’s talk about compilers... ● HTML/CSS/JS ● React/Angular/Knockout ● Ruby/Erlang/Elixir ● Database (mysql/postgres) ● Elasticsearch ● Redis/Sidekiq/Background-jobs ● Docker, capistrano, deploy, servers Easy!
  • 8. Let’s talk about compilers...
  • 9. Let’s talk about compilers...
  • 10. No, let’s talk about usual programs
  • 11. No, let’s talk about usual programs INPUT -> [PROCESSING…] -> OUTPUT
  • 12. No, let’s talk about compilers SOURCE CODE -> [PROCESSING…] -> EXECUTABLE
  • 13. No, let’s talk about compilers SOURCE CODE -> [PROCESSING…] -> EXECUTABLE How do we go from source code to an executable?
  • 14. Traditional stages of a compiler class Foo def bar 1 + 2 end end ● Lexer: [“class”, “Foo”, “;”, “def”, “bar”, “;”, “1”, “+”, “2”, “;”, “end”, “;”, “end”] ● Parser: ClassDef(“Foo”, body: [Def.new(“bar”)]) ● Semantic (a.k.a “type check”): make sure there are no type errors ● Codegen: generate machine code
  • 15. Let’s start with the codegen phase Goal: generate efficient assembly code for many architectures (32 bits, 64 bits, intel, arm, etc.) ● Generating assembly code is hard ● Generating efficient assembly code is harder ● Generating assembly code for many architectures is hard/tedious/boring
  • 16. Let’s start with the codegen phase Goal: generate efficient assembly code for many architectures (32 bits, 64 bits, intel, arm, etc.) ● Generating assembly code is hard ● Generating efficient assembly code is harder ● Generating assembly code for many architectures is hard/tedious/boring Thus: writing a compiler is HARD! :-(
  • 17. Let’s start with the codegen phase Goal: generate efficient assembly code for many architectures (32 bits, 64 bits, intel, arm, etc.) ● Generating assembly code is hard ● Generating efficient assembly code is harder ● Generating assembly code for many architectures is hard/tedious/boring Thus: writing a compiler is HARD! :-( Well, not anymore...
  • 20. Codegen With LLVM, we generate LLVM IR (internal representation) instead of assembly, and LLVM takes care of generating efficient assembly code for us! The hardest part is solved :-)
  • 21. define i32 @add(i32 %x, i32 %y) { %0 = add i32 %x, %y ret i32 %0 } Codegen: LLVM (example)
  • 22. LLVM provides a nice API to generate IR require "llvm" mod = LLVM::Module.new("main") mod.functions.add("add", [LLVM::Int32, LLVM::Int32], LLVM::Int32) do |func| func.basic_blocks.append do |builder| res = builder.add(func.params[0], func.params[1]) builder.ret(res) end end puts mod
  • 23. ● Lexer ● Parser ● Semantic Remaining phases
  • 24. ● Kind of easy: go char by char until we get a keyword, identifier, number, etc. ● We won’t go into implementation details... Lexer
  • 25. ● Kind of easy: go token by token and create a tree of expressions ● This tree is called AST: Abstract Syntax Tree ● An AST is like a directed, acyclic graph ● We won’t go into implementation details... Parser
  • 26. ● This is the fundamental piece of the compiler ● It takes an AST as input and analyzes it ● Analysis can result in: ○ Declaring types: for example “class Foo; end” will declare a type Foo ○ Checking methods: for example “Foo.bar” will check that “Foo” is a declared type and that the method “bar” exists in it, and has the correct arity and types ○ Giving each non-dead expression in the program a type ○ Gathering some info for the codegen phase: for example know the local variables of a method, and their type Semantic
  • 27. ● The interesting part of the compiler is the semantic phase ● It’s just about processing an AST ● In Crystal’s compiler you just need to know one language: Crystal! ● No HTML/CSS/JS/JSX/etc. ● No untyped, dynamic languages: no Ruby/Erlang/Elixir. Type safe! ● Stuff is processed in memory ● No databases, no Elasticsearch, no Redis Semantic
  • 28. ● The interesting part of the compiler is the semantic phase ● It’s just about processing an AST ● In Crystal’s compiler you just need to know one language: Crystal! ● No HTML/CSS/JS/JSX/etc. ● No untyped, dynamic languages: no Ruby/Erlang/Elixir. Type safe! ● Stuff is processed in memory ● No databases, no Elasticsearch, no Redis Writing a compiler is easier than writing a web app! ^_^ Semantic
  • 29. ● The interesting part of the compiler is the semantic phase ● It’s just about processing an AST ● In Crystal’s compiler you just need to know one language: Crystal! ● No HTML/CSS/JS/JSX/etc. ● No untyped, dynamic languages: no Ruby/Erlang/Elixir. Type safe! ● Stuff is processed in memory ● No databases, no Elasticsearch, no Redis Writing a compiler is easier than writing a web app! ^_^ (Or at least it’s more fun :-P) Semantic
  • 31. Directory layout ● src/compiler/crystal ○ command/ ○ syntax/ ○ semantic/ ○ macros/ ○ codegen/ ○ tools/ ○ compiler.cr ○ types.cr ○ program.cr
  • 32. Directory layout ● src/compiler/crystal ○ command/ : the command line interface ○ syntax/ : lexer, parser, ast, visitor, transformer ○ semantic/ : type declaration, method lookup, etc. ○ macros/ : macro expansion logic ○ codegen/ : codegen ○ tools/ : doc generator, formatter, init ○ compiler.cr : combines syntax + semantic + codegen ○ types.cr : all possible types in Crystal (Int32, String, unions, custom types, etc.) ○ program.cr : holds definitions of a program (holds Int32, String, etc.)
  • 33. Directory layout ● src/compiler/crystal : ~43K LOC ○ command/ : ~300LOC ○ syntax/ : ~10K LOC ○ semantic/ : ~12K LOC ○ macros/ : ~2K LOC ○ codegen/ : ~6K LOC ○ tools/ : ~7K LOC ○ compiler.cr : ~300LOC ○ types.cr :~2K LOC ○ program.cr : ~300 LOC
  • 34. Directory layout ● src/compiler/crystal : ~43K LOC ○ command/ : ~300LOC ○ syntax/ : ~10K LOC ○ semantic/ : ~12K LOC ○ macros/ : ~2K LOC ○ codegen/ : ~6K LOC ○ tools/ : ~7K LOC ○ compiler.cr : ~300LOC ○ types.cr :~2K LOC ○ program.cr : ~300 LOC About 14K LOC to analyze source code.
  • 35. Directory layout ● src/compiler/crystal : ~43K LOC ○ command/ : ~300LOC ○ syntax/ : ~10K LOC ○ semantic/ : ~12K LOC ○ macros/ : ~2K LOC ○ codegen/ : ~6K LOC ○ tools/ : ~7K LOC ○ compiler.cr : ~300LOC ○ types.cr :~2K LOC ○ program.cr : ~300 LOC About 14K LOC to analyze source code. One big Rails app at Manas has 14K LOC in “./app”
  • 36. Directory layout ● src/compiler/crystal : ~43K LOC ○ command/ : ~300LOC ○ syntax/ : ~10K LOC ○ semantic/ : ~12K LOC ○ macros/ : ~2K LOC ○ codegen/ : ~6K LOC ○ tools/ : ~7K LOC ○ compiler.cr : ~300LOC ○ types.cr :~2K LOC ○ program.cr : ~300 LOC About 14K LOC to analyze source code. One big Rails app at Manas has 14K LOC in “./app” A compiler can’t be that hard! ;-)
  • 37. Show me the code
  • 38. Show me the code # src/compiler/crystal/compiler.cr def compile(source : Source | Array(Source), output_filename : String) : Result source = [source] unless source.is_a?(Array) program = new_program(source) node = parse program, source node = program.semantic node, @stats codegen program, node, source, output_filename unless @no_codegen Result.new program, node end
  • 39. Show me the code # src/compiler/crystal/compiler.cr def compile(source : Source | Array(Source), output_filename : String) : Result source = [source] unless source.is_a?(Array) program = new_program(source) node = parse program, source node = program.semantic node, @stats codegen program, node, source, output_filename unless @no_codegen Result.new program, node end
  • 40. Show me the code # src/compiler/crystal/compiler.cr def compile(source : Source | Array(Source), output_filename : String) : Result source = [source] unless source.is_a?(Array) program = new_program(source) node = parse program, source node = program.semantic node, @stats codegen program, node, source, output_filename unless @no_codegen Result.new program, node end What is a program?
  • 41. Program ● Holds all types and top-level methods for a given compilation ● For example, if I compile “class Foo; end” and you compile “class Bar; end”, the first program will have a type named “Foo”, and the second one won’t (but it will have a type named “Bar”) ● It lets us test the compiler more easily, because we can use different Program instances for each snippet of code that we want to test ● In contrast of having global variables holding all of a program’s data ● A Program is passed around in all phases of a compilation (except lexing and parsing, which don’t need semantic info)
  • 42. Show me the code # src/compiler/crystal/compiler.cr def compile(source : Source | Array(Source), output_filename : String) : Result source = [source] unless source.is_a?(Array) program = new_program(source) node = parse program, source # from source to Crystal::ASTNode node = program.semantic node, @stats codegen program, node, source, output_filename unless @no_codegen Result.new program, node end What is a program?
  • 43. Show me the code # src/compiler/crystal/compiler.cr def compile(source : Source | Array(Source), output_filename : String) : Result source = [source] unless source.is_a?(Array) program = new_program(source) node = parse program, source node = program.semantic node, @stats # Semantic! :-) codegen program, node, source, output_filename unless @no_codegen Result.new program, node end What is a program?
  • 44. Semantic ● The entry point for semantic analysis is in src/compiler/crystal/semantic.cr ● Other files are in src/compiler/crystal/semantic/ ● The file semantic.cr has comments that explain the overall algorithm :-)
  • 45. Semantic: overall algorithm ● top level: declare classes, modules, macros, defs and other top-level stuff ● new methods: create `new` methods for every `initialize` method ● type declarations: process type declarations like `@x : Int32` ● check abstract defs: check that abstract defs are implemented ● class_vars_initializers: process initializers like `@@x = 1` ● instance_vars_initializers: process initializers like `@x = 1` ● main: process "main" code, calls and method bodies (the whole program). ● cleanup: remove dead code and other simplifications ● check recursive structs: check that structs are not recursive (impossible to codegen)
  • 46. Semantic: overall algorithm Note! ● This algorithm didn’t come from the Skies (nor from a textbook, nor from a paper) ● It’s not written in stone! ● It can definitely be improved: readability, performance, etc.
  • 47. Note! ● It’s actually more like this… Semantic: overall algorithm
  • 48. Semantic But before looking at each phase, we need to learn about the most useful pattern for analyzing an AST...
  • 50. require "compiler/crystal/syntax" class SumVisitor < Crystal::Visitor getter sum = 0 def visit(node : Crystal::NumberLiteral) @sum += node.value.to_i end def visit(node : Crystal::ASTNode) true # true: continue visiting children nodes end end ast = Crystal::Parser.parse("foo(1 + 2, 3, [4])") visitor = SumVisitor.new ast.accept(visitor) puts visitor.sum
  • 51. The Visitor pattern ● We define a visit method for each node of interest ● We process the nodes ● We return true if we want to process children, false otherwise ● Example: if we only want to process class declarations, we could just define visit(node : Crystal::ClassDef) and define some logic there (and return true, because of nested class definitions) ● A visitor abstracts over the way nodes are composed ● ...though in many cases, for semantic purposes, we need and use the way a node is composed (for example, to analyze a call we need to know the argument types, so we check the arguments, not all children in a generic way)
  • 52. Semantic: overall algorithm ● top level: declare classes, modules, macros, defs and other top-level stuff ● new methods ● type declarations ● check abstract defs ● class_vars_initializers ● instance_vars_initializers ● main ● cleanup ● check recursive structs
  • 53. Top level: declare classes, modules, macros, defs... # src/compiler/crystal/semantic/top_level_visitor.cr class Crystal::TopLevelVisitor < Crystal::SemanticVisitor # ... end
  • 54. ● Located at semantic_visitor.cr ● This is a base visitor used in most of the phases of the semantic analysis ● It keeps track of the “current type” ● For example in “class Foo; class Bar; baz; end; end”, “current type” starts at the top-level (the Program). When “class Foo” is found, the current type becomes “Foo” (we search “Foo” in the current type). When “class Bar” is found, the current type becomes “Foo::Bar” (we search “Bar” in the current type). When “baz” is found, it will be looked up inside the current type. ● But initially there’s no “Foo” inside the current type (the Program). Who defines it? … The top-level visitor! Crystal::SemanticVisitor
  • 55. ● Located at top_level_visitor.cr ● Defines classes, methods, etc. ● Given “class Foo; class Bar; baz; end; end”... ● current_type starts at Program ● When “class Foo” is found (ClassDef), we check if “Foo” exists in the current type. If not, we create it. If it exists with a different type (if it’s a module), we give an error. ● We attach this type “Foo” to the AST node ClassDef. SemnticVisitor will use this in every subsequent phase. ● … the “baz” call is not analyzed here (unless it’s a macro) Crystal::TopLevelVisitor
  • 56. Crystal::TopLevelVisitor ● Many other things done in this visitor: methods and macros are added to types, aliases and enums are defined, etc. ● Question: why are methods and macros defined at this phase?
  • 57. ● The “inherited” macro hook must be processed as soon as “Bar < Foo” and “Baz < Foo” are found ● The macro expands to “do_something”, which must expand to “def foo; 1; end” ● This must happen before we continue processing Baz’s body: “def foo; 3; end” must win and be the method found when doing “Baz.new.foo” ● Conclusion: methods, macros and hooks must be defined in the first pass, when defining types. Additionally, macros might be looked up in types in this same pass (like “do_something”) ● SemanticVisitor takes care to look up and expand calls that resolve to macro calls When should macros be defined and expanded class Foo macro inherited do_something end macro do_something def foo; 1; end end end class Bar < Foo; end class Baz < Foo def foo; 3; end end puts Bar.new.foo # => 1 puts Baz.new.foo # => 3
  • 58. Method overloads ● Crystal methods are very powerful! For example: optional type restrictions, different number of arguments, default arguments, splat, etc. ● When methods are added to types we need to: ○ Know if a method replaces (redefines) an old method ○ Track whether a method is “stricter” than another method, to quickly know, given a call argument types, in which order they are going to be tested
  • 59. Method restrictions def foo(x : Int32) puts 1 end def foo(x) puts 2 end foo(1) foo('a') ● Given foo(1), both methods match it. However, the first overload should be invoked because it has a stronger restriction than the second overload. ● If we define the methods in a different order, it still works the same ● This is because an argument with a type restriction is stronger than one without one. We say that the first one is a restriction of the second one (we should probably rename this to use stronger) ● This applies to types too: Int32 is stronger than Int32 | String. And Bar is stronger than Foo, if Bar < Foo. ● Given two methods with the same name, if all arguments of a method are stronger than the others’, the whole method is stronger and should come first. Each type stores an ordered list of methods indexed by method name, with this notion. ● If the methods are both stronger than each other, they have the same restriction.
  • 60. Method restrictions def foo(x : Int32) puts 1 end def foo(x) puts 2 end foo(1) foo('a') ● This logic is located at restrictions.cr ● A lot of cases to consider: generics, tuples, splats, etc. ● The code and algorithms could probably use a simpler, unified logic and a cleanup, but first all of these concepts and definitions must be defined much more formally
  • 61. Semantic: overall algorithm ● top level ● new methods: create `new` methods for every `initialize` method ● type declarations ● check abstract defs ● class_vars_initializers ● instance_vars_initializers ● main ● cleanup ● check recursive structs
  • 62. ● Located at new.cr ● TopLevelVisitor creates a `new` class method for every `initialize` method it finds (the logic for this is also in new.cr) ● Classes that end up without an `initialize` need a default, argless `self.new` method ● This phase is a bit messy right now because of some missing things related to generics… Semantic: new methods
  • 63. class Foo def initialize(x : Int32) @x = x end # Generated from the above def self.new(x : Int32) instance = allocate instance.initialize(x) if instance.responds_to?(:finalize) ::GC.add_finalizer(instance) end end end Semantic: new methods
  • 64. Semantic: overall algorithm ● top level ● new methods ● type declarations: process type declarations like `@x : Int32` ● check abstract defs ● class_vars_initializers ● instance_vars_initializers ● main ● cleanup ● check recursive structs
  • 65. ● Located at type_declaration_processor.cr (and type_declaration_visitor.cr and type_guess_visitor.cr) ● Combines info gathered by these two visitors to declare the type of instance and class variables. ● TypeDeclarationVisitor deals with explicit type declarations ● TypeGuessVisitor tries to “guess” the type of instance and class variables without an explicit type annotations (for example @x = 1 and @x = Foo.new) Semantic: type declarations
  • 66. Semantic: overall algorithm ● top level ● new methods ● type declarations ● check abstract defs: check that abstract defs are implemented ● class_vars_initializers ● instance_vars_initializers ● main ● cleanup ● check recursive structs
  • 67. ● Located at abstract_def_checker.cr ● Not a visitor, but traverses all types, and for those that have abstract defs checks that subclasses or including modules defined those methods Semantic: check abstract defs