What Lies Beneath
Mohit Thatte
EUROCLOJURE 2015
Barcelona
A Deep Dive into
Clojure’s data structures
@mohitthatte
@pastafari
A DAY IN THE LIFE
Image: User:Joonspoon Wikimedia Commons
Programs that use Maps
Map API
Map Implementation
Primitives (JVM, et al)
TOWERS OF ABSTRACTION
“Any sufficiently advanced data structure
is indistinguishable from magic”
- Me
With apologies to Arthur C. Clarke
IMMUTABILITY
IS GOOD
PERFORMANCE IS
NECESSARY
By U.S. Navy photo [Public domain], via Wikimedia Commons
IMMUTABILITY
PERF
Image: Maj. Gen. William Anders, Apollo 8
“… functional programming’s stricture
against destructive updates (assignments)
is a staggering handicap, tantamount to
confiscating a master chef’s knives.”
- Chris Okasaki
ABSTRACT DATA TYPE
NAME: QUEUE
INTERFACE
enqueue: add an element to the end
head: first element
tail: remaining elements
INVARIANTS
THE CHALLENGE
Correct
Performant
Immutable
X
CHALLENGE ACCEPTED
KEY IDEAS
Structural Sharing
Structural Bootstrapping
Hybrid Structures
STRUCTURAL SHARING
:a :b :c :d :e
(assoc v 2 :zz)
:a :b :zz
STRUCTURAL SHARING
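(assoc v 4 :zz)
[Figure: a tree with nodes :a :b :c :d :e :f :m before the update, and the tree after it, where a new :zz node and the copied nodes on its path sit alongside the shared, untouched nodes]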
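The first structural sharing slide can be sketched in a few lines. This is only an illustration that assumes a singly linked list as the underlying structure; the class `SharedList` and its methods are hypothetical names, not Clojure's own classes. It shows that after replacing index 2, the cells before the change are copied while the `:d :e` suffix is the very same object in both versions.

```java
// Sketch only: an immutable singly linked list (hypothetical, not a Clojure class).
final class SharedList {
    final String head;      // element stored in this cell
    final SharedList tail;  // rest of the list, or null at the end

    SharedList(String head, SharedList tail) {
        this.head = head;
        this.tail = tail;
    }

    // Build a list from its elements, left to right.
    static SharedList of(String... items) {
        SharedList list = null;
        for (int i = items.length - 1; i >= 0; i--) {
            list = new SharedList(items[i], list);
        }
        return list;
    }

    // Return a new list with the element at index replaced.
    // Cells before index are copied; cells after index are reused as-is.
    static SharedList assoc(SharedList list, int index, String value) {
        if (list == null) throw new IndexOutOfBoundsException();
        if (index == 0) return new SharedList(value, list.tail);
        return new SharedList(list.head, assoc(list.tail, index - 1, value));
    }

    // Return the cell at the given position.
    static SharedList cell(SharedList list, int index) {
        for (int i = 0; i < index; i++) list = list.tail;
        return list;
    }

    public static void main(String[] args) {
        SharedList v = of(":a", ":b", ":c", ":d", ":e");
        SharedList v2 = assoc(v, 2, ":zz");
        System.out.println(cell(v, 2).head);            // the old version still holds :c
        System.out.println(cell(v2, 2).head);           // the new version holds :zz
        System.out.println(cell(v, 3) == cell(v2, 3));  // true: the suffix is shared
    }
}
```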
Image: Alan Levine
STRUCTURAL
DECOMPOSITION
Image: Alan Chia (Lego Color Bricks)
HYBRID STRUCTURES
LET'S DIVE IN!
CLOJURE DATA STRUCTURES
'(1 2 3) Lists: Code manipulation
[1 2 3] Vectors: All things sequential
{:a 1 :b 2} Maps: Structured Data
#{a e i o u} Sets: Ermm, Sets
MAPS
THE MAP INTERFACE
GET: get the value for a given key
ASSOC: add a key, value pair to the map
DISSOC: remove a key, value pair from the map
MERGE: merge two maps together
WHAT MAKES A GOOD MAP?
Constant time operations
independent of number of keys
Efficient space utilization even with mutation
Objects as keys, Objects as values
IDEAS
ARRAYS
IDEA #1
:a 1 :b 2 :c 3
KEY VALUE PAIRS
NOT A GREAT MAP!
Time complexity O(n)
Space efficiency NO
Objects as keys YES
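To make the verdict on Idea #1 concrete, here is a minimal sketch of a map stored as a flat array of alternating keys and values, as in `:a 1 :b 2 :c 3`. The class `ArrayMapSketch` and its methods are hypothetical names chosen for illustration. Lookup scans the array, so it is O(n), and an immutable update has to copy the whole array, so nothing is shared between versions.

```java
import java.util.Arrays;

// Sketch only: keys and values interleaved in one array (hypothetical class).
final class ArrayMapSketch {
    // Linear scan over the even slots; O(n) in the number of keys.
    static Object get(Object[] kvs, Object key) {
        for (int i = 0; i < kvs.length; i += 2) {
            if (kvs[i].equals(key)) return kvs[i + 1];
        }
        return null; // key absent
    }

    // Immutable assoc: every update copies the full array.
    static Object[] assoc(Object[] kvs, Object key, Object value) {
        for (int i = 0; i < kvs.length; i += 2) {
            if (kvs[i].equals(key)) {
                Object[] copy = kvs.clone();
                copy[i + 1] = value;
                return copy;
            }
        }
        Object[] grown = Arrays.copyOf(kvs, kvs.length + 2);
        grown[kvs.length] = key;
        grown[kvs.length + 1] = value;
        return grown;
    }

    public static void main(String[] args) {
        Object[] m = {":a", 1, ":b", 2, ":c", 3};
        Object[] m2 = assoc(m, ":d", 4);
        System.out.println(get(m, ":b"));   // 2
        System.out.println(get(m, ":d"));   // null: the original is untouched
        System.out.println(get(m2, ":d"));  // 4
    }
}
```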
HOW DO WE DO
BETTER?
Image: www.pooktre.com
TREES TO THE RESCUE
Ramon Llull,
Catalunya c. 1250
Arbol de ciencia
IDEA #2
BINARY SEARCH TREE
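[Figure: a binary search tree with integer keys (root 13; other keys 8, 17, 1, 11, 6, 15, 25, 22, 27) and letters as values; a second panel shows only the nodes 13, 17, 25 and 27]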
NOT A GREAT MAP!
Time complexity worst case O(n)
Space efficiency POSSIBLY
Objects as keys YES
How do we keep our
trees in ‘balance’?
IDEA #3
BALANCED
BINARY SEARCH TREES
RED BLACK TREES
ALWAYS BALANCED,
100 % MONEY BACK GUARANTEE
Guibas, Sedgewick 1978
RED BLACK TREES
Root is black
Every path from root to an empty node
contains the same number of black nodes
Every node is colored red or black
No red node can have a red child
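The four invariants above can be checked mechanically. The following is a sketch under stated assumptions: `RedBlackCheck` and its `Node` type are hypothetical, colour is a boolean (so "every node is red or black" holds by construction), and the check covers only the colouring rules, not key ordering or insertion.

```java
// Sketch only: validates the red-black colouring invariants (hypothetical class).
final class RedBlackCheck {
    static final boolean RED = true, BLACK = false;

    static final class Node {
        final boolean color;
        final int key;
        final Node left, right;

        Node(boolean color, Node left, int key, Node right) {
            this.color = color;
            this.left = left;
            this.key = key;
            this.right = right;
        }
    }

    static boolean isRed(Node n) {
        return n != null && n.color == RED;
    }

    // Returns the number of black nodes on every path from n down to an
    // empty node, or -1 if some invariant is violated below n.
    static int blackHeight(Node n) {
        if (n == null) return 0;
        // No red node can have a red child.
        if (n.color == RED && (isRed(n.left) || isRed(n.right))) return -1;
        int l = blackHeight(n.left);
        int r = blackHeight(n.right);
        // Every path must contain the same number of black nodes.
        if (l < 0 || r < 0 || l != r) return -1;
        return l + (n.color == BLACK ? 1 : 0);
    }

    // Root is black, and the two path invariants hold everywhere.
    static boolean isValid(Node root) {
        return !isRed(root) && blackHeight(root) >= 0;
    }

    public static void main(String[] args) {
        Node ok = new Node(BLACK, new Node(RED, null, 1, null), 2, new Node(RED, null, 3, null));
        Node lopsided = new Node(BLACK, new Node(BLACK, null, 1, null), 2, null);
        System.out.println(isValid(ok));        // true
        System.out.println(isValid(lopsided));  // false: black heights differ
    }
}
```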
RED BLACK TREES
Okasaki ‘96
A PRETTY GOOD MAP!
Time complexity O(log_2 N)
Space efficiency YES
Objects as keys YES
Clojure’s
sorted-maps are
Red Black Trees
CONSTRAINTS
KEYS MUST BE COMPARABLE
KEYS ARE COMPARED AT EVERY
NODE, THIS CAN BE EXPENSIVE
IDEA #4
TRIE - SEARCH BY DIGIT
tap
LEVEL 0
LEVEL 1
LEVEL 2
next(node, symbol)
FINITE STATE MACHINE
Symbols #{a..z}
Nodes, Edges
TRIE IMPLEMENTATIONS
Associate each symbol with
an offset, e.g. a=0, b=1, …
LOOKUP TABLES
next = lookup(node, offset)
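A minimal sketch of the lookup-table trie, assuming keys made only of the symbols a..z. `TrieSketch` is a hypothetical class and is mutable for brevity. Each node pre-allocates one slot per symbol, so `next = lookup(node, offset)` is a plain array read, but most slots stay null, which is the space problem the following slides address.

```java
// Sketch only: a trie with one pre-allocated slot per symbol (hypothetical class).
final class TrieSketch {
    static final class Node {
        final Node[] next = new Node[26]; // one slot per symbol a..z, mostly null
        boolean terminal;                 // true if a key ends at this node
    }

    // Associate each symbol with an offset: a=0, b=1, ...
    static int offset(char symbol) {
        return symbol - 'a';
    }

    static void add(Node root, String key) {
        Node node = root;
        for (char c : key.toCharArray()) {
            int o = offset(c);
            if (node.next[o] == null) node.next[o] = new Node();
            node = node.next[o];
        }
        node.terminal = true;
    }

    static boolean contains(Node root, String key) {
        Node node = root;
        for (char c : key.toCharArray()) {
            node = node.next[offset(c)];  // next = lookup(node, offset)
            if (node == null) return false;
        }
        return node.terminal;
    }

    // Count the unused slots in the whole trie.
    static int nullSlots(Node node) {
        int count = 0;
        for (Node child : node.next) {
            count += (child == null) ? 1 : nullSlots(child);
        }
        return count;
    }

    public static void main(String[] args) {
        Node root = new Node();
        add(root, "tap");
        System.out.println(contains(root, "tap"));  // true
        System.out.println(contains(root, "ta"));   // false: only a prefix
        System.out.println(nullSlots(root));        // 101 of 104 slots are empty
    }
}
```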
Fast and space efficient trie searches, Bagwell 2000
ADD
NOT A GREAT MAP!
Time complexity O(log_m N)
Space efficiency NO
Objects as keys NO
How do we avoid null
nodes?
IDEA #4
BST + TRIE = TST
Bentley, Sedgewick 1998
A DECENT MAP
Time complexity ~O(log_2 N)
Space efficiency YES
Objects as keys NO
No null nodes,
but can we do better
than log_2 N?
CHALLENGE ACCEPTED
Fast and space efficient trie searches, Bagwell 2000
Array Mapped Trie
IDEA #5
Use bitmaps to determine
presence or absence
of symbol
USING BITMAPS
Let's say we have 16 symbols, 0…15
offset 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
bitmap  0  1  0  0  0  1 0 0 1 1 1 0 0 0 0 0
Does the symbol with offset 6 exist?
mask = 1 << offset
mask    0  0  0  0  0  0 0 0 0 1 0 0 0 0 0 0
bitmap & mask (bitwise AND with a mask) is non-zero, so the symbol exists
There’s an array alongside
that only contains entries
for the 1’s.
NOT pre-allocated.
What offset in the dynamic
array should I look at?
Image: Martin Fisch, flickr.com
USE THE 1’S AS TALLY MARKS
bitmap      0 1 0 0 0 1 0 0 1 1 1 0 0 0 0 0
array index 0 1 2 3 4
array       MapEntry, MapEntry, SubTrie Pointer, MapEntry, MapEntry
USING BITMAPS
offset 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
bitmap  0  1  0  0  0  1 0 0 1 1 1 0 0 0 0 0
Where in the array is the entry for ‘6’?
Count tally marks to the ‘right’ of offset
How do I create a mask to do that?
mask = (1 << 6) - 1
mask           0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1
bitmap & mask  0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
Integer.bitCount(bitmap & mask) = 1
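The two bitmap questions from these slides fit in two one-line methods. This sketch uses the 16-symbol bitmap from the slide (bits 14, 10, 7, 6 and 5 set); `BitmapIndex` and its method names are hypothetical, while `Integer.bitCount` is the standard JDK call the slide mentions.

```java
// Sketch only: presence test and array index from a bitmap (hypothetical class).
final class BitmapIndex {
    // The bitmap from the slide: 0 1 0 0 0 1 0 0 1 1 1 0 0 0 0 0
    static final int BITMAP = 0b0100_0100_1110_0000;

    // Does the symbol with this offset exist?
    static boolean contains(int bitmap, int offset) {
        int mask = 1 << offset;
        return (bitmap & mask) != 0;
    }

    // Where in the compact array is its entry?
    // Count the 1 bits strictly to the right of the offset.
    static int index(int bitmap, int offset) {
        int mask = (1 << offset) - 1;
        return Integer.bitCount(bitmap & mask);
    }

    public static void main(String[] args) {
        System.out.println(contains(BITMAP, 6));  // true
        System.out.println(contains(BITMAP, 4));  // false
        System.out.println(index(BITMAP, 6));     // 1
        System.out.println(index(BITMAP, 14));    // 4
    }
}
```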
What happens if I insert a new
map entry?
bitmap      0 1 1 0 0 1 0 0 1 1 1 0 0 0 0 0
array index 0 1 2 3 4 5
array       MapEntry, MapEntry, SubTrie Pointer, MapEntry, MapEntry, MapEntry
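Inserting the new entry can be sketched as follows. The assumptions are: the new symbol has offset 13 (the bit that differs between the two bitmaps on the slides), the array elements are placeholder strings, and `BitmapNode` with its `with` method is a hypothetical class. Because four 1 bits lie to the right of offset 13, the new entry lands at index 4 and the entry for offset 14 shifts to index 5. The node is immutable, so the insert builds a new, slightly longer array.

```java
// Sketch only: an immutable bitmap + compact array node (hypothetical class).
final class BitmapNode {
    final int bitmap;      // which symbols are present
    final Object[] array;  // one slot per set bit, ordered by offset

    BitmapNode(int bitmap, Object[] array) {
        this.bitmap = bitmap;
        this.array = array;
    }

    // Number of set bits strictly below the offset.
    static int index(int bitmap, int offset) {
        return Integer.bitCount(bitmap & ((1 << offset) - 1));
    }

    // Return a new node with entry stored for offset; this node is unchanged.
    BitmapNode with(int offset, Object entry) {
        int bit = 1 << offset;
        int idx = index(bitmap, offset);
        if ((bitmap & bit) != 0) {
            // Symbol already present: copy the array and overwrite one slot.
            Object[] copy = array.clone();
            copy[idx] = entry;
            return new BitmapNode(bitmap, copy);
        }
        // Symbol absent: copy into an array that is one slot longer.
        Object[] grown = new Object[array.length + 1];
        System.arraycopy(array, 0, grown, 0, idx);
        grown[idx] = entry;
        System.arraycopy(array, idx, grown, idx + 1, array.length - idx);
        return new BitmapNode(bitmap | bit, grown);
    }

    public static void main(String[] args) {
        // Placeholder entries named after their offsets (5, 6, 7, 10, 14).
        BitmapNode before = new BitmapNode(0b0100_0100_1110_0000,
                new Object[] {"e5", "e6", "sub7", "e10", "e14"});
        BitmapNode after = before.with(13, "e13");
        System.out.println(String.join(" ", java.util.Arrays.copyOf(after.array, 6, String[].class)));
        System.out.println(before.array.length);  // still 5
    }
}
```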
A DECENT MAP
Time complexity O(log_m N)
Space efficiency YES
Objects as keys NO
How do we support
arbitrary
Objects as keys?
Ideal hash trees, Bagwell 2001
Hashing + AMT
IDEA #6
Ideal hash trees, Bagwell 2001
STEP 1
Use a good hash function (hasheq)
to generate an integer key.
0010 1101 1011 1110 1100 1111 1111 1001
STEP 2
Divide the 32-bit integer into ‘symbols’,
5 bits at a time.
Use the ‘symbols’ to walk down an AMT
Why 5 bits?
t bits per symbol give 2^t symbols
5 bits give 2^5 = 32 symbols, one per bit of a 32-bit bitmap
BIT JUGGLING!
Compute ‘symbols’ by shifting and masking
00111000110010110100101010100101
00 00000 00000 00000 00000 00000 11111
(hash >>> shift) & 0x01f
How to calculate nth digit?
Shift by 5*n and mask with 0x1f
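As a small sketch of the shift-and-mask step, the code below splits the 32-bit example hash shown on the bit juggling slide into 5-bit symbols. `HashSymbols` and its methods are hypothetical names; the expression inside `symbol` is the one from the slide. Note the unsigned shift `>>>`, which keeps negative hashes from producing sign-extended symbols.

```java
// Sketch only: extracting 5-bit symbols from a 32-bit hash (hypothetical class).
final class HashSymbols {
    static final int BITS = 5;
    static final int MASK = 0x1f;

    // The nth digit: shift by 5*n and mask with 0x1f.
    static int symbol(int hash, int level) {
        int shift = BITS * level;
        return (hash >>> shift) & MASK;
    }

    // All seven symbols of a 32-bit hash, lowest level first.
    // The last one only has 2 significant bits (32 = 6 * 5 + 2).
    static int[] symbols(int hash) {
        int[] out = new int[7];
        for (int level = 0; level < out.length; level++) {
            out[level] = symbol(hash, level);
        }
        return out;
    }

    public static void main(String[] args) {
        // The example bits from the slide, grouped 5 at a time from the right.
        int hash = 0b00_11100_01100_10110_10010_10101_00101;
        System.out.println(java.util.Arrays.toString(symbols(hash)));
        // [5, 21, 18, 22, 12, 28, 0]
    }
}
```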
BEST COMMENT EVER.
A persistent rendition of Phil Bagwell's
Hash Array Mapped Trie
Hickey R., Grand C., Emerick C., Miller A., Fingerhut A.
Uses path copying for persistence
HashCollision leaves vs. extended hashing
Node polymorphism vs. conditionals
No sub-tree pools or root-resizing
Any errors are my own
PersistentHashMap.java:19
NODE POLYMORPHISM
ArrayNode - 32 wide pointers to sub-tries
BitmapIndexedNode - bitmap + dynamic array
HashCollisionNode - array for things that collide
EXAMPLE
(let [h (zipmap (range 1e6)
                (range 1e6))]
  (get h 123456))
[Figure: the hash of 123456 split into 5-bit symbols, one symbol consumed per level]
shift = 0: ArrayNode
shift = 5: ArrayNode
shift = 10: ArrayNode
shift = 15: BitmapIndexedNode
… and then follow the AMT down
A GOOD MAP
Time complexity O(log_32 N)
Space efficiency YES
Objects as keys YES
Key compared only once
Bit juggling for great performance!
HAMT
~6 hops to a leaf node
REGULAR HASH TABLE?
NEED ROOT RESIZING
NOT AMENABLE TO
STRUCTURAL SHARING
UPDATES?
Search for the key,
clone leaf nodes and path to root
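The update rule can be sketched with path copying on a simplified structure. This is not the HAMT itself: it assumes a dense, fixed-depth 32-way trie addressed by an integer index, and `PathCopy` with its methods is hypothetical. The point it illustrates is that an update clones only the nodes between the root and the affected leaf, while every other subtree is shared with the old version.

```java
// Sketch only: path copying on a dense 32-way trie (hypothetical class).
final class PathCopy {
    static final int BITS = 5;
    static final int MASK = 31;

    // Allocate an empty trie whose root works at the given shift.
    static Object[] empty(int shift) {
        Object[] node = new Object[32];
        if (shift > 0) {
            for (int i = 0; i < 32; i++) node[i] = empty(shift - BITS);
        }
        return node;
    }

    // Return a new root with value stored at index.
    // Only the nodes on the path from the root to the leaf are cloned.
    static Object[] assoc(Object[] node, int shift, int index, Object value) {
        Object[] copy = node.clone();
        int i = (index >>> shift) & MASK;
        copy[i] = (shift == 0)
                ? value
                : assoc((Object[]) node[i], shift - BITS, index, value);
        return copy;
    }

    static Object get(Object[] node, int shift, int index) {
        for (; shift > 0; shift -= BITS) {
            node = (Object[]) node[(index >>> shift) & MASK];
        }
        return node[index & MASK];
    }

    public static void main(String[] args) {
        Object[] v1 = empty(5);               // two levels, 1024 slots
        Object[] v2 = assoc(v1, 5, 70, "x");  // index 70 lives under child 2
        System.out.println(get(v2, 5, 70));   // x
        System.out.println(get(v1, 5, 70));   // null: the old version is intact
        System.out.println(v1[0] == v2[0]);   // true: untouched subtree is shared
        System.out.println(v1[2] == v2[2]);   // false: the path was copied
    }
}
```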
VECTORS
ArrayNodes all the way.
Break ‘index’ into digits and walk down levels.
INTUITION
(let [arr (vec (range 1e6))]
  (nth arr 123456))
123456 = 00 00000 00000 00011 11000 10010 00000
shift = 15: digit 3, ArrayNode
shift = 10: digit 24, ArrayNode
shift = 5: digit 18, ArrayNode
shift = 0: digit 0, ArrayNode
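The walk for index 123456 can be reproduced with a small sketch. It assumes a completely full four-level trie (1,048,576 slots rather than exactly 1e6 elements), uses plain arrays instead of Clojure's node classes, and ignores the tail; `VectorWalk` and its methods are hypothetical names. Each level consumes one base-32 digit of the index.

```java
// Sketch only: looking up an index in a full 32-way trie (hypothetical class).
final class VectorWalk {
    static final int BITS = 5;
    static final int WIDTH = 32;
    static final int MASK = 31;

    // Build a full trie whose leaves hold start, start + 1, ...
    // Inner nodes are Object[] and leaves are int[].
    static Object build(int shift, int start) {
        if (shift == 0) {
            int[] leaf = new int[WIDTH];
            for (int i = 0; i < WIDTH; i++) leaf[i] = start + i;
            return leaf;
        }
        Object[] node = new Object[WIDTH];
        for (int i = 0; i < WIDTH; i++) {
            node[i] = build(shift - BITS, start + (i << shift));
        }
        return node;
    }

    // Walk down one level per digit, then index into the leaf.
    static int nth(Object root, int rootShift, int index) {
        Object node = root;
        for (int shift = rootShift; shift > 0; shift -= BITS) {
            node = ((Object[]) node)[(index >>> shift) & MASK];
        }
        return ((int[]) node)[index & MASK];
    }

    // The base-32 digits of index, from the root level down to the leaf.
    static int[] digits(int index, int rootShift) {
        int levels = rootShift / BITS + 1;
        int[] out = new int[levels];
        for (int i = 0; i < levels; i++) {
            out[i] = (index >>> (rootShift - i * BITS)) & MASK;
        }
        return out;
    }

    public static void main(String[] args) {
        Object root = build(15, 0);
        System.out.println(java.util.Arrays.toString(digits(123456, 15)));  // [3, 24, 18, 0]
        System.out.println(nth(root, 15, 123456));                          // 123456
    }
}
```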
THE TAIL OPTIMIZATION
PersistentVector
count shift root tail
RIGHT TOOL
FOR THE JOB
By Schnobby (Own work) [CC BY-SA 3.0], via Wikimedia Commons
HashMaps do not
merge efficiently
data.int-map
MAP CATENATION
Okasaki & Gill’s “Fast Mergeable Integer Maps”
Zach Tellman
Vectors do not
concat efficiently
Vectors do not
subvec efficiently
VECTOR CATENATION
Based on Bagwell and Rompf,
“RRB-Trees: Efficient Immutable Vectors”
logarithmic catenation and slicing
Michał Marczyk
core.rrb-vector
TODO: benchmarks
CTRIES
Michał Marczyk
Tomorrow at 0850
1959 de la Briandais, Fredkin Trie
1960 Windley, Booth, Colin, Hibbard Binary Search Trees
1962 Adelson-Velsky, Landis AVL Trees
1978 Guibas, Sedgewick Red Black Trees
1985 Sleator, Tarjan Splay Trees
1996 Okasaki Purely Functional Data Structures
1998 Bentley, Sedgewick Ternary Search Trees
2000 Phil Bagwell AMT
2001 Phil Bagwell HAMT
2007 Rich Hickey Clojure!
Reading List
Ideal Hash Trees, Bagwell 2001
Fast and Space Efficient Trie Searches, Bagwell 2000
Fast Mergeable Integer Maps, Okasaki & Gill, 1998
The World's Fastest Scrabble Program, Appel & Jacobson, 1988
File Searching Using Variable Length Keys, de la Briandais, 1959
Purely Functional Data Structures, Okasaki 1996
Polymatheia: Jean Niklas L’Orange
QUESTIONS?
Ask Michał or Zach or Jean Niklas :)
THANK YOU


A deep dive into Clojure's data structures - EuroClojure 2015