1
Building a NoSQL from scratch
Let them know what they are missing!
#ddtx16
@edwardcapriolo
@HuffPostCode
2
If you are looking for
• A battle-tested NoSQL data store
• That scales up to 1 million transactions a second
• That lets you query data from your IoT sensors in real time
• You are at the wrong talk!
• This is a presentation about Nibiru
• An open source database I work on in my spare time
• But you should stay anyway...
3
Motivations
• Why do that?
• How did this get started?
• What did it morph into?
• Many NoSQL databases came out of an industry-specific use case and, as a result, had baked-in assumptions. If we have clean interfaces and good abstractions, we can make a better general tool with fewer forced choices.
• Potentially support a majority of the use cases in one tool.
4
A friend asked
• Won't this make Nibiru have all the bugs of all the systems?
5
My response
• Jerk!
6
You might want to follow along with a local copy
• There are a lot of slides that have a fair amount of code
• https://github.com/edwardcapriolo/nibiru/blob/master/hexagons.ppt
• http://bit.ly/1NcAoEO
7
Basics
8
Terminology
• Keyspace: A logical grouping of store(s)
• Store: A structure that holds data
  − Avoided: Column Family, Table, Collection, etc.
• Node: A system
• Cluster: A group of nodes
9
Assumptions & Design notes
• A store is of a specific type: Key Value, Column Family, etc.
• The API of the store is dictated by the type
• Ample gotchas from a one-man, after-work project
• Wire components together, not into a large context
• Using String (for now) instead of byte[] to make debugging easier
10
Server ID
• We need to uniquely identify each node
• Hostname/IP is not a good solution
  − Systems can have multiple
  − They can change
• Should be able to run N copies on a single node
11
Implementation
• On first init(), create a GUID and persist it
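A minimal sketch of the "create once, then persist" idea, assuming the id is kept in a small text file inside the node's data directory. The class name `ServerId` and the file name `server.id` are hypothetical and not taken from the Nibiru source.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

/** Hypothetical helper: loads this node's id, creating it on first start. */
class ServerId {
    private final Path idFile;

    ServerId(Path dataDir) {
        // "server.id" is an assumed file name, chosen only for this sketch.
        this.idFile = dataDir.resolve("server.id");
    }

    /** Returns the persisted id, generating and saving one if none exists. */
    UUID init() {
        try {
            if (Files.exists(idFile)) {
                byte[] saved = Files.readAllBytes(idFile);
                return UUID.fromString(new String(saved, StandardCharsets.UTF_8).trim());
            }
            UUID fresh = UUID.randomUUID();
            Files.createDirectories(idFile.getParent());
            Files.write(idFile, fresh.toString().getBytes(StandardCharsets.UTF_8));
            return fresh;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the id lives with the data directory rather than the host, several copies can run on one machine as long as each has its own directory.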
12
Cluster Membership
13
Cluster Membership
• What is the list of nodes in the cluster?
• What is the up/down state of each node?
14
Static Membership
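The two membership questions can be expressed as a small interface, with static membership as the simplest implementation. This is a sketch under assumptions: the names `ClusterMembership`, `StaticMembership`, and `markDown` are hypothetical, and failure detection is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical abstraction answering the two membership questions. */
interface ClusterMembership {
    List<String> allMembers();   // every node id known to the cluster
    List<String> liveMembers();  // only the nodes currently considered up
}

/** Static membership: the node list comes from configuration and never changes. */
class StaticMembership implements ClusterMembership {
    private final Map<String, Boolean> upByNodeId = new LinkedHashMap<>();

    StaticMembership(List<String> configuredNodeIds) {
        for (String id : configuredNodeIds) {
            upByNodeId.put(id, true); // assume up until told otherwise
        }
    }

    /** Called by whatever failure detection is in place (not shown). */
    void markDown(String nodeId) {
        upByNodeId.replace(nodeId, false); // no effect for unknown ids
    }

    @Override
    public List<String> allMembers() {
        return new ArrayList<>(upByNodeId.keySet());
    }

    @Override
    public List<String> liveMembers() {
        List<String> live = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : upByNodeId.entrySet()) {
            if (e.getValue()) {
                live.add(e.getKey());
            }
        }
        return live;
    }
}
```

A gossip-backed or ZooKeeper-backed implementation could sit behind the same interface without callers changing.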
15
Different cluster membership models
• Consensus/Gossip
  − Cassandra
  − Elasticsearch
• Master node / someone else's problem
  − HBase (ZooKeeper)
16
Gossip
http://www.joshclemm.com/projects/
17
Teknek Gossip
• Licensed under the Apache License 2.0
• Forked from a Google Code project
• Available from Maven (groupId: io.teknek, artifactId: gossip)
• Great tool for building a peer-to-peer service
18
Cluster Membership using Gossip
19
Get Live Members
20
Gutcheck
• Did clean abstractions hurt the design here?
• Does it seem possible we could add ZooKeeper/etcd as a backend implementation?
• Any takers? :)
21
Request Routing
22
Some options
• So you have a bunch of nodes in a cluster, but where the heck does the data go?
• Client dictated: like a sharded memcache|mysql|whatever
• HBase: sharding with a leader election
• Dynamo style: ring topology, token ownership
23
Router & Partitioners
24
Pick your poison: no hot spots or key locality :)
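The trade-off can be illustrated with two toy partitioners. This is a sketch under assumptions: the `Partitioner` interface and both class names are hypothetical and do not mirror Nibiru's actual classes. An order-preserving partitioner keeps neighbouring keys together (key locality, but possible hot spots), while a hashing partitioner spreads keys evenly (no hot spots, but no locality).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical interface: turns a row key into a token used for routing. */
interface Partitioner {
    String tokenFor(String rowKey);
}

/** Order-preserving: the token is the key, so adjacent keys stay adjacent. */
class NaturalPartitioner implements Partitioner {
    @Override
    public String tokenFor(String rowKey) {
        return rowKey;
    }
}

/** Hashing: tokens are spread evenly, but adjacent keys land far apart. */
class Md5Partitioner implements Partitioner {
    @Override
    public String tokenFor(String rowKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(rowKey.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the digests every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }
}
```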
25
Quick example LocalPartitioner
26
Scenario: using a Dynamo-ish router
• Construct a three node topology
• Give each an id
• Give them each a token
• Test that requests route properly
27
Cluster and Token information
28
Unit Test
29
Token Router
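The three-node scenario can be sketched with a sorted map acting as the ring. This is an illustration under assumptions, not the project's router: the class name `TokenRouter`, the node ids, and the tokens used in the usage note are all hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical Dynamo-style router: each node owns the range ending at its token. */
class TokenRouter {
    // token -> node id, kept sorted so the ring can be searched
    private final TreeMap<String, String> ring = new TreeMap<>();

    void addNode(String nodeId, String token) {
        ring.put(token, nodeId);
    }

    /** Finds the first node whose token is >= the key's token, wrapping around. */
    String route(String keyToken) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in the ring");
        }
        Map.Entry<String, String> owner = ring.ceilingEntry(keyToken);
        if (owner == null) {
            owner = ring.firstEntry(); // past the last token: wrap to the start
        }
        return owner.getValue();
    }
}
```

With nodes id1, id2, id3 holding tokens "c", "h", "r", a key token of "d" routes to id2, and "z" wraps around to id1.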
30
Do the Damn Thing!
31
Do the Damn Thing! With Replication
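One common way to add replication to token routing is to pick the primary owner and then keep walking the ring clockwise until enough replicas are found. The sketch below assumes that strategy; the class name and the replication-factor parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical router that returns one destination per replica. */
class ReplicatedTokenRouter {
    private final TreeMap<String, String> ring = new TreeMap<>(); // token -> node id
    private final int replicationFactor;

    ReplicatedTokenRouter(int replicationFactor) {
        this.replicationFactor = replicationFactor;
    }

    void addNode(String nodeId, String token) {
        ring.put(token, nodeId);
    }

    /** Primary owner first, then the next nodes clockwise around the ring. */
    List<String> route(String keyToken) {
        // Nodes at or after the key's token, followed by the wrapped-around rest.
        List<String> clockwise = new ArrayList<>(ring.tailMap(keyToken, true).values());
        clockwise.addAll(ring.headMap(keyToken, false).values());

        List<String> destinations = new ArrayList<>();
        for (String nodeId : clockwise) {
            if (destinations.size() == replicationFactor) {
                break;
            }
            destinations.add(nodeId);
        }
        return destinations; // fewer than replicationFactor if the ring is small
    }
}
```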
32
Storage Layer
33
Basic Data Storage SSTables
• SS = Sorted String
{ 'a', $PAYLOAD$ },
{ 'b', $PAYLOAD$ }
34
LevelDB SSTable payload
• Key Value implementation
• SortedMap<byte[], byte[]>
{ 'a', '1' },
{ 'b', '2' }
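An in-memory stand-in shows why the sorted order matters: point lookups and range scans both fall out of the ordering. This sketch uses String keys (as the deck does for debugging) and a TreeMap rather than an on-disk file; the class name is hypothetical.

```java
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical sorted key-value store, held in memory for illustration. */
class SortedKeyValue {
    private final TreeMap<String, String> rows = new TreeMap<>();

    void put(String key, String value) {
        rows.put(key, value);
    }

    String get(String key) {
        return rows.get(key);
    }

    /** Keys in [from, to): cheap because the rows are already in order. */
    SortedMap<String, String> scan(String from, String to) {
        return rows.subMap(from, to);
    }
}
```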
35
Cassandra SSTable Implementation
• Key Value in which the value is a map with last-update-wins versioning
• SortedMap<byte[], SortedMap<byte[], Val<byte[], Long>>>
{ 'a', { 'col':{ 'val', 1 } } },
{ 'b', {
  'col1':{ 'val', 1 },
  'col2':{ 'val2', 2 }
  }
}
36
HBase SSTable Implementation
• Key-Value in which the value is a map with multi-versioning
• SortedMap<byte[], SortedMap<byte[], Val<byte[], Long>>>
{
  { 'a', { 'col':{ 'val', 1 } } },
  { 'b', {
    'col1':{ 'val', 1 },
    'col1':{ 'valb', 2 },
    'col2':{ 'val2', 2 }
    }
  }
}
37
Column Family Store high level
38
Operations to support
39
One possible memtable implementation
• Holy generics, Batman!
• Isn't it just a map of maps?
40
Unfortunately no!
• Imagine two requests arrive in this order:
  − set people [edward] [age]='34' (Time 2)
  − set people [edward] [age]='35' (Time 1)
• What should be the final value?
• We need to deal with events landing out of order
• There is also a delete write, known as a tombstone
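Under last-update-wins, the answer to the question above is '34', because Time 2 is newer even though it arrived first. A single-threaded sketch of that rule, including tombstones, might look like this. The class and method names are hypothetical, and keeping the existing value on a timestamp tie is an arbitrary choice made for the sketch.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical column family memtable with last-update-wins semantics. */
class Memtable {
    /** A value plus the timestamp of the write; a null value marks a tombstone. */
    static final class Val {
        final String value;
        final long time;

        Val(String value, long time) {
            this.value = value;
            this.time = time;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    // row key -> (column -> newest known value)
    private final TreeMap<String, TreeMap<String, Val>> rows = new TreeMap<>();

    void put(String row, String column, String value, long time) {
        apply(row, column, new Val(value, time));
    }

    void delete(String row, String column, long time) {
        apply(row, column, new Val(null, time)); // a delete is just another write
    }

    private void apply(String row, String column, Val incoming) {
        TreeMap<String, Val> columns = rows.computeIfAbsent(row, r -> new TreeMap<>());
        Val existing = columns.get(column);
        // Keep whichever write has the newer timestamp, regardless of arrival order.
        if (existing == null || incoming.time > existing.time) {
            columns.put(column, incoming);
        }
    }

    /** Returns the live value, or null if the column is absent or deleted. */
    String get(String row, String column) {
        Map<String, Val> columns = rows.get(row);
        if (columns == null) {
            return null;
        }
        Val val = columns.get(column);
        return (val == null || val.isTombstone()) ? null : val.value;
    }
}
```

The tombstone has to be stored rather than simply removing the entry; otherwise a late-arriving older write would bring the deleted column back.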
41
And then, there is concurrency
• Multiple threads manipulating the same data at the same time
• Proposed solution (which I think is correct):
  − Do not compare-and-swap the value; instead, append to a queue and take a second pass to optimize
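A rough sketch of the append-then-resolve idea, assuming one queue per column. The names are hypothetical, and the compaction step that would shrink the queue after resolving is left out.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical column cell: writers only append; readers resolve the winner. */
class AppendOnlyCell {
    static final class Write {
        final String value;
        final long time;

        Write(String value, long time) {
            this.value = value;
            this.time = time;
        }
    }

    private final ConcurrentLinkedQueue<Write> writes = new ConcurrentLinkedQueue<>();

    /** Writers never read or compare the current value; they only append. */
    void append(String value, long time) {
        writes.add(new Write(value, time));
    }

    /** Second pass: scan the queued writes and pick the newest timestamp. */
    String resolve() {
        Write newest = null;
        for (Write w : writes) {
            if (newest == null || w.time > newest.time) {
                newest = w;
            }
        }
        return newest == null ? null : newest.value;
    }

    int pendingWrites() {
        return writes.size();
    }
}
```

The cost moves from the write path to the read path, which is why a later optimizing pass over the queue is worth having.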
42
43
Optimization 1: BloomFilters
• Use Guava. Smart!
• Audience: make a disappointed "aww" sound because Ed did not write it himself
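To show what the filter buys on the read path without pulling in a dependency, here is a tiny hand-rolled Bloom filter. It is a stand-in for illustration only and is not Guava's API; the class name, sizing, and hashing scheme are assumptions.

```java
import java.util.BitSet;

/** Tiny Bloom filter for illustration only; the deck uses Guava's instead. */
class TinyBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    TinyBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    void put(String key) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(key, i));
        }
    }

    /** false means definitely absent; true means possibly present. */
    boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    // Derives the i-th probe position from two base hashes (double hashing).
    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // forced non-zero so the step is never 0
        return Math.floorMod(h1 + i * h2, size);
    }
}
```

A store would call `mightContain` before opening an SSTable and skip the disk read entirely when it returns false.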
44
Optimization 2: IndexWriter
• It is not ideal to seek on disk the way you would seek in memory
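One common way to cut down on disk seeks is a sparse index: while the sorted file is written, record the offset of every Nth key, then jump to the nearest recorded offset and scan forward. The sketch below assumes that approach; it is not necessarily what Nibiru's IndexWriter does, and all names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sparse index: remembers the file offset of every Nth key. */
class SparseIndex {
    private final TreeMap<String, Long> offsets = new TreeMap<>();
    private final int interval;
    private long keysSeen = 0;

    SparseIndex(int interval) {
        this.interval = interval;
    }

    /** Called by the writer, in sorted key order, as each row is flushed. */
    void onRowWritten(String key, long fileOffset) {
        if (keysSeen % interval == 0) {
            offsets.put(key, fileOffset);
        }
        keysSeen++;
    }

    /** Offset to seek to before scanning forward for the key. */
    long seekOffsetFor(String key) {
        Map.Entry<String, Long> entry = offsets.floorEntry(key);
        // No indexed key at or before this one: start from the top of the file.
        return entry == null ? 0L : entry.getValue();
    }
}
```

The index stays small enough to hold in memory, and a lookup costs one seek plus a scan of at most `interval` rows.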
45
Consistency
46
Multinode Consistency
• Replication: Number of places data lives
• Active/Active or Master/Slave (with takeover)
• Resolving conflicting data
47
Quorum Consistency
Active/Active Implementation
48
Message dispatched
49
Asynchronous Responses T1
50
Asynchronous Responses T2
51
Logic to merge results
52
Breakdown of components
• Start & deadline: Max time to wait for the requests to complete
• Message: The read/write request sent to each destination
• Merger: Turns multiple responses into a single result
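The three components can be sketched together as a quorum read. This is an illustration under assumptions: the message fan-out is represented by already-created futures, the merger is "newest timestamp wins", and replicas are polled in list order rather than truly in parallel. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical quorum read over responses that arrive asynchronously. */
class QuorumRead {
    static final class Response {
        final String value;
        final long time;

        Response(String value, long time) {
            this.value = value;
            this.time = time;
        }
    }

    /** Merger: turns multiple responses into one; the newest timestamp wins. */
    static Response merge(List<Response> responses) {
        Response newest = responses.get(0);
        for (Response r : responses) {
            if (r.time > newest.time) {
                newest = r;
            }
        }
        return newest;
    }

    /** Waits until the deadline for a majority of replicas, then merges. */
    static Response read(List<CompletableFuture<Response>> replicas, long maxWaitMillis) {
        int quorum = replicas.size() / 2 + 1;
        long deadline = System.currentTimeMillis() + maxWaitMillis;
        List<Response> received = new ArrayList<>();
        for (CompletableFuture<Response> replica : replicas) {
            if (received.size() >= quorum) {
                break;
            }
            long remaining = Math.max(deadline - System.currentTimeMillis(), 0);
            try {
                received.add(replica.get(remaining, TimeUnit.MILLISECONDS));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            } catch (ExecutionException | TimeoutException e) {
                // Replica failed or missed the deadline; try the remaining ones.
            }
        }
        if (received.size() < quorum) {
            throw new IllegalStateException("quorum not reached before deadline");
        }
        return merge(received);
    }
}
```

Swapping in a different merger (for example, one that also issues read repair) would not change the deadline or dispatch logic.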
53
54
Testing
55
Challenges of timing in testing
• Target is ~80% unit, 20% integration (e2e) testing
• Performance varies between local runs and Travis CI
• Hard to test something that typically happens in milliseconds but in the worst case can take seconds
• Lazy half solution: Thread.sleep() statements for the worst case
  − Definitely a slippery slope
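An alternative to sleeping for the worst case is to poll: re-check the condition at short intervals and give up only after a maximum wait. The helper below is a generic sketch of that idea with hypothetical names; it is not the TUnit API.

```java
import java.util.Objects;
import java.util.function.Supplier;

/** Hypothetical polling assertion for values that become correct eventually. */
class Eventually {
    /** Re-checks the supplier until it matches or the time budget runs out. */
    static <T> void assertEquals(T expected, Supplier<T> actual, long maxWaitMillis) {
        long deadline = System.currentTimeMillis() + maxWaitMillis;
        while (true) {
            T current = actual.get();
            if (Objects.equals(expected, current)) {
                return; // fast path: returns as soon as the value matches
            }
            if (System.currentTimeMillis() >= deadline) {
                throw new AssertionError("expected " + expected + " but was " + current);
            }
            try {
                Thread.sleep(10); // short poll interval, not a worst-case sleep
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new AssertionError("interrupted while waiting");
            }
        }
    }
}
```

A test that usually passes in milliseconds then finishes in milliseconds, while a slow CI machine still gets the full time budget before the test fails.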
56
Introducing TUnit
• https://github.com/edwardcapriolo/tunit
57
The End
