Overhauling a database engine in 2 months
Max Neunhöffer
Move Fast and Break Things, 12 March 2015
www.arangodb.com
Max Neunhöffer
I am a mathematician
“Earlier life”: Research in Computer Algebra
(Computational Group Theory)
Always juggled big data
Now: working in database development, NoSQL, ArangoDB
I like:
research,
hacking,
teaching,
tickling the highest performance out of computer systems.
ArangoDB GmbH
triAGENS GmbH has offered consulting services since 2004:
software architecture
project management
software development
business analysis
a lot of experience with specialised database systems.
did NoSQL before the term was even coined
2011/2012, an idea emerged:
to build the database one had wished to have all those years!
development of ArangoDB as open source software since 2012
ArangoDB GmbH: spin-off to take care of ArangoDB (2014)
ArangoDB
is a multi-model database (document store & graph database),
is open source and free (Apache 2 license),
offers convenient queries (via HTTP/REST and AQL),
including joins between different collections,
offers configurable consistency guarantees using transactions,
is memory-efficient thanks to shape detection,
uses JavaScript throughout (Google’s V8 built into server),
API extensible by JS code in the Foxx Microservice Framework,
offers many drivers for a wide range of languages,
is easy to use, with a web front end and good documentation,
and enjoys good community as well as professional support.
Architecture
[Diagram: architecture overview, with these labels]
Client, HTTP Requests, HTTP Responses
Scheduler, Dispatcher
API: CRUD, high level API in JavaScript, Foxx
DB Engine, Transactions, Sharding, Cluster
Lib: TCP server, HTTP, JSON, OS-dependent code, V8 helpers
Infrastructure
V8 (JavaScript), libev, zlib, libICU (Unicode), etcd (Go)
ArangoDB in numbers
DB engine written in C++
embeds Google’s V8 (∼ 130 000 lines of code)
mostly in memory, using memory mapped files
processes JSON data, schema-less but with “shapes”
library: ∼ 128 000 lines (C++)
DB engine: ∼ 210 000 lines (C++, including 12 000 for utilities)
JavaScript layer: ∼ 1 232 000 lines of code
∼ 85 000 standard API implementation
∼ 592 000 Foxx apps (API extensions, web front end)
∼ 298 000 unit tests
∼ 327 000 node.js modules
further unit tests: ∼ 10 000 C++ and ∼ 24 000 Ruby for HTTP
plus documentation
and drivers (in other repositories)
The Task
It is March 2014, we have just released V2.0.
V2.1 is scheduled for end of May, V2.2 is scheduled for July
V2.1 is incremental, V2.2 is “Write-Ahead-Log”
work for V2.2 started in March
unfortunately, introducing a WAL is akin to open heart surgery,
→ essentially need to reengineer the database engine
do not have the capacity to assign 10 developers to the job
The old setup
New data:
{ name: "watch", price: 99 }
Data files:
Collection: products (append only)
Collection: sales (append only)
For transactions:
Locks and commit markers are necessary on all collections.
The new setup
New data:
{ name: "watch", price: 99 }
Write Ahead Log (WAL) (append only)
"Collector" (later)
Data files:
Collection: products (append only)
Collection: sales (append only)
For transactions:
Fewer locks, and commit markers only in the WAL.
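The new setup can be sketched roughly as follows. This is a minimal in-memory model, not ArangoDB's implementation: the class `WriteAheadLog`, the function `collect`, and the tuple layout of the log entries are assumptions made for illustration. All writes and the single commit marker go to one append-only log, and a collector later copies only committed documents into the per-collection data files.

```python
from collections import defaultdict


class WriteAheadLog:
    """Single append-only log shared by all collections (illustrative model)."""

    def __init__(self):
        self.entries = []  # new data and commit markers all end up here

    def write(self, tx_id, collection, document):
        self.entries.append(("document", tx_id, collection, document))

    def commit(self, tx_id):
        # One commit marker in the WAL, instead of one per collection.
        self.entries.append(("commit", tx_id, None, None))


def collect(wal, data_files):
    """Copy documents of committed transactions into per-collection data files."""
    committed = {tx for kind, tx, _, _ in wal.entries if kind == "commit"}
    for kind, tx, collection, document in wal.entries:
        if kind == "document" and tx in committed:
            data_files[collection].append(document)


wal = WriteAheadLog()
wal.write(1, "products", {"name": "watch", "price": 99})
wal.write(1, "sales", {"product": "watch", "quantity": 1})
wal.commit(1)
wal.write(2, "products", {"name": "clock", "price": 45})  # never committed

data_files = defaultdict(list)
collect(wal, data_files)
```

In this model the uncommitted "clock" document stays in the log and never reaches the data file of the products collection.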
Advantages of a WAL
have a single history of events
have a single place to note the commit of a transaction
easy asynchronous replication
efficient sync to disk
uncommitted stuff does not hit the data files at all
better support for transactions → much higher performance
better crash recovery
deterministic, well-defined behaviour
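Why a single history makes asynchronous replication easy can be illustrated with a small sketch. The `Follower` class and the `(key, document)` entry layout are hypothetical and only serve the example: a replica remembers how many log entries it has applied and, whenever it gets around to it, applies everything after that position in order.

```python
class Follower:
    """Replica that applies log entries it has not seen yet (hypothetical)."""

    def __init__(self):
        self.applied = 0     # number of log entries already applied
        self.documents = {}  # key -> latest document version

    def catch_up(self, log):
        # Apply the missing suffix of the log in its original order.
        for key, document in log[self.applied:]:
            self.documents[key] = document
        self.applied = len(log)


log = []  # the leader's single history of events
follower = Follower()

log.append(("watch", {"name": "watch", "price": 99}))
follower.catch_up(log)

log.append(("clock", {"name": "clock", "price": 45}))
log.append(("watch", {"name": "watch", "price": 89}))
follower.catch_up(log)  # the follower may lag, then catches up later
```

Because there is only one log, the follower needs to track a single position rather than one per collection.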
Challenges
need fundamental change in the storage engine
the collector changes stuff that is potentially being read
need to be careful not to create a bottleneck
need to get locking right
crashes hard to test
if possible, users must not notice the change (except for better performance)
Our testing setup
We run tests continuously: after every push to GitHub, and nightly.
We have separate test suites for single server and cluster.
Different types of test for good coverage:
low-level C++ library unit tests (10 000 LOC C++)
JS tests (separately on server and JS shell, 290 000 LOC)
AQL query engine (230 000 LOC of the above)
HTTP interface (TCP and SSL, 24 000 LOC Ruby)
dump/restore and bulk import
benchmarks (3 000 LOC C++)
user interface (phantomjs, comparing screenshots)
run tests with valgrind
check coverage
Test methodologies
Tests can have different aims:
Unit tests: ensure that individual components work according to specifications
Integration tests: ensure that multiple components work together correctly
Benchmark tests: ensure performance
End to End tests: ensure that complex systems as a whole do their job
User interface tests: ensure that the user interface works and behaves as documented/specified
Test characteristics needed for our task
Characteristics of our task
well-defined, deterministic behaviour
if possible, no observable change in functionality
changes relatively far down in the software stack, but reaching wide
temporary breakage expected and accepted
⇒ For our task, we needed:
unit tests,
integration tests and
benchmarks.
Fortunately, we had all these in place!
Approach — preparation phase
1. design work: WAL, markers, collection, compaction, what to lock where
2. implement infrastructure for WAL: mmapped files, append operation, marker format
3. add “write to WAL” to write operations
4. test filling of WAL
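To make step 2 more concrete, here is a rough sketch of appending markers to a memory-mapped logfile and reading them back. The marker layout (payload size, marker type, CRC32 of the payload), the logfile size, the file name, and the `Logfile` class are all assumptions chosen for the example; the slides do not describe ArangoDB's actual marker format.

```python
import mmap
import os
import struct
import tempfile
import zlib

# Assumed header: payload size, marker type, CRC32 of payload (all uint32).
HEADER = struct.Struct("<III")
TYPE_DOCUMENT = 1      # marker type 0 is reserved for "nothing written yet"
LOGFILE_SIZE = 4096    # logfile is preallocated and zero-filled


class Logfile:
    def __init__(self, path):
        with open(path, "wb") as f:
            f.write(b"\0" * LOGFILE_SIZE)
        self.file = open(path, "r+b")
        self.map = mmap.mmap(self.file.fileno(), LOGFILE_SIZE)
        self.offset = 0  # position of the next append

    def append(self, marker_type, payload):
        header = HEADER.pack(len(payload), marker_type, zlib.crc32(payload))
        marker = header + payload
        end = self.offset + len(marker)
        if end > LOGFILE_SIZE:
            raise ValueError("logfile full")
        self.map[self.offset:end] = marker  # append op: write at the tail only
        self.offset = end

    def markers(self):
        offset = 0
        while offset + HEADER.size <= LOGFILE_SIZE:
            header = self.map[offset:offset + HEADER.size]
            size, marker_type, crc = HEADER.unpack(header)
            if marker_type == 0:
                break  # zero-filled space: end of written markers
            start = offset + HEADER.size
            payload = self.map[start:start + size]
            if zlib.crc32(payload) != crc:
                break  # torn or corrupt marker: stop reading here
            yield marker_type, payload
            offset = start + size

    def close(self):
        self.map.flush()  # sync the mapped pages to disk
        self.map.close()
        self.file.close()


path = os.path.join(tempfile.mkdtemp(), "logfile-1.db")
logfile = Logfile(path)
logfile.append(TYPE_DOCUMENT, b'{"name": "watch", "price": 99}')
logfile.append(TYPE_DOCUMENT, b'{"name": "clock", "price": 45}')
recovered = list(logfile.markers())
logfile.close()
```

Reading the markers back right after writing them is one simple way to test that the WAL is being filled, in the spirit of the last step of this phase.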
Approach — breaking phase
4. remove old write operations
5. implement collector thread and adjust compactor thread
6. repair by adjusting read operations for WAL/data file
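Step 6 can be pictured with a small sketch. It assumes, purely for illustration, that the newest versions of documents live in an index over the WAL and that older versions have already been moved to the data files; the function `lookup`, the two dictionaries, and the use of `None` as a removal marker are hypothetical.

```python
MISSING = object()  # sentinel: the WAL has no entry for this key


def lookup(key, wal_index, data_file_index):
    """Return the newest version of a document, or None if it does not exist."""
    document = wal_index.get(key, MISSING)
    if document is MISSING:
        # Nothing recent in the WAL: fall back to the collected data file.
        return data_file_index.get(key)
    return document  # None here means the WAL recorded a removal


data_file_index = {
    "watch": {"name": "watch", "price": 99},
    "clock": {"name": "clock", "price": 45},
    "lamp": {"name": "lamp", "price": 20},
}
wal_index = {
    "watch": {"name": "watch", "price": 89},  # newer version, not yet collected
    "lamp": None,                             # removed, not yet collected
}

watch = lookup("watch", wal_index, data_file_index)
clock = lookup("clock", wal_index, data_file_index)
lamp = lookup("lamp", wal_index, data_file_index)
```

The WAL has to be consulted first because it may hold a newer version or a removal that the collector has not yet applied to the data file.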
Approach — repairing phase
7. fix transaction management
8. implement startup with non-empty WAL: crash recovery
9. fix/simplify replication
10. fix dump/restore
11. fix cluster
12. tune performance (legends)
13. Hurray, tests work again!
14. release beta version
15. fix, fix, fix and tune
16. publish V2.2
