How to choose a database
Vsevolod Solovyov
Why bother
• It's the most fundamental thing about the project
• Even programming languages are switched more often
• With the wrong choice you'll suffer. But why?
Different cases
• I certainly know what I need
• I'm not sure, something will do
• This tech is cool!
• Serious business: risk/benefit management
• Small project: ??
• Pet project: fun/learning management
I know what I need
• Huge time series DB
• Cross-datacenter replication
• Petabytes of data
• ...
• This talk is not for you
This tech is COOL
• Pet projects
• Experiments
• Beware otherwise
Consider this
• Data correctness (ACID, enforced schema)
• Easy data modeling
• Operational complexity
• Migrations
• Scaling
• Project use cases: no one knows them yet!
Data correctness
• Silently losing data is not much fun
• Schema-less is a lie
• Heterogeneous data is hard to analyze, change, display
• Especially true in data-heavy projects
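
As an illustration of what an enforced schema buys you, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are made up for the example, and SQLite stands in for whatever database the project uses. The point is only that bad rows are rejected loudly at write time instead of turning up later as heterogeneous data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: constraints are declared once, in the schema.
conn.execute(
    """
    CREATE TABLE measurement (
        id     INTEGER PRIMARY KEY,
        sensor TEXT NOT NULL,
        value  REAL NOT NULL CHECK (typeof(value) = 'real')
    )
    """
)

# A well-formed row is accepted.
conn.execute(
    "INSERT INTO measurement (sensor, value) VALUES (?, ?)", ("s1", 21.5)
)

# Malformed rows raise an error instead of being stored silently.
rejected = []
for bad_row in [(None, 1.0), ("s2", "not a number")]:
    try:
        conn.execute(
            "INSERT INTO measurement (sensor, value) VALUES (?, ?)", bad_row
        )
    except sqlite3.IntegrityError:
        rejected.append(bad_row)

row_count = conn.execute("SELECT COUNT(*) FROM measurement").fetchone()[0]
print(f"stored rows: {row_count}, rejected rows: {len(rejected)}")
```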
Data modeling
• In a document-oriented DB (e.g. MongoDB) we need to specially craft "tables" according to anticipated queries. And re-craft them when queries change!
• Much easier in an RDBMS: just dump it in adequate tables and slap some indexes on
• Datomic is best here: Entity-Attribute-Value
Migrations
• Often overlooked part
• Keep them in repository!
• Transactional DDL
• Developing migrations in REPL
• Downgrade migrations are useless
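
To make the "Transactional DDL" bullet concrete, here is a small sketch with Python's sqlite3 module (SQLite, like Postgres, can roll back schema changes). The `run_migration` helper and the table names are hypothetical. The idea shown is that a migration that fails halfway leaves no half-created tables behind.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN / COMMIT / ROLLBACK statements below are the only transaction control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")


def table_names(conn):
    """Return the names of all tables currently in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


def run_migration(conn, statements):
    """Apply all statements atomically; return True on success."""
    conn.execute("BEGIN")
    try:
        for statement in statements:
            conn.execute(statement)
        conn.execute("COMMIT")
        return True
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        return False


# The second statement fails, so the first one must be undone as well.
broken_migration = [
    "CREATE TABLE emails (user_id INTEGER, email TEXT)",
    "ALTER TABLE no_such_table ADD COLUMN x TEXT",
]
applied = run_migration(conn, broken_migration)
tables_after = table_names(conn)
print(applied, tables_after)
```

In a database without transactional DDL, the `emails` table would survive the failure and the next attempt would have to deal with it by hand.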
Migration tools
• Native migrations (SQL, CQL, Datalog, etc.) are best
• Don't do auto-migrations ever
• Don't use tools that give you auto-migrations
• Use something like nomad, migrate
[Chart: effort versus migration complexity, comparing ORM migration tools with native (SQL, etc.) migrations; annotated "Tools learning curve"]
Complex migrations
• Create new column/table/database
• Read from the old place and write to both
• Migrate all old data to the new place
• Read from both places and compare
• Clean up old code and data
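
The five steps above can be sketched end to end with sqlite3. This is only an illustration under assumptions: the `users` and `user_emails` tables and the `set_email` / `get_email` helpers are invented for the example, and a real migration would spread these steps over several deploys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, email) VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com")],
)

# Step 1: create the new place.
conn.execute(
    "CREATE TABLE user_emails (user_id INTEGER PRIMARY KEY, email TEXT)"
)


# Step 2: keep reading from the old place, write to both.
def set_email(conn, user_id, email):
    conn.execute("UPDATE users SET email = ? WHERE id = ?", (email, user_id))
    conn.execute(
        "INSERT OR REPLACE INTO user_emails (user_id, email) VALUES (?, ?)",
        (user_id, email),
    )


def get_email(conn, user_id):
    row = conn.execute(
        "SELECT email FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


set_email(conn, 1, "a2@example.com")

# Step 3: migrate all old data; rows already dual-written are left alone.
conn.execute(
    "INSERT OR IGNORE INTO user_emails (user_id, email) "
    "SELECT id, email FROM users"
)

# Step 4: read from both places and compare.
mismatches = conn.execute(
    """
    SELECT u.id
    FROM users AS u
    LEFT JOIN user_emails AS e ON e.user_id = u.id
    WHERE e.user_id IS NULL OR e.email IS NOT u.email
    """
).fetchall()

# Step 5: only once mismatches stays empty, switch reads to user_emails
# and clean up the old column and the dual-write code.
print("mismatches:", mismatches)
```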
Performance
Data model matters here
Read documentation!
Low-level details
• Pages
• Rows, columns, column families
• MVCC
• TOAST
• Index types: B-Tree, hash, GIN, GiST, BRIN
• Sharding hash/range, sharding key
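
For the hash/range sharding bullet, a toy sketch of how a sharding key can be mapped to a shard in each scheme. The function names and boundaries are illustrative assumptions, not any particular database's implementation.

```python
import bisect
import hashlib


def hash_shard(key: str, num_shards: int) -> int:
    """Hash sharding: spreads keys evenly, but range scans hit every shard."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def range_shard(key: str, boundaries: list[str]) -> int:
    """Range sharding: neighbouring keys share a shard, hot ranges can form.

    boundaries must be sorted; shard i holds the keys that are greater than
    or equal to boundaries[i - 1] and less than boundaries[i].
    """
    return bisect.bisect_right(boundaries, key)


boundaries = ["h", "p"]  # three shards: before "h", "h" up to "p", "p" onwards
for user in ["alice", "mike", "zoe"]:
    print(user, hash_shard(user, 3), range_shard(user, boundaries))
```

The choice of sharding key matters as much as the scheme: queries that do not include the key have to ask every shard.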
Know your tools!
• Log slow queries (pg_stat, slowlog, etc)
• EXPLAIN
• EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
• htop, iotop, perf, etc
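
A quick way to get used to reading query plans is to compare one before and after adding an index. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` through Python's sqlite3 module as a stand-in for Postgres `EXPLAIN`. The `cluster` table mirrors the names used later in the talk, but its schema here is an assumption, and the exact plan wording differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cluster ("
    "id INTEGER PRIMARY KEY, clustering_id INTEGER, level INTEGER)"
)
conn.executemany(
    "INSERT INTO cluster (clustering_id, level) VALUES (?, ?)",
    [(i % 50, i % 5) for i in range(1000)],
)


def plan(conn, sql, params=()):
    """Return the human-readable part of the query plan as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


sql = "SELECT id, level FROM cluster WHERE clustering_id = ?"

plan_before = plan(conn, sql, (33,))  # full table scan
conn.execute(
    "CREATE INDEX cluster_clustering_id_idx ON cluster (clustering_id)"
)
plan_after = plan(conn, sql, (33,))  # index search

print("before:", plan_before)
print("after: ", plan_after)
```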
ORMs considered harmful
• They provoke massive data over-fetch
• Easy-to-miss 1+N queries
• Hard to refactor and move parts of data to other DBs
• Very leaky abstraction
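
The 1+N problem is easy to reproduce without any ORM. The sketch below uses plain sqlite3 and a trace callback to count the SELECT statements issued by a naive per-parent loop versus a single query. The schema and function names are hypothetical; an ORM's lazy-loaded relationship produces the same pattern as the naive loop, just less visibly.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE clustering (id INTEGER PRIMARY KEY);
    CREATE TABLE cluster (
        id INTEGER PRIMARY KEY, clustering_id INTEGER, level INTEGER
    );
    """
)
conn.executemany(
    "INSERT INTO clustering (id) VALUES (?)", [(i,) for i in range(1, 11)]
)
conn.executemany(
    "INSERT INTO cluster (clustering_id, level) VALUES (?, ?)",
    [(i % 10 + 1, i % 3) for i in range(100)],
)
conn.commit()

# Record every SQL statement the connection executes from here on.
statements = []
conn.set_trace_callback(statements.append)


def levels_one_plus_n(conn):
    """One query for the parents, then one more query per parent."""
    result = defaultdict(list)
    ids = [row[0] for row in conn.execute("SELECT id FROM clustering ORDER BY id")]
    for clustering_id in ids:
        rows = conn.execute(
            "SELECT level FROM cluster WHERE clustering_id = ? ORDER BY id",
            (clustering_id,),
        )
        for (level,) in rows:
            result[clustering_id].append(level)
    return result


def levels_single_query(conn):
    """The same data fetched with one query."""
    result = defaultdict(list)
    rows = conn.execute(
        "SELECT clustering_id, level FROM cluster ORDER BY clustering_id, id"
    )
    for clustering_id, level in rows:
        result[clustering_id].append(level)
    return result


def count_selects(fetch):
    """Run fetch(conn) and count the SELECT statements it issued."""
    statements.clear()
    result = fetch(conn)
    selects = [s for s in statements if s.lstrip().upper().startswith("SELECT")]
    return result, len(selects)


naive_result, naive_count = count_selects(levels_one_plus_n)
single_result, single_count = count_selects(levels_single_query)
print(f"1+N loop: {naive_count} queries, single query: {single_count}")
```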
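Over-fetch
from collections import defaultdict

# Loads every Cluster as a full ORM object just to read two fields
clustering = Clustering.query.get(33)
depth2cid = defaultdict(list)
for cl in clustering.clusters:
    depth2cid[cl.level].append(cl.id)
11.4 GB RAM
50 seconds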
Proper-fetch
# Select only the two columns that are actually needed
query = (
    db.session
    .query(Cluster.id, Cluster.level)
    .filter(Cluster.clustering_id == 33))
depth2cid = defaultdict(list)
for cid, level in query:
    depth2cid[level].append(cid)
54 MB RAM
1 second
Properer-fetch
# Same query, streamed in batches of 1000 rows
query = (
    db.session
    .query(Cluster.id, Cluster.level)
    .filter(Cluster.clustering_id == 33)
    .yield_per(1000))
depth2cid = defaultdict(list)
for cid, level in query:
    depth2cid[level].append(cid)
7 MB RAM
1.2 seconds
Scale
Distributed FUD
• http://jepsen.io/
• CORDS: Redundancy does not imply fault tolerance - the morning paper
• Do you really need it? RAM is plentiful
Buy a bigger server
Scale
Well... by that time you will know what you need
Horizontally scalable
• Citus
• VoltDB
• Cassandra
• CockroachDB
• ...
Pull out data bit-by-bit
Availability
• Services mostly die from other problems
• Untested "available" DB can be a problem
• Properly available system (CAP-available) is a pain and resource sink
NoDB
• Images
• Machine learning models
• ...
• File system, S3, B2, etc
Cap'n Obvious
• Experiment at home
• Don't bring a new DB into a big project just because it's interesting
• Not sure? Postgres to the rescue
