Containerized DBs
In a Machine Data Environment
(or how you get the most out of your containerized database)
DevOps Gathering, 24th March 2017
@claus__m
About
~2yrs at Crate.io
DevRel/Field Engineering/Support/Integrations/…
Crate.io
Founded in 2013, ~25 people and growing
Offices
San Francisco, Berlin, Dornbirn (AT)
Talk to me about
Rust, Raspberry Pis, Tech!
Machine Data
Source: HPE Jun 2016
http://www.slideshare.net/penumuru/harness-the-power-of-big-data-with-oracle-63438438/9
Machine Data
Characteristics
Millions of data points/second
Streaming in from sensors, devices, logs, etc.
Data diversity
Structured & unstructured JSON, Blobs
Real-time query performance
Monitoring & alerting
Complex queries of big data volumes
With Terabytes of historic data
Growth
Adding sources often means exponential growth
Machine Data
Internet of Things
Sensors, cameras, ...
Wearables, Gadgets
Location data, interaction data, ...
Logs & Monitoring data
Component health monitoring, access logs, ...
Industry 4.0, Digitization
Production line insights, automation, ...
Vehicles
Location data, health data, ...
Clickdrive.io
Fleet management & vehicle tracking
Vehicle health and tracking data
High ingest rate
2,000 data points per car, per second
In-depth & real-time analysis
Predictive maintenance, accident reconstruction, route/driver efficiency
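To get a feel for what 2,000 data points per car, per second adds up to, the arithmetic can be sketched as below. The fleet sizes are hypothetical and only there for illustration; the deck does not state how many cars are tracked.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def data_points_per_day(cars: int, points_per_car_per_second: int = 2_000) -> int:
    """Total data points a fleet produces in one day at a constant ingest rate."""
    if cars < 0 or points_per_car_per_second < 0:
        raise ValueError("cars and rate must be non-negative")
    return cars * points_per_car_per_second * SECONDS_PER_DAY


if __name__ == "__main__":
    # Fleet sizes are made up for illustration.
    for fleet_size in (1, 100):
        print(f"{fleet_size} car(s): {data_points_per_day(fleet_size):,} points/day")
```

A single car at that rate already produces 172.8 million data points per day, which is why ingest rate is listed before analysis.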
Roomonitor
Smart apartments
Monitoring & controlling climate, occupancy, noise, access
Better efficiency, safer environment
Alerts: AC/heating on with window open, noisy neighbors, ...
Skyhigh Networks
Cloud access security broker (CASB)
Access logging for cloud services
Large data volumes & ingest
Billions of events per day from 600+ customers, tens of thousands of concurrent TCP connections
Machine data is the fingerprint of fraud
Unsupervised learning to find anomalies
Architecture
For Machine Data
Microservices
Containers
Isolation by default
Flexibility
Building blocks
Horizontally scalable
Mostly
Stateful containers
Databases?
An Open Stack
Example
An Example
Sensor
Produces data
Produces data
Consumer
Receives and enriches data
Visualizer
Draws stuff
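The three roles on this slide can be sketched as a tiny in-process pipeline. This is a minimal illustration only: the class names, the Fahrenheit/location enrichment, and the text "chart" are assumptions made for the example and are not taken from the talk's actual services.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List


@dataclass
class Reading:
    sensor_id: str
    celsius: float


@dataclass
class EnrichedReading:
    sensor_id: str
    celsius: float
    fahrenheit: float
    location: str


def sensor(sensor_id: str, values: Iterable[float]) -> Iterator[Reading]:
    """Sensor: produces data."""
    for value in values:
        yield Reading(sensor_id, value)


def consumer(
    readings: Iterable[Reading], locations: Dict[str, str]
) -> Iterator[EnrichedReading]:
    """Consumer: receives and enriches data (hypothetical enrichment)."""
    for r in readings:
        yield EnrichedReading(
            sensor_id=r.sensor_id,
            celsius=r.celsius,
            fahrenheit=r.celsius * 9 / 5 + 32,
            location=locations.get(r.sensor_id, "unknown"),
        )


def visualizer(readings: Iterable[EnrichedReading]) -> List[str]:
    """Visualizer: draws stuff, here as one text bar per reading."""
    return [
        f"{r.location:<10} {'#' * max(0, round(r.celsius))} {r.celsius:.1f} C"
        for r in readings
    ]


if __name__ == "__main__":
    stream = sensor("s1", [20.0, 21.5, 23.0])
    for line in visualizer(consumer(stream, {"s1": "office"})):
        print(line)
```

In the deployed setup each role runs as its own container, so the generator hand-offs here stand in for network calls between services.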
Deploy!
[Diagram: load balancer in front of visualizers (V) and consumers (C), with sensors (S) and a user (U)]
Load balancer
For TLS, reverse proxying, load balancing
High availability
3 instances
A few sensors
One user to actually use it
Go Live
More users!
More sensors and users
Data storage
Slow and fast
Monitoring & Analytics
Two different subsystems
[Diagram: load balancer, visualizers (V), consumers (C), sensors (S), users (U), NoSQL DB, message queue, SQL DB, monitoring, analytics]
But ...
Even more users?
Horizontal scaling?
Maintenance & bug hunting?
Mostly via scheduled downtimes
Reporting?
Kafka? Elasticsearch?
Security?
Access control?
Expertise?
Knowledge transfer?
[Diagram: load balancer, visualizers (V), consumers (C), sensors (S), users (U), NoSQL DB, message queue, SQL DB, monitoring, analytics]
Another DB?
Yay!
Yay!
… which one though?
CrateDB
github.com/crate/crate
hub.docker.com/r/_/crate
Yay!
CrateDB
Shared nothing
Partitioning & auto-sharding
Replication
(Almost) Zero config
Multi model: Structured & unstructured
SQL
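As a rough sketch of how partitioning, sharding, replication, and mixed structured/unstructured columns come together in one SQL statement, the helper below builds a `CREATE TABLE` statement as a string. The table name, column names, and shard/replica counts are hypothetical, and the DDL follows CrateDB's documented `PARTITIONED BY` / `CLUSTERED INTO ... SHARDS` / `number_of_replicas` syntax as best understood here; check it against the reference for your CrateDB version before using it. The block only builds the text and does not talk to a database.

```python
def create_table_statement(table: str, shards: int, replicas: int) -> str:
    """Build a CrateDB-style CREATE TABLE statement (illustrative only).

    `table` is interpolated directly, so it must be a trusted identifier.
    """
    if shards < 1:
        raise ValueError("shards must be at least 1")
    if replicas < 0:
        raise ValueError("replicas must not be negative")
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n"
        "    ts TIMESTAMP,\n"
        # Generated column used as the partition key: one partition per day.
        "    day TIMESTAMP GENERATED ALWAYS AS date_trunc('day', ts),\n"
        "    sensor_id STRING,\n"
        "    temperature DOUBLE,\n"
        # Dynamic object column for the unstructured part of a reading.
        "    payload OBJECT(DYNAMIC)\n"
        ")\n"
        "PARTITIONED BY (day)\n"
        f"CLUSTERED INTO {shards} SHARDS\n"
        f"WITH (number_of_replicas = {replicas})"
    )


if __name__ == "__main__":
    # Hypothetical table name and sizing.
    print(create_table_statement("sensor_readings", shards=6, replicas=1))
```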
CrateDB Fundamentals
Disk-based index with in-memory caching
Fast and efficient OS caching
Shards: Units of data
Concurrency by distributing shards
Distributed query execution engine
“Push down” queries
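The idea behind shards and "push down" queries can be illustrated with a toy model: rows are routed to shards by a hash of a key, each shard computes a partial aggregate on its own rows, and only those small partials are merged. This is a simplified sketch under stated assumptions; the CRC32 routing and the function names are made up for the example and are not CrateDB's actual routing or execution engine.

```python
import zlib
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, float]  # (sensor_id, temperature)


def shard_for(key: str, num_shards: int) -> int:
    """Route a key to a shard with a deterministic hash (illustrative choice)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def distribute(rows: Iterable[Row], num_shards: int) -> Dict[int, List[Row]]:
    """Spread rows across shards; each shard could live on a different node."""
    shards: Dict[int, List[Row]] = {i: [] for i in range(num_shards)}
    for row in rows:
        shards[shard_for(row[0], num_shards)].append(row)
    return shards


def partial_aggregate(rows: List[Row]) -> Tuple[float, int]:
    """Work pushed down to one shard: local sum and count only."""
    return sum(temp for _, temp in rows), len(rows)


def distributed_avg(shards: Dict[int, List[Row]]) -> float:
    """Merge the per-shard partials into the final average."""
    partials = [partial_aggregate(rows) for rows in shards.values()]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    if count == 0:
        raise ValueError("no rows to aggregate")
    return total / count


if __name__ == "__main__":
    rows = [("a", 20.0), ("b", 22.0), ("c", 24.0), ("d", 26.0)]
    print(distributed_avg(distribute(rows, num_shards=4)))
```

The merge step only ever sees one (sum, count) pair per shard, so the amount of data moved between nodes stays small no matter how many rows each shard holds.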
Postgres Wire Protocol
ANTLR4 Parser
Distributed Query Planner
Query Execution Engine
Elasticsearch
Lucene
CLIENT
A better setup!
Horizontal scalability
Scale out everything
Reduced tech stack
Get to know it quicker
Live reporting
Use ad-hoc queries on production data
Flexibility
Schema evolution not required
[Diagram: load balancer, visualizers (V), consumers (C), sensors (S), users (U), monitoring, analytics]
A better setup!
No single point of failure
As highly available as your service
Reduced network traffic
Better reliability
No queue
Work with real data
DB isolation
Accessible only from the host
[Diagram: load balancer, visualizers (V), consumers (C), sensors (S), users (U), monitoring, analytics]
Live Demo
Docker Swarm
Orchestration across platforms
Eden Server (Rust!)
RESTful web service
Eden Client (Rust!)
ARM application for reading temperature data from BMP180
Grafana
To draw up a nice dashboard
[Diagram: load balancer with nodes labeled G, E, ME, and Pi]
Demo Time!
An Open Stack
for Machine Data w/ CrateDB
Ad-hoc analysis with SQL
Instant reporting on production data
Integrates well
Legacy SQL applications included
Horizontally scalable
Container native, highly available
Thanks!
Links
https://github.com/celaus
https://github.com/crate
https://hub.docker.com/r/_/crate
https://crate.io
Follow us on twitter
@crateio @claus__m
