Druid : Sub-Second OLAP
queries over Petabytes of
Streaming Data
Nishant Bangarwa
Hortonworks
Druid Committer, PMC
Superset Incubator PPMC
June 2017
© Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
History and Motivation
Introduction
Demo
Druid Architecture – Indexing and Querying Data
Druid In Production
Recent Improvements
HISTORY AND MOTIVATION
 Druid Open sourced in late 2012
 Initial Use case
 Power ad-tech analytics product
 Requirements
 Arbitrary queries
 Scalability : trillions of events/day
 Interactive : low latency queries
 Real-time : data freshness
 High Availability
 Rolling Upgrades
MOTIVATION
 Business Intelligence Queries
 Arbitrary slicing and dicing of data
 Interactive real-time visualizations on complex data streams
 Answer BI questions
– How many unique male visitors visited my website last month?
– How many products were sold last quarter, broken down by demographic and product category?
 Not interested in dumping entire dataset
Introduction
What is Druid?
 Column-oriented distributed datastore
 Sub-Second query times
 Realtime streaming ingestion
 Arbitrary slicing and dicing of data
 Automatic Data Summarization
 Approximate algorithms (HyperLogLog, theta sketches)
 Scalable to petabytes of data
 Highly available
Demo
Demo: Wikipedia Real-Time Dashboard (Accelerated 30x)
Wikipedia Real-Time Dashboard: How it Works
[Diagram] Wikipedia edits arrive as a data stream and are written into Druid with exactly-once ingestion; the dashboard reads the results through a Java stream reader.
Druid Architecture
Druid Architecture
[Diagram] Streaming data flows into realtime index tasks; batch data is indexed directly onto historical nodes. Realtime index tasks hand completed segments off to historical nodes, and broker nodes route events and queries to both.
Druid Architecture
[Diagram] Queries arrive at broker nodes (optionally through a distributed query cache) and fan out to realtime index tasks and historical nodes. Coordinator nodes assign segments to historical nodes, using ZooKeeper for coordination and the metadata store for segment metadata. Streaming data enters through realtime index tasks and is handed off to historical nodes; batch data is loaded directly.
Indexing Data
Indexing Service
 Indexing is performed by
 Overlord
 Middle Managers
 Peons
 Middle Managers spawn peons, which run ingestion tasks
 Each peon runs one task
 A task definition specifies which task to run and its properties
Streaming Ingestion : Realtime Index Tasks
 Ability to ingest streams of data
 Stores data in a write-optimized structure
 Periodically converts the write-optimized structure to read-optimized segments
 Events are queryable as soon as they are ingested
 Supports both push- and pull-based ingestion
Streaming Ingestion : Tranquility
 Helper library for coordinating streaming ingestion
 Simple API to send events to Druid
 Transparently manages
 Realtime index task creation
 Partitioning and replication
 Schema evolution
 Can be used with Flink, Samza, Spark, Storm, or any other ETL framework
Kafka Indexing Service (experimental)
 Supports exactly-once ingestion
 Messages are pulled by Kafka index tasks
 Each Kafka index task consumes from a set of partitions with start and end offsets
 Each message is verified to ensure ordering
 Kafka offsets and the corresponding segments are persisted atomically in the same metadata transaction
 Kafka Supervisor
 Embedded inside the overlord
 Manages Kafka index tasks
 Retries failed tasks
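The atomic offset-plus-segment commit is the heart of the exactly-once guarantee. A toy in-memory sketch (names like `MetadataStore` and `commit` are illustrative, not Druid's actual API): a commit is rejected unless its start offsets match exactly what the store last committed, so a replayed batch can never publish a duplicate segment.

```python
# Toy illustration of exactly-once handoff: segment metadata and Kafka offsets
# are committed together, and a commit must resume exactly at the last offsets.
class MetadataStore:
    def __init__(self, partitions):
        self.committed_offsets = {p: 0 for p in partitions}
        self.segments = []

    def commit(self, segment, start_offsets, end_offsets):
        # Reject stale or duplicate work: the whole commit is all-or-nothing.
        if start_offsets != self.committed_offsets:
            return False
        # Persist segment and offsets together ("atomically" in the real system).
        self.segments.append(segment)
        self.committed_offsets = dict(end_offsets)
        return True

store = MetadataStore(partitions=[0, 1])
assert store.commit("seg-A", {0: 0, 1: 0}, {0: 100, 1: 80})
# Replaying the same batch fails the offset check, so no duplicate segment:
assert not store.commit("seg-A-dup", {0: 0, 1: 0}, {0: 100, 1: 80})
assert store.commit("seg-B", {0: 100, 1: 80}, {0: 150, 1: 120})
print(store.committed_offsets)
```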
Batch Ingestion
 HadoopIndexTask
 Peon launches Hadoop MR job
 Mappers read data
 Reducers create Druid segment files
 Index Task
 Runs in a single JVM, i.e. a peon
 Suitable for small data sizes (< 1 GB)
 Integrations with Apache Hive and Spark are available for batch ingestion
Querying Data
Querying Data from Druid
 Druid supports
 JSON Queries over HTTP
 Built-in SQL (experimental)
 Querying libraries available for
 Python
 R
 Ruby
 Javascript
 Clojure
 PHP
JSON Over HTTP
 HTTP Rest API
 Queries and results expressed in JSON
 Multiple Query Types
 Time Boundary
 Timeseries
 TopN
 GroupBy
 Select
 Segment Metadata
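For example, a TopN query is a small JSON document POSTed to the broker. A sketch using the Wikipedia example dataset (the datasource and interval are from that example; the broker URL and port are defaults, adjust for your cluster):

```python
import json

# A Druid TopN query: top 5 pages by summed edit count over two days.
topn_query = {
    "queryType": "topN",
    "dataSource": "wikipedia",
    "dimension": "page",
    "metric": "edits",          # rank by the aggregator named below
    "threshold": 5,
    "granularity": "all",
    "aggregations": [
        {"type": "longSum", "name": "edits", "fieldName": "count"}
    ],
    "intervals": ["2011-01-01/2011-01-03"],
}

# POST this to the broker, e.g.:
#   curl -X POST -H 'Content-Type: application/json' \
#        -d @query.json http://localhost:8082/druid/v2
print(json.dumps(topn_query, indent=2))
```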
Built-in SQL (experimental)
 Apache Calcite based parser and planner
 Ability to connect Druid to any BI tool that supports JDBC
 SQL via JSON over HTTP
 Supports Approximate queries
 APPROX_COUNT_DISTINCT(col)
 Ability to do Fast Approx TopN queries
 APPROX_QUANTILE(column, probability)
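Putting the approximate functions together, a SQL query over HTTP might look like the sketch below (table and column names follow the Wikipedia example; `APPROX_QUANTILE` assumes the approximate-histogram extension is loaded):

```python
import json

# Approximate distinct users and an approximate 95th percentile, per country.
sql = """
SELECT country,
       APPROX_COUNT_DISTINCT(userid) AS unique_users,
       APPROX_QUANTILE(added, 0.95)  AS p95_added
FROM wikipedia
GROUP BY country
"""

payload = {"query": sql}
# POST to http://<broker>:8082/druid/v2/sql with Content-Type application/json.
print(json.dumps(payload))
```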
Integrated with multiple Open Source UI tools
 Superset
 Developed at Airbnb
 In Apache incubation since May 2017
 Grafana – Druid plugin (https://github.com/grafana-druid-plugin/druidplugin)
 Metabase
 With in-built SQL connect with any BI tool supporting JDBC
Superset
 Python backend
 Flask AppBuilder
 Authentication
 Pandas for rich analytics
 SQLAlchemy as the SQL toolkit
 Javascript frontend
 React, NVD3
 Deep integration with Druid
Superset Rich Dashboarding Capabilities: Treemaps
Superset Rich Dashboarding Capabilities: Sunburst
Superset UI Provides Powerful Visualizations
Rich library of dashboard visualizations:
Basic:
• Bar Charts
• Pie Charts
• Line Charts
Advanced:
• Sankey Diagrams
• Treemaps
• Sunburst
• Heatmaps
And More!
Druid in Production
Production readiness
 Is Druid suitable for my use case?
 Will Druid meet my performance requirements at scale?
 How complex is it to operate and manage a Druid cluster?
 How do I monitor a Druid cluster?
 High availability?
 How do I upgrade a Druid cluster without downtime?
 Security?
 Extensibility for future use cases?
Suitable Use Cases
 Powering Interactive user facing applications
 Arbitrary slicing and dicing of large datasets
 User behavior analysis
 measuring distinct counts
 retention analysis
 funnel analysis
 A/B testing
 Exploratory analytics/root cause analysis
Performance and Scalability : Fast Facts
Most Events per Day: 300 Billion events/day (Metamarkets)
Most Computed Metrics: 1 Billion metrics/min (Jolata)
Largest Cluster: 200 nodes (Metamarkets)
Largest Hourly Ingestion: 2 TB/hour (Netflix)
Performance
 Query Latency
– average - 500ms
– 90%ile < 1sec
– 95%ile < 5sec
– 99%ile < 10 sec
 Query Volume
– 1000s queries per minute
 Benchmarking code
 https://github.com/druid-io/druid-benchmark
Performance : Approximate Algorithms
 Ability to store approximate data sketches for high-cardinality columns, e.g. userid
 Reduced storage size
 Use Cases
 Fast approximate distinct counts
 Approximate Top-K queries
 Approximate histograms
 Funnel/retention analysis
 Limitations
 Exact counts are not possible
 Cannot filter on individual row values
Simplified Druid Cluster Management with Ambari
 Install, configure and manage Druid and all external dependencies from Ambari
 Easy to enable HA, Security, Monitoring …
Monitoring a Druid Cluster
 Each Druid Node emits metrics for
 Query performance
 Ingestion Rate
 JVM Health
 Query Cache performance
 System health
 Emitted as JSON objects to a runtime log file or over HTTP to other services
 Emitters available for Ambari Metrics Server, Graphite, StatsD, Kafka
 Easy to implement your own metrics emitter
Monitoring using Ambari Metrics Server
 HDP 2.6.1 contains pre-defined Grafana dashboards for
 Health of Druid Nodes
 Ingestion
 Query performance
 Easy to create new dashboards and setup alerts
 Auto configured when both Druid and Ambari Metrics Server are installed
High Availability
 Deploy the Coordinator and Overlord on multiple instances
 Leader election via ZooKeeper
 Broker – install multiple brokers
 Use the Druid router or any load balancer to route queries to brokers
 Realtime index tasks – create redundant tasks
 Historical nodes – create a load rule with replication factor >= 2 (default = 2)
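The load rule for historical replication is itself a small JSON document posted to the coordinator (a minimal sketch: `_default_tier` is Druid's default tier name, and `loadForever` keeps all data loaded; adjust the host and datasource for your cluster):

```python
import json

# Load rule: keep every segment loaded, with 2 replicas in the default tier.
load_rule = [
    {
        "type": "loadForever",
        "tieredReplicants": {"_default_tier": 2},
    }
]

# POST to http://<coordinator>:8081/druid/coordinator/v1/rules/<dataSource>
print(json.dumps(load_rule))
```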
Rolling Upgrades
 Maintain backwards compatibility
 Data redundancy
 Shared Nothing Architecture
 Rolling upgrades
 No Downtime
Security
 Supports authentication via Kerberos / SPNEGO
 Easy wizard-based Kerberos security enablement via Ambari
[Diagram] The user runs kinit against the KDC server; the browser then presents a SPNEGO token to Druid.
Extending Core Druid
 Plugin-based architecture
 Leverages Guice to load extensions at runtime
 Extension points make it possible to
 Add a new deep storage implementation
 Add a new Firehose
 Add Aggregators
 Add Complex metrics
 Add new Query types
 Add new Jersey resources
 Bundle your extension with all the other Druid extensions
Companies Using Druid
Recent Improvements
Druid 0.10.0
 Kafka Indexing Service – exactly-once ingestion
 Built-in SQL support (CLI, HTTP, JDBC)
 Numeric dimensions
 Kerberos authentication
 Performance improvements
 Optimized evaluation of large AND/OR filter trees with Concise bitmaps
 Index-based evaluation of simple LIKE filters such as 'foo%'
 ~30% improvement on non-time groupBys
 Apache Hive integration – supports full SQL, large joins, batch indexing
 Apache Ambari integration – easy deployments and cluster management
Future Work
 Improved schema definition & management
 Improvements to Hive/Druid integration
 Materialized Views, Push down more filters, support complex columns etc…
 Performance improvements
 Select query performance improvements
 JIT-friendly topN queries
 Security enhancements
 Row/Column level security
 Integration with Apache Ranger
 And much more……
Community
 User google group - druid-user@googlegroups.com
 Dev google group - druid-dev@googlegroups.com
 Github - druid-io/druid
 IRC - #druid-dev on irc.freenode.net
Summary
 Easy installation and management via Ambari
 Real-time
– Ingestion latency: seconds or less
– Query latency: seconds or less
 Arbitrary slicing and dicing of big data, like a ninja
– No more pre-canned drill-downs
– Query at fine-grained granularity
 High availability and Rolling deployment capabilities
 Secure and Production ready
 Vibrant and Active community
 Available as Tech Preview in HDP 2.6.1
Thank you ! Questions ?
 Twitter - @NishantBangarwa
 Email - nbangarwa@hortonworks.com
 Linkedin - https://www.linkedin.com/in/nishant-bangarwa
AtScale + Hive + Druid
 Leverage AtScale cubing capabilities
 Store aggregate tables in Druid
 Updatable dimensions in Hive
Storage Format
Druid: Segments
 Data in Druid is stored in Segment Files.
 Partitioned by time
 Ideally, segment files are each smaller than 1GB.
 If files are large, smaller time partitions are needed.
[Diagram] Segments laid out along the time axis: Segment 1 (Monday), Segment 2 (Tuesday), Segment 3 (Wednesday), Segment 4 (Thursday), Segments 5_1 and 5_2 (Friday, split into two shards).
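Time partitioning amounts to bucketing each event's timestamp into a segment interval. A toy sketch with daily granularity (real Druid also shards within an interval when a segment grows too large, as with the two Friday segments):

```python
from collections import defaultdict
from datetime import datetime

def segment_key(ts: str) -> str:
    # Daily granularity: the segment interval is the event's calendar day.
    return datetime.fromisoformat(ts.rstrip("Z")).date().isoformat()

events = [
    "2011-01-01T00:01:35Z",
    "2011-01-01T23:59:00Z",
    "2011-01-02T00:08:35Z",
]

segments = defaultdict(list)
for ts in events:
    segments[segment_key(ts)].append(ts)

print({day: len(rows) for day, rows in segments.items()})
# {'2011-01-01': 2, '2011-01-02': 1}
```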
Example Wikipedia Edit Dataset
timestamp page language city country … added deleted
2011-01-01T00:01:35Z Justin Bieber en SF USA 10 65
2011-01-01T00:03:63Z Justin Bieber en SF USA 15 62
2011-01-01T00:04:51Z Justin Bieber en SF USA 32 45
2011-01-01T00:05:35Z Ke$ha en Calgary CA 17 87
2011-01-01T00:06:41Z Ke$ha en Calgary CA 43 99
2011-01-02T00:08:35Z Selena Gomes en Calgary CA 12 53
Timestamp Dimensions Metrics
Data Rollup
timestamp page language city country … added deleted
2011-01-01T00:01:35Z Justin Bieber en SF USA 10 65
2011-01-01T00:03:63Z Justin Bieber en SF USA 15 62
2011-01-01T00:04:51Z Justin Bieber en SF USA 32 45
2011-01-01T00:05:35Z Ke$ha en Calgary CA 17 87
2011-01-01T00:06:41Z Ke$ha en Calgary CA 43 99
2011-01-02T00:08:35Z Selena Gomes en Calgary CA 12 53
timestamp page language city country count sum_added sum_deleted min_added max_added ….
2011-01-01T00:00:00Z Justin Bieber en SF USA 3 57 172 10 32
2011-01-01T00:00:00Z Ke$ha en Calgary CA 2 60 186 17 43
2011-01-02T00:00:00Z Selena Gomes en Calgary CA 1 12 53 12 12
Rollup by hour
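The hourly rollup above is a group-by on (truncated timestamp, dimensions) with aggregates over the metrics. A minimal sketch using the same rows (language/city/country dropped for brevity):

```python
from collections import defaultdict

# (timestamp, page, added, deleted) rows from the Wikipedia example.
rows = [
    ("2011-01-01T00:01:35Z", "Justin Bieber", 10, 65),
    ("2011-01-01T00:03:63Z", "Justin Bieber", 15, 62),
    ("2011-01-01T00:04:51Z", "Justin Bieber", 32, 45),
    ("2011-01-01T00:05:35Z", "Ke$ha", 17, 87),
    ("2011-01-01T00:06:41Z", "Ke$ha", 43, 99),
    ("2011-01-02T00:08:35Z", "Selena Gomes", 12, 53),
]

# Rollup by hour: truncate the timestamp, group by (hour, page), aggregate.
groups = defaultdict(lambda: {"count": 0, "sum_added": 0, "sum_deleted": 0,
                              "min_added": None, "max_added": None})
for ts, page, added, deleted in rows:
    hour = ts[:13] + ":00:00Z"          # e.g. 2011-01-01T00:00:00Z
    g = groups[(hour, page)]
    g["count"] += 1
    g["sum_added"] += added
    g["sum_deleted"] += deleted
    g["min_added"] = added if g["min_added"] is None else min(g["min_added"], added)
    g["max_added"] = added if g["max_added"] is None else max(g["max_added"], added)

print(groups[("2011-01-01T00:00:00Z", "Justin Bieber")])
# {'count': 3, 'sum_added': 57, 'sum_deleted': 172, 'min_added': 10, 'max_added': 32}
```

Six raw rows collapse to three stored rows, matching the rolled-up table above.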
Dictionary Encoding
 Create and store Ids for each value
 e.g. page column
 Values - Justin Bieber, Ke$ha, Selena Gomes
 Encoding - Justin Bieber : 0, Ke$ha: 1, Selena Gomes: 2
 Column Data - [0 0 0 1 1 2]
 city column - [0 0 0 1 1 1]
timestamp page language city country … added deleted
2011-01-01T00:01:35Z Justin Bieber en SF USA 10 65
2011-01-01T00:03:63Z Justin Bieber en SF USA 15 62
2011-01-01T00:04:51Z Justin Bieber en SF USA 32 45
2011-01-01T00:05:35Z Ke$ha en Calgary CA 17 87
2011-01-01T00:06:41Z Ke$ha en Calgary CA 43 99
2011-01-02T00:08:35Z Selena Gomes en Calgary CA 12 53
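The encoding step fits in a few lines (ids assigned in order of first appearance, matching the example above):

```python
def dictionary_encode(values):
    # Assign each distinct value an integer id in order of first appearance,
    # then store the column as the list of ids.
    ids = {}
    column = []
    for v in values:
        if v not in ids:
            ids[v] = len(ids)
        column.append(ids[v])
    return ids, column

pages = ["Justin Bieber"] * 3 + ["Ke$ha"] * 2 + ["Selena Gomes"]
ids, column = dictionary_encode(pages)
print(ids)     # {'Justin Bieber': 0, 'Ke$ha': 1, 'Selena Gomes': 2}
print(column)  # [0, 0, 0, 1, 1, 2]
```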
Bitmap Indices
 Store Bitmap Indices for each value
 Justin Bieber -> [0, 1, 2] -> [1 1 1 0 0 0]
 Ke$ha -> [3, 4] -> [0 0 0 1 1 0]
 Selena Gomes -> [5] -> [0 0 0 0 0 1]
 Queries
 Justin Bieber or Ke$ha -> [1 1 1 0 0 0] OR [0 0 0 1 1 0] -> [1 1 1 1 1 0]
 language = en and country = CA -> [1 1 1 1 1 1] AND [0 0 0 1 1 1] -> [0 0 0 1 1 1]
 Indexes compressed with Concise or Roaring encoding
timestamp page language city country … added deleted
2011-01-01T00:01:35Z Justin Bieber en SF USA 10 65
2011-01-01T00:03:63Z Justin Bieber en SF USA 15 62
2011-01-01T00:04:51Z Justin Bieber en SF USA 32 45
2011-01-01T00:01:35Z Ke$ha en Calgary CA 17 87
2011-01-01T00:01:35Z Ke$ha en Calgary CA 43 99
2011-01-01T00:01:35Z Selena Gomes en Calgary CA 12 53
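The bitmap construction and boolean query evaluation above can be sketched directly (plain Python 0/1 lists stand in for the compressed Concise/Roaring bitmaps):

```python
def build_bitmaps(column):
    # One bitmap per distinct value: bit i is set iff row i holds that value.
    bitmaps = {}
    for i, v in enumerate(column):
        bitmaps.setdefault(v, [0] * len(column))[i] = 1
    return bitmaps

def OR(a, b):  return [x | y for x, y in zip(a, b)]
def AND(a, b): return [x & y for x, y in zip(a, b)]

pages = ["Justin Bieber"] * 3 + ["Ke$ha"] * 2 + ["Selena Gomes"]
bm = build_bitmaps(pages)

# page = 'Justin Bieber' OR page = 'Ke$ha'
print(OR(bm["Justin Bieber"], bm["Ke$ha"]))   # [1, 1, 1, 1, 1, 0]

languages = ["en"] * 6
countries = ["USA"] * 3 + ["CA"] * 3
lbm = build_bitmaps(languages)
cbm = build_bitmaps(countries)

# language = 'en' AND country = 'CA'
print(AND(lbm["en"], cbm["CA"]))              # [0, 0, 0, 1, 1, 1]
```

The filter never touches row data: matching rows fall out of cheap bitwise operations over the indexes.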
Approximate Sketch Columns
timestamp page userid language city country … added deleted
2011-01-01T00:01:35Z Justin Bieber user1111111 en SF USA 10 65
2011-01-01T00:03:63Z Justin Bieber user1111111 en SF USA 15 62
2011-01-01T00:04:51Z Justin Bieber user2222222 en SF USA 32 45
2011-01-01T00:05:35Z Ke$ha user3333333 en Calgary CA 17 87
2011-01-01T00:06:41Z Ke$ha user4444444 en Calgary CA 43 99
2011-01-02T00:08:35Z Selena Gomes user1111111 en Calgary CA 12 53
timestamp page language city country count sum_added sum_deleted min_added userid_sketch ….
2011-01-01T00:00:00Z Justin Bieber en SF USA 3 57 172 10 {sketch}
2011-01-01T00:00:00Z Ke$ha en Calgary CA 2 60 186 17 {sketch}
2011-01-02T00:00:00Z Selena Gomes en Calgary CA 1 12 53 12 {sketch}
Rollup by hour
Approximate Sketch Columns
 Better rollup for high-cardinality columns, e.g. userid
 Reduced storage size
 Use Cases
 Fast approximate distinct counts
 Approximate histograms
 Funnel/retention analysis
 Limitations
 Exact counts are not possible
 Cannot filter on individual row values
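The property that makes sketch columns roll up is mergeability: per-hour sketches combine into an interval-wide distinct count without the raw rows. Below, exact sets stand in for real sketches (Druid uses HyperLogLog or theta sketches, which trade a small error for constant size):

```python
# Per-(hour, page) "sketches" of userid, as produced by rollup.
hourly_sketches = {
    ("2011-01-01T00:00:00Z", "Justin Bieber"): {"user1111111", "user2222222"},
    ("2011-01-01T00:00:00Z", "Ke$ha"):         {"user3333333", "user4444444"},
    ("2011-01-02T00:00:00Z", "Selena Gomes"):  {"user1111111"},
}

# Query: distinct users across the whole interval = merge all the sketches.
merged = set()
for sketch in hourly_sketches.values():
    merged |= sketch

print(len(merged))  # 4 — user1111111 is counted once despite appearing twice
```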

More Related Content

PDF
The Parquet Format and Performance Optimization Opportunities
PDF
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
PPTX
Hive + Tez: A Performance Deep Dive
PDF
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
PDF
Hive Bucketing in Apache Spark with Tejas Patil
PDF
Kafka internals
PDF
Parquet performance tuning: the missing guide
PPTX
Apache Tez - A New Chapter in Hadoop Data Processing
The Parquet Format and Performance Optimization Opportunities
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Hive + Tez: A Performance Deep Dive
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Hive Bucketing in Apache Spark with Tejas Patil
Kafka internals
Parquet performance tuning: the missing guide
Apache Tez - A New Chapter in Hadoop Data Processing

What's hot (20)

PPTX
Rds data lake @ Robinhood
PDF
Iceberg: a fast table format for S3
PDF
How We Optimize Spark SQL Jobs With parallel and sync IO
PPTX
Druid deep dive
PDF
Dataflow with Apache NiFi
PPTX
Apache Spark Architecture
PDF
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
PPTX
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
PPTX
Frame - Feature Management for Productive Machine Learning
PDF
Apache Kafka’s Transactions in the Wild! Developing an exactly-once KafkaSink...
PPTX
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
PDF
Apache Iceberg - A Table Format for Hige Analytic Datasets
PPTX
Kafka Retry and DLQ
PDF
Presto Summit 2018 - 09 - Netflix Iceberg
PDF
Building a SIMD Supported Vectorized Native Engine for Spark SQL
PDF
Spark shuffle introduction
PDF
Apache HBase Improvements and Practices at Xiaomi
PDF
Hadoop Strata Talk - Uber, your hadoop has arrived
PPTX
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
PPTX
Tuning and Debugging in Apache Spark
Rds data lake @ Robinhood
Iceberg: a fast table format for S3
How We Optimize Spark SQL Jobs With parallel and sync IO
Druid deep dive
Dataflow with Apache NiFi
Apache Spark Architecture
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Frame - Feature Management for Productive Machine Learning
Apache Kafka’s Transactions in the Wild! Developing an exactly-once KafkaSink...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Apache Iceberg - A Table Format for Hige Analytic Datasets
Kafka Retry and DLQ
Presto Summit 2018 - 09 - Netflix Iceberg
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Spark shuffle introduction
Apache HBase Improvements and Practices at Xiaomi
Hadoop Strata Talk - Uber, your hadoop has arrived
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Tuning and Debugging in Apache Spark
Ad

Viewers also liked (7)

PDF
Druid at SF Big Analytics 2015-12-01
PDF
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
KEY
Large scale ETL with Hadoop
PDF
Hadoop Family and Ecosystem
PDF
OLAP options on Hadoop
PPTX
Big Data Pipeline and Analytics Platform Using NetflixOSS and Other Open Sour...
PPTX
Scalable Real-time analytics using Druid
Druid at SF Big Analytics 2015-12-01
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
Large scale ETL with Hadoop
Hadoop Family and Ecosystem
OLAP options on Hadoop
Big Data Pipeline and Analytics Platform Using NetflixOSS and Other Open Sour...
Scalable Real-time analytics using Druid
Ad

Similar to Druid: Sub-Second OLAP queries over Petabytes of Streaming Data (20)

PPTX
Druid Scaling Realtime Analytics
PPTX
Time-series data analysis and persistence with Druid
PPTX
Druid at Hadoop Ecosystem
PDF
Premier Inside-Out: Apache Druid
PPTX
Scalable olap with druid
PDF
PDF
Aggregated queries with Druid on terrabytes and petabytes of data
PDF
Apache Druid 101
PPTX
Understanding apache-druid
PPTX
An Introduction to Druid
PDF
Druid: Under the Covers (Virtual Meetup)
PDF
20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ...
PPTX
Interactive Analytics at Scale in Apache Hive Using Druid
PDF
Real-time analytics with Druid at Appsflyer
PPTX
Interactive Analytics at Scale in Apache Hive Using Druid
PDF
Hands-on with Apache Druid: Installation & Data Ingestion Steps
PPTX
Our journey with druid - from initial research to full production scale
PDF
Game Analytics at London Apache Druid Meetup
PDF
Fast analytics kudu to druid
PPTX
The of Operational Analytics Data Store
Druid Scaling Realtime Analytics
Time-series data analysis and persistence with Druid
Druid at Hadoop Ecosystem
Premier Inside-Out: Apache Druid
Scalable olap with druid
Aggregated queries with Druid on terrabytes and petabytes of data
Apache Druid 101
Understanding apache-druid
An Introduction to Druid
Druid: Under the Covers (Virtual Meetup)
20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ...
Interactive Analytics at Scale in Apache Hive Using Druid
Real-time analytics with Druid at Appsflyer
Interactive Analytics at Scale in Apache Hive Using Druid
Hands-on with Apache Druid: Installation & Data Ingestion Steps
Our journey with druid - from initial research to full production scale
Game Analytics at London Apache Druid Meetup
Fast analytics kudu to druid
The of Operational Analytics Data Store

More from DataWorks Summit (20)

PPTX
Data Science Crash Course
PPTX
Floating on a RAFT: HBase Durability with Apache Ratis
PPTX
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
PDF
HBase Tales From the Trenches - Short stories about most common HBase operati...
PPTX
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
PPTX
Managing the Dewey Decimal System
PPTX
Practical NoSQL: Accumulo's dirlist Example
PPTX
HBase Global Indexing to support large-scale data ingestion at Uber
PPTX
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
PPTX
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
PPTX
Supporting Apache HBase : Troubleshooting and Supportability Improvements
PPTX
Security Framework for Multitenant Architecture
PDF
Presto: Optimizing Performance of SQL-on-Anything Engine
PPTX
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
PPTX
Extending Twitter's Data Platform to Google Cloud
PPTX
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
PPTX
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
PPTX
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
PDF
Computer Vision: Coming to a Store Near You
PPTX
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Data Science Crash Course
Floating on a RAFT: HBase Durability with Apache Ratis
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
HBase Tales From the Trenches - Short stories about most common HBase operati...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Managing the Dewey Decimal System
Practical NoSQL: Accumulo's dirlist Example
HBase Global Indexing to support large-scale data ingestion at Uber
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Security Framework for Multitenant Architecture
Presto: Optimizing Performance of SQL-on-Anything Engine
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Extending Twitter's Data Platform to Google Cloud
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Computer Vision: Coming to a Store Near You
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark

Recently uploaded (20)

PDF
Hybrid model detection and classification of lung cancer
PDF
Unlocking AI with Model Context Protocol (MCP)
PDF
Approach and Philosophy of On baking technology
PDF
DP Operators-handbook-extract for the Mautical Institute
PDF
gpt5_lecture_notes_comprehensive_20250812015547.pdf
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PDF
Accuracy of neural networks in brain wave diagnosis of schizophrenia
PPTX
A Presentation on Artificial Intelligence
PPTX
cloud_computing_Infrastucture_as_cloud_p
PDF
WOOl fibre morphology and structure.pdf for textiles
PPTX
TLE Review Electricity (Electricity).pptx
PDF
Encapsulation theory and applications.pdf
PDF
Encapsulation_ Review paper, used for researhc scholars
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
Heart disease approach using modified random forest and particle swarm optimi...
PPTX
TechTalks-8-2019-Service-Management-ITIL-Refresh-ITIL-4-Framework-Supports-Ou...
PDF
A comparative study of natural language inference in Swahili using monolingua...
PDF
Zenith AI: Advanced Artificial Intelligence
PDF
NewMind AI Weekly Chronicles - August'25-Week II
PPTX
Chapter 5: Probability Theory and Statistics
Hybrid model detection and classification of lung cancer
Unlocking AI with Model Context Protocol (MCP)
Approach and Philosophy of On baking technology
DP Operators-handbook-extract for the Mautical Institute
gpt5_lecture_notes_comprehensive_20250812015547.pdf
Digital-Transformation-Roadmap-for-Companies.pptx
Accuracy of neural networks in brain wave diagnosis of schizophrenia
A Presentation on Artificial Intelligence
cloud_computing_Infrastucture_as_cloud_p
WOOl fibre morphology and structure.pdf for textiles
TLE Review Electricity (Electricity).pptx
Encapsulation theory and applications.pdf
Encapsulation_ Review paper, used for researhc scholars
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Heart disease approach using modified random forest and particle swarm optimi...
TechTalks-8-2019-Service-Management-ITIL-Refresh-ITIL-4-Framework-Supports-Ou...
A comparative study of natural language inference in Swahili using monolingua...
Zenith AI: Advanced Artificial Intelligence
NewMind AI Weekly Chronicles - August'25-Week II
Chapter 5: Probability Theory and Statistics

Druid: Sub-Second OLAP queries over Petabytes of Streaming Data

  • 1. Druid : Sub-Second OLAP queries over Petabytes of Streaming Data Nishant Bangarwa Hortonworks Druid Committer, PMC Superset Incubator PPMC June 2017
  • 2. © Hortonworks Inc. 2011 – 2016. All Rights Reserved2 Agenda History and Motivation Introduction Demo Druid Architecture – Indexing and Querying Data Druid In Production Recent Improvements
  • 3. © Hortonworks Inc. 2011 – 2016. All Rights Reserved HISTORY AND MOTIVATION  Druid Open sourced in late 2012  Initial Use case  Power ad-tech analytics product  Requirements  Arbitrary queries  Scalability : trillions of events/day  Interactive : low latency queries  Real-time : data freshness  High Availability  Rolling Upgrades
  • 4. © Hortonworks Inc. 2011 – 2016. All Rights Reserved4 MOTIVATION  Business Intelligence Queries  Arbitrary slicing and dicing of data  Interactive real time visualizations on Complex data streams  Answer BI questions – How many unique male visitors visited my website last month ? – How many products were sold last quarter broken down by a demographic and product category ?  Not interested in dumping entire dataset
  • 5. © Hortonworks Inc. 2011 – 2016. All Rights Reserved5 Introdution
  • 6. © Hortonworks Inc. 2011 – 2016. All Rights Reserved6 What is Druid ?  Column-oriented distributed datastore  Sub-Second query times  Realtime streaming ingestion  Arbitrary slicing and dicing of data  Automatic Data Summarization  Approximate algorithms (hyperLogLog, theta)  Scalable to petabytes of data  Highly available
  • 7. © Hortonworks Inc. 2011 – 2016. All Rights Reserved7 Demo
  • 8. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Demo: Wikipedia Real-Time Dashboard (Accelerated 30x)
  • 9. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Wikipedia Real-Time Dashboard: How it Works Wikipedia Edits Data Stream Exactly-Once Ingestion Write Read Java Stream Reader
  • 10. © Hortonworks Inc. 2011 – 2016. All Rights Reserved10 Druid Architecture
  • 11. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Realtime Nodes Historical Nodes 11 Druid Architecture Batch Data Event Historical Nodes Broker Nodes Realtime Index Tasks Streaming Data Historical Nodes Handoff
  • 12. © Hortonworks Inc. 2011 – 2016. All Rights Reserved12 Druid Architecture Batch Data Queries Metadata Store Coordinator Nodes Zookeepe r Historical Nodes Broker Nodes Realtime Index Tasks Streaming Data Handoff Optional Distributed Query Cache
  • 13. © Hortonworks Inc. 2011 – 2016. All Rights Reserved13 Indexing Data
  • 14. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Indexing Service  Indexing is performed by  Overlord  Middle Managers  Peons  Middle Managers spawn peons which runs ingestion tasks  Each peon runs 1 task  Task definition defines which task to run and its properties
  • 15. © Hortonworks Inc. 2011 – 2016. All Rights Reserved15 Streaming Ingestion : Realtime Index Tasks  Ability to ingest streams of data  Stores data in write-optimized structure  Periodically converts write-optimized structure to read-optimized segments  Event query-able as soon as it is ingested  Both push and pull based ingestion
  • 16. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Streaming Ingestion : Tranquility  Helper library for coordinating streaming ingestion  Simple API to send events to druid  Transparently Manages  Realtime index Task Creation  Partitioning and Replication  Schema Evolution  Can be used with Flink, Samza, Spark, Storm any other ETL framework
  • 17. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Kafka Indexing Service (experimental)  Supports Exactly once ingestion  Messages pulled by Kafka Index Tasks  Each Kafka Index Task consumes from a set of partitions with start and end offset  Each message verified to ensure sequence  Kafka Offsets and corresponding segments persisted in same metadata transaction atomically  Kafka Supervisor  embedded inside overlord  Manages kafka index tasks  Retry failed tasks
  • 18. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Batch Ingestion  HadoopIndexTask  Peon launches Hadoop MR job  Mappers read data  Reducers create Druid segment files  Index Task  Runs in single JVM i.e peon  Suitable for data sizes(<1G)  Integrations with Apache HIVE and Spark for Batch Ingestion
  • 19. © Hortonworks Inc. 2011 – 2016. All Rights Reserved19 Querying Data
  • 20. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Querying Data from Druid  Druid supports  JSON Queries over HTTP  In built SQL (experimental)  Querying libraries available for  Python  R  Ruby  Javascript  Clojure  PHP
  • 21. © Hortonworks Inc. 2011 – 2016. All Rights Reserved21 JSON Over HTTP  HTTP Rest API  Queries and results expressed in JSON  Multiple Query Types  Time Boundary  Timeseries  TopN  GroupBy  Select  Segment Metadata
  • 22. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Built-in SQL (experimental)  Apache Calcite based parser and planner  Ability to connect Druid to any BI tool that supports JDBC  SQL via JSON over HTTP  Supports approximate queries  APPROX_COUNT_DISTINCT(col)  Ability to do fast approximate TopN queries  APPROX_QUANTILE(column, probability)
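Druid SQL itself travels as JSON over HTTP: the statement is wrapped in a small envelope and POSTed to the broker's SQL endpoint. A sketch, with table and column names from the Wikipedia example and the APPROX_COUNT_DISTINCT aggregator mentioned above:

```python
import json

# An approximate-distinct SQL query against the example datasource.
sql = """
SELECT page,
       SUM(added)                  AS total_added,
       APPROX_COUNT_DISTINCT(city) AS approx_cities
FROM wikipedia
GROUP BY page
ORDER BY total_added DESC
LIMIT 5
"""

request_body = json.dumps({"query": sql})
# POST http://broker:8082/druid/v2/sql   (Content-Type: application/json)
```

The same statements can be issued through the CLI or the JDBC driver, which is what makes the BI-tool integration possible.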
  • 23. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Integrated with multiple open source UI tools  Superset –  Developed at Airbnb  In Apache incubation since May 2017  Grafana – Druid plugin (https://github.com/grafana-druid-plugin/druidplugin)  Metabase  With built-in SQL, connects with any BI tool supporting JDBC
  • 24. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Superset  Python backend  Flask App Builder  Authentication  Pandas for rich analytics  SQLAlchemy as the SQL toolkit  JavaScript frontend  React, NVD3  Deep integration with Druid
  • 25. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Superset Rich Dashboarding Capabilities: Treemaps
  • 26. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Superset Rich Dashboarding Capabilities: Sunburst
  • 27. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Superset UI Provides Powerful Visualizations Rich library of dashboard visualizations: Basic: • Bar Charts • Pie Charts • Line Charts Advanced: • Sankey Diagrams • Treemaps • Sunburst • Heatmaps And More!
  • 28. © Hortonworks Inc. 2011 – 2016. All Rights Reserved28 Druid in Production
  • 29. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Production readiness  Is Druid suitable for my Use case ?  Will Druid meet my performance requirements at scale ?  How complex is it to Operate and Manage Druid cluster ?  How to monitor a Druid cluster ?  High Availability ?  How to upgrade Druid cluster without downtime ?  Security ?  Extensibility for future Use cases ?
  • 30. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Suitable Use Cases  Powering Interactive user facing applications  Arbitrary slicing and dicing of large datasets  User behavior analysis  measuring distinct counts  retention analysis  funnel analysis  A/B testing  Exploratory analytics/root cause analysis
  • 31. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Performance and Scalability : Fast Facts Most Events per Day 300 Billion Events / Day (Metamarkets) Most Computed Metrics 1 Billion Metrics / Min (Jolata) Largest Cluster 200 Nodes (Metamarkets) Largest Hourly Ingestion 2TB per Hour (Netflix)
  • 32. © Hortonworks Inc. 2011 – 2016. All Rights Reserved32 Performance  Query Latency – average - 500ms – 90%ile < 1sec – 95%ile < 5sec – 99%ile < 10 sec  Query Volume – 1000s queries per minute  Benchmarking code  https://github.com/druid-io/druid-benchmark
  • 33. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Performance : Approximate Algorithms  Ability to Store Approximate Data Sketches for high cardinality columns e.g userid  Reduced storage size  Use Cases  Fast approximate distinct counts  Approximate Top-K queries  Approximate histograms  Funnel/retention analysis  Limitation  Not possible to do exact counts  filter on individual row values
  • 34. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Simplified Druid Cluster Management with Ambari  Install, configure and manage Druid and all external dependencies from Ambari  Easy to enable HA, Security, Monitoring …
  • 35. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Simplified Druid Cluster Management with Ambari
  • 36. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Monitoring a Druid Cluster  Each Druid Node emits metrics for  Query performance  Ingestion Rate  JVM Health  Query Cache performance  System health  Emitted as JSON objects to a runtime log file or over HTTP to other services  Emitters available for Ambari Metrics Server, Graphite, StatsD, Kafka  Easy to implement your own metrics emitter
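As a sketch of what consuming these metrics looks like, the snippet below parses a few illustrative (not captured) JSON metric events and averages the query latencies; "query/time" is Druid's per-query latency metric, in milliseconds.

```python
import json

# Sample Druid metric events as they might appear in the runtime log
# or arrive over HTTP. The values here are made up for illustration.
raw_events = [
    '{"feed":"metrics","service":"druid/broker","metric":"query/time","value":420}',
    '{"feed":"metrics","service":"druid/broker","metric":"query/time","value":580}',
    '{"feed":"metrics","service":"druid/broker","metric":"jvm/gc/time","value":12}',
]

events = [json.loads(line) for line in raw_events]
query_times = [e["value"] for e in events if e["metric"] == "query/time"]
avg_query_ms = sum(query_times) / len(query_times)  # 500.0 for the samples above
```

A custom emitter implementation would receive events of exactly this shape and forward them to the monitoring backend of choice.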
  • 37. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Monitoring using Ambari Metrics Server  HDP 2.6.1 contains pre-defined Grafana dashboards  Health of Druid nodes  Ingestion  Query performance  Easy to create new dashboards and set up alerts  Auto-configured when both Druid and Ambari Metrics Server are installed
  • 38. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Monitoring using Ambari Metrics Server
  • 39. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Monitoring using Ambari Metrics Server
  • 40. © Hortonworks Inc. 2011 – 2016. All Rights Reserved High Availability  Deploy Coordinator/Overlord on multiple instances  Leader election via ZooKeeper  Broker – install multiple brokers  Use the Druid Router or any load balancer to route queries to brokers  Realtime Index Tasks – create redundant tasks  Historical Nodes – create a load rule with replication factor >= 2 (default = 2)
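The historical replication mentioned in the last bullet is configured as a small JSON rule handled by the Coordinator. A sketch of a "load forever" rule keeping two replicas on the default tier (the tier name and field layout follow Druid's retention-rule docs of this era):

```python
# Replication load rule: keep 2 copies of every segment on the default
# historical tier ("_default_tier" is Druid's default tier name).
load_rule = {
    "type": "loadForever",
    "tieredReplicants": {"_default_tier": 2},  # replication factor 2
}

# Rules are set per-datasource (or as cluster defaults) through the
# Coordinator's HTTP API or console.
```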
  • 41. © Hortonworks Inc. 2011 – 2016. All Rights Reserved41 Rolling Upgrades  Maintain backwards compatibility  Data redundancy  Shared Nothing Architecture  Rolling upgrades  No Downtime  [diagram: nodes upgraded one at a time from version 1 to version 2]
  • 42. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Security  Supports authentication via Kerberos/SPNEGO  Easy wizard-based Kerberos security enablement via Ambari  [diagram: user browser runs kinit against the KDC server, then presents the Kerberos token to Druid]
  • 43. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Extending Core Druid  Plugin-based architecture  Leverages Guice to load extensions at runtime  Possible extension points  Add a new deep storage implementation  Add a new Firehose  Add aggregators  Add complex metrics  Add new query types  Add new Jersey resources  Bundle your extension with all the other Druid extensions
  • 44. © Hortonworks Inc. 2011 – 2016. All Rights Reserved44 Companies Using Druid
  • 45. © Hortonworks Inc. 2011 – 2016. All Rights Reserved45 Recent Improvements
  • 46. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Druid 0.10.0  Kafka Indexing Service – exactly-once ingestion  Built-in SQL support (CLI, HTTP, JDBC)  Numeric dimensions  Kerberos authentication  Performance improvements  Optimized evaluation of large AND/OR filter lists with Concise bitmaps  Index-based evaluation of simple regex filters like ‘foo%’  ~30% improvement on non-time groupBys  Apache Hive integration – supports full SQL, large joins, batch indexing  Apache Ambari integration – easy deployments and cluster management
  • 47. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Future Work  Improved schema definition & management  Improvements to Hive/Druid integration  Materialized Views, Push down more filters, support complex columns etc…  Performance improvements  Select query performance improvements  Jit-friendly topN queries  Security enhancements  Row/Column level security  Integration with Apache Ranger  And much more……
  • 48. © Hortonworks Inc. 2011 – 2016. All Rights Reserved48 Community  User google group - druid-user@googlegroups.com  Dev google group - druid-dev@googlegroups.com  Github - druid-io/druid  IRC - #druid-dev on irc.freenode.net
  • 49. © Hortonworks Inc. 2011 – 2016. All Rights Reserved49 Summary  Easy installation and management via Ambari  Real-time – Ingestion latency < seconds. – Query latency < seconds.  Arbitrary slice and dice big data like a ninja – No more pre-canned drill-downs. – Query with more fine-grained granularity.  High availability and rolling deployment capabilities  Secure and production ready  Vibrant and active community  Available as Tech Preview in HDP 2.6.1
  • 50. © Hortonworks Inc. 2011 – 2016. All Rights Reserved50 Thank you ! Questions ?  Twitter - @NishantBangarwa  Email - nbangarwa@hortonworks.com  LinkedIn - https://www.linkedin.com/in/nishant-bangarwa
  • 51. © Hortonworks Inc. 2011 – 2016. All Rights Reserved AtScale + Hive + Druid  Leverage AtScale cubing capabilities  Store aggregate tables in Druid  Updatable dimensions in Hive
  • 52. © Hortonworks Inc. 2011 – 2016. All Rights Reserved52 Storage Format
  • 53. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Druid: Segments  Data in Druid is stored in Segment Files.  Partitioned by time  Ideally, segment files are each smaller than 1GB.  If files are large, smaller time partitions are needed. Time Segment 1: Monday Segment 2: Tuesday Segment 3: Wednesday Segment 4: Thursday Segment 5_2: Friday Segment 5_1: Friday
  • 54. © Hortonworks Inc. 2011 – 2016. All Rights Reserved54 Example Wikipedia Edit Dataset timestamp page language city country … added deleted 2011-01-01T00:01:35Z Justin Bieber en SF USA 10 65 2011-01-01T00:03:63Z Justin Bieber en SF USA 15 62 2011-01-01T00:04:51Z Justin Bieber en SF USA 32 45 2011-01-01T00:05:35Z Ke$ha en Calgary CA 17 87 2011-01-01T00:06:41Z Ke$ha en Calgary CA 43 99 2011-01-02T00:08:35Z Selena Gomes en Calgary CA 12 53 Timestamp Dimensions Metrics
  • 55. © Hortonworks Inc. 2011 – 2016. All Rights Reserved55 Data Rollup timestamp page language city country … added deleted 2011-01-01T00:01:35Z Justin Bieber en SF USA 10 65 2011-01-01T00:03:63Z Justin Bieber en SF USA 15 62 2011-01-01T00:04:51Z Justin Bieber en SF USA 32 45 2011-01-01T00:05:35Z Ke$ha en Calgary CA 17 87 2011-01-01T00:06:41Z Ke$ha en Calgary CA 43 99 2011-01-02T00:08:35Z Selena Gomes en Calgary CA 12 53 timestamp page language city country count sum_added sum_deleted min_added max_added …. 2011-01-01T00:00:00Z Justin Bieber en SF USA 3 57 172 10 32 2011-01-01T00:00:00Z Ke$ha en Calgary CA 2 60 186 17 43 2011-01-02T00:00:00Z Selena Gomes en Calgary CA 1 12 53 12 12 Rollup by hour
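The hourly rollup above can be sketched in a few lines: events sharing the same truncated timestamp and dimension values collapse into one row of aggregates, reproducing the rolled-up table on the slide.

```python
# Hourly rollup of the example events (timestamps, pages, added, deleted
# taken from the slide's raw table; other dimensions omitted for brevity).
raw = [
    ("2011-01-01T00:01:35Z", "Justin Bieber", 10, 65),
    ("2011-01-01T00:03:63Z", "Justin Bieber", 15, 62),
    ("2011-01-01T00:04:51Z", "Justin Bieber", 32, 45),
    ("2011-01-01T00:05:35Z", "Ke$ha", 17, 87),
    ("2011-01-01T00:06:41Z", "Ke$ha", 43, 99),
    ("2011-01-02T00:08:35Z", "Selena Gomes", 12, 53),
]

rollup = {}
for ts, page, added, deleted in raw:
    hour = ts[:13] + ":00:00Z"  # truncate the timestamp to the hour
    row = rollup.setdefault((hour, page), {
        "count": 0, "sum_added": 0, "sum_deleted": 0,
        "min_added": added, "max_added": added,
    })
    row["count"] += 1
    row["sum_added"] += added
    row["sum_deleted"] += deleted
    row["min_added"] = min(row["min_added"], added)
    row["max_added"] = max(row["max_added"], added)
# rollup now holds the three aggregate rows shown on the slide
```

Druid performs this summarization at ingestion time when a query granularity coarser than the raw timestamps is configured.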
  • 56. © Hortonworks Inc. 2011 – 2016. All Rights Reserved56 Dictionary Encoding  Create and store Ids for each value  e.g. page column  Values - Justin Bieber, Ke$ha, Selena Gomes  Encoding - Justin Bieber : 0, Ke$ha: 1, Selena Gomes: 2  Column Data - [0 0 0 1 1 2]  city column - [0 0 0 1 1 1] timestamp page language city country … added deleted 2011-01-01T00:01:35Z Justin Bieber en SF USA 10 65 2011-01-01T00:03:63Z Justin Bieber en SF USA 15 62 2011-01-01T00:04:51Z Justin Bieber en SF USA 32 45 2011-01-01T00:05:35Z Ke$ha en Calgary CA 17 87 2011-01-01T00:06:41Z Ke$ha en Calgary CA 43 99 2011-01-02T00:08:35Z Selena Gomes en Calgary CA 12 53
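The dictionary encoding above is simple to sketch: each distinct value gets an integer id in order of first appearance, and the column stores only those ids.

```python
# Dictionary-encode the page column from the example.
values = ["Justin Bieber", "Justin Bieber", "Justin Bieber",
          "Ke$ha", "Ke$ha", "Selena Gomes"]

dictionary, column = {}, []
for v in values:
    if v not in dictionary:
        dictionary[v] = len(dictionary)  # assign the next unused id
    column.append(dictionary[v])
# column == [0, 0, 0, 1, 1, 2], matching the slide
```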
  • 57. © Hortonworks Inc. 2011 – 2016. All Rights Reserved57 Bitmap Indices  Store Bitmap Indices for each value  Justin Bieber -> [0, 1, 2] -> [1 1 1 0 0 0]  Ke$ha -> [3, 4] -> [0 0 0 1 1 0]  Selena Gomes -> [5] -> [0 0 0 0 0 1]  Queries  Justin Bieber or Ke$ha -> [1 1 1 0 0 0] OR [0 0 0 1 1 0] -> [1 1 1 1 1 0]  language = en and country = CA -> [1 1 1 1 1 1] AND [0 0 0 1 1 1] -> [0 0 0 1 1 1]  Indexes compressed with Concise or Roaring encoding timestamp page language city country … added deleted 2011-01-01T00:01:35Z Justin Bieber en SF USA 10 65 2011-01-01T00:03:63Z Justin Bieber en SF USA 15 62 2011-01-01T00:04:51Z Justin Bieber en SF USA 32 45 2011-01-01T00:05:35Z Ke$ha en Calgary CA 17 87 2011-01-01T00:06:41Z Ke$ha en Calgary CA 43 99 2011-01-02T00:08:35Z Selena Gomes en Calgary CA 12 53
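The bitmap filtering above can be sketched with plain 0/1 lists; real Druid compresses these bitmaps with Concise or Roaring, but the OR/AND mechanics are the same.

```python
# Build a per-value bitmap index over the page column and evaluate filters
# as bitwise OR/AND, mirroring the slide's example.
pages = ["Justin Bieber", "Justin Bieber", "Justin Bieber",
         "Ke$ha", "Ke$ha", "Selena Gomes"]

bitmaps = {}
for row, value in enumerate(pages):
    bitmaps.setdefault(value, [0] * len(pages))[row] = 1

def bitmap_or(a, b):
    return [x | y for x, y in zip(a, b)]

def bitmap_and(a, b):
    return [x & y for x, y in zip(a, b)]

# page = 'Justin Bieber' OR page = 'Ke$ha'
either = bitmap_or(bitmaps["Justin Bieber"], bitmaps["Ke$ha"])
# either == [1, 1, 1, 1, 1, 0]: the filter matches rows 0-4
```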
  • 58. © Hortonworks Inc. 2011 – 2016. All Rights Reserved58 Approximate Sketch Columns timestamp page userid language city country … added deleted 2011-01-01T00:01:35Z Justin Bieber user1111111 en SF USA 10 65 2011-01-01T00:03:63Z Justin Bieber user1111111 en SF USA 15 62 2011-01-01T00:04:51Z Justin Bieber user2222222 en SF USA 32 45 2011-01-01T00:05:35Z Ke$ha user3333333 en Calgary CA 17 87 2011-01-01T00:06:41Z Ke$ha user4444444 en Calgary CA 43 99 2011-01-02T00:08:35Z Selena Gomes user1111111 en Calgary CA 12 53 timestamp page language city country count sum_added sum_deleted min_added Userid_sketch …. 2011-01-01T00:00:00Z Justin Bieber en SF USA 3 57 172 10 {sketch} 2011-01-01T00:00:00Z Ke$ha en Calgary CA 2 60 186 17 {sketch} 2011-01-02T00:00:00Z Selena Gomes en Calgary CA 1 12 53 12 {sketch} Rollup by hour
  • 59. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Approximate Sketch Columns  Better rollup for high cardinality columns e.g userid  Reduced storage size  Use Cases  Fast approximate distinct counts  Approximate histograms  Funnel/retention analysis  Limitation  Not possible to do exact counts  filter on individual row values
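Druid's theta sketches come from the DataSketches library; as a toy illustration of the underlying idea — not Druid's actual implementation — here is a k-minimum-values (KMV) distinct-count sketch, showing why a small fixed-size summary can estimate distinct counts and still be merged across rollup rows.

```python
import hashlib

def hash01(value):
    """Hash a string to a float in [0, 1)."""
    digest = hashlib.md5(value.encode()).hexdigest()
    return int(digest, 16) / 16 ** 32

class KMVSketch:
    """Keep the k smallest hash values; their spread estimates cardinality."""

    def __init__(self, k=64):
        self.k = k
        self.mins = set()  # up to k smallest hash values seen

    def add(self, value):
        self.mins.add(hash01(value))
        if len(self.mins) > self.k:
            self.mins.remove(max(self.mins))

    def merge(self, other):
        """Union two sketches -- how per-row sketches combine at query time."""
        merged = KMVSketch(self.k)
        for h in self.mins | other.mins:
            merged.mins.add(h)
            if len(merged.mins) > merged.k:
                merged.mins.remove(max(merged.mins))
        return merged

    def estimate(self):
        if len(self.mins) < self.k:
            return len(self.mins)  # saw fewer than k distinct values: exact
        return int((self.k - 1) / max(self.mins))

sketch = KMVSketch(k=64)
for i in range(10):          # 10 distinct user ids, each seen 3 times
    for _ in range(3):
        sketch.add("user%d" % i)
# with fewer than k distinct values the estimate is exact: 10
```

The fixed size of the summary is what makes the rollup above possible: each aggregate row stores one small sketch instead of every raw userid.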

Editor's Notes

  • #2: Thank you all for coming to my talk. The title of this talk is Druid: Sub-Second OLAP queries over Petabytes of Streaming Data Sub-Second means Fast Interactive queries which can be used to power interactive dashboards, fast analytics, monitoring and alerting applications. I am a Software Engineer at Hortonworks, a committer and PMC Member in Druid and part of PPMC for Superset Incubation. I am part of the Business Intelligence team at Hortonworks. Prior to that I have spent 2 years working at Metamarkets where I was responsible for handling the analytics infrastructure, including real-time analytics with Druid
  • #3: Motivation Druid introduction and use case Demo Druid Architecture Storage Internals Recent Improvements
  • #4: Initial Use Case Power ad-tech analytics product at Metamarkets, similar to the picture on the right: a dashboard where you can visualize timeseries data and do arbitrary filtering and grouping on any combination of dimensions. Requirements - The data store needs to support arbitrary queries, i.e. users should be able to filter and group on any combination of dimensions. Scalability: should be able to handle trillions of events/day. Interactive: since the data store was going to power an interactive dashboard, low-latency queries were a must. Real-time: the time between when an event occurs and when it becomes visible on the dashboard should be minimal (on the order of a few seconds). High Availability – no central point of failure. Rolling Upgrades – the architecture was required to support rolling upgrades.
  • #5: MOTIVATION Interactive real time visualizations on Complex data streams Answer BI questions How many unique male visitors visited my website last month ? How many products were sold last quarter broken down by a demographic and product category ? Not interested in dumping entire dataset Suppose I am running an ad campaign, and I want to understand what kind of Impressions are there What is my click through rate How many users decided to purchase my services We have User Activity Stream and we may want to know How the users are behaving. We may have a stream of Firewall Events and we want to do detect any anomalies in those streams in realtime. Also, For very large distributed clusters there is a need to answer questions about application performance. How individual node in my cluster behaving ? Are there any Anomalies in query response time ? All the above use cases can have data streams which can be huge in volume depending on the scale of business. How do I analyze this information ? How do I get insights from these Stream of Events in realtime ?
  • #7: What is Druid ? Column-oriented distributed datastore – data is stored in columnar format; many datasets have a large number of dimensions, e.g. 100s or 1000s, but most queries only need 5-10 columns, so the column-oriented format lets Druid scan only the required columns. Sub-second query times – it uses techniques like bitmap indexes for fast filtering, memory-mapped files to serve data from memory, data summarization and compression, and query caching, and has very optimized algorithms for the different query types, achieving sub-second query times. Realtime streaming ingestion from almost any ETL pipeline. Arbitrary slicing and dicing of data – no need to create pre-canned drill-downs. Automatic data summarization – e.g. if my dashboard only shows events aggregated by hour, we can optionally configure Druid to pre-aggregate at ingestion time. Approximate algorithms (HyperLogLog, theta sketches) – for fast approximate answers. Scalable to petabytes of data. Highly available.
  • #12: Realtime Index Tasks- Handle Real-Time Ingestion, Support both pull & push based ingestion. Handle Queries - Ability to serve queries as soon as data is ingested. Store data in write optimized data structure on heap In case you need to do any ETL like data enrichment or joining multiple streams of data, you can do it in a separate ETL layer before sending it to druid. Realtime Periodically persist their in memory data to deep storage in form of read optimized chunks. Deep storage can be any distributed FS and acts as a permanent backup of data Historical Nodes - Main workhorses of druid cluster Use Memory Mapped files to load columnar data Respond to User queries Now Lets see the how data can be queried. Broker Nodes - Keeps track of the data chunks being loaded by each node in the cluster Ability to scatter query across multiple Historical and Realtime nodes Caching Layer Now Lets discuss another case, when you are not having streaming data, but want to Ingest Batch data into druid Batch ingestion can be done using either Hadoop MR or spark job, which converts your data into druid’s columnar format and persist it to deep storage.
  • #13: With many historical nodes in a cluster there is a need for balance the load across them, this is done by the Coordinator Nodes - Uses Zookeeper for coordination Asks historical Nodes to load or drop data They also move data across historical nodes to balances load in the cluster Manages Data replication
  • #15: Indexing Service Indexing is performed by Overlord Middle Managers Peons Middle Managers spawn peons which runs ingestion tasks Each peon runs 1 task Task definition defines which task to run and its properties
  • #16: Realtime Ingestion is done by Realtime Index Tasks, which maintain an in-memory row-oriented key-value store: data is stored inside the heap within a map, indexed by time and dimension values, and persisted to disk based on a row threshold or elapsed time. Data is queryable as soon as it is ingested; both push- and pull-based ingestion are supported.
  • #37: Query performance – query time, segment scan time … Ingestion Rate – events ingested, events persisted … JVM Health – JVM Heap usage, GC stats … Cache Related – cache hits, cache misses, cache evictions … System related – cpu, disk, network, swap usage etc..
  • #45: This shows some of the production users. I can talk about some of the large ones which have common use cases. Alibaba and Ebay use druid for ecommerce and user behavior analytics Cisco has a realtime analytics product for analyzing network flows Yahoo uses druid for user behavior analytics and realtime cluster monitoring Hulu does interactive analysis on user and application behavior Paypal, SK telecom – uses druid for business analytics
  • #48: Future Work Improved schema definition & management reIndexing/compaction without hadoop Closer Hive/Druid integration performance improvements Select query performance improvements Jit-friendly topN queries Security enhancements Row/Column level security Integration with Apache Ranger Work towards supporting Joins
  • #50: Summary Scalability Horizontal Scalability. Columnar storage, indexing and compression. Multi-tenancy. Real-time Ingestion latency < seconds. Query latency < seconds. Arbitrary slice and dice big data like ninja No more pre-canned drill downs. Query with more fine-grained granularity. High availability and Rolling deployment capabilities Less costly to run. Very active open source community.