Druid : Sub-Second OLAP
queries over Petabytes of
Streaming Data
Nishant Bangarwa
Hortonworks
Druid Committer, PMC
Superset Incubator PPMC
June 2017
© Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
History and Motivation
Introduction
Demo
Druid Architecture – Indexing and Querying Data
Druid In Production
Recent Improvements
HISTORY AND MOTIVATION
 Druid Open sourced in late 2012
 Initial Use case
 Power ad-tech analytics product
 Requirements
 Arbitrary queries
 Scalability : trillions of events/day
 Interactive : low latency queries
 Real-time : data freshness
 High Availability
 Rolling Upgrades
MOTIVATION
 Business Intelligence Queries
 Arbitrary slicing and dicing of data
 Interactive real-time visualizations on complex data streams
 Answer BI questions
– How many unique male visitors visited my website last month?
– How many products were sold last quarter, broken down by demographic and product category?
 Not interested in dumping entire dataset
Introduction
What is Druid?
 Column-oriented distributed datastore
 Sub-Second query times
 Realtime streaming ingestion
 Arbitrary slicing and dicing of data
 Automatic Data Summarization
 Approximate algorithms (HyperLogLog, theta sketches)
 Scalable to petabytes of data
 Highly available
Demo
Demo: Wikipedia Real-Time Dashboard (Accelerated 30x)
Wikipedia Real-Time Dashboard: How it Works
[Diagram] Wikipedia edits arrive as a data stream and are written into Druid with exactly-once ingestion; the dashboard reads the results through a Java stream reader.
Druid Architecture
Druid Architecture
[Diagram] Streaming data flows into realtime index tasks; batch data is indexed directly onto historical nodes. Realtime index tasks hand completed segments off to historical nodes, and broker nodes route events and queries to both.
Druid Architecture
[Diagram] Queries arrive at broker nodes (optionally through a distributed query cache) and fan out to realtime index tasks and historical nodes. Coordinator nodes assign segments to historical nodes, using ZooKeeper for coordination and the metadata store for segment metadata. Streaming data enters through realtime index tasks and is handed off to historical nodes; batch data is loaded directly.
Indexing Data
Indexing Service
 Indexing is performed by
 Overlord
 Middle Managers
 Peons
 Middle Managers spawn peons, which run ingestion tasks
 Each peon runs one task
 A task definition specifies which task to run and its properties
Streaming Ingestion : Realtime Index Tasks
 Ability to ingest streams of data
 Stores data in a write-optimized structure
 Periodically converts the write-optimized structure to read-optimized segments
 Events are queryable as soon as they are ingested
 Supports both push- and pull-based ingestion
Streaming Ingestion : Tranquility
 Helper library for coordinating streaming ingestion
 Simple API to send events to Druid
 Transparently manages
 Realtime index task creation
 Partitioning and replication
 Schema evolution
 Can be used with Flink, Samza, Spark, Storm, or any other ETL framework
Kafka Indexing Service (experimental)
 Supports exactly-once ingestion
 Messages are pulled by Kafka index tasks
 Each Kafka index task consumes from a set of partitions with start and end offsets
 Each message is verified to ensure ordering
 Kafka offsets and the corresponding segments are persisted atomically in the same metadata transaction
 Kafka Supervisor
 Embedded inside the overlord
 Manages Kafka index tasks
 Retries failed tasks
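The atomic offset-plus-segment commit is the heart of the exactly-once guarantee. A toy in-memory sketch (names like `MetadataStore` and `commit` are illustrative, not Druid's actual API): a commit is rejected unless its start offsets match exactly what the store last committed, so a replayed batch can never publish a duplicate segment.

```python
# Toy illustration of exactly-once handoff: segment metadata and Kafka offsets
# are committed together, and a commit must resume exactly at the last offsets.
class MetadataStore:
    def __init__(self, partitions):
        self.committed_offsets = {p: 0 for p in partitions}
        self.segments = []

    def commit(self, segment, start_offsets, end_offsets):
        # Reject stale or duplicate work: the whole commit is all-or-nothing.
        if start_offsets != self.committed_offsets:
            return False
        # Persist segment and offsets together ("atomically" in the real system).
        self.segments.append(segment)
        self.committed_offsets = dict(end_offsets)
        return True

store = MetadataStore(partitions=[0, 1])
assert store.commit("seg-A", {0: 0, 1: 0}, {0: 100, 1: 80})
# Replaying the same batch fails the offset check, so no duplicate segment:
assert not store.commit("seg-A-dup", {0: 0, 1: 0}, {0: 100, 1: 80})
assert store.commit("seg-B", {0: 100, 1: 80}, {0: 150, 1: 120})
print(store.committed_offsets)
```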
Batch Ingestion
 HadoopIndexTask
 Peon launches Hadoop MR job
 Mappers read data
 Reducers create Druid segment files
 Index Task
 Runs in a single JVM, i.e. a peon
 Suitable for small data sizes (< 1 GB)
 Integrations with Apache Hive and Spark are available for batch ingestion
Querying Data
Querying Data from Druid
 Druid supports
 JSON Queries over HTTP
 Built-in SQL (experimental)
 Querying libraries available for
 Python
 R
 Ruby
 Javascript
 Clojure
 PHP
JSON Over HTTP
 HTTP Rest API
 Queries and results expressed in JSON
 Multiple Query Types
 Time Boundary
 Timeseries
 TopN
 GroupBy
 Select
 Segment Metadata
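For example, a TopN query is a small JSON document POSTed to the broker. A sketch using the Wikipedia example dataset (the datasource and interval are from that example; the broker URL and port are defaults, adjust for your cluster):

```python
import json

# A Druid TopN query: top 5 pages by summed edit count over two days.
topn_query = {
    "queryType": "topN",
    "dataSource": "wikipedia",
    "dimension": "page",
    "metric": "edits",          # rank by the aggregator named below
    "threshold": 5,
    "granularity": "all",
    "aggregations": [
        {"type": "longSum", "name": "edits", "fieldName": "count"}
    ],
    "intervals": ["2011-01-01/2011-01-03"],
}

# POST this to the broker, e.g.:
#   curl -X POST -H 'Content-Type: application/json' \
#        -d @query.json http://localhost:8082/druid/v2
print(json.dumps(topn_query, indent=2))
```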
Built-in SQL (experimental)
 Apache Calcite based parser and planner
 Ability to connect Druid to any BI tool that supports JDBC
 SQL via JSON over HTTP
 Supports Approximate queries
 APPROX_COUNT_DISTINCT(col)
 Ability to do Fast Approx TopN queries
 APPROX_QUANTILE(column, probability)
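Putting the approximate functions together, a SQL query over HTTP might look like the sketch below (table and column names follow the Wikipedia example; `APPROX_QUANTILE` assumes the approximate-histogram extension is loaded):

```python
import json

# Approximate distinct users and an approximate 95th percentile, per country.
sql = """
SELECT country,
       APPROX_COUNT_DISTINCT(userid) AS unique_users,
       APPROX_QUANTILE(added, 0.95)  AS p95_added
FROM wikipedia
GROUP BY country
"""

payload = {"query": sql}
# POST to http://<broker>:8082/druid/v2/sql with Content-Type application/json.
print(json.dumps(payload))
```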
Integrated with multiple Open Source UI tools
 Superset
 Developed at Airbnb
 In Apache incubation since May 2017
 Grafana – Druid plugin (https://github.com/grafana-druid-plugin/druidplugin)
 Metabase
 With in-built SQL connect with any BI tool supporting JDBC
Superset
 Python backend
 Flask AppBuilder
 Authentication
 Pandas for rich analytics
 SQLAlchemy as the SQL toolkit
 Javascript frontend
 React, NVD3
 Deep integration with Druid
Superset Rich Dashboarding Capabilities: Treemaps
Superset Rich Dashboarding Capabilities: Sunburst
Superset UI Provides Powerful Visualizations
Rich library of dashboard visualizations:
Basic:
• Bar Charts
• Pie Charts
• Line Charts
Advanced:
• Sankey Diagrams
• Treemaps
• Sunburst
• Heatmaps
And More!
Druid in Production
Production readiness
 Is Druid suitable for my use case?
 Will Druid meet my performance requirements at scale?
 How complex is it to operate and manage a Druid cluster?
 How do I monitor a Druid cluster?
 High availability?
 How do I upgrade a Druid cluster without downtime?
 Security?
 Extensibility for future use cases?
Suitable Use Cases
 Powering Interactive user facing applications
 Arbitrary slicing and dicing of large datasets
 User behavior analysis
 measuring distinct counts
 retention analysis
 funnel analysis
 A/B testing
 Exploratory analytics/root cause analysis
Performance and Scalability : Fast Facts
Most Events per Day: 300 Billion events/day (Metamarkets)
Most Computed Metrics: 1 Billion metrics/min (Jolata)
Largest Cluster: 200 nodes (Metamarkets)
Largest Hourly Ingestion: 2 TB/hour (Netflix)
Performance
 Query Latency
– average - 500ms
– 90%ile < 1sec
– 95%ile < 5sec
– 99%ile < 10 sec
 Query Volume
– 1000s queries per minute
 Benchmarking code
 https://github.com/druid-io/druid-benchmark
Performance : Approximate Algorithms
 Ability to store approximate data sketches for high-cardinality columns, e.g. userid
 Reduced storage size
 Use Cases
 Fast approximate distinct counts
 Approximate Top-K queries
 Approximate histograms
 Funnel/retention analysis
 Limitations
 Exact counts are not possible
 Cannot filter on individual row values
Simplified Druid Cluster Management with Ambari
 Install, configure and manage Druid and all external dependencies from Ambari
 Easy to enable HA, Security, Monitoring …
Monitoring a Druid Cluster
 Each Druid Node emits metrics for
 Query performance
 Ingestion Rate
 JVM Health
 Query Cache performance
 System health
 Emitted as JSON objects to a runtime log file or over HTTP to other services
 Emitters available for Ambari Metrics Server, Graphite, StatsD, Kafka
 Easy to implement your own metrics emitter
Monitoring using Ambari Metrics Server
 HDP 2.6.1 contains pre-defined Grafana dashboards for
 Health of Druid Nodes
 Ingestion
 Query performance
 Easy to create new dashboards and setup alerts
 Auto configured when both Druid and Ambari Metrics Server are installed
High Availability
 Deploy the Coordinator and Overlord on multiple instances
 Leader election via ZooKeeper
 Broker – install multiple brokers
 Use the Druid router or any load balancer to route queries to brokers
 Realtime index tasks – create redundant tasks
 Historical nodes – create a load rule with replication factor >= 2 (default = 2)
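The load rule for historical replication is itself a small JSON document posted to the coordinator (a minimal sketch: `_default_tier` is Druid's default tier name, and `loadForever` keeps all data loaded; adjust the host and datasource for your cluster):

```python
import json

# Load rule: keep every segment loaded, with 2 replicas in the default tier.
load_rule = [
    {
        "type": "loadForever",
        "tieredReplicants": {"_default_tier": 2},
    }
]

# POST to http://<coordinator>:8081/druid/coordinator/v1/rules/<dataSource>
print(json.dumps(load_rule))
```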
Rolling Upgrades
 Maintain backwards compatibility
 Data redundancy
 Shared Nothing Architecture
 Rolling upgrades
 No Downtime
Security
 Supports authentication via Kerberos / SPNEGO
 Easy wizard-based Kerberos security enablement via Ambari
[Diagram] The user runs kinit against the KDC server; the browser then presents a SPNEGO token to Druid.
Extending Core Druid
 Plugin-based architecture
 Leverages Guice to load extensions at runtime
 Extension points make it possible to
 Add a new deep storage implementation
 Add a new Firehose
 Add Aggregators
 Add Complex metrics
 Add new Query types
 Add new Jersey resources
 Bundle your extension with all the other Druid extensions
Companies Using Druid
Recent Improvements
Druid 0.10.0
 Kafka Indexing Service – exactly-once ingestion
 Built-in SQL support (CLI, HTTP, JDBC)
 Numeric dimensions
 Kerberos authentication
 Performance improvements
 Optimized evaluation of large AND/OR filter trees with Concise bitmaps
 Index-based evaluation of simple LIKE filters such as 'foo%'
 ~30% improvement on non-time groupBys
 Apache Hive integration – supports full SQL, large joins, batch indexing
 Apache Ambari integration – easy deployments and cluster management
Future Work
 Improved schema definition & management
 Improvements to Hive/Druid integration
 Materialized Views, Push down more filters, support complex columns etc…
 Performance improvements
 Select query performance improvements
 JIT-friendly topN queries
 Security enhancements
 Row/Column level security
 Integration with Apache Ranger
 And much more……
Community
 User google group - druid-user@googlegroups.com
 Dev google group - druid-dev@googlegroups.com
 Github - druid-io/druid
 IRC - #druid-dev on irc.freenode.net
Summary
 Easy installation and management via Ambari
 Real-time
– Ingestion latency: seconds or less
– Query latency: seconds or less
 Arbitrary slicing and dicing of big data, like a ninja
– No more pre-canned drill-downs
– Query at fine-grained granularity
 High availability and Rolling deployment capabilities
 Secure and Production ready
 Vibrant and Active community
 Available as Tech Preview in HDP 2.6.1
Thank you ! Questions ?
 Twitter - @NishantBangarwa
 Email - nbangarwa@hortonworks.com
 Linkedin - https://www.linkedin.com/in/nishant-bangarwa
AtScale + Hive + Druid
 Leverage AtScale cubing capabilities
 Store aggregate tables in Druid
 Updatable dimensions in Hive
Storage Format
Druid: Segments
 Data in Druid is stored in Segment Files.
 Partitioned by time
 Ideally, segment files are each smaller than 1GB.
 If files are large, smaller time partitions are needed.
[Diagram] Segments laid out along the time axis: Segment 1 (Monday), Segment 2 (Tuesday), Segment 3 (Wednesday), Segment 4 (Thursday), Segments 5_1 and 5_2 (Friday, split into two shards).
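Time partitioning amounts to bucketing each event's timestamp into a segment interval. A toy sketch with daily granularity (real Druid also shards within an interval when a segment grows too large, as with the two Friday segments):

```python
from collections import defaultdict
from datetime import datetime

def segment_key(ts: str) -> str:
    # Daily granularity: the segment interval is the event's calendar day.
    return datetime.fromisoformat(ts.rstrip("Z")).date().isoformat()

events = [
    "2011-01-01T00:01:35Z",
    "2011-01-01T23:59:00Z",
    "2011-01-02T00:08:35Z",
]

segments = defaultdict(list)
for ts in events:
    segments[segment_key(ts)].append(ts)

print({day: len(rows) for day, rows in segments.items()})
# {'2011-01-01': 2, '2011-01-02': 1}
```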
Example Wikipedia Edit Dataset
timestamp page language city country … added deleted
2011-01-01T00:01:35Z Justin Bieber en SF USA 10 65
2011-01-01T00:03:63Z Justin Bieber en SF USA 15 62
2011-01-01T00:04:51Z Justin Bieber en SF USA 32 45
2011-01-01T00:05:35Z Ke$ha en Calgary CA 17 87
2011-01-01T00:06:41Z Ke$ha en Calgary CA 43 99
2011-01-02T00:08:35Z Selena Gomes en Calgary CA 12 53
Timestamp Dimensions Metrics
Data Rollup
timestamp page language city country … added deleted
2011-01-01T00:01:35Z Justin Bieber en SF USA 10 65
2011-01-01T00:03:63Z Justin Bieber en SF USA 15 62
2011-01-01T00:04:51Z Justin Bieber en SF USA 32 45
2011-01-01T00:05:35Z Ke$ha en Calgary CA 17 87
2011-01-01T00:06:41Z Ke$ha en Calgary CA 43 99
2011-01-02T00:08:35Z Selena Gomes en Calgary CA 12 53
timestamp page language city country count sum_added sum_deleted min_added max_added ….
2011-01-01T00:00:00Z Justin Bieber en SF USA 3 57 172 10 32
2011-01-01T00:00:00Z Ke$ha en Calgary CA 2 60 186 17 43
2011-01-02T00:00:00Z Selena Gomes en Calgary CA 1 12 53 12 12
Rollup by hour
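The hourly rollup above is a group-by on (truncated timestamp, dimensions) with aggregates over the metrics. A minimal sketch using the same rows (language/city/country dropped for brevity):

```python
from collections import defaultdict

# (timestamp, page, added, deleted) rows from the Wikipedia example.
rows = [
    ("2011-01-01T00:01:35Z", "Justin Bieber", 10, 65),
    ("2011-01-01T00:03:63Z", "Justin Bieber", 15, 62),
    ("2011-01-01T00:04:51Z", "Justin Bieber", 32, 45),
    ("2011-01-01T00:05:35Z", "Ke$ha", 17, 87),
    ("2011-01-01T00:06:41Z", "Ke$ha", 43, 99),
    ("2011-01-02T00:08:35Z", "Selena Gomes", 12, 53),
]

# Rollup by hour: truncate the timestamp, group by (hour, page), aggregate.
groups = defaultdict(lambda: {"count": 0, "sum_added": 0, "sum_deleted": 0,
                              "min_added": None, "max_added": None})
for ts, page, added, deleted in rows:
    hour = ts[:13] + ":00:00Z"          # e.g. 2011-01-01T00:00:00Z
    g = groups[(hour, page)]
    g["count"] += 1
    g["sum_added"] += added
    g["sum_deleted"] += deleted
    g["min_added"] = added if g["min_added"] is None else min(g["min_added"], added)
    g["max_added"] = added if g["max_added"] is None else max(g["max_added"], added)

print(groups[("2011-01-01T00:00:00Z", "Justin Bieber")])
# {'count': 3, 'sum_added': 57, 'sum_deleted': 172, 'min_added': 10, 'max_added': 32}
```

Six raw rows collapse to three stored rows, matching the rolled-up table above.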
Dictionary Encoding
 Create and store Ids for each value
 e.g. page column
 Values - Justin Bieber, Ke$ha, Selena Gomes
 Encoding - Justin Bieber : 0, Ke$ha: 1, Selena Gomes: 2
 Column Data - [0 0 0 1 1 2]
 city column - [0 0 0 1 1 1]
timestamp page language city country … added deleted
2011-01-01T00:01:35Z Justin Bieber en SF USA 10 65
2011-01-01T00:03:63Z Justin Bieber en SF USA 15 62
2011-01-01T00:04:51Z Justin Bieber en SF USA 32 45
2011-01-01T00:05:35Z Ke$ha en Calgary CA 17 87
2011-01-01T00:06:41Z Ke$ha en Calgary CA 43 99
2011-01-02T00:08:35Z Selena Gomes en Calgary CA 12 53
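The encoding step fits in a few lines (ids assigned in order of first appearance, matching the example above):

```python
def dictionary_encode(values):
    # Assign each distinct value an integer id in order of first appearance,
    # then store the column as the list of ids.
    ids = {}
    column = []
    for v in values:
        if v not in ids:
            ids[v] = len(ids)
        column.append(ids[v])
    return ids, column

pages = ["Justin Bieber"] * 3 + ["Ke$ha"] * 2 + ["Selena Gomes"]
ids, column = dictionary_encode(pages)
print(ids)     # {'Justin Bieber': 0, 'Ke$ha': 1, 'Selena Gomes': 2}
print(column)  # [0, 0, 0, 1, 1, 2]
```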
Bitmap Indices
 Store Bitmap Indices for each value
 Justin Bieber -> [0, 1, 2] -> [1 1 1 0 0 0]
 Ke$ha -> [3, 4] -> [0 0 0 1 1 0]
 Selena Gomes -> [5] -> [0 0 0 0 0 1]
 Queries
 Justin Bieber or Ke$ha -> [1 1 1 0 0 0] OR [0 0 0 1 1 0] -> [1 1 1 1 1 0]
 language = en and country = CA -> [1 1 1 1 1 1] AND [0 0 0 1 1 1] -> [0 0 0 1 1 1]
 Indexes compressed with Concise or Roaring encoding
timestamp page language city country … added deleted
2011-01-01T00:01:35Z Justin Bieber en SF USA 10 65
2011-01-01T00:03:63Z Justin Bieber en SF USA 15 62
2011-01-01T00:04:51Z Justin Bieber en SF USA 32 45
2011-01-01T00:01:35Z Ke$ha en Calgary CA 17 87
2011-01-01T00:01:35Z Ke$ha en Calgary CA 43 99
2011-01-01T00:01:35Z Selena Gomes en Calgary CA 12 53
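The bitmap construction and boolean query evaluation above can be sketched directly (plain Python 0/1 lists stand in for the compressed Concise/Roaring bitmaps):

```python
def build_bitmaps(column):
    # One bitmap per distinct value: bit i is set iff row i holds that value.
    bitmaps = {}
    for i, v in enumerate(column):
        bitmaps.setdefault(v, [0] * len(column))[i] = 1
    return bitmaps

def OR(a, b):  return [x | y for x, y in zip(a, b)]
def AND(a, b): return [x & y for x, y in zip(a, b)]

pages = ["Justin Bieber"] * 3 + ["Ke$ha"] * 2 + ["Selena Gomes"]
bm = build_bitmaps(pages)

# page = 'Justin Bieber' OR page = 'Ke$ha'
print(OR(bm["Justin Bieber"], bm["Ke$ha"]))   # [1, 1, 1, 1, 1, 0]

languages = ["en"] * 6
countries = ["USA"] * 3 + ["CA"] * 3
lbm = build_bitmaps(languages)
cbm = build_bitmaps(countries)

# language = 'en' AND country = 'CA'
print(AND(lbm["en"], cbm["CA"]))              # [0, 0, 0, 1, 1, 1]
```

The filter never touches row data: matching rows fall out of cheap bitwise operations over the indexes.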
Approximate Sketch Columns
timestamp page userid language city country … added deleted
2011-01-01T00:01:35Z Justin Bieber user1111111 en SF USA 10 65
2011-01-01T00:03:63Z Justin Bieber user1111111 en SF USA 15 62
2011-01-01T00:04:51Z Justin Bieber user2222222 en SF USA 32 45
2011-01-01T00:05:35Z Ke$ha user3333333 en Calgary CA 17 87
2011-01-01T00:06:41Z Ke$ha user4444444 en Calgary CA 43 99
2011-01-02T00:08:35Z Selena Gomes user1111111 en Calgary CA 12 53
timestamp page language city country count sum_added sum_deleted min_added userid_sketch ….
2011-01-01T00:00:00Z Justin Bieber en SF USA 3 57 172 10 {sketch}
2011-01-01T00:00:00Z Ke$ha en Calgary CA 2 60 186 17 {sketch}
2011-01-02T00:00:00Z Selena Gomes en Calgary CA 1 12 53 12 {sketch}
Rollup by hour
Approximate Sketch Columns
 Better rollup for high-cardinality columns, e.g. userid
 Reduced storage size
 Use Cases
 Fast approximate distinct counts
 Approximate histograms
 Funnel/retention analysis
 Limitations
 Exact counts are not possible
 Cannot filter on individual row values
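The property that makes sketch columns roll up is mergeability: per-hour sketches combine into an interval-wide distinct count without the raw rows. Below, exact sets stand in for real sketches (Druid uses HyperLogLog or theta sketches, which trade a small error for constant size):

```python
# Per-(hour, page) "sketches" of userid, as produced by rollup.
hourly_sketches = {
    ("2011-01-01T00:00:00Z", "Justin Bieber"): {"user1111111", "user2222222"},
    ("2011-01-01T00:00:00Z", "Ke$ha"):         {"user3333333", "user4444444"},
    ("2011-01-02T00:00:00Z", "Selena Gomes"):  {"user1111111"},
}

# Query: distinct users across the whole interval = merge all the sketches.
merged = set()
for sketch in hourly_sketches.values():
    merged |= sketch

print(len(merged))  # 4 — user1111111 is counted once despite appearing twice
```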

More Related Content

PDF
The Parquet Format and Performance Optimization Opportunities
PDF
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
PPTX
Hive + Tez: A Performance Deep Dive
PDF
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
PDF
Hive Bucketing in Apache Spark with Tejas Patil
PDF
Kafka internals
PDF
Parquet performance tuning: the missing guide
PPTX
Apache Tez - A New Chapter in Hadoop Data Processing
The Parquet Format and Performance Optimization Opportunities
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Hive + Tez: A Performance Deep Dive
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Hive Bucketing in Apache Spark with Tejas Patil
Kafka internals
Parquet performance tuning: the missing guide
Apache Tez - A New Chapter in Hadoop Data Processing

What's hot (20)

PPTX
Rds data lake @ Robinhood
PDF
Iceberg: a fast table format for S3
PDF
How We Optimize Spark SQL Jobs With parallel and sync IO
PPTX
Druid deep dive
PDF
Dataflow with Apache NiFi
PPTX
Apache Spark Architecture
PDF
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
PPTX
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
PPTX
Frame - Feature Management for Productive Machine Learning
PDF
Apache Kafka’s Transactions in the Wild! Developing an exactly-once KafkaSink...
PPTX
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
PDF
Apache Iceberg - A Table Format for Hige Analytic Datasets
PPTX
Kafka Retry and DLQ
PDF
Presto Summit 2018 - 09 - Netflix Iceberg
PDF
Building a SIMD Supported Vectorized Native Engine for Spark SQL
PDF
Spark shuffle introduction
PDF
Apache HBase Improvements and Practices at Xiaomi
PDF
Hadoop Strata Talk - Uber, your hadoop has arrived
PPTX
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
PPTX
Tuning and Debugging in Apache Spark
Rds data lake @ Robinhood
Iceberg: a fast table format for S3
How We Optimize Spark SQL Jobs With parallel and sync IO
Druid deep dive
Dataflow with Apache NiFi
Apache Spark Architecture
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Frame - Feature Management for Productive Machine Learning
Apache Kafka’s Transactions in the Wild! Developing an exactly-once KafkaSink...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Apache Iceberg - A Table Format for Hige Analytic Datasets
Kafka Retry and DLQ
Presto Summit 2018 - 09 - Netflix Iceberg
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Spark shuffle introduction
Apache HBase Improvements and Practices at Xiaomi
Hadoop Strata Talk - Uber, your hadoop has arrived
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Tuning and Debugging in Apache Spark
Ad

Viewers also liked (7)

PDF
Druid at SF Big Analytics 2015-12-01
PDF
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
KEY
Large scale ETL with Hadoop
PDF
Hadoop Family and Ecosystem
PDF
OLAP options on Hadoop
PPTX
Big Data Pipeline and Analytics Platform Using NetflixOSS and Other Open Sour...
PPTX
Scalable Real-time analytics using Druid
Druid at SF Big Analytics 2015-12-01
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
Large scale ETL with Hadoop
Hadoop Family and Ecosystem
OLAP options on Hadoop
Big Data Pipeline and Analytics Platform Using NetflixOSS and Other Open Sour...
Scalable Real-time analytics using Druid
Ad

Similar to Druid: Sub-Second OLAP queries over Petabytes of Streaming Data (20)

PPTX
Druid Scaling Realtime Analytics
PPTX
Time-series data analysis and persistence with Druid
PPTX
Druid at Hadoop Ecosystem
PDF
Premier Inside-Out: Apache Druid
PPTX
Scalable olap with druid
PDF
PDF
Aggregated queries with Druid on terrabytes and petabytes of data
PDF
Apache Druid 101
PPTX
Understanding apache-druid
PPTX
An Introduction to Druid
PDF
Druid: Under the Covers (Virtual Meetup)
PDF
20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ...
PPTX
Interactive Analytics at Scale in Apache Hive Using Druid
PDF
Real-time analytics with Druid at Appsflyer
PPTX
Interactive Analytics at Scale in Apache Hive Using Druid
PDF
Hands-on with Apache Druid: Installation & Data Ingestion Steps
PPTX
Our journey with druid - from initial research to full production scale
PDF
Game Analytics at London Apache Druid Meetup
PDF
Fast analytics kudu to druid
PPTX
The of Operational Analytics Data Store
Druid Scaling Realtime Analytics
Time-series data analysis and persistence with Druid
Druid at Hadoop Ecosystem
Premier Inside-Out: Apache Druid
Scalable olap with druid
Aggregated queries with Druid on terrabytes and petabytes of data
Apache Druid 101
Understanding apache-druid
An Introduction to Druid
Druid: Under the Covers (Virtual Meetup)
20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ...
Interactive Analytics at Scale in Apache Hive Using Druid
Real-time analytics with Druid at Appsflyer
Interactive Analytics at Scale in Apache Hive Using Druid
Hands-on with Apache Druid: Installation & Data Ingestion Steps
Our journey with druid - from initial research to full production scale
Game Analytics at London Apache Druid Meetup
Fast analytics kudu to druid
The of Operational Analytics Data Store

More from DataWorks Summit (20)

PPTX
Data Science Crash Course
PPTX
Floating on a RAFT: HBase Durability with Apache Ratis
PPTX
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
PDF
HBase Tales From the Trenches - Short stories about most common HBase operati...
PPTX
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
PPTX
Managing the Dewey Decimal System
PPTX
Practical NoSQL: Accumulo's dirlist Example
PPTX
HBase Global Indexing to support large-scale data ingestion at Uber
PPTX
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
PPTX
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
PPTX
Supporting Apache HBase : Troubleshooting and Supportability Improvements
PPTX
Security Framework for Multitenant Architecture
PDF
Presto: Optimizing Performance of SQL-on-Anything Engine
PPTX
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
PPTX
Extending Twitter's Data Platform to Google Cloud
PPTX
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
PPTX
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
PPTX
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
PDF
Computer Vision: Coming to a Store Near You
PPTX
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Data Science Crash Course
Floating on a RAFT: HBase Durability with Apache Ratis
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
HBase Tales From the Trenches - Short stories about most common HBase operati...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Managing the Dewey Decimal System
Practical NoSQL: Accumulo's dirlist Example
HBase Global Indexing to support large-scale data ingestion at Uber
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Security Framework for Multitenant Architecture
Presto: Optimizing Performance of SQL-on-Anything Engine
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Extending Twitter's Data Platform to Google Cloud
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Computer Vision: Coming to a Store Near You
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark

Recently uploaded (20)

PDF
Hybrid model detection and classification of lung cancer
PDF
Unlocking AI with Model Context Protocol (MCP)
PDF
Approach and Philosophy of On baking technology
PDF
DP Operators-handbook-extract for the Mautical Institute
PDF
gpt5_lecture_notes_comprehensive_20250812015547.pdf
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PDF
Accuracy of neural networks in brain wave diagnosis of schizophrenia
PPTX
A Presentation on Artificial Intelligence
PPTX
cloud_computing_Infrastucture_as_cloud_p
PDF
WOOl fibre morphology and structure.pdf for textiles
PPTX
TLE Review Electricity (Electricity).pptx
PDF
Encapsulation theory and applications.pdf
PDF
Encapsulation_ Review paper, used for researhc scholars
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
Heart disease approach using modified random forest and particle swarm optimi...
PPTX
TechTalks-8-2019-Service-Management-ITIL-Refresh-ITIL-4-Framework-Supports-Ou...
PDF
A comparative study of natural language inference in Swahili using monolingua...
PDF
Zenith AI: Advanced Artificial Intelligence
PDF
NewMind AI Weekly Chronicles - August'25-Week II
PPTX
Chapter 5: Probability Theory and Statistics
Hybrid model detection and classification of lung cancer
Unlocking AI with Model Context Protocol (MCP)
Approach and Philosophy of On baking technology
DP Operators-handbook-extract for the Mautical Institute
gpt5_lecture_notes_comprehensive_20250812015547.pdf
Digital-Transformation-Roadmap-for-Companies.pptx
Accuracy of neural networks in brain wave diagnosis of schizophrenia
A Presentation on Artificial Intelligence
cloud_computing_Infrastucture_as_cloud_p
WOOl fibre morphology and structure.pdf for textiles
TLE Review Electricity (Electricity).pptx
Encapsulation theory and applications.pdf
Encapsulation_ Review paper, used for researhc scholars
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Heart disease approach using modified random forest and particle swarm optimi...
TechTalks-8-2019-Service-Management-ITIL-Refresh-ITIL-4-Framework-Supports-Ou...
A comparative study of natural language inference in Swahili using monolingua...
Zenith AI: Advanced Artificial Intelligence
NewMind AI Weekly Chronicles - August'25-Week II
Chapter 5: Probability Theory and Statistics

Druid: Sub-Second OLAP queries over Petabytes of Streaming Data

  • 1. Druid : Sub-Second OLAP queries over Petabytes of Streaming Data Nishant Bangarwa Hortonworks Druid Committer, PMC Superset Incubator PPMC June 2017
  • 2. © Hortonworks Inc. 2011 – 2016. All Rights Reserved2 Agenda History and Motivation Introduction Demo Druid Architecture – Indexing and Querying Data Druid In Production Recent Improvements
  • 3. © Hortonworks Inc. 2011 – 2016. All Rights Reserved HISTORY AND MOTIVATION  Druid Open sourced in late 2012  Initial Use case  Power ad-tech analytics product  Requirements  Arbitrary queries  Scalability : trillions of events/day  Interactive : low latency queries  Real-time : data freshness  High Availability  Rolling Upgrades
  • 4. © Hortonworks Inc. 2011 – 2016. All Rights Reserved4 MOTIVATION  Business Intelligence Queries  Arbitrary slicing and dicing of data  Interactive real time visualizations on Complex data streams  Answer BI questions – How many unique male visitors visited my website last month ? – How many products were sold last quarter broken down by a demographic and product category ?  Not interested in dumping entire dataset
  • 5. © Hortonworks Inc. 2011 – 2016. All Rights Reserved5 Introdution
  • 6. © Hortonworks Inc. 2011 – 2016. All Rights Reserved6 What is Druid ?  Column-oriented distributed datastore  Sub-Second query times  Realtime streaming ingestion  Arbitrary slicing and dicing of data  Automatic Data Summarization  Approximate algorithms (hyperLogLog, theta)  Scalable to petabytes of data  Highly available
  • 7. © Hortonworks Inc. 2011 – 2016. All Rights Reserved7 Demo
  • 8. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Demo: Wikipedia Real-Time Dashboard (Accelerated 30x)
  • 9. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Wikipedia Real-Time Dashboard: How it Works Wikipedia Edits Data Stream Exactly-Once Ingestion Write Read Java Stream Reader
  • 10. © Hortonworks Inc. 2011 – 2016. All Rights Reserved10 Druid Architecture
  • 11. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Realtime Nodes Historical Nodes 11 Druid Architecture Batch Data Event Historical Nodes Broker Nodes Realtime Index Tasks Streaming Data Historical Nodes Handoff
  • 12. © Hortonworks Inc. 2011 – 2016. All Rights Reserved12 Druid Architecture Batch Data Queries Metadata Store Coordinator Nodes Zookeepe r Historical Nodes Broker Nodes Realtime Index Tasks Streaming Data Handoff Optional Distributed Query Cache
  • 13. © Hortonworks Inc. 2011 – 2016. All Rights Reserved13 Indexing Data
  • 14. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Indexing Service  Indexing is performed by  Overlord  Middle Managers  Peons  Middle Managers spawn peons which runs ingestion tasks  Each peon runs 1 task  Task definition defines which task to run and its properties
  • 15. © Hortonworks Inc. 2011 – 2016. All Rights Reserved15 Streaming Ingestion : Realtime Index Tasks  Ability to ingest streams of data  Stores data in write-optimized structure  Periodically converts write-optimized structure to read-optimized segments  Event query-able as soon as it is ingested  Both push and pull based ingestion
  • 16. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Streaming Ingestion : Tranquility  Helper library for coordinating streaming ingestion  Simple API to send events to druid  Transparently Manages  Realtime index Task Creation  Partitioning and Replication  Schema Evolution  Can be used with Flink, Samza, Spark, Storm any other ETL framework
  • 17. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Kafka Indexing Service (experimental)  Supports Exactly once ingestion  Messages pulled by Kafka Index Tasks  Each Kafka Index Task consumes from a set of partitions with start and end offset  Each message verified to ensure sequence  Kafka Offsets and corresponding segments persisted in same metadata transaction atomically  Kafka Supervisor  embedded inside overlord  Manages kafka index tasks  Retry failed tasks
  • 18. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Batch Ingestion  HadoopIndexTask  Peon launches Hadoop MR job  Mappers read data  Reducers create Druid segment files  Index Task  Runs in single JVM i.e peon  Suitable for data sizes(<1G)  Integrations with Apache HIVE and Spark for Batch Ingestion
  • 19. © Hortonworks Inc. 2011 – 2016. All Rights Reserved19 Querying Data
  • 20. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Querying Data from Druid  Druid supports  JSON Queries over HTTP  In built SQL (experimental)  Querying libraries available for  Python  R  Ruby  Javascript  Clojure  PHP
  • 21. © Hortonworks Inc. 2011 – 2016. All Rights Reserved21 JSON Over HTTP  HTTP Rest API  Queries and results expressed in JSON  Multiple Query Types  Time Boundary  Timeseries  TopN  GroupBy  Select  Segment Metadata
  • 22. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Built-in SQL (experimental)  Apache Calcite based parser and planner  Ability to connect Druid to any BI tool that supports JDBC  SQL via JSON over HTTP  Supports approximate queries  APPROX_COUNT_DISTINCT(col)  Ability to do fast approximate TopN queries  APPROX_QUANTILE(column, probability)
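Druid SQL itself travels as JSON over HTTP: the statement is wrapped in a small envelope and POSTed to the broker's SQL endpoint. A sketch, with table and column names from the Wikipedia example and the APPROX_COUNT_DISTINCT aggregator mentioned above:

```python
import json

# An approximate-distinct SQL query against the example datasource.
sql = """
SELECT page,
       SUM(added)                  AS total_added,
       APPROX_COUNT_DISTINCT(city) AS approx_cities
FROM wikipedia
GROUP BY page
ORDER BY total_added DESC
LIMIT 5
"""

request_body = json.dumps({"query": sql})
# POST http://broker:8082/druid/v2/sql   (Content-Type: application/json)
```

The same statements can be issued through the CLI or the JDBC driver, which is what makes the BI-tool integration possible.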
  • 23. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Integrated with multiple open source UI tools  Superset –  Developed at Airbnb  In Apache incubation since May 2017  Grafana – Druid plugin (https://github.com/grafana-druid-plugin/druidplugin)  Metabase  With built-in SQL, connects with any BI tool supporting JDBC
  • 24. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Superset  Python backend  Flask App Builder  Authentication  Pandas for rich analytics  SQLAlchemy as the SQL toolkit  JavaScript frontend  React, NVD3  Deep integration with Druid
  • 25. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Superset Rich Dashboarding Capabilities: Treemaps
  • 26. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Superset Rich Dashboarding Capabilities: Sunburst
  • 27. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Superset UI Provides Powerful Visualizations Rich library of dashboard visualizations: Basic: • Bar Charts • Pie Charts • Line Charts Advanced: • Sankey Diagrams • Treemaps • Sunburst • Heatmaps And More!
  • 28. © Hortonworks Inc. 2011 – 2016. All Rights Reserved28 Druid in Production
  • 29. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Production readiness  Is Druid suitable for my Use case ?  Will Druid meet my performance requirements at scale ?  How complex is it to Operate and Manage Druid cluster ?  How to monitor a Druid cluster ?  High Availability ?  How to upgrade Druid cluster without downtime ?  Security ?  Extensibility for future Use cases ?
  • 30. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Suitable Use Cases  Powering Interactive user facing applications  Arbitrary slicing and dicing of large datasets  User behavior analysis  measuring distinct counts  retention analysis  funnel analysis  A/B testing  Exploratory analytics/root cause analysis
  • 31. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Performance and Scalability : Fast Facts Most Events per Day 300 Billion Events / Day (Metamarkets) Most Computed Metrics 1 Billion Metrics / Min (Jolata) Largest Cluster 200 Nodes (Metamarkets) Largest Hourly Ingestion 2TB per Hour (Netflix)
  • 32. © Hortonworks Inc. 2011 – 2016. All Rights Reserved32 Performance  Query Latency – average - 500ms – 90%ile < 1sec – 95%ile < 5sec – 99%ile < 10 sec  Query Volume – 1000s queries per minute  Benchmarking code  https://github.com/druid-io/druid-benchmark
  • 33. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Performance : Approximate Algorithms  Ability to Store Approximate Data Sketches for high cardinality columns e.g userid  Reduced storage size  Use Cases  Fast approximate distinct counts  Approximate Top-K queries  Approximate histograms  Funnel/retention analysis  Limitation  Not possible to do exact counts  filter on individual row values
  • 34. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Simplified Druid Cluster Management with Ambari  Install, configure and manage Druid and all external dependencies from Ambari  Easy to enable HA, Security, Monitoring …
  • 35. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Simplified Druid Cluster Management with Ambari
  • 36. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Monitoring a Druid Cluster  Each Druid Node emits metrics for  Query performance  Ingestion Rate  JVM Health  Query Cache performance  System health  Emitted as JSON objects to a runtime log file or over HTTP to other services  Emitters available for Ambari Metrics Server, Graphite, StatsD, Kafka  Easy to implement your own metrics emitter
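As a sketch of what consuming these metrics looks like, the snippet below parses a few illustrative (not captured) JSON metric events and averages the query latencies; "query/time" is Druid's per-query latency metric, in milliseconds.

```python
import json

# Sample Druid metric events as they might appear in the runtime log
# or arrive over HTTP. The values here are made up for illustration.
raw_events = [
    '{"feed":"metrics","service":"druid/broker","metric":"query/time","value":420}',
    '{"feed":"metrics","service":"druid/broker","metric":"query/time","value":580}',
    '{"feed":"metrics","service":"druid/broker","metric":"jvm/gc/time","value":12}',
]

events = [json.loads(line) for line in raw_events]
query_times = [e["value"] for e in events if e["metric"] == "query/time"]
avg_query_ms = sum(query_times) / len(query_times)  # 500.0 for the samples above
```

A custom emitter implementation would receive events of exactly this shape and forward them to the monitoring backend of choice.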
  • 37. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Monitoring using Ambari Metrics Server  HDP 2.6.1 contains pre-defined Grafana dashboards  Health of Druid nodes  Ingestion  Query performance  Easy to create new dashboards and set up alerts  Auto-configured when both Druid and Ambari Metrics Server are installed
  • 38. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Monitoring using Ambari Metrics Server
  • 39. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Monitoring using Ambari Metrics Server
  • 40. © Hortonworks Inc. 2011 – 2016. All Rights Reserved High Availability  Deploy Coordinator/Overlord on multiple instances  Leader election via ZooKeeper  Broker – install multiple brokers  Use the Druid Router or any load balancer to route queries to brokers  Realtime Index Tasks – create redundant tasks  Historical Nodes – create a load rule with replication factor >= 2 (default = 2)
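The historical replication mentioned in the last bullet is configured as a small JSON rule handled by the Coordinator. A sketch of a "load forever" rule keeping two replicas on the default tier (the tier name and field layout follow Druid's retention-rule docs of this era):

```python
# Replication load rule: keep 2 copies of every segment on the default
# historical tier ("_default_tier" is Druid's default tier name).
load_rule = {
    "type": "loadForever",
    "tieredReplicants": {"_default_tier": 2},  # replication factor 2
}

# Rules are set per-datasource (or as cluster defaults) through the
# Coordinator's HTTP API or console.
```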
  • 41. © Hortonworks Inc. 2011 – 2016. All Rights Reserved41 Rolling Upgrades  Maintain backwards compatibility  Data redundancy  Shared Nothing Architecture  Rolling upgrades  No Downtime  [diagram: nodes upgraded one at a time from version 1 to version 2]
  • 42. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Security  Supports authentication via Kerberos/SPNEGO  Easy wizard-based Kerberos security enablement via Ambari  [diagram: user browser runs kinit against the KDC server, then presents the Kerberos token to Druid]
  • 43. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Extending Core Druid  Plugin-based architecture  Leverages Guice to load extensions at runtime  Possible extension points  Add a new deep storage implementation  Add a new Firehose  Add aggregators  Add complex metrics  Add new query types  Add new Jersey resources  Bundle your extension with all the other Druid extensions
  • 44. © Hortonworks Inc. 2011 – 2016. All Rights Reserved44 Companies Using Druid
  • 45. © Hortonworks Inc. 2011 – 2016. All Rights Reserved45 Recent Improvements
  • 46. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Druid 0.10.0  Kafka Indexing Service – exactly-once ingestion  Built-in SQL support (CLI, HTTP, JDBC)  Numeric dimensions  Kerberos authentication  Performance improvements  Optimized evaluation of large AND/OR filter lists with Concise bitmaps  Index-based evaluation of simple regex filters like ‘foo%’  ~30% improvement on non-time groupBys  Apache Hive integration – supports full SQL, large joins, batch indexing  Apache Ambari integration – easy deployments and cluster management
  • 47. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Future Work  Improved schema definition & management  Improvements to Hive/Druid integration  Materialized Views, Push down more filters, support complex columns etc…  Performance improvements  Select query performance improvements  Jit-friendly topN queries  Security enhancements  Row/Column level security  Integration with Apache Ranger  And much more……
  • 48. © Hortonworks Inc. 2011 – 2016. All Rights Reserved48 Community  User google group - druid-user@googlegroups.com  Dev google group - druid-dev@googlegroups.com  Github - druid-io/druid  IRC - #druid-dev on irc.freenode.net
  • 49. © Hortonworks Inc. 2011 – 2016. All Rights Reserved49 Summary  Easy installation and management via Ambari  Real-time – Ingestion latency < seconds. – Query latency < seconds.  Arbitrary slice and dice big data like a ninja – No more pre-canned drill-downs. – Query with more fine-grained granularity.  High availability and rolling deployment capabilities  Secure and production ready  Vibrant and active community  Available as Tech Preview in HDP 2.6.1
  • 50. © Hortonworks Inc. 2011 – 2016. All Rights Reserved50 Thank you ! Questions ?  Twitter - @NishantBangarwa  Email - nbangarwa@hortonworks.com  LinkedIn - https://www.linkedin.com/in/nishant-bangarwa
  • 51. © Hortonworks Inc. 2011 – 2016. All Rights Reserved AtScale + Hive + Druid  Leverage AtScale cubing capabilities  Store aggregate tables in Druid  Updatable dimensions in Hive
  • 52. © Hortonworks Inc. 2011 – 2016. All Rights Reserved52 Storage Format
  • 53. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Druid: Segments  Data in Druid is stored in Segment Files.  Partitioned by time  Ideally, segment files are each smaller than 1GB.  If files are large, smaller time partitions are needed. Time Segment 1: Monday Segment 2: Tuesday Segment 3: Wednesday Segment 4: Thursday Segment 5_2: Friday Segment 5_1: Friday
  • 54. © Hortonworks Inc. 2011 – 2016. All Rights Reserved54 Example Wikipedia Edit Dataset timestamp page language city country … added deleted 2011-01-01T00:01:35Z Justin Bieber en SF USA 10 65 2011-01-01T00:03:63Z Justin Bieber en SF USA 15 62 2011-01-01T00:04:51Z Justin Bieber en SF USA 32 45 2011-01-01T00:05:35Z Ke$ha en Calgary CA 17 87 2011-01-01T00:06:41Z Ke$ha en Calgary CA 43 99 2011-01-02T00:08:35Z Selena Gomes en Calgary CA 12 53 Timestamp Dimensions Metrics
  • 55. © Hortonworks Inc. 2011 – 2016. All Rights Reserved55 Data Rollup timestamp page language city country … added deleted 2011-01-01T00:01:35Z Justin Bieber en SF USA 10 65 2011-01-01T00:03:63Z Justin Bieber en SF USA 15 62 2011-01-01T00:04:51Z Justin Bieber en SF USA 32 45 2011-01-01T00:05:35Z Ke$ha en Calgary CA 17 87 2011-01-01T00:06:41Z Ke$ha en Calgary CA 43 99 2011-01-02T00:08:35Z Selena Gomes en Calgary CA 12 53 timestamp page language city country count sum_added sum_deleted min_added max_added …. 2011-01-01T00:00:00Z Justin Bieber en SF USA 3 57 172 10 32 2011-01-01T00:00:00Z Ke$ha en Calgary CA 2 60 186 17 43 2011-01-02T00:00:00Z Selena Gomes en Calgary CA 1 12 53 12 12 Rollup by hour
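The hourly rollup above can be sketched in a few lines: events sharing the same truncated timestamp and dimension values collapse into one row of aggregates, reproducing the rolled-up table on the slide.

```python
# Hourly rollup of the example events (timestamps, pages, added, deleted
# taken from the slide's raw table; other dimensions omitted for brevity).
raw = [
    ("2011-01-01T00:01:35Z", "Justin Bieber", 10, 65),
    ("2011-01-01T00:03:63Z", "Justin Bieber", 15, 62),
    ("2011-01-01T00:04:51Z", "Justin Bieber", 32, 45),
    ("2011-01-01T00:05:35Z", "Ke$ha", 17, 87),
    ("2011-01-01T00:06:41Z", "Ke$ha", 43, 99),
    ("2011-01-02T00:08:35Z", "Selena Gomes", 12, 53),
]

rollup = {}
for ts, page, added, deleted in raw:
    hour = ts[:13] + ":00:00Z"  # truncate the timestamp to the hour
    row = rollup.setdefault((hour, page), {
        "count": 0, "sum_added": 0, "sum_deleted": 0,
        "min_added": added, "max_added": added,
    })
    row["count"] += 1
    row["sum_added"] += added
    row["sum_deleted"] += deleted
    row["min_added"] = min(row["min_added"], added)
    row["max_added"] = max(row["max_added"], added)
# rollup now holds the three aggregate rows shown on the slide
```

Druid performs this summarization at ingestion time when a query granularity coarser than the raw timestamps is configured.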
  • 56. © Hortonworks Inc. 2011 – 2016. All Rights Reserved56 Dictionary Encoding  Create and store Ids for each value  e.g. page column  Values - Justin Bieber, Ke$ha, Selena Gomes  Encoding - Justin Bieber : 0, Ke$ha: 1, Selena Gomes: 2  Column Data - [0 0 0 1 1 2]  city column - [0 0 0 1 1 1] timestamp page language city country … added deleted 2011-01-01T00:01:35Z Justin Bieber en SF USA 10 65 2011-01-01T00:03:63Z Justin Bieber en SF USA 15 62 2011-01-01T00:04:51Z Justin Bieber en SF USA 32 45 2011-01-01T00:05:35Z Ke$ha en Calgary CA 17 87 2011-01-01T00:06:41Z Ke$ha en Calgary CA 43 99 2011-01-02T00:08:35Z Selena Gomes en Calgary CA 12 53
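The dictionary encoding above is simple to sketch: each distinct value gets an integer id in order of first appearance, and the column stores only those ids.

```python
# Dictionary-encode the page column from the example.
values = ["Justin Bieber", "Justin Bieber", "Justin Bieber",
          "Ke$ha", "Ke$ha", "Selena Gomes"]

dictionary, column = {}, []
for v in values:
    if v not in dictionary:
        dictionary[v] = len(dictionary)  # assign the next unused id
    column.append(dictionary[v])
# column == [0, 0, 0, 1, 1, 2], matching the slide
```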
  • 57. © Hortonworks Inc. 2011 – 2016. All Rights Reserved57 Bitmap Indices  Store Bitmap Indices for each value  Justin Bieber -> [0, 1, 2] -> [1 1 1 0 0 0]  Ke$ha -> [3, 4] -> [0 0 0 1 1 0]  Selena Gomes -> [5] -> [0 0 0 0 0 1]  Queries  Justin Bieber or Ke$ha -> [1 1 1 0 0 0] OR [0 0 0 1 1 0] -> [1 1 1 1 1 0]  language = en and country = CA -> [1 1 1 1 1 1] AND [0 0 0 1 1 1] -> [0 0 0 1 1 1]  Indexes compressed with Concise or Roaring encoding timestamp page language city country … added deleted 2011-01-01T00:01:35Z Justin Bieber en SF USA 10 65 2011-01-01T00:03:63Z Justin Bieber en SF USA 15 62 2011-01-01T00:04:51Z Justin Bieber en SF USA 32 45 2011-01-01T00:05:35Z Ke$ha en Calgary CA 17 87 2011-01-01T00:06:41Z Ke$ha en Calgary CA 43 99 2011-01-02T00:08:35Z Selena Gomes en Calgary CA 12 53
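The bitmap filtering above can be sketched with plain 0/1 lists; real Druid compresses these bitmaps with Concise or Roaring, but the OR/AND mechanics are the same.

```python
# Build a per-value bitmap index over the page column and evaluate filters
# as bitwise OR/AND, mirroring the slide's example.
pages = ["Justin Bieber", "Justin Bieber", "Justin Bieber",
         "Ke$ha", "Ke$ha", "Selena Gomes"]

bitmaps = {}
for row, value in enumerate(pages):
    bitmaps.setdefault(value, [0] * len(pages))[row] = 1

def bitmap_or(a, b):
    return [x | y for x, y in zip(a, b)]

def bitmap_and(a, b):
    return [x & y for x, y in zip(a, b)]

# page = 'Justin Bieber' OR page = 'Ke$ha'
either = bitmap_or(bitmaps["Justin Bieber"], bitmaps["Ke$ha"])
# either == [1, 1, 1, 1, 1, 0]: the filter matches rows 0-4
```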
  • 58. © Hortonworks Inc. 2011 – 2016. All Rights Reserved58 Approximate Sketch Columns timestamp page userid language city country … added deleted 2011-01-01T00:01:35Z Justin Bieber user1111111 en SF USA 10 65 2011-01-01T00:03:63Z Justin Bieber user1111111 en SF USA 15 62 2011-01-01T00:04:51Z Justin Bieber user2222222 en SF USA 32 45 2011-01-01T00:05:35Z Ke$ha user3333333 en Calgary CA 17 87 2011-01-01T00:06:41Z Ke$ha user4444444 en Calgary CA 43 99 2011-01-02T00:08:35Z Selena Gomes user1111111 en Calgary CA 12 53 timestamp page language city country count sum_added sum_deleted min_added Userid_sketch …. 2011-01-01T00:00:00Z Justin Bieber en SF USA 3 57 172 10 {sketch} 2011-01-01T00:00:00Z Ke$ha en Calgary CA 2 60 186 17 {sketch} 2011-01-02T00:00:00Z Selena Gomes en Calgary CA 1 12 53 12 {sketch} Rollup by hour
  • 59. © Hortonworks Inc. 2011 – 2016. All Rights Reserved Approximate Sketch Columns  Better rollup for high cardinality columns e.g userid  Reduced storage size  Use Cases  Fast approximate distinct counts  Approximate histograms  Funnel/retention analysis  Limitation  Not possible to do exact counts  filter on individual row values
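Druid's theta sketches come from the DataSketches library; as a toy illustration of the underlying idea — not Druid's actual implementation — here is a k-minimum-values (KMV) distinct-count sketch, showing why a small fixed-size summary can estimate distinct counts and still be merged across rollup rows.

```python
import hashlib

def hash01(value):
    """Hash a string to a float in [0, 1)."""
    digest = hashlib.md5(value.encode()).hexdigest()
    return int(digest, 16) / 16 ** 32

class KMVSketch:
    """Keep the k smallest hash values; their spread estimates cardinality."""

    def __init__(self, k=64):
        self.k = k
        self.mins = set()  # up to k smallest hash values seen

    def add(self, value):
        self.mins.add(hash01(value))
        if len(self.mins) > self.k:
            self.mins.remove(max(self.mins))

    def merge(self, other):
        """Union two sketches -- how per-row sketches combine at query time."""
        merged = KMVSketch(self.k)
        for h in self.mins | other.mins:
            merged.mins.add(h)
            if len(merged.mins) > merged.k:
                merged.mins.remove(max(merged.mins))
        return merged

    def estimate(self):
        if len(self.mins) < self.k:
            return len(self.mins)  # saw fewer than k distinct values: exact
        return int((self.k - 1) / max(self.mins))

sketch = KMVSketch(k=64)
for i in range(10):          # 10 distinct user ids, each seen 3 times
    for _ in range(3):
        sketch.add("user%d" % i)
# with fewer than k distinct values the estimate is exact: 10
```

The fixed size of the summary is what makes the rollup above possible: each aggregate row stores one small sketch instead of every raw userid.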

Editor's Notes

  • #2: Thank you all for coming to my talk. The title of this talk is Druid: Sub-Second OLAP queries over Petabytes of Streaming Data Sub-Second means Fast Interactive queries which can be used to power interactive dashboards, fast analytics, monitoring and alerting applications. I am a Software Engineer at Hortonworks, a committer and PMC Member in Druid and part of PPMC for Superset Incubation. I am part of the Business Intelligence team at Hortonworks. Prior to that I have spent 2 years working at Metamarkets where I was responsible for handling the analytics infrastructure, including real-time analytics with Druid
  • #3: Motivation Druid introduction and use case Demo Druid Architecture Storage Internals Recent Improvements
  • #4: Initial Use Case Power ad-tech analytics product at Metamarkets, similar to the picture on the right: a dashboard where you can visualize timeseries data and do arbitrary filtering and grouping on any combination of dimensions. Requirements - The data store needs to support arbitrary queries, i.e. users should be able to filter and group on any combination of dimensions. Scalability: should be able to handle trillions of events/day. Interactive: since the data store was going to power an interactive dashboard, low-latency queries were a must. Real-time: the time between when an event occurs and when it becomes visible on the dashboard should be minimal (on the order of a few seconds). High Availability – no central point of failure. Rolling Upgrades – the architecture was required to support rolling upgrades.
  • #5: MOTIVATION Interactive real time visualizations on Complex data streams Answer BI questions How many unique male visitors visited my website last month ? How many products were sold last quarter broken down by a demographic and product category ? Not interested in dumping entire dataset Suppose I am running an ad campaign, and I want to understand what kind of Impressions are there What is my click through rate How many users decided to purchase my services We have User Activity Stream and we may want to know How the users are behaving. We may have a stream of Firewall Events and we want to do detect any anomalies in those streams in realtime. Also, For very large distributed clusters there is a need to answer questions about application performance. How individual node in my cluster behaving ? Are there any Anomalies in query response time ? All the above use cases can have data streams which can be huge in volume depending on the scale of business. How do I analyze this information ? How do I get insights from these Stream of Events in realtime ?
  • #7: What is Druid ? Column-oriented distributed datastore – data is stored in columnar format; many datasets have a large number of dimensions, e.g. 100s or 1000s, but most queries only need 5-10 columns, so the column-oriented format lets Druid scan only the required columns. Sub-second query times – it uses techniques like bitmap indexes for fast filtering, memory-mapped files to serve data from memory, data summarization and compression, and query caching, and has very optimized algorithms for the different query types, achieving sub-second query times. Realtime streaming ingestion from almost any ETL pipeline. Arbitrary slicing and dicing of data – no need to create pre-canned drill-downs. Automatic data summarization – e.g. if my dashboard only shows events aggregated by hour, we can optionally configure Druid to pre-aggregate at ingestion time. Approximate algorithms (HyperLogLog, theta sketches) – for fast approximate answers. Scalable to petabytes of data. Highly available.
  • #12: Realtime Index Tasks- Handle Real-Time Ingestion, Support both pull & push based ingestion. Handle Queries - Ability to serve queries as soon as data is ingested. Store data in write optimized data structure on heap In case you need to do any ETL like data enrichment or joining multiple streams of data, you can do it in a separate ETL layer before sending it to druid. Realtime Periodically persist their in memory data to deep storage in form of read optimized chunks. Deep storage can be any distributed FS and acts as a permanent backup of data Historical Nodes - Main workhorses of druid cluster Use Memory Mapped files to load columnar data Respond to User queries Now Lets see the how data can be queried. Broker Nodes - Keeps track of the data chunks being loaded by each node in the cluster Ability to scatter query across multiple Historical and Realtime nodes Caching Layer Now Lets discuss another case, when you are not having streaming data, but want to Ingest Batch data into druid Batch ingestion can be done using either Hadoop MR or spark job, which converts your data into druid’s columnar format and persist it to deep storage.
  • #13: With many historical nodes in a cluster there is a need for balance the load across them, this is done by the Coordinator Nodes - Uses Zookeeper for coordination Asks historical Nodes to load or drop data They also move data across historical nodes to balances load in the cluster Manages Data replication
  • #15: Indexing Service Indexing is performed by Overlord Middle Managers Peons Middle Managers spawn peons which runs ingestion tasks Each peon runs 1 task Task definition defines which task to run and its properties
  • #16: Realtime Ingestion is done by Realtime Index Tasks, which maintain an in-memory row-oriented key-value store: data is stored inside the heap within a map, indexed by time and dimension values, and persisted to disk based on a row threshold or elapsed time. Data is queryable as soon as it is ingested; both push- and pull-based ingestion are supported.
  • #37: Query performance – query time, segment scan time … Ingestion Rate – events ingested, events persisted … JVM Health – JVM Heap usage, GC stats … Cache Related – cache hits, cache misses, cache evictions … System related – cpu, disk, network, swap usage etc..
  • #45: This shows some of the production users. I can talk about some of the large ones which have common use cases. Alibaba and Ebay use druid for ecommerce and user behavior analytics Cisco has a realtime analytics product for analyzing network flows Yahoo uses druid for user behavior analytics and realtime cluster monitoring Hulu does interactive analysis on user and application behavior Paypal, SK telecom – uses druid for business analytics
  • #48: Future Work Improved schema definition & management reIndexing/compaction without hadoop Closer Hive/Druid integration performance improvements Select query performance improvements Jit-friendly topN queries Security enhancements Row/Column level security Integration with Apache Ranger Work towards supporting Joins
  • #50: Summary Scalability Horizontal Scalability. Columnar storage, indexing and compression. Multi-tenancy. Real-time Ingestion latency < seconds. Query latency < seconds. Arbitrary slice and dice big data like ninja No more pre-canned drill downs. Query with more fine-grained granularity. High availability and Rolling deployment capabilities Less costly to run. Very active open source community.