Architecting Microservices Applications with Instant Analytics
Tim Berglund, Sr. Director, Developer Experience, Confluent
Rachel Pedreschi, Sr. Director, Global Field Engineering, Imply Data
What the heck is Apache Kafka and Why Should I Care?
[Figure: a sequence of key/value (K, V) records, built up one record at a time]
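Each K/V box is one record: a key and a value. The sketch below is only a rough mental model, not Kafka's storage engine or client API. It treats a log as an append-only list: each new record gets the next sequential offset, existing records are never changed, and a reader simply reads forward from whatever offset it has reached. The class and method names are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Toy in-memory model of a single log: an append-only sequence of key/value
// records addressed by offset. Hypothetical names; this is not the Kafka client API.
final class AppendOnlyLog {
    private final List<Map.Entry<String, String>> records = new ArrayList<>();

    // Appends a record at the end and returns its offset (0, 1, 2, ...).
    long append(String key, String value) {
        records.add(new AbstractMap.SimpleImmutableEntry<>(key, value));
        return records.size() - 1;
    }

    // Returns a read-only copy of every record at or after the given offset.
    // Reading never removes or changes records, so other readers can start elsewhere.
    List<Map.Entry<String, String>> readFrom(long offset) {
        if (offset < 0 || offset > records.size()) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        return Collections.unmodifiableList(
                new ArrayList<>(records.subList((int) offset, records.size())));
    }

    // Offset that the next appended record will receive.
    long endOffset() {
        return records.size();
    }
}
```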
[Figure sequence: a producer writes to a Partitioned Topic with partition 0, partition 1 and partition 2; consumer A reads from the topic; consumer B is added; consumer A is then run as several instances]
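The producer slides show one topic split into partitions 0, 1 and 2. When a producer sends keyed records, it chooses a partition by hashing the key, so every record with the same key goes to the same partition and keeps its relative order. Kafka's default partitioner applies murmur2 to the serialized key. The sketch below uses `String.hashCode()` as a stand-in to stay dependency-free, so its partition numbers will not match a real cluster's. The names are hypothetical.

```java
// Simplified model of key-based partition selection: hash the key, then take the
// result modulo the partition count. String.hashCode() stands in for the murmur2
// hash that Kafka's default partitioner applies to the serialized key.
final class KeyPartitioner {
    private KeyPartitioner() {}

    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // Mask off the sign bit so the hash is non-negative before the modulo.
        int positiveHash = key.hashCode() & 0x7fffffff;
        return positiveHash % numPartitions;
    }
}
```

For example, `partitionFor("user-42", 3)` returns the same partition in the range 0 to 2 on every call. That stability is what keeps per-key ordering intact. Changing the partition count, however, can move existing keys to different partitions.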
[Figure: downstream systems fed from Kafka: stream processing apps, RDBMS, NoSQL, DWH/Hadoop]
[Figure sequence: consumer A run as several instances alongside consumer B on the three-partition topic; then three Streams Application instances]
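The consumer slides show a partition being read by exactly one instance within a consumer group (consumer A). A separate group (consumer B) gets its own full copy of the stream. If a group has more instances than the topic has partitions, the extra instances sit idle.

The sketch below models that rule with a simple round-robin hand-out of partitions. In real Kafka, assignment goes through the group coordinator and a pluggable assignor (round-robin is one of the available strategies). Treat this as an illustration of the outcome rather than of the protocol. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative round-robin assignment of partitions to the members of ONE consumer group.
// Each partition goes to exactly one member; members beyond the partition count get nothing.
final class GroupAssignment {
    private GroupAssignment() {}

    static Map<String, List<Integer>> roundRobin(List<String> members, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String member : members) {
            assignment.put(member, new ArrayList<>());
        }
        if (members.isEmpty()) {
            return assignment; // no members: nothing can be assigned
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = members.get(partition % members.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }
}
```

With two instances of consumer A and three partitions, one instance reads partitions 0 and 2 and the other reads partition 1. With five instances, two of them receive no partitions at all.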
// Imports for the Kafka Streams example. Movie and Rating are Avro-generated classes;
// Parser, getProperties() and SCHEMA_REGISTRY_URL come from the surrounding class and
// are not shown on the slides. All methods below belong in that same class.
import io.confluent.kafka.serializers.AbstractKafkaAvroSerDeConfig;
import io.confluent.kafka.streams.serdes.avro.SpecificAvroSerde;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KGroupedStream;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;

public static void main(String[] args) {
  Properties streamsConfiguration = getProperties(SCHEMA_REGISTRY_URL);
  // The Avro serdes look up schemas in Schema Registry.
  final Map<String, String> serdeConfig =
      Collections.singletonMap(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG,
          SCHEMA_REGISTRY_URL);
  final SpecificAvroSerde<Movie> movieSerde = getMovieAvroSerde(serdeConfig);
  // Not used by this topology: raw ratings arrive as strings and are parsed in code.
  final SpecificAvroSerde<Rating> ratingSerde = getRatingAvroSerde(serdeConfig);

  // Topology: per-movie average rating, joined with movie titles.
  StreamsBuilder builder = new StreamsBuilder();
  KTable<Long, Double> ratingAverage = getRatingAverageTable(builder);
  getRatedMoviesTable(builder, ratingAverage, movieSerde);
  Topology topology = builder.build();

  KafkaStreams streams = new KafkaStreams(topology, streamsConfiguration);
  // Close the Streams client cleanly when the JVM shuts down.
  Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
  streams.start();
}

private static SpecificAvroSerde<Rating> getRatingAvroSerde(Map<String, String> serdeConfig) {
  final SpecificAvroSerde<Rating> ratingSerde = new SpecificAvroSerde<>();
  ratingSerde.configure(serdeConfig, false); // false = configure as a value serde, not a key serde
  return ratingSerde;
}

public static SpecificAvroSerde<Movie> getMovieAvroSerde(Map<String, String> serdeConfig) {
  final SpecificAvroSerde<Movie> movieSerde = new SpecificAvroSerde<>();
  movieSerde.configure(serdeConfig, false);
  return movieSerde;
}

public static KTable<Long, String> getRatedMoviesTable(StreamsBuilder builder,
                                                       KTable<Long, Double> ratingAverage,
                                                       SpecificAvroSerde<Movie> movieSerde) {
  // Parse raw movie lines, re-key them by movie ID and write them to the "movies" topic.
  builder.stream("raw-movies", Consumed.with(Serdes.Long(), Serdes.String()))
      .mapValues(Parser::parseMovie)
      .map((key, movie) -> new KeyValue<>(movie.getMovieId(), movie))
      .to("movies", Produced.with(Serdes.Long(), movieSerde));

  // Read the "movies" topic back as a table, materialized in the "movies-store" state store.
  KTable<Long, Movie> movies = builder.table("movies",
      Materialized.<Long, Movie, KeyValueStore<Bytes, byte[]>>as("movies-store")
          .withValueSerde(movieSerde)
          .withKeySerde(Serdes.Long()));

  // Inner join on movie ID: emits "title=average" for movies present in both tables.
  KTable<Long, String> ratedMovies = ratingAverage
      .join(movies, (avg, movie) -> movie.getTitle() + "=" + avg);
  ratedMovies.toStream().to("rated-movies", Produced.with(Serdes.Long(), Serdes.String()));
  return ratedMovies;
}

public static KTable<Long, Double> getRatingAverageTable(StreamsBuilder builder) {
  KStream<Long, String> rawRatings = builder.stream("raw-ratings",
      Consumed.with(Serdes.Long(), Serdes.String()));
  // Parse each raw line and re-key the stream by movie ID.
  KStream<Long, Rating> ratings = rawRatings.mapValues(Parser::parseRating)
      .map((key, rating) -> new KeyValue<>(rating.getMovieId(), rating));
  KStream<Long, Double> numericRatings = ratings.mapValues(Rating::getRating);
  // Re-keying forces a repartition, so give the grouping explicit key/value serdes.
  KGroupedStream<Long, Double> ratingsById =
      numericRatings.groupByKey(Grouped.with(Serdes.Long(), Serdes.Double()));
  KTable<Long, Long> ratingCounts = ratingsById.count();
  KTable<Long, Double> ratingSums = ratingsById.reduce((v1, v2) -> v1 + v2);
  // average = sum / count, kept in a state store named "average-ratings".
  KTable<Long, Double> ratingAverage = ratingSums.join(ratingCounts,
      (sum, count) -> sum / count.doubleValue(),
      Materialized.<Long, Double, KeyValueStore<Bytes, byte[]>>as("average-ratings")
          .withKeySerde(Serdes.Long())
          .withValueSerde(Serdes.Double()));
  ratingAverage.toStream()
      /*.peek((key, value) -> { // debug only
          System.out.println("key = " + key + ", value = " + value);
      })*/
      .to("average-ratings", Produced.with(Serdes.Long(), Serdes.Double()));
  return ratingAverage;
}
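The plain-Java sketch below replays what `getRatingAverageTable` and `getRatedMoviesTable` compute, using an in-memory list instead of a running cluster:

- It keeps a running sum and count of ratings per movie ID.
- It divides the sum by the count to get the average.
- It inner-joins the averages against movie titles and formats each result as `title=avg`. A KTable-KTable `join` is an inner join, so movies missing from either side are dropped.

The sketch models only the final results. It has none of Kafka Streams' incremental updates, state stores or fault tolerance, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory model of the rating-average topology: sum and count per movie ID,
// average = sum / count, then an inner join with movie titles.
final class RatingAverageModel {
    private RatingAverageModel() {}

    // ratings: (movieId, rating) pairs in arrival order.
    static Map<Long, Double> averageByMovie(List<Map.Entry<Long, Double>> ratings) {
        Map<Long, Double> sums = new TreeMap<>();
        Map<Long, Long> counts = new TreeMap<>();
        for (Map.Entry<Long, Double> r : ratings) {
            sums.merge(r.getKey(), r.getValue(), Double::sum); // like reduce((v1, v2) -> v1 + v2)
            counts.merge(r.getKey(), 1L, Long::sum);           // like count()
        }
        Map<Long, Double> averages = new TreeMap<>();
        for (Map.Entry<Long, Double> e : sums.entrySet()) {
            averages.put(e.getKey(), e.getValue() / counts.get(e.getKey()).doubleValue());
        }
        return averages;
    }

    // Inner join on movie ID, formatted like the slide's (avg, movie) -> title + "=" + avg.
    static Map<Long, String> ratedMovies(Map<Long, Double> averages, Map<Long, String> titles) {
        Map<Long, String> joined = new TreeMap<>();
        for (Map.Entry<Long, Double> e : averages.entrySet()) {
            String title = titles.get(e.getKey());
            if (title != null) { // no title for this ID: dropped, as in an inner join
                joined.put(e.getKey(), title + "=" + e.getValue());
            }
        }
        return joined;
    }
}
```

Ratings of 8.0 and 9.0 for movie 1 average to 8.5. If movie 1 has a title and movie 2 does not, only movie 1 appears in the joined result.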
CREATE TABLE movie_ratings AS
SELECT title,
SUM(rating)/COUNT(rating) AS avg_rating,
COUNT(rating) AS num_ratings
FROM ratings
LEFT OUTER JOIN movies
ON ratings.movie_id = movies.movie_id
GROUP BY title;
[Diagram: a producer, a consumer, and a KSQL Cluster made up of two KSQL Servers]
[Diagram: orders, shipping, users and warehouse services, plus an order web UI and a users web UI; a later version adds Druid for product analytics]
What the heck is Apache Druid and Why Should I Care?
Tim / Rachel
[Diagram: data sources → ETL (some code) → data warehouse (usually an RDBMS) → analytics, reporting, data mining, querying]
[Diagram: data flows via ELT into a data lake; other components shown are map/reduce, a data warehouse, an ML/AI engine, a search system, reporting and analytics, HDFS, and RDBMS / NoSQL stores]
[Diagram: data sources → message bus → data lake and streaming OLAP]
Typical Big Data++ Challenges
● Scale: when data is large, we need a lot of servers
● Speed: aiming for sub-second response times
● Complexity: data is too fine-grained to precompute
● High dimensionality: tens or hundreds of dimensions
● Concurrency: many users and tenants
● Freshness: data loaded from streams
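Simple arithmetic links the complexity and dimensionality bullets. Suppose you precompute every possible group-by over d dimensions, which is a full data cube. That takes one aggregate per subset of the dimensions, or 2^d of them. The sketch below just evaluates that count for flat dimensions. It ignores hierarchies within a dimension, which would only make the number larger.

```java
import java.math.BigInteger;

// Number of group-by combinations (cuboids) in a full data cube over d flat dimensions:
// every subset of the dimensions is one possible GROUP BY, so the count is 2^d.
final class CubeSize {
    private CubeSize() {}

    static BigInteger cuboids(int dimensions) {
        if (dimensions < 0) {
            throw new IllegalArgumentException("dimensions must be non-negative");
        }
        return BigInteger.ONE.shiftLeft(dimensions); // 2^d, computed without overflow
    }
}
```

With 10 dimensions that is 1,024 combinations. With 100 it is about 1.27 × 10^30. This is why the slide treats fine-grained, high-dimensional data as a case where precomputation stops being practical.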
Search platform
● Real-time ingestion
● Flexible schema
● Full text search
OLAP
● Batch ingestion
● Efficient storage
● Fast analytic queries
Timeseries database
● Optimized storage for time-based datasets
● Time-based functions
Center of the diagram: high performance analytics database for event-driven data
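Rollup is one way to picture "optimized storage for time-based datasets". It is an optional Druid ingestion feature that truncates each event's timestamp to a chosen granularity. Events that share the same truncated time and dimension values are then pre-aggregated into a single row.

The sketch below models only that idea in plain Java, with hourly buckets, one dimension and a count metric. It is not Druid's ingestion code or ingestion-spec format, and the names are hypothetical.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of ingestion-time rollup: truncate each event's timestamp to the hour,
// then count events per (hour, dimension value). Not Druid's implementation.
final class HourlyRollup {
    private HourlyRollup() {}

    // events: (timestamp, dimension value) pairs, e.g. (event time, country).
    // Returns one row per "hour|value" key with the number of events it absorbed.
    static Map<String, Long> rollup(List<Map.Entry<Instant, String>> events) {
        Map<String, Long> rows = new TreeMap<>();
        for (Map.Entry<Instant, String> event : events) {
            Instant hour = event.getKey().truncatedTo(ChronoUnit.HOURS);
            String rowKey = hour + "|" + event.getValue();
            rows.merge(rowKey, 1L, Long::sum);
        }
        return rows;
    }
}
```

Take four raw events: two from the US in the same hour, one from FR in that hour, and one from the US in the next hour. Rollup collapses them into three stored rows, and the saving grows as more events share the same hour and dimension values.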
Druid Use Cases in the Wild
1. Digital Advertising: Publishers, Advertisers, Exchanges
2. User Event Analytics: Clickstream, QoS, Usage
3. Network Telemetry
4. Lots and Lots of Data: IoT, Product Analytics, Fraud
Gratuitous Customer Quote
“The performance is great ... some of the tables that we have internally in Druid have billions and billions of events in them, and we’re scanning them in under a second.”
Source: https://www.infoworld.com/article/2949168/hadoop/yahoo-struts-its-hadoop-stuff.html
You can Druid too!
Druid community site: https://druid.apache.org/
Imply distribution: https://imply.io/get-started
Contribute: https://github.com/apache/druid