Enabling Real-time Analytics Applications @ LinkedIn’s Scale
Mayank Shrivastava Jackie Jiang
Senior Software Engineer
Seunghyun Lee
Senior Software Engineer
Staff Software Engineer
Apache Pinot
Agenda
1. Introduction
2. Pinot @ LinkedIn
3. How to use Pinot
4. Pinot Performance
How is data generated and used at LinkedIn
[Diagram labels: Actor, Verb; Member, Job, Post, Company; Object Life Cycle: Create, Generate, Analyze; Product, Data, Insights]
• 600+ million members
• Tens of millions of posts, likes, and shares per day
• 3+ million jobs posted per month
• 30 million companies
• Trillions of events per day
Real-time Analytics Applications at LinkedIn
How to build an online analytics application?
• Real-time data ingestion
• Millions of active users, 1000s of queries per sec
• Very low latency (tens of milliseconds)
• Highly available, always on
Approach 1. Join on the fly
• Real-time (depending on storage)
• High latency due to join
[Diagram: Event Stream (Profile View) → Profile View Table; the Application Server joins the Profile View Table with the Member Table to answer "Who viewed my profile"]
Approach 2. Pre Join + Pre Aggregate
• Near real-time ingestion
• Latency varies with query selectivity
[Diagram: Event Stream (Profile View) and Member Table → Stream Processing Engine (Pre Join + Pre Aggr) → Profile View Table → Application Server answering "Who viewed my profile"]
Approach 3. Pre Join + Pre Aggregate + Pre Cube
• Very fast
• Batch ingestion (hourly / daily)
• Storage explosion
• Re-bootstrap on schema change
[Diagram: Event Stream (Profile View) and Member Table → Batch Processing Engine (Pre Join + Pre Aggr + Pre Cube) → Profile View Table → Application Server answering "Who viewed my profile"]
Latency vs. Flexibility
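[Chart: systems plotted by latency (low to high) against flexibility (low to high), alongside the degree of precomputation over the Profile View Table and Member Table (Pre-Join, Pre-Aggregation, Pre-Cube). Systems shown: Spark SQL, Presto, Hive, BigQuery, Druid, Elasticsearch, Pinot, Kylin, KV Store]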
Who Viewed My Profile @ LinkedIn
[Diagram labels: Raw Tracking Data, Espresso, Stream Processing (Pre Join + Pre Aggr), Pre-joined Data, Data Lake, WVMP, Dashboard, Ad-hoc Queries]
What is Apache Pinot?
• OLAP Datastore
• Columnar, indexed storage
• Low latency analytics
• Distributed – highly available, reliable, scalable
• Lambda architecture
○ Offline data pushes + Real-time stream ingestion
• Open Source
Pinot @ LinkedIn
• 70+ member-facing use cases
• 2000+ dashboards for internal business metrics
• 100K+ queries per second
• 1M+ records ingested per second
Pinot @ LinkedIn: Member Facing Analytics Report
• Provides analytics reports for LinkedIn member-facing applications
• Very high QPS (thousands)
• Requires a strict latency SLA (tens of milliseconds to sub-second)
Pinot @ LinkedIn: Interactive Dashboard
• Visualization tool for multi-dimensional metrics
• Complex, exploratory queries
• 2000+ metrics, used by 1000+ employees
Pinot @ LinkedIn: Anomaly Detection
• Efficiently detect and investigate anomalies in metrics
• ThirdEye: part of the Apache Pinot open-source project
Pinot Usage @ Other Companies
How to use Pinot
Batch Data Ingestion
Real-time Data Ingestion
SQL-like Query Interface (PQL)
Let’s build something cool
Event RSVP Data
How to use Pinot: Workflow
One Time Setup
1. Define Schema
2. Define Table Configuration
3. Create Table
Data Ingestion
• Batch (Scheduled Job): Raw Data (HDFS, S3, ADLS, NFS...) → Generate Pinot Segments → Push Data
• Real-time (One Time Setup): Streaming Data (Kafka, Event Hub...) → Set Up Stream Data Source
How to use Pinot: Define Schema
● Schema name: meetupRsvp
● Dimension field specs
○ event_name (string)
○ event_time (long)
○ country (string)
○ city (string)
○ …
● Metrics field specs
○ rsvp_count (int)
● Time field spec
○ timestamp (long)
■ timetype: epoch / datetime
■ granularity: millisecond / second / hour / day
• Dimension: an attribute of your data (filter, group by)
• Metric: a number that is used to measure characteristics of a dimension (aggregation)
• Time: a timestamp of an event (partitioning, retention management)
Example Query - Top 10 events in US
SELECT event_name, sum(rsvp_count)
FROM meetupRsvp
WHERE country = 'us'
GROUP BY event_name
TOP 10
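To make the roles of dimension and metric concrete, here is a minimal Python sketch of what the "top 10 events in US" example query computes. The sample rows and the helper name `top_events` are made up for illustration; this is not how Pinot executes the query internally.

```python
from collections import defaultdict

# Hypothetical sample rows that follow the meetupRsvp schema.
ROWS = [
    {"event_name": "Kafka Night", "country": "us", "rsvp_count": 3},
    {"event_name": "Pinot Meetup", "country": "us", "rsvp_count": 5},
    {"event_name": "Kafka Night", "country": "us", "rsvp_count": 4},
    {"event_name": "Data Brunch", "country": "ca", "rsvp_count": 9},
]


def top_events(rows, country, limit=10):
    """Filter on a dimension, group by a dimension, sum a metric, keep the top `limit`."""
    totals = defaultdict(int)
    for row in rows:
        if row["country"] == country:  # WHERE country = 'us'
            # sum(rsvp_count) ... GROUP BY event_name
            totals[row["event_name"]] += row["rsvp_count"]
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]  # TOP 10


print(top_events(ROWS, "us"))
```

The dimensions (`country`, `event_name`) only appear in the filter and the grouping, while the metric (`rsvp_count`) is the only value that gets aggregated.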
How to use Pinot: Configure and Create Table
Table Config
● Table name: meetupRsvp
● Table type: batch / realtime / hybrid
● Replication factor: 2
● Index Columns: ...
● Bloom filters: ...
● Retention: 30 days
● ...
[Diagram: Pinot Schema and Table Config → Pinot Admin Client]
How to use Pinot: Batch Ingestion
[Diagram: Raw Data (JSON, CSV, Avro, Parquet, ORC...) on HDFS, S3, ADLS, NFS... → Segment Generation Job (library), using the Pinot Schema and Table Config → Pinot Segments on HDFS, S3, ADLS, NFS... → Segment Push Job (library)]
How to use Pinot: Segment Assignment
[Diagram: Segment Push Job → Controller (Helix, ZooKeeper) → Server-0, Server-1, Server-2; the Segment Store (HDFS, S3, ADLS, NFS...) holds segments S0, S1, S2]
The Segment Push Job sends the Controller:
1. Table name
2. Segment name
3. Segment URI path
• Assignment strategies
○ Uniform
○ Replica Group
○ Partition Aware
Example assignment (two replicas per segment):
● S0: Server-0, Server-1
● S1: Server-1, Server-2
● S2: Server-0, Server-2
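As a rough illustration of a uniform assignment, the sketch below spreads each segment over consecutive servers in round-robin order. The function name `assign_segments` and the round-robin rule are assumptions chosen for illustration, not Pinot's actual assignment code; with three servers and a replication factor of 2 it happens to reproduce the S0/S1/S2 placement from this slide.

```python
def assign_segments(segments, servers, replication):
    """Round-robin sketch: segment i goes to `replication` consecutive servers."""
    if replication > len(servers):
        raise ValueError("replication cannot exceed the number of servers")
    assignment = {}
    for i, segment in enumerate(segments):
        replicas = [servers[(i + r) % len(servers)] for r in range(replication)]
        assignment[segment] = sorted(replicas)
    return assignment


SERVERS = ["Server-0", "Server-1", "Server-2"]
ASSIGNMENT = assign_segments(["S0", "S1", "S2"], SERVERS, replication=2)
for segment, replicas in ASSIGNMENT.items():
    print(segment, replicas)
```

Every server ends up hosting the same number of segments, which is the point of a uniform strategy; replica-group and partition-aware strategies would constrain which servers may hold which segments.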
How to use Pinot: Query Routing
[Diagram: Queries → Broker → Server-0, Server-1, Server-2, each hosting replicas of segments S0, S1, S2; Segment Push Job, Controller, and Helix manage the segments in the Segment Store (HDFS, S3, ADLS, NFS...)]
• Routing Strategies
○ Uniform
○ Replica Group
○ Partition Aware
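A hedged sketch of what query routing has to achieve: every segment must be scanned exactly once, on one of the servers that hosts a replica of it. The function `route_query` and the random replica choice are illustrative assumptions, not the broker's real routing code.

```python
import random
from collections import defaultdict

# Segment -> servers hosting a replica (same placement as the assignment slide).
SEGMENT_REPLICAS = {
    "S0": ["Server-0", "Server-1"],
    "S1": ["Server-1", "Server-2"],
    "S2": ["Server-0", "Server-2"],
}


def route_query(segment_replicas, rng):
    """Pick one replica per segment and group the segments by chosen server."""
    plan = defaultdict(list)
    for segment, replicas in sorted(segment_replicas.items()):
        plan[rng.choice(replicas)].append(segment)
    return dict(plan)


PLAN = route_query(SEGMENT_REPLICAS, random.Random(0))
for server, segments in sorted(PLAN.items()):
    print(server, "scans", segments)
```

A replica-group or partition-aware strategy would restrict the choice so that a query touches fewer servers.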
How to use Pinot: Batch + Realtime
[Diagram: Segment Push Job → Controller (Helix) → Offline Servers; Streaming Data (Kafka, Event Hub, Kinesis...) → Real-time Servers; Queries → Broker → Offline Servers and Real-time Servers]
Table Config
● Table name: meetupRsvp
● Table type: real-time
● Replication factor: 2
● Kafka broker: ...
● Kafka topic name: ...
● Retention: 5 days
● ...
• A single schema for both offline + real-time tables
• Real-time servers keep consumed data in memory and periodically flush it to the segment store.
• Broker handles offline and real-time federation.
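One way to picture the federation step is a time boundary: rows up to the boundary are read from the offline table and newer rows from the real-time table, so overlapping data is not counted twice. The sketch below assumes such a boundary; the function name `federate`, the sample rows, and the exact comparison operators are illustrative assumptions rather than the broker's implementation.

```python
def federate(offline_rows, realtime_rows, time_boundary):
    """Combine both tables without double counting rows present in both."""
    from_offline = [r for r in offline_rows if r["timestamp"] <= time_boundary]
    from_realtime = [r for r in realtime_rows if r["timestamp"] > time_boundary]
    return from_offline + from_realtime


# Hypothetical data: timestamp 3 has been pushed offline and is still in real-time.
OFFLINE = [
    {"timestamp": 1, "rsvp_count": 2},
    {"timestamp": 2, "rsvp_count": 3},
    {"timestamp": 3, "rsvp_count": 4},
]
REALTIME = [
    {"timestamp": 3, "rsvp_count": 4},
    {"timestamp": 4, "rsvp_count": 1},
    {"timestamp": 5, "rsvp_count": 6},
]

rows = federate(OFFLINE, REALTIME, time_boundary=3)
print(sum(r["rsvp_count"] for r in rows))
```

Summing both tables naively would count the row at timestamp 3 twice.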
Quick Demo
Event RSVP Data
Pinot Performance
Interactive Dashboard
select sum(pageView) from T
where country = us
and browser = chrome
...
group by time
• Human-driven queries
• Slice and dice over arbitrary dimensions
5000 Queries   Pinot        Druid
Total Time     11 minutes   24 minutes
P50            84ms         136ms
P90            206ms        667ms
Site Facing Analytics
select sum(articleViewCount) from T
where articleId = x
...
and time >= y and time < z
group by viewer[title|geo|industry]
• Pre-defined queries with different filtering values
• Usually have a filter on the primary key (e.g. articleId)
• High QPS (thousands), low latency (< 100ms for 99%) requirements
Anomaly Detection
for d1 in [us, ca, ...]
  for d2 in [chrome, firefox, ...]
    ...
    select sum(pageViews) from T
    where country = d1 and browser = d2 …
    group by time
• Identifying issues requires monitoring all possible combinations
• Data distribution can be skewed
○ select … where country = us …: slow, scans 60-70% of the data
○ select … where country = ireland …: scans less than 1%
[Diagram labels: Filter, Aggregation]
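The nested loops can be sketched in Python as follows. The function name `anomaly_queries` and the quoting of values are illustrative assumptions; the sketch only generates query strings to show how quickly the number of combinations grows.

```python
from itertools import product


def anomaly_queries(dimensions, table="T", metric="pageViews"):
    """Build one query per combination of dimension values."""
    names = sorted(dimensions)
    queries = []
    for values in product(*(dimensions[name] for name in names)):
        predicate = " and ".join(
            f"{name} = '{value}'" for name, value in zip(names, values)
        )
        queries.append(
            f"select sum({metric}) from {table} where {predicate} group by time"
        )
    return queries


QUERIES = anomaly_queries(
    {"country": ["us", "ca"], "browser": ["chrome", "firefox"]}
)
print(len(QUERIES))
print(QUERIES[0])
```

The number of queries is the product of the per-dimension cardinalities, which is why each individual query has to be cheap regardless of how skewed the data is.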
Secret behind Pinot
Aggregation: Scan, Star-Tree Pre-aggregation
Filter: Scan, Inverted Index, Sorted Index, Star-Tree Index
Storage: Columnar Store, Encoding/Compression
❏ Common Techniques
❏ Pinot & Druid
❏ Pinot Only
select sum(pageView) from T
where country = us
and browser = chrome
Columnar Store
• Read relevant columns only
Raw Data (row based):
country  browser  ...
us       chrome   ...
ca       firefox  ...
jp       ie       ...
us       firefox  ...
ca       ie       ...
…        …        ...
Column based:
country: us, ca, jp, us, ca, …
browser: chrome, firefox, ie, firefox, ie, …
Encoding & Compression
• Storage compression
○ Dictionary encoding
○ Bit compression
Column based (raw values):
docId  country  browser
0      us       chrome
1      ca       firefox
2      jp       ie
3      us       firefox
4      ca       ie
…      …        …
Dictionary:
dictId  country  browser
0       ca       chrome
1       jp       firefox
2       us       ie
…       …        …
Forward Index (dictIds):
docId  country  browser
0      2        0
1      0        1
2      1        2
3      2        1
4      0        2
...    ...      ...
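A small Python sketch of dictionary encoding, using the country and browser columns from this slide. The function name `dictionary_encode` is hypothetical, and the bit-width calculation is a simplified stand-in for bit compression rather than Pinot's storage format.

```python
def dictionary_encode(column):
    """Return (sorted dictionary, forward index of dictIds, bits needed per value)."""
    dictionary = sorted(set(column))
    dict_id = {value: i for i, value in enumerate(dictionary)}
    forward_index = [dict_id[value] for value in column]
    # Bits needed to represent the largest dictId (at least one bit).
    bits_per_value = max(1, (len(dictionary) - 1).bit_length())
    return dictionary, forward_index, bits_per_value


COUNTRY = ["us", "ca", "jp", "us", "ca"]
BROWSER = ["chrome", "firefox", "ie", "firefox", "ie"]

print(dictionary_encode(COUNTRY))
print(dictionary_encode(BROWSER))
```

With three distinct values, each entry of the forward index fits in 2 bits instead of a full string.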
Inverted Index
• Storing bitmap for each value
• Fast filtering:
○ Constant time value lookup
○ Bit operations for AND/OR clause
Raw Data:
docId  country  browser
0      us       chrome
1      ca       firefox
2      jp       ie
3      us       firefox
4      ca       ie
…      …        …
Inverted Index:
country  docIds
ca       1, 4...
jp       2...
us       0, 3...
...      ...
browser  docIds
chrome   0...
firefox  1, 3...
ie       2, 4...
...      ...
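The sketch below builds an inverted index for the five sample documents and evaluates `country = us and browser = chrome` with a bitwise AND. Plain Python integers stand in for the bitmaps; the function names are hypothetical and this is an illustration, not Pinot's bitmap implementation.

```python
from collections import defaultdict


def build_inverted_index(column):
    """Map each value to a bitmap in which bit i is set when docId i has that value."""
    index = defaultdict(int)
    for doc_id, value in enumerate(column):
        index[value] |= 1 << doc_id
    return dict(index)


def doc_ids(bitmap):
    """List the docIds whose bit is set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


COUNTRY_INDEX = build_inverted_index(["us", "ca", "jp", "us", "ca"])
BROWSER_INDEX = build_inverted_index(["chrome", "firefox", "ie", "firefox", "ie"])

# where country = us and browser = chrome
matches = COUNTRY_INDEX["us"] & BROWSER_INDEX["chrome"]
print(doc_ids(matches))
```

The value lookup is a dictionary access and the AND/OR of two predicates is a single bit operation per machine word, which is what makes this kind of filtering fast.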
Sorted Index
• Better data compression:
○ Run length encoding
○ Can be accessed as forward/inverted index
• Spatial locality
Column sorted by country:
docId  country
0      ca
...    …
100    jp
101    us
…      …
300    us
…      …
Sorted index (docId range per value):
country  start docId  end docId
ca       0            80
jp       81           100
us       101          300
…        …            …
[Diagram labels: sorted index, inverted index]
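A minimal sketch of a sorted index as run-length encoded docId ranges, using the ca/jp/us ranges from this slide. The function names are hypothetical; the same structure answers both "which docIds have this value" (inverted access) and "which value does this docId have" (forward access).

```python
import bisect


def build_sorted_index(sorted_column):
    """Run-length encode a sorted column into value -> (start docId, end docId)."""
    ranges = {}
    for doc_id, value in enumerate(sorted_column):
        start, _ = ranges.get(value, (doc_id, doc_id))
        ranges[value] = (start, doc_id)
    return ranges


def value_at(ranges, doc_id):
    """Forward lookup: binary search over the range start positions."""
    values = list(ranges)  # insertion order follows docId order
    starts = [ranges[value][0] for value in values]
    pos = bisect.bisect_right(starts, doc_id) - 1
    if pos < 0 or doc_id > ranges[values[pos]][1]:
        raise IndexError(doc_id)
    return values[pos]


COLUMN = ["ca"] * 81 + ["jp"] * 20 + ["us"] * 200
RANGES = build_sorted_index(COLUMN)
print(RANGES)
print(value_at(RANGES, 100), value_at(RANGES, 101))
```

Because all matching documents are contiguous, a filter on the sorted column becomes a single range of docIds, which is also where the spatial locality comes from.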
Latency vs. Space Trade-off
[Chart: latency against space requirement, with scan, Star-Tree, and pre-cube marked along the trade-off]
Highlighted techniques: Star-Tree Pre-aggregation (aggregation), Star-Tree Index (filter)
Star-Tree Index
[Chart: latency against space requirement for thresholds T=infinity, T=1,000,000, T=10,000, T=100, T=1]
• Configurable trade-off between latency and space through partial pre-aggregation
• Can achieve a hard upper bound on query latency
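The idea of partial pre-aggregation with a threshold T can be sketched with a toy tree. Each level splits on one dimension and adds a star child that ignores that dimension and pre-aggregates the rest; splitting stops once a node holds at most T records, so a query scans at most T records at a leaf. This is a simplified illustration under assumed names (`build`, `query_sum`, the sample records), not Pinot's star-tree implementation.

```python
from collections import defaultdict

STAR = "*"  # stands for "any value" of a dimension


def aggregate(records):
    """Sum the metric of records that share the same dimension tuple."""
    totals = defaultdict(int)
    for dims, metric in records:
        totals[dims] += metric
    return list(totals.items())


def build(records, num_dims, threshold, depth=0):
    """Split on one dimension per level until a node holds <= threshold records."""
    node = {"records": records, "children": None}
    if depth == num_dims or len(records) <= threshold:
        return node  # leaf: answered by scanning its few records
    by_value = defaultdict(list)
    for dims, metric in records:
        by_value[dims[depth]].append((dims, metric))
    # Star child: ignore this dimension and pre-aggregate what remains.
    star_records = aggregate(
        [(dims[:depth] + (STAR,) + dims[depth + 1:], m) for dims, m in records]
    )
    node["children"] = {
        value: build(recs, num_dims, threshold, depth + 1)
        for value, recs in by_value.items()
    }
    node["children"][STAR] = build(star_records, num_dims, threshold, depth + 1)
    return node


def query_sum(node, filters, depth=0):
    """Sum the metric; `filters` maps a dimension position to a required value."""
    if node["children"] is None:
        return sum(
            metric
            for dims, metric in node["records"]
            if all(dims[i] == value for i, value in filters.items())
        )
    child = node["children"].get(filters.get(depth, STAR))
    return query_sum(child, filters, depth + 1) if child else 0


# Hypothetical records: ((country, browser), pageView)
RAW = [
    (("us", "chrome"), 10), (("ca", "firefox"), 20), (("jp", "ie"), 30),
    (("us", "firefox"), 40), (("ca", "ie"), 50), (("us", "chrome"), 5),
]
tree = build(aggregate(RAW), num_dims=2, threshold=1)
print(query_sum(tree, {0: "us", 1: "chrome"}))  # country = us and browser = chrome
```

With T=infinity the root stays a leaf and every query is a full scan; lowering T adds more pre-aggregated nodes, trading space for a tighter bound on records scanned per query.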
Flexible Query Execution Plan
Query: select max(col) from T
Optimization: Use metadata instead of scanning
Query: select sum(metric) from T where country = us and accountId = x
Optimization: Reorder filters based on the available indexes (apply the accountId predicate before the country predicate)
The segment-level physical query planner chooses the best way to execute the query based on the segment metadata and available indexes.
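Both optimizations can be illustrated with a short sketch. The metadata layout, the function names, and the use of estimated match counts to order predicates are assumptions made for illustration; the slide only says that the planner uses segment metadata and available indexes.

```python
# Hypothetical per-segment metadata: max value of each column.
SEGMENTS = [
    {"name": "S0", "max": {"col": 17}},
    {"name": "S1", "max": {"col": 42}},
    {"name": "S2", "max": {"col": 8}},
]


def max_from_metadata(segments, column):
    """Answer `select max(col) from T` from stored maxima, without scanning rows."""
    return max(segment["max"][column] for segment in segments)


def order_predicates(predicates, estimated_matches):
    """Apply the most selective predicate first (fewest estimated matching docs)."""
    return sorted(predicates, key=lambda predicate: estimated_matches[predicate])


print(max_from_metadata(SEGMENTS, "col"))
print(order_predicates(
    ["country = us", "accountId = x"],
    {"country = us": 600_000, "accountId = x": 12},
))
```

Evaluating the accountId predicate first leaves only a handful of documents for the country predicate to check.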
Global Optimizations
Problem: Querying all segments
Solution: Segment pruning to minimize the number of segments to query
Problem: Querying all servers
Solution: Smart segment assignment to reduce the fan-out to servers
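As one example of segment pruning, the sketch below skips segments whose time range cannot overlap the query's time filter. Pruning on time metadata is an assumption chosen for illustration (the slide does not say which metadata is used), and the field names `min_time` and `max_time` are hypothetical.

```python
# Hypothetical segment metadata with inclusive min/max timestamps.
SEGMENT_META = [
    {"name": "S0", "min_time": 0, "max_time": 99},
    {"name": "S1", "min_time": 100, "max_time": 199},
    {"name": "S2", "min_time": 200, "max_time": 299},
]


def prune_segments(segments, query_start, query_end):
    """Keep segments whose [min_time, max_time] overlaps [query_start, query_end)."""
    return [
        segment["name"]
        for segment in segments
        if segment["min_time"] < query_end and segment["max_time"] >= query_start
    ]


# where time >= 150 and time < 250
print(prune_segments(SEGMENT_META, 150, 250))
```

Only the surviving segments are sent to servers, so fewer segments, and often fewer servers, take part in the query.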
Conclusion
[Diagram: User Activity Data serving Member Facing Applications, Interactive Dashboard, and Anomaly Detection]
Contributing to Pinot
• We are looking for contributions!
• Apache Pinot (incubating) 0.1.0 is available at https://pinot.apache.org
• Pinot Twitter Account: https://twitter.com/ApachePinot
• Pinot Meetup Page: https://www.meetup.com/apache-pinot
• Pinot Slack Channel: https://tinyurl.com/pinotSlackChannel
Folks behind Pinot
Mayank Shrivastava
Subbu Subramaniam
Jean-Francois Im
Jackie Jiang
Seunghyun Lee
Jennifer Dai
Neha Pawar
Jialiang Li
Sunitha Beeram
Shraddha Sahay
Kishore Gopalakrishna
Xiang Fu
James Shao
Prasanna Ravi
John Gutmann
Dino Occhialini
Walter Huf
Xiaohui Sun
Long Huynh
Akshay Rai
Alexander Pucher
Jihao Zhang
Felix Cheung
Olivier Lamy
Jim Jagielski
Marcel Siegrist
Roman Shaposhnik
Anurag Shendge
Thank you
