Real-time Analytics - Going Beyond Stream Processing with Apache Pinot
Rong Rong
Software Engineer
Evolution of Real-Time Analytics
Dashboards / BI Tools
Machine Learning
Total users: 700 Million
QPS: 10000+
Latency SLA: < 100 ms (p99)
Freshness: Seconds
Actionable Insights
Missed orders
Inaccurate orders
Downtime
Top selling items
Menu item feedback
Restaurant Dashboard
Why User-facing analytics?
● The real-time analytics landscape is rapidly changing.
● Online Analytical Processing (OLAP) systems are evolving with these trends.
Internal-Facing Analytics
● Structured Data
● Approximate Data / Query Consistency
● Slice-and-Dice Queries
● GB-to-TB of Data
User-Facing Analytics
● Semi-Structured Data
● Strong Data / Query Consistency
● Full-SQL semantics
● TB-to-PB of Data
Evolution of Real-Time Analytics
Better Ingestion, Better Queries, Better Scalability
Requirements are changing!
Intro to Apache Pinot
What is Apache Pinot?
Pinot Data Segments
[Figure: a Pinot table is made up of data segments. Each segment is a columnar store (Col1 … ColN, with example values ca, jp, us, ca, …) plus indexes and metadata.]
Higher Level Architecture
Tailored for User-Facing Analytics
Efficient Pruning
Raw Data
country  browser  ...
us       chrome   ...
ca       firefox  ...
jp       ie       ...
us       firefox  ...
ca       ie       ...
…        …        ...
Column Based
country: us, ca, jp, us, ca, …
browser: chrome, firefox, ie, firefox, ie, …
select count(*) from X
where country = 'us'
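The query above only touches the country column, which is the point of the column-based layout. A minimal Python sketch of that idea follows; the `columns` dictionary and the `count_where` helper are hypothetical names for illustration, not Pinot APIs, and the data is the five example rows from the slide.

```python
# Toy column store: each column is kept as its own list, so a query that
# filters on one column never has to read the others.
columns = {
    "country": ["us", "ca", "jp", "us", "ca"],
    "browser": ["chrome", "firefox", "ie", "firefox", "ie"],
}


def count_where(columns, column, value):
    """Count rows where `column` equals `value`, scanning only that column."""
    return sum(1 for v in columns[column] if v == value)


# Equivalent of: select count(*) from X where country = 'us'
us_count = count_where(columns, "country", "us")
print(us_count)
```

In a row-oriented layout the same count would read every field of every row; here the browser column is never accessed.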
Star-Tree Index
[Figure: a tree with a Country level (US, CA) and a Browser level beneath each country (IE, C).]
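The idea behind pre-aggregating along a Country then Browser hierarchy can be sketched in a few lines of Python. This is a deliberately simplified illustration, not Pinot's implementation: it materializes every combination of concrete value or wildcard, whereas an actual star-tree index is a tree that limits how much it pre-computes. The names `STAR`, `build_preaggregates`, and `count` are hypothetical.

```python
from collections import Counter
from itertools import product

STAR = "*"  # wildcard meaning "any value" for a dimension


def build_preaggregates(rows, dimensions):
    """Pre-compute COUNT(*) for every mix of concrete value or STAR."""
    counts = Counter()
    for row in rows:
        # For each dimension, the row contributes to its own value and to STAR.
        choices = [(row[d], STAR) for d in dimensions]
        for key in product(*choices):
            counts[key] += 1
    return counts


def count(pre, country=STAR, browser=STAR):
    """Answer a count query with a single lookup instead of a scan."""
    return pre[(country, browser)]


rows = [
    {"country": "us", "browser": "chrome"},
    {"country": "ca", "browser": "firefox"},
    {"country": "jp", "browser": "ie"},
    {"country": "us", "browser": "firefox"},
    {"country": "ca", "browser": "ie"},
]
pre = build_preaggregates(rows, ["country", "browser"])
print(count(pre, country="us"))
print(count(pre, country="ca", browser="ie"))
```

The trade-off is extra storage for the pre-computed entries in exchange for aggregation queries that no longer scale with the number of raw rows.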
Optimized Index Algorithm
Filtering Optimizations
● Inverted Index
● Sorted Index
● Range Index
● JSON Index
● Geo Index
● Text Index
Aggregation Optimizations
● Theta Sketches
● Star-Tree Index
● HyperLogLog
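As one example of the filtering optimizations listed above, an inverted index maps each distinct column value to the row ids that contain it, so a filter becomes a lookup and a conjunction becomes an intersection. The sketch below is a generic illustration with hypothetical helper names (`build_inverted_index`, `filter_and`); it does not reflect how Pinot stores its indexes on disk.

```python
from collections import defaultdict


def build_inverted_index(values):
    """Map each distinct value to the sorted list of row ids holding it."""
    index = defaultdict(list)
    for row_id, value in enumerate(values):
        index[value].append(row_id)
    return dict(index)


def filter_and(*posting_lists):
    """Row ids present in every posting list (an AND of equality filters)."""
    result = set(posting_lists[0])
    for postings in posting_lists[1:]:
        result &= set(postings)
    return sorted(result)


country = ["us", "ca", "jp", "us", "ca"]
browser = ["chrome", "firefox", "ie", "firefox", "ie"]

country_idx = build_inverted_index(country)
browser_idx = build_inverted_index(browser)

# where country = 'us' and browser = 'firefox'
matches = filter_and(country_idx.get("us", []), browser_idx.get("firefox", []))
print(matches)
```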
Support Semi-Structured Data
[Figure: two pipelines. Semi-Structured Data -> Pre-Process -> Structured Data -> OLAP, versus Semi-Structured Data -> OLAP directly.]
Support Strong Data Consistency
[Figure: with real-time inserts only, OLAP serves best-effort queries (double counting). With real-time inserts + updates + deletes, OLAP serves accurate queries.]
UPSERT data ingestion
[Figure: records from a real-time stream (K=1, V=1; K=1, V=5; K=2, V=2; K=2, V=3; K=3, V=4) land in segments S1 to S4, and each record inserts or updates an entry in the key index.]
Key Index Insert: (K=1, S=S1, Row=1)
Key Index Update: (K=1, S=S1, Row=1)
Key Index Insert: (K=2, S=S2, Row=2)
Key Index Update: (K=2, S=S3, Row=2)
Key Index Insert: (K=3, S=S4, Row=1)
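The key-index bookkeeping above can be sketched as a small in-memory structure: every record is appended to a segment, and the key index is pointed at the newest location for that key, so queries read one row per key instead of double counting. This is a minimal sketch under stated assumptions: the `UpsertTable` class is hypothetical, the arrival order and segment assignment of the records are guesses for illustration, and row offsets are 0-based here rather than matching the slide.

```python
class UpsertTable:
    """Append-only segments plus a key index that tracks the latest row per key."""

    def __init__(self):
        self.segments = {}   # segment name -> list of (key, value) rows
        self.key_index = {}  # primary key -> (segment name, row offset)

    def ingest(self, segment, key, value):
        """Append a record and repoint the key index; return the index operation."""
        rows = self.segments.setdefault(segment, [])
        rows.append((key, value))
        op = "update" if key in self.key_index else "insert"
        self.key_index[key] = (segment, len(rows) - 1)
        return op

    def latest(self):
        """Latest value per key, as an upsert-aware query would see it."""
        return {
            key: self.segments[seg][row][1]
            for key, (seg, row) in self.key_index.items()
        }


table = UpsertTable()
# Assumed stream order and segment placement (hypothetical).
stream = [("S1", 1, 1), ("S1", 1, 5), ("S2", 2, 2), ("S3", 2, 3), ("S4", 3, 4)]
ops = [table.ingest(seg, k, v) for seg, k, v in stream]

naive_sum = sum(v for rows in table.segments.values() for _, v in rows)
upsert_sum = sum(table.latest().values())
print(ops, naive_sum, upsert_sum)
```

Summing every stored row counts superseded versions of K=1 and K=2 as well, while summing through the key index only sees the current value for each key.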
Towards full-SQL semantics
OLAP Queries (Aggregations, Order By, Group By, UDFs, …)
Full SQL Semantics! (OLAP Queries + Joins + Nested Queries + Window functions + …)
Compute & Storage Decoupling
[Figure: OLAP data nodes (compute + local storage) holding GBs - TBs of data, cost marked $. OLAP data nodes (compute + local storage) alongside cloud storage for TBs - PBs of data, costs marked $$$$ and $$.]
[Figure: Pinot Brokers, Pinot Servers, and Pinot Segments, shown without and with Cloud Storage.]
Summary
● Real-time analytics has evolved to power more user-facing, actionable applications.
● Apache Pinot is our solution for real-time analytics.
Takeaway
● Excellent Query Performance
● Handles real-time data mutation
● Ingest from Streaming, Batch, SQL sources
● Semi-Structured Data support
● Hybrid architecture for cost efficiency
● One platform for building all kinds of use cases
Thank you
Questions?
