SF TechWeek : Data & AI Edition
Ken Chen, Co-founder & Chief Architect
Oct 8, 2024
Proton - one single binary to tackle
streaming and historical analytics
Agenda
• Two Quick Demos
• What problems Timeplus Proton solves
• Architecture
• Storage
• Query Processing
• Python Extension
• Integration with ML / AI: Use Case Studies
• QA
Demo 1 : Query on Kafka
Data Gen -> Kafka Topic (Source) -> Proton Materialized View (Streaming ETL + Window Aggregation) -> Kafka Topic (Target)
{
"devicename":
"region":
"city":
"version":
"lat":
"lon":
"battery":
"humidity":
"temperature":
"hydraulic_pressure":
"atmospheric_pressure":
"timestamp":
}
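A small generator for events of this shape might look like the sketch below. Only the field names come from the slide; the value ranges, region and city names, and version strings are placeholders chosen for illustration.

```python
import json
import random
from datetime import datetime, timezone

# Field names as shown on the slide; every value below is a made-up placeholder.
FIELDS = [
    "devicename", "region", "city", "version", "lat", "lon", "battery",
    "humidity", "temperature", "hydraulic_pressure", "atmospheric_pressure",
    "timestamp",
]


def make_event(rng: random.Random, device_id: int) -> dict:
    """Build one synthetic device event with hypothetical value ranges."""
    return {
        "devicename": f"device-{device_id}",
        "region": rng.choice(["region-a", "region-b"]),
        "city": rng.choice(["city-1", "city-2"]),
        "version": rng.choice(["1.0", "1.1"]),
        "lat": round(rng.uniform(-90.0, 90.0), 4),
        "lon": round(rng.uniform(-180.0, 180.0), 4),
        "battery": rng.randint(0, 100),
        "humidity": rng.randint(0, 100),
        "temperature": round(rng.uniform(-20.0, 60.0), 1),
        "hydraulic_pressure": round(rng.uniform(1000.0, 2000.0), 1),
        "atmospheric_pressure": round(rng.uniform(950.0, 1050.0), 1),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


event = make_event(random.Random(0), 1)
print(json.dumps(event))
```

Each JSON line could then be written to the source Kafka topic with whichever producer client the demo uses.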
Demo 2 : Ingest, Materialize, Serving (All in one binary)
Data Gen -> Proton Source Stream -> Proton Materialized View (Window Aggregation) -> Proton Target Stream
What Is Timeplus Proton?
• Core problems to solve :
“Query, detect and act on fast-changing stream data in an
incremental way.”
• Key Tech Highlights
• Unifies stream processing and historical analytics via SQL
• Dual data stores to handle data persistence
• Materialized incremental computation engine
• Everything is in a single binary: lightweight and efficient, running from edge
to cloud, from a single instance to a distributed cluster
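To make "incremental" concrete, here is a minimal Python sketch of an aggregate that is updated per event instead of rescanning history. It illustrates the general idea only and is not Proton's engine; the class name `IncrementalAvg` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IncrementalAvg:
    """Running average kept as (count, total) so each event costs O(1)."""
    count: int = 0
    total: float = 0.0

    def update(self, value: float) -> float:
        # Fold the new event into the stored state and return the fresh result.
        self.count += 1
        self.total += value
        return self.total / self.count


avg = IncrementalAvg()
results = [avg.update(t) for t in (20.0, 22.0, 27.0)]
print(results)  # [20.0, 21.0, 23.0]
```

The state is tiny compared with the raw events, which is what lets a materialized result stay current as data keeps arriving.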
CONFIDENTIAL
Proton Architecture
Timeplus Proton
(Edge / Cloud)
Incremental ML / AI
Proton Architecture - Zoom In
Unified Incremental SQL Processing Engine (Vectorized / JIT)
NativeLog
(Multi-Raft Replicated)
Query processing superpower
(applies state-of-the-art
vectorized / JIT capabilities)
V8 Engine
(JavaScript)
CPython
Engine
NativeLog - Streaming Store: optimized for concurrent ingest / latency
(Cloud) Historical Store: optimized for bulk scan / throughput
SQL via TCP, JSON REST via HTTP
Ingest Query
API Layer
Streaming Storage – NativeLog (Zoom In)
Replicated
LogManager
MetaStore
(Replicated)
NativeLog
Server
Loglet
Loglet
Segment: sequence index, atime index, etime index
Client
ReplicatedLog
Segment
…
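The diagram suggests a log split into segments, with an index that locates a record by sequence number. A minimal Python sketch of that idea follows. The `Loglet` class here, its in-memory layout, and the fixed segment size are assumptions for illustration, not NativeLog's actual format.

```python
import bisect


class Loglet:
    """Toy append-only log: fixed-size segments plus a base-sequence index."""

    def __init__(self, segment_size: int):
        self.segment_size = segment_size
        self.base_sns = []   # first sequence number stored in each segment
        self.segments = []   # one list of records per segment
        self.next_sn = 0

    def append(self, record) -> int:
        # Roll to a new segment when the active one is full.
        if not self.segments or len(self.segments[-1]) >= self.segment_size:
            self.base_sns.append(self.next_sn)
            self.segments.append([])
        self.segments[-1].append(record)
        sn = self.next_sn
        self.next_sn += 1
        return sn

    def read_from(self, sn: int):
        """Yield every record whose sequence number is >= sn."""
        sn = max(sn, 0)
        if sn >= self.next_sn:
            return
        # Binary search the index for the segment that contains sn.
        first = bisect.bisect_right(self.base_sns, sn) - 1
        for i in range(first, len(self.segments)):
            start = sn - self.base_sns[i] if i == first else 0
            yield from self.segments[i][start:]


log = Loglet(segment_size=2)
for record in "abcde":
    log.append(record)
print(list(log.read_from(3)))  # ['d', 'e']
```

Time-based indexes (the atime and etime indexes in the diagram) could be handled the same way, with timestamps in place of sequence numbers.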
Historical Storage - LSM-like (Zoom In)
Level 0 Level 0 Level 0 … Level 0
write `part`
Level 1 Background (Multi-level) Merge
primary.idx skipping..idx
data..bin data..mrk2
Metadata files
…
Big chunk of columnarized columns
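The write path in the diagram (each write produces a sorted level-0 `part`, and a background job merges parts into higher levels) can be sketched in Python as below. This is the general LSM-style pattern under simplified assumptions (rows are tuples, and the first element is the primary key), not the historical store's real implementation.

```python
import heapq


def write_part(rows):
    """Sort one ingested batch by primary key to form an immutable level-0 part."""
    return sorted(rows)


def merge_parts(parts):
    """Background merge: combine already-sorted parts into one larger sorted part."""
    return list(heapq.merge(*parts))


level0 = [
    write_part([(3, "c"), (1, "a")]),
    write_part([(2, "b"), (4, "d")]),
]
level1 = merge_parts(level0)
print(level1)  # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
```

Because every part is already sorted, the merge is a streaming k-way merge and never needs to re-sort the data.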
Query - One SQL to Rule All Analytics
Powerful windowing, aggregation, and joins for both streaming and historical queries
SUBSCRIBE TO
SELECT device_name, avg(temperature), predict(temperature)
FROM tumble(devices, 5s)
INNER JOIN table(device_products)
ON devices.product_id = device_products.id
GROUP BY device_name, window_end
[SETTINGS seek_to = '2021-12-02 10:00:00']
EMIT [AFTER watermark | LAST 30m]
1. Just one declarative SQL query
2. The most advanced streaming windowing & global functions
3. Connect historical data via table()
4. Time can easily be rewound to any historical moment for reprocessing
5. Intelligent watermark control handles late events and time skew properly
6. LAST X helps users focus on what is happening in the most recent time window
7. JOINs between stream and stream, or stream and table, drive more real-time analytics insights
8. UDFs/UDAFs for custom functions, e.g. ML prediction or anomaly detection, customized aggregations, Complex Event Processing
9. Super-fast push-based stateful queries over TCP. No more "Refresh"!
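The tumbling window and watermark behavior in the callouts above can be sketched in plain Python. This is a simplified model, not Proton's engine: it assumes integer event times, takes the watermark to be the largest event time seen minus an allowed lateness, and drops events whose window has already been emitted. The function name `tumble_avg` is hypothetical.

```python
from collections import defaultdict


def tumble_avg(events, size, lateness=0):
    """Yield (window_end, key, avg) once the watermark passes window_end."""
    sums = defaultdict(lambda: [0.0, 0])  # (window_start, key) -> [sum, count]
    watermark = float("-inf")
    for ts, key, value in events:
        start = ts - ts % size
        if start + size <= watermark:
            continue  # late event: its window was already emitted
        acc = sums[(start, key)]
        acc[0] += value
        acc[1] += 1
        watermark = max(watermark, ts - lateness)
        # Emit every window that the watermark has now closed.
        for closed in sorted(k for k in sums if k[0] + size <= watermark):
            total, count = sums.pop(closed)
            yield (closed[0] + size, closed[1], total / count)


events = [(1, "a", 10.0), (3, "a", 20.0), (6, "a", 30.0), (2, "a", 99.0), (11, "a", 50.0)]
emitted = list(tumble_avg(events, size=5))
print(emitted)  # [(5, 'a', 15.0), (10, 'a', 30.0)]
```

The event at time 2 arrives after the watermark has passed the end of its window, so it is ignored. A larger `lateness` would keep that window open longer at the cost of delayed output.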
Extended SQL / Execution Plans
StreamSource
HistoricalSource
BuildingHashTable
JoiningTransform
WatermarkTransform
WindowAssignmentTransform
TumbleWindowAggrTransform
Output Project
WITH joined AS
(
SELECT
*
FROM devices
INNER JOIN table(device_inventory) AS device_inventory ON id = device_inventory.device_id
)
SELECT
max_k(cpu, device_name), window_start
FROM
tumble(joined, 5s)
GROUP BY window_start
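The first stages of the plan (build a hash table from the historical source, then probe it with each streaming event) follow the classic hash-join pattern. A rough Python sketch, with hypothetical helper names and dictionaries standing in for rows:

```python
def build_hash_table(table_rows, table_key):
    """Index the historical (bounded) side once, keyed by the join column."""
    return {row[table_key]: row for row in table_rows}


def hash_join(stream, index, stream_key):
    """Probe the index for each streaming event; emit merged rows on a match."""
    for event in stream:
        match = index.get(event[stream_key])
        if match is not None:  # INNER JOIN: unmatched events are dropped
            yield {**match, **event}


inventory = [{"device_id": 1, "device_name": "pump"}, {"device_id": 2, "device_name": "valve"}]
devices = [{"id": 1, "cpu": 0.7}, {"id": 3, "cpu": 0.2}, {"id": 2, "cpu": 0.9}]

index = build_hash_table(inventory, "device_id")
joined = list(hash_join(devices, index, "id"))
print(joined)
```

The joined rows would then flow through watermarking, window assignment, and the tumble-window aggregation, in the order the plan lists them.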
Query Processing - Historical Backfill
Entry Entry Entry Block Block Block Block
Historical Data
Reader (Backfill)
Streaming Data
Reader (Live)
Concatenated Stream Reader
Last sequence number
(1)
(2)
(3)
Downstream
Pipeline
NativeLog Historical Data Store
● Streaming Lookup
● Streaming Enrichment
● Traveling across historical store and streaming events
● Correlated window analytics: compare the live window with a historical window
sn
SELECT count(), max(cpu), max(memory)
FROM devices WHERE _tp_time > '2022-11-12 00:00:00' GROUP BY device_id;
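One way to read the three numbered steps is: (1) replay blocks from the historical store, (2) note the last sequence number covered by that backfill, (3) continue from the streaming log after that point. The Python sketch below follows that reading. It is an interpretation of the diagram, and the function and parameter names are hypothetical.

```python
def concatenated_reader(historical_blocks, last_sn, live_entries):
    """Yield backfilled rows first, then live rows newer than last_sn."""
    # (1) Backfill: scan the historical store block by block.
    for block in historical_blocks:
        yield from block
    # (2) + (3) Hand off: skip live entries the backfill already covered.
    for sn, record in live_entries:
        if sn > last_sn:
            yield record


historical = [["r0", "r1"], ["r2"]]           # covers sequence numbers 0..2
live = [(1, "r1"), (2, "r2"), (3, "r3"), (4, "r4")]
rows = list(concatenated_reader(historical, last_sn=2, live_entries=live))
print(rows)  # ['r0', 'r1', 'r2', 'r3', 'r4']
```

Filtering on the sequence number is what keeps the overlap between the two stores from producing duplicates downstream.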
Native Python Integration (Python UDF)
CREATE FUNCTION mask_password(values string) RETURNS string
LANGUAGE PYTHON AS
$$
import re

# values is the batch of input strings; return one masked string per input
def mask_password(values):
    pattern = r'password=(?=.*[A-Za-z])(?=.*\d)[A-Za-z\d]{8,}'
    return [re.sub(pattern, 'password=***', v) for v in values]
$$;
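The masking logic can be checked in plain Python before it is registered as a UDF. The sketch below assumes the `\d` escapes that appear to have been lost in the slide's regular expression; the sample inputs are made up.

```python
import re

# Matches "password=" followed by 8+ alphanumerics, with at least one letter
# and one digit somewhere after it (enforced by the two lookaheads).
PATTERN = re.compile(r"password=(?=.*[A-Za-z])(?=.*\d)[A-Za-z\d]{8,}")


def mask_password(values):
    """Return a new list with every matching password value replaced by ***."""
    return [PATTERN.sub("password=***", v) for v in values]


masked = mask_password([
    "user=bob password=Secret1234",
    "password=abc",          # too short, left unchanged
    "no secrets here",
])
print(masked)
```

Note that the lookaheads scan the rest of the string, so a letter or digit appearing after the password can also satisfy them. A stricter pattern would bound the lookaheads to the password token itself.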
Native Python Integration (Python UDAF)
CREATE AGGREGATE FUNCTION second_max(value double) RETURNS double
LANGUAGE PYTHON AS
$$
def initialize():
    pass

def process(values):
    pass

def finalize():
    pass
$$;
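The slide leaves the three hooks empty. A standalone Python sketch of what `second_max` could do is shown below. Wrapping the hooks in a class and treating "second max" as the second-largest distinct value are assumptions made here for illustration, not the UDAF contract itself.

```python
class SecondMax:
    """Tracks the largest and second-largest distinct values seen so far."""

    def __init__(self):
        self.initialize()

    def initialize(self):
        # Reset the aggregation state.
        self.top = float("-inf")
        self.second = float("-inf")

    def process(self, values):
        # Fold one batch of input values into the state.
        for v in values:
            if v > self.top:
                self.top, self.second = v, self.top
            elif self.second < v < self.top:
                self.second = v

    def finalize(self):
        # Return None when fewer than two distinct values were seen.
        return None if self.second == float("-inf") else self.second


agg = SecondMax()
agg.process([3.0, 9.0, 5.0])
agg.process([9.0, 7.0])
print(agg.finalize())  # 7.0
```

Only two numbers are kept as state, so the aggregate can be updated incrementally batch by batch.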
Case Study : Real-time DDoS Detection
https://www.timeplus.com/post/real-time-ddos-detection
Case Study : Real-time Machine Learning
https://www.timeplus.com/post/real-time-machine-learning
Recap - Timeplus Proton
• Core problems to solve :
“Query, detect and act on fast-changing stream data in an
incremental way.”
• Key Tech Highlights
• Unifies stream processing and historical analytics via SQL
• Dual data stores to handle data persistence
• Materialized incremental computation engine
• Everything is in a single binary: lightweight and efficient, running from edge
to cloud, from a single instance to a distributed cluster
Thank you!
Timeplus: One single binary to tackle streaming and historical analytics
Timeplus Cluster - Single Binary with Multiple-Roles
(Replicated NativeLog, powered by Multi-Raft)
Client
Access Layer: high-availability access and load balancing
Computing Layer: 3 Data Ingestion Nodes (Ingestion) and 3 Data Query Nodes (Query); high-availability query and ingestion with horizontal scalability
Data Layer: 3 Data Nodes, each holding shard-1, shard-2, shard-3 (replica = 3); data replication and parallel access
Metadata Nodes: 3
Timeplus Cluster - Query Failover
Client
Access Layer: high-availability access and load balancing
Computing Layer: 2 Data Query Nodes (Query); high-availability query and ingestion with horizontal scalability
Data Layer: 3 Data Nodes, each holding shard-1, shard-2, shard-3 (replica = 3); 3 checkpoints (one shown per Data Node); data replication and parallel access
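Checkpoint-based query failover can be illustrated with a small Python sketch: a stateful query periodically saves its state together with the last processed sequence number, and a second node resumes from that checkpoint by replaying only newer log entries. The function name, the checkpoint interval, and the state layout are hypothetical. The slide does not describe the actual checkpoint format.

```python
def run_query(log, checkpoint=None, fail_at=None, every=2):
    """Run a running-sum query; return (state_or_checkpoint, finished)."""
    state = dict(checkpoint) if checkpoint else {"sn": -1, "count": 0, "total": 0.0}
    saved = dict(state)  # last durable checkpoint
    for sn, value in enumerate(log):
        if sn <= state["sn"]:
            continue  # already reflected in the restored state
        if fail_at is not None and sn == fail_at:
            return saved, False  # node crashes; only the checkpoint survives
        state["sn"] = sn
        state["count"] += 1
        state["total"] += value
        if (sn + 1) % every == 0:
            saved = dict(state)  # persist state plus its sequence number
    return state, True


log = [1.0, 2.0, 3.0, 4.0, 5.0]
ckpt, finished = run_query(log, fail_at=3)    # first query node fails mid-stream
final, done = run_query(log, checkpoint=ckpt)  # second node takes over
print(ckpt, final)
```

Because the log is replicated and addressed by sequence number, the resumed query reaches the same result as an uninterrupted run, reprocessing only the entries after the checkpoint.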
