Using ClickHouse Database for GrowthRx,
an analytics and customer engagement platform
Prafulla Gupta
Principal Architect
Times Internet
//01
// 2
News, Live Events, Brand Capital, Magazines, Outdoor, Radio, Print, Music, TV, Film, Education, Internet
Established in 1838, The Times Group is India’s largest media conglomerate. Its flagship newspaper, The Times of India, is the most-read English newspaper in the world.
609 MN Monthly Active Users
133 MN Daily Active Users
54 BN Monthly Page Views
Entertainment, Marketplace, Fintech
India’s leading digital consumer company
// 3
What is GrowthRx, and what are its features?
Customer Analytics and Engagement Tool
Used across multiple products at scale
Helps convert, retain, and engage more users
through self-serve analytics
GrowthRx
Project Features
Can ingest 40-50K records per second
Low cash burn on infrastructure
Analyze user trends and run queries on more than 5 billion records
Push notifications powered by a segmentation engine that can target 10-15M users per segment
View and analyze multiple charts in a dashboard
// 4
Overall architecture
● Ingests event data from multiple websites and apps.
● Creates real-time user segments and targets them via multiple channels.
● Lets users view and analyze campaign reports.
● Supports funnels, cohorts, pivots, etc.
● Provides dashboards for easy trend analysis.
// 5
Overall architecture - Data ingest
● Receives data at a rate of 30-40K+ requests/sec.
● Collects more than 1 trillion data points per month.
● Achieved this on a setup of just 3 ClickHouse replica VMs with a 10 TB SSD on each machine.
● ClickHouse's efficient compression substantially reduces the size of data on disk.
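A quick back-of-the-envelope check helps put these figures together. The sketch below assumes a 30-day month, decimal terabytes (1 TB = 10^12 bytes), that each of the 3 replicas holds a full copy of the data, that one data point is one stored row, and (purely hypothetically, since retention is not stated) that a full month of data sits on each disk.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: a 30-day month


def points_per_second(points_per_month: float) -> float:
    """Average ingest rate implied by a monthly data-point total."""
    return points_per_month / SECONDS_PER_MONTH


def points_per_request(points_per_month: float, requests_per_second: float) -> float:
    """Average number of data points each request would need to carry."""
    return points_per_month / (requests_per_second * SECONDS_PER_MONTH)


def max_bytes_per_point(disk_bytes_per_replica: float, points_stored: float) -> float:
    """Upper bound on on-disk bytes per point if every replica stores all points."""
    return disk_bytes_per_replica / points_stored


if __name__ == "__main__":
    monthly_points = 1e12     # "more than 1 trillion data points per month"
    disk_per_replica = 10e12  # 10 TB SSD per VM, decimal TB (assumption)
    print(f"{points_per_second(monthly_points):,.0f} points/sec on average")
    for rps in (30_000, 40_000):
        per_req = points_per_request(monthly_points, rps)
        print(f"{rps:,} req/sec -> ~{per_req:.1f} points per request")
    # Hypothetical: one full month retained on each replica
    print(f"<= {max_bytes_per_point(disk_per_replica, monthly_points):.0f} bytes per point")
```

Under these assumptions, 1 trillion points per month averages out to roughly 386K points/sec, i.e. about 10-13 points per request at 30-40K requests/sec, and a month of data would have to fit in about 10 bytes per point per replica, which is only realistic with strong columnar compression.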
// 6
Overall architecture - User segmentation
For user segmentation, we run queries with multiple AND/OR conditions on the Events table, along with a JOIN on the User Profile table.
● E.g.: subscribed users who read Tech News more than 5 times in 2 days and did not visit the Sports section
Used the bitmap functions provided by ClickHouse
Query processing speed: 200 million rows/sec
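The bitmap approach treats each condition as a set of integer user IDs and combines the sets algebraically: AND becomes intersection, OR becomes union, and NOT becomes difference. In ClickHouse this maps to functions such as groupBitmapState, bitmapAnd, bitmapOr, bitmapAndnot and bitmapCardinality, which operate on unsigned integer IDs. The Python sketch below models only that set logic for the example segment using built-in sets. The event layout, section names and the idea that "did not visit Sports" uses the same 2-day window are assumptions; this is not GrowthRx's actual implementation.

```python
from collections import Counter
from typing import Sequence, Set, Tuple

# Hypothetical event rows: (user_id, timestamp_seconds, section)
Event = Tuple[int, int, str]

TWO_DAYS = 2 * 24 * 60 * 60


def _in_window(ts: int, now: int) -> bool:
    """True if ts lies in the last two days, i.e. (now - TWO_DAYS, now]."""
    return now - TWO_DAYS < ts <= now


def frequent_readers(events: Sequence[Event], section: str, min_reads: int, now: int) -> Set[int]:
    """Users who read `section` MORE than `min_reads` times in the window."""
    counts = Counter(uid for uid, ts, sec in events if sec == section and _in_window(ts, now))
    return {uid for uid, n in counts.items() if n > min_reads}


def visitors(events: Sequence[Event], section: str, now: int) -> Set[int]:
    """Users who visited `section` at least once in the window (window is an assumption)."""
    return {uid for uid, ts, sec in events if sec == section and _in_window(ts, now)}


def build_segment(events: Sequence[Event], subscribed: Set[int], now: int) -> Set[int]:
    """subscribed AND (tech reads > 5 in 2 days) AND NOT (visited sports)."""
    tech_fans = frequent_readers(events, "tech", 5, now)
    sports = visitors(events, "sports", now)
    # AND -> intersection (cf. bitmapAnd); NOT -> difference (cf. bitmapAndnot)
    return (subscribed & tech_fans) - sports


if __name__ == "__main__":
    now = 1_000_000
    events = [(1, now - 100 * i, "tech") for i in range(6)]    # 6 tech reads
    events += [(2, now - 100 * i, "tech") for i in range(6)]   # 6 tech reads...
    events.append((2, now - 50, "sports"))                     # ...but visited sports
    events += [(3, now - 100 * i, "tech") for i in range(3)]   # only 3 tech reads
    events += [(4, now - 100 * i, "tech") for i in range(6)]   # not subscribed
    segment = build_segment(events, subscribed={1, 2, 3}, now=now)
    print(sorted(segment), len(segment))  # len() plays the role of bitmapCardinality
```

Only user 1 satisfies all three conditions here. Representing each condition as a bitmap keeps the AND/OR/NOT combination cheap even when individual sets hold millions of users, which is what makes 10-15M-user segments practical.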
// 7
Overall architecture - Dashboard and Trend Analysis
● Multiple charts on a single screen
● Asynchronous execution to throttle the number of queries executing in parallel
● Sampling to make queries run faster on larger datasets
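One way to get "asynchronous execution with a throttle" is to request every chart concurrently but gate the actual query calls behind a semaphore. The sketch below assumes an asyncio-based backend; run_query is a hypothetical stand-in that sleeps instead of calling ClickHouse, and the concurrency limit is arbitrary. It illustrates the pattern, not GrowthRx's code.

```python
import asyncio
from typing import List, Sequence, Tuple


class ThrottledRunner:
    """Runs dashboard queries concurrently, but never more than `limit` at once."""

    def __init__(self, limit: int) -> None:
        self._sem = asyncio.Semaphore(limit)
        self.in_flight = 0
        self.peak = 0  # highest number of queries observed running together

    async def run_query(self, sql: str) -> str:
        # Hypothetical stand-in for a real ClickHouse client call.
        await asyncio.sleep(0.01)
        return f"result of {sql}"

    async def run(self, sql: str) -> str:
        async with self._sem:  # waits here while `limit` queries are already running
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                return await self.run_query(sql)
            finally:
                self.in_flight -= 1


async def load_dashboard(charts: Sequence[str], limit: int = 4) -> Tuple[List[str], int]:
    """Fire all chart queries at once; the runner caps real parallelism."""
    runner = ThrottledRunner(limit)
    results = await asyncio.gather(*(runner.run(sql) for sql in charts))
    return list(results), runner.peak


if __name__ == "__main__":
    charts = [f"SELECT ... /* chart {i} */" for i in range(10)]
    results, peak = asyncio.run(load_dashboard(charts, limit=4))
    print(len(results), "charts loaded; peak parallel queries:", peak)
```

asyncio.gather returns results in the order the charts were requested, so the dashboard can map each result back to its chart even though the queries finish in arbitrary order. The cap protects the cluster when many users open dashboards with many charts at the same time.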
// 8
ClickHouse Key Features
01 MATERIALIZED VIEWS
02 SAMPLE CLAUSE
03 PARAMETRIC AGGREGATE FUNCTION
// 9
ClickHouse Key Features - MATERIALIZED VIEWS
● Generate summarized views of raw data
● Multiple materialized views
● Minimal storage space and maximum retention duration for data
● Makes it easy to generate campaign reports and the power user curve
CREATE MATERIALIZED VIEW campaign_user_mv (
    `tenantId` LowCardinality(String),
    `campaignId` UUID,
    `date` Date,
    `count` UInt64
)
-- SummingMergeTree adds up `count` for rows sharing the ORDER BY key, but only during background merges
ENGINE = ReplicatedSummingMergeTree('/{cluster}/{shard}/campaign_user_mv', '{replica}')
ORDER BY (tenantId, campaignId, date)
AS SELECT
    tenantId,
    campaignId,
    toDate(timestamp) AS date,
    count(*) AS count
FROM growthrx.campaign
GROUP BY tenantId, campaignId, date
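A materialized view runs its SELECT on each inserted block separately, and SummingMergeTree only collapses rows with the same ORDER BY key when parts are merged in the background, at times you do not control. A table can therefore hold several partial rows for one (tenantId, campaignId, date), so reports should still read it with sum(count) and GROUP BY rather than trusting a single row's `count`. The Python sketch below models that behavior with plain dictionaries; it illustrates the semantics only, not how ClickHouse stores parts, and the sample keys are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]  # (tenantId, campaignId, date) - hypothetical values below
Row = Tuple[Key, int]       # (sorting key, count)


def mv_rows_for_insert(events: List[Key]) -> List[Row]:
    """What the view emits for ONE inserted block: counts grouped within that block only."""
    counts: Dict[Key, int] = defaultdict(int)
    for key in events:
        counts[key] += 1
    return list(counts.items())


def merge(rows: List[Row]) -> List[Row]:
    """Model of a SummingMergeTree merge: rows sharing a key are summed into one."""
    merged: Dict[Key, int] = defaultdict(int)
    for key, count in rows:
        merged[key] += count
    return sorted(merged.items())


def campaign_total(rows: List[Row], tenant: str, campaign: str) -> int:
    """Read path equivalent to sum(count): correct before and after merges."""
    return sum(c for (t, cid, _), c in rows if t == tenant and cid == campaign)


if __name__ == "__main__":
    k1 = ("t1", "camp-a", "2022-12-01")
    k2 = ("t1", "camp-a", "2022-12-02")
    # Two separate insert batches -> the view writes partial rows for k1 twice
    table = mv_rows_for_insert([k1, k1, k2]) + mv_rows_for_insert([k1])
    print(len(table), campaign_total(table, "t1", "camp-a"))    # before merge
    merged = merge(table)
    print(len(merged), campaign_total(merged, "t1", "camp-a"))  # after merge
```

The row count shrinks after the merge while the summed total stays the same, which is why aggregating on read gives consistent report numbers regardless of merge timing.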
// 10
ClickHouse Key Features - SAMPLE CLAUSE
● The SAMPLE clause in a ClickHouse query is very efficient for approximate SELECT query processing.
● On large datasets, results with the SAMPLE clause are close to exact, while query processing time improves drastically.
● Business trend analysis, where approximate results are good enough, is an ideal use case.
● Choose the sampling factor based on the expected dataset size.
SELECT
    eventname,
    round(count(*) * 10) AS count  -- scale the 10% sample back up: 1 / 0.1 = 10
FROM <table>
SAMPLE 0.1
WHERE tenantid = 'x'
    AND ….
GROUP BY eventname
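SAMPLE only works on MergeTree tables created with a SAMPLE BY expression, typically a hash of the user ID that is also part of the sorting key; ClickHouse then reads roughly the requested fraction of the sampling key's range, and counts must be scaled back up, as the `* 10` does for SAMPLE 0.1 (the _sample_factor virtual column can supply that factor instead). The Python sketch below imitates hash-based sampling on synthetic data to show why the scaled estimate lands close to the exact count. The hash function, data sizes and event names are assumptions for illustration only.

```python
import hashlib
import random
from collections import Counter
from typing import Dict, List, Tuple

Event = Tuple[int, str]  # (user_id, eventname) - synthetic data


def sampling_key(user_id: int) -> int:
    """Deterministic 32-bit hash of the user ID (stand-in for something like intHash32(uid))."""
    digest = hashlib.blake2b(str(user_id).encode(), digest_size=4).digest()
    return int.from_bytes(digest, "big")


def in_sample(user_id: int, fraction: float) -> bool:
    """Keep rows whose key falls in the first `fraction` of the 32-bit range."""
    return sampling_key(user_id) < fraction * 2**32


def exact_counts(events: List[Event]) -> Dict[str, int]:
    return dict(Counter(name for _, name in events))


def sampled_counts(events: List[Event], fraction: float) -> Dict[str, int]:
    sample = Counter(name for uid, name in events if in_sample(uid, fraction))
    # Scale back up by 1 / fraction, like round(count(*) * 10) for SAMPLE 0.1
    return {name: round(n / fraction) for name, n in sample.items()}


def synthetic_events(n_users: int, seed: int = 7) -> List[Event]:
    """Each hypothetical user produces 1-19 events with random names."""
    rng = random.Random(seed)
    names = ["pageview", "click", "share"]
    return [(uid, rng.choice(names))
            for uid in range(n_users)
            for _ in range(rng.randint(1, 19))]


if __name__ == "__main__":
    events = synthetic_events(20_000)
    exact = exact_counts(events)
    approx = sampled_counts(events, 0.1)
    for name in sorted(exact):
        err = abs(approx[name] - exact[name]) / exact[name]
        print(f"{name}: exact={exact[name]}, sampled~{approx[name]}, error={err:.1%}")
```

Because the sample is chosen by a hash of the user ID, the same users are always in or out of the sample, so repeated dashboard loads give stable numbers, and per-user metrics stay internally consistent. The relative error shrinks as the number of sampled users grows, which is why the sampling factor should be chosen with the expected dataset size in mind.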
// 11
ClickHouse Key Features - PARAMETRIC AGGREGATE FUNCTION
● Fast and easy to use.
● The windowFunnel function provided by ClickHouse is used for:
1. Funnel analysis
2. Creating segments that target users in a given funnel stage, e.g. sending an email to users in the Responded stage.
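SELECT
    funnel_level,
    count() AS c
FROM
(
    SELECT
        uidint,
        -- 2678400 = 31 days, assuming grx_timestamp is a DateTime (seconds)
        windowFunnel(2678400)(grx_timestamp, eventname = 'lead', eventname = 'prospect', ....) AS funnel_level
    FROM <table>
    WHERE (....)
    GROUP BY uidint
)
GROUP BY funnel_level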
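To make the funnel semantics concrete, the Python sketch below approximates windowFunnel's default mode as the ClickHouse documentation describes it: events are processed in timestamp order, a chain starts at an event matching the first condition, each later condition must follow the previous step within `window` seconds of the chain's first event, and the result is the length of the longest chain found. It ignores the optional modes (strict_order, strict_deduplication, strict_increase), assumes each event matches at most one step, and the step name 'responded' is hypothetical. Counting users per level mirrors the outer GROUP BY funnel_level.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Event = Tuple[int, str]  # (timestamp_seconds, eventname)


def window_funnel(events: Sequence[Event], window: int, steps: Sequence[str]) -> int:
    """Longest prefix of `steps` completed in order within `window` seconds of its first step."""
    step_index = {name: i for i, name in enumerate(steps)}
    # chain_start[i] = start timestamp of a chain that has reached step i (None if none yet)
    chain_start: List[Optional[int]] = [None] * len(steps)
    for ts, name in sorted(events, key=lambda e: e[0]):
        i = step_index.get(name)
        if i is None:
            continue  # event is not part of the funnel
        if i == 0:
            chain_start[0] = ts  # a new chain starts here
        elif chain_start[i - 1] is not None and ts <= chain_start[i - 1] + window:
            chain_start[i] = chain_start[i - 1]  # extend the chain, keeping its start time
            if i == len(steps) - 1:
                return len(steps)  # full funnel completed
    for level in range(len(steps), 0, -1):
        if chain_start[level - 1] is not None:
            return level
    return 0


def funnel_distribution(events_by_user: Dict[int, List[Event]], window: int,
                        steps: Sequence[str]) -> Dict[int, int]:
    """Equivalent of SELECT funnel_level, count() ... GROUP BY funnel_level."""
    levels = Counter(window_funnel(ev, window, steps) for ev in events_by_user.values())
    return dict(sorted(levels.items()))


if __name__ == "__main__":
    WINDOW = 2_678_400  # 31 days in seconds, as in the query above
    steps = ["lead", "prospect", "responded"]  # 'responded' is hypothetical
    users = {
        1: [(0, "lead"), (3_600, "prospect"), (7_200, "responded")],
        2: [(0, "lead"), (3_000_000, "prospect")],  # prospect came too late
        3: [(0, "prospect"), (60, "lead")],         # wrong order
        4: [(10, "view")],                          # never entered the funnel
    }
    print(funnel_distribution(users, WINDOW, steps))
```

For segmentation (the second use case above), the same per-user level can be filtered instead of counted, e.g. keeping the users whose level equals the index of the stage you want to target.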
Thank You!

OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Engagement Platform - Prafulla Gupta
