Assisting Millions of Users in Real-Time
Big Data Tech Warsaw 2018
2
The Speakers
Who are these guys?
Alexey Brodovshuk
@alexeybrod
Krzysztof Zarzycki
@k_zarzyk
3
About Kcell
Kcell JSC is part of the largest Scandinavian telecommunications holding, Telia Company
We like innovations: Kcell has a strong software development team and lots of experience in building services and products
Largest GSM operator in Kazakhstan: > 10 000 000 subscribers
Great network coverage: 4G (40%), 3G (73%), 2G (96%) of the population
Not only telco: there is an ongoing process of company digital transformation
4
Business needs
Assisting Millions of Users in Real-Time
[Diagram: Input (SMS events, voice usage events, data usage events, roaming events, location events) → Process → Actions]
5
Use Cases
Use case scenarios. Just a few of many.
Balance Top Up Case
If a subscriber tops up her balance too often in a short period of time, we can offer her a less expensive tariff or auto-payment services.
Trigger UI
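A minimal sketch of how the top-up rule could be evaluated with a sliding time window per subscriber. The class name, the limit of 3 top-ups and the one-hour window are hypothetical; the slides do not state the real thresholds or implementation. It assumes events arrive in timestamp order.

```python
from collections import defaultdict, deque


class TopUpTrigger:
    """Fires when a subscriber tops up too often within a time window."""

    def __init__(self, max_top_ups=3, window_seconds=24 * 3600):
        # Both limits are illustrative assumptions, not values from the talk.
        self.max_top_ups = max_top_ups
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # subscriber id -> top-up timestamps

    def on_top_up(self, subscriber_id, timestamp):
        """Record a top-up; return True if an offer should be triggered."""
        history = self._history[subscriber_id]
        history.append(timestamp)
        # Drop top-ups that have fallen out of the sliding window.
        while history and history[0] <= timestamp - self.window_seconds:
            history.popleft()
        return len(history) > self.max_top_ups


trigger = TopUpTrigger(max_top_ups=3, window_seconds=3600)
# Four top-ups within one hour: only the fourth exceeds the limit.
results = [trigger.on_top_up("sub-1", t) for t in (0, 600, 1200, 1800)]
print(results)
```

A production job would keep this history in the stream processor's keyed state rather than in a Python dictionary.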
6
Roaming and Fraud
Roaming case
Trigger to the Marketing Platform if a subscriber visited country X and/or registered in visited mobile network Y, and his device type is Z.
Fraud case in roaming
Send an email to the anti-fraud unit if a subscriber registered in roaming but his balance at the moment is equal to 0. This situation is impossible in the standard case.
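Both roaming rules can be written as simple predicates over a registration event. This is only a sketch: the event fields, the function names and the use of "KZ" as the home country code are assumptions for illustration, and the alert is returned as a string instead of being emailed.

```python
from dataclasses import dataclass

HOME_COUNTRY = "KZ"  # assumption: country code used to tell home from roaming


@dataclass(frozen=True)
class RoamingRegistration:
    # Hypothetical event shape; the real schema is not shown in the slides.
    subscriber_id: str
    visited_country: str
    visited_network: str
    device_type: str
    balance: float


def is_roaming(event):
    return event.visited_country != HOME_COUNTRY


def marketing_trigger(event, country, network, device_type):
    """Roaming case, 'AND' variant: country X, network Y and device type Z."""
    return (
        is_roaming(event)
        and event.visited_country == country
        and event.visited_network == network
        and event.device_type == device_type
    )


def fraud_alert(event):
    """Fraud case: registered in roaming while the balance equals 0."""
    if is_roaming(event) and event.balance == 0:
        return f"Possible roaming fraud: subscriber {event.subscriber_id}"
    return None


event = RoamingRegistration("sub-7", "PL", "net-a", "smartphone", 0.0)
print(marketing_trigger(event, "PL", "net-a", "smartphone"))
print(fraud_alert(event))
```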
7
Old System
Why did we start to look for the new solution?
External Vendor Solution
1 Blackbox solution: Kcell developers can’t fix, tweak or optimize it
2 Scalability issues: limited to ~2000 events / sec; can’t support all needed data sources
3 Not reliable: multiple incidents which took too much time to resolve
8
Scale
Required system throughput
160K events / second
10M subscribers
22.15TB / month
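As a rough sanity check, the figures above can be related to each other. The sketch assumes that 160K events / second is a sustained rate, that a month has 30 days, and that TB means 10^12 bytes; none of this is stated on the slide. If 160K is a peak rate, the average event would be larger than this estimate.

```python
EVENTS_PER_SECOND = 160_000
BYTES_PER_MONTH = 22.15e12           # assumption: decimal terabytes
SECONDS_PER_MONTH = 30 * 24 * 3600   # assumption: 30-day month

# Events per month at a sustained rate, and the implied average event size.
events_per_month = EVENTS_PER_SECOND * SECONDS_PER_MONTH
avg_event_bytes = BYTES_PER_MONTH / events_per_month

print(f"{events_per_month:,} events/month, ~{avg_event_bytes:.0f} bytes/event")
```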
9
About GetInData
Big Data. Passion. Experience.
Roots at Spotify
Focus on Big Data from Day 1
Production Experience
Contributions to Apache Flink
10
New Solution
Real-time Stream Processing
[Diagram: ingestion (HTTP push/pull, FTP, NFS, MQ), events hub, events processing, outgestion (HTTP push/pull, FTP, MQ)]
11
New Solution
Real-time Stream Processing
[Diagram: ingestion (HTTP push/pull, FTP, NFS, MQ), events hub, events processing (Flink), outgestion (HTTP push/pull, FTP, MQ)]
12
Processing Flow
Real-time Stream Processing
[Diagram labels: ingestion → raw call events, data usage events → transform → transformed events → processing with local state (RocksDB) → notification events → outgestion → HTTP calls; Admin UI → submit/stop triggers → control topic]
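The submit/stop flow through the control topic can be sketched as a processor that consumes two kinds of input: control messages that change the set of active triggers, and events that are checked against them. The message fields and class name are hypothetical, and Python callables stand in for whatever serialized trigger definitions the real Admin UI publishes.

```python
class TriggerProcessor:
    """Evaluates events against triggers that can be changed at runtime."""

    def __init__(self):
        self._active = {}  # trigger id -> predicate over an event

    def on_control(self, message):
        """Handle a message from the (hypothetical) control topic."""
        if message["action"] == "submit":
            self._active[message["trigger_id"]] = message["predicate"]
        elif message["action"] == "stop":
            self._active.pop(message["trigger_id"], None)

    def on_event(self, event):
        """Return the ids of all active triggers that match the event."""
        return [tid for tid, matches in self._active.items() if matches(event)]


processor = TriggerProcessor()
processor.on_control({
    "action": "submit",
    "trigger_id": "zero-balance",
    "predicate": lambda e: e["balance"] == 0,
})
fired = processor.on_event({"subscriber_id": "sub-1", "balance": 0})
processor.on_control({"action": "stop", "trigger_id": "zero-balance"})
fired_after_stop = processor.on_event({"subscriber_id": "sub-1", "balance": 0})
print(fired, fired_after_stop)
```

Sending trigger changes as messages means triggers can be added or stopped without restarting the processing job.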
13
New Solution (Operations)
Web UI, Monitoring, Security
[Diagram: ingestion (HTTP push/pull, FTP, NFS, MQ), events hub, events processing (Flink), outgestion (HTTP push/pull, FTP, MQ)]
Admin UI (Triggers workbench)
API (Kafka-based)
Monitoring: ELK stack (logs), InfluxDB/Grafana (metrics)
Security: FreeIPA, Kerberos, LDAP/AD
14
New Solution (Data Lake)
Data Lake and Sub-second OLAP Analytics
[Diagram: ingestion (HTTP push/pull, FTP, NFS, MQ), events hub, events processing (Flink), outgestion (HTTP push/pull, FTP, MQ)]
Data Lake: Historical Storage (HDFS), Batch (Spark), SQL (Hive). Keep history, Report, Explore
Column-oriented data store: online OLAP (Druid)
15
Decisions made
Some decisions our team made before or during project implementation
Powerful Real-Time Analytics
Streaming-first approach
Apache Kafka for event hub
Apache Flink
16
Performance
Apache Avro
Keep state local to the process
Ingest reference data for local joins and enrichment
● No need to query external systems while processing (not at >100K events / sec)
● Data time correlation correctness
[Diagram: transformed events and subscriber profile data (events) feed Local State]
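The local-join idea can be sketched as follows: subscriber profile data arrives as events and is stored in local state, and each usage event is enriched from that state with no call to an external system. The field names are hypothetical, and a plain dictionary stands in for the keyed, RocksDB-backed state a Flink job would use.

```python
class LocalEnricher:
    """Joins usage events with the latest known subscriber profile."""

    def __init__(self):
        self._profiles = {}  # stand-in for keyed local state

    def on_profile_event(self, subscriber_id, profile):
        # Reference data is ingested as a stream and kept locally.
        self._profiles[subscriber_id] = profile

    def on_usage_event(self, event):
        # Local lookup only: no external query per event.
        profile = self._profiles.get(event["subscriber_id"], {})
        return {**event, **profile}


enricher = LocalEnricher()
enricher.on_profile_event("sub-1", {"tariff": "basic"})
first = enricher.on_usage_event({"subscriber_id": "sub-1", "bytes": 1024})

# A later profile change only affects events processed after it.
enricher.on_profile_event("sub-1", {"tariff": "premium"})
second = enricher.on_usage_event({"subscriber_id": "sub-1", "bytes": 2048})
print(first, second)
```

Because profile changes and usage events are processed in stream order, each event is joined with the profile as it was when the event was handled, which is one way to read the "data time correlation correctness" point.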
17
Ease of Use
NiFi for data ingestion (no coding)
● but not for CEP
Web UI for configuring triggers
18
Reliability and battle-tested techniques
Flink on YARN, with HDFS
HA for redundancy and running ~24/7
InfluxDB & Grafana for monitoring & alerting
ELK for log collection and aggregation
Security
Kerberos and AD thanks to FreeIPA
Apache Ranger for authorization
19
Extensiveness
One platform for the whole Enterprise
Batch (ad hoc) queries too
● Spark, Hive/Presto
Online analytics
● OLAP
Cost-Efficiency
Open-source technologies
HDP as a licence-free distribution
Just start with a bunch of servers
20
Before You Start
Words of wisdom
1 A simple, sketchy trigger request quickly becomes a complex algorithm
2 Know your data sources
● Do NOT assume that your data sources are ready for streaming
3 DO prepare yourself to use open-source
● NiFi is a great framework, but not a comprehensive set of processors
● HDP is a great distribution, but versions in it are quickly outdated
4 Start small, start fast
21
Our Collaboration
Two heads are better than one
Joint development team: not a vendor solution, development as one team
Code quality: code review and automated tools for code quality control
Agile Practices: distant geographic locations, but daily standups
Go live quickly! <4 months to first production case running 24/7!
Deliver
DevOps/Automation
Knowledge sharing: constant knowledge exchange in areas of expertise
Testing: separate testing environment, automated unit/E2E tests
22
Future Work
We have already done a lot. But more great things are coming.
Make it a company-wide, self-service go-to place for data analysis
Timeline: 2018 Q2, 2018 Q3, 2018 Q4, Bright Future
More Data Sources: geolocation data, CDRs, equipment logs
More Triggers
Data Lake
Machine Learning: we plan to include machine learning and other tools that would enhance our platform even more
Real-time BI: intraday view on business and operations
Customer 360 view: call center, clickstream, communication… all in one place, ready for behavioral analysis
Data Monetization: monetize valuable insights from our combined rich data sources
Network Optimization: predictive maintenance, to lower operational costs and make better investments
And many more...
Questions?
Big Data Tech Warsaw 2018
Contact us:
zarz@getindata.com
alexey.brodovshuk@gmail.com

Assisting millions of active users in real-time - Alexey Brodovshuk, Kcell; Krzysztof Zarzycki, GetInData
