Fast Data Processing with RFX
Simplify Fast Data Processing
tantrieuf31@gmail.com
http://www.rfxlab.com
The Big Picture
Demo first
Contents at a glance
1. BEAM✲ methodology for agile data warehouse
2. Introduction to Fast Data
3. Problem “Fast Data in web analytics”
4. Examples of the fast data design pattern (RFX, or Reactive Function X)
4.1. Event data actor
4.2. Event data agent
4.3. Event data collector
4.4. Event data router
4.5. Event data processor
4.6. Event data storage
4.7. Event data query
4.8. Event data reactor
5. Demo “Fast Data in web analytics” with source code explanation
1 - BEAM✲ methodology
1 - BEAM✲ methodology for Agile Data Warehouse
BEAM✲ stands for Business Event Analysis &
Modelling, and it’s a methodology for gathering business
requirements for Agile Data Warehouses and building
those warehouses.
It was developed by Lawrence Corr (@LawrenceCorr) and
Jim Stagnitto (@JimStag), and published in their book Agile
Data Warehouse Design: Collaborative Dimensional
Modeling, from Whiteboard to Star Schema.
Example with BEAM✲
Goal: Model all business events and load them into a database
in an agile way
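As a rough illustration of that goal, the sketch below describes one business event (a pageview) using the 7Ws that BEAM✲ uses to describe events (who, what, when, where, how many, why, how) and flattens it into a row that could be stored in a database table. The class name, field names and sample values are assumptions made for this example; they do not come from the slides or from the RFX source code.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class PageviewEvent:
    """One business event described by the 7Ws (all field names are hypothetical)."""
    who: str        # visitor identifier
    what: str       # the page that was viewed
    when: datetime  # event timestamp
    where: str      # e.g. the visitor's country
    how_many: int   # quantity measured by the event
    why: str        # e.g. referrer or campaign
    how: str        # e.g. device or channel


def to_row(event: PageviewEvent) -> dict:
    """Flatten the event into a dict that could be inserted as one table row."""
    row = asdict(event)
    row["when"] = event.when.isoformat()  # store the timestamp as text
    return row


event = PageviewEvent(
    who="user-42",
    what="/home",
    when=datetime(2017, 1, 1, 12, 0, tzinfo=timezone.utc),
    where="VN",
    how_many=1,
    why="search",
    how="mobile",
)
row = to_row(event)
```

Each of the seven fields would typically become a column or a dimension reference in the resulting fact table.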
2 - Fast Data
Introduction to Fast Data
Fast Data processing with RFX
3 - Problems in Practice
Problems
“Fast Data in web analytics”
1. Counting the pageviews of a website
2. Counting the unique users of a website
3. Sending an email when the pageview count is abnormal (simple DDoS
attack detection)
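The three problems above can be sketched in memory as follows. This is a minimal sketch, not the RFX implementation: the class `PageviewStats`, the per-minute bucketing and the fixed threshold rule are assumptions chosen for illustration, and the email step is reduced to a boolean return value.

```python
from collections import defaultdict


class PageviewStats:
    """In-memory counters per minute bucket (hypothetical helper, not part of RFX)."""

    def __init__(self, alert_threshold: int):
        self.alert_threshold = alert_threshold
        self.pageviews = defaultdict(int)  # minute -> number of pageviews
        self.users = defaultdict(set)      # minute -> set of user ids seen
        self.alerted = set()               # minutes that already raised an alert

    def record(self, minute: str, user_id: str) -> bool:
        """Record one pageview.

        Returns True only on the call that first pushes the minute's
        pageview count above the threshold, so one alert is raised per minute.
        """
        self.pageviews[minute] += 1
        self.users[minute].add(user_id)
        too_many = self.pageviews[minute] > self.alert_threshold
        if too_many and minute not in self.alerted:
            self.alerted.add(minute)
            return True
        return False

    def unique_users(self, minute: str) -> int:
        return len(self.users[minute])


stats = PageviewStats(alert_threshold=3)
alerts = [stats.record("12:00", user) for user in ["a", "b", "a", "c", "a"]]
```

An exact set of user ids is the simplest way to count unique users; at larger scale an approximate structure would usually replace it, which is outside the scope of this sketch.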
4 - Thinking with RFX
What is RFX (Reactive Function X)?
● A design pattern to solve big fast data problems
● A collection of open source tools
● The mission of RFX
1. Build data products quickly with design patterns
2. Apply BEAM✲ to build agile data pipelines
3. React to critical events in near-real-time
Philosophy of RFX
How to solve problems with RFX?
“Fast Data in web analytics”
1. Counting the pageviews of a website
2. Counting the unique users of a website
3. Sending an email when the pageview count is abnormal (simple
DDoS attack detection)
Applying RFX to Pageview Analytics
1.1. Event data actor: a web user
1.2. Event data agent: RFX-track-js
1.3. Event data collector: RFX-track-server
1.4. Event data router: Apache Kafka
1.5. Event data processor: RFX-stream
1.6. Event data storage: Redis, MySQL
1.7. Event data query: RFX-data-api
1.8. Event data reactor: RFX-reactor
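The roles listed above can be wired together in a few lines. This is a hedged sketch only: a `queue.Queue` stands in for the Apache Kafka router, a plain dict stands in for Redis, a list stands in for outgoing email, and every function name is hypothetical rather than taken from RFX-track-js, RFX-track-server, RFX-stream, RFX-data-api or RFX-reactor.

```python
import queue

router = queue.Queue()  # stand-in for a Kafka topic (event data router)
storage = {}            # stand-in for Redis counters (event data storage)
sent_alerts = []        # stand-in for outgoing email (output of the reactor)


def agent_track(user_id, url):
    """Event data agent: build the event a tracking script would send."""
    return {"user": user_id, "url": url}


def collector_receive(event):
    """Event data collector: accept well-formed events and hand them to the router."""
    if "user" in event and "url" in event:
        router.put(event)


def processor_run():
    """Event data processor: drain the router and update the counters."""
    while not router.empty():
        event = router.get()
        key = "pageview:" + event["url"]
        storage[key] = storage.get(key, 0) + 1


def query_pageviews(url):
    """Event data query: read a counter back from storage."""
    return storage.get("pageview:" + url, 0)


def reactor_check(url, threshold):
    """Event data reactor: raise an alert when a counter exceeds the threshold."""
    count = query_pageviews(url)
    if count > threshold:
        sent_alerts.append(f"{url} has {count} pageviews")


# Event data actors: three web users viewing the same page.
for user in ["u1", "u2", "u3"]:
    collector_receive(agent_track(user, "/home"))
collector_receive({"url": "/broken"})  # dropped by the collector: no user field
processor_run()
reactor_check("/home", threshold=2)
```

Keeping each role behind its own function mirrors the idea that any stage can be swapped (for example, the in-memory queue for a real message broker) without touching the others.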
Demo and explanation of code and concepts
https://github.com/rfxlab/pageview-analytics-with-rfx
Readings
● http://www.decisionone.co.uk/press/agile-data-warehouse-design-sampler.pdf
● http://www.slideshare.net/votrongdao/agile-data-warehouse-34427798
● Apache Kafka Installation Video | How To Setup Apache Kafka https://youtu.be/Fg8cTsEk7Gc
● https://www.tutorialspoint.com/apache_kafka/
● https://kafka.apache.org/quickstart
● http://xyu.io/2015/07/13/building-a-faster-etl-pipeline-with-flume-kafka-and-hive/
● http://blog.cloudera.com/blog/2015/06/architectural-patterns-for-near-real-time-data-processing-with-apache-hadoop/
● https://www.oreilly.com/ideas/drivetrain-approach-data-products
