Real-time Big Data at FPT
and some key ideas for building a real-time big data platform
from open source tools
○ Apache Spark
○ Reactive Function X (RFX)
Presented by @tantrieuf31
http://nguyentantrieu.info
About me
● Full Stack Engineer and Tech Lead at AdsPlay, a startup project from FPT Telecom
● Founder at RFXLab.com, building the RFX framework and a Fast Data Intelligence Platform for data-driven organizations
● Tech Blogger at http://engineering.adsplay.net
Abstract
1. Just 5 minutes on the history of “Big Data”
2. Does Big Data solve big problems?
3. Overview of open source tools
a. Netty (Event Collector)
b. Kafka (Event Queue)
c. RFX-Stream (Event Processor)
d. Apache Spark (Big Data processing engine)
e. RFX-Iris (Fast Data Query Interface)
5 minutes on the history of “Big Data”
Imagine: what if you had to build a GREAT pyramid?
In fact, Big Data was born more than 3000 years ago.
When you build something great, you face
decisions that depend on lots of data.
How?
Decisions
without Data?
Real-time Big Data at FPT (for TechCamp University)
OK, let’s get back to 2015
What if the business is not driven by data?
Refer: http://www.nytimes.com/2011/04/24/business/24unboxed.html
Since 2015, Fast Data, a new trend,
has been replacing Big Data
http://www.tibco.com/blog/2015/03/27/how-analytics-facilitates-fast-data
Data Management Technology and Trends (1970s → 2010s)
● 1970s–1990s: Oracle, MySQL, PostgreSQL, ...
● 2000s: Hadoop Ecosystem, NoSQL Ecosystem, ...
● 2010s: Netty.io, Apache Storm, Apache Kafka, Apache Spark, RFX, ...
Does Big Data solve our big problems?
● Tracking all access logs and user activities
● Processing in real time (within seconds)!
● Storing multiple types of logs (video, web, mobile, like, comment, play, …)
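The three requirements above (track every activity, process within seconds, store many log types) can be illustrated in a few lines. This is a minimal Python sketch under stated assumptions, not the AdsPlay implementation: the `LogEvent` fields and the `TumblingWindowCounter` class are hypothetical names, and the 5-second window is an arbitrary choice. It groups events by type into fixed, non-overlapping time windows, which is one common way to get second-scale answers from a stream.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEvent:
    """One tracked user activity (hypothetical field names)."""
    user_id: str
    event_type: str   # e.g. "video", "web", "mobile", "like", "comment", "play"
    timestamp: float  # seconds since epoch


class TumblingWindowCounter:
    """Counts events per type in fixed, non-overlapping windows of `window_seconds`."""

    def __init__(self, window_seconds: int = 5):
        self.window_seconds = window_seconds
        # window start (seconds) -> event type -> count
        self.counts = defaultdict(lambda: defaultdict(int))

    def add(self, event: LogEvent) -> None:
        # Align the timestamp down to the start of its window.
        start = int(event.timestamp // self.window_seconds) * self.window_seconds
        self.counts[start][event.event_type] += 1

    def window(self, start: int) -> dict:
        """Return a plain dict of per-type counts for the window beginning at `start`."""
        # .get() does not trigger the defaultdict factory, so unknown windows stay absent.
        return dict(self.counts.get(start, {}))
```

In the deck's architecture this kind of windowing presumably belongs to the event processor (RFX-Stream or Apache Spark) rather than to application code.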
http://www.rfxlab.com
Boosting sales revenue / profit
Log events
Reactive events
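The slide connects log events to reactive events. One way to read this, assuming a rule-based design that the deck does not spell out, is that each incoming log event is checked against rules that may emit a follow-up “reactive” event, such as triggering a recommendation. The `Event`, `ReactiveEngine` and `repeated_play_rule` names below are hypothetical, as is the three-plays threshold.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Event:
    user_id: str
    name: str  # e.g. "play", "like" (log events) or "recommend_similar" (reactive)


# A rule inspects one user's log history and may return a reactive event name.
Rule = Callable[[List[Event]], Optional[str]]


def repeated_play_rule(history: List[Event]) -> Optional[str]:
    """Hypothetical rule: 3 or more plays signals interest, so react with a recommendation."""
    plays = sum(1 for e in history if e.name == "play")
    return "recommend_similar" if plays >= 3 else None


class ReactiveEngine:
    def __init__(self, rules: List[Rule]):
        self.rules = rules
        self.history: Dict[str, List[Event]] = {}

    def on_log_event(self, event: Event) -> List[Event]:
        """Record a log event and return the reactive events it triggers."""
        history = self.history.setdefault(event.user_id, [])
        history.append(event)
        reactions = []
        for rule in self.rules:
            name = rule(history)
            if name is not None:
                reactions.append(Event(event.user_id, name))
        return reactions
```

As written, the rule fires again on every play after the third; a production version would likely deduplicate reactions or expire old history.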
How is Big Data used at FPT?
Do Vietnamese people love football?
The correlation says YES
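The slide does not say which correlation measure or which data series were used. A common choice is Pearson's correlation coefficient r, computed for example between an hourly count of live football matches and hourly traffic. Both the measure and the series are assumptions here. A value near +1 means the two series rise and fall together, and a value near −1 means they move in opposite directions.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient r between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, without the common 1/n factor (it cancels out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)
```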
Analyzing trending events in real time!
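Finding trending events in real time usually means counting items over a recent time window and ranking them. The sketch below keeps a sliding window with a `deque` and a `Counter` from the Python standard library. The class name, the 60-second default and the assumption that events arrive in timestamp order are illustrative choices, not taken from the deck.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple


class SlidingWindowTrends:
    """Top-k most frequent keys over the last `window_seconds` (sliding window)."""

    def __init__(self, window_seconds: float = 60.0):
        self.window_seconds = window_seconds
        self.events: Deque[Tuple[float, str]] = deque()  # (timestamp, key), oldest first
        self.counts: Counter = Counter()

    def add(self, key: str, timestamp: float) -> None:
        # Assumes events arrive in non-decreasing timestamp order.
        self.events.append((timestamp, key))
        self.counts[key] += 1
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Drop events that have slid out of the window (older than now - window).
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def top(self, k: int, now: float) -> List[Tuple[str, int]]:
        """Return the k most frequent keys within the window ending at `now`."""
        self._evict(now)
        return self.counts.most_common(k)
```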
Visualizing all users’ devices
Real-time Big Data Architecture
How to build a “Just-Work” real-time big data system?
KEY IDEA: “divide and conquer”
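“Divide and conquer” here can be read as splitting the system into independent stages that match the tools listed in the abstract: collect (Netty), queue (Kafka), process (RFX-Stream / Apache Spark) and query (RFX-Iris). The single-process Python sketch below only illustrates that separation. A standard-library `queue.Queue` stands in for a Kafka topic, a `Counter` stands in for the processed data store, and the `collect`, `process` and `query_top` functions and the `user,event_type,item` line format are hypothetical.

```python
import queue
from collections import Counter


def collect(raw_line: str, topic: queue.Queue) -> None:
    """Collector stage (Netty's role in the deck): parse a raw log line and enqueue it."""
    user_id, event_type, item = raw_line.strip().split(",")
    topic.put({"user": user_id, "type": event_type, "item": item})


def process(topic: queue.Queue, store: Counter) -> None:
    """Processor stage (RFX-Stream / Spark's role): drain the queue and aggregate plays."""
    while True:
        try:
            event = topic.get_nowait()
        except queue.Empty:
            break
        if event["type"] == "play":
            store[event["item"]] += 1


def query_top(store: Counter, k: int):
    """Query stage (RFX-Iris's role): answer a question from the aggregated store."""
    return store.most_common(k)
```

Because each stage only talks to the queue or the store, any one of them can in principle be scaled or replaced (for example, swapping the in-memory queue for a real Kafka topic) without rewriting the others.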
User Story in plain English
1. Hercules is thinking about a question, e.g. “What are the hot songs of Nhacso on Facebook?”
2. He decides to ask Iris this question.
3. Iris breaks the question down into “query messages” and delivers them to Zeus.
4. Zeus uses his power of “large-scale data processing” to answer the question.
5. Done: Zeus returns the result (“hot songs on Facebook”) to Iris.
6. She sends the result to Hercules.
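The user story maps onto a query-message flow. Assuming Iris plays the role of the query interface (RFX-Iris in the abstract) and Zeus the large-scale processing engine (Apache Spark in the abstract), a toy version might look like the sketch below. The `QueryMessage` fields, the rule that “hot” means “most shared on Facebook”, and the single hard-coded question pattern are all assumptions for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class QueryMessage:
    """A structured query produced by Iris (hypothetical fields)."""
    source: str  # e.g. "facebook"
    site: str    # e.g. "nhacso"
    action: str  # which events to count, e.g. "share"
    limit: int   # how many results to return


def iris_parse(question: str) -> QueryMessage:
    """Iris: turn a plain-English question into a query message.
    This sketch recognises only one question shape; real parsing would be far richer."""
    q = question.lower()
    if "hot songs" in q and "facebook" in q:
        return QueryMessage(source="facebook", site="nhacso", action="share", limit=3)
    raise ValueError("question not understood: " + question)


def zeus_execute(msg: QueryMessage,
                 events: Iterable[Tuple[str, str, str, str]]) -> List[Tuple[str, int]]:
    """Zeus: scan (source, site, action, song) events and rank matching songs by count."""
    counts = Counter(
        song for source, site, action, song in events
        if (source, site, action) == (msg.source, msg.site, msg.action)
    )
    return counts.most_common(msg.limit)


def hercules_ask(question: str,
                 events: Iterable[Tuple[str, str, str, str]]) -> List[Tuple[str, int]]:
    """Hercules asks Iris, Iris forwards the message to Zeus, and the result flows back."""
    return zeus_execute(iris_parse(question), events)
```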
Visualizing our user story
Question about Big Data: “What are the hot songs of NhacSo.net on Facebook?”
Diagram: Hercules asks Iris; Iris sends query messages to Zeus.
Let’s see how it works
Awesome Open Source Projects to follow
RFXLab.com
◎ http://www.rfxlab.com
◎ https://github.com/rfxlab
Kafka: http://kafka.apache.org
Hadoop: http://hadoop.apache.org
Apache Spark: https://spark.apache.org
Awesome Open Source Projects to follow
Native Kafka driver: https://github.com/edenhill/librdkafka/
PHP Kafka driver: https://github.com/EVODelavega/phpkafka
Data Visualization JavaScript Library: https://github.com/nvd3-community/nvd3
Good ref books
"Spend some time alone and learn to develop
your personal resources."
Alexander Reid Martin
More info at
http://engineering.adsplay.net/jobs-at-adsplay-team
