Real-Time Analytics at Scale
Smart Data Pipes for the Internet of Things
Assaf Araki, Big Data Analytics Architect
Big Data Analytics, Intel

Intro to Big Data Analytics @ Intel
[Chart: team of 100+ people (data scientists, management, big data developers, analytics PMs); segments of 13%, 41%, 9%, and 37%]
Contribution to Data Center Group / Contribution to Intel Operations
Mission
• #1 Operational excellence
• #2 Help Intel win in the area of intelligent machines
Vision
• Analytics is a competitive advantage for Intel
Industry / Academia
• Technical due-diligence assessment for Intel Capital
• Benchmark with startups
• Academic collaborations
• Assist Intel Sales & Marketing
Design: cut validation time-to-market
Manufacturing: reduce test cost
Sales & Marketing: increase sales through analytics
Stream Analytics · Cloud · Parkinson Research · Machine Learning · Strategy

The IoT Challenge
[Diagram: Things → Ingestion → Cloud; Cloud Infrastructure, Data Platform, Analytics Platform, UI Services]

Use Case: Parkinson's Disease Research
• Clinical trials: create and validate algorithms & measures
• Population study: generate insights using big data analytics

Patient reporting
• Medication reporting
• Medication reminder
• Report
Other
• Configurable data collections
• Contribution score
• Integrated login and registration
• Pebble notifications
Objective measures
• Gait
• Sleep
• Tremor
• Activity level
• Controlled tests

So, Why Is It a Big-Data Problem?
At 1 GB per subject per day:
• Weekly: 30 subjects × 5 days per subject = 0.15 TB
• Monthly: 500 subjects × 30 days per subject = 15 TB
• Yearly: 1,000 subjects × 365 days per subject = 365 TB

[Diagram: Cloud Computing Services; layers: Service, Batch Analytics, Stream Analytics, Ingestion, Storage, User Interface; Mosquitto]

Smart Ingestion Characteristics
Personalized
• Per single device or user
• Maintain state and the data required for ML
Easy to use
• Easily subscribe to any stream
• Use familiar development languages (Java, Scala)
• Developers focus on logic development
Smart data pipe
• Apply analytics on the stream
• Trigger actions (close the feedback loop) in a timely manner
Scalability
• Linear scalability (scale out)
• Extremely high concurrency
• High throughput
Fault tolerance
• No single point of failure
• Seamless recovery
• Persistent

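To make "per single device," "maintain state," and "trigger actions" concrete, here is a minimal single-threaded sketch of a stateful pipe. Everything in it is an assumption for illustration: the `SmartPipe` name, the sliding-window mean, the threshold rule, and the callback standing in for an action such as a patient notification. It is not the deck's implementation.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/**
 * Hypothetical stateful "smart pipe": per-device sliding-window state plus a rule
 * that triggers an action. Single-threaded; not the deck's actual implementation.
 */
final class SmartPipe {
    /** State kept for one device: the most recent readings and their running sum. */
    private static final class DeviceState {
        final ArrayDeque<Double> window = new ArrayDeque<>();
        double sum;
    }

    private final Map<String, DeviceState> states = new HashMap<>();
    private final int windowSize;
    private final double threshold;
    private final BiConsumer<String, Double> onAlert; // the action that closes the feedback loop

    SmartPipe(int windowSize, double threshold, BiConsumer<String, Double> onAlert) {
        if (windowSize <= 0) throw new IllegalArgumentException("windowSize must be > 0");
        this.windowSize = windowSize;
        this.threshold = threshold;
        this.onAlert = onAlert;
    }

    /** Ingest one reading; fire onAlert if the device's full-window mean exceeds the threshold. */
    void onReading(String deviceId, double value) {
        DeviceState s = states.computeIfAbsent(deviceId, id -> new DeviceState());
        s.window.addLast(value);
        s.sum += value;
        if (s.window.size() > windowSize) {
            s.sum -= s.window.removeFirst(); // evict the oldest reading
        }
        if (s.window.size() == windowSize && s.sum / windowSize > threshold) {
            onAlert.accept(deviceId, s.sum / windowSize);
        }
    }
}
```

State is keyed by device ID, so each reading touches only its own device's window; that is what makes splitting the work per device straightforward. As written, the rule fires on every reading while the mean stays above the threshold, so a production rule would probably debounce its alerts.
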
Smart Data Ingestion: High-Level Overview
[Diagram: Device, Device, Device, Device → Scalable, Persistent Broker → Processing, Stream Analytics]

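A persistent, partitioned broker keeps per-device ordering only if all of a device's messages land in the same partition. The sketch below shows that idea with a hypothetical `DevicePartitioner` that hashes the device ID with CRC32. The hash choice is arbitrary here (Kafka's default producer partitioner hashes keyed records with murmur2), and the (deviceId, payload) message shape is assumed.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.zip.CRC32;

/**
 * Hypothetical key-based router: all messages from one device go to one partition,
 * preserving per-device order while devices spread across partitions.
 */
final class DevicePartitioner {
    private final int numPartitions;

    DevicePartitioner(int numPartitions) {
        if (numPartitions <= 0) throw new IllegalArgumentException("numPartitions must be > 0");
        this.numPartitions = numPartitions;
    }

    /** Stable partition index in [0, numPartitions) derived from the device ID. */
    int partitionFor(String deviceId) {
        CRC32 crc = new CRC32();
        crc.update(deviceId.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % numPartitions); // getValue() is non-negative
    }

    /** Append each (deviceId, payload) message to its device's partition in arrival order. */
    List<List<String>> route(List<Map.Entry<String, String>> messages) {
        List<List<String>> partitions = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) partitions.add(new ArrayList<>());
        for (Map.Entry<String, String> m : messages) {
            partitions.get(partitionFor(m.getKey())).add(m.getKey() + ":" + m.getValue());
        }
        return partitions;
    }
}
```

Different devices may share a partition, but a single device never spans two, so a downstream consumer sees each device's readings in the order they were produced.
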
What Is Akka?
• Micro-service (actor) oriented
• Message-driven
• Lock-free
• Location-transparent
• High performance
• Fault-tolerant
• Scales linearly

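The actor properties listed above (message-driven, no locks in user code, one message at a time per actor) can be illustrated without Akka. The following `MiniActor` is a hypothetical plain-Java toy, not Akka's API: each actor owns a lock-free mailbox and is drained by at most one pool thread at a time, so its private state needs no synchronization. `CounterDemo` is likewise an illustrative name.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

/** Toy actor: a mailbox drained by at most one pool thread at a time. Not Akka's API. */
final class MiniActor<M> {
    private final Queue<M> mailbox = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean scheduled = new AtomicBoolean(false);
    private final Executor executor;
    private final Consumer<M> behavior;

    MiniActor(Executor executor, Consumer<M> behavior) {
        this.executor = executor;
        this.behavior = behavior;
    }

    /** Fire-and-forget send: enqueue, then make sure a drain task is scheduled. */
    void tell(M message) {
        mailbox.offer(message);
        trySchedule();
    }

    private void trySchedule() {
        if (scheduled.compareAndSet(false, true)) executor.execute(this::drain);
    }

    private void drain() {
        try {
            M message;
            while ((message = mailbox.poll()) != null) {
                behavior.accept(message); // never runs concurrently for the same actor
            }
        } finally {
            scheduled.set(false);
            // A message may have arrived after the last poll(); reschedule if so.
            if (!mailbox.isEmpty()) trySchedule();
        }
    }
}

final class CounterDemo {
    /** Many threads tell one actor to add 1; the actor's plain int needs no lock. */
    static int run(int senders, int messagesPerSender) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        int[] count = {0}; // mutated only inside the actor's behavior
        CountDownLatch done = new CountDownLatch(senders * messagesPerSender);
        MiniActor<Integer> counter = new MiniActor<>(pool, n -> {
            count[0] += n;
            done.countDown();
        });
        Thread[] threads = new Thread[senders];
        for (int t = 0; t < senders; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < messagesPerSender; i++) counter.tell(1);
            });
            threads[t].start();
        }
        try {
            for (Thread th : threads) th.join();
            if (!done.await(10, TimeUnit.SECONDS)) throw new IllegalStateException("timed out");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return count[0];
    }
}
```

The `finally` block closes a subtle race: without the final `isEmpty()` check, a message that arrives after the last `poll()` but before the flag is cleared could sit unprocessed until the next send. Akka itself layers supervision, location transparency, and configurable dispatchers on top of this basic idea.
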
Stream Processing: the Akka Way
• Each actor is a small piece of Java or Scala code performing its role.
• A set of actors forms a topology that is responsible for processing a device's data stream.
• A single Akka node may host millions of concurrent actors handling different streams and operations.
Example actors:
• Change detection: automatic change detection
• Time rules matcher: detect and raise alerts for matched rules
• Sleep quality: calculate users' sleep quality
• Tremor detection: tremor detection based on devices’ …
• Aggregator: aggregation (50 Hz to minutes / hours)
[Diagram: Sample Parkinson Disease re… topology: Subscriber, UnZip, Parser, Aggregator, HBase Writer, Analytics Manager, Change Detection, Real Time Rules, Sleep Quality]

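The Aggregator's role (50 Hz down to minutes or hours) amounts to a tumbling-window reduction. The sketch below assumes in-order timestamps in milliseconds and summarizes each window by sample count and mean; the `WindowAggregator` name and the choice of statistics are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical tumbling-window aggregator: turns high-rate samples (e.g. 50 Hz) into one
 * summary per window (e.g. per minute). Assumes samples arrive in timestamp order.
 */
final class WindowAggregator {
    /** Summary of one closed window. */
    static final class Summary {
        final long windowStartMs;
        final int count;
        final double mean;

        Summary(long windowStartMs, int count, double mean) {
            this.windowStartMs = windowStartMs;
            this.count = count;
            this.mean = mean;
        }

        @Override
        public String toString() {
            return "Summary{start=" + windowStartMs + ", count=" + count + ", mean=" + mean + "}";
        }
    }

    private final long windowMs;
    private final List<Summary> emitted = new ArrayList<>();
    private long currentStart;
    private int count;
    private double sum;

    WindowAggregator(long windowMs) {
        if (windowMs <= 0) throw new IllegalArgumentException("windowMs must be > 0");
        this.windowMs = windowMs;
    }

    /** Add one sample, closing the previous window when the sample starts a new one. */
    void add(long timestampMs, double value) {
        long start = Math.floorDiv(timestampMs, windowMs) * windowMs;
        if (count > 0 && start != currentStart) flush();
        currentStart = start;
        count++;
        sum += value;
    }

    /** Emit the current window's summary (if it has samples) and reset the accumulator. */
    void flush() {
        if (count > 0) emitted.add(new Summary(currentStart, count, sum / count));
        count = 0;
        sum = 0.0;
    }

    List<Summary> emitted() {
        return emitted;
    }
}
```

At 50 Hz, a one-minute window collapses 3,000 samples into a single summary, which is the kind of reduction that keeps downstream storage and analytics manageable.
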
[Diagram: Stream Processing; Management Layer (“Pigeon”)]

Easy Portability With Docker & CoreOS
• CoreOS and Docker containers enable portability and ease of deployment anywhere
• Enables the flexibility to choose a set of desired containers based on a given use case's requirements
• Preconfigured containers ready to be loaded

Summary
• IoT data ingestion goes beyond moving the data into the cloud
• We have deployed a scalable, fault-tolerant, multi-protocol pipeline that enables stream analytics
• The stream analytics platform is leveraged for other IoT projects

Thank You!

Editor's Notes

  • #4: The Internet of Things (IoT) is creating unprecedented business opportunities for both individuals and organizations.
  • #5: The story: the man in the picture on the left is Andy Grove, one of Intel’s founders, who has Parkinson's disease (PD). The story begins when he reads an article in the NY Times about big data and decides to start a project within Intel related to PD and big data. He contacts the Michael J. Fox Foundation, and they decide to start a joint effort. The idea is to leverage the Internet of Things, wearable technology, and big data platforms to assist PD research. About PD: a neurodegenerative disease with movement-disorder symptoms. Existing treatments mainly improve quality of life and do not cure the disease. ~6M patients: ~1M in the US and ~5M in the rest of the world. Life expectancy: ~10-15 years. 1 out of 100 people over the age of 60 has PD. There is no diagnostic test and there are no progression markers.
  • #6: On this slide, focus on the patient-reported capabilities and the configurable data-collection strategies. For patient reporting, explain the medication reminder and reporting capabilities, which help us track patients' compliance and learn about the medication's effect on motor symptoms while providing value to the patients. The objective measures part is covered later in the presentation. In the Other section, talk about the ability to configure which sensor data to use for each cohort of users.
  • #8: A quick review of the PD solution layers as a use case of the IoT platform: a batch layer based on Spark; a storage layer using Hadoop and HBase, with MySQL for metadata; a powerful, scalable ingestion layer based on Akka & Kafka; a dynamic stream analytics layer based on the Akka actor system framework; a scalable service layer providing a set of APIs for registration and data extraction out of the platform; and a UI layer, the only layer in this diagram that is unique to the PD solution, using a Pebble watch and an Android application to collect data and interact with patients. Note that 5 of the 6 layers presented (all except the UI layer) are part of the IoT platform and can be reused for similar products / verticals.
  • #10: A multi-protocol pipeline built over Akka & Kafka. Kafka is a fast, scalable, durable, distributed messaging system: a high-throughput, low-latency platform for handling real-time data feeds. Akka is an actor-based framework for highly concurrent, distributed, and resilient message-driven applications. This layer is responsible for: pulling messages; parsing and processing; concurrent and controlled writes.
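The "concurrent and controlled writes" responsibility can be sketched with a semaphore that caps in-flight writes, so the ingestion side applies back-pressure instead of flooding the store. This is a hypothetical plain-Java illustration: `BoundedWriter`, `BoundedWriterDemo`, and the `Consumer` standing in for the HBase client are assumptions, not the team's code or the HBase API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical writer that caps concurrent writes to a backing store. */
final class BoundedWriter<R> {
    private final Semaphore permits;
    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final Consumer<R> store; // stand-in for a real store client

    BoundedWriter(int maxInFlight, Consumer<R> store) {
        this.permits = new Semaphore(maxInFlight);
        this.store = store;
    }

    /** Blocks the caller while maxInFlight writes are already running (back-pressure). */
    void write(R record) throws InterruptedException {
        permits.acquire();
        try {
            pool.execute(() -> {
                try {
                    store.accept(record);
                } finally {
                    permits.release();
                }
            });
        } catch (RuntimeException e) {
            permits.release(); // task was rejected; return the permit
            throw e;
        }
    }

    /** Stop accepting work and wait for in-flight writes to finish. */
    boolean close(long timeout, TimeUnit unit) throws InterruptedException {
        pool.shutdown();
        return pool.awaitTermination(timeout, unit);
    }
}

final class BoundedWriterDemo {
    /** Returns the peak number of concurrent store calls observed while writing n records. */
    static int peakConcurrency(int maxInFlight, int records) {
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        BoundedWriter<Integer> writer = new BoundedWriter<>(maxInFlight, r -> {
            peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            try {
                Thread.sleep(2); // simulated store latency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                inFlight.decrementAndGet();
            }
        });
        try {
            for (int i = 0; i < records; i++) writer.write(i);
            writer.close(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return peak.get();
    }
}
```

Blocking in `write` is a deliberate choice: a slow store slows the consumer pulling from the broker, and the broker's persistence absorbs the backlog instead of the writer's memory.
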
  • #11: Writing correct concurrent, fault-tolerant, and scalable applications is hard. Akka uses the Actor Model to raise the abstraction level and provide a better platform for building correct concurrent and scalable applications. It can support millions of concurrent actors handling different streams, which is a good fit for IoT workloads. We use Akka for: processing messages; near real-time rules; change detection at the device level.
  • #14: Docker is an open-source project that automates the deployment of applications inside software containers. CoreOS is an open-source, lightweight operating system based on the Linux kernel and designed to provide infrastructure for clustered deployments.
  • #15: Change detection: single-sensor (Kolmogorov-Smirnov) and multi-sensor (under patent); anomaly detection; periodicity; stream classification.
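For the single-sensor case, a two-sample Kolmogorov-Smirnov test compares a reference window with a recent window and flags a change when their empirical distributions differ by more than a critical distance. The sketch below uses the standard asymptotic critical value c(α)·sqrt((n+m)/(n·m)) with c(α) = sqrt(-ln(α/2)/2). The class name, window sizes, and α are assumptions, and the patented multi-sensor method is not shown.

```java
import java.util.Arrays;

/** Two-sample Kolmogorov-Smirnov change check between a reference and a recent window. */
final class KsChangeDetector {
    /** D = max over x of |F_a(x) - F_b(x)|, using the empirical CDFs of both samples. */
    static double ksStatistic(double[] a, double[] b) {
        if (a.length == 0 || b.length == 0) {
            throw new IllegalArgumentException("both samples must be non-empty");
        }
        double[] x = a.clone();
        double[] y = b.clone();
        Arrays.sort(x);
        Arrays.sort(y);
        int i = 0, j = 0;
        double d = 0.0;
        while (i < x.length && j < y.length) {
            double v = Math.min(x[i], y[j]);
            // Step both empirical CDFs past every sample equal to v (handles ties).
            while (i < x.length && x[i] <= v) i++;
            while (j < y.length && y[j] <= v) j++;
            d = Math.max(d, Math.abs((double) i / x.length - (double) j / y.length));
        }
        // Once one sample is exhausted, the gap can only shrink, so d is already the maximum.
        return d;
    }

    /** True if D exceeds the asymptotic critical value at significance level alpha. */
    static boolean changed(double[] reference, double[] recent, double alpha) {
        int n = reference.length;
        int m = recent.length;
        double c = Math.sqrt(-0.5 * Math.log(alpha / 2.0));
        double critical = c * Math.sqrt((double) (n + m) / ((double) n * m));
        return ksStatistic(reference, recent) > critical;
    }
}
```

Consecutive 50 Hz samples are strongly correlated, so the nominal α is only approximate; applying the test to aggregated values, or calibrating the threshold empirically, may be more reliable in practice.
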