IoT Data Streaming
รัฐศิลป์ รานอกภานุวัชร์, D.ENG
Who am I?
• Lecturer, undergraduate Computer Engineering program, Dhurakij Pundit University
• Lecturer, master's program in Big Data Engineering, Dhurakij Pundit University
• Adjunct lecturer for the course Data Streaming and Real Time Analytics, National Institute of Development Administration (NIDA)
• Instructor for Amazon cloud courses at the 9expert institute
• Consultant to private companies on Big Data and Blockchain
• Research: Blockchain, IoT, and Big Data
Outline
• Internet of Things (IoT)
• IoT Data Streaming
• Collect Data
• MQTT
• Kafka
• Streaming processing platform
• Flink
• Storm
• Spark
• Use-Case Examples
Internet of Things (IoT)
Credit: https://orzota.com/industrial-iot/
[Figure: IoT pipeline. Things (sensors and actuators) generate data streams; software and platform perform data stream processing; visualization presents the results.]
IoT data characteristics
• Large-scale streaming data
• Heterogeneity
• Time and space correlation
• High-noise data
IoT applications need to support:
• High-speed data streams
• Real-time or near-real-time actions
• Sometimes joins
○ with static data
○ with historical data
Reference: M. Chen, S. Mao, Y. Zhang, and V. C. Leung, Big data: related technologies, challenges and future prospects. Springer, 2014
What is Data Streaming?
Ref: https://www.cisco.com/c/dam/en/us/products/collateral/analytics-automation-software/data-virtualization/r20-consultancy-combining-datastreaming-wp.pdf
• In data streaming, data is continuously transmitted from one system (the producer) to another (the consumer), which reacts to the incoming data as soon as it arrives, ideally with no noticeable delay.
Distributed Streaming
• Streaming:
○ Computations on never-ending “streams” of data records (“events”)
• Distributed:
○ Computation spread across many machines
Ref: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics
Stateless Streaming
• Every incoming record is independent of the other records.
• There is no relation between different records, so each one can be processed and persisted independently.
• E.g., map, filter, join with static data
Ref: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics
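The three stateless operations named above (map, filter, join with static data) can be sketched in plain Python. This is only an illustration: the record fields, the 30 °C threshold, and the `SENSOR_LOCATION` table are assumptions made up for the example, not part of the original slides.

```python
# Hypothetical static lookup table: the "static data" side of the join.
SENSOR_LOCATION = {"s1": "boiler room", "s2": "loading dock"}


def to_celsius(record):
    """Map: convert one reading from Fahrenheit to Celsius."""
    return {**record, "temp_c": round((record["temp_f"] - 32) * 5 / 9, 1)}


def is_hot(record):
    """Filter predicate: keep readings above an assumed 30 C threshold."""
    return record["temp_c"] > 30.0


def enrich(record):
    """Join with static data: attach the sensor's location if known."""
    location = SENSOR_LOCATION.get(record["sensor"], "unknown")
    return {**record, "location": location}


def stateless_pipeline(records):
    """Each record is handled on its own; nothing is remembered between records."""
    for record in records:
        record = to_celsius(record)
        if is_hot(record):
            yield enrich(record)


readings = [
    {"sensor": "s1", "temp_f": 68.0},
    {"sensor": "s2", "temp_f": 95.0},
    {"sensor": "s3", "temp_f": 104.0},
]
hot = list(stateless_pipeline(readings))
```

Because no record depends on another, the same pipeline could run on many machines in parallel without any coordination.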
Stateful Streaming
• Computation plus state
○ E.g., counters, windows of past events, state machines, trained ML models
○ Example: counts of each distinct word seen in the records
• The result depends on the history of the stream
○ Processing an incoming record depends on the results of previously processed records
Ref: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics
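The word-count example from this slide can be sketched as a small stateful operator. The class name `WordCounter` and the sample records are hypothetical; a real stream processor would also checkpoint this state for fault tolerance, which the sketch leaves out.

```python
from collections import Counter


class WordCounter:
    """Stateful operator: the output for a record depends on all earlier records."""

    def __init__(self):
        # State that survives from one record to the next.
        self.counts = Counter()

    def process(self, record):
        """Update the running counts with one text record and return a snapshot."""
        self.counts.update(record.lower().split())
        return dict(self.counts)


counter = WordCounter()
counter.process("sensor on")
snapshot = counter.process("sensor off")
```

The second call returns a count of 2 for "sensor" only because the first record was remembered, which is exactly what distinguishes this from a stateless map or filter.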
Event-Time Streaming
• Data records are associated with timestamps (time series data)
• Processing depends on the timestamps
• An event-time stream processor should give you the tools to reason about time
○ Handle streams that are out of order
Ref: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics
Event-Time Streaming
• Because time matters
• Time
○ Event time: the time at which events actually occurred
○ Processing time: the time at which events are observed in the system
Ref: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics
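The difference between the two notions of time can be shown with a small sketch that sums the same events into 10-second tumbling windows twice, once by event time and once by processing time. The window size and the sample events are assumptions chosen for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 10  # assumed tumbling-window size


def window_start(timestamp):
    """Start of the tumbling window that contains the timestamp."""
    return timestamp - timestamp % WINDOW_SECONDS


# (event_time, processing_time, value), listed in arrival order.
# The third event occurred at t=8 but was only observed at t=14.
events = [
    (1, 2, 5.0),
    (12, 13, 7.0),
    (8, 14, 3.0),
    (15, 16, 1.0),
]


def sum_by(events, time_index):
    """Sum values per window, using either event time (0) or processing time (1)."""
    totals = defaultdict(float)
    for event in events:
        totals[window_start(event[time_index])] += event[2]
    return dict(totals)


by_event_time = sum_by(events, 0)
by_processing_time = sum_by(events, 1)
```

With event time the late reading is counted in the window where it actually occurred (window 0); with processing time it is counted in window 10, so the two results differ even though the input is identical.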
Things are Producing Streaming Data
• Smart city
• Healthcare/Medical devices
• Connected cars
• Logistics
• Home automation
• Airlines
• Farmers
• Smart machinery
• Security systems
IoT Big Data Architecture
[Figure: IoT big data architecture with filtering and analytics stages]
Source: https://mapr.com/blog/ml-iot-connected-medical-devices/
Collect Data (high level architecture)
How to integrate? MQTT or Kafka
Copyright: https://thenewstack.io/mqtt-protocol-iot/
Messaging Systems: Publish/Subscribe
[Figure: producers call publish(topic, msg) on a publish/subscribe system; consumers subscribe to topics (Topic 1, Topic 2, Topic 3) and receive each msg published to them.]
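A minimal in-memory sketch of the publish/subscribe pattern follows. The `PubSub` class is hypothetical and runs in a single process; a real broker (MQTT or Kafka) adds networking, persistence, and delivery guarantees on top of the same idea.

```python
from collections import defaultdict


class PubSub:
    """Toy publish/subscribe system: producers and consumers only share topic names."""

    def __init__(self):
        # topic name -> list of subscriber callbacks
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a consumer callback for one topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, msg):
        """Deliver msg to every current subscriber of the topic; return how many got it."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(msg)
        return len(callbacks)


broker = PubSub()
received_a, received_b = [], []
broker.subscribe("Topic 1", received_a.append)
broker.subscribe("Topic 1", received_b.append)
broker.subscribe("Topic 2", received_b.append)

broker.publish("Topic 1", "temp=21")
broker.publish("Topic 2", "door=open")
delivered = broker.publish("Topic 3", "nobody listens")
```

The producer never references a consumer directly, which is what lets either side be added or removed without changing the other.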
Example
MQTT uses the pub/sub pattern to connect interested parties with each other
Arduino, Raspberry Pi
MQTT - Publish/subscribe messaging protocol
• MQTT is a machine-to-machine (M2M) protocol widely used in the Internet of Things.
• It uses the publish/subscribe paradigm, in contrast to HTTP, which is based on the request/response paradigm.
• Built on top of TCP/IP for constrained devices and unreliable networks
• Many (open source) broker implementations
• Many client libraries
MQTT Architecture (no scaling)
MQTT Architecture (clustering depends on broker implementation)
MQTT Trade-Offs
Pros
• Lightweight
• Simple API
• Built for poor-connectivity / high-latency scenarios
• Many client connections (tens of thousands per MQTT server)
Cons
• Queuing, not stream processing
• No buffering
• Limited scalability
• Poor integration with the rest of the enterprise
• No reprocessing of events
Apache Kafka
A distributed streaming platform
Kafka Data Streams
Kafka is used to stream data into data lakes, applications and real-time stream analytics systems.
Kafka architecture: Brokers, Topics, Producers, and Consumers
A Kafka cluster is made up of multiple Kafka brokers.
Apache Kafka - Architecture
[Figure: producers write to the Kafka cluster and consumers read from it]
Kafka ZooKeeper Coordination
[Figure: producers and consumers connect to several brokers, which are coordinated by ZooKeeper (ZK)]
A few important characteristics
• Fast
○ Kafka can handle hundreds of megabytes of reads and writes per second from a large number of clients.
○ Designed for real-time activity streaming.
• Distributed and highly scalable
○ Kafka's cluster-centric design offers strong durability and fault-tolerance guarantees.
○ Message partitions are spread over a cluster of machines.
• Durable
○ Messages are persisted to disk and replicated within the cluster to prevent data loss.
○ Each broker can handle terabytes of messages without performance impact.
Streaming Platform Use Case
Use Case – Truck Sensors
Kafka Trade-Offs (from IoT perspective)
Pros
• Stream processing, not just queuing
• High throughput
• Large scale
• High availability
• Long-term storage and buffering
• Reprocessing of events
• Good integration with the rest of the enterprise
Cons
• Not built for tens of thousands of connections
• Requires a stable network and good infrastructure
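Two of the pros above, long-term storage and reprocessing of events, follow from Kafka keeping messages in a partitioned, append-only log that consumers read by offset. The sketch below imitates that idea in memory. The `Topic` class is hypothetical, and CRC32 is only a stand-in for a key hash (it is not claimed to be the partitioner Kafka itself uses).

```python
import zlib


class Topic:
    """Toy partitioned log: records are appended, never removed when read."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        """Append to the partition chosen by hashing the key; return (partition, offset)."""
        # Assumption: CRC32 stands in for the real partitioner's hash function.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append(value)
        return partition, len(self.partitions[partition]) - 1

    def read(self, partition, offset):
        """Return every record in the partition from the given offset onward."""
        return list(self.partitions[partition][offset:])


topic = Topic(num_partitions=3)
partition, first_offset = topic.append("truck-7", "speed=80")
same_partition, second_offset = topic.append("truck-7", "speed=95")

first_pass = topic.read(partition, 0)   # normal consumption
replay = topic.read(partition, 0)       # reprocessing: rewind to offset 0
tail = topic.read(partition, 1)         # resume from a later offset
```

Records with the same key land in the same partition and keep their order, and because reading does not delete anything, a consumer can rewind its offset and process the same events again.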
Collect Data (high level architecture)
How to integrate? MQTT+Kafka
End-to-End Integration from MQTT to Apache Kafka
MQTT Source and Sink Connectors for Kafka Connect
https://www.confluent.io/hub/
https://www.confluent.io/connector/kafka-connect-mqtt/
IoT Data Ingestion through MQTT into Kafka
Ref: https://github.com/gschmutz/stream-processing-workshop/tree/master/06-iot-data-ingestion-over-mqtt
IoT Big Data Architecture
[Figure: IoT big data architecture with filtering and analytics stages]
Ref: https://mapr.com/blog/ml-iot-connected-medical-devices/
What is stream processing?
• Technology that lets users query a continuous data stream and detect conditions quickly, within a short time of receiving the data.
• The detection time ranges from a few milliseconds to minutes.
Stream processing tools
Two Types of Stream Processing
Native Streaming
• Every incoming record is processed as soon as it arrives, without waiting for others.
• Continuously running processes run forever, and every record passes through them to be processed.
• Lets the framework achieve the minimum possible latency.
• Fault tolerance is harder to achieve.
Micro-batching
• Incoming records are collected every few seconds and then processed together as a single mini-batch, with a delay of a few seconds.
• This comes at the cost of latency and does not feel like natural streaming.
https://medium.com/@chandanbaranwal/spark-streaming-vs-flink-vs-storm-vs-kafka-streams-vs-samza-choose-your-stream-processing-91ea3f04675b
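The latency cost of micro-batching compared with native streaming can be estimated with a rough sketch. It assumes a 2-second batch interval, made-up arrival times, and zero processing cost, so it isolates only the waiting time that batching adds.

```python
import math

BATCH_INTERVAL = 2.0  # assumed micro-batch interval, in seconds

# Hypothetical arrival times of five records, in seconds.
arrival_times = [0.1, 0.5, 1.9, 2.2, 3.7]


def native_latencies(arrivals):
    """Native streaming: each record is handled on arrival (processing cost ignored)."""
    return [0.0 for _ in arrivals]


def micro_batch_latencies(arrivals, interval):
    """Micro-batching: each record waits until the end of its batch interval."""
    latencies = []
    for arrival in arrivals:
        batch_end = math.ceil(arrival / interval) * interval
        latencies.append(round(batch_end - arrival, 3))
    return latencies


native = native_latencies(arrival_times)
batched = micro_batch_latencies(arrival_times, BATCH_INTERVAL)
```

Under these assumptions a record can wait up to one full batch interval before it is processed, which is why the interval sets a floor on the latency a micro-batch engine can reach.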
Apache Storm
• Distributed dataflow abstraction (spouts and bolts) for large-scale stream processing
• True streaming; good for simple event-based use cases
• Very low latency and high throughput
• No state management
• A good fit for a simple IoT event-based alerting system
[Figure: a source of streams feeds processing steps (filtering, functions, aggregations, joins, etc.)]
Apache Flink
• Stateful computations over streams
• The first true streaming framework to offer advanced features such as event-time processing and watermarks
• Low latency with high throughput
• Good for complex event-time processing, aggregation, stream joins, etc.
[Figure: Flink processing streams and historic data; diagram labels: Queries, Applications, Devices, Database, Stream, File / Object Storage, Application]
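The watermark idea mentioned for Flink can be sketched in plain Python. This is a simplified model, not Flink's API: it assumes a 10-second tumbling window, a watermark that trails the largest event time seen by a fixed 3 seconds, and made-up events. A window is emitted once the watermark passes its end, and records for already-emitted windows are dropped as late.

```python
from collections import defaultdict

WINDOW = 10           # assumed tumbling-window size, in seconds
MAX_OUT_OF_ORDER = 3  # assumed bound on how late an event may arrive


def run(events):
    """events: (event_time, value) pairs in arrival order.

    Returns (emitted windows, dropped late events, windows still open).
    """
    open_windows = defaultdict(float)
    emitted, dropped = [], []
    watermark = float("-inf")
    for timestamp, value in events:
        start = timestamp - timestamp % WINDOW
        if start + WINDOW <= watermark:
            # The window was already emitted, so this record is too late.
            dropped.append((timestamp, value))
            continue
        open_windows[start] += value
        # The watermark trails the largest event time seen so far.
        watermark = max(watermark, timestamp - MAX_OUT_OF_ORDER)
        for window_start in sorted(open_windows):
            if window_start + WINDOW <= watermark:
                emitted.append((window_start, open_windows.pop(window_start)))
    return emitted, dropped, dict(open_windows)


events = [(1, 1.0), (12, 1.0), (8, 1.0), (14, 1.0), (5, 1.0), (25, 1.0)]
emitted, dropped, still_open = run(events)
```

The event at t=8 arrives out of order but is still counted, because the watermark has not yet passed the end of its window; the event at t=5 arrives after that window was emitted and is dropped.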
Architecture and Process Model
Ref: https://ci.apache.org/projects/flink/flink-docs-release-1.1/internals/general_arch.html
Ref: https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/hadoop.html
Apache Spark
• Spark has emerged as the successor to Hadoop
• Unified batch and stream processing on top of a batch runtime
• High throughput; fault tolerant by default because of its micro-batch nature
• Not true streaming; not suitable for low-latency requirements
• Good for machine learning on streams
Use Case
Real-Time Monitoring System in Automotive Manufacturing
Detect and diagnose abnormal events in a process
Ref: Muhammad Syafrudin, “Performance Analysis of IoT-Based Sensor, Big Data Processing, and Machine Learning Model for Real-Time Monitoring System in Automotive Manufacturing”
System design
Sensor Data
[Figure: Performance evaluation in terms of latency with different numbers of clients (a) and servers (b), and throughput with different numbers of clients (c) and servers (d)]
Thank you
