Handling IOT Data with a Modern Data Architecture
Cliff Gilmore - Data Practice Director @ 1904labs
Capture, Process and Serve (All the Things)
Challenges of IOT Data
Scale
Frequency of events
Size of Data
Number of Devices
Number of Users
Latency Demands
Geo Distribution
Processing
Batch Analytics
Realtime Analytics
Aggregations
Machine Learning
Reporting
Applications
Realtime Access
Report Visualization
Production Analytics
ML Driven Decisions
Microservices
IOT Data Architecture
Leading to Lambda and Kappa
Architectural Requirements
❏ Must scale out linearly on ingestion, processing, storage, and access.
❏ Need to be able to store huge amounts of data organized for different access patterns.
❏ Must have the ability to process data in flight for real-time decision making, alerting and pattern matching
❏ Need to serve the data to the rest of the organization through a common API/Service
❏ The architecture must be agile to accommodate new changes to business logic and processing algorithms
Stack Components
❏ Distributed Log / Queue
❏ A pub/sub partitioned queue
❏ Kafka is the de facto choice due to its wide use in production
❏ Stream Processing
❏ Ability to process events as they arrive
❏ Event at a Time
❏ Samza, Flink, Storm
❏ Micro Batch
❏ Spark Streaming
❏ Batch Processing
❏ Process event history in bulk
❏ Spark, MapReduce on top of HDFS or Wide Column Stores
❏ Serving
❏ Expose data to the rest of the organization and serve application requests
❏ Wide Column Store
❏ Cassandra, HBase, BigTable
❏ Can also be RDBMS for some data sets (Reports, BI Rollups, etc)
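
As a rough illustration of the "pub/sub partitioned queue" idea, the sketch below models a log in plain Python: events are routed to a partition by hashing their key, so all events for one device stay in order within a single partition. The `PartitionedLog` class, the `partition_for` helper and the MD5-based hash are assumptions made for this example; they are not Kafka's API or its actual partitioner.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (illustrative choice: MD5)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


class PartitionedLog:
    """Hypothetical in-memory stand-in for a partitioned, append-only log."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key: str, value: dict) -> tuple:
        """Append an event and return the (partition, offset) of the new record."""
        p = partition_for(key, len(self.partitions))
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def read(self, partition: int, offset: int = 0) -> list:
        """Return the records of one partition, starting at the given offset."""
        return self.partitions[partition][offset:]


# Interleave events from two devices; each device's events keep their order.
log = PartitionedLog(num_partitions=4)
for i in range(3):
    log.append("device-1", {"seq": i})
    log.append("device-2", {"seq": i})
```

Because the partition depends only on the key, consumers can scale out by partition while still seeing each device's events in the order they were appended.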
Lambda Architecture
Events
Distributed Log
Batch Layer
Speed Layer
Serving Layer
Raw Data, Pattern Matching and Aggregates
Patterns, Rollups, Recommendations
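
A minimal sketch of how the three Lambda layers fit together, assuming a simple per-device event count as the metric: the batch layer recomputes a view from the full history up to its last run, the speed layer covers only the events since then, and the serving layer merges the two at query time. The function names, the `cutoff` value and the sample events are hypothetical.

```python
from collections import Counter


def batch_view(events, cutoff):
    """Batch layer: recompute per-device counts from history up to the cutoff."""
    return Counter(e["device_id"] for e in events if e["ts"] <= cutoff)


def speed_view(events, cutoff):
    """Speed layer: per-device counts for events newer than the last batch run."""
    return Counter(e["device_id"] for e in events if e["ts"] > cutoff)


def serve(batch, speed, device_id):
    """Serving layer: merge the batch and speed views at query time."""
    return batch[device_id] + speed[device_id]


events = [
    {"device_id": "a", "ts": 1},
    {"device_id": "a", "ts": 2},
    {"device_id": "b", "ts": 3},
    {"device_id": "a", "ts": 5},  # arrived after the last batch run
]
cutoff = 3  # timestamp of the last completed batch run (assumed)
batch = batch_view(events, cutoff)
speed = speed_view(events, cutoff)
total_a = serve(batch, speed, "a")
```

The cost of this design is that the same logic lives in two code paths (batch and speed), which is the duplication the Kappa approach tries to avoid.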
Kappa Architecture
Events
Distributed Log
Batch
Streaming
Serving
Stream Results
Stream V1
Stream V2
Table V1
Table V2
Raw Data
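
The Stream V1 / Stream V2 and Table V1 / Table V2 labels suggest the usual Kappa reprocessing pattern: when the logic changes, a new version of the stream job replays the retained log into a new table, and serving switches over once it has caught up. The sketch below shows that flow in plain Python; the job functions, the `replay` helper and the sample events are assumptions for illustration.

```python
def stream_v1(event):
    """First version of the job: store the reading as-is."""
    return {"device_id": event["device_id"], "temp_c": event["temp_c"]}


def stream_v2(event):
    """New business logic: also store the temperature in Fahrenheit."""
    return {
        "device_id": event["device_id"],
        "temp_c": event["temp_c"],
        "temp_f": event["temp_c"] * 9 / 5 + 32,
    }


def replay(log, job):
    """Run a job over the whole retained log, producing a fresh output table."""
    return [job(event) for event in log]


# The distributed log is the source of truth and is retained for replay.
log = [
    {"device_id": "a", "temp_c": 20.0},
    {"device_id": "b", "temp_c": 25.0},
]

tables = {"v1": replay(log, stream_v1)}
serving = "v1"

# Deploy V2 next to V1, rebuild its table from the log, then switch serving.
tables["v2"] = replay(log, stream_v2)
serving = "v2"
del tables["v1"]  # retire the old table once nothing reads from it
```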
Cassandra to Serve IOT Data
The Art of Time Series
Why Cassandra?
❏ Proven linear scale up to 1000s of nodes in a single cluster
❏ Geo redundancy to collect data where it is created and replicate across the globe
❏ High capacity to ingest parallel individual writes
❏ Low latency and high throughput reads
❏ Wide-column store data model allows for data to be structured around query patterns
❏ Continuous availability suited to and used for the most mission critical systems
❏ AP system in CAP theorem terms; consistency is tunable per request, trading stronger consistency against availability
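
Tunable consistency is commonly reasoned about with a simple overlap rule: a read is guaranteed to touch at least one replica that holds the latest acknowledged write when the number of replicas written plus the number of replicas read exceeds the replication factor. The helper names below are made up for this sketch and are not part of any Cassandra driver.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that form a majority of the replica set."""
    return replication_factor // 2 + 1


def read_overlaps_latest_write(replication_factor: int,
                               write_replicas: int,
                               read_replicas: int) -> bool:
    """True when every read set must intersect every acknowledged write set."""
    return write_replicas + read_replicas > replication_factor


rf = 3
q = quorum(rf)

# Writing and reading at quorum overlaps: 2 + 2 > 3.
quorum_both = read_overlaps_latest_write(rf, q, q)

# Writing and reading a single replica favors availability and latency
# and may return stale data: 1 + 1 <= 3.
one_both = read_overlaps_latest_write(rf, 1, 1)
```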
Cassandra 101
DC1 DC2
Physical Data Model
Partition Key
  Clustering Key Val1: Col1:Val, Col2:Val, Col3:Val, Col4:Val, Col5:Val, Col6:Val, …
  Clustering Key Val2: Col1:Val, Col6:Val, …
  Clustering Key Val3: Col6:Val, …
  Clustering Key Val4: Col1:Val, Col2:Val, Col3:Val, Col4:Val, Col5:Val, Col6:Val, …
  Clustering Key Val5: Col1:Val, Col2:Val, …
  …
CQL - Cassandra Query Language
❏ Simple to use language that looks like SQL
❏ No joins, group by etc
❏ Example Queries
❏ SELECT * FROM readings WHERE device_id = ? AND event_time > ? AND event_time <= ?;
❏ INSERT INTO readings (device_id, event_time, temperature) VALUES (?, ?, ?);
I’ve got this!
TimeSeries Table Example
CREATE TABLE readings (
sensor_id text,
event_time TimeUUID,
temperature decimal,
PRIMARY KEY (sensor_id,event_time)
);
❏ event_time is a TimeUUID: a time-ordered, sortable UUID
❏ sensor_id is the partition key
❏ event_time is the clustering key
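
A TimeUUID is a version 1 UUID, which embeds a timestamp counted in 100-nanosecond intervals since 1582-10-15. The sketch below uses only Python's standard `uuid` module to generate one on the client and recover its time; the `timeuuid_to_datetime` helper is a name chosen for this example.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUID timestamps count 100-nanosecond intervals since this date.
UUID_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def timeuuid_to_datetime(u: uuid.UUID) -> datetime:
    """Recover the embedded timestamp of a version 1 UUID as a UTC datetime."""
    if u.version != 1:
        raise ValueError("not a time-based (version 1) UUID")
    # u.time is in 100 ns units; integer-divide by 10 to get microseconds.
    return UUID_EPOCH + timedelta(microseconds=u.time // 10)


event_id = uuid.uuid1()  # time-based UUID generated on the client
event_time = timeuuid_to_datetime(event_id)
```

Because each value carries its own timestamp plus node and clock-sequence bits, two readings taken at the same instant still get distinct clustering keys instead of overwriting each other.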
Physical Data Model
Station #1: 12:05.15 = 15.9 C, 12:05.16 = 15.9 C, 12:05.17 = 16.0 C, 12:05.18 = 16.1 C, 12:05.19 = 16.0 C, …
Station #2: 12:05.15 = 22.0 C, 12:05.20 = 22.1 C, 12:05.25 = 27.9 C, 12:05.30 = 27.7 C, 12:05.35 = 30.2 C, …
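
One way to picture this physical layout is as a map from partition key to a list of rows kept sorted by clustering key, which is what makes time-range reads within one station cheap. The sketch below is a toy in-memory model, not Cassandra's storage engine; the `Partition` class, the `write` helper and the use of plain strings as clustering keys are assumptions for illustration.

```python
import bisect


class Partition:
    """Hypothetical model of one partition: rows kept sorted by clustering key."""

    def __init__(self):
        self._keys = []
        self._values = []

    def insert(self, clustering_key, value):
        i = bisect.bisect_left(self._keys, clustering_key)
        if i < len(self._keys) and self._keys[i] == clustering_key:
            self._values[i] = value  # same primary key: overwrite (upsert)
        else:
            self._keys.insert(i, clustering_key)
            self._values.insert(i, value)

    def slice(self, start, end):
        """Rows with start < clustering_key <= end, in clustering order."""
        lo = bisect.bisect_right(self._keys, start)
        hi = bisect.bisect_right(self._keys, end)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))


table = {}  # partition key -> Partition


def write(station, event_time, temperature):
    table.setdefault(station, Partition()).insert(event_time, temperature)


# Writes may arrive out of order; rows still end up sorted within the partition.
for t, c in [("12:05.17", 16.0), ("12:05.15", 15.9),
             ("12:05.16", 15.9), ("12:05.18", 16.1)]:
    write("Station #1", t, c)
write("Station #2", "12:05.15", 22.0)

# Mirrors: WHERE station = ? AND event_time > ? AND event_time <= ?
rows = table["Station #1"].slice("12:05.15", "12:05.17")
```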
Advanced Data Model Topics
❏ Consider bucketing time in the partition key if the sample rate is high
❏ PRIMARY KEY ((device_id, year, week), event_time)
❏ If per-event granularity is not needed, batch or roll up events
❏ PRIMARY KEY (device_id, event_minute)
❏ If we batch events
❏ Store a JSON blob of the sensor readings within the minute
❏ Individual sensor readings can't be updated without a read-before-write
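
The two ideas above can be sketched on the client side as follows: one helper derives the (device_id, year, week) bucket for the partition key, another groups readings per minute into JSON blobs, and a third shows why changing a single reading inside a blob needs a read-before-write. All function names and the blob layout are assumptions for this example; ISO year and week numbers are one possible bucketing choice.

```python
import json
from datetime import datetime, timezone


def bucketed_partition_key(device_id, event_time):
    """Build the (device_id, year, week) bucket from the ISO calendar."""
    iso = event_time.isocalendar()
    return (device_id, iso[0], iso[1])


def minute_rollup(readings):
    """Group (event_time, value) readings per minute into JSON blobs."""
    buckets = {}
    for event_time, value in readings:
        minute = event_time.replace(second=0, microsecond=0)
        buckets.setdefault(minute, []).append(
            {"t": event_time.isoformat(), "v": value})
    return {minute: json.dumps(items) for minute, items in buckets.items()}


def update_reading(blob, t_iso, new_value):
    """Read-before-write: parse the stored blob, patch one reading, re-encode."""
    items = json.loads(blob)
    for item in items:
        if item["t"] == t_iso:
            item["v"] = new_value
    return json.dumps(items)


t0 = datetime(2017, 3, 1, 12, 5, 15, tzinfo=timezone.utc)
t1 = t0.replace(second=16)
t2 = t0.replace(minute=6, second=1)

key = bucketed_partition_key("device-1", t0)
rollups = minute_rollup([(t0, 15.9), (t1, 15.9), (t2, 16.0)])
```

Bucketing bounds the size of any single partition at the cost of querying several buckets for long time ranges; the rollup reduces row count but makes individual readings opaque to the database.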
Questions?
cliff.gilmore@1904labs.com
