© Copyright 2018 Pivotal Software, Inc. All rights Reserved. Version 1.0
Neil Raden
Hired Brains
Research
@NeilRaden
Frank McQuillan
Pivotal
@fmcquillan
October 31, 2018
Adding Edge Data to Your
AI and Analytics Strategy
Market Perspective
By Hired Brains Research
Initial Observations
● Volume of data flowing from sensors is huge
● You can’t throw away sensor data; it doesn’t exist anywhere else
● IoT isn’t one thing; there are many very different models
● Edge architectures evolving quickly to be more intelligent
● But there is no context data at the edge, only telemetry
● To do Machine Learning, you need lots of other data and powerful
analytical platforms which are NOT at the edge
● Managers want to evaluate IoT in tools they already know
● Over time, IoT will simply blend into enterprise apps
● Topology implies applications
● IoT will flow into existing applications
Ⓒ 2018 Neil Raden Hired Brains Research 2018
IoT is on track to
connect 50 billion
“smart” things by
2020 and 1 trillion
sensors soon after,
according to the
National Science
Foundation.
Smart Dust
is Covering
the World
Internet of Things (IoT); Issues and Applications
● Buzz about IoT is dominated by discussions of architecture and
technology
● We’re going to talk about data and analytics
● Physical architectures of IoT (mobile, self-contained intelligent edge
devices, classic, etc.) determine the data and analytics that are possible
Lamborghini creates world's first 'self-healing' sports car
The Terzo Millennio, Italian for “third millennium”, can detect and
repair cracks in its bodywork. Using sensors, the car can conduct its
own health check to detect any damage and repair itself by filling
the crack with nanotubes to prevent it from spreading.
What’s Wrong with Most IoT Diagrams?
● IoT is not an application
● All of the other data
sources are second-hand
● Their data is created as
part of an application
and stored
● If we lose their data, we
can always get it back
● If we lose IoT data, it’s
gone forever
● Diagrams like these
overlook the need for
handling sensor data
differently
So What’s the Implication?
Sensor Data Has to be Governed Differently
● Sensor data is fresh. It hasn’t been used for anything else
● That raises two issues
○ Stewardship: we are responsible for guarding it, because if we
lose it, it’s gone
○ What techniques and methodologies do we need to understand it?
Keep the intelligence generated at the edge and dump the rest?
Or save it?
Extremely Simplified Diagram
Sensor
Data
Local Logic Action
Gateway
Analytical
Platform/AI/ML
Ingest
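The flow in the simplified diagram (sensor data, local logic, action, gateway, ingest) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `EdgeNode` class, the `limit` threshold, the batch size, and the `uploads` list standing in for the gateway link are all hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class EdgeNode:
    """Hypothetical edge device: acts locally, forwards readings via a gateway."""
    limit: float                                   # assumed threshold for local action
    batch_size: int = 4                            # assumed readings per upload
    buffer: list = field(default_factory=list)
    actions: list = field(default_factory=list)
    uploads: list = field(default_factory=list)    # stands in for the gateway link

    def on_reading(self, value: float) -> None:
        # Local logic: react immediately, without a round trip to the platform.
        if value > self.limit:
            self.actions.append(("shutdown", value))
        # Keep the raw reading as well, since it exists nowhere else.
        self.buffer.append(value)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Gateway/ingest step: ship raw values plus a small summary upstream.
        if self.buffer:
            self.uploads.append({"raw": list(self.buffer), "mean": mean(self.buffer)})
            self.buffer.clear()


node = EdgeNode(limit=90.0)
for reading in [70.0, 72.0, 95.0, 71.0, 69.0]:
    node.on_reading(reading)
```

The design choice shown here is to act on the threshold at the edge while still forwarding the raw values, so the analytical platform can later combine them with context data that is not available on the device.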
Can We Afford to Transmit/Store All of That?
No More Managing from Scarcity
What If Ferraris Were Subject to Moore’s Law?
What Businesses Need from IoT Analytics
● Level of expressiveness sufficient for the specification,
assembly and modification of common and complex
business models without code
● This supports the notion about using existing tools
● The ability to accommodate all but the most esoteric
kinds of modeling
● Frictionless, continuous intelligence
● Beyond that, there is data science….
But Don’t Forget: “Nanos Gigantum Humeris Insidentes”
(Dwarfs Standing on the Shoulders of Giants)
● Those who came before
● Devised brilliant models
● No need to start over
● New technology is not a
substitute for good
thinking
● Seek out those original
thinkers
Industrial Internet of Things (IIoT); Issues and
Applications
The IIoT “Market” from an Analyst’s POV
● IIoT is a concept, not a market
● Businesses want to see outcomes through apps they are already familiar with
● Bring together not only sensor data but all the other contextual information
necessary to make business decisions
● Show the impact sensor data has on production KPIs, so managers can adjust
production accordingly
● IIoT gradually, seamlessly incorporated into enterprise systems
IIoT Topology Drives Types of Analytics
● One arrangement is a collection of self-contained units with lots of sensors
● Jet engines in flight, other conveyances, complex machinery, mining equipment.
● These units are equipped not only with sensors but also act as their own “edge”
devices, with the intelligence to monitor and control the machine
● A mining drill, for example, comes equipped with software that is updated remotely.
Digital Twin: Simulation Models with IoT Data
● Virtual counterpart
to real-world
components
● Abstracting
hardware
monitoring and
maintenance
Industrial Internet of Things (IIoT); Issues and
Applications
● Software developed to run one piece of equipment, rather than to conform to an
industry standard or protocol, can make it hard to capture that data and use it
over time.
● In fact, like the case of John Deere and the farmers’ tractors, there may be conflict
over who actually owns the data and can access the logic that captured it from the
sensors.
What Businesses Need from IoT Analytics
● Is there an “informating” application for your data?
● Things to consider
● Do you have the infrastructure to capitalize on it?
● Do you have an agile enough culture for something
different?
Three Things We Know About IoT Opportunities
● It’s not about the hardware and architecture
○ One thing will be common across all IoT applications: analytics will play a
crucial role in operating and driving value from IoT resources
● IoT changes operations, not just costs
○ Merely looking to reduce costs will fail, because the cost and effort of installing
instrumentation, conducting telemetry, performing complex analytics processing, etc.
will likely exceed any operational savings.
● Existing governance and security regimes will be inadequate
○ IoT is also growing alongside a more common abstraction of compute and storage,
which changes access patterns further: not a monolithic application, but many use
cases at once on the same repository
Pivotal Perspective
IoT Components
Device
Management
Edge Processing
and Analytics
Advanced Analytics and
Machine Learning
Business Decisions
and Applications
Edge Data Center/Cloud
(Consumption at
multiple locations.)
Reduced Costs Drive IoT Growth
What analytics can one do with sensor data?
Where is the right place to perform these
operations?
Sensor Data Analytics
Reference Architectures
Parallel Configurable Data Load
Parallel Data Load and External Tables
Hadoop Data Lakes
Modern Analytical Platform
Predefined Libraries Programmatic
GPText
High Speed Ingestion
Analytical
Data to cache
In-Memory Data Grid
In-DB Predictive Analytics
Public Cloud Data
Lakes
Other Data Platforms
Cloud Native Analytical Apps
(Data Microservices)
Operationalization of Models
Pivotal Technology for IoT
Pivotal Application
Service (PAS)
Pivotal Container
Service (PKS)
Pivotal Greenplum
ANALYTICAL
APPLICATIONS
NATIVE INTERFACES
PIVOTAL
GREENPLUM
PLATFORM
MULTI-
STRUCTURED DATA
SOURCES &
PIPELINES
Structured Data
JDBC, ODBC
SQL
ANSI SQL
USERS
FLEXIBLE
DEPLOYMENT
Local
Storage
Other
RDBMSes
Spark
GemFire
Cloud
Object
Storage
HDFS
JSON, Apache AVRO, Apache Parquet & XML
Teradata SQL
Other DB SQL
Apache MADlib
ML/Statistics/Graph
Python, R,
Java, Perl, C
Programmatic
Apache SOLR
Text
PostGIS
GeoSpatial
Custom Apps BI / Reporting Machine Learning AI
On-Premises
Public
Clouds
Private
Clouds
Fully
Managed
Clouds
GREENPLUM
MODERN
DATA
PLATFORM
Kafka
ETL
Spring
Cloud
Data Flow
Massively
Parallel
(MPP)
PostgreSQL
Kernel
Petabyte
Scale
Loading
Query
Optimizer
(GPORCA)
Workload
Manager
Polymorphic
Storage
Command
Center
SQL
Compatibility
(Hyper-Q)
Containers…
Pivotal Greenplum
Pivotal Greenplum
Standby
Master
…
Master
Host
SQL
Interconnect
Segment Host
Node1
Segment Host
Node2
Segment Host
Node3
Segment Host
NodeN
Local
Storage
Other
RDBMSes
Spark
GemFire
Cloud
Object
Storage
HDFS
Kafka
ETL
Spring
Cloud
Data Flow
Data Transformation
Traditional BI
Machine
Learning
Graph
Data Science
Productivity Tools
Geospatial
Text
Greenplum Integrated Analytics
Scalable, In-Database
Machine Learning
• Open source https://github.com/apache/madlib
• Downloads and docs http://madlib.apache.org/
• Wiki https://cwiki.apache.org/confluence/display/MADLIB/
Apache MADlib: Big Data Machine Learning in SQL
Open source,
top level
Apache project
For PostgreSQL
and Greenplum
Database
Powerful machine
learning, graph,
statistics and analytics
for data scientists
Functions
Data Types and Transformations
Array and Matrix Operations
Matrix Factorization
• Low Rank
• Singular Value Decomposition (SVD)
Norms and Distance Functions
Sparse Vectors
Encoding Categorical Variables
Path Functions
Pivot
Sessionize
Stemming
Aug 2018
Graph
All Pairs Shortest Path (APSP)
Breadth-First Search
Hyperlink-Induced Topic Search (HITS)
Average Path Length
Closeness Centrality
Graph Diameter
In-Out Degree
PageRank and Personalized PageRank
Single Source Shortest Path (SSSP)
Weakly Connected Components
Model Selection
Cross Validation
Prediction Metrics
Train-Test Split
Statistics
Descriptive Statistics
• Cardinality Estimators
• Correlation and Covariance
• Summary
Inferential Statistics
• Hypothesis Tests
Probability Functions
Supervised Learning
Neural Networks
Support Vector Machines (SVM)
Conditional Random Field (CRF)
Regression Models
• Clustered Variance
• Cox-Proportional Hazards Regression
• Elastic Net Regularization
• Generalized Linear Models
• Linear Regression
• Logistic Regression
• Marginal Effects
• Multinomial Regression
• Naïve Bayes
• Ordinal Regression
• Robust Variance
Tree Methods
• Decision Tree
• Random Forest
Time Series Analysis
• ARIMA
Unsupervised Learning
Association Rules (Apriori)
Clustering (k-Means)
Principal Component Analysis (PCA)
Topic Modelling (Latent Dirichlet Allocation)
Utility Functions
Columns to Vector
Conjugate Gradient
Linear Solvers
• Dense Linear Systems
• Sparse Linear Systems
Mini-Batching
PMML Export
Term Frequency for Text
Vector to Columns
Nearest Neighbors
• k-Nearest Neighbors
Sampling
Balanced
Random
Stratified
Geospatial Analytics with PostGIS
PostGIS is a spatial database extension which allows
for analysis and processing of GIS objects
Spatial Indexes & Bounding Boxes
Round earth calculations
Raster
Vector
100+ Libraries
in Python & R
Data Science Bundle
Spring Cloud Data Flow
Spring Cloud Data Flow is a microservices
toolkit for building data integration and
real-time data processing pipelines.
Pipelines consist of Spring Boot apps,
using Spring Cloud Stream for events
or Spring Cloud Task for batch
processes.
The Data Flow server provides interfaces
to compose and deploy pipelines onto a
modern platform like Cloud Foundry and
Kubernetes.
Spring Cloud Data Flow
Batch
Integration
Spring Cloud Task
Tasks are finite Boot Microservices
connecting to data/storage.
System tracks invocations, exit-status.
Messaging
Integration
Spring Cloud Stream
Microservices operate on Message Streams.
Loose coupling via Pub/Sub Topics
over pluggable Message Bus like Kafka.
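The loose coupling via pub/sub topics described above can be illustrated with a small sketch. It is written in plain Python rather than Spring Cloud Stream, and it uses a hypothetical in-memory `MessageBus` in place of a real broker such as Kafka; the topic names and message fields are assumptions made for the example only.

```python
from collections import defaultdict
from typing import Callable


class MessageBus:
    """Hypothetical in-memory bus; a real deployment would use a broker such as Kafka."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        # Consumers register interest in a topic, not in a specific producer.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Producers only know the topic name; every subscriber receives the message.
        for handler in list(self._subscribers[topic]):
            handler(message)


bus = MessageBus()
stored = []  # stands in for a sink such as an analytical database


def enrich(msg: dict) -> None:
    # Processor step: add context, then republish on a second topic.
    bus.publish("enriched", {**msg, "fleet": "north"})


bus.subscribe("telemetry", enrich)
bus.subscribe("enriched", stored.append)
bus.publish("telemetry", {"vehicle": "truck-7", "speed_kph": 88})
```

Because the source, the processor, and the sink share only topic names, any one of them could be replaced or scaled without changing the others, which is the property the slide attributes to message streams.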
IoT Event Streaming for Automotive
Developed using
- Spring Boot Microservices
- Spring Cloud Stream
- Spring Kafka
Deployed on
- Pivotal Cloud Foundry
- Microsoft Azure & on-premise
Integrating
- Kafka, RabbitMQ
- Hadoop, NiFi, Spark
https://www.youtube.com/watch?v=_idsi5JWJj4
★ Ingest vehicle data to cloud
★ Interpret event streams
★ Enrich with customer data
★ Expose in real-time to apps
★ Feed downstream analytics
Next-Gen Data Workload
Example Use Cases
Public Safety in Extreme Weather
Large Scale Fleet Management
For many large organizations with fleets of vehicles, fleet
management practices have remained largely unchanged for
20+ years
● Manual process
● Siloed data
● Rules based approach to decision making
(e.g. end of life after 12 years of service)
● No machine learning and/or optimization
● Opportunities to gain significant cost
savings
25
GB/hr
380
TB/year
Source: American Automobile Association
(AAA)
Modern Vehicle Instrumentation
● 200,000+ fleet of trucks
● Predictive maintenance
● Parts life estimation and
bulk purchasing of parts
● Vehicle end-of-life
prediction based on
usage, maintenance
and repair data
● Optimal point of resale
● Automated data
collection, reporting and
analytics
● Data stored in a secure,
scalable, and integrated
analytical environment
Purchase/Financial
Records
Vehicle Telemetry
Assignment
Maintenance
Records
Repair Reports
/ Images
Pivotal
GemFire
/Cloud Cache
Normalized
Real-Time Data
Customer
Database
Reporting/BI
Storage & Analytics DB
Data Science Tools
Operationalize
ML Models
Ext. Apps
Fault interpretation data,
service center data, etc.
REST API
Containerized
Data Science
Large Scale Fleet Management
Theme Park
Predictive Maintenance
Current
Timeline
Mon
Thu
Motor on ride is weakened
but no one is aware
• Ride is Stopped due to fault
• Ride evacuated
• Team repairs motor
• Ride restarted
Desired
Timeline
Mon
Thu
Motor on ride
is weakened and data is
being collected
Tue
• Data from previous day suggests
motor problem and triggers
alarm
• After park closes Team
investigates
• Finds and repairs weakened
motor
Guests enjoy ride all day!
Learned
Predictive Model
(ML)
Maintenance
Logging
Database
Motor starts to fail, but not
enough to trigger a ride
stop
Tue
Ride Failure Prediction
Values of binary sensors
every 1ms
Good behavior:
Clean transitions
Bad behavior:
Chatter
Count 100ms windows per day
with k or more transitions
1 or more
2 or more
3 or more
Something is going on
prior to day 90
Team discovers
and fixes problem
System alerts Team
to problem
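The chatter feature on this slide (binary sensor values every 1 ms, counted in 100 ms windows with k or more transitions) might be computed along the following lines. This is a minimal sketch, not the presenters' implementation: the function name and the toy signal are hypothetical, windows are assumed to be non-overlapping, and transitions that fall exactly on a window boundary are ignored for simplicity.

```python
def count_chatter_windows(samples, k, window=100):
    """Count non-overlapping windows of `window` samples with k or more 0/1 transitions."""
    hits = 0
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        # A transition is any change between consecutive samples in the window.
        transitions = sum(1 for a, b in zip(chunk, chunk[1:]) if a != b)
        if transitions >= k:
            hits += 1
    return hits


# Toy signal at one sample per millisecond (assumed data, three 100 ms windows).
clean = [0] * 50 + [1] * 50                    # good behavior: one clean transition
chatter = [0] * 48 + [1, 0, 1, 0] + [1] * 48   # bad behavior: bounces before settling
signal = clean + chatter + [1] * 100           # last window has no transitions

counts = {k: count_chatter_windows(signal, k) for k in (1, 2, 3)}
```

Running this once per day of logged data gives the "1 or more", "2 or more", and "3 or more" series; a rise in the higher-k counts is the kind of early signal the slide points to.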
Motor Logging Database
Learn model of
sensor behavior
from historical data
and raise alarms
when it is
statistically
unusual
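The slides do not say which model was learned, so the following sketch substitutes a deliberately simple stand-in: a mean and standard deviation fitted to historical daily values, with an alarm when a new value is more than a chosen number of standard deviations away. The function names, the sample history, and the threshold of 3.0 are all assumptions for illustration.

```python
from statistics import mean, stdev


def fit_baseline(history):
    """Learn a simple baseline (mean, sample standard deviation) from historical values."""
    return mean(history), stdev(history)


def is_unusual(value, baseline, z_threshold=3.0):
    """Return True when `value` is statistically unusual relative to the baseline."""
    mu, sigma = baseline
    if sigma == 0:
        # With no historical variation, any deviation counts as unusual.
        return value != mu
    return abs(value - mu) / sigma > z_threshold


# Assumed history: daily counts of chatter windows for a healthy motor.
history = [4, 5, 6, 5, 4, 6, 5]
baseline = fit_baseline(history)

normal_day = is_unusual(6, baseline)     # within normal variation
failing_day = is_unusual(12, baseline)   # far outside the learned range
```

A production model would more likely be fitted per motor and account for trends or seasonality; the point here is only the pattern of learning from history and alarming on statistically unusual values.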
Find Motor Specific Anomalies in Motor Logs
Engineer’s
Report
Normal behavior
Abnormal behavior found
completely automatically by
algorithm
Model breaks threshold and
signal is sent to replace
motor after hours
Data and ML driven approach
gives days in advance
warning of ride issue
101 @ 19:32 stop due to non-responding motor Motor #1 failed to start up. Motor was visually verified and replaced and Ops Reset
102 @ 20:50 continue
Avoid Ride Shutdown
Offshore
Drilling
Ocean Floor (plan view)
Equipment Perspective
Lidar Data from Drilling Site
Storing and Querying Lidar Data on Greenplum
• PostgreSQL extension for
storing point cloud (Lidar)
• Flexible schema document
format to handle variability of
Lidar data
• Integrated with PostGIS
geospatial library
https://github.com/pgpointcloud
Looking Ahead
• Consider adding edge data to your AI and analytics strategy, where it can add business value
• Intelligence will move towards the edge
• Many IoT projects are custom today, will see more standardization in the future
Learn More
● Pivotal Greenplum Overview: http://pivotal.io/greenplum
● Download Greenplum Database open source: https://greenplum.org/
● Follow Hired Brains Research:
○ Neil Raden: @NeilRaden
● Follow Pivotal
○ Frank McQuillan: @fmcquillan
○ @pivotaldata
○ @greenplum
Q&A
Thank You!
Adding Edge Data to Your AI and Analytics Strategy
© Copyright 2018 Pivotal Software, Inc. All rights Reserved. Version 1.0
Adding Edge Data to
Your AI and Analytics
Strategy

More Related Content

PDF
¿Cómo puede ayudarlo Qlik a descubrir más valor en sus datos de IoT?
PDF
Integrated Analytics for IIoT Predictive Maintenance using IoT Big Data Cloud...
PDF
Modernizing Data Management
PDF
AI & Big Data Analytics : Innovation trends and use cases
PDF
A Connections-first Approach to Supply Chain Optimization
PPTX
Internet of things 14-dec2013
PDF
Artificial Intelligence & Machine Learning - A CIOs Perspective
PDF
Kick Off – Graphs: The Fuel Behind Innovation and Transformation in Every Field
¿Cómo puede ayudarlo Qlik a descubrir más valor en sus datos de IoT?
Integrated Analytics for IIoT Predictive Maintenance using IoT Big Data Cloud...
Modernizing Data Management
AI & Big Data Analytics : Innovation trends and use cases
A Connections-first Approach to Supply Chain Optimization
Internet of things 14-dec2013
Artificial Intelligence & Machine Learning - A CIOs Perspective
Kick Off – Graphs: The Fuel Behind Innovation and Transformation in Every Field

What's hot (19)

PDF
Top 20 artificial intelligence companies to watch out in 2022
PPTX
In pursuit of augmented intelligence
PPTX
Guide to big data analytics
PDF
Neo4j GraphTalk Copenhagen - Next Generation Solutions using Neo4j
PDF
Enterprise IoT solution in 30 days
PDF
Robert Harrison, WMG - IIoT and Industry 4.0 in Automation Systems Engineering
PDF
1. The Importance of Graphs in Government
PDF
2018 Big Data Trends: Liberate, Integrate, and Trust Your Data
PDF
Building Open Data Markets Using Sensing as a Service Model
PPTX
Io t first(1)
PPTX
Advancing Cookstove Projects with Data Acquisition and Analytics
PDF
GETTING STARTED WITH IOT DATA MANAGEMENT
PPTX
Virtual Gov Day - Introduction & Keynote - Alan Webber, IDC Government Insights
PDF
Big data Introduction by Mohan
PPTX
Managing your Assets with Big Data Tools
PDF
Neo4j im Fianzsektor: DIVIZEND
PPTX
Big data analysis
PDF
Neo4j Graph Data Platform: Making Your Data More Intelligent
PPTX
Essential Tools For Your Big Data Arsenal
Top 20 artificial intelligence companies to watch out in 2022
In pursuit of augmented intelligence
Guide to big data analytics
Neo4j GraphTalk Copenhagen - Next Generation Solutions using Neo4j
Enterprise IoT solution in 30 days
Robert Harrison, WMG - IIoT and Industry 4.0 in Automation Systems Engineering
1. The Importance of Graphs in Government
2018 Big Data Trends: Liberate, Integrate, and Trust Your Data
Building Open Data Markets Using Sensing as a Service Model
Io t first(1)
Advancing Cookstove Projects with Data Acquisition and Analytics
GETTING STARTED WITH IOT DATA MANAGEMENT
Virtual Gov Day - Introduction & Keynote - Alan Webber, IDC Government Insights
Big data Introduction by Mohan
Managing your Assets with Big Data Tools
Neo4j im Fianzsektor: DIVIZEND
Big data analysis
Neo4j Graph Data Platform: Making Your Data More Intelligent
Essential Tools For Your Big Data Arsenal
Ad

Similar to Adding Edge Data to Your AI and Analytics Strategy (20)

PDF
Global C4IR-1 Masterclass Adryan - Zuehlke Engineering 2017
PDF
Zühlke Meetup - Mai 2017
PDF
IOT_MODULE_4.pd easy to understand notes
PDF
IIoT : Old Wine in a New Bottle?
PDF
Data Analytics for IoT - BrightTalk Webinar
PDF
Data Analytics for IoT
PDF
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
PPTX
Io t research_arpanpal_iem
PDF
Cognitive-IoT-AI-Driven-Intelligence-for-Smart-Cities-Automation-and-Autonomo...
PDF
General introduction to IoTCrawler
PDF
WSO2Con ASIA 2016: IoT Analytics
PDF
How to maximize profit from IoT by using data platform - Albert Lewandowski, ...
PPTX
2015-09-16 IoT in Oil and Gas Conference
PDF
A Bigger Magnifying Glass: Analyzing the Internet of Things
PDF
Internet of Things Presentation to Los Angeles CTO Forum
PDF
Predictive Analytics: Why (I)IoT Is Different
PDF
IoT Analytics
PDF
A Data-driven Approach for Internet of Things Applications: Methods and Case ...
PDF
AWS O&G Day - Ambyint and AWS
PDF
Cisco data analytics in ioe_rajiv niles_2015 nov
Global C4IR-1 Masterclass Adryan - Zuehlke Engineering 2017
Zühlke Meetup - Mai 2017
IOT_MODULE_4.pd easy to understand notes
IIoT : Old Wine in a New Bottle?
Data Analytics for IoT - BrightTalk Webinar
Data Analytics for IoT
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
Io t research_arpanpal_iem
Cognitive-IoT-AI-Driven-Intelligence-for-Smart-Cities-Automation-and-Autonomo...
General introduction to IoTCrawler
WSO2Con ASIA 2016: IoT Analytics
How to maximize profit from IoT by using data platform - Albert Lewandowski, ...
2015-09-16 IoT in Oil and Gas Conference
A Bigger Magnifying Glass: Analyzing the Internet of Things
Internet of Things Presentation to Los Angeles CTO Forum
Predictive Analytics: Why (I)IoT Is Different
IoT Analytics
A Data-driven Approach for Internet of Things Applications: Methods and Case ...
AWS O&G Day - Ambyint and AWS
Cisco data analytics in ioe_rajiv niles_2015 nov
Ad

More from VMware Tanzu (20)

PDF
Spring into AI presented by Dan Vega 5/14
PDF
What AI Means For Your Product Strategy And What To Do About It
PDF
Make the Right Thing the Obvious Thing at Cardinal Health 2023
PPTX
Enhancing DevEx and Simplifying Operations at Scale
PDF
Spring Update | July 2023
PPTX
Platforms, Platform Engineering, & Platform as a Product
PPTX
Building Cloud Ready Apps
PDF
Spring Boot 3 And Beyond
PDF
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
PDF
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
PDF
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
PPTX
tanzu_developer_connect.pptx
PDF
Tanzu Virtual Developer Connect Workshop - French
PDF
Tanzu Developer Connect Workshop - English
PDF
Virtual Developer Connect Workshop - English
PDF
Tanzu Developer Connect - French
PDF
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
PDF
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
PDF
SpringOne Tour: The Influential Software Engineer
PDF
SpringOne Tour: Domain-Driven Design: Theory vs Practice
Spring into AI presented by Dan Vega 5/14
What AI Means For Your Product Strategy And What To Do About It
Make the Right Thing the Obvious Thing at Cardinal Health 2023
Enhancing DevEx and Simplifying Operations at Scale
Spring Update | July 2023
Platforms, Platform Engineering, & Platform as a Product
Building Cloud Ready Apps
Spring Boot 3 And Beyond
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
tanzu_developer_connect.pptx
Tanzu Virtual Developer Connect Workshop - French
Tanzu Developer Connect Workshop - English
Virtual Developer Connect Workshop - English
Tanzu Developer Connect - French
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: The Influential Software Engineer
SpringOne Tour: Domain-Driven Design: Theory vs Practice

Recently uploaded (20)

PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PDF
Encapsulation theory and applications.pdf
PDF
Machine learning based COVID-19 study performance prediction
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PDF
Agricultural_Statistics_at_a_Glance_2022_0.pdf
PPTX
Big Data Technologies - Introduction.pptx
PDF
Empathic Computing: Creating Shared Understanding
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PDF
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PDF
MIND Revenue Release Quarter 2 2025 Press Release
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PDF
Spectral efficient network and resource selection model in 5G networks
PDF
Network Security Unit 5.pdf for BCA BBA.
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PPTX
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
PPTX
sap open course for s4hana steps from ECC to s4
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Encapsulation theory and applications.pdf
Machine learning based COVID-19 study performance prediction
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
Dropbox Q2 2025 Financial Results & Investor Presentation
Agricultural_Statistics_at_a_Glance_2022_0.pdf
Big Data Technologies - Introduction.pptx
Empathic Computing: Creating Shared Understanding
Per capita expenditure prediction using model stacking based on satellite ima...
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
20250228 LYD VKU AI Blended-Learning.pptx
MIND Revenue Release Quarter 2 2025 Press Release
Diabetes mellitus diagnosis method based random forest with bat algorithm
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Spectral efficient network and resource selection model in 5G networks
Network Security Unit 5.pdf for BCA BBA.
Advanced methodologies resolving dimensionality complications for autism neur...
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
sap open course for s4hana steps from ECC to s4

Adding Edge Data to Your AI and Analytics Strategy

  • 1. © Copyright 2018 Pivotal Software, Inc. All rights Reserved. Version 1.0 Neil Raden Hired Brains Research @NeilRaden Frank McQuillan Pivotal @fmcquillan October 31, 2018 Adding Edge Data to Your AI and Analytics Strategy
  • 2. Market Perspective By Hired Brains Research Adding Edge Data to Your AI and Analytics Strategy
  • 3. Initial Observations ● Volume of data flowing from sensors is huge ● You can’t throw away sensor data, it doesn’t exist anywhere else ● IoT isn’t one thing, there are many very different models ● Edge architectures evolving quickly to be more intelligent ● But there is no context data at the edge, only telemetry ● To do Machine Learning, you need lots of other data and powerful analytical platforms which are NOT at the edge ● Managers want to evaluate IoT in tools they already know ● Over time, IoT will simply blend into enterprise apps ● Topology implies applications ● IoT will flow into existing applications Ⓒ 2018 Neil Raden Hired Brains Research 2018
  • 4. IoT is on track to connect 50 billion “smart” things by 2020 and 1 trillion sensors soon after, according to the National Science Foundation. Smart Dust is Covering the World Ⓒ 2018 Neil Raden Hired Brains Research 2018
  • 5. Internet of Things (IoT); Issues and Applications ● Buzz about IoT is dominated by discussions of architecture and technology ● We’re going to talk about data and analytics ● Physical architectures of IoT (mobile, self-contained intelligent edge devices, classic, etc.) determine the data and analytics that are possible Ⓒ 2018 Neil Raden Hired Brains Research 2018
  • 6. Lamborghini creates world's first 'self-healing' sports car The Terzo Millennio, which translates as third millennium in Italian, has the ability to detect and repair cracks in its body work. Using sensors the car can conduct its own health check to detect any damages and self-repair itself by filling the crack with nanotubes to prevent it spreading.
  • 7. What’s Wrong with Most IoT Diagrams? ● IoT is not an application ● All of the other data sources are second-hand ● Their data is created as part of an application and stored ● If we lose their data, we can always get it back ● If we lose IoT data, it’s gone forever ● Diagrams like these overlook the need for handling sensor data differently So What’s the Implication? Ⓒ 2018 Neil Raden Hired Brains Research 2018
  • 8. Sensor Data Has to be Governed Differently ● Sensor data is fresh. It hasn’t been used for anything else ● That raises two issues ○ Stewardship: responsibilities to be guardians of it because if we lose it, it’s gone, ○ What techniques and methodologies do we need to understand it? Intelligence generated at the edge and dump the rest? Or save it? Ⓒ 2018 Neil Raden Hired Brains Research 2018
  • 9. Extremely Simplified Diagram Sensor Data Local Logic Action Gateway Analytical Platform/AI/ML Ingest Ⓒ 2018 Neil Raden Hired Brains Research 2018
  • 10. Can We Afford to Transmit/Store All of That? ? Ⓒ 2018 Neil Raden Hired Brains Research 2018
  • 11. No More Managing from Scarcity Ⓒ 2018 Neil Raden Hired Brains Research 2018
  • 12. What If Ferrari’s Were Subject to Moore’s Law? Ⓒ 2018 Neil Raden Hired Brains Research 2018
  • 13. What Businesses Need from IoT Analytics ● Level of expressiveness sufficient for the specification, assembly and modification of common and complex business models without code ● This supports the notion about using existing tools ● The ability to accommodate all but the most esoteric kinds of modeling ● Frictionless, continuous intelligence ● Beyond that, there is data science…. Ⓒ 2018 Neil Raden Hired Brains Research 2018
  • 14. But Don’t Forget: “Nanos Gigantum Humeris Insidentes” ● Those who came before ● Devised brilliant models ● No need to start over ● New technology is not a substitute for good thinking ● Seek out those original thinkers Ⓒ 2018 Neil Raden Hired Brains Research 2018
  • 15. Industrial Internet of Things (IIoT); Issues and Applications Ⓒ 2018 Neil Raden Hired Brains Research 2018
  • 16. The IIoT “Market” from an Analyst’s POV ● IIoT is a concept, not a market ● Businesses want to see outcomes thru apps already familiar with ● Bring together not only sensor data but all the other contextual information necessary to make business decisions ● Show the impact sensor data has on production KPIs, so managers can adjust production accordingly ● IIoT gradually, seamlessly incorporated into enterprise systems Ⓒ 2018 Neil Raden Hired Brains Research 2018
  • 17. IIoT Topology Drives Types of Analytics ● One arrangement is a collection of self-contained units with lots of sensors ● Jet engines in flight, other conveyances, complex machinery, mining equipment. ● Equipped not only with sensors, but also are their own “edge” device, with the intelligence to monitor and control the machine ● The mining drill, e.g. comes equipped with software updated remotely. Ⓒ 2018 Neil Raden Hired Brains Research 2018
  • 18. Digital Twin: Simulation Models with IoT Data ● Virtual counterpart to real-world components ● Abstracting hardware monitoring and maintenance Ⓒ 2018 Neil Raden Hired Brains Research 2018
  • 19. Industrial Internet of Things (IIoT); Issues and Applications ● Software developed to run that piece of equipment, rather than conforming to some industry standard or protocol, may be a problem capturing that data and using it over time. ● In fact, like the case of John Deere and the farmers’ tractors, there may be conflict over who actually owns the data and can access the logic that captured it from the sensors. Ⓒ 2018 Neil Raden Hired Brains Research 2018
  • 20. What Businesses Need from IoT Analytics ● Is there an “informating” application for your data? ● Things to consider ● Do you have the infrastructure to capitalize on it? ● Do you have an agile enough culture for something different? Ⓒ 2018 Neil Raden Hired Brains Research 2018
  • 21. Three Things We Know About IoT Opportunities ● It’s not about the hardware and architecture ○ One thing will be common across all IoT applications: analytics will play a crucial role in operating and driving value from IoT resources ● IoT changes operations, not just costs ○ Merely looking to reduce costs will fail because the cost and effort to install instrumentation, conducting telemetry, perform complex analytics processing, etc., likely will exceed any operational savings. ● Existing Governance and Security Regimes Will Be Inadequate ○ IoT also is growing alongside a more common abstraction of compute and storage, changing access patterns further, not a monolithic application, but many use cases at once on the same repository Ⓒ 2018 Neil Raden Hired Brains Research 2018
  • 22. Pivotal Perspective Adding Edge Data to Your AI and Analytics Strategy
  • 23. IoT Components ● Device Management ● Edge Processing and Analytics ● Advanced Analytics and Machine Learning ● Business Decisions and Applications (consumption at multiple locations) ● Tiers: Edge; Data Center/Cloud
  • 24. Reduced Costs Drive IoT Growth
  • 25. What analytics can one do with sensor data? Where is the right place to perform these operations?
  • 27. Reference Architectures Adding Edge Data to Your AI and Analytics Strategy
  • 28. Pivotal Technology for IoT (diagram labels): Parallel Configurable Data Load; Parallel Data Load and External Tables; Hadoop Data Lakes; Modern Analytical Platform; Predefined Libraries; Programmatic; GPText; High Speed Ingestion; Analytical Data to cache; In-Memory Data Grid; In-DB Predictive Analytics; Public Cloud Data Lakes; Other Data Platforms; Cloud Native Analytical Apps (Data Microservices); Operationalization of Models; Pivotal Application Service (PAS); Pivotal Container Service (PKS)
  • 29. Pivotal Greenplum Adding Edge Data to Your AI and Analytics Strategy
  • 30. Pivotal Greenplum: Greenplum Modern Data Platform (diagram labels) ● Users ● Analytical applications: Custom Apps; BI / Reporting; Machine Learning; AI ● Native interfaces: SQL (ANSI SQL, Teradata SQL, Other DB SQL) over JDBC, ODBC; Apache MADlib (ML/Statistics/Graph); Python, R, Java, Perl, C (Programmatic); Apache SOLR (Text); PostGIS (GeoSpatial) ● Pivotal Greenplum platform: Massively Parallel (MPP); PostgreSQL Kernel; Petabyte Scale Loading; Query Optimizer (GPORCA); Workload Manager; Polymorphic Storage; Command Center; SQL Compatibility (Hyper-Q) ● Multi-structured data sources & pipelines: Structured Data; JSON, Apache AVRO, Apache Parquet & XML; Local Storage; Other RDBMSes; Spark; GemFire; Cloud Object Storage; HDFS; Kafka; ETL; Spring Cloud Data Flow ● Flexible deployment: On-Premises; Public Clouds; Private Clouds; Fully Managed Clouds; Containers…
  • 31. Pivotal Greenplum (architecture diagram labels): SQL; Master Host; Standby Master; Interconnect; Segment Host Node1, Segment Host Node2, Segment Host Node3, … Segment Host NodeN; data sources and pipelines: Local Storage; Other RDBMSes; Spark; GemFire; Cloud Object Storage; HDFS; Kafka; ETL; Spring Cloud Data Flow
  • 32. Greenplum Integrated Analytics: Data Transformation; Traditional BI; Machine Learning; Graph; Data Science Productivity Tools; Geospatial; Text
  • 33. Scalable, In-Database Machine Learning. Apache MADlib: Big Data Machine Learning in SQL ● Open source, top level Apache project ● For PostgreSQL and Greenplum Database ● Powerful machine learning, graph, statistics and analytics for data scientists • Open source: https://github.com/apache/madlib • Downloads and docs: http://madlib.apache.org/ • Wiki: https://cwiki.apache.org/confluence/display/MADLIB/
  • 34. Functions (Aug 2018) ● Data Types and Transformations: Array and Matrix Operations; Matrix Factorization (Low Rank, Singular Value Decomposition (SVD)); Norms and Distance Functions; Sparse Vectors; Encoding Categorical Variables; Path Functions; Pivot; Sessionize; Stemming ● Graph: All Pairs Shortest Path (APSP); Breadth-First Search; Hyperlink-Induced Topic Search (HITS); Average Path Length; Closeness Centrality; Graph Diameter; In-Out Degree; PageRank and Personalized PageRank; Single Source Shortest Path (SSSP); Weakly Connected Components ● Model Selection: Cross Validation; Prediction Metrics; Train-Test Split ● Statistics: Descriptive Statistics (Cardinality Estimators, Correlation and Covariance, Summary); Inferential Statistics (Hypothesis Tests); Probability Functions ● Supervised Learning: Neural Networks; Support Vector Machines (SVM); Conditional Random Field (CRF); Regression Models (Clustered Variance, Cox-Proportional Hazards Regression, Elastic Net Regularization, Generalized Linear Models, Linear Regression, Logistic Regression, Marginal Effects, Multinomial Regression, Naïve Bayes, Ordinal Regression, Robust Variance); Tree Methods (Decision Tree, Random Forest) ● Time Series Analysis: ARIMA ● Unsupervised Learning: Association Rules (Apriori); Clustering (k-Means); Principal Component Analysis (PCA); Topic Modelling (Latent Dirichlet Allocation) ● Utility Functions: Columns to Vector; Conjugate Gradient; Linear Solvers (Dense Linear Systems, Sparse Linear Systems); Mini-Batching; PMML Export; Term Frequency for Text; Vector to Columns ● Nearest Neighbors: k-Nearest Neighbors ● Sampling: Balanced; Random; Stratified
  • 35. Geospatial Analytics with PostGIS ● PostGIS is a spatial database extension that allows for analysis and processing of GIS objects ● Spatial Indexes & Bounding Boxes ● Round earth calculations ● Raster ● Vector
  • 36. 100+ Libraries in Python & R Data Science Bundle
  • 37. Spring Cloud Data Flow Adding Edge Data to Your AI and Analytics Strategy
  • 38. Spring Cloud Data Flow ● Spring Cloud Data Flow is a microservices toolkit for building data integration and real-time data processing pipelines. ● Pipelines consist of Spring Boot apps, using Spring Cloud Stream for events or Spring Cloud Task for batch processes. ● The Data Flow server provides interfaces to compose and deploy pipelines onto modern platforms like Cloud Foundry and Kubernetes.
  • 39. Batch Integration (Spring Cloud Task): Tasks are finite Boot microservices connecting to data/storage. The system tracks invocations and exit status. Messaging Integration (Spring Cloud Stream): Microservices operate on message streams, with loose coupling via pub/sub topics over a pluggable message bus like Kafka.
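The loose coupling described for messaging integration can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `InMemoryBus`, the topic names `raw` and `enriched`, and the Celsius-to-Fahrenheit processor are all hypothetical and are not part of Spring Cloud Stream or Kafka. The point is that the processor and the sink know only topic names, never each other.

```python
from collections import defaultdict


class InMemoryBus:
    """Toy stand-in for a message bus such as Kafka (hypothetical, not a real client)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        # Register a callable to be invoked for every message on this topic.
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Deliver the message to every handler subscribed to the topic.
        for handler in self._subscribers[topic]:
            handler(message)


def build_pipeline(bus, sink):
    """Wire a processor and a sink to topics; neither references the other directly."""

    def to_fahrenheit(reading):
        # Processor: enrich the reading, then publish it on a downstream topic.
        enriched = dict(reading, temp_f=reading["temp_c"] * 9 / 5 + 32)
        bus.publish("enriched", enriched)

    bus.subscribe("raw", to_fahrenheit)
    bus.subscribe("enriched", sink.append)


bus = InMemoryBus()
received = []
build_pipeline(bus, received)
bus.publish("raw", {"sensor": "s1", "temp_c": 100})
```

Because each stage depends only on a topic name, a stage can be replaced or scaled without touching its neighbors, which is the property the slide attributes to a pluggable message bus.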
  • 40. IoT Event Streaming for Automotive (Next-Gen Data Workload) ★ Ingest vehicle data to cloud ★ Interpret event streams ★ Enrich with customer data ★ Expose in real-time to apps ★ Feed downstream analytics ● Developed using: Spring Boot Microservices; Spring Cloud Stream; Spring Kafka ● Deployed on: Pivotal Cloud Foundry; Microsoft Azure & on-premise ● Integrating: Kafka, RabbitMQ; Hadoop, NiFi, Spark ● https://www.youtube.com/watch?v=_idsi5JWJj4
  • 41. Example Use Cases Adding Edge Data to Your AI and Analytics Strategy
  • 42. Public Safety in Extreme Weather
  • 43. Public Safety in Extreme Weather
  • 44. Public Safety in Extreme Weather
  • 45. Large Scale Fleet Management
  • 46. Large Scale Fleet Management: For many large organizations with fleets of vehicles, fleet management practices have remained largely unchanged for 20+ years ● Manual processes ● Siloed data ● Rules-based approach to decision making (e.g. end of life after 12 years of service) ● No machine learning or optimization ● Opportunities to gain significant cost savings
  • 47. Modern Vehicle Instrumentation: 25 GB/hr; 380 TB/year. Source: American Automobile Association (AAA)
  • 48. Large Scale Fleet Management ● Fleet of 200,000+ trucks ● Predictive maintenance ● Parts life estimation and bulk purchasing of parts ● Vehicle end-of-life prediction based on usage, maintenance and repair data ● Optimal point of resale ● Automated data collection, reporting and analytics ● Data stored in a secure, scalable, and integrated analytical environment. Diagram labels: data sources (Purchase/Financial Records; Vehicle Telemetry; Assignment; Maintenance Records; Repair Reports / Images; Customer Database; Ext. Apps: fault interpretation data, service center data, etc.); Pivotal GemFire / Cloud Cache (Normalized Real-Time Data); Storage & Analytics DB; Reporting/BI; Data Science Tools; Containerized Data Science; Operationalize ML Models; REST API
  • 50. Ride Failure Prediction ● Current Timeline ○ Mon: Motor on ride is weakened but no one is aware ○ Tue: Motor starts to fail, but not enough to trigger a ride stop ○ Thu: Ride is stopped due to fault; ride evacuated; Team repairs motor; ride restarted ● Desired Timeline ○ Mon: Motor on ride is weakened and data is being collected ○ Tue: Data from previous day suggests motor problem and triggers alarm; after park closes, Team investigates, then finds and repairs weakened motor ○ Thu: Guests enjoy ride all day! ● Components: Maintenance Logging Database; Learned Predictive Model (ML)
  • 51. Find Motor Specific Anomalies in Motor Logs ● Motor Logging Database: values of binary sensors every 1 ms ● Good behavior: clean transitions ● Bad behavior: chatter ● Count 100 ms windows per day with k or more transitions (plotted for 1 or more, 2 or more, 3 or more) ● Something is going on prior to day 90 ● System alerts Team to problem ● Team discovers and fixes problem ● Learn model of sensor behavior from historical data and raise alarms when it is statistically unusual
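The chatter feature on this slide can be sketched as follows. This is a minimal illustration, not the presenters' implementation: the function names are hypothetical, windows are assumed to be non-overlapping, and the "statistically unusual" test is assumed here to be a simple z-score against a baseline of earlier days, since the slide does not say which model was learned.

```python
from statistics import mean, stdev

WINDOW_MS = 100  # window length from the slide; one binary sample per 1 ms


def count_transitions(samples):
    """Number of 0->1 or 1->0 changes in a sequence of binary samples."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a != b)


def chatter_windows(samples, k, window=WINDOW_MS):
    """Count non-overlapping windows that contain k or more transitions.

    Transitions that straddle a window boundary are not counted (assumption).
    """
    count = 0
    for start in range(0, len(samples), window):
        if count_transitions(samples[start:start + window]) >= k:
            count += 1
    return count


def unusual_days(daily_counts, baseline_days, z_threshold=3.0):
    """Flag days whose count is far above the baseline mean (assumed z-score rule)."""
    baseline = daily_counts[:baseline_days]
    mu, sigma = mean(baseline), stdev(baseline)
    sigma = sigma or 1.0  # avoid dividing by zero on a perfectly flat baseline
    return [
        day
        for day, c in enumerate(daily_counts)
        if day >= baseline_days and (c - mu) / sigma > z_threshold
    ]


# A clean window has one transition; a chattering window has many.
clean = [0] * 50 + [1] * 50
chatter = [0, 1] * 50
samples = clean + chatter + clean
```

Running `chatter_windows(samples, k)` once per day of log data for k = 1, 2, 3 gives the three daily series the slide plots; `unusual_days` then marks the days where a series departs from its historical level.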
  • 52. Avoid Ride Shutdown ● Normal behavior ● Abnormal behavior found completely automatically by algorithm ● Model breaks threshold and signal is sent to replace motor after hours ● Data and ML driven approach gives days of advance warning of a ride issue ● Engineer’s Report: 101 @ 19:32 stop due to non-responding motor. Motor #1 failed to start up. Motor was visually verified and replaced and Ops Reset. 102 @ 20:50 continue
  • 54. Lidar Data from Drilling Site: Ocean Floor (plan view); Equipment Perspective
  • 55. Storing and Querying Lidar Data on Greenplum • PostgreSQL extension for storing point cloud (Lidar) data • Flexible schema document format to handle variability of Lidar data • Integrated with PostGIS geospatial library • https://github.com/pgpointcloud
  • 56. Looking Ahead • Consider adding edge data to your AI and analytics strategy where it can add business value • Intelligence will move towards the edge • Many IoT projects are custom today; expect more standardization in the future
  • 57. Learn More ● Pivotal Greenplum Overview: http://pivotal.io/greenplum ● Download Greenplum Database open source: https://greenplum.org/ ● Follow Hired Brains Research: ○ Neil Raden: @NeilRaden ● Follow Pivotal: ○ Frank McQuillan: @fmcquillan ○ @pivotaldata ○ @greenplum
  • 58. Q&A Adding Edge Data to Your AI and Analytics Strategy
  • 59. Thank You! Adding Edge Data to Your AI and Analytics Strategy