Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyzing - StampedeCon 2016
Things we will cover
26-Jul-16, presented by Ryan Kirk at StampedeCon 2016
GOAL
Explain Cloud IoT, its challenges, and a
principled, agile approach to prediction amidst
uncertainty in such a way that people from a
broad audience can (hopefully) relate.
WILL
►  IoT, Cloud landscape, and CTL
►  Prediction Lifecycle
►  Challenges by business domain
►  Data Science Lessons Learned
WILL NOT
►  Big Data
►  Architecture
►  Algorithms
►  Technology
WHO WE ARE
Who I am
I am interested in creating intelligent systems
through incorporating humans and machines in
an active learning loop.
►  Decision Scientist with PhD in HCI from Iowa
State
►  Principal Data Scientist for CenturyLink Cloud
►  Curricular Design, Educational Technology,
Online Advertising, Online Retail, Big Data
UX, Cloud, IoT, Physics
►  Hiking, Data journalism, Stocks, Horse Racing
ryankirk.info
Who we are: CenturyLink Cloud
CLOUD + COLOCATION + NETWORK + MANAGED SERVICES
What is IoT
Human desire to connect ourselves to
each other via technology
►  Modern plumbing…
►  Telegraph → Telephone
►  Telephone → Dial-up
►  Dial-up → HSN
►  HSN → WAN
►  WAN → IoT
Human desire to connect ourselves to
each other via technology to empower
each other
Internet growth > Hardware growth
motherboard.vice.com
newscientist.com
CenturyLink Cloud IoT Advantage
►  37 states
►  550,000 miles of network
►  Innovative Gigabit
fiber network
►  25MM+ consumer
endpoints
►  60+ data centers (DCs)
PROBLEM
STATEMENT
Problem statement:
►  Prevent incidents
through early
detection
►  Reduce MTTR by
facilitating root-cause
analytics
►  Facilitate domain
experts and harvest
their knowledge
GOAL
Build a real-time artificial intelligence
capable of analyzing all incoming
streams of data in order to know
which actions our machines need to
automatically take.
It’s simple, really… build Skynet
PREDICTION
LANDSCAPE
Prediction Adoption Model
Stage I: INTRODUCTION
1. Design
2. Measure
Stage II: GROWTH
3. Describe
4. Detect
Stage III: MATURITY
5. Predict
6. Act
Stage IV: DECLINE
7. Feedback
8. Obsolescence
[Figure: adoption curve. Axes: TIME and SOPHISTICATION; phases: INTRO, GROWTH, MATURITY, DECLINE]
Prediction Adoption Model (actual)
Stage I: CHECK THIS OUT
1. It runs
2. Results are promising
Stage II: OH NO, OH NO, OH NO!
3. It works but it’s terrible
4. It will never scale
Stage III: HAHA, IT WORKED!
5. I surprise myself sometimes
6. I found a shortcut to scale it
Stage IV: I NEVER SAID IT WOULD…
7. How do I prove it is still working?
8. There is no way to apply it to this scenario
[Figure: adoption curve. Axes: TIME and SOPHISTICATION; phases: CHECK THIS OUT, OH NO OH NO OH NO!, HAHA IT WORKED!, I NEVER SAID IT WOULD…]
Stage I: INTRODUCTION
1. Design
►  What should we measure?
►  What are the core business
processes?
►  What is the unit of analysis?
►  What are our research questions/
hypotheses?
2. Measure
►  Do we push or pull?
►  How often should we measure?
►  How long do we need the data?
►  How do we represent the data
schema?
Stage II: GROWTH
3. Describe
►  Which metrics relate to our
outcomes of interest?
►  What is the typical value of each
metric?
►  How do you visualize each
metric?
4. Detect
►  What do we expect to happen?
►  Which values/events are
unexpected?
►  When should we alert?
►  How will we scale our analysis?
Stage III: MATURITY
5. Predict
►  Are there patterns?
►  Are there more complex
relationships?
►  What is going to happen?
►  How do we get training data?
6. Act
►  What actions should we take?
►  How can we incorporate new
outcomes into the current
model?
Stage IV: DECLINE
7. Feedback
►  Is my model primarily basing its
decisions upon its previous
decisions?
►  Can I separate the model from its
parameters?
►  Can I still evaluate accuracy?
8. Obsolescence
►  Are my business scenarios still
grounded?
►  Do my model assumptions still hold?
►  Does it still scale?
►  Is the intervention still needed?
Domain process involvement
BUSINESS
►  Is involved early
in defining
requirements
ENGINEERING
►  Builds MVP
►  Solidifies solution
RESEARCH
►  Builds prototype
and suggests
solution
SOLUTION
Working backwards
ITEM
1 Skynet
2 Action mapping
3 Action landscape
4 Prediction
5 Categorical learning
6 Training Data
7 Feedback loop
8 High SNR
9 Unsupervised learning
10 Anomaly Detection
11 Normalization
12 Retention
13 Sampling
14 Collection
15 Approach
16 Domain model
“In life, unless you’re more gifted than
Einstein, inversion [i.e. working
backwards] will help you solve
problems.”
Charlie Munger
Working backwards (cont.)
ITEM                      STAGE
1  Skynet                 ACT
2  Action mapping         ACT
3  Action landscape       ACT
4  Prediction             PREDICT
5  Categorical learning   PREDICT
6  Training Data          PREDICT
7  Feedback loop          PREDICT
8  High SNR               DETECT
9  Unsupervised learning  DETECT
10 Anomaly Detection      DETECT
11 Normalization          DESCRIBE
12 Retention              DESCRIBE
13 Sampling               MEASURE
14 Collection             MEASURE
15 Approach               DESIGN
16 Domain model           DESIGN
[Figure: adoption curve. Axes: TIME and SOPHISTICATION; phases: INTRO, GROWTH, MATURITY, DECLINE]
Working backwards (cont.)
ITEM                      STAGE     PRIMARY DOMAIN
1  Skynet                 ACT       ENGINEERING
2  Action mapping         ACT       BUSINESS
3  Action landscape       ACT       RESEARCH
4  Prediction             PREDICT   RESEARCH
5  Categorical learning   PREDICT   RESEARCH
6  Training Data          PREDICT   ENGINEERING
7  Feedback loop          PREDICT   BUSINESS
8  High SNR               DETECT    RESEARCH
9  Unsupervised learning  DETECT    RESEARCH
10 Anomaly Detection      DETECT    RESEARCH
11 Normalization          DESCRIBE  RESEARCH
12 Retention              DESCRIBE  ENGINEERING
13 Sampling               MEASURE   RESEARCH
14 Collection             MEASURE   ENGINEERING
15 Approach               DESIGN    RESEARCH
16 Domain model           DESIGN    BUSINESS
This is a WIP
ITEM                      STAGE     PRIMARY DOMAIN
1  Skynet                 ACT       ENGINEERING
2  Action mapping         ACT       BUSINESS
3  Action landscape       ACT       RESEARCH
4  Prediction             PREDICT   RESEARCH
5  Categorical learning   PREDICT   RESEARCH
6  Training Data          PREDICT   ENGINEERING
7  Feedback loop          PREDICT   BUSINESS
8  High SNR               DETECT    RESEARCH
9  Unsupervised learning  DETECT    RESEARCH
10 Anomaly Detection      DETECT    RESEARCH
11 Normalization          DESCRIBE  RESEARCH
12 Sampling               MEASURE   RESEARCH
13 Collection             MEASURE   ENGINEERING
14 Domain model           DESIGN    BUSINESS
Legend: QUEUED (StampedeCon 2017?), WORKING, PRODUCTION
LESSONS
LEARNED
16. DOMAIN MODEL
►  938,076 metrics
►  Verify the unique stream of
data across systems
►  Key-based
DESIGN
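One way to read "key-based" is that every metric stream gets a single canonical key, so the same stream reported by two systems can be recognized as one. The sketch below assumes a key built from system, host, and metric name; those fields, the sample records, and the function names are hypothetical and are not taken from the talk.

```python
from collections import Counter

def metric_key(system: str, host: str, metric: str) -> str:
    """Build a canonical key from three hypothetical identifying fields."""
    parts = (system, host, metric)
    # Normalize case and whitespace so spelling variants collapse to one key.
    return "/".join(p.strip().lower() for p in parts)

def duplicate_keys(records):
    """Return the keys that appear more than once across the records."""
    counts = Counter(metric_key(*r) for r in records)
    return sorted(k for k, n in counts.items() if n > 1)

# Hypothetical records: (system, host, metric name)
records = [
    ("monitor-a", "host-01", "cpu.usage"),
    ("Monitor-A", "HOST-01", "cpu.usage "),  # same stream, different spelling
    ("monitor-b", "edge-07", "bytes.in"),
]

dupes = duplicate_keys(records)
print(dupes)
```

With close to a million metrics, a check like this would more plausibly run as a batch job over the metric catalog than inline on every reading.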
15. APPROACH
VARIABILITY
►  Changes in observed state
►  Plan for variability
UNCERTAINTY
►  Unobserved state(s)
►  Design for uncertainty
DESIGN (cont.)
14. COLLECTION
►  Agreement of signals
►  Cacophony of
signals
►  How often should we
measure?
►  We have no labeled
training data
►  An approach we
can build upon in the
future
MEASURE
13. SAMPLING
Shannon-Nyquist Paradox
►  The more you measure
something the more it varies
►  Bias related to time and
variability
►  E.g., "the temperature yesterday
was 68 degrees"
MEASURE (cont.)
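The temperature example can be made concrete with a small sketch. It assumes a purely hypothetical signal (68 degrees on average with a 10 degree daily swing) and compares hourly readings, daily means, and single daily readings taken at different times. None of the numbers other than 68 come from the talk.

```python
import math
import statistics

def temperature(hour: float) -> float:
    """Hypothetical signal: 68 degrees on average, peaking at 15:00 each day."""
    return 68 + 10 * math.sin(2 * math.pi * (hour - 9) / 24)

# One week of hourly readings.
hourly = [temperature(h) for h in range(7 * 24)]

# One value per day: the mean of that day's 24 hourly readings.
daily = [statistics.fmean(hourly[d * 24:(d + 1) * 24]) for d in range(7)]

hourly_sd = statistics.pstdev(hourly)
daily_sd = statistics.pstdev(daily)

# A single reading per day depends entirely on when it is taken.
reading_1500 = temperature(15)
reading_0300 = temperature(3)

print(f"hourly sd = {hourly_sd:.2f}, daily-mean sd = {daily_sd:.2f}")
print(f"15:00 reading = {reading_1500:.1f}, 03:00 reading = {reading_0300:.1f}")
```

In this toy signal the hourly readings spread about 7 degrees around the mean while the daily means barely move, and "the temperature yesterday" is 78, 58, or 68 depending on whether it was read at 15:00, read at 03:00, or averaged.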
12. RETENTION
►  Recall that precision relates to
sampling consistency
►  Not all metrics are created
equal
►  Coverage remains
problematic
DESCRIBE
11. NORMALIZATION
Kievit, R. A., Frankenhuis, W. E., et al. (2013). Simpson’s paradox in
psychological science: a practical guide. Frontiers in Psychology, 4, 513.
Simpson’s Paradox
►  aggregate trend != sum of
individual trends
►  Applies to all aggregates:
sums, averages, correlations,
etc.
►  What is the unit of analysis?
DESCRIBE (cont.)
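A tiny worked example shows how the unit of analysis changes the answer. The data below are invented for illustration: two hypothetical hosts, each with a metric that falls as load rises, pooled into one aggregate where the trend appears to rise.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Hypothetical per-host readings: (load, metric value)
host_a_x, host_a_y = [1, 2, 3], [5, 4, 3]
host_b_x, host_b_y = [6, 7, 8], [10, 9, 8]

slope_a = slope(host_a_x, host_a_y)
slope_b = slope(host_b_x, host_b_y)
slope_all = slope(host_a_x + host_b_x, host_a_y + host_b_y)

print(f"host A: {slope_a:+.2f}, host B: {slope_b:+.2f}, pooled: {slope_all:+.2f}")
```

Both hosts trend downward, yet the pooled slope is positive because host B sits at higher load and higher values overall. Normalizing per host before aggregating is one way to avoid reading the pooled trend as a per-host trend.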
10. ANOMALY DETECTION
[Figure: time series chart. Legend: Predicted, Actual, Boundary]
►  Capture the time series data
for each piece of connected
platform technology
►  Find implicit anomalies within a
time series vector
►  Values that are surprising
►  Highly scalable
DETECT
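The slide does not say how the predicted value and boundary are computed. As a minimal sketch under stated assumptions, the code below uses a rolling mean as the prediction and a multiple of the rolling standard deviation as the boundary. The window size, the multiplier `k`, and the sample series are all assumptions chosen for illustration.

```python
import statistics

def find_anomalies(values, window=5, k=3.0, min_sd=1e-9):
    """Return indices whose value falls outside predicted +/- boundary.

    predicted = mean of the previous `window` values (a stand-in predictor)
    boundary  = k * population standard deviation of that same window
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        predicted = statistics.fmean(history)
        # min_sd keeps the boundary non-zero when the history is constant.
        boundary = k * max(statistics.pstdev(history), min_sd)
        if abs(values[i] - predicted) > boundary:
            anomalies.append(i)
    return anomalies

series = [10, 11, 10, 12, 11, 10, 11, 30, 11, 10]
anomalous_indices = find_anomalies(series)
print(anomalous_indices)
```

Here only the spike at index 7 is flagged. Because each series is scored independently and needs only a short history, an approach of this shape can be spread across many streams, which fits the "highly scalable" point on the slide.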
9. UNSUPERVISED LEARNING
►  Time series data shows
the context behind
anomalies that co-occur
►  Group anomalous
vectors based upon
structural properties and
co-occurrence
►  Up-level anomalies into
higher-order alerts using
contextual information
DETECT (cont.)
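The grouping step for unsupervised learning can be sketched as follows. This is a simplification that uses co-occurrence in time only and ignores structural properties. The 60-second gap, the two-metric minimum, and the metric names are assumptions, not details from the talk.

```python
def group_by_cooccurrence(anomalies, gap=60):
    """Group (timestamp, metric) anomalies that occur within `gap` seconds
    of the previous anomaly in the same group."""
    groups = []
    for ts, metric in sorted(anomalies):
        if groups and ts - groups[-1][-1][0] <= gap:
            groups[-1].append((ts, metric))
        else:
            groups.append([(ts, metric)])
    return groups

def higher_order_alerts(groups, min_metrics=2):
    """Promote groups that span at least `min_metrics` distinct metrics."""
    alerts = []
    for group in groups:
        metrics = sorted({metric for _, metric in group})
        if len(metrics) >= min_metrics:
            alerts.append(
                {"start": group[0][0], "end": group[-1][0], "metrics": metrics}
            )
    return alerts

# Hypothetical anomalies: (timestamp in seconds, metric key)
anomalies = [
    (100, "db/cpu"),
    (130, "db/latency"),
    (150, "web/errors"),
    (900, "cache/mem"),
]

groups = group_by_cooccurrence(anomalies)
alerts = higher_order_alerts(groups)
print(alerts)
```

The three anomalies between 100 and 150 seconds collapse into one higher-order alert, while the isolated anomaly at 900 seconds raises none.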
8. HIGH SNR
7. FEEDBACK LOOP
►  We have also built a search
engine for time series data
that allows us to build cool-looking
graphs in real time
►  We basically do all of this to
empower Slack alerts
►  Allows tags to propagate
forwards
PREDICT
6. TRAINING DATA
►  Evaluate ALL assumptions
regarding training data
►  Ideally use an active learning
approach, or risk
becoming tautological
PREDICT (cont.)
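One common active learning strategy is uncertainty sampling: ask a human to label the items the model is least sure about, rather than retraining on the model's own confident outputs. The sketch below assumes each item has a model score between 0 and 1. The scores, the 0.5 threshold, and the labeling budget are hypothetical, and the talk does not say which strategy was used.

```python
def pick_for_labeling(scores, threshold=0.5, budget=2):
    """Return the ids of the `budget` items whose scores lie closest
    to the decision threshold, i.e. where the model is least certain."""
    ranked = sorted(scores, key=lambda item_id: abs(scores[item_id] - threshold))
    return ranked[:budget]

# Hypothetical model scores per candidate anomaly.
scores = {"a": 0.97, "b": 0.52, "c": 0.08, "d": 0.41, "e": 0.70}

to_label = pick_for_labeling(scores)
print(to_label)
```

Items "b" and "d" go to a domain expert. Their labels are new information, whereas feeding the confident predictions for "a" and "c" back in as training data would only teach the model what it already believes.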
RESULTS
Prediction Results
►  38,392,438 predictions every 24 hr
►  Anomaly rate < 0.01% (0.0001),
~3K anomalies/day
►  Accuracy is ~90%
►  Prediction latency ~3.0 seconds
►  ~30 Higher order alerts/day
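The figures above can be cross-checked with a little arithmetic. The sketch treats the rounded values on the slide ("~3K", "~30") as exact, which is an assumption.

```python
predictions_per_day = 38_392_438
anomalies_per_day = 3_000   # "~3K" on the slide, treated as exact here
alerts_per_day = 30         # "~30" on the slide, treated as exact here

anomaly_rate = anomalies_per_day / predictions_per_day
predictions_per_second = predictions_per_day / 86_400
anomalies_per_alert = anomalies_per_day / alerts_per_day

print(f"anomaly rate: {anomaly_rate:.6f}")
print(f"predictions per second: {predictions_per_second:.0f}")
print(f"anomalies per higher-order alert: {anomalies_per_alert:.0f}")
```

Under those assumptions the anomaly rate is just under 0.0001, the system makes roughly 444 predictions per second, and about 100 raw anomalies roll up into each higher-order alert.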
Want to join me?
Let’s connect:
►  @ryan_kirk
Try CenturyLink Cloud free:
►  ctl.io
We are hiring
►  ctl.io/careers/jobs
Thanks to:
►  StampedeCon 2016
►  pixabay.com
