Data Science Case Studies: The Internet of Things: Implications for the Enterprise
© 2015 Pivotal Software, Inc. All rights reserved.
Internet of Things: Implications for the Enterprise
Rashmi Raghu, Ph.D.
Principal Data Scientist
Billions of Data Points
•  Gene Sequencing: cost to sequence one genome has fallen from $100M in 2001 to $10K in 2011 to $1K in 2014
•  Smart Grids: reading smart meters every 15 minutes is 3000x more data intensive
•  Stock Market
•  Social Media: Facebook uploads 250 million photos each day
•  Oil Exploration: oil rigs generate 25,000 data points per second
•  Video Surveillance
•  Medical Imaging
•  Mobile Sensors
Implications for the Enterprise
•  Organizational
–  Vision
–  Preparedness
–  Execution
•  Technical
–  Data quality & completeness
–  Heterogeneity of data sources
–  Technology architecture
Issues in any of these have implications for data science approaches and their effectiveness
Case Studies
•  Oil Drilling: Predictive Maintenance
•  Telecommunications: Customer Micro-segmentation
Data: The New Oil
•  Oil & gas exploration and production activities generate large amounts of data from sensors
•  What opportunities exist for data-driven approaches to improve operations?
Drilling into the San Andreas Fault at Parkfield, California. Credit: Stephen H. Hickman, USGS
*http://blog.pivotal.io/pivotal/case-studies-2/data-as-the-new-oil-producing-value-for-the-oil-gas-industry
Predictive maintenance
•  Predict equipment function and failure
•  Motivation: Failure costs estimated at $150,000/incident (billions annually)*
•  Goals:
–  Early warning system
–  Insights into prominent features impacting operation and failure
–  Reduction of non-productive drill time
–  Reduced incidents
Predictive Maintenance for Drilling Operations
Integrating & Cleansing → Feature Building → Modeling
Primary Data Sources
•  Operator Data (~ thousands of records)
–  Failure details
–  Component details
–  Drill Bit details
•  Drill Rig Sensor Data (~ billions of records)
–  Rate of Penetration (ROP)
–  RPM
–  Weight on Bit (WOB) …
Both sources feed the Integrated Data set.
Primary Data Sources: Challenges
Challenges
•  Failure instances not clearly labeled
•  Labels may be embedded in reports or comments
Implications
•  Dependent variable generation also becomes a machine learning exercise
•  Accuracy of failure prediction impacted by accuracy of failure label derivation
Well ID   Depth   Comment                            Event flag
1         1000    equipment not responding           1
2         2000    TOOH to bit. rubber pieces seen    1
•  Dependent variable generation – a machine learning exercise
•  Text analytics pipeline needed to convert failure reports or comments to event flags
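The conversion of free-text comments into event flags can be illustrated with a very small sketch. This is only a keyword-rule baseline, not the text analytics pipeline from the talk, which the slides do not describe in detail. The pattern list, the function name `event_flag`, and the third report row are assumptions made up for illustration; the first two rows come from the table above.

```python
import re

# Hypothetical failure vocabulary (an assumption for this sketch).
# A real pipeline would curate or learn these terms from labeled reports.
FAILURE_PATTERNS = [
    r"not responding",
    r"rubber pieces",
    r"\bfail(ed|ure)?\b",
]
_FAILURE_RE = re.compile("|".join(FAILURE_PATTERNS), re.IGNORECASE)


def event_flag(comment: str) -> int:
    """Return 1 if the comment matches any failure pattern, else 0."""
    return 1 if _FAILURE_RE.search(comment) else 0


reports = [
    {"well_id": 1, "depth": 1000, "comment": "equipment not responding"},
    {"well_id": 2, "depth": 2000, "comment": "TOOH to bit. rubber pieces seen"},
    # Invented row, added only to show a non-event comment.
    {"well_id": 3, "depth": 1500, "comment": "drilling ahead, no issues"},
]

# Attach the derived dependent variable to each report.
labeled = [{**r, "event_flag": event_flag(r["comment"])} for r in reports]

for row in labeled:
    print(row["well_id"], row["depth"], row["event_flag"])
```

Because any error in these derived flags carries into the failure model, a rule set like this would normally be checked against a hand-labeled sample before use.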
Complex Feature Set Across Data Sources
•  A failure occurred at the end of this run
•  Taking a window of time prior to failure, what features could we extract (e.g. variance of RPM, max bit position velocity)?
[Figure: time series of bit position, RPM, ROP, and WOB for the run]
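The two example features named on this slide can be sketched as follows. The sample layout `(t, rpm, bit_position)`, the function name `pre_failure_features`, the window length, and all data values are assumptions for illustration; bit position velocity is approximated here by a finite difference between consecutive samples.

```python
from statistics import pvariance


def pre_failure_features(samples, failure_time, window):
    """Compute features on the time window just before a failure.

    samples: list of (t, rpm, bit_position) tuples sorted by time t.
    The window covers failure_time - window <= t < failure_time.
    """
    in_window = [s for s in samples if failure_time - window <= s[0] < failure_time]
    if len(in_window) < 2:
        raise ValueError("need at least two samples in the window")

    rpms = [s[1] for s in in_window]
    # Finite-difference estimate of bit position velocity.
    velocities = [
        (b[2] - a[2]) / (b[0] - a[0])
        for a, b in zip(in_window, in_window[1:])
    ]
    return {
        "rpm_variance": pvariance(rpms),
        "max_bit_velocity": max(velocities),
    }


# Invented run: the failure is assumed to occur at t = 5.
samples = [
    (0, 100.0, 0.0),
    (1, 102.0, 0.5),
    (2, 98.0, 1.5),
    (3, 110.0, 2.0),
    (4, 90.0, 4.0),
    (5, 0.0, 4.0),
]

features = pre_failure_features(samples, failure_time=5, window=3)
print(features)
```

The window length is a modeling choice; in practice several window lengths might be tried and compared.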
Complex Feature Set Across Data Sources
•  Drill Rig Sensor data
–  Depth
–  Rate of Penetration
–  Torque
–  Weight on Bit
–  RPM
–  …
•  Operator data
–  Drill Bit details
–  Component details etc.
–  Failure events
–  …
•  Features on Time Windows
–  Mean
–  Median
–  Standard Deviation
–  Range
–  Skewness
–  …
•  Final Set of Features on Time Windows
•  Leverage GPDB / HAWQ (+ MADlib, PL/X) for fast computation of hundreds of features over time windows within billions of rows (or more) of time-series data
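The summary statistics listed above can be illustrated on a toy scale. The talk computes them in-database with GPDB / HAWQ; the sketch below is an in-memory Python stand-in that only shows what is computed per window, not how it was done at scale. The tuple layout, the function names, the fixed 60-second windows, and the readings are assumptions.

```python
from collections import defaultdict
from statistics import mean, median, pstdev


def skewness(values):
    """Population skewness: third central moment divided by std cubed."""
    m = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return 0.0  # constant window: define skewness as zero
    return sum((v - m) ** 3 for v in values) / (len(values) * sd ** 3)


def window_features(readings, window_seconds):
    """Group (well_id, t_seconds, value) readings into fixed windows per well."""
    buckets = defaultdict(list)
    for well_id, t, value in readings:
        buckets[(well_id, int(t // window_seconds))].append(value)

    return {
        key: {
            "mean": mean(vals),
            "median": median(vals),
            "std": pstdev(vals),
            "range": max(vals) - min(vals),
            "skewness": skewness(vals),
        }
        for key, vals in sorted(buckets.items())
    }


# Invented RPM readings for one well.
rpm_readings = [
    (1, 0, 100.0), (1, 10, 100.0), (1, 20, 130.0),
    (1, 60, 90.0), (1, 70, 110.0),
]

features = window_features(rpm_readings, window_seconds=60)
for key, stats in features.items():
    print(key, stats)
```

Repeating this for every sensor channel and several window sizes is what produces the "hundreds of features" mentioned on the slide.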
Predictive Maintenance App Pipeline
•  Ingest into the Data Lake
•  Initial data cleansing filters
•  Models
–  Elastic Net Regression
–  Cox Proportional Hazards Regression
–  Decision Trees
•  Wells with failure scores and early warning indicators
•  Business Levers
–  Early Warning System
–  Rig Operator Dashboard
•  Feedback loop for continuous model improvement
•  Domain Knowledge
•  Oil Rig Operator
•  Tools: HAWQ, GPDB, PL/X, MADlib, R, Python, C, Java, Perl, Spark + MLlib
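To make "failure scores and early warning indicators" concrete, here is a minimal sketch of the simplest member of the decision tree family: a depth-1 tree (a decision stump) on a single feature. It is not the model from the talk, which also lists Elastic Net and Cox Proportional Hazards regression. The feature values, labels, function names, and the alert cutoff of 0.5 are all assumptions.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def fit_stump(xs, ys):
    """Pick the threshold on one feature that minimizes weighted Gini impurity."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[0]:
            # Leaf scores are the observed failure rates on each side.
            best = (score, thr, sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("feature has a single distinct value")
    _, thr, p_left, p_right = best
    return {"threshold": thr, "p_left": p_left, "p_right": p_right}


def failure_score(stump, x):
    """Failure score for a new run: the failure rate of the leaf it falls in."""
    return stump["p_right"] if x > stump["threshold"] else stump["p_left"]


# Invented training data: RPM variance per run and whether the run failed.
rpm_variance = [5.0, 8.0, 12.0, 40.0, 55.0, 70.0]
failed = [0, 0, 0, 1, 1, 1]

stump = fit_stump(rpm_variance, failed)

ALERT_CUTOFF = 0.5  # hypothetical early-warning cutoff
alerts = {v: failure_score(stump, v) >= ALERT_CUTOFF for v in (10.0, 60.0)}
print(stump, alerts)
```

In the pipeline on the slide, operator feedback on such alerts would flow back into relabeling and retraining.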
Case Studies
•  Oil Drilling: Predictive Maintenance
•  Telecommunications: Customer Micro-segmentation
State of Data at Telco Company
•  Customer Segments
–  Multi-Gadget Families
–  Affluent Matures
–  Thrifty Families
–  High Tech Singles
–  Budget Singles
–  Seniors
•  New Data Sources
–  Internet Deep Packet Inspection
–  TV Consumption (Linear)
–  Video On Demand Consumption
Understanding Subscriber Behavior
•  Native Services: Video On Demand, TV, Internet
•  Internet Devices
•  OTT (Over The Top) Services
Questions:
•  What is the level of engagement with client’s products (TV, VOD, Internet)?
•  What are the patterns of device usage behavior?
•  What is the level of OTT engagement, by segment, and by bandwidth?
Newly Identified Behavior-Based Segments
Subscribers
•  Moderates
•  OTT & Data Heavyweights
•  Portable OTT Entertainment Seekers
–  iPhone Heavy
–  Android Heavy
–  iPad Heavy
•  In-Home OTT Entertainment Seekers
•  In-Home Native Content Seekers
–  VOD Heavy
–  TV Heavy
Cross Behavior-based and Existing Segments
•  New Behavior-Based Segments
–  Moderates
–  OTT & Data Heavyweights
–  In-Home OTT Entertainment Seekers
–  Portable OTT Entertainment Seekers - iPhone Heavy
–  Portable OTT Entertainment Seekers - Android Heavy
–  Portable OTT Entertainment Seekers - iPad Heavy
–  In-Home Native Content Seekers - VOD Heavy
–  In-Home Native Content Seekers - TV Heavy
•  Existing Segments
–  Multi-Gadget Families
–  Affluent Matures
–  Thrifty Families
–  Budget Singles
–  High Tech Singles
–  Seniors
•  Crossing the two gives Customized Micro-Segments!
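Crossing the two segmentations amounts to pairing each subscriber's existing segment with their behavior-based segment. The sketch below uses the segment names from the slides; the subscriber records, field names, and the `micro_segment` helper are invented for illustration.

```python
from collections import Counter
from itertools import product

EXISTING = [
    "Multi-Gadget Families", "Affluent Matures", "Thrifty Families",
    "Budget Singles", "High Tech Singles", "Seniors",
]
BEHAVIOR = [
    "Moderates",
    "OTT & Data Heavyweights",
    "In-Home OTT Entertainment Seekers",
    "Portable OTT Entertainment Seekers - iPhone Heavy",
    "Portable OTT Entertainment Seekers - Android Heavy",
    "Portable OTT Entertainment Seekers - iPad Heavy",
    "In-Home Native Content Seekers - VOD Heavy",
    "In-Home Native Content Seekers - TV Heavy",
]

# Every possible (existing, behavior) pair is a candidate micro-segment.
all_micro_segments = list(product(EXISTING, BEHAVIOR))


def micro_segment(subscriber):
    """Return the (existing, behavior) pair for one subscriber record."""
    return (subscriber["existing"], subscriber["behavior"])


# Invented subscriber records.
subscribers = [
    {"id": "s1", "existing": "Seniors", "behavior": "In-Home Native Content Seekers - TV Heavy"},
    {"id": "s2", "existing": "Seniors", "behavior": "In-Home Native Content Seekers - TV Heavy"},
    {"id": "s3", "existing": "High Tech Singles", "behavior": "OTT & Data Heavyweights"},
]

counts = Counter(micro_segment(s) for s in subscribers)
print(len(all_micro_segments), "possible micro-segments")
for segment, n in counts.most_common():
    print(segment, n)
```

With six existing and eight behavior-based segments there are at most 48 micro-segments, though many of them may turn out to be sparsely populated.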
Heterogeneous Data Sources
•  Prevalence of new data sources was limited but increasing
–  Rich usage data available on a subset of the subscribers
–  Leads to limited applicability of micro-segments
•  Lack of data may be alleviated by expanding data science efforts
–  Leverage micro-segmentation model to score a different subset of subscribers (on whom we have limited data)
•  New Data Sources: Internet Deep Packet Inspection, TV Consumption (Linear), Video On Demand Consumption
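One way to score subscribers with limited data is to learn segment profiles from the rich-data subset, using only features that exist for everyone, and then assign each remaining subscriber to the closest profile. The slides do not say how the scoring was done; the nearest-centroid classifier below is a swapped-in technique chosen for brevity. The two features (monthly data volume in GB, weekly TV hours) and all values are assumptions.

```python
from collections import defaultdict
from math import dist


def fit_centroids(rows):
    """rows: list of (features, segment). Returns {segment: mean feature vector}."""
    grouped = defaultdict(list)
    for feats, segment in rows:
        grouped[segment].append(feats)
    return {
        segment: tuple(sum(col) / len(col) for col in zip(*points))
        for segment, points in grouped.items()
    }


def score(centroids, feats):
    """Assign the segment whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda segment: dist(centroids[segment], feats))


# Invented rich-data subscribers: (monthly_gb, weekly_tv_hours) with known segments.
labeled = [
    ((300.0, 5.0), "OTT & Data Heavyweights"),
    ((280.0, 8.0), "OTT & Data Heavyweights"),
    ((40.0, 60.0), "In-Home Native Content Seekers - TV Heavy"),
    ((50.0, 70.0), "In-Home Native Content Seekers - TV Heavy"),
]
centroids = fit_centroids(labeled)

# Invented limited-data subscribers to be scored.
unlabeled = {"u1": (290.0, 10.0), "u2": (45.0, 55.0)}
assigned = {sid: score(centroids, feats) for sid, feats in unlabeled.items()}
print(assigned)
```

Features on very different scales would normally be standardized before distances are compared; that step is omitted here to keep the sketch short.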
Driving New Business Value
•  Upsell and Cross-Sell
•  New Product Offerings
•  Data Monetization
Implications for the Enterprise
•  Organizational
–  Vision
–  Preparedness
–  Execution
•  Technical / Data
–  Data quality & completeness
–  Heterogeneity of data sources
–  Technology architecture
•  Data quality & completeness:
–  Data capture mechanisms can have a lasting impact on ability to solve a business problem
•  Heterogeneity of data sources:
–  Existence of legacy systems & devices may limit the applicability of new models unless that is taken into account ahead of time
–  Feedback to spur upgrading of equipment wherever possible
Implications for the Enterprise
•  Creating value from IoT requires organizational and technical alignment
•  Impacts of these considerations on data science efforts and outcomes are non-trivial
•  Specific impacts of data issues include:
–  Longer time to realization of value
–  Model accuracy issues
–  Limited applicability of results
–  And more …
For further information, check out …
•  Pivotal Blog @ http://blog.pivotal.io
•  Pivotal Data Science Blog @ http://blog.pivotal.io/data-science-pivotal
•  Pivotal Data Product Info, Docs and Downloads @ http://pivotal.io/big-data
•  Oil & Gas Use Case Webinar:
–  Video: https://www.youtube.com/watch?v=dhT-tjHCr9E
–  Slides: http://www.slideshare.net/Pivotal/data-as-thenewoil
•  Blogs:
–  Oil & Gas Use Case: http://blog.pivotal.io/pivotal/case-studies-2/data-as-the-new-oil-producing-value-for-the-oil-gas-industry
–  Time Series Analysis: http://blog.pivotal.io/tag/time-series-analysis