Efficiently Building Machine Learning Models for Predictive Maintenance in the Oil & Gas Industry with Databricks
Daili Zhang, Varun Tyagi
Data Scientists
Halliburton
Introduction
▪ Halliburton Digital Solution (HDS)
▪ Support and consolidate digital transformation across all Product Service Lines (PSLs)
▪ Provide common platforms/architectures spanning data warehousing and data governance, analytics development, and BI reporting
▪ Streamline and consolidate various digital processes across all PSLs
▪ Build a strong talent pipeline for software/digital development
▪ Data Science Team
▪ Develop analytics/ML models to
▪ Improve operational efficiency
▪ Increase productive uptime
▪ Reduce operational cost
▪ Provide insights at the right time to the right people to help make business-level decisions
Analytics Life Cycle in Halliburton
▪ Raw data (rig edge, SQL, files)
▪ Ingestion & integration
▪ Data cleaning, aggregation & transformation
▪ Feature engineering
▪ Model training & testing
▪ Model selection
▪ Model deployment
▪ Results visualization
▪ Model performance monitoring
▪ Model management
ML model training & testing takes less than 5% of the time
What Data Do We Have?
▪ Operational Data
▪ Historical data (ADI, a proprietary format; Parquet; CSV)
▪ One Product Service Line (PSL): 500,000+ ADI files (3 GB per file in Parquet format) for fracturing jobs collected over 12+ years (1,500+ TB of data)
▪ Real-time data (from edge devices; volume growing significantly over time)
▪ Hardware Configuration/Maintenance/Event Data
▪ SAP (for example, 5M maintenance orders in less than 2.5 years)
▪ SQL database
▪ Files
▪ External Data
▪ Weather data
▪ Geological & geophysical data
No lack of data, but a lack of high-QUALITY data
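As a quick consistency check on the PSL figures above (assuming decimal units, 1 TB = 1,000 GB, and taking 3 GB as the per-file Parquet size), the file count and per-file size agree with the quoted total:

```latex
500{,}000\ \text{files} \times 3\ \tfrac{\text{GB}}{\text{file}} = 1{,}500{,}000\ \text{GB} = 1{,}500\ \text{TB}
```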
Predictive Maintenance Project (example)
▪ Objective
▪ Reduce annual maintenance cost for transmissions by 10% by optimizing field operations to avoid failure modes identified through big data analytics
Data Cleaning & Aggregation
▪ Marry the operational data and the configuration/maintenance data consistently, despite:
▪ Different sampling frequencies (from 1 Hz to 1000 Hz)
▪ Free text input in maintenance records
▪ Mixed equipment identification
▪ Data discontinuity
▪ Missing data
▪ Wrong data
▪ Use Databricks clusters and the Databricks Runtime
▪ Leverage Delta Lake
▪ Use pandas_udf functions to gain a 10-100x speedup
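One way to reconcile sampling rates from 1 Hz to 1000 Hz is to resample each piece of equipment's stream to a common rate inside a grouped pandas function. The sketch below is plain pandas so it can be tested locally; the column names (`equipment_id`, `timestamp`, `value`), the function name, and the 1 Hz target are hypothetical choices, not details from the talk.

```python
import numpy as np
import pandas as pd


def resample_to_1hz(pdf: pd.DataFrame) -> pd.DataFrame:
    """Resample one piece of equipment's sensor readings to 1-second means.

    Assumes hypothetical columns: 'equipment_id' (str), 'timestamp'
    (datetime64) and 'value' (float), all rows for a single equipment.
    Seconds with no readings stay NaN, so data discontinuities remain
    visible instead of being silently interpolated.
    """
    equipment_id = pdf["equipment_id"].iloc[0]
    out = (
        pdf.set_index("timestamp")["value"]
        .sort_index()
        .resample("1s")   # bucket readings into 1-second bins
        .mean()           # average all samples that fall in each bin
        .reset_index()    # back to columns: timestamp, value
    )
    out.insert(0, "equipment_id", equipment_id)
    return out
```

On Databricks (Spark 3.x), the same function could be applied per equipment through the grouped-map pandas API, e.g. `sdf.groupBy("equipment_id").applyInPandas(resample_to_1hz, schema="equipment_id string, timestamp timestamp, value double")`. Averaging down to 1 Hz discards high-frequency content, so spectral features are better computed on the raw high-rate windows.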
Feature Engineering (example)
▪ Select high-load windows with continuous data
• Load threshold
• Window size
• Number of windows
▪ Apply Welch's method (averaged FFT power spectrum) to each window
• Number of points per segment
▪ Create features from the windowed data
• Lag window size for correlation
• Number of peaks to select in the frequency domain
• Etc.
▪ Combine the features and clean them
• Classification or regression
• Time window for predicting failure
• Whether or not to balance the data
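The first three steps above can be sketched with NumPy and SciPy. This is a minimal illustration, not the team's implementation: the function names, the non-overlapping window search, the use of `scipy.signal.welch` for Welch's method, and the lag-autocorrelation feature are assumptions chosen to map onto the slide's parameters (load threshold, window size, number of windows, points per segment `nperseg`, lag, number of peaks).

```python
import numpy as np
from scipy.signal import find_peaks, welch


def high_load_windows(load, threshold, window_size, max_windows):
    """Start indices of non-overlapping windows whose samples are all
    finite (continuous data) and at or above the load threshold."""
    load = np.asarray(load, dtype=float)
    ok = np.isfinite(load) & (load >= threshold)
    starts, i = [], 0
    while i + window_size <= len(load) and len(starts) < max_windows:
        if ok[i:i + window_size].all():
            starts.append(i)
            i += window_size  # jump past this window so windows never overlap
        else:
            i += 1
    return starts


def window_features(signal, fs, nperseg, n_peaks, lag):
    """Spectral and correlation features for one window of a sensor signal."""
    x = np.asarray(signal, dtype=float)
    # Welch's method: average the periodograms of overlapping nperseg-point segments
    freqs, psd = welch(x, fs=fs, nperseg=nperseg)
    peak_idx, _ = find_peaks(psd)
    # keep the n_peaks strongest local maxima, strongest first
    top = peak_idx[np.argsort(psd[peak_idx])[::-1][:n_peaks]]
    feats = {f"peak{k}_hz": freqs[j] for k, j in enumerate(top)}
    feats.update({f"peak{k}_power": psd[j] for k, j in enumerate(top)})
    # correlation of the window with itself shifted by `lag` samples
    feats[f"autocorr_lag{lag}"] = np.corrcoef(x[:-lag], x[lag:])[0, 1]
    return feats
```

Each window's feature dict can become one row of the training table. Labeling rows by the time window before a failure and deciding whether to rebalance the classes (the last step above) are not shown.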
Model Training/Testing/Selection (example)
▪ Explored various methods
▪ Spark ML
▪ Deep Learning
▪ Azure AutoML
▪ XGBoost
▪ scikit-learn
▪ Evaluated the models with various metrics
▪ Recall rate
▪ F1 score
▪ Accuracy
▪ Business impact with dollar value
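Recall, F1, and accuracy come straight from scikit-learn; the dollar-value metric depends on business costs the slides do not give. The sketch below uses a hypothetical cost model (without the model every failure is unplanned; a caught failure becomes a cheaper planned repair; each false alarm costs an inspection), and the function name and cost parameters are illustrative assumptions.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, recall_score


def evaluate(y_true, y_pred, cost_missed_failure, cost_false_alarm, cost_planned_repair):
    """Standard metrics plus a hypothetical dollar-value estimate.

    Label 1 = the transmission fails within the prediction window.
    The cost parameters are illustrative assumptions, not figures from the talk.
    """
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    # Baseline: every actual failure is an unplanned failure.
    baseline_cost = (tp + fn) * cost_missed_failure
    # With the model: caught failures become planned repairs,
    # missed failures stay unplanned, false alarms cost an inspection.
    model_cost = (fn * cost_missed_failure
                  + tp * cost_planned_repair
                  + fp * cost_false_alarm)
    return {
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "accuracy": accuracy_score(y_true, y_pred),
        "estimated_savings": baseline_cost - model_cost,
    }
```

Because failures are rare, accuracy can look high even for a model that catches few failures, which is one reason to weigh recall and the dollar estimate alongside it.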
Model Deployment and Visualization
▪ Modularize the whole process, from data extraction through model prediction to results visualization, into separate notebooks
▪ Use notebook workflows to coordinate the runs of the different notebooks
▪ Use widgets to pass user-set parameters between notebooks
▪ Schedule the job to run daily through the notebook UI
▪ Visualize the results in Power BI for end users
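The notebook chaining and widget parameters described above can be sketched as a small driver function. In a Databricks driver notebook, `run_notebook` would be `dbutils.notebook.run` (signature `run(path, timeout_seconds, arguments)`), and each child notebook would declare and read its parameters with `dbutils.widgets.text(...)` and `dbutils.widgets.get(...)`. The notebook paths, the `run_date` parameter, and passing each notebook's exit value downstream as `upstream_result` are hypothetical. Injecting the runner keeps the sketch testable outside Databricks.

```python
from datetime import date


def run_daily_pipeline(run_notebook, run_date: date, timeout_seconds: int = 3600):
    """Run the pipeline notebooks in order, passing shared parameters.

    run_notebook(path, timeout_seconds, arguments) is expected to behave like
    dbutils.notebook.run: it returns the child's dbutils.notebook.exit(...)
    value as a string. The notebook paths below are hypothetical.
    """
    steps = ["./01_extract_data", "./02_build_features",
             "./03_predict", "./04_publish_results"]
    # Widget values are strings, so the date is passed in ISO format.
    params = {"run_date": run_date.isoformat()}
    results = {}
    for path in steps:
        result = run_notebook(path, timeout_seconds, params)
        results[path] = result
        # Hand this step's exit value to the next notebook as a widget value.
        params = {**params, "upstream_result": result}
    return results
```

In Databricks this might be called as `run_daily_pipeline(dbutils.notebook.run, date.today())` from a notebook scheduled as a daily job. If a child notebook fails, `dbutils.notebook.run` raises an exception, so the remaining steps are skipped rather than run on stale inputs.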
Model Performance Monitoring
▪ Continuously store the predictions in blob storage
▪ Continuously store the actual outcomes in blob storage
▪ Track the discrepancy between predictions and actual outcomes over time in Power BI
▪ Set alerts that email users when specified thresholds are crossed
▪ Investigate model drift and retrain the models
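The comparison of stored predictions with actual outcomes can be sketched in pandas: join the two tables by date, compute a rolling mean absolute error, and flag the days that cross a threshold. The column names (`date`, `value`), the rolling-MAE metric, and the window and threshold values are assumptions; in the setup described above the comparison and email alerts live in Power BI, so this only shows the underlying logic.

```python
import pandas as pd


def drift_alerts(pred: pd.DataFrame, actual: pd.DataFrame,
                 window: int, threshold: float) -> pd.DataFrame:
    """Join predictions with actual outcomes and flag dates whose rolling
    mean absolute error exceeds a threshold.

    Assumes hypothetical daily tables, each with columns 'date' and 'value'.
    """
    df = (pred.merge(actual, on="date", suffixes=("_pred", "_actual"))
              .sort_values("date"))
    df["abs_error"] = (df["value_pred"] - df["value_actual"]).abs()
    # Require a full window before reporting, so early days are NaN (no alert).
    df["rolling_mae"] = df["abs_error"].rolling(window, min_periods=window).mean()
    df["alert"] = df["rolling_mae"] > threshold
    return df
```

Rows with `alert == True` are the candidates for a drift investigation and possible retraining.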
Model Management
▪ Before MLflow, model-specific information was written manually into a .csv file, and the models were stored in blob storage following naming conventions
▪ MLflow greatly simplifies this process and improves its consistency and quality
