ExxonMobil’s journey to unleash time-series data with open source technology
June, 2018
Kevin Brown
Big Data Platform Engineer
Introduction
3
Kevin Brown
Big Data Platform Engineer
ExxonMobil
• Introduction to time-series data at ExxonMobil
• Global data collection and ingestion with Apache NiFi™
• Apache Spark™: normalization, validation, aggregations, and interpolation
• Apache HBase® & Apache Hive™: partitioning, performance
• Consumption APIs
Today’s Objectives
4
A series of data points indexed in time order.
What is time-series data?
5
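As a minimal sketch of the definition above, a time series can be modeled as readings that are kept in timestamp order. The `Reading` type, the `in_time_order` helper, and the tag name `unit1.temp` are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Reading:
    tag: str             # hypothetical sensor/tag identifier
    timestamp: datetime  # when the value was observed (UTC)
    value: float


def in_time_order(readings):
    """Return the readings sorted by timestamp, oldest first."""
    return sorted(readings, key=lambda r: r.timestamp)


# Readings may arrive out of order; the series is defined by time order.
readings = [
    Reading("unit1.temp", datetime(2018, 6, 1, 0, 2, tzinfo=timezone.utc), 101.5),
    Reading("unit1.temp", datetime(2018, 6, 1, 0, 0, tzinfo=timezone.utc), 100.0),
    Reading("unit1.temp", datetime(2018, 6, 1, 0, 1, tzinfo=timezone.utc), 100.7),
]
series = in_time_order(readings)
```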
• Global refineries and chemical plants
• Millions of sensors/tags
• Decades' worth of time-series data
Time-series data at ExxonMobil
6
Collection and
Ingestion with
Apache NiFi™
• Interoperability
• Ease of use
• Fine control of flow
Why Apache NiFi™
Single node design
9
Simple Regional Design - NiFi
10
Redundancy Considerations
11
• Repository sizing
• Run Schedule
• Back Pressure
• Monitoring
• NiFi Expression Language
Apache NiFi - Flow Design
12
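Back pressure, one of the flow design points above, can be illustrated with a bounded queue: once a threshold is reached, the upstream side stops handing over data until the downstream side drains it. NiFi configures this per connection in its UI, not in Python; the class and method names below are hypothetical and only sketch the idea.

```python
from collections import deque


class BoundedConnection:
    """Toy connection with an object-count back pressure threshold."""

    def __init__(self, max_queued):
        self.max_queued = max_queued
        self._queue = deque()

    def offer(self, item):
        """Queue the item unless the threshold is reached; return True if queued."""
        if len(self._queue) >= self.max_queued:
            return False  # upstream should pause and retry later
        self._queue.append(item)
        return True

    def poll(self):
        """Remove and return the oldest item, or None if the queue is empty."""
        return self._queue.popleft() if self._queue else None


conn = BoundedConnection(max_queued=2)
accepted = [conn.offer(i) for i in range(3)]  # third offer hits the threshold
```

After the downstream side polls an item, the connection accepts new data again.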
Normalization,
Contextualization,
Validation…
13
“Your metadata is key, but it’s probably not consistent.”
Data Contextualization
• Global data challenges
• Language and abbreviation variation
• Diversity of vendors and tools
• Naming standard
• Variance in functionality (confidence levels, frequency, resolution)
• Synchronization and mutable data
• Calculated tags, faulty sensors
• Delayed lab test results
Getting your data ready
14
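One way to approach the naming and abbreviation variation listed above is to normalize tag names before they are used as metadata keys. This is a sketch only: the abbreviation map, the dotted output format, and the sample tag names are assumptions, not the naming standard used in the talk.

```python
import re

# Hypothetical abbreviation map; a real one would come from site metadata.
ABBREVIATIONS = {
    "temp": "temperature",
    "tmp": "temperature",
    "press": "pressure",
    "prs": "pressure",
}


def normalize_tag(raw):
    """Lower-case a tag, split on non-alphanumerics, and expand abbreviations."""
    tokens = [t for t in re.split(r"[^a-z0-9]+", raw.lower()) if t]
    return ".".join(ABBREVIATIONS.get(t, t) for t in tokens)


# Different vendor spellings collapse to one canonical name.
canonical = {normalize_tag(name) for name in ["Unit1_TEMP", "unit1-Tmp", " UNIT1 / temp "]}
```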
Storage,
Interpolation,
Aggregation
15
• Archival and compression of data
• Interpolation
• Aggregation
• Partitioning
“Sequence Matters”
Storage, interpolation, aggregation …
16
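Aggregation can be sketched as grouping points into fixed time buckets and reducing each bucket. The talk does this with Apache Spark™; the plain-Python function below is a hypothetical stand-in that assumes timestamps in epoch seconds and a one-hour bucket.

```python
from collections import defaultdict


def hourly_mean(points):
    """Average (epoch_seconds, value) pairs per hour.

    Returns a dict mapping each hour's starting epoch second to the mean value,
    with keys in ascending time order.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % 3600].append(value)  # floor timestamp to the hour
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


points = [(0, 10.0), (1800, 20.0), (3600, 30.0), (7199, 50.0)]
means = hourly_mean(points)
```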
Consumption
17
• Who are your users?
• Apache HBase® & Apache Hive™?
• Off cluster?
• APIs
• Serialization
Consumption
18
Questions
19
Apache HBase®, Apache Hive™, and Apache NiFi™ are trademarks of the Apache Software Foundation.

Editor's Notes

  • #4: Personal introduction. Education: BS in Information Technology, BYU, 2012. ExxonMobil: 6 years, Linux systems administration; pioneered our initial journey into Hadoop/Big Data; responsible for architecting, maintaining, and supporting our current Big Data platform.
  • #11: From left to right: data originates at the site; a regional NiFi instance pulls from the sites; data is sent securely to a clustered central NiFi.
  • #17: Archival and compression of data. Interpolation: we rely on both step and linear, usually depending on data type: linear for floats with no manual input, step otherwise (integers, or floats with manual input). Aggregation. Partitioning.
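
The step-versus-linear rule in note #17 can be sketched as follows. This is a minimal illustration, not the Spark implementation from the talk: the function names are hypothetical, and holding the last value past the final sample is an assumption.

```python
from bisect import bisect_right


def choose_mode(is_float, manual_input):
    """Linear for floats with no manual input; step otherwise (per note #17)."""
    return "linear" if is_float and not manual_input else "step"


def interpolate(samples, t, mode):
    """Estimate the value at time t from (time, value) samples sorted by time.

    mode is "step" (hold the previous value) or "linear".
    Assumption: past the last sample, the last value is held in both modes.
    """
    times = [s[0] for s in samples]
    i = bisect_right(times, t) - 1  # index of the last sample at or before t
    if i < 0:
        raise ValueError("t precedes the first sample")
    t0, v0 = samples[i]
    if mode == "step" or i == len(samples) - 1 or t == t0:
        return v0
    t1, v1 = samples[i + 1]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


samples = [(0, 10.0), (10, 20.0)]
linear_value = interpolate(samples, 5, choose_mode(is_float=True, manual_input=False))
step_value = interpolate(samples, 5, choose_mode(is_float=True, manual_input=True))
```

Here the same two samples yield different estimates at the midpoint depending on whether the tag is treated as a continuous measurement or as a manually entered value.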