© Copyright 2015 EMC Corporation. All rights reserved.
PREDICTIVE MAINTENANCE
EMC/VIRTUSTREAM SOLUTION
RICCARDO ROMANI
PREDICTIVE MAINTENANCE PROCESS
MOST COMMON ISSUES
1. Data acquisition and processing:
A. Data acquisition from raw data sources (e.g. railway sensors)
B. Long lead times for data preparation and transformation into data sets to be used as a structured data model and historical database
C. Long lead times for model training and testing iterations
D. Some batch calculations required
2. Data storage: huge amounts of data generated at both the acquisition and historical stages
DIFFERENT TYPES OF DATA IN PREDICTIVE ANALYTICS
FROM RAW DATA SOURCE…
• Example of raw data gathered from a Stream Processing System
• Millions of txt files
• Unstructured/semi-structured data
…TO DATASET
• Example of Data Sets created starting from the raw data
• Structured Data
• Training data: the engine run-to-failure data.
• Testing data: the engine operating data without failure events recorded.
• Ground truth data: the true remaining cycles for each engine in the testing data.
• Predictive data: used to predict when an in-service machine will fail, so that maintenance can be planned in advance.
• Answers the question: “Given these aircraft engine operation and failure events history, can we predict when an in-service engine will fail?”
• Regression: predict the Remaining Useful Life (RUL), or Time to Failure (TTF).
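The RUL regression target described on this slide can be sketched as follows. This is only an illustration, assuming run-to-failure records shaped as (engine_id, cycle) pairs; the record layout and the function name add_rul_labels are hypothetical and do not come from the deck. For each engine, the last recorded cycle is treated as the failure point, and the RUL of a record is the number of cycles remaining until then.

```python
from collections import defaultdict


def add_rul_labels(records):
    """Attach a Remaining Useful Life (RUL) label to run-to-failure records.

    records: iterable of (engine_id, cycle) tuples (hypothetical layout).
    Returns a list of (engine_id, cycle, rul) tuples in input order.
    """
    records = list(records)
    # The highest cycle seen for an engine is taken as its failure cycle.
    last_cycle = defaultdict(int)
    for engine_id, cycle in records:
        last_cycle[engine_id] = max(last_cycle[engine_id], cycle)
    # RUL = cycles left between this record and the failure cycle.
    return [(e, c, last_cycle[e] - c) for e, c in records]


training = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]
labeled = add_rul_labels(training)
for row in labeled:
    print(row)
```

Testing data has no failure event recorded, so the same labels cannot be derived for it; that is the gap the ground truth data set fills.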
DIFFERENT TYPES OF DATA IN PREDICTIVE ANALYTICS
HOW COMBINING SAP AND EMC/VIRTUSTREAM CAN HELP
Hot Data (SAP HANA)
• Analytics run on Data Model
− Modern in-memory platform
− Transact/analyze in real time
− Native predictive, text, and spatial algorithms
Warm Data (Sybase IQ)
• Warm Historical Data Model
− Disk-backed, smart column store (HANA on disk or IQ)
− Offloads from HANA the large volumes of data that are dynamically stored on disk instead of in memory
− Excels at queries on structured data from terabyte to petabyte scale
Cold Data (Sybase IQ)
• Cold Historical Data Model
− Less frequently accessed data is archived in time partitions on IQ
− The data is static and used primarily for read access
− Data resides in cost-efficient storage with fewer backups to reduce operational costs
− Lower SLA requirements
Raw Data (Hadoop)
• Data acquired from sensors
− New type of User Defined Function for data federation
− Direct access to HDFS without need for the package, mapper, and reducer specification
− Invoke custom MapReduce jobs
Solution: EMC with Isilon, Hadoop-native storage combined with Virtustream Cloud Storage (EMC ECS Object Storage)
Solution: Virtustream with SAP HANA Cloud IaaS/PaaS
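The hot/warm/cold split on this slide can be illustrated with a small routing function. This is a rough sketch that assumes tiering is driven purely by record age; the deck does not state the actual criteria, and the thresholds and the name choose_tier are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical age thresholds; real cut-offs depend on workload and SLAs.
HOT_MAX_AGE = timedelta(days=30)
WARM_MAX_AGE = timedelta(days=365)


def choose_tier(record_date, today):
    """Return 'hot', 'warm', or 'cold' for a record based on its age."""
    age = today - record_date
    if age <= HOT_MAX_AGE:
        return "hot"   # in-memory tier (SAP HANA in the deck)
    if age <= WARM_MAX_AGE:
        return "warm"  # disk-backed column store (IQ in the deck)
    return "cold"      # archived, read-mostly time partitions


today = date(2015, 6, 1)
sample_dates = [date(2015, 5, 20), date(2014, 12, 1), date(2012, 1, 1)]
tiers = [choose_tier(d, today) for d in sample_dates]
print(tiers)
```

In practice the routing rule could also take access frequency into account, since the slide describes cold data as less frequently accessed rather than simply older.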
SCENARIO #1
SAP HANA, SAP IQ @ VIRTUSTREAM, DATA ACQUISITION ON ISILON @ HYPERCED
[Diagram labels: Rolling Stock; Gateway; Raw Data; Hot Data; Historic/Warm; Historic/Cold; Datacenter; Almaviva “Hyperced” Datacenter]
1. Raw data acquisition is done at gateway level; serialization could be a bottleneck, as could the growth of raw data history
2. Predictive model computation is done at Hot Data level, in HANA
3. Warm data on SAP IQ
4. Cold data archived (could be on SAP IQ) for compliance, for outlier data from faulty sensors, and for statistical purposes
• EMC can control rolling stock data growth during acquisition with Isilon storage, certified by SAP and sitting at the gateway layer
SCENARIO #2
ADDING HADOOP COMBINED WITH ISILON @ HYPERCED
[Diagram labels: Rolling Stock; Raw Data; Gateway/Data Lake; Intelligent Archiving for Raw Data and Historic; Historic/Warm; Hot Data; Almaviva “Hyperced” Datacenter]
1. Raw data acquisition is done in parallel streams. Hadoop speeds up parsing and the creation of a reduced dataset to be loaded and historicized
2. Predictive model computation can draw on a wider dataset by accessing warm data on IQ and Hadoop
3. Cold data is stored on the Hadoop filesystem as “intelligent” archiving for statistical analysis and retrieval
• EMC can control and improve data growth management at both Hadoop and storage level
− Hadoop for unlimited capacity for raw data processing
− Apache Kafka or Storm for distributed stream processing
HADOOP AND SAP 1/3
HADOOP IN ACTION @ SAP PREDICTIVE MAINTENANCE BLUEPRINTS
Where SAP and EMC combined technologies fit in a predictive maintenance project
HADOOP AND SAP 2/3
Data acquisition
• Real-time event streams arrive at a rate of thousands of raw events per second.
• The stream processing system should be able to process those events in a fault-tolerant, distributed manner, with parallel processing.
• Stream processing systems should also keep a record of old data for a reasonable amount of time before it is archived or destroyed, in order to:
A. Support pattern recognition and statistical model building
B. Comply with local laws
C. Identify potential outliers in the data streamed in from the sensors. While monitoring for faults in the assets, it is possible that the sensor taking the readings, being a machine itself, fails and starts sending faulty records. Intelligent CBM management systems capable of detecting such outliers will try to isolate these faulty sensors and notify
We have evaluated two popular open source technologies: Apache Kafka, a distributed messaging system, and Storm, a distributed stream processing engine. Both use Hadoop as the repository for managing data growth.
SAP provides a product called “ESP – Event Stream Processor” that integrates HANA and Hadoop.
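The idea of flagging readings from a possibly faulty sensor can be sketched with a simple robust statistic. The deck does not say which detection method is used; the example below swaps in a modified z-score based on the median and the median absolute deviation (MAD), with the commonly used threshold of 3.5. The function name find_outliers and the sample readings are hypothetical.

```python
from statistics import median


def find_outliers(readings, threshold=3.5):
    """Return indices of readings whose modified z-score exceeds threshold.

    The modified z-score is 0.6745 * |x - median| / MAD, which is less
    sensitive to the outliers themselves than a mean/stdev-based score.
    """
    med = median(readings)
    mad = median(abs(x - med) for x in readings)
    if mad == 0:
        return []  # no spread to judge against
    return [
        i
        for i, x in enumerate(readings)
        if 0.6745 * abs(x - med) / mad > threshold
    ]


# Hypothetical temperature readings with one implausible spike.
readings = [20.1, 19.8, 20.3, 20.0, 85.0, 19.9, 20.2]
suspect = find_outliers(readings)
print(suspect)
```

A sensor whose readings are flagged repeatedly would be a candidate for isolation; a single flagged reading may equally be a genuine fault in the monitored asset, which is why the retained history matters.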
HADOOP AND SAP 3/3
Build, train and validate the model – creating Data Sets
• To detect failure in a given stream of sensor data, we first need to define normal behavior.
• For this we need to build a model around the historical sensor data.
• Predictive models analyze current and historical data on individuals to produce metrics.
• A model is reusable: it is created by training an algorithm on historical data, then saved so that the common business rules it captures can be applied to similar data. Results can then be analyzed with the trained algorithm, without the historical data.
• The process involves running one or more algorithms on the data set on which the prediction is to be carried out. It is iterative and often involves training the model, trying multiple models on the same data set, and finally arriving at the best-fit model based on an understanding of the business data.
• Raw data, once prepared, is organized into Data Sets, which need to be stored for use by different types of models and algorithms.
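The iterative "train several models, keep the best fit" loop described on this slide can be sketched in a few lines. This is a toy illustration with made-up numbers, not the deck's method: two hypothetical candidates (a constant baseline and a least-squares line) are trained on one data set and compared by mean absolute error on held-out data.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Candidate: ordinary least squares fit of y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def mae(model, xs, ys):
    """Mean absolute error of a trained model on a data set."""
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))


# Made-up data: x = operating cycle, y = remaining useful life.
train_x, train_y = [1, 2, 3, 4, 5, 6], [9.8, 9.1, 8.0, 7.2, 5.9, 5.1]
valid_x, valid_y = [7, 8], [4.0, 3.1]

candidates = {"mean": fit_mean, "line": fit_line}
trained = {name: fit(train_x, train_y) for name, fit in candidates.items()}
scores = {name: mae(model, valid_x, valid_y) for name, model in trained.items()}
best_name = min(scores, key=scores.get)
print(best_name, scores)
```

Once selected, the trained model (here just a closure over two coefficients) is what gets saved and reused, which is why results can later be produced without the historical data.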
EMC/VIRTUSTREAM HYBRID ARCHITECTURE
[Architecture diagram; labels: VIRTUSTREAM; SAP HANA; DT; HOT; Extended Storage; IQ; NLS; Backup / Archive; ALMAVIVA HYPERCED; Servers & Network; DATA STORAGE powered by EMC; HDFS or NFS: EMC Isilon, VNX; FibreChannel or NFS: EMC VMAX, VNX or XtremIO; GATEWAY; Unstructured Storage; Raw Data]
• Best-in-class Cloud IaaS/PaaS for SAP workloads, through Almaviva White Label Option
• EMC Scale Out Unstructured Storage with de-duplication features and native HDFS support
• From 15% to 35% data storage efficiency, depending on the specific nature of the data
HADOOP AND SAP
PDMS BLUEPRINT
[Diagram label: Gateway/Data Lake]
1. Acquire OT data in parallel streams
2. Clean data and import only the validated data into SAP HANA
3. Keep historic OT data in IQ and/or Hadoop
4. Use Hive to build a relational/historical view of the OT data at Hadoop level as well
5. Unify the OT view with IT data, if needed
6. Save HANA resources for real-time predictions only
7. De-duplicate and store raw OT data for future use or statistical purposes
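Steps 2 and 7 of the blueprint (import only validated data, de-duplicate and keep the raw data) can be sketched at application level as follows. The de-duplication in the deck most likely refers to a storage feature, so this is only an illustration of the idea; the record fields, the validation rule, and the function names are all hypothetical.

```python
import hashlib
import json


def is_valid(record):
    """Hypothetical validation rule: required fields present and well typed."""
    return (
        isinstance(record.get("sensor_id"), str)
        and isinstance(record.get("ts"), int)
        and isinstance(record.get("value"), (int, float))
    )


def fingerprint(record):
    """Stable content hash used to detect duplicate raw records."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def split_stream(records):
    """Return (validated, raw_store).

    raw_store keeps one copy of every distinct raw record, valid or not,
    keyed by content hash. validated holds the distinct records that pass
    validation and would be loaded into the hot tier.
    """
    validated, raw_store = [], {}
    for rec in records:
        fp = fingerprint(rec)
        if fp in raw_store:
            continue  # exact duplicate: already stored and already checked
        raw_store[fp] = rec
        if is_valid(rec):
            validated.append(rec)
    return validated, raw_store


records = [
    {"sensor_id": "axle-1", "ts": 1, "value": 71.5},
    {"sensor_id": "axle-1", "ts": 1, "value": 71.5},  # duplicate
    {"sensor_id": "axle-2", "ts": 1, "value": None},  # fails validation
    {"sensor_id": "axle-2", "ts": 2, "value": 70.9},
]
validated, raw_store = split_stream(records)
print(len(validated), len(raw_store))
```

Keeping the invalid record in the raw store is deliberate here: the earlier slides note that data from faulty sensors is retained for statistical purposes.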
HADOOP AND SAP
GATHERING REAL-TIME EVENTS AT SCALE

Sap Hana and Virtustream for Predictive Maintenance and Big Data
