MAN, MACHINE & MATHEMATICS
How In-Memory & Open Source Technologies
Help Solve Big Data Problems
Arup Ray
3rd International Conference on Business Analytics & Intelligence, IIM Bangalore, 17th to 19th December 2015
Abstract
The emergence of new technologies that help us capture, manage, and interpret the deluge of
data coming from multiple sources (Big Data) gives us the opportunity to run the business using
signals that are fast, real time, and hence more relevant. These signals can measure
performance, provide critical indicators about the business, identify customer issues and
complaints, help market more effectively and accurately, and support real-time decision
making. This paper explores emerging trends in big data technology such as in-memory
databases, integration of proprietary technologies with open source technologies like Hadoop
and R, and application of big data technology in the area of the Internet of Things (IoT).
The paper also introduces the concept of Analytics Maturity Quadrant (AMQ) to help businesses
evaluate and develop their analytics strategy.
INTRODUCTION
Solving business problems and generating disruptive business insights with petabytes of
data sounds great on paper, but can be an extremely challenging task in real life. For
example, an $18 billion-a-year CPG conglomerate with a global footprint must quickly
respond to the fluctuating costs of 4,000 raw materials that go into more than 20,000
products. What’s more, if it can make promotions for these products more timely by
using faster analysis, the company and its retailer customers can command higher prices
in a business known for razor-thin profit margins. The challenge is not only storing
petabytes of data (big data), but how fast we can run mathematical models on these
huge data sets to generate intelligent insights in real time.
This paper explores the following aspects, taking SAP HANA as a reference:
- the emerging trend in in-memory computing and its impact on analytics
- how the marriage between proprietary in-memory technology and open
source technology is helping mathematicians solve real-life problems
- how this technology evolution has made analytics the ‘brain’ behind the
Internet of Things (IoT) revolution
1. IN-MEMORY COMPUTING AND ANALYTICS
1.1. Arrival of In-Memory Analytics
As the cost of RAM declines, in-memory analytics is no longer a pipe dream for many
businesses. 64-bit operating systems with 2 terabytes (TB) of addressable
memory have made it possible to cache large volumes of data, potentially an
entire data warehouse or data mart, in a computer’s RAM. In addition to
incredibly fast query response times, in-memory analytics can reduce or
eliminate the need for data indexing and for storing pre-aggregated data in OLAP
(On-Line Analytical Processing) cubes or aggregate tables.
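The cube-elimination point can be illustrated with a toy sketch (the data and function name below are hypothetical): when the full fact table fits in RAM, a group-by aggregate is computed on demand rather than maintained as a pre-built OLAP cube or aggregate table.

```python
from collections import defaultdict

# Toy in-memory "fact table": one dict per sales record (illustrative data).
sales = [
    {"region": "EMEA", "product": "A", "revenue": 120.0},
    {"region": "EMEA", "product": "B", "revenue": 80.0},
    {"region": "APJ",  "product": "A", "revenue": 200.0},
]

def aggregate(facts, group_by, measure):
    """Compute a group-by aggregate on the fly -- no pre-built cube needed."""
    totals = defaultdict(float)
    for row in facts:
        totals[row[group_by]] += row[measure]
    return dict(totals)

print(aggregate(sales, "region", "revenue"))   # {'EMEA': 200.0, 'APJ': 200.0}
```

At RAM speeds, recomputing such aggregates per query is cheap enough that maintaining separate aggregate tables can become unnecessary.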
1.2. Advent of column storage
Another evolution is the use of columnar data storage for analytics applications.
Unlike traditional data storage, where records are indexed and
stored in rows with each record containing all the fields, products like Sybase IQ
leverage columnar data storage for analytics, which permits faster data access
for OLAP systems. For example, in column storage, data is only partially blocked
during access, and individual columns can be processed at the same time by
different cores.
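A minimal sketch of the difference (illustrative data only): in a columnar layout, an OLAP-style scan touches a single contiguous vector, while a row layout must walk every full record to reach one field.

```python
# Row layout: each record keeps all of its fields together.
rows = [(1, "A", 10.0), (2, "B", 20.0), (3, "A", 30.0)]

# Column layout: each field is stored as its own contiguous vector, so
# different columns can be handed to different cores independently.
columns = {
    "id":     [1, 2, 3],
    "sku":    ["A", "B", "A"],
    "amount": [10.0, 20.0, 30.0],
}

# An OLAP-style scan (sum of one measure) reads only one vector in the
# column layout, while the row layout walks every full record.
row_total = sum(r[2] for r in rows)      # touches all fields of all rows
col_total = sum(columns["amount"])       # touches a single column

assert row_total == col_total == 60.0
```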
However, row storage continues to be the preferred option for an OLTP (On-Line
Transaction Processing) system, where the transaction system may require access
to all the fields of a record every time a user accesses the record (e.g., creation
of a sales order). Hence OLTP and OLAP continue to sit in two different boxes,
and data needs to move from OLTP to OLAP (to run analytics models, reports, and
dashboards) and from OLAP to OLTP (for analytics to trigger action in the transaction
system). A marriage of row-based and column-based technologies can eliminate
the need to maintain two different systems.
1.3. In-memory Database
An in-memory database means all the data is stored in memory (RAM), so no
time is wasted loading data from hard disk to RAM. Everything is in-memory,
which gives the CPUs quick access to data for processing. The speed advantages
offered by this RAM storage system are further accelerated by the use of multi-
core CPUs, multiple CPUs per board, and multiple boards per server appliance.
An in-memory database like SAP HANA combines the power of hardware and
software to process massive volumes of real-time data using in-memory
computing, e.g.,
 It combines row-based and column-based database technology.
 Data now resides in main memory (RAM) and no longer on a hard disk.
It is best suited for performing real-time analytics and for developing and
deploying real-time applications.
Fig 1. In Memory Computing: Combining power of Hardware & Software
As Forrester Research has pointed out, the outcome of this evolution in
technology is a distributed in-memory data platform like SAP HANA that
enterprises can use to support real-time analytics, predictive and text analytics,
and extreme transaction volumes. The next-generation data platform demands
looking at these new technologies to deliver the speed, agility, and new
insights critical to helping a business grow. For decades, organizations have
built transactional, operational, and analytical layers to support various
applications, operational reporting, and analytics. However, with the growing
need to support real-time data sharing driven by the mobile enterprise, separate
transactional, operational, and analytical layers have become an obstacle to
such initiatives. A distributed in-memory data platform offers a new approach:
collapsing the technology stack can eliminate redundant hardware, software,
and middleware components, saving money and reducing complexity through
automation and integrated systems that help developers and DBAs become more
productive.
1.4. OLTP & OLAP in a box
An appliance like SAP HANA blends column and row storage in the same
database, eliminating the need for data movement between two different boxes
and allowing OLTP and OLAP in a single box. Hence, the moment a sales order is
created in the ERP system, the data is accessible to the analytical application, and
business users can access operational reports in real time, helping them take
corrective action in real time instead of doing a post mortem. In section 4, the
paper describes how HANA PAL (Predictive Analytics Library) can access
transaction data in real time and deliver a predictive-maintenance decision just
in time to avoid costly breakdowns and prevent millions of dollars of loss due to
unplanned maintenance.
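The single-box idea can be sketched as follows; the `SingleStore` class below is purely illustrative (not a HANA API). A sales order written by the transactional path is visible to the analytical query the moment it exists, with no ETL hop in between.

```python
class SingleStore:
    """Toy stand-in for a combined OLTP/OLAP store (illustrative only)."""

    def __init__(self):
        self.orders = []                    # one shared in-memory table

    def create_sales_order(self, customer, amount):
        """OLTP-style write: the transactional path inserts a record."""
        self.orders.append({"customer": customer, "amount": amount})

    def revenue_report(self):
        """OLAP-style read over the same table -- no ETL, no second box."""
        return sum(o["amount"] for o in self.orders)

db = SingleStore()
db.create_sales_order("ACME", 500.0)
print(db.revenue_report())   # 500.0 -- visible the moment the order exists
```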
2. IN-MEMORY COMPUTING AND BIG DATA ANALYTICS
With the explosion of data, even with current high-capacity RAM and multi-core
processors, the cost of hardware can be prohibitive if a business needs to store the
huge volume of data generated every second (e.g., clickstreams from a website, or
real-time data generated by hundreds of sensors attached to a Formula One car). This
challenge of large data volumes can be addressed by low-cost open source
technologies like Hadoop. However, the advantage of in-memory computing
will remain unutilized unless in-memory analytics can be integrated with
data lakes seamlessly. Hence the built-in integration of Hadoop or Spark with
SAP HANA (or similar in-memory databases) can support an architecture
where data is stored based on its ‘data temperature’, or frequency of access.
Fig 2. A Big Data Integrated Architecture
2.1. In-Memory Database and Hadoop Integration
Taking the SAP Big Data architecture as an example, a typical big data integrated
architecture stores hot data (more frequently accessed data) in SAP HANA (the in-
memory database), warm data in Sybase IQ, and cold data (less frequently
accessed data) in Hadoop. This integrated architecture allows need-based data
access while keeping the infrastructure cost manageable.
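Data-temperature routing can be sketched as a simple rule; the thresholds and tier labels below are illustrative assumptions, not SAP's recommendations.

```python
def tier_for(access_count_per_day):
    """Route data by 'temperature' (thresholds are illustrative only)."""
    if access_count_per_day >= 100:
        return "hot (in-memory database)"
    if access_count_per_day >= 10:
        return "warm (disk-based columnar store)"
    return "cold (Hadoop data lake)"

# A clickstream table read constantly lands in memory; last year's raw
# sensor logs, touched once a month, land in the data lake.
print(tier_for(500))   # hot (in-memory database)
print(tier_for(1))     # cold (Hadoop data lake)
```

In a real deployment the routing would be driven by observed access statistics and storage cost per tier rather than fixed constants.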
SAP recently launched SAP HANA Vora, an in-memory query engine. It runs on
Apache Spark to analyze Big Data stored in Hadoop. The goal is to deliver a single
integrated platform that embraces the Hadoop ecosystem for
◾ All data: OLTP, OLAP & Big Data
◾ All operations: setup, admin, monitoring, operations
◾ One interface for applications
Key features of this integration are
◾ Building OLAP-style capabilities on Hadoop/HDFS and extending SAP HANA
& Hadoop integrations to provide optimized data processing and
movement between the two platforms
◾ Enabling massive scale-out scenarios for HANA
SAP HANA Vora also reveals another trend that software vendors are adopting:
it permits users of open source technologies to continue in the open source
environment to build analytical applications while leveraging the power of in-
memory computing. For example, SAP HANA Vora offers
 Extensive programming support for Scala, Python, C, C++, R, and
Java, allowing data scientists to use their tool of choice
 The ability for data scientists and developers who prefer SparkR or Spark ML
to mash up corporate data with Hadoop/Spark data easily
 Leverage of SAP HANA’s multiple data processing engines and in-
memory computing for developing new insights from business and
contextual data
2.2. In-Memory Database and R Integration
Another aspect of integrating an in-memory database with open source is
integration with R, which gives access to a practically infinite number of readily
available algorithms.
Extending the example of the SAP Big Data Architecture, SAP Predictive Analytics
offers advanced SAP HANA integration and provides in-database and in-memory
computing through dedicated SAP HANA native libraries and R:
• SAP PAL for HANA: the Predictive Analytics Library, with HANA-native and
optimized implementations of industry-standard predictive algorithms
• SAP APL for HANA: the Automated Predictive Library, with SAP proprietary
algorithms that automate many tasks for simplified, quicker, and higher
quality model definition
• The R server, which enables using any R open source algorithm in the HANA engine
The goal of the integration of the SAP HANA database with R is to enable the
embedding of R code in the SAP HANA database context. That is, the SAP HANA
database allows R code to be processed in-line as part of the overall query
execution plan. This scenario is suitable when an SAP HANA-based modeling and
consumption application wants to use the R environment for specific statistical
functions.
An efficient data exchange mechanism supports the transfer of intermediate
database tables directly into the vector-oriented data structures of R. This offers
a performance advantage compared to standard SQL interfaces, which are tuple
based and therefore require an additional data copy on the R side.
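The advantage can be sketched with toy data: a tuple-based interface materializes every row and then rebuilds the vectors on the R side (an extra full copy), while a vector-based interface hands the column vectors over directly.

```python
# Columnar intermediate result, roughly as a column-store engine holds it.
intermediate = {"temp": [21.5, 22.0, 23.1], "pressure": [1.0, 1.1, 0.9]}

# Tuple-based interface: materialize every row, then rebuild vectors on
# the receiving side -- one extra pass over all the data.
tuples = list(zip(intermediate["temp"], intermediate["pressure"]))
rebuilt = {
    "temp":     [t for t, _ in tuples],
    "pressure": [p for _, p in tuples],
}

# Vector-based interface: hand the column vectors over as-is, which is
# what the direct transfer into R's vector-oriented data frames avoids.
direct = intermediate

assert rebuilt == direct   # same content, but 'direct' skipped the copy
```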
Fig 3. SAP HANA – R Integration Architecture
To process R code in the context of the SAP HANA database, the R code is
embedded in SAP HANA SQL code in the form of an RLANG procedure. The SAP
HANA database uses the external R environment to execute this R code, similar
to native database operations like joins or aggregations. This allows the
application developer to elegantly embed R function definitions and calls within
SQLScript and submit the entire code as part of a query to the database.
Fig 3. shows three main components of the integrated solution: the SAP HANA-
based application, the SAP HANA database, and the R environment. When the
calculation model plan execution reaches an R-operator, the calculation engine’s
R-client issues a request through the Rserve mechanism to create a dedicated R
process on the R host. Then, the R-Client efficiently transfers the R function
code and its input tables to this R process, and triggers R execution. Once the R
process completes the function execution, the resulting R data frame is returned
to the calculation engine, which converts it back into an internal table. Since the
internal column-oriented data structure used within the SAP HANA database for
intermediate results is very similar to the vector-oriented R data frame, this
conversion is very efficient.
A key benefit of having the overall control flow situated on the database side is
that the database execution plans are inherently parallel and, therefore,
multiple R processes can be triggered to run in parallel without having to worry
about parallel execution within a single R process.
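The parallelism described above can be sketched with Python's `multiprocessing` standing in for multiple R processes; the partitioning scheme and `stat_function` are hypothetical stand-ins for whatever the execution plan and the embedded R function actually do.

```python
from multiprocessing import Pool

def stat_function(partition):
    """Stand-in for an R function applied to one data partition."""
    return sum(partition) / len(partition)   # e.g., a per-partition mean

if __name__ == "__main__":
    # The "execution plan" splits the table into partitions and fans them
    # out to independent worker processes, mirroring how the database can
    # trigger multiple parallel R processes without intra-process threading.
    partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
    with Pool(processes=3) as pool:
        results = pool.map(stat_function, partitions)
    print(results)   # [2.0, 4.5, 7.5]
```

The key point mirrors the text: parallelism lives in the plan (process-level fan-out), so each worker runs a plain single-threaded function.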
While the leading vendors of in-memory computing and analytics products are
integrating their products with open source technologies, the product upgrade
cycle for software vendors is relatively slow compared to open source
technologies. This may lead to occasional integration challenges due to version
incompatibility.
3. BIG DATA ANALYTICS AND IOT
All these developments have opened up new opportunities for the practical
application of analytics. While sensors, clicks, POS devices, etc. can generate
large volumes of valuable data, making sense of the data requires predictive
modelling and processing huge volumes of data within a reasonable time. Taking
the example of tracking a Formula One car, predictive models can identify the
potential failure of a component well in advance by processing zillions of data
points from sensors in real time, and hence can prevent a major accident, saving
lives and millions of dollars of investment.
Building an IoT solution involves three main steps, or phases:
Step 1. Data integration: This is the first and primary step. It brings a variety of
data into a coherent, complete set – from the edge to the core – to offer the
deepest and broadest insights possible.
Step 2. Data management: This step, which brings together IT and
infrastructures, requires special attention. Data management must address the
challenge of managing large volumes of data, as well as layering on contextual
information such as asset taxonomy and time and location data.
Step 3. Making sense of Data: Once foundational data integration and data
management are put in place, many types of business innovation are possible.
Enterprises can create meaningful insights and reimagine their business models
and customer experiences by leveraging predictive /prescriptive analytics.
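The three steps above can be sketched as a toy pipeline; all names, data, and the predictive rule are hypothetical illustrations of the shape of each phase, not a real IoT stack.

```python
def integrate(edge_readings, core_records):
    """Step 1: bring edge and core data into one coherent set."""
    return [dict(r, **core_records.get(r["asset"], {})) for r in edge_readings]

def contextualize(records, taxonomy):
    """Step 2: layer on contextual information such as asset taxonomy."""
    return [dict(r, asset_class=taxonomy[r["asset"]]) for r in records]

def predict(records, limit):
    """Step 3: a stand-in predictive rule flagging likely failures."""
    return [r["asset"] for r in records if r["temp"] > limit]

readings = [{"asset": "pump-1", "temp": 95}, {"asset": "pump-2", "temp": 60}]
core     = {"pump-1": {"site": "plant-A"}, "pump-2": {"site": "plant-B"}}
taxonomy = {"pump-1": "rotating", "pump-2": "rotating"}

flagged = predict(contextualize(integrate(readings, core), taxonomy), limit=90)
print(flagged)   # ['pump-1']
```

The ordering matters in the same way the text argues: the predictive step is only as good as the integrated, contextualized data beneath it.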
The next section describes a typical architecture and data flow for an IOT
scenario.
4. CASE STUDY: PREDICTIVE MAINTENANCE AND SERVICE FOR A EUROPEAN
MANUFACTURER OF AIR SYSTEMS
The customer, a leading manufacturer of air compressors, wanted to provide
differentiated value by supplying compressed gas as a business focus in addition
to selling compressors. For this reason, downtime and breakdowns become a
critical factor, as they would result in substantial loss for the company. Predictive
maintenance and service would help to understand the availability of the
machinery, avoid lost revenue, and lower maintenance costs.
4.1. The Process Innovation:
 Move from Preventative Maintenance to Predictive Maintenance in order to
improve product reliability, service revenue, and customer satisfaction.
 Application of monitoring and predictive analysis, coupling and analyzing
disparate historical data with actual equipment data to more accurately
predict future equipment failures.
 Combination of customer data and service level agreements/contracts to alert
and support the service team in preventing failures in an optimized way
through an analytics solution.
Predictive Maintenance provides the ability to plan demand for aftermarket
service and sales based on visibility into the customer base and the support
needed.
4.2. Architecture & Requirements
 Big Data volume from sensor data (temperature, pressure, machine
conditions) in combination with product data including failure codes, machine
master data and business data (OEM, dealer). Integration with Hadoop to
build a data lake is under consideration.
 Flexible predictive tools and algorithms combining technical and business
data
 HANA/IQ central store fed by Data Services and ESP using SAP BI Tools/Portal
for visualization
 Machine Data Insight combining stream & signal intelligence
Fig 4. The IOT Architecture for the compressor manufacturing company
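A toy stand-in for the predictive-maintenance idea (the window, threshold, and readings below are illustrative assumptions, not the customer's actual models): flag an asset when the rolling mean of a sensor reading drifts above a limit, well before an outright breakdown.

```python
def failure_risk(pressure_readings, window=3, threshold=8.0):
    """Flag an asset when its recent rolling-mean pressure drifts above a
    limit -- a toy stand-in for the predictive models described above."""
    recent = pressure_readings[-window:]
    return sum(recent) / len(recent) > threshold

healthy  = [6.9, 7.0, 7.1, 7.0, 6.8]    # stable around its normal level
drifting = [7.0, 7.4, 8.1, 8.6, 9.2]    # trending upward toward failure

print(failure_risk(healthy))    # False
print(failure_risk(drifting))   # True -- schedule service before breakdown
```

A production model would of course combine many signals (temperature, vibration, failure codes, machine master data) rather than a single threshold, but the flow is the same: score streaming sensor data, then trigger the service process on a predicted failure.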
4.3. Value Drivers/Benefits
 Customers can generate additional revenue by extending their service with
predictions of when a machine might break down, how long a production line
can run with an existing failure, and which spare parts to provide to shorten
a maintenance shutdown.
 Reduced service, warranty and maintenance costs
 Higher service profitability and customer satisfaction.
 Better alignment with spare parts planning and availability
 Improved service intervals / lower service costs for customer
5. ANALYTICS MATURITY QUADRANT
As part of the study, the concept of Analytics Maturity Quadrant (AMQ) was
introduced in the light of above evolution.
The AMQ maps a business with respect to two parameters:
- Maturity in application of analytics as a business tool
- Maturity in usage of data & technology
Fig 5 A. Analytics Maturity Quadrant
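The mapping can be sketched as a simple classification over the two maturity scores; the 0–1 scale, midpoint, and quadrant labels below are illustrative assumptions, since the paper defines only the two axes.

```python
def amq_quadrant(analytics_maturity, technology_maturity, midpoint=0.5):
    """Place a business on the AMQ from its two maturity scores
    (scale, midpoint, and labels are illustrative, not the paper's)."""
    high_analytics = analytics_maturity >= midpoint
    high_technology = technology_maturity >= midpoint
    if high_analytics and high_technology:
        return "leader (top right)"
    if high_technology:
        return "technology-ready, analytics-lagging"
    if high_analytics:
        return "analytics-savvy, technology-constrained"
    return "starter"

# A retail bank with a mature CoE vs. a CPG firm that bought the
# technology but still runs mostly traditional BI reporting:
print(amq_quadrant(0.8, 0.9))   # leader (top right)
print(amq_quadrant(0.2, 0.7))   # technology-ready, analytics-lagging
```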
As part of the ongoing study, companies from different industry sectors are being
analyzed in terms of their technology footprint and usage of analytics. The
general trend indicates that sectors like Retail, Telecom, and Financial Services are
way ahead of their counterparts from other sectors, and the majority of them have
invested in technology and in-house centers of excellence for advanced analytics.
In contrast, sectors like Manufacturing, CPG, Energy, etc. continue to depend
extensively on traditional BI reporting, with occasional small pockets of
predictive/prescriptive analytics usage (e.g., forecasting, optimization of
distribution). The latter segment has started realizing the potential of analytics
as a strategic tool for competitive advantage. An outcome of this
realization is investment in technology, although usage of advanced analytics is
still lagging.
Diagram 5 B is an indicative mapping of some of the key industry sectors.
The objective of this mapping is to assist interested companies in identifying their
relative position in the quadrant and defining a strategy to move towards the top
right corner.
Fig 5 B. Analytics Maturity Quadrant : Industry Perspective
References
1. Evelson, Boris, “I Forget: What’s In-Memory?”, blogs.forrester.com, March 2010
2. “Connect, Transform, and Reimagine Business in a Hyperconnected Future”, SAP thought leadership paper, 2014
3. Abadi, D. J., Madden, S. R., Hachem, N., “Column-stores vs. Row-stores: How Different Are They Really?”, SIGMOD 2008, pp. 967–980
4. Henschen, Doug, “In-Memory Databases”, InformationWeek, March 2014, pp. 9–16
5. McKinsey Global Institute, “Big Data: The Next Frontier for Innovation, Competition, and Productivity”, May 2011
6. Plattner, Hasso, Zeier, Alexander, “In-Memory Data Management: An Inflection Point for Enterprise Applications”, Springer, Germany, 2011
About Author
Arup Ray, a senior executive with 20+ years of industry experience, manages a global vertical
in Analytics, Big Data, EIM & HANA in SAP SDC. In his role, Arup is also a member of the SAP
services global leadership team for Analytics, Big Data & HANA. As a consultant and business
head, Arup has had the opportunity to assist customers across five continents in the areas of
Analytics, Big Data, and Supply Chain Management in multiple industries, e.g., CPG, Retail,
Telecom, and Manufacturing. He has also incubated the Big Data & HANA CoE in SDC global and
helps customers leverage Analytics & Big Data technology to achieve their strategic goals.
An alumnus of IIT Delhi & ISB Hyderabad, Arup is currently pursuing the Business Analytics &
Intelligence program at IIM Bangalore.
More Related Content

PDF
MapR Data Hub White Paper V2 2014
PPTX
SAP HANA Integrated with Microstrategy
PDF
CIO Guide to Using SAP HANA Platform For Big Data
PDF
SAP HANA and Apache Hadoop for Big Data Management (SF Scalable Systems Meetup)
PPTX
Implementing bi in proof of concept techniques
PDF
Big data/Hadoop/HANA Basics
PPTX
Big data and apache hadoop adoption
PDF
Massive sacalabilitty with InterSystems IRIS Data Platform
MapR Data Hub White Paper V2 2014
SAP HANA Integrated with Microstrategy
CIO Guide to Using SAP HANA Platform For Big Data
SAP HANA and Apache Hadoop for Big Data Management (SF Scalable Systems Meetup)
Implementing bi in proof of concept techniques
Big data/Hadoop/HANA Basics
Big data and apache hadoop adoption
Massive sacalabilitty with InterSystems IRIS Data Platform

What's hot (20)

PDF
SAP BW vs Teradat; A White Paper
PDF
Data warehousing
PPT
OLAP Cubes in Datawarehousing
PDF
Redefining Data Analytics Through Search
PDF
SAP Lambda Architecture Point of View
PPTX
Data ware house design
PDF
A treatise on SAP logistics information reporting
PPT
CS8091_BDA_Unit_I_Analytical_Architecture
PPT
Gulabs Ppt On Data Warehousing And Mining
PDF
NETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCE
PPT
Date warehousing concepts
PPTX
Data ware house architecture
PDF
Teradata - Presentation at Hortonworks Booth - Strata 2014
PDF
[IJET-V1I5P5] Authors: T.Jalaja, M.Shailaja
PDF
Traditional data word
PDF
Optimising Data Lakes for Financial Services
PPTX
DATA WAREHOUSING
PPTX
Comparison with Traditional databases
PPTX
Data Warehousing - in the real world
SAP BW vs Teradat; A White Paper
Data warehousing
OLAP Cubes in Datawarehousing
Redefining Data Analytics Through Search
SAP Lambda Architecture Point of View
Data ware house design
A treatise on SAP logistics information reporting
CS8091_BDA_Unit_I_Analytical_Architecture
Gulabs Ppt On Data Warehousing And Mining
NETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCE
Date warehousing concepts
Data ware house architecture
Teradata - Presentation at Hortonworks Booth - Strata 2014
[IJET-V1I5P5] Authors: T.Jalaja, M.Shailaja
Traditional data word
Optimising Data Lakes for Financial Services
DATA WAREHOUSING
Comparison with Traditional databases
Data Warehousing - in the real world
Ad

Viewers also liked (13)

PDF
Horizon 1
PDF
Indice c3 m2
PPTX
Simple smo
PPT
Redes sociales
ODP
Presentacion HOUSE.
PDF
PDF
WANLIMA_
PDF
La religión como enfermedad mental
DOCX
Selfie: el olvido del ser-para-otro
DOCX
El segundo sexo: alcances, logros y fracasos sobre la condición de la mujer e...
PPTX
Elaboración del estudio de impacto ambiental
PPTX
Impacto ambiental actualizado
PDF
Legitimación de la violencia como principio de equilibrio social
Horizon 1
Indice c3 m2
Simple smo
Redes sociales
Presentacion HOUSE.
WANLIMA_
La religión como enfermedad mental
Selfie: el olvido del ser-para-otro
El segundo sexo: alcances, logros y fracasos sobre la condición de la mujer e...
Elaboración del estudio de impacto ambiental
Impacto ambiental actualizado
Legitimación de la violencia como principio de equilibrio social
Ad

Similar to ManMachine&Mathematics_Arup_Ray_Ext (20)

PDF
Comparison among rdbms, hadoop and spark
PDF
Lecture about SAP HANA and Enterprise Comupting at University of Halle
PDF
Empowering SAP HANA Customers and Use Cases
PDF
Big Data, Big Thinking: Simplified Architecture Webinar Fact Sheet
PPTX
PDF
5507832a c074-4013-9d49-6e58befa9c3e-161121113026
PDF
What Is SAP HANA And Its Benefits?
PDF
Unstructured Datasets Analysis: Thesaurus Model
PPT
Hana Training Day 1
PDF
Top 10 Big Data Tools that you should know about.pdf
PDF
Real time data processing frameworks
PDF
IJSRED-V2I3P43
PDF
HANA Demystified by DataMagnum
PDF
Enabling SQL Access to Data Lakes
PDF
Big Data Tools: A Deep Dive into Essential Tools
PDF
SAP HORTONWORKS
PPT
Sap Interview Questions - Part 1
DOCX
PDF
Memory Management in BigData: A Perpective View
PPTX
Analysis of Major Trends in Big Data Analytics
Comparison among rdbms, hadoop and spark
Lecture about SAP HANA and Enterprise Comupting at University of Halle
Empowering SAP HANA Customers and Use Cases
Big Data, Big Thinking: Simplified Architecture Webinar Fact Sheet
5507832a c074-4013-9d49-6e58befa9c3e-161121113026
What Is SAP HANA And Its Benefits?
Unstructured Datasets Analysis: Thesaurus Model
Hana Training Day 1
Top 10 Big Data Tools that you should know about.pdf
Real time data processing frameworks
IJSRED-V2I3P43
HANA Demystified by DataMagnum
Enabling SQL Access to Data Lakes
Big Data Tools: A Deep Dive into Essential Tools
SAP HORTONWORKS
Sap Interview Questions - Part 1
Memory Management in BigData: A Perpective View
Analysis of Major Trends in Big Data Analytics

ManMachine&Mathematics_Arup_Ray_Ext

  • 1. MAN, MACHINE & MATHEMATICS How In-Memory & Open Source Technologies helping solve Big Data problems Arup Ray 3rd International Conference on Business Analytics & Intelligence, IIM Bangalore, 17th to 19th December 2015
  • 2. Man, Machine & Mathematics | How In Memory & Open Source Technologies helping solve Big Data problems Abstract Emergence of new technologies helping us capture, manage and interpret the deluge of data coming from multiple sources (Big Data) giving us the opportunity to run the business using signals that are fast, real time and hence more relevant. These signals can measure performance, provide critical indicators about the business, identify customer issues and complaints, help market more effectively and accurately and make decision real time. This paper explores these emerging trends in big data technology like in-memory database, integration of proprietary technologies with open source technologies like Hadoop, R and application of big data technology in the area of Internet of Things (IOT). The paper also introduces the concept of Analytics Maturity Quadrant (AMQ) to help businesses evaluate and develop their analytics strategy. INTRODUCTION Solving business problems and generating disruptive business insights with petabytes of data sounds great on paper, but can be extremely challenging task in real life. For example, an $18 billion-a-year CPG conglomerate with global footprint, must quickly respond to the fluctuating costs of 4,000 raw materials that go into more than 20,000 products. What’s more, if they can make promotions for these products more timely by using faster analysis, the company and its retailer customers can command higher prices in a business known for razor-thin profit margins. Challenge is not only about storing petabytes of data (big data), but how fast can we run mathematical models on these huge data to generate intelligent insights in real time. 
This paper explores following aspects taking SAP HANA as a reference - the emerging trend in in- memory computing and its impact on analytics - how marriage between proprietary in memory technology and open source technology helping mathematicians solve real life problems - how this technology evolution has made analytics the ‘brain’ behind the internet of things (IOT) revolution -
  • 3. 1. IN MEMORY COMPUTING AND ANALYTICS 1.1. Arrival of In memory Analytics As the cost of RAM declines, in-memory analytics is no more pipe dream for many businesses. The 64-bit operating systems with 2 terabyte (TB) addressable memory have made it possible to cache large volumes of data, potentially an entire data warehouse or data mart in a computer’s RAM. In addition to incredibly fast query response times, in-memory analytics can reduce or eliminate the need for data indexing and storing pre-aggregated data in OLAP (On Line Analytical Processing) cubes or aggregate tables. 1.2. Advent of column storage Another evolution is usage of columnar data storage for Analytics applications. Unlike the traditional data storage, where the data records are indexed and stored in rows with the record containing all the fields, products like Sybase IQ leverage columnar data storage for analytics which permits faster data access for OLAP system. For example, in column storage, data is only partially blocked during access & individual columns can be processed at the same time by different cores. However row storage continues to be a preferred option for an OLTP (On Line Transaction Processing) system where the transaction system may require access to all the fields of a record every time user accesses the record (e.g., creation of a sales order). Hence OLTP and OLAP continue to sit in two different boxes and data need to move from OLTP to OLAP( to run analytics models, reports & dashboards) & OLAP to OLTP ( for analytics to trigger action in transaction system). A marriage of row based and column-based technologies can eliminate the need of maintaining two different systems. 1.3. In-memory Database An in-memory database means all the data is stored in the memory (RAM) and no time is wasted in loading the data from hard disk to RAM. Everything is in-memory, which gives the CPUs quick access to data for processing. 
The speed advantages offered by this RAM storage system is further accelerated by the use of multi- core CPUs, multiple CPUs per board, and multiple boards per server appliance. In-memory database like SAP HANA combines the power of hardware and software to process massive volume of real time data using in-memory computing, e.g.,  It combines row-based and column-based database technology.  Data now resides in main-memory (RAM) and no longer on a hard disk.
  • 4. It is best suited for performing real-time analytics and developing and deploying real-time applications. Fig 1. In Memory Computing: Combining power of Hardware & Software As Forrester Research has pointed out, the outcome of this evolution in technology is a distributed in-memory data platform like SAP HANA that enterprises can use to support real-time analytics, predictive and text analytics, and extreme transaction volumes. The next-generation data platform demands looking at these new technologies to help deliver the speed, agility and new insights critical to helping your business grow. For decades, organizations have built the transactional, operational and analytical layers to support various applications, operational reporting, and analytics. However, with the growing need to support real-time data sharing driven by mobile enterprise, separate transactional, operational, and analytical layers are creating an obstacle in supporting such an initiative. Distributed in-memory data platform offers a new approach to collapse the technology stack that can eliminate redundant hardware, software and middleware components to save money and reduce complexity through automation and integrated systems that can help developers and DBAs become more productive. 1.4. OLTP & OLAP in a box An appliance like SAP HANA blends the column and row storage in the same database eliminating the need for data movement between two different boxes, which allows OLTP and OLAP in a single box. Hence the moment a sales order is created in ERP system, the data is accessible to analytical application and the business users can access operational reports real time helping them take corrective action real time instead of doing a post mortem. In section 4, the
paper describes how the HANA Predictive Analysis Library (PAL) can access transaction data in real time and deliver a predictive-maintenance decision just in time to avoid costly breakdowns and prevent millions of dollars of losses due to unplanned maintenance.

2. IN MEMORY COMPUTING AND BIG DATA ANALYTICS

With the explosion of data, even with today's high-capacity RAM and multi-core processors, the cost of hardware can be prohibitive if a business needs to store the huge volume of data generated every second (e.g., clickstreams from a website, or the real-time data generated by hundreds of sensors attached to a Formula One car). This challenge of sheer data volume can be addressed by low-cost open source technologies like Hadoop. However, the advantage of in-memory computing remains unutilized unless in-memory analytics can be integrated with such data lakes seamlessly. Hence the built-in integration of Hadoop or Spark with SAP HANA (or similar in-memory databases) can support an architecture where data is stored based on its 'data temperature', i.e., frequency of access.

Fig 2. A Big Data Integrated Architecture

2.1. In-Memory Database and Hadoop Integration

Taking the SAP Big Data architecture as an example, a typical big data integrated architecture stores hot data (more frequently accessed data) in SAP HANA (the in-memory database), warm data in Sybase IQ, and cold data (less frequently accessed data) in Hadoop. This integrated architecture allows need-based data access while keeping infrastructure costs manageable.

SAP recently launched SAP HANA Vora, an in-memory query engine that runs on Apache Spark to analyze Big Data stored in Hadoop. The goal is to deliver a single integrated platform that embraces the Hadoop ecosystem for:
- All data: OLTP, OLAP & Big Data
- All operations: setup, administration, monitoring
- One interface for applications

Key features of this integration are:
- Building OLAP-style capabilities on Hadoop/HDFS and extending the SAP HANA and Hadoop integration to provide optimized data processing and movement between the two platforms
- Enabling massive scale-out scenarios for HANA

SAP HANA Vora also reveals another trend that software vendors are adopting: it permits users of open source technologies to continue building analytical applications in the open source environment while leveraging the power of in-memory computing. For example, SAP HANA Vora offers:
- Extensive programming support for Scala, Python, C, C++, R and Java, allowing data scientists to use their tool of choice
- Support for data scientists and developers who prefer SparkR or Spark ML to mash up corporate data with Hadoop/Spark data easily
- Leverage of SAP HANA's multiple data processing engines and in-memory computing for developing new insights from business and contextual data

2.2. In-Memory Database and R Integration

Another aspect of integrating an in-memory database with open source is integration with R, which gives access to a practically unlimited number of readily available algorithms.
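Whichever library is ultimately used, the underlying pattern is the same: the algorithm is shipped to where the data lives instead of the data being copied out to a client. The following Python sketch illustrates this idea only; all class and function names are invented and this is not a SAP HANA API.

```python
# Illustrative sketch: in-database execution vs. client-side execution.
# All names are invented; this is NOT a real SAP HANA interface.

class InMemoryColumnStore:
    """A toy columnar store holding named columns as Python lists."""

    def __init__(self, **columns):
        self.columns = columns

    def run_in_database(self, func, *col_names):
        # The function is shipped to the data: nothing is copied out.
        return func(*(self.columns[name] for name in col_names))

    def fetch_rows(self, *col_names):
        # Tuple-based interface: every row is materialized and copied out,
        # which is the extra cost the in-database route avoids.
        return list(zip(*(self.columns[name] for name in col_names)))


db = InMemoryColumnStore(price=[10.0, 12.0, 9.5], qty=[3, 1, 4])

# In-database: the function runs next to the columnar data.
revenue = db.run_in_database(
    lambda price, qty: sum(p * q for p, q in zip(price, qty)), "price", "qty"
)

# Client-side: rows are copied out first, then processed.
rows = db.fetch_rows("price", "qty")
revenue_client = sum(p * q for p, q in rows)

print(revenue, revenue_client)  # both 80.0
```

The result is identical either way; the difference lies in where the computation happens and how much data crosses the interface, which is exactly the advantage the in-database libraries described below aim at.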
Extending the example of the SAP Big Data architecture, SAP Predictive Analytics offers advanced SAP HANA integration and provides in-database, in-memory computing through dedicated SAP HANA native libraries and R:
- SAP PAL for HANA: the Predictive Analysis Library, with HANA-native, optimized implementations of industry-standard predictive algorithms
- SAP APL for HANA: the Automated Predictive Library, with SAP proprietary algorithms that automate many tasks for simpler, quicker and higher-quality model definition
- The R server, which enables using any open source R algorithm in the HANA engine

The goal of integrating the SAP HANA database with R is to enable the embedding of R code in the SAP HANA database context. That is, the SAP HANA database allows R code to be processed inline as part of the overall query execution plan. This scenario is suitable when a SAP HANA-based modeling and consumption application wants to use the R environment for specific statistical functions. An efficient data-exchange mechanism supports the transfer of intermediate database tables directly into the vector-oriented data structures of R. This offers a performance advantage over standard SQL interfaces, which are tuple based and therefore require an additional data copy on the R side.

Fig 3. SAP HANA – R Integration Architecture

To process R code in the context of the SAP HANA database, the R code is embedded in SAP HANA SQL code in the form of an RLANG procedure. The SAP HANA database uses the external R environment to execute this R code, similar to native database operations like joins or aggregations. This allows the application developer to elegantly embed R function definitions and calls within SQLScript and submit the entire code as part of a query to the database.

Fig 3 shows the three main components of the integrated solution: the SAP HANA-based application, the SAP HANA database, and the R environment. When the calculation model's plan execution reaches an R-operator, the calculation engine's R-client issues a request through the Rserve mechanism to create a dedicated R process on the R host. The R-client then efficiently transfers the R function code and its input tables to this R process and triggers execution. Once the R process completes the function execution, the resulting R data frame is returned to the calculation engine, which converts it. Since the internal column-oriented data structure used within the SAP HANA database for intermediate results is very similar to the vector-oriented R data frame, this conversion is very efficient. A key benefit of having the overall control flow on the database side is that database execution plans are inherently parallel; therefore, multiple R processes can be triggered to run in parallel without having to worry about parallel execution within a single R process.

While the leading vendors of in-memory computing and analytics products are integrating their products with open source technologies, the product upgrade cycle of software vendors is relatively slow compared to that of open source projects. This may lead to occasional integration challenges due to version incompatibility.

3. BIG DATA ANALYTICS AND IOT

All these developments have opened up new opportunities for the practical application of analytics. While sensors, clicks, POS devices etc. can generate large volumes of valuable data, making sense of that data requires predictive modelling and processing huge volumes within a reasonable time. Taking the example of tracking a Formula One car, predictive models can identify the potential failure of a component well in advance by processing vast streams of sensor data in real time, and hence can prevent a major accident, saving lives and millions of dollars of investment.

Building an IoT solution involves three main steps, or phases:

Step 1. Data integration: This is the first and primary step. It brings a variety of data into a coherent, complete set – from the edge to the core – to offer the deepest and broadest insights possible.
Step 2. Data management: This step, which spans IT systems and infrastructure, requires special attention. Data management must address the challenge of managing large volumes of data, as well as layering on contextual information such as asset taxonomy and time and location data.

Step 3. Making sense of data: Once foundational data integration and data management are in place, many types of business innovation become possible. Enterprises can create meaningful insights and reimagine their business models and customer experiences by leveraging predictive and prescriptive analytics.

The next section describes a typical architecture and data flow for an IoT scenario.

4. CASE STUDY: PREDICTIVE MAINTENANCE AND SERVICE FOR A EUROPEAN MANUFACTURER OF AIR SYSTEMS

The customer, a leading manufacturer of air compressors, wanted to provide differentiated value by supplying compressed gas as a business focus in addition to selling compressors. For this reason, downtime and breakdowns become a critical factor, as they would result in substantial losses for the company. Predictive maintenance and service help the company understand the availability of its machinery, avoid lost revenue, and lower maintenance costs.

4.1. The Process Innovation
- Move from preventative maintenance to predictive maintenance in order to improve product reliability, service revenue, and customer satisfaction.
- Apply monitoring and predictive analysis by coupling and analyzing disparate historical data with actual equipment data to more accurately predict future equipment failures.
- Combine customer data and service level agreements/contracts in an analytics solution that alerts and supports the service team in preventing failures in an optimized way.

Predictive maintenance provides the ability to plan demand for aftermarket service and sales based on visibility into the customer base and the support needed.

4.2. Architecture & Requirements
- Big data volumes from sensor data (temperature, pressure, machine conditions) in combination with product data including failure codes, machine master data and business data (OEM, dealer). Integration with Hadoop to build a data lake is under consideration.
- Flexible predictive tools and algorithms combining technical and business data
- A HANA/IQ central store fed by Data Services and ESP, with SAP BI tools/portal used for visualization
- Machine data insight combining stream and signal intelligence

Fig 4. The IoT Architecture for the compressor manufacturing company

4.3. Value Drivers/Benefits
- Customers can generate additional revenue, since they can extend their service with a prediction of when a machine might break down, determine how long they can run a production line with an existing failure, and provide the right spare parts to shorten a maintenance shutdown.
- Reduced service, warranty and maintenance costs
- Higher service profitability and customer satisfaction
- Better alignment with spare parts planning and availability
- Improved service intervals / lower service costs for the customer

5. ANALYTICS MATURITY QUADRANT

As part of this study, the concept of the Analytics Maturity Quadrant (AMQ) is introduced in the light of the above evolution. The AMQ maps a business with respect to two parameters:
- Maturity in the application of analytics as a business tool
- Maturity in the usage of data & technology
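To make the mapping concrete, a company's position can be reduced to these two scores and a quadrant label. The sketch below is purely illustrative: the scores, threshold and quadrant labels are invented for this example and are not part of the AMQ definition.

```python
# Illustrative AMQ placement: two maturity scores (0-10) map to a quadrant.
# Threshold, labels and example scores are invented for illustration.

def amq_quadrant(analytics_maturity: float, data_tech_maturity: float,
                 threshold: float = 5.0) -> str:
    """Return an AMQ quadrant label for the two maturity scores."""
    high_analytics = analytics_maturity >= threshold
    high_tech = data_tech_maturity >= threshold
    if high_analytics and high_tech:
        return "Leader (top right)"
    if high_tech:
        return "Invested in technology, analytics usage lagging"
    if high_analytics:
        return "Analytics-driven, technology investment lagging"
    return "Traditional BI reporting"

# Hypothetical sector-level scores, loosely following the trend discussed below.
sectors = {"Retail": (8, 9), "Telecom": (7, 8),
           "Manufacturing": (3, 6), "CPG": (2, 4)}

for name, (analytics, tech) in sectors.items():
    print(f"{name}: {amq_quadrant(analytics, tech)}")
```

A company can substitute its own assessed scores to locate itself in the quadrant and track movement toward the top right over time.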
Fig 5 A. Analytics Maturity Quadrant

As part of the ongoing study, companies from different industry sectors are being analyzed in terms of their technology footprint and usage of analytics. The general trend indicates that sectors like retail, telecom and financial services are well ahead of their counterparts in other sectors, and a majority of them have invested in technology and in-house centers of excellence for advanced analytics. In contrast, sectors like manufacturing, CPG and energy continue to depend extensively on traditional BI reporting, with occasional small pockets of predictive/prescriptive analytics usage (e.g., forecasting, optimization of distribution). The latter segment has started realizing the potential of analytics as a strategic tool for competitive advantage; an outcome of this realization is investment in technology, although usage of advanced analytics is still lacking. Diagram 5 B is an indicative mapping of some of the key industry sectors. The objective of this mapping is to help interested companies identify their relative position in the quadrant and define a strategy to move towards the top right corner.
Fig 5 B. Analytics Maturity Quadrant: Industry Perspective

References
1. Evelson, Boris, "I Forget: What's In-Memory?", blogs.forrester.com, March 2010.
2. "Connect, Transform, and Reimagine Business in a Hyperconnected Future", SAP thought leadership paper, 2014.
3. Abadi, D. J., Madden, S. R., Hachem, N., "Column-stores vs. row-stores: how different are they really?", SIGMOD, 2008, pp. 967-980.
4. Henschen, Doug, "In-Memory Databases", InformationWeek, March 2014, pp. 9-16.
5. McKinsey Global Institute, "Big data: The next frontier for innovation, competition, and productivity", May 2011.
6. Plattner, Hasso, Zeier, Alexander, "In-Memory Data Management: An Inflection Point for Enterprise Applications", Springer, Germany, 2011.
About the Author

Arup Ray, a senior executive with 20+ years of industry experience, manages a global vertical for Analytics, Big Data, EIM & HANA in SAP SDC. In this role, Arup is also a member of the SAP Services global leadership team for Analytics, Big Data & HANA. As a consultant and business head, Arup has had the opportunity to assist customers across five continents in the areas of Analytics, Big Data & Supply Chain Management in multiple industries, e.g., CPG, retail, telecom and manufacturing. He has also incubated the Big Data & HANA CoE in SDC Global and helps customers leverage Analytics & Big Data technology to achieve their strategic goals. An alumnus of IIT Delhi & ISB Hyderabad, Arup is currently pursuing the Business Analytics & Intelligence program at IIM Bangalore.