INTERNET OF THINGS
&
PREDICTIVE ANALYTICS
PRASAD NARASIMHAN – TECHNICAL ARCHITECT
INTERNET OF THINGS
• Each “thing” or connected device is part of the digital shadow of a person
• For there to be a market in the internet of things, two things must be true:
1) The “thing” in question must provide utility to the human, and
2) The digital shadow must provide value to an enterprise.
MARKET
• The “market” is made up of many parts :
 From wearables and drivables to home and
 industrial sensors and controllers.
• Each part is made up of segments :
 Innovators,
 Early adopters,
 Pragmatists,
 Conservatives, and
 Laggards across many industries.
PREDICTIVE ANALYTICS
• From the data streams that implement the “digital shadows” of people, we can
use predictive analytics to understand their needs and behavior better than ever
before.
• Every new dimension of data increases the predictive power, enabling
enterprises to answer the question “what does the human want?”
INTERNET OF THINGS
&
PREDICTIVE ANALYTICS
• Transforming the internet of things and its sibling, predictive analytics, to be
programmable by the same labor pool that has developed the apps which drove
the mobile revolution makes basic economic sense.
• The data generated by the internet of things are coupled with :
 data analysis and
 data discovery tools and techniques
to help business leaders identify emerging developments, such as :
 machines that might need maintenance, to prevent costly breakdowns, or
 sudden shifts in customer or market conditions that might signal some action a
company should take.
• With the internet of things, the physical world will become a networked information
system, with sensors and actuators embedded in real physical objects and
linked through wired and wireless networks via the internet protocol.
• This holds special value for manufacturing:
 The potential for connected physical systems to improve productivity in the
production process and the supply chain is huge.
• Consider processes that govern themselves, where smart products can take
corrective action to avoid damages and where individual parts are automatically
replenished.
• Such technologies already exist and could drive the fourth industrial
revolution, following the steam engine, the conveyor belt (the assembly line -
think Ford Model T), and the first phase of IT and automation technology.
EG 1 : AUTO INSURANCE
• The first-order vector was a connected accelerometer offered to drivers :
 to improve their insurance rates based on proven “safe driving” habits.
• Through this digital shadow, the insurance provider can make much better
actuarial predictions than through the coarse-grained data it had before :
 age,
 gender, and
 traffic violations.
• This is interesting in the same way the BlackBerry was interesting - a basic
capability adopted for basic business improvement.
• The second-order vector is much stronger :
 the ability to transform the insurance market to better meet the needs of customers
while changing the rules of competition.
 based on real-time driving information, insurance companies can :
 move to a real-time spot-pricing model driven by an exchange (not unlike the stock
exchange),
 bidding on drivers and
 providing insurance on demand. Not driving today? Don’t pay for insurance. Need to drive
fast tomorrow? Pay a little more but don’t worry about your “permanent record”.
• These outcomes are all based on tying the internet of things to predictive
analytics.
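The spot-pricing idea above can be sketched in a few lines. This is a hypothetical illustration, not any insurer's actual model; the function name, rates, and telemetry fields are all invented:

```python
# Hypothetical sketch of the real-time spot-pricing idea: the daily premium
# is derived from that day's driving telemetry rather than from static
# actuarial tables. The function name, rates, and telemetry fields are all
# invented for illustration.

def daily_premium(miles_driven, harsh_brake_events, night_miles,
                  base_rate=0.05, brake_penalty=0.40, night_penalty=0.02):
    """Price one day of coverage from one day's driving telemetry."""
    if miles_driven == 0:
        return 0.0  # "Not driving today? Don't pay for insurance."
    return (miles_driven * base_rate
            + harsh_brake_events * brake_penalty
            + night_miles * night_penalty)

print(daily_premium(0, 0, 0))   # parked all day -> 0.0
print(daily_premium(30, 2, 5))  # a typical commuting day
```

A real exchange would of course set the base rate and penalty weights from actuarial models and live bidding rather than constants.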
EG 2 : HEALTH CARE
• The first-order vector is similar, a wearable accelerometer offered to patients :
 To improve traceability of their compliance with their exercise prescription,
 Enabling better outcomes for cardiac patients.
 Unlike prescription refills, exercise compliance has been untraceable before, so this
digital shadow is a breakthrough for medicine.
• Similar developments exist in digestible sensors within medications :
 which activate only on contact with stomach acid,
 providing higher truth and
 better granularity than a monthly refill.
• The second-order vector in healthcare is the ability to combine multiple streams of
information that were previously invisible, with the potential to drive better health
outcomes through provably higher patient compliance.
• Sorting these data streams at scale will allow health providers and health insurance
companies to rapidly iterate health protocols across a population of humans,
augmenting human expertise with predictive analytics.
• Outcome-based analysis based on predictive models built from data can reduce :
 waste,
 error rates, and
 lawsuits while driving better margins.
• Larger exchanges of this type of data will tend to :
 perform better,
 creating a more effective market and
 a better pool of empirical research for science.
EG 3 : AUTO COMPANIES
• They have installed thousands of "black boxes" inside their prototype and field-
test vehicles to capture second-by-second data from the dozens of control
units that manage today's automobiles.
• These boxes simply plug into the vehicle's on-board diagnostics (OBD) port,
which is typically located under the front dashboard of all cars.
• They collect 500-750 different vehicle performance parameters that add up to
terabytes of data in hours!
• The intent of the automakers for installing these boxes is to collect data which their
engineers can later analyze to fix bugs and improve on existing designs.
• For example, one car manufacturer found out from this data why its minivan
batteries were heading for a recall.
 The problem was an underpowered alternator - it was not able to fully recharge the
batteries, because the most common drive cycle for this particular minivan (think soccer
mom taking kids to practice) was less than 3 miles.
 As a result, there were a lot of complaints about dead batteries, and the
company was potentially facing the recall of millions of minivans which had this
alternator.
 The boxes collect information about driving cycles, and this data was really useful in
understanding the real reason behind the dead batteries.
 The test vehicles with short drive cycles were the ones that reported dead
batteries! Simply changing the alternator to a higher-capacity one could fix the problem.
 Now it was an easy fix to extend this solution to the entire fleet.
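The battery analysis described above amounts to grouping vehicles by drive-cycle length and comparing failure rates. A minimal sketch, with records fabricated for illustration:

```python
# A minimal sketch of the minivan-battery analysis described above: group
# the OBD drive-cycle logs by typical trip length and compare dead-battery
# rates. All records below are fabricated for illustration.
trips = [
    {"vehicle": "v1", "avg_trip_miles": 2.4, "dead_battery": True},
    {"vehicle": "v2", "avg_trip_miles": 2.8, "dead_battery": True},
    {"vehicle": "v3", "avg_trip_miles": 12.0, "dead_battery": False},
    {"vehicle": "v4", "avg_trip_miles": 9.5, "dead_battery": False},
    {"vehicle": "v5", "avg_trip_miles": 1.9, "dead_battery": True},
]

def failure_rate(records, short_cycle_miles=3.0):
    """Dead-battery rate among vehicles whose typical trip is short."""
    short = [r for r in records if r["avg_trip_miles"] < short_cycle_miles]
    return sum(r["dead_battery"] for r in short) / len(short)

print(failure_rate(trips))  # 1.0 -> every short-cycle vehicle had a dead battery
```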
ENDLESS OPPORTUNITY
• The opportunities are virtually endless,
 Ranging from early fault detection (predicting when a particular component is likely to
fail)
 To automatically adjusting driving route based on traffic pattern predictions.
• The ultimate test of predictive analytics in the internet of things is of course fully
autonomous systems, such as :
 the Nissan autonomous car promised for 2020 or
 the Google self-driving car of today.
• In the end all autonomous systems will need the ability to build predictive
capabilities - in other words, machines must learn machine learning!
EG 4 : GOOGLE’S SELF-DRIVING CAR
• Google claims that their self-driving car of today has logged more than 300,000
miles with almost no accidents.
• The one time a minor crash did occur was when the car was rear-ended by a
human-driven car!
• So, when the technology is fully mature, it is not just parking valets who
become obsolete, other higher paying professions such as automotive safety
systems experts may also need to look for other options!
• Predictive analytics is the enabler that will make this happen.
EG 5 : JET AIRLINER
• A jet airliner generates 20 terabytes of diagnostic data per hour of flight.
• The average oil platform has 40,000 sensors, generating data 24/7.
• Machine-to-machine (M2M) communication is now generating enormous volumes of data
and is testing the capabilities of traditional database technologies.
• To extract rich, real-time insight from the vast amounts of machine-generated
data, companies will have to build a technology foundation with speed and scale
because raw data, whatever the source, is only useful after it has been
transformed into knowledge through analysis.
• Investigative analytics tools enable interactive, ad-hoc querying on complex big
data sets to identify patterns and insights, and can perform analysis at massive
scale with precision even as machine-generated data grows beyond the
petabyte scale.
• With investigative analytics, companies can :
 take action in response to events in real time, and
 identify patterns to either capitalize on or
 prevent an event in the future.
• This is especially important because most failures result from a confluence of
multiple factors, not just a single red flag.
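That "confluence of factors" point can be made concrete: a sketch in which no single reading trips the alarm, but two or more mildly abnormal indicators together do. The sensor names and thresholds are illustrative assumptions:

```python
# Sketch of alerting on a confluence of factors: no single reading trips
# the alarm, but two or more mildly abnormal indicators together do.
# Sensor names and thresholds are illustrative assumptions.

def should_alert(reading, min_flags=2):
    flags = 0
    flags += reading["vibration_g"] > 0.8    # elevated vibration
    flags += reading["bearing_temp_c"] > 90  # hot bearing
    flags += reading["rpm_variance"] > 50    # unstable speed
    return flags >= min_flags

print(should_alert({"vibration_g": 0.9, "bearing_temp_c": 95, "rpm_variance": 10}))  # True
print(should_alert({"vibration_g": 0.9, "bearing_temp_c": 40, "rpm_variance": 10}))  # False
```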
• To fully address the influx of M2M data generated by the increasingly connected
internet of things landscape, companies can deploy a range of technologies to
improve the performance of their analytics :
 distributed processing frameworks like Hadoop and NoSQL databases,
 enterprise data warehouses,
 analytic databases,
 data visualization, and
 business intelligence tools.
• These can be deployed in any combination of :
 on-premises software,
 appliances, or
 cloud services.
FINDING RIGHT ANALYTICS DATABASE
TECHNOLOGY
• To find the right analytics database technology to capture, connect, and drive meaning from
data, companies should consider the following requirements:
 Real-time Analysis : Businesses can’t afford for data to get stale. Data solutions need to :
 load quickly and easily,
 and must dynamically query,
 analyze, and
 communicate M2M information in real time, without huge investments in IT administration, support,
and tuning.
 Flexible Querying And Ad-hoc Reporting : When intelligence needs to change quickly, analytic tools
can’t
 be constrained by data schemas that limit the number and
 type of queries that can be performed.
This type of deeper analysis also cannot be constrained by tinkering or time-consuming manual
configuration (such as indexing and managing data partitions) to create and change analytic
queries.
 Efficient Compression : Efficient data compression is key to enabling M2M data management within :
 A network node,
 Smart device, or
 Massive data center cluster.
Better compression allows :
 For less storage capacity overall,
 As well as tighter data sampling and
 Longer historical data sets,
 Increasing the accuracy of query results.
 Ease Of Use And Cost : Data analysis must be :
 Affordable,
 Easy-to-use, and
 Simple to implement in order to justify the investment.
This demands low-touch solutions that are optimized to deliver :
 Fast analysis of large volumes of data,
 With minimal hardware,
 Administrative effort, and
 Customization needed to set up or
 Change query and reporting parameters.
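The compression requirement above rests on a simple property of sensor streams: consecutive readings change slowly. A toy delta-encoding sketch illustrates it (illustrative only, not a production codec):

```python
# Toy illustration of why slowly changing sensor streams compress well:
# storing the first value plus successive differences turns a stream of
# large readings into mostly tiny deltas, which downstream compressors
# and tighter storage formats exploit. Not a production codec.

def delta_encode(values):
    """Keep the first value, then store successive differences."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    """Invert delta_encode by cumulative summation."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

samples = [1000, 1001, 1001, 1002, 1003, 1003, 1004]  # raw ADC counts
enc = delta_encode(samples)
print(enc)  # [1000, 1, 0, 1, 1, 0, 1]
assert delta_decode(enc) == samples  # lossless round trip
```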
EG 6 : UNION PACIFIC RAILROAD
• The railroad is using sensor and analytics technologies to predict and prevent train
derailments.
• For example, the company has placed infrared sensors every 20 miles along its tracks to
gather 20 million temperature readings of train wheels each day, looking for the
overheating that signals impending failure.
• Meanwhile, trackside microphones are used to pick up “growling” bearings in the wheels.
• Data from such physical measurements are sent via fiber-optic lines to Union Pacific’s data
centers.
• Complex pattern-matching algorithms and analytics are used to identify irregularities,
allowing Union Pacific experts to determine within minutes of capturing the data whether a
driver should pull a train over for inspection or reduce its speed until it reaches the next
station to be repaired.
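The wheel-temperature screening described above can be sketched as a trailing-mean comparison; the window size and margin here are illustrative assumptions, not Union Pacific's actual parameters:

```python
# Simplified sketch of the wheel-temperature screening: flag any reading
# that exceeds the trailing mean of recent readings by a margin. The
# window size and margin are illustrative, not Union Pacific's parameters.

def flag_overheating(temps, window=5, margin=20.0):
    """Return indices of readings that spike above the trailing mean."""
    flagged = []
    for i in range(window, len(temps)):
        baseline = sum(temps[i - window:i]) / window
        if temps[i] - baseline > margin:
            flagged.append(i)
    return flagged

wheel_temps = [60, 62, 61, 63, 62, 61, 64, 95, 63, 62]
print(flag_overheating(wheel_temps))  # [7] -> the 95-degree spike
```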
HOW TO ANALYZE MACHINE AND SENSOR
DATA
• This tutorial describes how to refine data from heating, ventilation, and air conditioning (HVAC)
systems in 20 large buildings around the world using the Hortonworks Data Platform, and how
to analyze the refined sensor data to maintain optimal building temperatures.
• Sensor data
A sensor is a device that measures a physical quantity and transforms it into a digital signal.
Sensors are always on, capturing data at a low cost, and powering the “internet of things.”
• Potential uses of sensor data
 Sensors can be used to collect data for many purposes, such as:
 To monitor machines or infrastructure such as ventilation equipment, bridges, energy meters, or
airplane engines. This data can be used for predictive analytics, to repair or replace these items
before they break.
 To monitor natural phenomena such as meteorological patterns, underground pressure during oil
extraction, or patient vital statistics during recovery from a medical procedure.
• Prerequisites:
 Hortonworks Sandbox (installed and running)
 Hortonworks ODBC driver installed and configured
 Microsoft Excel 2013 Professional Plus
• Notes:
 In this tutorial, the Hortonworks Sandbox is installed on an Oracle VirtualBox virtual
machine (VM) – your screens may be different.
 Install the ODBC driver that matches the version of Excel you are using (32-bit or 64-
bit).
 This tutorial uses the Power View feature in Microsoft Excel 2013 to visualize the
sensor data. Power View is currently only available in Microsoft Office Professional
Plus and Microsoft Office 365 Professional Plus.
 Note, other versions of Excel will work, but the visualizations will be limited to
charts. One can connect any other visualization tool that one likes.
• Overview
To refine and analyze HVAC sensor data :
 Download and extract the sensor data files.
 Load the sensor data into the Hortonworks Sandbox.
 Run two Hive scripts to refine the sensor data.
 Access the refined sensor data with Microsoft Excel.
 Visualize the sensor data using Excel Power View.
STEP 1: DOWNLOAD AND EXTRACT THE SENSOR DATA FILES
• Download the sample sensor data contained in a compressed (.zip) folder from
sensorfiles.zip
• Save the sensorfiles.zip file to the computer, then extract the files. One should see a
sensorfiles folder that contains the following files:
 hvac.csv – contains the targeted building temperatures, along with the actual (measured)
building temperatures.
 The building temperature data was obtained using Apache Flume.
 Flume can be used as a log aggregator, collecting log data from many diverse sources and
moving it to a centralized data store.
 In this case, Flume was used to capture the sensor log data, which we can now load into the
Hadoop Distributed File System (HDFS).
 building.csv – contains the “building” database table.
 Apache Sqoop can be used to transfer this type of data from a structured database into HDFS.
STEP 2: LOAD THE SENSOR DATA INTO THE HORTONWORKS SANDBOX
• Open the Sandbox Hue and click the HCatalog icon in the toolbar at the top of the
page, then click Create a new table from a file.
• On the “Create a new table from a file” page, type “hvac” in the Table Name box,
then click Choose a file under the Input File box.
• On the “Choose a file” pop-up, click Upload a file.
• Use the file upload dialog to browse to the sensorfiles folder that was extracted
previously.
• Select the hvac.csv file, then click Open.
• On the “Choose a file” pop-up, click the hvac.csv file.
• The default settings on the “Create a new table from a file” page are correct for this
file; scroll down to the bottom of the page and click Create Table.
• A progress indicator appears while the table is being created.
• When the table has been created, it appears in the HCatalog table list.
• Repeat the previous steps to create a “building” table by uploading the building.csv
file.
 Now let’s take a look at the two data tables.
 On the HCatalog table list page, select the check box
next to the “hvac” table, then click Browse Data.
 One can see that the “hvac” table includes :
 columns for date,
 time,
 the target temperature,
 the actual temperature,
 the system identifier,
 the system age, and
 the building id.
• Navigate back to the HCatalog table list page.
• Select the check box next to the “building” table, then click Browse Data.
• One can see that the “building” table includes columns for the building
identifier, the building manager, the building age, the HVAC product in the
building, and the country in which the building is located.
STEP 3: RUN TWO HIVE SCRIPTS TO REFINE THE SENSOR DATA
• Now use two Hive scripts to refine the sensor data.
• We hope to accomplish three goals with this data :
 Reduce heating and cooling expenses.
 Keep indoor temperatures in a comfortable range, between 65 and 70 degrees.
 Identify which HVAC products are reliable, and replace unreliable equipment with
those models.
 First, identify whether the actual temperature was more than five degrees different
from the target temperature. In the Sandbox Hue, click the Beeswax (Hive UI) icon in
the toolbar at the top of the page to display the Query Editor.
Paste the following script in the Query Editor box,
then click Execute:
To view the data generated by the script, click
Tables in the menu at the top of the page, select the
checkbox next to hvac_temperatures, and then click
Browse Data
• On the Query Results page, scroll to the
right. One can notice that two new attributes appear
in the hvac_temperatures table. The data in the
“temprange” column indicates whether the actual
temperature was:
 NORMAL – within five degrees of the target
temperature.
 COLD – more than five degrees colder than
the target temperature.
 HOT – more than five degrees warmer than the
target temperature.
• If the temperature is outside of the normal range,
“extremetemp” is assigned a value of 1; otherwise
its value is 0.
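The Hive script itself is not reproduced in this deck, but the temprange/extremetemp logic it is described as computing can be sketched in plain Python for readers following along without a Sandbox:

```python
# The Hive script itself is not reproduced in this deck; this plain-Python
# sketch mirrors the logic it is described as computing: classify each
# reading by how far the actual temperature strays from the target, and
# set extremetemp to 1 outside the five-degree band.

def classify(target, actual):
    diff = actual - target
    if diff > 5:
        temprange = "HOT"
    elif diff < -5:
        temprange = "COLD"
    else:
        temprange = "NORMAL"
    extremetemp = 1 if temprange != "NORMAL" else 0
    return temprange, extremetemp

print(classify(68, 70))  # ('NORMAL', 0)
print(classify(68, 75))  # ('HOT', 1)
print(classify(68, 60))  # ('COLD', 1)
```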
• Next, combine the “hvac” and
“hvac_temperatures” data sets. In the Sandbox
Hue, click the Beeswax (Hive UI) icon in the
toolbar at the top of the page to display the
Query Editor.
• Paste the following script in the Query Editor
box, then click Execute:

create table if not exists hvac_building as
select h.*, b.country, b.hvacproduct, b.buildingage, b.buildingmgr
from building b
join hvac_temperatures h on b.buildingid = h.buildingid;
• To view the data generated by the
script, click tables in the menu at the
top of the page, select the checkbox
next to hvac_building, and then click
browse data.
The hvac_building table is displayed on the
Query Results page.
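For readers without a Sandbox handy, the hvac_building join can also be mirrored outside Hive. This plain-Python sketch (with invented sample rows) enriches each refined reading with its building's attributes, keyed on buildingid, just as the script does:

```python
# Plain-Python mirror of the hvac_building join, for readers without a
# Sandbox handy: enrich each refined reading with its building's
# attributes, keyed on buildingid. Sample rows are invented.

buildings = {
    1: {"country": "Finland", "hvacproduct": "GG1919"},
    2: {"country": "France", "hvacproduct": "FN39TG"},
}

hvac_temperatures = [
    {"buildingid": 1, "temprange": "HOT", "extremetemp": 1},
    {"buildingid": 2, "temprange": "NORMAL", "extremetemp": 0},
]

hvac_building = [{**row, **buildings[row["buildingid"]]}
                 for row in hvac_temperatures]
print(hvac_building[0]["country"])  # Finland
```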
STEP 4: ACCESS THE REFINED SENSOR DATA WITH MICROSOFT EXCEL
• In this section, use Microsoft Excel
Professional Plus 2013 to access the
refined sensor data.
• In Windows, open a new Excel
workbook, then select Data > From
Other Sources > From Microsoft Query.
• On the Choose Data Source pop-up, select
the Hortonworks ODBC data source that
was installed previously, then click OK. The
Hortonworks ODBC driver enables access to
Hortonworks data with Excel and other
business intelligence (BI) applications that
support ODBC.
• After the connection to the Sandbox is
established, the Query Wizard appears.
Select the “hvac_building” table in the
Available Tables and Columns box, then
click the right arrow button to add the
entire “hvac_building” table to the query.
Click Next to continue.
• On the Filter Data screen, click Next to
continue without filtering the data.
• On the Sort Order screen, click Next
to continue without setting a sort
order.
• Click Finish on the Query Wizard
screen to retrieve the query data from
the Sandbox and import it into Excel.
• On the Import Data dialog box, click
OK to accept the default settings and
import the data as a table.
• The imported query data appears in
the Excel workbook.
STEP 5: VISUALIZE THE SENSOR DATA USING EXCEL POWER VIEW
• Now that the refined sensor data has
been successfully imported into
Microsoft Excel, one can use the
Excel Power View feature to
analyze and visualize the data.
• Begin the data visualization by
mapping the buildings that are
most frequently outside of the
optimal temperature range.
• In the excel worksheet with the
imported “hvac_building” table,
select insert > power view to
open a new power view report.
• The power view fields area appears on the
right side of the window, with the data
table displayed on the left. Drag the
handles or click the pop out icon to
maximize the size of the data table.
• In the power view fields area, select the
checkboxes next to the country and
extremetemp fields, and clear all of the
other checkboxes. One may need to scroll
down to see all of the check boxes.
• In the fields box, click the down-
arrow at the right of the extremetemp
field, then select count (not blank).
• Click map on the design tab in the
top menu.
• The map view displays a global view of the data.
• One can see that the office in Finland had 814 sensor readings where the
temperature was more than five degrees higher or lower than the target
temperature.
• In contrast, the German office is doing a better job maintaining ideal office
temperatures, with only 363 readings outside of the ideal range.
• Hot offices can lead to employee complaints and reduced productivity.
• Let’s see which offices run hot.
• In the power view fields area, clear the extremetemp checkbox and select the
temprange checkbox.
• Click the down-arrow at the right of the temprange field, then select add as
size.
• Drag temprange from the power view fields area to the filters box, then select
the hot checkbox.
• One can see that the buildings in Finland and France run hot most often.
• Cold offices cause elevated energy expenditures and employee discomfort.
• In the filters box, clear the hot checkbox and select the cold checkbox.
• One can see that the buildings in Finland and Indonesia run cold most often.
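The Power View map and filters above all reduce to "count of extreme readings per country." The same aggregation, sketched in plain Python with invented sample rows:

```python
# The Power View map boils down to "count of extreme readings per country".
# The same aggregation in plain Python, with invented sample rows.
from collections import Counter

rows = [
    {"country": "Finland", "extremetemp": 1},
    {"country": "Finland", "extremetemp": 1},
    {"country": "Germany", "extremetemp": 1},
    {"country": "Finland", "extremetemp": 0},
]

extreme_by_country = Counter(r["country"] for r in rows if r["extremetemp"])
print(extreme_by_country)  # Counter({'Finland': 2, 'Germany': 1})
```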
• The data set includes information about the performance of five brands of HVAC
equipment, distributed across many types of buildings in a wide variety of climates.
• Use this data to assess the relative reliability of the different HVAC models.
• Open a new excel worksheet, then select data > from other sources > from microsoft
query to access the hvac_building table.
• Follow the same procedure as before to import the data, but this time only select the
“hvacproduct” and “extremetemp” columns.
• In the excel worksheet with the
imported “hvacproduct” and
“extremetemp” columns, select insert
> power view to open a new power
view report.
• Click the pop out icon to maximize
the size of the data table. in the
fields box, click the down-arrow at
the right of the extremetemp field,
then select count (not blank).
• Select Column Chart > Stacked
Column in the top menu.
• Click the down-arrow next to sort by
hvacproduct in the upper left corner
of the chart area, then select count of
extremetemp.
• One can see that the GG1919 model
seems to regulate temperature most
reliably, whereas the FN39TG failed to
maintain the appropriate temperature
range 9% more frequently than the
GG1919.
• This tutorial has shown how the Hortonworks
Data Platform (HDP) can store and analyze
sensor data.
• With real-time access to massive
amounts of temperature and other
types of data on HDP, a facilities
department can initiate data-driven
strategies to reduce energy
expenditures and improve employee
comfort.
More Related Content

PPTX
Analytics in IoT
PDF
Big Data: Smart Technologies Provide Big Opportunities
PPTX
Impact of big data on DCMI market
PDF
Top industry use cases for streaming analytics
PPTX
EU Data Market study. Presentation at NESSI Summit 2014 IDC & Open Evidence
PDF
IoT Big Data Analytics Insights from Patents
PDF
Leveraging Your Data Report
PDF
Big Data LDN 2018: ACCELERATING YOUR ANALYTICS JOURNEY WITH REAL-TIME AI
Analytics in IoT
Big Data: Smart Technologies Provide Big Opportunities
Impact of big data on DCMI market
Top industry use cases for streaming analytics
EU Data Market study. Presentation at NESSI Summit 2014 IDC & Open Evidence
IoT Big Data Analytics Insights from Patents
Leveraging Your Data Report
Big Data LDN 2018: ACCELERATING YOUR ANALYTICS JOURNEY WITH REAL-TIME AI

What's hot (20)

PDF
Big data analytics use cases: all you need to know
PDF
Big Data LDN 2018: THE THIRD REVOLUTION IN ANALYTICS
PDF
Let's make money from big data!
PDF
Top Ten Big Data Trends in Finance
PDF
Top Data Analytics Trends for 2019
PDF
Smart Analytics For The Utility Sector
PPTX
Big Data Expo 2015 - IBM 5 predictions
PPTX
What is Big Data?
PDF
Big data analytics for telecom operators final use cases 0712-2014_prof_m erdas
PDF
Big Data Industry Insights 2015
PPTX
I 4 Petrek V2 Datalan IT Forum
PDF
13 pv-do es-18-bigdata-v3
DOCX
Predictive Analytics, Contextual Computing, and Big Data
PPTX
Big Data, Big Deal? (A Big Data 101 presentation)
PDF
uae views on big data
PPTX
Big Data, Trends,opportunities and some case studies( Mahmoud Khosravi)
PDF
Turning Big Data to Business Advantage
PDF
K1 embedding big data & analytics into the business to deliver sustainable value
PPTX
Big Data use cases in telcos
PDF
Deep Link Analytics Empowered by AI + Graph + Verticals
Big data analytics use cases: all you need to know
Big Data LDN 2018: THE THIRD REVOLUTION IN ANALYTICS
Let's make money from big data!
Top Ten Big Data Trends in Finance
Top Data Analytics Trends for 2019
Smart Analytics For The Utility Sector
Big Data Expo 2015 - IBM 5 predictions
What is Big Data?
Big data analytics for telecom operators final use cases 0712-2014_prof_m erdas
Big Data Industry Insights 2015
I 4 Petrek V2 Datalan IT Forum
13 pv-do es-18-bigdata-v3
Predictive Analytics, Contextual Computing, and Big Data
Big Data, Big Deal? (A Big Data 101 presentation)
uae views on big data
Big Data, Trends,opportunities and some case studies( Mahmoud Khosravi)
Turning Big Data to Business Advantage
K1 embedding big data & analytics into the business to deliver sustainable value
Big Data use cases in telcos
Deep Link Analytics Empowered by AI + Graph + Verticals
Ad

Viewers also liked (20)

PDF
IBM Predictive analytics IoT Presentation
PDF
MTB03015USEN.PDF
PDF
Mtw03008 usen
PDF
The role obd in Usage Based Insurance in 2015
PDF
White Paper on IBM MTSS
PDF
Improving Energy Efficiency of Intelligent Buildings with Smart IoT Retrofits
PPS
Cel mai periculos loc turistic din lume!
PPT
Relevance 2.0
PPS
Imagini deosebite din Antarctica
PPT
Zer da hau?
PDF
Cycling Bachelors
PPS
splendori si incantare
PPTX
Up to date roadshow pres
PPT
Presentation on channel, community, content for startupbisnis
PDF
Presentation – Mobile Show Africa 2012
PPT
JavaEasyFashion
PPT
Tian Lei and his research, the user experience
PDF
Web 2.0
IBM Predictive analytics IoT Presentation
MTB03015USEN.PDF
Mtw03008 usen
The role obd in Usage Based Insurance in 2015
White Paper on IBM MTSS
Improving Energy Efficiency of Intelligent Buildings with Smart IoT Retrofits
Cel mai periculos loc turistic din lume!
Relevance 2.0
Imagini deosebite din Antarctica
Zer da hau?
Cycling Bachelors
splendori si incantare
Up to date roadshow pres
Presentation on channel, community, content for startupbisnis
Presentation – Mobile Show Africa 2012
JavaEasyFashion
Tian Lei and his research, the user experience
Web 2.0
Ad

Similar to Internet of things & predictive analytics (20)

PDF
Internet of things
PDF
The internet of things
PDF
Internet of things
PDF
Barga ACM DEBS 2013 Keynote
PDF
Sean gately internet of things
PPTX
Io t research_arpanpal_iem
PDF
Data Culture Series - Keynote - 27th Jan, London
PDF
Creating the Foundations for the Internet of Things
PDF
Earley Executive Roundtable on Data Analytics - Session 2 - Mining Business I...
PDF
Hot Technologies of 2013: Investigative Analytics
PPTX
The internet of things yabut, ma. beatrix a.
PPT
GK NU CS 101 Session 1B (1).ppt
PPTX
BetaGroup - Tech Trends in 2017, a snap shot by BetaGroup
PDF
IOT & Machine Learning
PDF
IIoT : Old Wine in a New Bottle?
PDF
iot_module4.pdf
PDF
Data Analytics Data Analytics Data Ana
PDF
CS309A Final Paper_KM_DD
PDF
hitachi-ebook-social-innovation-forbes-insights
PDF
hitachi-ebook-social-innovation-forbes-insights
Internet of things
The internet of things
Internet of things
Barga ACM DEBS 2013 Keynote
Sean gately internet of things
Io t research_arpanpal_iem
Data Culture Series - Keynote - 27th Jan, London
Creating the Foundations for the Internet of Things
Earley Executive Roundtable on Data Analytics - Session 2 - Mining Business I...
Hot Technologies of 2013: Investigative Analytics
The internet of things yabut, ma. beatrix a.
GK NU CS 101 Session 1B (1).ppt
BetaGroup - Tech Trends in 2017, a snap shot by BetaGroup
IOT & Machine Learning
IIoT : Old Wine in a New Bottle?
iot_module4.pdf
Data Analytics Data Analytics Data Ana
CS309A Final Paper_KM_DD
hitachi-ebook-social-innovation-forbes-insights
hitachi-ebook-social-innovation-forbes-insights

More from Prasad Narasimhan (20)

DOCX
Single Page Application
PPTX
PPTX
Technology needs to be disruptive
PPTX
Riseof technology
PPTX
Information as commodity
PPTX
Data visualization representation of Analytics data
PPTX
Art of creating good software
PPTX
Application of predictive analytics
PPTX
Software engineering at the speed of technology
PPTX
Challenges in adapting predictive analytics
PPTX
Predictive analytics in marketing
PPTX
Predictive analytics in financial service
PPTX
Predictive analytics in health insurance
PPTX
3D printing
PPTX
Internet of things
PPTX
360 degree view of architect
PPTX
Where business meet’s IT
PPTX
Information + insight = action
PPTX
Become a software technical architect
PPTX
What is happening in Information Technology
Single Page Application
Technology needs to be disruptive
Riseof technology
Information as commodity
Data visualization representation of Analytics data
Art of creating good software
Application of predictive analytics
Software engineering at the speed of technology
Challenges in adapting predictive analytics
Predictive analytics in marketing
Predictive analytics in financial service
Predictive analytics in health insurance
3D printing
Internet of things
360 degree view of architect
Where business meet’s IT
Information + insight = action
Become a software technical architect
What is happening in Information Technology

Recently uploaded (20)

PDF
22.Patil - Early prediction of Alzheimer’s disease using convolutional neural...
PDF

Internet of things & predictive analytics

  • 1. INTERNET OF THINGS & PREDICTIVE ANALYTICS PRASAD NARASIMHAN – TECHNICAL ARCHITECT
  • 2. INTERNET OF THINGS
  • Each “thing” or connected device is part of the digital shadow of a person.
  • For there to be a market in the internet of things, two things must be true:
    1) The “thing” in question must provide utility to the human, and
    2) The digital shadow must provide value to an enterprise.
  • 3. MARKET
  • The “market” is made up of many parts:
     From wearable to drivable to home, and
     Industrial sensors and controllers.
  • Each part is made up of segments:
     Innovators,
     Early adopters,
     Pragmatists,
     Conservatives, and
     Laggards across many industries.
  • 4. PREDICTIVE ANALYTICS
  • From the data streams that implement the “digital shadows” of people, we can use predictive analytics to understand their needs and behavior better than ever before.
  • Every new dimension of data increases the predictive power, enabling enterprises to answer the question “what does the human want?”
  • 5. INTERNET OF THINGS & PREDICTIVE ANALYTICS
  • Transforming the internet of things and its sibling, predictive analytics, to be programmable by the same labor pool that developed the apps which drove the mobile revolution makes basic economic sense.
  • The data generated by the internet of things can be coupled with:
     data analysis,
     data discovery tools, and
     techniques to help business leaders identify emerging developments, such as machines that might need maintenance:
     to prevent costly breakdowns, or
     sudden shifts in customer or market conditions that might signal some action a company should take.
  • 6. • Through the internet of things, the physical world will become a networked information system—through sensors and actuators embedded in real physical objects and linked through wired and wireless networks via the internet protocol.
  • This holds special value for manufacturing:
     The potential for connected physical systems to improve productivity in the production process and
     The supply chain is huge.
  • Consider processes that govern themselves, where smart products can take corrective action to avoid damage and where individual parts are automatically replenished.
  • Such technologies already exist and could drive the fourth industrial revolution—following the steam engine, the conveyor belt (assembly line – think Ford Model T), and the first phase of IT and automation technology.
  • 7. EG 1 : AUTO INSURANCE
  • The first-order vector was a connected accelerometer offered to drivers:
     to improve their insurance rates based on proven “safe driving” habits.
  • Through this digital shadow, the insurance provider can make much better actuarial predictions than through the coarse-grained data they had before:
     age,
     gender, and
     traffic violations.
  • This is interesting in the same way the BlackBerry was interesting – a basic capability adopted for basic business improvement.
  • 8. • The second-order vector is much stronger:
     the ability to transform the insurance market to better meet the needs of customers while changing the rules of competition.
     Based on real-time driving information, insurance companies can:
     move to a real-time spot-pricing model driven by an exchange (not unlike the stock exchange),
     bidding on drivers and
     providing insurance on demand. Not driving today? Don’t pay for insurance. Need to drive fast tomorrow? Pay a little more, but don’t worry about your “permanent record”.
  • These outcomes are all based on tying the internet of things to predictive analytics.
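The spot-pricing idea above can be made concrete with a small sketch. This is a hypothetical illustration, not part of the original deck: the rate constants and the risk score (assumed to come from a predictive model over accelerometer data) are invented, and a day with no driving costs nothing.

```python
# Hypothetical sketch of usage-based, per-day insurance pricing.
# BASE_RATE and RISK_MULTIPLIER are invented constants for illustration.

BASE_RATE = 2.00        # dollars per day of driving
RISK_MULTIPLIER = 3.00  # extra dollars per unit of risk score

def daily_premium(miles_driven: float, risk_score: float) -> float:
    """Price one day of coverage from telemetry.

    risk_score is assumed to come from a predictive model over
    accelerometer data (0.0 = very safe, 1.0 = very risky).
    Not driving today? The premium is zero.
    """
    if miles_driven == 0:
        return 0.0
    return BASE_RATE + RISK_MULTIPLIER * risk_score

# A week of telemetry: (miles driven, risk score) per day.
week = [(12, 0.1), (0, 0.0), (30, 0.4), (0, 0.0), (5, 0.05), (60, 0.8), (8, 0.1)]
total = sum(daily_premium(m, r) for m, r in week)
```

In an exchange-driven model, many insurers would bid such per-day quotes against each other in real time.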
  • 9. EG 2 : HEALTH CARE
  • The first-order vector is similar, a wearable accelerometer offered to patients:
     To improve traceability of their compliance with their exercise prescription,
     Enabling better outcomes for cardiac patients.
     Unlike prescription refills, exercise compliance was untraceable before, so this digital shadow is a breakthrough for medicine.
  • Similar developments exist in digestible sensors within medications:
     which activate only on contact with stomach acid,
     providing higher truth and
     better granularity than a monthly refill.
  • 10. • The second-order vector in healthcare is the ability to combine multiple streams of information that were previously invisible, which has the potential to drive better health outcomes through provably higher patient compliance.
  • Sorting these data streams at scale will allow health providers and health insurance companies to rapidly iterate health protocols across a population of humans, augmenting human expertise with predictive analytics.
  • Outcome-based analysis based on predictive models built from data can reduce:
     waste,
     error rates, and
     lawsuits, while driving better margins.
  • Larger exchanges of this type of data will tend to:
     perform better,
     creating a more effective market and
     a better pool of empirical research for science.
  • 11. EG 3 : AUTO COMPANIES
  • They have installed thousands of "black boxes" inside their prototype and field-testing vehicles to capture second-by-second data from the dozens of control units which manage today's automobiles.
  • These boxes simply plug into the vehicle's on-board diagnostic (OBD) port, which is typically located under the front dashboard of all cars.
  • They collect 500-750 different vehicle performance parameters that add up to terabytes of data in hours!
  • 12. • The intent of the automakers in installing these boxes is to collect data which their engineers can later analyze to fix bugs and improve on existing designs.
  • For example, one car manufacturer found out from this data why their minivan batteries were headed for a recall.
     The problem was an underpowered alternator – it was not able to fully recharge the batteries because the most common drive cycle for this particular minivan (think soccer mom taking a kid to practice) was less than 3 miles.
     As a result, there appeared to be a lot of complaints about dead batteries, and the company was potentially facing the recall of millions of minivans which had this alternator.
     The boxes collect information about driving cycles, and this data was really useful in understanding the real reason behind the dead batteries.
     The test vehicles which had short drive cycles were the ones which reported dead batteries! Simply changing the alternator to a higher capacity could fix the problem.
     Now it was an easy fix to extend this solution to the entire fleet.
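The kind of analysis described above, checking whether dead-battery reports cluster on short drive cycles, can be sketched in a few lines. The records and the 3-mile cutoff below are invented for illustration and are not the manufacturer's actual data.

```python
# Hypothetical sketch: do dead-battery reports cluster on short drive cycles?
# Each record is (average drive-cycle miles, reported dead battery?).
records = [
    (2.1, True), (2.8, True), (1.9, True), (3.5, False),
    (12.0, False), (8.4, False), (2.5, True), (15.2, False),
    (2.2, False), (9.9, False),
]

SHORT_CYCLE_MILES = 3.0  # assumed cutoff for a "short" drive cycle

def failure_rate(rows):
    """Fraction of vehicles in `rows` that reported a dead battery."""
    return sum(dead for _, dead in rows) / len(rows)

short = [r for r in records if r[0] < SHORT_CYCLE_MILES]
long_ = [r for r in records if r[0] >= SHORT_CYCLE_MILES]

# If short-cycle vehicles fail far more often, the alternator cannot
# recharge the battery on short trips, pointing to a design fix (a
# higher-capacity alternator) rather than a battery recall.
```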
  • 13. ENDLESS OPPORTUNITY
  • The opportunities are literally endless,
     Ranging from early fault detection (predicting when a particular component is likely to fail)
     To automatically adjusting a driving route based on traffic pattern predictions.
  • The ultimate test of predictive analytics in the internet of things is of course fully autonomous systems, such as:
     the Nissan car of 2020 or
     the Google self-driving car of today.
  • In the end, all autonomous systems will need the ability to build predictive capabilities – in other words, machines must learn machine learning!
  • 14. EG 4 : GOOGLE’S SELF-DRIVING CAR
  • Google claims that their self-driving car of today has logged more than 300,000 miles with almost zero incidence of accidents.
  • The one time a minor crash did occur was when the car was rear-ended by a human-driven car!
  • So, when the technology is fully mature, it is not just parking valets who become obsolete; other higher-paying professions such as automotive safety systems experts may also need to look for other options!
  • Predictive analytics is the enabler that will make this happen.
  • 15. EG 5 : JET AIRLINER
  • A jet airliner generates 20 terabytes of diagnostic data per hour of flight.
  • The average oil platform has 40,000 sensors, generating data 24/7.
  • M2M is now generating enormous volumes of data and is testing the capabilities of traditional database technologies.
  • To extract rich, real-time insight from the vast amounts of machine-generated data, companies will have to build a technology foundation with speed and scale, because raw data, whatever the source, is only useful after it has been transformed into knowledge through analysis.
  • Investigative analytics tools enable interactive, ad-hoc querying on complex big data sets to identify patterns and insights, and can perform analysis at massive scale with precision even as machine-generated data grows beyond the petabyte scale.
  • 16. • With investigative analytics, companies can:
     Take action in response to events in real time, and
     Identify patterns to either capitalize on or
     Prevent an event in the future.
  • This is especially important because most failures result from a confluence of multiple factors, not just a single red flag.
  • To fully address the influx of M2M data generated by the increasingly connected internet of things landscape, companies can deploy a range of technologies to leverage distributed processing frameworks like Hadoop and NoSQL and improve the performance of their analytics, including:
     enterprise data warehouses,
     analytic databases,
     data visualization, and
     business intelligence tools.
  • These can be deployed in any combination of:
     on-premise software,
     appliance, or
     in the cloud.
  • 17. FINDING THE RIGHT ANALYTICS DATABASE TECHNOLOGY
  • To find the right analytics database technology to capture, connect, and drive meaning from data, companies should consider the following requirements:
     Real-time Analysis: Businesses can’t afford for data to get stale. Data solutions need to load quickly and easily, and must dynamically query, analyze, and communicate M2M information in real time, without huge investments in IT administration, support, and tuning.
     Flexible Querying and Ad-hoc Reporting: When intelligence needs to change quickly, analytic tools can’t be constrained by data schemas that limit the number and type of queries that can be performed. This type of deeper analysis also cannot be constrained by tinkering or time-consuming manual configuration (such as indexing and managing data partitions) to create and change analytic queries.
  • 18.  Efficient Compression: Efficient data compression is key to enabling M2M data management within:
     A network node,
     A smart device, or
     A massive data center cluster.
    Better compression allows for less storage capacity overall, as well as tighter data sampling and longer historical data sets, increasing the accuracy of query results.
     Ease of Use and Cost: Data analysis must be affordable, easy to use, and simple to implement in order to justify the investment. This demands low-touch solutions that are optimized to deliver fast analysis of large volumes of data, with minimal hardware, administrative effort, and customization needed to set up or change query and reporting parameters.
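As a rough illustration of why compression matters so much for M2M data (this sketch is not from the deck), the snippet below serializes a repetitive stream of sensor readings and compresses it with zlib. Regular, slowly changing telemetry compresses far better than its raw size suggests, which is what makes tighter sampling and longer history affordable.

```python
import json
import zlib

# Hypothetical illustration: compress a repetitive stream of sensor readings.
# Real telemetry is often regular and slowly changing, which compresses well.
readings = [20.0 + 0.01 * (i % 5) for i in range(10_000)]  # temperatures

raw = json.dumps(readings).encode("utf-8")
compressed = zlib.compress(raw, level=9)

ratio = len(raw) / len(compressed)
# The exact ratio depends on the data, but for a stream this regular
# the compressed form is dramatically smaller than the raw JSON.
```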
  • 19. EG 6 : UNION PACIFIC RAILROAD
  • The railroad is using sensor and analytics technologies to predict and prevent train derailments.
  • For example, the company has placed infrared sensors every 20 miles along its tracks to gather 20 million temperature readings of train wheels each day, looking for overheating, which is a sign of impending failure.
  • Meanwhile, trackside microphones are used to pick up “growling” bearings in the wheels.
  • Data from such physical measurements are sent via fiber-optic lines to Union Pacific’s data centers.
  • Complex pattern-matching algorithms and analytics are used to identify irregularities, allowing Union Pacific experts to determine, within minutes of capturing the data, whether a driver should pull a train over for inspection or reduce its speed until it reaches the next station to be repaired.
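A drastically simplified version of this kind of irregularity detection can be sketched as an outlier test on one train's wheel temperatures. The threshold and data are invented for illustration; Union Pacific's actual pattern-matching algorithms are far more sophisticated.

```python
import statistics

# Hypothetical sketch of flagging overheating wheels from trackside
# infrared readings. The sigma threshold and data are invented.
def flag_hot_wheels(readings, sigma=2.0):
    """Return indices of wheels whose temperature is more than
    `sigma` sample standard deviations above the train's mean."""
    mean = statistics.mean(readings)
    stdev = statistics.stdev(readings)
    return [i for i, t in enumerate(readings)
            if stdev > 0 and t > mean + sigma * stdev]

# One pass of a train: wheel temperatures in degrees Fahrenheit.
train = [118, 121, 119, 122, 120, 117, 196, 119, 121, 118]
```

A flagged index would then trigger the slow-down-or-inspect decision described above.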
  • 20. HOW TO ANALYZE MACHINE AND SENSOR DATA
  • This tutorial describes how to refine data from heating, ventilation, and air conditioning (HVAC) systems in 20 large buildings around the world using the Hortonworks Data Platform, and how to analyze the refined sensor data to maintain optimal building temperatures.
  • Sensor data: A sensor is a device that measures a physical quantity and transforms it into a digital signal. Sensors are always on, capturing data at a low cost, and powering the “internet of things.”
  • Potential uses of sensor data:
     Sensors can be used to collect data from many sources, such as:
     To monitor machines or infrastructure such as ventilation equipment, bridges, energy meters, or airplane engines. This data can be used for predictive analytics, to repair or replace these items before they break.
     To monitor natural phenomena such as meteorological patterns, underground pressure during oil extraction, or patient vital statistics during recovery from a medical procedure.
  • 21. • Prerequisites:
     Hortonworks Sandbox (installed and running)
     Hortonworks ODBC driver (installed and configured)
     Microsoft Excel 2013 Professional Plus
  • Notes:
     In this tutorial, the Hortonworks Sandbox is installed on an Oracle VirtualBox virtual machine (VM) – your screens may be different.
     Install the ODBC driver that matches the version of Excel you are using (32-bit or 64-bit).
     In this tutorial, use the Power View feature in Microsoft Excel 2013 to visualize the sensor data. Power View is currently only available in Microsoft Office Professional Plus and Microsoft Office 365 Professional Plus.
     Note, other versions of Excel will work, but the visualizations will be limited to charts. One can connect to any other visualization tool one likes.
  • 22. • Overview: To refine and analyze HVAC sensor data:
     Download and extract the sensor data files.
     Load the sensor data into the Hortonworks Sandbox.
     Run two Hive scripts to refine the sensor data.
     Access the refined sensor data with Microsoft Excel.
     Visualize the sensor data using Excel Power View.
  • 23. STEP 1: DOWNLOAD AND EXTRACT THE SENSOR DATA FILES
  • Download the sample sensor data contained in a compressed (.zip) folder from sensorfiles.zip
  • Save the sensorfiles.zip file to the computer, then extract the files. One should see a sensorfiles folder that contains the following files:
     hvac.csv – contains the targeted building temperatures, along with the actual (measured) building temperatures.
     The building temperature data was obtained using Apache Flume.
     Flume can be used as a log aggregator, collecting log data from many diverse sources and moving it to a centralized data store.
     In this case, Flume was used to capture the sensor log data, which we can now load into the Hadoop Distributed File System (HDFS).
     building.csv – contains the “building” database table.
     Apache Sqoop can be used to transfer this type of data from a structured database into HDFS.
  • 24. STEP 2: LOAD THE SENSOR DATA INTO THE HORTONWORKS SANDBOX
  • Open the Sandbox Hue and click the HCatalog icon in the toolbar at the top of the page, then click Create a new table from a file.
  • On the “Create a new table from a file” page, type “hvac” in the Table Name box, then click Choose a file under the Input File box.
  • 25. • On the “Choose a file” pop-up, click Upload a file.
  • Use the File Upload dialog to browse to the sensorfiles folder that was extracted previously.
  • Select the hvac.csv file, then click Open.
  • 26. • On the “Choose a file” pop-up, click the hvac.csv file.
  • The default settings on the “Create a new table from a file” page are correct for this file; scroll down to the bottom of the page and click Create Table.
  • 27. • A progress indicator appears while the table is being created.
  • When the table has been created, it appears in the HCatalog table list.
  • 28. • Repeat the previous steps to create a “building” table by uploading the building.csv file.
     Now let’s take a look at the two data tables.
     On the HCatalog table list page, select the check box next to the “hvac” table, then click Browse Data.
     One can see that the “hvac” table includes columns for:
     date,
     time,
     the target temperature,
     the actual temperature,
     the system identifier,
     the system age, and
     the building ID.
  • 29. • Navigate back to the HCatalog table list page.
  • Select the check box next to the “building” table, then click Browse Data.
  • One can see that the “building” table includes columns for the building identifier, the building manager, the building age, the HVAC product in the building, and the country in which the building is located.
  • 30. STEP 3: RUN TWO HIVE SCRIPTS TO REFINE THE SENSOR DATA
  • Now use two Hive scripts to refine the sensor data.
  • We hope to accomplish three goals with this data:
     Reduce heating and cooling expenses.
     Keep indoor temperatures in a comfortable range between 65 and 70 degrees.
     Identify which HVAC products are reliable, and replace unreliable equipment with those models.
     First, identify whether the actual temperature was more than five degrees different from the target temperature. In the Sandbox Hue, click the Beeswax (Hive UI) icon in the toolbar at the top of the page to display the Query Editor.
  • 31. Paste the following script in the Query Editor box, then click Execute. To view the data generated by the script, click Tables in the menu at the top of the page, select the checkbox next to hvac_temperatures, and then click Browse Data.
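The script itself appeared only as a screenshot in the original slide and is not reproduced in this text. A HiveQL sketch consistent with the hvac_temperatures columns described on the next slide (temprange and extremetemp, based on a five-degree band around the target temperature) might look like the following; the source column names (targettemp, actualtemp) are assumptions:

```sql
-- Sketch only: reconstructed from the column descriptions on slide 32,
-- not the original screenshot. Source column names are assumptions.
create table if not exists hvac_temperatures as
select *,
       case when actualtemp - targettemp > 5 then 'HOT'
            when targettemp - actualtemp > 5 then 'COLD'
            else 'NORMAL'
       end as temprange,
       case when abs(actualtemp - targettemp) > 5 then 1 else 0
       end as extremetemp
from hvac;
```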
  • 32. • On the Query Results page, scroll to the right. One can notice that two new attributes appear in the hvac_temperatures table. The data in the “temprange” column indicates whether the actual temperature was:
     NORMAL – within five degrees of the target temperature.
     COLD – more than five degrees colder than the target temperature.
     HOT – more than five degrees warmer than the target temperature.
  • If the temperature is outside of the normal range, “extremetemp” is assigned a value of 1; otherwise its value is 0.
  • 33. • Next, combine the “hvac” and “hvac_temperatures” data sets. In the Sandbox Hue, click the Beeswax (Hive UI) icon in the toolbar at the top of the page to display the Query Editor.
  • Paste the following script in the Query Editor box, then click Execute:

    create table if not exists hvac_building as
    select h.*, b.country, b.hvacproduct, b.buildingage, b.buildingmgr
    from building b join hvac_temperatures h on b.buildingid = h.buildingid;

  • 34. • To view the data generated by the script, click Tables in the menu at the top of the page, select the checkbox next to hvac_building, and then click Browse Data. The hvac_building table is displayed on the Query Results page.
  • 35. STEP 4: ACCESS THE REFINED SENSOR DATA WITH MICROSOFT EXCEL
  • In this section, use Microsoft Excel Professional Plus 2013 to access the refined sensor data.
  • In Windows, open a new Excel workbook, then select Data > From Other Sources > From Microsoft Query.
  • On the Choose Data Source pop-up, select the Hortonworks ODBC data source that was installed previously, then click OK. The Hortonworks ODBC driver enables access to Hortonworks data with Excel and other business intelligence (BI) applications that support ODBC.
  • 36. • After the connection to the Sandbox is established, the Query Wizard appears. Select the “hvac_building” table in the Available Tables and Columns box, then click the right arrow button to add the entire “hvac_building” table to the query. Click Next to continue.
  • On the Filter Data screen, click Next to continue without filtering the data.
  • 37. • On the Sort Order screen, click Next to continue without setting a sort order.
  • Click Finish on the Query Wizard finish screen to retrieve the query data from the Sandbox and import it into Excel.
  • 38. • On the Import Data dialog box, click OK to accept the default settings and import the data as a table.
  • The imported query data appears in the Excel workbook.
  • 39. STEP 5: VISUALIZE THE SENSOR DATA USING EXCEL POWER VIEW
  • Now that the refined sensor data has been successfully imported into Microsoft Excel, one can use the Excel Power View feature to analyze and visualize the data.
  • Begin the data visualization by mapping the buildings that are most frequently outside of the optimal temperature range.
  • In the Excel worksheet with the imported “hvac_building” table, select Insert > Power View to open a new Power View report.
  • 40. • The Power View Fields area appears on the right side of the window, with the data table displayed on the left. Drag the handles or click the pop-out icon to maximize the size of the data table.
  • In the Power View Fields area, select the checkboxes next to the country and extremetemp fields, and clear all of the other checkboxes. One may need to scroll down to see all of the check boxes.
  • 41. • In the Fields box, click the down-arrow at the right of the extremetemp field, then select Count (Not Blank).
  • Click Map on the Design tab in the top menu.
  • 42. • The map view displays a global view of the data.
  • One can see that the office in Finland had 814 sensor readings where the temperature was more than five degrees higher or lower than the target temperature.
  • In contrast, the German office is doing a better job maintaining ideal office temperatures, with only 363 readings outside of the ideal range.
  • 43. • Hot offices can lead to employee complaints and reduced productivity.
  • Let’s see which offices run hot.
  • In the Power View Fields area, clear the extremetemp checkbox and select the temprange checkbox.
  • Click the down-arrow at the right of the temprange field, then select Add as Size.
  • 44. • Drag temprange from the Power View Fields area to the Filters box, then select the HOT checkbox.
  • One can see that the buildings in Finland and France run hot most often.
  • 45. • Cold offices cause elevated energy expenditures and employee discomfort.
  • In the Filters box, clear the HOT checkbox and select the COLD checkbox.
  • One can see that the buildings in Finland and Indonesia run cold most often.
  • 46. • The data set includes information about the performance of five brands of HVAC equipment, distributed across many types of buildings in a wide variety of climates.
  • Use this data to assess the relative reliability of the different HVAC models.
  • Open a new Excel worksheet, then select Data > From Other Sources > From Microsoft Query to access the hvac_building table.
  • Follow the same procedure as before to import the data, but this time only select the “hvacproduct” and “extremetemp” columns.
  • 47. • In the Excel worksheet with the imported “hvacproduct” and “extremetemp” columns, select Insert > Power View to open a new Power View report.
  • Click the pop-out icon to maximize the size of the data table. In the Fields box, click the down-arrow at the right of the extremetemp field, then select Count (Not Blank).
  • 48. • Select Column Chart > Stacked Column in the top menu.
  • Click the down-arrow next to Sort by hvacproduct in the upper left corner of the chart area, then select Count of extremetemp.
  • 49. • One can see that the GG1919 model seems to regulate temperature most reliably, whereas the FN39TG failed to maintain the appropriate temperature range 9% more frequently than the GG1919.
  • This tutorial has shown how the Hortonworks Data Platform (HDP) can store and analyze sensor data.
  • With real-time access to massive amounts of temperature and other types of data on HDP, a facilities department can initiate data-driven strategies to reduce energy expenditures and improve employee comfort.
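The per-model reliability comparison above boils down to counting extremetemp flags per hvacproduct. The same aggregation can be sketched in a few lines of Python; the sample rows below are invented, since the real data lives in the hvac_building Hive table built earlier in the tutorial.

```python
from collections import defaultdict

# Hypothetical sample of (hvacproduct, extremetemp) rows; the real data
# lives in the hvac_building Hive table from the tutorial.
rows = [
    ("GG1919", 0), ("GG1919", 0), ("GG1919", 1), ("GG1919", 0),
    ("FN39TG", 1), ("FN39TG", 0), ("FN39TG", 1), ("FN39TG", 1),
]

def extreme_counts(rows):
    """Count out-of-range readings (extremetemp == 1) per HVAC model."""
    counts = defaultdict(int)
    for product, extreme in rows:
        counts[product] += extreme
    return dict(counts)
```

This is exactly what the Power View stacked column chart computes when extremetemp is aggregated as Count (Not Blank) and grouped by hvacproduct.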