Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Part 1
Data complexity: variety and velocity
Petabytes
Massive compute and storage
Deployment expertise
Data of all volume, variety, velocity
Speed, scale, economics
Always up, always on
Open and flexible
Time to value
Big Data
HDInsight
Core Engine: Script, SQL, NoSQL, Streaming, Batch (MapReduce), In Memory
• Microsoft’s cloud Hadoop offering
• 100% open source Apache Hadoop
• Built on the latest releases for Hadoop
• Up and running in minutes with no hardware to deploy
• .NET and Java skills and deep integration to Visual Studio
• Utilize familiar BI tools for analysis including Microsoft Excel
• 99.9% Enterprise Service Level Agreement
Microsoft contribution to
Apache code
[Diagram labels. Hadoop: Name Node, Job Tracker, four Data Nodes with Task Trackers. HBase: HMaster, Coordination, four Region Servers]
• Random, fast (realtime) read/write access to your Big Data.
• Host very large tables (billions of rows X millions of columns) on clusters of
commodity hardware.
• Runs on top of the Hadoop Distributed File System (HDFS)
• Provides flexibility in that new columns can be added to column families at any
time
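The data model behind the bullets above can be illustrated with a small in-memory sketch. This is not the HBase client API: `ToyTable`, `put`, and `get` are hypothetical names chosen for illustration, and the row keys and values are made up. It only shows the idea that column families are declared up front while columns inside a family can appear at any time.

```python
from collections import defaultdict


class ToyTable:
    """Toy in-memory model of a wide-column table (not the HBase API)."""

    def __init__(self, column_families):
        # Column families are fixed when the table is created.
        self.column_families = set(column_families)
        # row key -> column family -> column qualifier -> value
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, value):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        # Any qualifier is accepted: new columns need no schema change.
        self.rows[row_key][family][qualifier] = value

    def get(self, row_key, family, qualifier):
        # Random read by row key; returns None if the cell is absent.
        return self.rows.get(row_key, {}).get(family, {}).get(qualifier)


# Hypothetical sample data.
table = ToyTable(column_families=["usage", "meta"])
table.put("meter-001", "usage", "2015-09-01T00:15", 1.2)
# A brand-new column in an existing family, added at write time.
table.put("meter-001", "meta", "firmware", "v2")
```

Rows that never received a given column simply have no cell for it, which is what makes very wide, sparse tables practical in this kind of model.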
Stream processing
Search and query
Data analytics (Excel)
Web/thick client
dashboards
Devices to take action
RabbitMQ /
ActiveMQ
• Single execution model for multiple tasks (SQL queries, Streaming, Machine
Learning, and Graph)
• Up to 100x faster processing
• Developer friendly (Java, Python, Scala)
• BI tool of choice (Power BI, Tableau, Qlik, SAP)
• Notebook experience (Jupyter/IPython, Zeppelin)
Spark SQL, Spark Streaming, Machine Learning, Graph
Spark for Azure HDInsight
In-memory computation engine – Fully managed
• Managed & supported by Microsoft
• Familiarity of Windows
• Re-use common tools, documentation, samples from Hadoop/Linux ecosystem
• Add Hadoop projects that were authored on Linux to HDInsight
• Easier transition from on-premise to cloud
Partner Spotlight: AtScale
Analysts Use Traditional BI Tools Against HDInsight
• HDFS For the Cloud
• Unlimited Storage, Petabyte Files
• Optimized for Massive Throughput
• High frequency, low latency, read immediately
• Managed and secured
Neudesic partnered with one of the nation's largest utility companies, which had
recently deployed smart utility meters for its power customers: nearly a million
meters, each sending usage data every 15 minutes.
The result is a hybrid Azure big data processing solution that lets the customer
perform gap analytics, a process for identifying gaps in the power usage readings,
over 7x faster than with its previous solution. Billions of smart meter reads are
processed to identify the nature and duration of the gaps and so mitigate
revenue losses.
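The core of gap analytics can be sketched for a single meter as follows. This is a minimal illustration, not the customer's implementation: it assumes reads are aligned to the 15-minute interval, and the function name `find_gaps` and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(minutes=15)  # reporting interval stated in the case study


def find_gaps(read_times, interval=INTERVAL):
    """Return (gap_start, gap_end, missing_reads) for each gap in one meter's reads.

    gap_start is the first missing timestamp and gap_end the last missing one.
    Assumes reads are aligned to the interval.
    """
    gaps = []
    ordered = sorted(read_times)
    for previous, current in zip(ordered, ordered[1:]):
        # Number of expected reads that never arrived between two actual reads.
        missing = (current - previous) // interval - 1
        if missing > 0:
            gaps.append((previous + interval, current - interval, missing))
    return gaps


# Hypothetical sample data for one meter.
reads = [
    datetime(2015, 9, 1, 0, 0),
    datetime(2015, 9, 1, 0, 15),
    datetime(2015, 9, 1, 1, 0),   # 00:30 and 00:45 are missing
    datetime(2015, 9, 1, 1, 15),
]
gaps = find_gaps(reads)
```

At the scale described (billions of reads), the same per-meter logic would be applied in parallel across meters, which is the kind of work a Hadoop cluster is suited to.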
[Architecture diagram labels: Smart Meters; Input files; AzCopy; Blob Storage (input, processed output data); HDInsight (business rules processing, ELT); Extract processed data from blob storage (AzCopy, SSIS); Local SQL DB for customer and other confidential data; BI layer]
Big Data in Retail
• Clickstream analytics
• Online recommendation engine
• 360° view of the customer
• Analyze brand sentiment
• Localized, personalized promotions
• Optimal store layout
Leading computer
manufacturer in world
• Use clickstream to deliver custom
website ecommerce experience
• Targeted ads for abandoned carts
• Use unstructured data from
website and social for data mining
• Combine w/sales data for 360 view
• Gather data from table-side
devices at restaurants
• Predict promotions/offers and
content to upsell to guests
• Gather social media sentiment
from customer feedback
• Combined with POS data, can
determine right product mix
Leading Multi-national
Retailer
• Track weather information
(temperature/forecast) to predict
shelf space for different seasons
• Sentiment analysis on feedback
Leading clothing online
retailer
• Use clickstream to understand who
is viewing their site
• Building recommendation engine
based on users’ clickpaths
Ziosk turned to Microsoft Gold partner Artis
Consulting to deploy a hybrid solution
consisting of the Analytics Platform System, Azure
HDInsight, Power BI, and Azure Machine Learning.
“Until now, we haven’t had the ability to
optimize the guest experience based on
their specific interactions with the devices.
With Azure, we can close the loop.”
Kevin Mowry, Chief Software Architect, Ziosk
Big Data in Health
• Predictive Analysis of Patient Health
& Clinical Decision Support
• Population, risk, and Care
management
• Real-time quality measures to assist
providers w/regulatory requirements
• Medical research data (e.g. genomics)
• Recruit cohorts for pharmaceutical
trials
• Process large volumes of data from
any healthcare provider EHR
system
• Assist in showing compliance
• Store 7-30 years of data to meet
audit requirements
• Scan handwritten notes and do
natural language processing
• Analyze if symptoms might map to
bigger outbreak
• Collect clinical trial data (from
automated equipment, sensors)
• Find patterns on this data
(chemical compositions, enzymes)
• Process 6 years' worth of data in a
few hours without any on-site
infrastructure
Big Data in Financial Services
• New account risk screens
• Fraud prevention
• Trading risk
• Maximize deposit spread
• Insurance underwriting
• Accelerate loan processing
• Actively monitor currencies used by UK
manufacturers in supply chain to do risk analysis
• Monitor UK GDP to help customers stay on top of
economic trends
• Needed to handle increasing amounts of finance,
compliance, and legal data from trading operations
• Trading data drives strategic decisions
• Track customer feedback on social media and on
their blog posts/website to understand loyalty
• Predict at-risk clients to reach out to
• Process data for actuaries to analyze results to
understand risks for insurance companies
• Milliman’s application understands relationships
between people, process, and technology to
manage risk
Tangerine partners with Microsoft to build a
solution with Analytics Platform System for the
data warehouse and uses PolyBase to query Azure
HDInsight in the cloud.
“With pre-built integration using PolyBase
to query both the relational data
warehouse and Hadoop in the cloud, the
solution will allow us to reap the benefits of
both relational and non-relational data
regardless of where it lives.”
http://azure.microsoft.com/en-us/documentation/services/hdinsight/
http://azure.microsoft.com/en-us/documentation/articles/hdinsight-learn-map/
http://www.microsoftvirtualacademy.com/training-courses/getting-started-with-microsoft-big-data
http://channel9.msdn.com/Shows/Data-Exposed
http://azure.microsoft.com/en-us/pricing/free-trial/
[Diagram labels: Applications, Web and Social, Devices, Sensors; Queryable Table; Hive]
[Hive architecture diagram labels. Clients: JDBC, ODBC, Query Console, Command line interface (CLI), Visual Studio. Hive: Thrift server, Metastore, Compiler, Optimizer, Executor. Hadoop: Head node, Name node, Data nodes/task nodes]
Scale up/Scale out, Tez, Partitioning, ORCFile, Vectorization
[Chart: TPCH1 (1 TB data), latency in minutes]
Transformation
Collection
Presentation and action
Event Queuing
System
Long-term
storage
Search and query
Data analytics (Excel)
Web/thick client
dashboards
Devices to take action
Event hub
Event
producers
Applications
Web and social
Devices
Sensors
Live Dashboards
Apache HBase on
HDInsight
DocumentDB
Solr Azure
Search
MongoDB SQL
Cloud gateways
(web APIs)
Field
gateways
Kafka/RabbitMQ/
ActiveMQ
Event hubs
Azure ML
Storage
adapters
Stream processing
Storm
HDInsight
Stream
Analytics
Storm Essentials
Easy to program
A distributed real-time processing platform
Fault tolerant: failure is expected, and embraced
Fast: clocked at 1M+ messages per second per node
Scalable: thousands of workers per cluster
Reliable: guaranteed message delivery, exactly-once semantics
Streaming data analysis
Stream: unbounded sequence of tuples
Tuple: core unit of data; immutable set of key/value pairs
Spout: source of streams; wraps a streaming data source and emits tuples
Bolt: core functions of a streaming computation; receives tuples and does stuff
• Write to a data store
• Read from a data store
• Perform arbitrary computation (compute)
• (Optionally) emit additional streams
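The relationship between spouts, tuples, streams, and bolts can be sketched with plain Python generators. This is not the Storm API and it runs in a single process with no distribution or fault tolerance; the function names, the dict-based tuples, and the sample events are all assumptions made for illustration.

```python
from collections import Counter


def sensor_spout(readings):
    """Spout: wraps a data source and emits one tuple per event."""
    for device, value in readings:
        # Tuples are modeled as plain dicts for brevity; the slide describes
        # real tuples as immutable sets of key/value pairs.
        yield {"device": device, "value": value}


def counter_bolt(stream):
    """Bolt: receives tuples, updates a running count, emits a new stream."""
    counts = Counter()
    for tup in stream:
        counts[tup["device"]] += 1
        yield {"device": tup["device"], "count": counts[tup["device"]]}


def writer_bolt(stream, store):
    """Bolt: writes the latest count per device to a data store (a dict here)."""
    for tup in stream:
        store[tup["device"]] = tup["count"]


# A finite list stands in for an unbounded source (hypothetical events).
events = [("meter-1", 0.4), ("meter-2", 1.1), ("meter-1", 0.5)]
dashboard = {}
writer_bolt(counter_bolt(sensor_spout(events)), dashboard)
```

Chaining the generators mirrors how a topology wires a spout to one or more bolts: each stage consumes one stream and may emit another.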
Cloud gateways
Data
Generator
Counter
Bolt
Aggregate
Writer
Bolt
Live dashboard
Managed services
Open source platform
Scale-up and scale-down
Event Hub
Visual Studio
Azure
HBase, SQL Database,
DocumentDB
Speed
Analyse millions of messages
per second
Support for authoring Storm Topologies
Create Storm projects from available templates
Submit a topology with C# bolts/spouts
Submit Topologies containing Java spouts/bolts
Monitor Topologies within VS
Troubleshoot Topologies
In essence, you never need to leave Visual Studio for Storm Projects
Technology comparison

Feature | Storm on HDInsight | Azure Stream Analytics

Management & Operations
Service | Managed Cluster | Managed Service
Price | Link to the pricing page. | Link to the pricing page.
Microsoft Supported | Yes | Yes
Open Source | Yes | No

Development Experience
SQL DSL | No | Yes
Extensible | Yes | No
Temporal Operators | No. Customers write custom code | Yes
Authoring/Debugging | Tools via Visual Studio | Interactive authoring and debugging via Azure portal

Input/Output
Data Ingress | No restriction | Event Hub, Azure Blobs
Data Egress | No restriction | Support to write data to Event Hubs, Blob store, Azure Table, Azure SQL DB, Power BI
Supports Multiple Inputs | Yes | Yes
Generate Multiple Outputs | Yes | Yes
Data Format | No restriction | Avro, JSON, CSV

Performance
Scalability | Yes | Yes
Elastic Scale | Yes | Yes
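The "Temporal Operators" row notes that with Storm, customers write custom code. As a rough sketch of what such custom code involves, the following counts events per key over tumbling (fixed, non-overlapping) windows, one common kind of temporal operator. The function name and the (timestamp, key) event shape are assumptions for illustration, not part of Storm or Stream Analytics.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) over fixed, non-overlapping windows.

    events: iterable of (timestamp_seconds, key) pairs.
    """
    counts = defaultdict(int)
    for timestamp, key in events:
        # Align each event to the start of the window that contains it.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical events: (seconds since start, key).
events = [(0, "a"), (4, "a"), (9, "b"), (10, "a"), (19, "b")]
result = tumbling_window_counts(events, window_seconds=10)
```

A production version would also have to handle late or out-of-order events and emit results as each window closes, which is part of the work a built-in temporal operator saves.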
Web App
Devices
Streaming
Service
Batch Analytics
HBase Hadoop
Web App
HBase
Twitter Spout
Sentiment
Indexer
Broadcaster
Counter
Writer
SignalR
Storm
HBase: The Definitive Guide
Online HBase Book
https://github.com/hdinsight/hbase-sdk-for-net
https://github.com/maxluk/tweet-sentiment
Get started using HBase in HDInsight
Tutorial: Building Tweet Sentiment App
SUMMARY
See a deployment summary
VIRTUAL NETWORK
Configure your virtual network
HDINSIGHT CLUSTER
Configure your cluster
{ISV App Name}
1 BASICS
Configure your app
1. Basics
Windows Server 2012 R2 Datacenter
Hadoop
Create
Basics
New HDInsight Cluster
Hadoop
mycluster001
HDInsight_Telemetry
myresourcegroup
Linux (Ubuntu 12.04 LTS)
New Virtual Network
Summary
Cluster name: newcluster001
Cluster type: Storm
Cluster operating system: Linux (Ubuntu 12.04 LTS)
Cluster data source: (new) storage001 (Azure Storage)
Head nodes: 2 nodes (D12)
Worker nodes: 4 nodes (D14)
Zookeeper nodes: 3 nodes (D12)
Metastores selected: Hive: yes, Oozie: no
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Part 1

  • 5. Data complexity: variety and velocity Petabytes
  • 7. Massive Compute and Storage Deployment expertise Data of all Volume Variety, Velocity Speed Scale Economics Always Up, Always On Open and flexible Time to value
  • 9. HDInsight Script SQL NoSQL StreamingBatch Map reduce In Memory Core Engine
  • 10. • Microsoft’s cloud Hadoop offering • 100% open source Apache Hadoop • Built on the latest releases for Hadoop • Up and running in minutes with no hardware to deploy • .NET and Java skills and deep integration to Visual Studio • Utilize familiar BI tools for analysis including Microsoft Excel • 99.9% Enterprise Service Level Agreement HDInsight Script SQL NoSQL StreamingBatch Map reduce In Memory Core Engine
  • 11. HDInsight Script SQL NoSQL StreamingBatch Map reduce In Memory Core Engine Microsoft contribution to Apache code
  • 12. Data Node Data Node Data Node Data Node Task Tracker Task Tracker Task Tracker Task Tracker Name Node Job Tracker HMaster Coordination Region Server Region Server Region Server Region Server • Random, fast (realtime) read/write access to your Big Data. • Host very large tables (billions of rows X millions of columns) on clusters of commodity hardware. • Runs on top of the Hadoop Distributed File System (HDFS) • Provides flexibility in that new columns can be added to column families at any time HDInsight Script SQL NoSQL StreamingBatch Map reduce In Memory Core Engine
  • 13. Stream processin g Search and query Data analytics (Excel) Web/thick client dashboards Devices to take action RabbitMQ / ActiveMQ HDInsight Script SQL NoSQL StreamingBatch Map reduce In Memory Core Engine
  • 14. • Single execution model for multiple tasks (SQL queries, Streaming, Machine Learning, and Graph) • Processing up to 100x faster performance • Developer friendly (Java, Python, Scala) • BI tool of choice (Power BI, Tabelau, Qlik, SAP) • Notebook experience (Jupyter/iPython, Zeppelin) HDInsight Script SQL NoSQL StreamingBatch Map reduce In Memory Core Engine Spark SQL Spark Streaming Machine Learning Graph HDInsight Script SQL NoSQL StreamingBatch Map reduce In Memory Core Engine
  • 15. Spark for Azure HDInsight In-memory computation engine – Fully managed
  • 17. • Managed & supported by Microsoft • Familiarity of Windows • Re-use common tools, documentation, samples from Hadoop/Linux ecosystem • Add Hadoop projects that were authored on Linux to HDInsight • Easier transition from on-premise to cloud
  • 20. Partner Spotlight: AtScale Analysts Use Traditional BI Tools Against HDInsight
  • 21. • HDFS For the Cloud • Unlimited Storage, Petabyte Files • Optimized for Massive Throughput • High frequency, low latency, read immediately • Managed and secured
  • 28. Neudesic partnered with one of the nation's largest utility companies that recently deployed Smart Utility Meters for power customers, nearly a million meters sending usage data every 15 minutes. The result: an Azure hybrid big data processing solution that enabled the customer to perform gap analytics: a process for identifying gaps that exist in the power usage readings, over 7x faster than their previous solution! Billions of Smart Meter reads get processed to identify the nature and duration of the gaps to mitigate revenue losses. Smart Meters Business Rules Processing BI Layer Blob Storage HDInsightInput Processed Output data ELT Local SQL DB for Customer and other confidential data Extract processed data from blob storage AZCopy AZCopy SSIS Input files
  • 30. Big Data in Retail • Clickstream analytics • Online recommendation engine • 360° view of the customer • Analyze brand sentiment • Localized, personalized promotions • Optimal store layout Leading computer manufacturer in world • Use clickstream to deliver custom website ecommerce experience • Targeted ads for abandoned carts • Use unstructured data from website and social for data mining • Combine w/sales data for 360 view • Gather data from table-side devices at restaurants • Predict promotions/offers and content to upsell to guests • Gather social media sentiment from customer feedback • Combined with POS data, can determine right product mix Leading Multi-national Retailer • Track weather information (temperature/forecast) to predict shelf space for different seasons • Sentiment analysis on feedback Leading clothing online retailer • Use clickstream to understand who is viewing their site • Building recommendation engine based on users’ clickpaths
  • 31. Ziosk turned to Microsoft gold partner, Artis Consulting to deploy a hybrid deployment consisting of the Analytics Platform System, Azure HDInsight, Power BI, and Azure Machine Learning “Until now, we haven’t had the ability to optimize the guest experience based on their specific interactions with the devices. With Azure, we can close the loop.” Kevin Mowry Ziosk Chief Software Architect
  • 32. Big Data in Health • Predictive Analysis of Patient Health & Clinical Decision Support • Population, risk, and Care management • Real-time quality measures to assist providers w/regulatory requirements • Medical research data (eg. genomics) • Recruit cohorts for pharmaceutical trials • Process large volumes of data from any healthcare provider EHR system • Assist in showing compliance • Store 7-30 years of data to meet audit requirements • Scan handwritten notes and do natural language processing • Analyze if symptoms might map to bigger outbreak • Collect clinical trial data (from automated equipment, sensors) • Find patterns on this data (chemical compositions, enzymes) • Process 6 years worth of data in a few hours without any infrastructure
  • 33. Big Data in Financial Services • New account risk screens • Fraud prevention • Trading risk • Maximize deposit spread • Insurance underwriting • Accelerate loan processing • Actively monitor currencies used by UK manufacturers in the supply chain to do risk analysis • Monitor UK GDP to help customers stay on top of economic trends • Needed to handle increasing amounts of finance, compliance, and legal data from trading operations • Trading data drives strategic decisions • Track customer feedback on social media and on their blog posts/website to understand loyalty • Predict at-risk clients to reach out to • Process data for actuaries to analyze results to understand risks for insurance companies • Milliman’s application understands relationships between people, process, and technology to manage risk
  • 34. Tangerine partners with Microsoft to build a solution with Analytics Platform System for the data warehouse and uses PolyBase to query Azure HDInsight in the cloud. “With pre-built integration using PolyBase to query both the relational data warehouse and Hadoop in the cloud, the solution will allow us to reap the benefits of both relational and non-relational data regardless of where it lives.”
  • 47. Diagram labels: Hadoop • Head node • Name node • Data nodes/task nodes • Hive • Metastore • Thrift server • Compiler, Optimizer, Executor • JDBC • ODBC • Query Console • Command line interface (CLI) • Visual Studio
  • 50. Chart: TPCH1 (1 TB data), latency in minutes (axis up to 100) • Series: Scale up/Scale out, Tez, Partitioning, ORCFile, Vectorization
  • 52. Transformation • Collection • Presentation and action • Event Queuing System • Long-term storage • Search and query • Data analytics (Excel) • Web/thick client dashboards • Devices to take action • Event hub • Event producers: Applications, Web and social, Devices, Sensors • Live Dashboards • Apache HBase on HDInsight • DocumentDB • Solr • Azure Search • MongoDB • SQL • Cloud gateways (web APIs) • Field gateways • Kafka/RabbitMQ/ActiveMQ • Event hubs • Azure ML • Storage adapters • Stream processing: Storm (HDInsight), Stream Analytics
  • 54. A distributed real-time processing platform • Easy to program • Fault tolerant: failure is expected, and embraced • Fast: clocked at 1M+ messages per second per node • Scalable: thousands of workers per cluster • Reliable: guaranteed message delivery, exactly-once semantics • Streaming data analysis
  • 56. Tuple: core unit of data; an immutable set of key/value pairs • Stream: unbounded sequence of tuples • Spout: source of streams; wraps a streaming data source and emits tuples
  • 57. Compute: core functions of a streaming computation • Receive tuples and do stuff • Read from a data store • Write to a data store • Perform arbitrary computation • (Optionally) emit additional streams
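The tuple, stream, spout, and compute concepts on slides 56 and 57 can be illustrated with a toy pipeline. This is a minimal sketch in plain Python, not the Storm API: the names `make_tuple`, `sentence_spout`, `split_bolt`, and `count_bolt` are hypothetical, generators stand in for streams, and a short list stands in for an unbounded source.

```python
from collections import Counter
from types import MappingProxyType
from typing import Iterable, Iterator, Mapping

# Hypothetical names throughout; this models the concepts, not the Storm API.
StormTuple = Mapping[str, str]


def make_tuple(**fields: str) -> StormTuple:
    """Build a read-only set of key/value pairs (the core unit of data)."""
    return MappingProxyType(dict(fields))


def sentence_spout(source: Iterable[str]) -> Iterator[StormTuple]:
    """Spout: wrap a data source and emit one tuple per record."""
    for line in source:
        yield make_tuple(sentence=line)


def split_bolt(stream: Iterable[StormTuple]) -> Iterator[StormTuple]:
    """Compute step: receive tuples and emit an additional stream of words."""
    for tup in stream:
        for word in tup["sentence"].split():
            yield make_tuple(word=word.lower())


def count_bolt(stream: Iterable[StormTuple]) -> dict:
    """Compute step: receive tuples and aggregate them into word counts."""
    return dict(Counter(tup["word"] for tup in stream))


# A finite list stands in for an unbounded streaming source.
lines = ["Storm emits tuples", "bolts receive tuples"]
word_counts = count_bolt(split_bolt(sentence_spout(lines)))
print(word_counts)
```

Chaining generators keeps each step independent of the others, which loosely mirrors how a topology wires spouts to compute steps; a real deployment would distribute these steps across workers rather than run them in one process.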
  • 62. Managed services • Open source platform • Scale-up and scale-down • Event Hub • Visual Studio • Azure HBase, SQL Database, DocumentDB • Speed: analyse millions of messages per second
  • 63. Support for authoring Storm topologies • Create Storm projects from available templates • Submit a topology with C# bolts/spouts • Submit topologies containing Java spouts/bolts • Monitor topologies within Visual Studio • Troubleshoot topologies • In essence, you never need to leave Visual Studio for Storm projects
  • 64. Technology comparison
      Feature | Storm on HDInsight | Azure Stream Analytics
      Management & Operations
      Service | Managed cluster | Managed service
      Price | Link to the pricing page. | Link to the pricing page.
      Microsoft supported | Yes | Yes
      Open source | Yes | No
      Development Experience
      SQL DSL | No | Yes
      Extensible | Yes | No
      Temporal operators | No; customers write custom code | Yes
      Authoring/Debugging | Tools via Visual Studio | Interactive authoring and debugging via the Azure portal
      Input/Output
      Data ingress | No restriction | Event Hub, Azure Blobs
      Data egress | No restriction | Supports writing data to Event Hubs, Blob store, Azure tables, Azure SQL DB, Power BI
      Supports multiple inputs | Yes | Yes
      Generates multiple outputs | Yes | Yes
      Data format | No restriction | Avro, JSON, CSV
      Performance
      Scalability | Yes | Yes
      Elastic scale | Yes | Yes
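The comparison notes that Storm has no temporal operators, so customers write that logic themselves. As a rough illustration of what such custom code involves, here is a minimal sketch of a tumbling-window count in plain Python. The function name `tumbling_window_counts` and the `(timestamp, payload)` event shape are assumptions made for this example; they are not part of Storm or Stream Analytics.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (timestamp in seconds, payload).
Event = Tuple[int, str]


def tumbling_window_counts(events: Iterable[Event], window_seconds: int) -> Dict[int, int]:
    """Count events per fixed, non-overlapping window, keyed by window start time."""
    counts: Dict[int, int] = defaultdict(int)
    for timestamp, _payload in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(sorted(counts.items()))


events = [(0, "a"), (4, "b"), (5, "c"), (9, "d"), (12, "e")]
window_counts = tumbling_window_counts(events, window_seconds=5)
print(window_counts)
```

The sketch ignores late or out-of-order events and holds every window in memory; hand-written streaming code would also need to handle those cases and emit results as each window closes.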
  • 71. HBase: The Definitive Guide • Online HBase Book • https://github.com/hdinsight/hbase-sdk-for-net • https://github.com/maxluk/tweet-sentiment • Get started using HBase in HDInsight • Tutorial: Building Tweet Sentiment App
  • 75. {ISV App Name} wizard steps • 1. Basics: configure your app • HDInsight cluster: configure your cluster • Virtual network: configure your virtual network • Summary: see a deployment summary • Current step: Basics • Windows Server 2012 R2 Datacenter • Hadoop • Create
  • 76. New HDInsight Cluster • Hadoop • mycluster001 • HDInsight_Telemetry • myresourcegroup • Linux (Ubuntu 12.04 LTS)
  • 77. New Virtual Network
  • 78. Summary • Cluster name: newcluster001 • Cluster type: Storm • Cluster operating system: Linux (Ubuntu 12.04 LTS) • Cluster data source: (new) storage001 (Azure Storage) • Head nodes: 2 nodes (D12) • Worker nodes: 4 nodes (D14) • Zookeeper nodes: 3 nodes (D12) • Metastores selected: Hive: yes, Oozie: no