Big Data with Azure
What is Big Data?
Big Data = All Data!
• Unstructured: audio, video, images. Meaningless without adding some structure
• Semi-structured: JSON, XML, sensor data, social media, device data, web logs. Flexible data model structure
• Structured: CSV, columnar storage (Parquet, ORC). Strict data model structure
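To make the three categories concrete, here is a minimal Python sketch that sorts incoming files into them. It assumes the file extension is a good-enough proxy for the category (real pipelines usually inspect content or schemas), and the extension mapping in `CATEGORIES` is a hypothetical illustration, not an Azure feature.

```python
from pathlib import PurePath

# Hypothetical extension-to-category mapping, loosely following the slide's examples.
CATEGORIES = {
    "unstructured": {".mp3", ".wav", ".mp4", ".avi", ".jpg", ".png"},
    "semi-structured": {".json", ".xml", ".log"},
    "structured": {".csv", ".parquet", ".orc"},
}


def classify(filename: str) -> str:
    """Return the data category for a file name, or 'unknown' if unmapped."""
    ext = PurePath(filename).suffix.lower()  # e.g. "call.WAV" -> ".wav"
    for category, extensions in CATEGORIES.items():
        if ext in extensions:
            return category
    return "unknown"


if __name__ == "__main__":
    for name in ["call.wav", "events.json", "sales.parquet", "blob.bin"]:
        print(name, "->", classify(name))
```

Routing on category like this is a common first step: unstructured files go to object storage, while semi-structured and structured files can be parsed into tables.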
Why is Processing Big Data Challenging?
• Variety: data can be structured, semi-structured or unstructured
• Velocity: data can arrive as a stream, in near real time or in batches
• Volume: a dataset can be 1 GB or 1 PB
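The volume point is easier to feel with a little arithmetic. The sketch below counts how many fixed-size blocks each dataset needs, assuming a 128 MiB block size (the HDFS default in recent Hadoop versions) and binary units (1 GiB = 2^30 bytes, 1 PiB = 2^50 bytes).

```python
# Binary units: 1 GiB = 2**30 bytes, 1 PiB = 2**50 bytes.
GIB = 2**30
PIB = 2**50
BLOCK_SIZE = 128 * 2**20  # assumed 128 MiB block size


def blocks_needed(total_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of fixed-size blocks needed to hold total_bytes (ceiling division)."""
    return -(-total_bytes // block_size)


if __name__ == "__main__":
    print(blocks_needed(GIB))  # 8
    print(blocks_needed(PIB))  # 8388608
```

Eight blocks can be processed comfortably on one machine; more than eight million cannot, which is why petabyte-scale workloads need distributed storage and parallel processing.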
Trusted, Productive, Intelligent, Hybrid
Azure. Cloud for all.
More than 80% of Fortune 500 companies use the Microsoft Cloud
Azure Big Data Processing Pipeline: Ingest
Azure Event Hubs
Azure Data Factory
Compose, orchestrate & monitor data services at scale
• Fully managed service
• Any data, on-premises or in the cloud
• Single-pane-of-glass management
• Global service infrastructure
• Cost-effective
[Diagram: Data Factory orchestrating Stored Procedures, Hadoop on Azure, Data Lake Analytics, Custom Code and Machine Learning, producing trusted data for BI & analytics]
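"Orchestrate" largely means running each step only after the steps it depends on have finished. Azure Data Factory defines pipelines and activity dependencies itself; the Python sketch below only illustrates that dependency ordering, using the standard library's `graphlib`. The activity names in `PIPELINE` are hypothetical stand-ins for the kinds of activities on the slide.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each activity maps to the set of activities it depends on.
PIPELINE = {
    "copy_raw_data": set(),
    "run_hadoop_job": {"copy_raw_data"},
    "run_data_lake_analytics": {"copy_raw_data"},
    "score_with_ml": {"run_hadoop_job", "run_data_lake_analytics"},
    "load_for_bi": {"score_with_ml"},
}


def execution_order(pipeline: dict[str, set[str]]) -> list[str]:
    """Return one order in which every activity runs after its dependencies."""
    return list(TopologicalSorter(pipeline).static_order())


def parallel_stages(pipeline: dict[str, set[str]]) -> list[list[str]]:
    """Group activities into stages; activities within a stage can run in parallel."""
    sorter = TopologicalSorter(pipeline)
    sorter.prepare()  # raises graphlib.CycleError if dependencies form a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        stages.append(ready)
        sorter.done(*ready)  # mark the whole stage finished
    return stages


if __name__ == "__main__":
    for stage in parallel_stages(PIPELINE):
        print(stage)
```

Grouping into stages shows why an orchestrator can run the Hadoop and Data Lake Analytics steps side by side: neither depends on the other.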
Azure Big Data Processing Pipeline: Store
AZURE BLOB STORAGE
• Highly scalable object storage for unstructured data
  - Serverless Azure service
  - Can store billions of images, videos, audio files, documents, etc.
  - Automatically scales as more data is uploaded
  - Replication options include LRS (locally redundant), ZRS (zone-redundant), GRS (geo-redundant) and RA-GRS (read-access geo-redundant storage)
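The four replication options differ in how many copies exist and where they live. The lookup table below is a sketch that encodes only the options named on the slide: LRS keeps three copies in one datacenter, ZRS keeps three copies across availability zones in the primary region, GRS adds asynchronous replication to a secondary region (six copies in total), and RA-GRS is GRS plus read access to the secondary. The `Redundancy` class and `options_with` helper are hypothetical illustrations, not part of any Azure SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Redundancy:
    copies: int               # total copies Azure Storage maintains
    spans_zones: bool         # copies spread across availability zones in the primary region
    geo_redundant: bool       # copies also kept in a secondary region
    secondary_readable: bool  # secondary copy readable without a failover


REDUNDANCY = {
    "LRS": Redundancy(copies=3, spans_zones=False, geo_redundant=False, secondary_readable=False),
    "ZRS": Redundancy(copies=3, spans_zones=True, geo_redundant=False, secondary_readable=False),
    "GRS": Redundancy(copies=6, spans_zones=False, geo_redundant=True, secondary_readable=False),
    "RA-GRS": Redundancy(copies=6, spans_zones=False, geo_redundant=True, secondary_readable=True),
}


def options_with(**required: bool) -> list[str]:
    """Return the option names whose flags match every required value."""
    return [
        name
        for name, r in REDUNDANCY.items()
        if all(getattr(r, flag) == value for flag, value in required.items())
    ]


if __name__ == "__main__":
    print(options_with(geo_redundant=True))
```

For example, a workload that must keep serving reads during a regional outage narrows the list to RA-GRS.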
AZURE DATA LAKE STORE
• A highly scalable, parallel file system in the cloud, optimized for big data analytics
  - No limits on data types, number of files, size of individual files, total amount of data stored, how long data can be stored, or ingestion throughput
  - Supports low-latency, high-throughput workloads, so it can be used to ingest streaming data
  - Hadoop-compatible (via the WebHDFS REST API); supported by leading Hadoop distributions and HDInsight
[Diagram: an Azure Data Lake Store file is split into blocks (Block 1, Block 2, … Block n), which are stored as shards across many data nodes in Azure backend storage]
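The diagram's idea (one file split into blocks spread over many data nodes) is what lets many workers read a single large file in parallel. The sketch below is a toy model under stated assumptions: the tiny block size, the node names and the round-robin placement policy are all hypothetical, since Data Lake Store's internal block size and placement are not described on the slide.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a file's bytes into consecutive fixed-size blocks (the last may be shorter)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: list[str], replicas: int = 1) -> dict[str, list[int]]:
    """Assign each block index to `replicas` distinct nodes, round-robin."""
    if not 1 <= replicas <= len(nodes):
        raise ValueError("replicas must be between 1 and the number of nodes")
    placement: dict[str, list[int]] = {node: [] for node in nodes}
    for block in range(num_blocks):
        for r in range(replicas):
            # Consecutive replicas land on consecutive nodes, so they never collide.
            placement[nodes[(block + r) % len(nodes)]].append(block)
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(b"0123456789", block_size=4)
    print(blocks)
    print(place_blocks(len(blocks), ["node-1", "node-2", "node-3"], replicas=2))
```

Because every block lives on a different mix of nodes, a reader can fetch Block 1, Block 2, … Block n concurrently instead of streaming the file from one machine.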
Azure Big Data Processing Pipeline: Process
Azure Databricks
• Inputs: cloud storage, data warehouses, Hadoop storage, IoT / streaming data, REST APIs
• Collaborative workspace (enhance productivity): data engineers, data scientists and business analysts
• Deploy production jobs & workflows: multi-stage pipelines, job scheduler, notifications & logs
• Optimized Databricks Runtime engine: Apache Spark, Databricks I/O, serverless
• Outputs: machine learning models, BI tools, data exports, data warehouses
• Built on a secure & trusted cloud; scales without limits
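Spark, the engine at the core of Azure Databricks, runs multi-stage pipelines: a map stage emits key/value pairs, a shuffle groups them by key across the cluster, and a reduce stage combines each group. The plain-Python sketch below imitates that model on a single machine for a word count; in PySpark the same job is roughly `sc.parallelize(lines).flatMap(str.split).map(lambda w: (w.lower(), 1)).reduceByKey(add)`. This is an illustration of the programming model, not how Spark executes internally.

```python
from collections import defaultdict
from functools import reduce
from operator import add


def word_count(lines: list[str]) -> dict[str, int]:
    # Map stage: emit (word, 1) pairs, like flatMap + map in Spark.
    pairs = [(word.lower(), 1) for line in lines for word in line.split()]

    # Shuffle: group values by key. Spark moves data between nodes here,
    # which is what separates one stage from the next.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)

    # Reduce stage: combine each key's values, like reduceByKey(add).
    return {key: reduce(add, values) for key, values in groups.items()}


if __name__ == "__main__":
    print(word_count(["Spark on Azure", "spark jobs"]))
    # {'spark': 2, 'on': 1, 'azure': 1, 'jobs': 1}
```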
AZURE DATABRICKS NOTEBOOKS OVERVIEW
• Notebooks are a popular way to develop and run Spark applications
  - Notebooks are not only for authoring Spark applications; they can also be run directly on clusters (Shift+Enter runs the current cell)
  - Notebooks support fine-grained permissions, so they can be securely shared with colleagues for collaboration
  - Notebooks are well suited for prototyping, rapid development, exploration, discovery and iterative development
• Notebooks typically consist of code, data, visualizations, comments and notes
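Fine-grained notebook permissions work as an ordered ladder: each level includes everything the levels below it allow. The level names below follow Databricks' notebook access-control levels (Can Read, Can Run, Can Edit, Can Manage); the action-to-level mapping in `REQUIRED` is a simplified assumption for illustration, so check the current Databricks documentation for the exact abilities of each level.

```python
from enum import IntEnum


class NotebookPermission(IntEnum):
    """Notebook permission levels, ordered so that a higher level includes lower ones."""
    NO_PERMISSIONS = 0
    CAN_READ = 1
    CAN_RUN = 2
    CAN_EDIT = 3
    CAN_MANAGE = 4


# Assumed minimum level needed for each action (simplified, illustrative mapping).
REQUIRED = {
    "view": NotebookPermission.CAN_READ,
    "comment": NotebookPermission.CAN_READ,
    "run": NotebookPermission.CAN_RUN,
    "edit": NotebookPermission.CAN_EDIT,
    "change_permissions": NotebookPermission.CAN_MANAGE,
}


def allowed(level: NotebookPermission, action: str) -> bool:
    """True if a user holding `level` may perform `action`."""
    return level >= REQUIRED[action]


if __name__ == "__main__":
    analyst = NotebookPermission.CAN_RUN
    for action in REQUIRED:
        print(action, allowed(analyst, action))
```

An ordered enum keeps the check to a single comparison, which mirrors how sharing a notebook "read-only" or "run-only" with a colleague is meant to work.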
Big Data Processing Pipeline: Azure Machine Learning
Azure Cosmos DB
A globally distributed, massively scalable, multi-model database service
• Data models: document, column-family, key-value, graph
• APIs: SQL, MongoDB, Table API
• Turnkey global distribution
• Elastic scale-out of storage & throughput
• Guaranteed low latency at the 99th percentile
• Five well-defined consistency models (strong, bounded staleness, session, consistent prefix, eventual)
• Comprehensive SLAs
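The five consistency models form a spectrum from strongest to weakest. Bounded staleness is the easiest to state precisely: reads may lag behind writes by at most K versions or T time. The sketch below ranks the levels and checks a single read against such a window; `max_versions` and `max_seconds` stand in for K and T, and the function is a hypothetical illustration, not a Cosmos DB SDK call.

```python
from enum import IntEnum


class Consistency(IntEnum):
    """Cosmos DB consistency levels, numbered from weakest (0) to strongest (4)."""
    EVENTUAL = 0
    CONSISTENT_PREFIX = 1
    SESSION = 2
    BOUNDED_STALENESS = 3
    STRONG = 4


def within_bounded_staleness(latest_version: int, read_version: int,
                             lag_seconds: float, max_versions: int,
                             max_seconds: float) -> bool:
    """Check one read against a staleness window of K versions (max_versions)
    and T seconds (max_seconds); the read must stay inside both limits."""
    if read_version > latest_version:
        raise ValueError("a read cannot be newer than the latest write")
    return (latest_version - read_version) <= max_versions and lag_seconds <= max_seconds


if __name__ == "__main__":
    print([level.name for level in sorted(Consistency, reverse=True)])
    print(within_bounded_staleness(100, 95, lag_seconds=2.0, max_versions=10, max_seconds=5.0))
```

Stronger levels give simpler reasoning about reads at the cost of latency and availability; weaker levels trade some of that predictability for speed.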
NoSQL Decision Tree
Azure Data Explorer (Kusto), developed in Israel
Azure Data Explorer
• Perform near real-time queries on terabytes of data
• A lightning-fast indexing and querying service for complex analytics
• Quickly identify trends, patterns or anomalies across all data types: structured, semi-structured and unstructured
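As a small illustration of what "identify anomalies" means, the sketch below flags values that sit far from the mean of a series. It uses a simple z-score, a swapped-in technique chosen for brevity; Azure Data Explorer's query language has its own built-in time-series anomaly functions, and this is not their algorithm. The default `threshold` of 3 standard deviations is an assumption.

```python
from statistics import mean, pstdev


def zscore_anomalies(values: list[float], threshold: float = 3.0) -> list[int]:
    """Return indices of values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # a perfectly flat series has no outliers
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


if __name__ == "__main__":
    requests_per_minute = [10] * 20 + [100]  # hypothetical metric with one spike
    print(zscore_anomalies(requests_per_minute))  # [20]
```

A global z-score ignores trend and seasonality, which is exactly why production time-series tools decompose the series first before looking for outliers.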
Big Data Processing Pipeline: Visualize
Azure Machine Learning
DEMO
Author: Aaron (Ari) Bornstein