Nilesh Gule
@nileshgule | www.HandsOnArchitect.com
Modern Data Warehouse Using Azure
$whoami
{
  "name": "Nilesh Gule",
  "website": "https://www.HandsOnArchitect.com",
  "github": "https://github.com/NileshGule",
  "twitter": "@nileshgule",
  "linkedin": "https://www.linkedin.com/in/nileshgule",
  "email": "nileshgule@gmail.com",
  "likes": "Technical Evangelism, Cricket",
  "co-organizer": "Azure Singapore UG"
}
Part 3 - Modern Data Warehouse with Azure Synapse
Credits: James Serra
Part 1 - Recap – ADLS & ADF
ADLS - Main features
• Petabyte scale storage
• Hierarchical namespace
• Hadoop compatible access with ABFS driver
ADF - Main features
• Cloud ETL service
• Scale-out serverless data integration & data transformation
• Code-free UI
• Monitoring & Management
Part 2 - Recap
Azure Databricks - Main features
• Collaborative Spark based Analytical service
• Different cluster types (automated / interactive / pool)
• Autoscale based on workloads
• Fine grained access controls
Azure Synapse
Limitless analytics service for enterprise data warehousing and Big Data analytics
Parallelism
MPP - Massively Parallel Processing
• Uses many separate CPUs running in parallel to execute a single program
• Shared Nothing: Each CPU has its own memory and disk (scale-out)
• Segments communicate using high-speed network between nodes
SMP - Symmetric Multiprocessing
• Multiple CPUs used to complete individual processes simultaneously
• All CPUs share the same memory, disks, and network controllers (scale-up)
• All SQL Server implementations up until now have been SMP
• Mostly, the solution is housed on a shared SAN
Synapse Architecture
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/massively-parallel-processing-mpp-architecture
Components
• Control Node
• Compute Node
• Data Movement Service (DMS)
Distributions
• Hash
• Round Robin
• Replicate
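One way to see these components from inside a dedicated SQL pool is to query its system views. The sketch below is illustrative and not from the slides; it assumes the connected login may read `sys.dm_pdw_nodes` and `sys.pdw_distributions`, and the results depend on the pool's current service level.

```sql
-- Lists the nodes behind the pool; the type column distinguishes
-- the control node from the compute nodes.
SELECT  pdw_node_id, type, name
FROM    sys.dm_pdw_nodes
ORDER BY pdw_node_id;

-- Counts how many distributions are currently attached to each compute node.
SELECT  pdw_node_id, COUNT(*) AS distribution_count
FROM    sys.pdw_distributions
GROUP BY pdw_node_id
ORDER BY pdw_node_id;
```

Running the second query before and after a scale operation shows the distributions being remapped across compute nodes.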
Synapse Data Distributions
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/massively-parallel-processing-mpp-architecture
Hash
• Highest query perf for joins & aggregations on large tables
• Rows per distribution varies
Replicated
• Fastest query performance for small tables
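As a sketch of how the three distribution options are declared, the statements below create one table of each kind in a dedicated SQL pool. All table and column names (`dbo.FactSales`, `dbo.StageSales`, `dbo.DimRegion`, `CustomerId`, and so on) are hypothetical and do not come from the slides; the index choices are assumptions as well.

```sql
-- Hypothetical large fact table: each row is assigned to a distribution by
-- hashing CustomerId, which can reduce data movement for joins and
-- aggregations on that column.
CREATE TABLE dbo.FactSales
(
    SaleId      BIGINT        NOT NULL,
    CustomerId  INT           NOT NULL,
    RegionId    INT           NOT NULL,
    Amount      DECIMAL(18,2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerId),
    CLUSTERED COLUMNSTORE INDEX
);

-- Hypothetical staging table: rows are spread evenly, with no distribution key.
CREATE TABLE dbo.StageSales
(
    SaleId      BIGINT        NOT NULL,
    CustomerId  INT           NOT NULL,
    RegionId    INT           NOT NULL,
    Amount      DECIMAL(18,2) NOT NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    HEAP
);

-- Hypothetical small dimension table: a full copy is kept on each compute node.
CREATE TABLE dbo.DimRegion
(
    RegionId    INT          NOT NULL,
    RegionName  NVARCHAR(50) NOT NULL
)
WITH
(
    DISTRIBUTION = REPLICATE,
    CLUSTERED INDEX (RegionId)
);

-- Reports rows and space per distribution, which helps spot skew
-- in a hash-distributed table.
DBCC PDW_SHOWSPACEUSED('dbo.FactSales');
```

Because rows per distribution vary with hash distribution, a key with many distinct, evenly spread values is usually the safer choice for the hash column.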
ALTER DATABASE ContosoDW MODIFY
(service_objective = 'DW1000');
DWU
DW100, DW200, DW300, DW400, DW500, DW1000, DW1500, DW2000, DW2500, DW3000, DW5000, DW6000, DW7500, DW10000, DW15000, DW30000
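The ALTER DATABASE statement on this slide changes the service objective. A minimal sketch of checking the level before and after scaling follows. It assumes the queries run against the master database of the logical server and reuses the database name ContosoDW from the slide; neither query comes from the slides.

```sql
-- Shows the current service objective of each database on the server.
SELECT  db.name AS database_name,
        ds.edition,
        ds.service_objective
FROM    sys.database_service_objectives AS ds
JOIN    sys.databases AS db
        ON ds.database_id = db.database_id;

-- Shows the progress of recent operations, such as a scale request,
-- for the database named on the slide.
SELECT  operation, state_desc, percent_complete, start_time, last_modify_time
FROM    sys.dm_operation_status
WHERE   resource_type_desc = 'Database'
  AND   major_resource_id  = 'ContosoDW'
ORDER BY start_time DESC;
```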
[Figure: Azure SQL Data Warehouse with the Engine and a single compute node (Worker1); all 60 distributions (D1 to D60) in Azure Storage Blob(s) are attached to Worker1.]
[Figure: Azure SQL Data Warehouse with the Engine and six compute nodes (Worker1 to Worker6); the same 60 distributions (D1 to D60) in Azure Storage Blob(s) are spread across the workers.]
Azure Databricks – SQL DW Connectivity
External Data Sources
• External Data Source
  • Hadoop, ADLS
• External File Format
  • File types
    • Delimited Text, Hive RCFile, Hive ORC file, Parquet
  • Data Compression
    • Gzip, Snappy
  • Field Delimiters
  • Date Format
• External Table
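The three objects above are typically created in that order. The sketch below shows one possible chain for Parquet and delimited text files in ADLS. Every object name, the storage account, the container, and the folder path are hypothetical, and it assumes a database scoped credential named `AdlsCredential` already exists.

```sql
-- External data source pointing at a hypothetical ADLS Gen2 container.
CREATE EXTERNAL DATA SOURCE AdlsSource
WITH
(
    TYPE       = HADOOP,
    LOCATION   = 'abfss://data@examplestorage.dfs.core.windows.net',
    CREDENTIAL = AdlsCredential
);

-- File format: Parquet files compressed with Snappy.
CREATE EXTERNAL FILE FORMAT ParquetSnappy
WITH
(
    FORMAT_TYPE      = PARQUET,
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);

-- File format: pipe-delimited text compressed with Gzip,
-- with an explicit field delimiter and date format.
CREATE EXTERNAL FILE FORMAT PipeDelimitedGzip
WITH
(
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS
    (
        FIELD_TERMINATOR = '|',
        DATE_FORMAT      = 'yyyy-MM-dd'
    ),
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec'
);

-- External table: exposes the files under /sales/ as a queryable table.
CREATE EXTERNAL TABLE dbo.ExtSales
(
    SaleId      BIGINT        NOT NULL,
    CustomerId  INT           NOT NULL,
    Amount      DECIMAL(18,2) NOT NULL
)
WITH
(
    LOCATION    = '/sales/',
    DATA_SOURCE = AdlsSource,
    FILE_FORMAT = ParquetSnappy
);
```

The external table stores no data in the pool itself; queries against it read the files at the given location.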
What workloads are NOT suitable?
Operational workloads (OLTP)
• High frequency reads and writes.
• Large numbers of singleton selects.
• High volumes of single row inserts.
Data Preparations
• Row by row processing needs.
• Incompatible formats (XML).
What Workloads are Suitable?
Analytics
• Store large volumes of data.
• Consolidate disparate data into a single location.
• Shape, model, transform and aggregate data.
• Batch/Micro-batch loads.
• Perform query analysis across large datasets.
• Ad-hoc reporting across large data volumes.
• All using simple SQL constructs.
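To illustrate the batch-oriented pattern, the sketch below loads a table in one set-based statement with CREATE TABLE AS SELECT (CTAS) instead of many single-row inserts, then runs an aggregate over it. The table names are hypothetical, and it assumes an external table `dbo.ExtSales` with these columns already exists.

```sql
-- Batch load: one set-based statement that reads the external files
-- and writes a hash-distributed columnstore table.
CREATE TABLE dbo.FactSalesLoaded
WITH
(
    DISTRIBUTION = HASH(CustomerId),
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT  SaleId, CustomerId, Amount
FROM    dbo.ExtSales;

-- Analytical query: aggregates across the whole table in parallel.
SELECT  CustomerId,
        COUNT(*)    AS sale_count,
        SUM(Amount) AS total_amount
FROM    dbo.FactSalesLoaded
GROUP BY CustomerId
ORDER BY total_amount DESC;
```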
Summary
Azure Synapse - Main features
• MPP Architecture
• Can be paused
• Optimized for analytics workloads
• Supports multiple external file formats
• Works with PolyBase
SQL Server & SQL Data Warehouse Differences
Azure Synapse
Workload Management
External Data Source
External File Formats
External Table
SQL Data Warehouse Benchmark
References – MS Learn
https://docs.microsoft.com/en-us/learn/paths/implement-sql-data-warehouse
Thank you very much
Code with Passion and Strive for Excellence
https://www.slideshare.net/nileshgule/presentations
https://speakerdeck.com/nileshgule/
Nilesh Gule
ARCHITECT | MICROSOFT MVP
“Code with Passion and Strive for Excellence”
nileshgule @nileshgule Nilesh Gule
NileshGule
www.handsonarchitect.com
Q&A
