Azure Data Lake:
What is it? Why is it?
Where is it?
EUGENE POLONICHKO
DATA PLATFORM MVP
BIDWH ARCHITECT
About me
Eugene Polonichko has over 7 years of experience
with SQL Server. He has mainly focused on BI projects
(SSAS, SSIS, Power BI, Cognos, Informatica
PowerCenter, Pentaho, Tableau). Eugene is a
passionate speaker and SQL community volunteer
who presents regularly at PASS SQL Saturday events
and local user groups around Ukraine and Europe.
Eugene is a PASS Chapter Leader and holds
Microsoft Data Platform MVP status.
https://www.linkedin.com/in/eugenepolonichko/
https://twitter.com/EvgenPolonichko
Agenda
• What is a Data Lake?
• Architecture of Azure Data Lake
• Azure Data Lake Store
    • Overview of Azure Data Lake Store
    • Compare
    • For big data processing
• Azure Data Lake Analytics
• U-SQL
    • Concepts
    • U-SQL Script Structure
    • Extractors
    • U-SQL Jobs
    • U-SQL catalog
    • Monitoring and performance of U-SQL jobs
• Data Lake Analytics pricing
Data Lake
Architecture of Azure Data Lake
Azure Data Lake Store
• Azure Data Lake Store is a hyper-scale repository for big data analytic workloads.
Azure Data Lake enables you to capture data of any size, type, and ingestion speed
in one single place for operational and exploratory analytics.
• Azure Data Lake Store is an Apache Hadoop file system compatible with the
Hadoop Distributed File System (HDFS).
• It can be accessed from Hadoop (available with an HDInsight cluster) through
WebHDFS-compatible REST APIs.
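As a rough illustration of what "WebHDFS-compatible" means in practice, the sketch below only builds a request URL; it does not call the service. The account name `exampleaccount` and the path are hypothetical, the host pattern `<account>.azuredatalakestore.net` is assumed to be the Data Lake Store endpoint, and a real request would also need an OAuth bearer token, which is left out here.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(account: str, path: str, op: str, **params: str) -> str:
    """Build a WebHDFS-style REST URL for a Data Lake Store account.

    Assumption: the account is reachable at <account>.azuredatalakestore.net
    and exposes the usual /webhdfs/v1/ prefix.
    """
    host = f"{account}.azuredatalakestore.net"
    # Drop surrounding slashes and percent-encode the path ("/" is kept as-is).
    clean_path = quote(path.strip("/"))
    # "op" selects the WebHDFS operation, e.g. LISTSTATUS or OPEN.
    query = urlencode({"op": op, **params})
    return f"https://{host}/webhdfs/v1/{clean_path}?{query}"


# Hypothetical account and folder, for illustration only.
print(webhdfs_url("exampleaccount", "/logs/2017", "LISTSTATUS"))
```

Because the URL shape follows WebHDFS, Hadoop tooling that already speaks that protocol can address the store in the same way.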
Azure Data Lake Store
Use Cases
• Store social media posts, log files, sensor data
• Store corporate data such as relational databases (as flat files)
Data Lake Storage vs Azure Storage
Azure Data Lake Store | Azure Storage
Optimized storage for big data analytics workloads | General-purpose object store for a wide variety of storage scenarios
Batch, interactive, and streaming analytics; log files, etc. | Any type of text or binary data, such as application back end
Account contains folders, which in turn contain data stored as files | Storage account has containers
Optimized performance for parallel analytics workloads; high throughput and IOPS | Not optimized for analytics workloads
Big Data requirements
Pricing
Transaction prices
Storage prices
DEMO
Azure Data Lake Analytics
Azure Data Lake Analytics is an on-demand analytics job service to simplify big data analytics. You
can focus on writing, running, and managing jobs rather than on operating distributed
infrastructure.
• Dynamic scaling
• Develop faster, debug, and optimize smarter using familiar tools
• Affordable and cost-effective
• Works with all your Azure data
• U-SQL: simple and familiar, powerful, and extensible
U-SQL
T-SQL + C# = U-SQL
Concepts
Retrieve data from stored locations in rowset format
Transform the rowset(s)
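The rowset idea can be mimicked in a few lines of plain Python. This is an analogy only, not U-SQL: the sample data, the column names `user` and `region`, and both function names are made up for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical stored data; in the service this would be a file in the store.
RAW = "user,region\nann,EU\nbob,US\ncid,EU\n"


def retrieve_rowset(text: str) -> list[dict[str, str]]:
    """Retrieve stored data as a rowset: one dict per row, keyed by column."""
    return list(csv.DictReader(io.StringIO(text)))


def transform_rowset(rows: list[dict[str, str]]) -> list[dict[str, object]]:
    """Transform one rowset into another: count users per region."""
    counts = Counter(row["region"] for row in rows)
    return [{"region": region, "users": n} for region, n in sorted(counts.items())]


rows = retrieve_rowset(RAW)
summary = transform_rowset(rows)
print(summary)  # [{'region': 'EU', 'users': 2}, {'region': 'US', 'users': 1}]
```

Each step takes a rowset and returns a new one, so transformations can be chained in the same way a script builds one rowset from another.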
U-SQL Script Structure
Script :=
    Statement_List.
Statement_List :=
    { [Statement] ';' }.
Statement :=
    Use_Statement
  | If_Else_Statement
  | Declare_Variable_Statement
  | Reference_Assembly_Statement
  | Deploy_Resource_Statement
  | DDL_Statement
  | Query_Statement
  | Procedure_Call
  | Import_Package_Statement
  | DML_Statement
  | Output_Statement.
U-SQL Script Structure
Extractors
U-SQL built-in extractors:
• Extractors.Text()
• Extractors.Csv()
• Extractors.Tsv()
U-SQL Jobs
[Figure: job vertices (V) running on ADLAUs]
U-SQL Jobs
ADLAU = Azure Data Lake Analytics Unit
Parallelism N = N ADLAUs
1 ADLAU ~= a VM with 2 cores and 6 GB of memory
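The sizing rule on this slide is simple arithmetic, sketched below. The per-ADLAU figures come from the slide and are approximate; the class and function names are hypothetical and not part of any Azure SDK.

```python
from dataclasses import dataclass

# Approximate per-unit resources, as stated on the slide.
CORES_PER_ADLAU = 2
MEMORY_GB_PER_ADLAU = 6


@dataclass(frozen=True)
class JobResources:
    adlaus: int
    cores: int
    memory_gb: int


def approximate_resources(parallelism: int) -> JobResources:
    """Estimate the compute behind a job: parallelism N means N ADLAUs."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return JobResources(
        adlaus=parallelism,
        cores=parallelism * CORES_PER_ADLAU,
        memory_gb=parallelism * MEMORY_GB_PER_ADLAU,
    )


print(approximate_resources(10))
# JobResources(adlaus=10, cores=20, memory_gb=60)
```

This is a capacity estimate only; whether a job actually benefits from a higher parallelism depends on how many vertices it can run at the same time.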
U-SQL Jobs
U-SQL Catalog
Database
Table
Views
Procedures
DEMO
Monitoring
Azure Portal
Monitoring
Visual Studio
DEMO
Pricing
Links
• http://www.sqlservercentral.com/stairway/142480/
• https://azure.microsoft.com/en-us/solutions/data-lake/
Questions?
Thank you

Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"