Asanka Padmakumara
Business Intelligence Consultant,
• Blog: asankap.wordpress.com
• LinkedIn: linkedin.com/in/asankapadmakumara
• Twitter: @asanka_e
• Facebook: facebook.com/asankapk
Move Your On-Prem Data to a Lake in the Clouds
Agenda
• Where are we right now?
• Why do we need a data lake?
• What is Azure Data Lake?
• How do we get there?
• Demo
• Q & A
Where are we right now?
What are the challenges?
• Limited storage
• Limited processing power
• High hardware cost
• High maintenance cost
• No disaster recovery
• Availability and reliability issues
• Scalability issues
• Security
• Solution: Azure Data Lake
What is Azure Data Lake?
• Highly scalable data storage and analytics service
• Intended for big data storage and analysis
• A faster and more efficient solution than on-prem data centers
• Three services:
  • Analytics: HDInsight ("managed clusters") and Azure Data Lake Analytics
  • Storage: Azure Data Lake Storage
Azure Data Lake Architecture
Azure Data Lake Store
• Built for Hadoop
• Compatible with most components in the Hadoop ecosystem
• WebHDFS API
• Unlimited storage, petabyte files
• Performance-tuned for big data analytics
• High throughput, IOPS
• Parts of a file are spread across multiple servers, enabling parallel reads
• Enterprise-ready: highly available and secure
• All Data, One Place
• Any Data in native format
• No schema, No prior processing
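Because the store exposes a WebHDFS-compatible API, a file operation is an HTTPS request against a path-style URL. The sketch below only builds such URLs and makes no network calls. The account name `mydatalake`, the sample paths, and the helper `webhdfs_url` are hypothetical, and the `azuredatalakestore.net/webhdfs/v1` layout reflects the public Data Lake Store REST documentation as I understand it, so check it against the current docs before relying on it.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(account: str, path: str, op: str, **params: str) -> str:
    """Build a WebHDFS-style URL for a Data Lake Store account (illustrative)."""
    # Percent-encode the path but keep "/" separators intact.
    clean_path = quote(path.lstrip("/"), safe="/")
    query = urlencode({"op": op, **params})
    return f"https://{account}.azuredatalakestore.net/webhdfs/v1/{clean_path}?{query}"


# Hypothetical account and paths, for illustration only.
print(webhdfs_url("mydatalake", "/raw/ipl/matches.csv", "OPEN"))
print(webhdfs_url("mydatalake", "/raw/ipl", "LISTSTATUS"))
```

A real request would also need an Azure Active Directory bearer token in the `Authorization` header, which is omitted here.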
Optimized for Big Data Analytics
• Multiple copies of the same file improve read performance
• Locally redundant (multiple copies of data in one Azure region)
• Parallel reading and writing
• Configurable throughput
• No limit on file size or total storage
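The idea behind parallel reading can be tried out locally. The following is a minimal sketch, not the service's implementation: a temporary local file stands in for a file in the lake, and the helpers `read_range` and `parallel_read` are hypothetical names. Each worker reads one byte range independently, and the chunks are reassembled in order.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_range(path: str, offset: int, length: int) -> bytes:
    """Read one byte range; each worker opens its own file handle."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def parallel_read(path: str, chunk_size: int, workers: int = 4) -> bytes:
    """Read a file as independent chunks in parallel and reassemble in order."""
    size = os.path.getsize(path)
    offsets = range(0, size, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the order of the offsets, not completion order.
        chunks = pool.map(lambda off: read_range(path, off, chunk_size), offsets)
        return b"".join(chunks)


# A local temporary file stands in for a file stored in the lake.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 100)
    demo_path = tmp.name

data = parallel_read(demo_path, chunk_size=4096)
os.remove(demo_path)
print(len(data))
```

On a single local disk the threads gain little; the benefit described on the slide comes from the chunks living on different servers.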
Secure Data in Azure Data Lake Store
• Authentication
  • Azure Active Directory
  • All AAD features
  • End-user authentication or service-to-service authentication
• Access control
  • POSIX-style permissions
  • Read, Write, Execute
  • ACLs can be enabled on the root folder, on subfolders, and on individual files
• Encryption
  • Encryption at rest
  • Encryption in transit (HTTPS)
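The POSIX-style read/write/execute permissions listed under access control can be modelled with a short sketch. This is a deliberately simplified illustration and not the service's actual evaluation logic: it ignores owners, masks, default ACLs, and group membership. The `Entry` class, the helper names, and the principals `alice` and `analysts` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Entry:
    """One simplified access entry: who it applies to and an 'rwx'-style string."""
    principal: str      # a user or group name (hypothetical values below)
    permissions: str    # three characters, such as "r-x"


def allows(permissions: str, action: str) -> bool:
    """Return True if an 'rwx'-style string grants the action ('r', 'w' or 'x')."""
    if len(permissions) != 3 or action not in ("r", "w", "x"):
        raise ValueError("expected a 3-character permission string and r/w/x")
    return permissions["rwx".index(action)] == action


def can_access(entries: Sequence[Entry], principal: str, action: str) -> bool:
    """Check the first entry matching the principal; deny when none matches."""
    for entry in entries:
        if entry.principal == principal:
            return allows(entry.permissions, action)
    return False


# A hypothetical ACL on one folder.
folder_acl = [Entry("alice", "rwx"), Entry("analysts", "r-x")]
print(can_access(folder_acl, "analysts", "r"))
print(can_access(folder_acl, "analysts", "w"))
```

Denying by default when no entry matches is a design choice of this sketch; it keeps an unlisted principal from gaining access by accident.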
How to ingest data into Azure Data Lake Store
• Small data sets
  • Azure portal
  • Azure PowerShell
  • Azure CLI 2.0 (cross-platform)
  • Data Lake Tools for Visual Studio
• Streamed data
  • Azure Stream Analytics
  • Azure HDInsight Storm
  • Data Lake Store .NET SDK
• Relational data
  • Apache Sqoop
  • Azure Data Factory
• Large data sets
  • Azure PowerShell
  • Azure CLI 2.0 (cross-platform)
  • Azure Data Lake Store .NET SDK
  • Azure Data Factory
• Really large data sets
  • Azure ExpressRoute
  • Azure Import/Export service
How is it different from Azure Blob Storage?
Purpose
• Azure Data Lake Store: optimized storage for big data analytics workloads
• Azure Blob Storage: general purpose
Use case
• Azure Data Lake Store: batch, interactive, and streaming analytics and machine learning data such as log files, IoT data, click streams, and large datasets
• Azure Blob Storage: any type of text or binary data, such as application back ends, backup data, media storage for streaming, and general-purpose data
Key concepts
• Azure Data Lake Store: contains folders, which in turn contain data stored as files
• Azure Blob Storage: contains containers, which in turn hold data in the form of blobs
Size limits
• Azure Data Lake Store: no limits on account size, file size, or number of files
• Azure Blob Storage: 500 TiB
Geo-redundancy
• Azure Data Lake Store: locally redundant (multiple copies of data in one Azure region)
• Azure Blob Storage: locally redundant (LRS), geo-redundant (GRS), read-access geo-redundant (RA-GRS)
Azure Data Lake Analytics
• Massive processing power
• Adjustable parallelism
• No servers, VMs, or clusters to maintain
• Pay per job
• Use existing .NET, R, and Python libraries
• New language: U-SQL
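"Pay per job" can be turned into simple arithmetic. The Editor's Notes (#14) quote a maximum of 250 analytics units (AUs), with 1 AU equal to 2 CPU cores and 6 GB RAM, at $2 per AU per hour on pay-as-you-go. The sketch below assumes cost scales linearly with AU count and run time, which is my assumption and not something the deck states; `job_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Figures quoted in the Editor's Notes (#14); actual prices may differ.
PRICE_PER_AU_HOUR = Decimal("2.00")   # USD, pay-as-you-go
MAX_AUS_PER_JOB = 250
CORES_PER_AU = 2
RAM_GB_PER_AU = 6


def job_cost(aus: int, minutes: int) -> Decimal:
    """Estimate the pay-as-you-go cost in USD of one job: AUs x hours x rate."""
    if not 1 <= aus <= MAX_AUS_PER_JOB:
        raise ValueError(f"aus must be between 1 and {MAX_AUS_PER_JOB}")
    hours = Decimal(minutes) / Decimal(60)
    return (PRICE_PER_AU_HOUR * aus * hours).quantize(Decimal("0.01"))


# A hypothetical job: 10 AUs running for 30 minutes.
print(job_cost(aus=10, minutes=30))
print(10 * CORES_PER_AU, "cores,", 10 * RAM_GB_PER_AU, "GB RAM")
```

Adjustable parallelism is the trade-off this exposes: more AUs cost more per hour, but may shorten the run.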
U-SQL
• C# + SQL: combines the declarative logic of SQL with the procedural logic of C#
• Case sensitive
• "Schema on read"
U-SQL example
// Total extra runs per season, bowler, and extra type; blank values count as 0.
@ExtraRuns =
    SELECT IPLYear,
           Bowler,
           SUM(string.IsNullOrWhiteSpace(ExtraRuns) ? 0 : Convert.ToInt32(ExtraRuns)) AS ExtraRuns,
           ExtraType
    FROM @MatchData
    GROUP BY IPLYear, Bowler, ExtraType;
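To make the query's logic concrete, here is a rough Python equivalent that runs on in-memory rows. It is a sketch for reading along and not how Data Lake Analytics executes U-SQL. The sample rows, including the bowler names "A" and "B", are invented stand-ins for `@MatchData`.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for @MatchData; ExtraRuns arrives as text.
match_data = [
    {"IPLYear": 2016, "Bowler": "A", "ExtraType": "wides", "ExtraRuns": "1"},
    {"IPLYear": 2016, "Bowler": "A", "ExtraType": "wides", "ExtraRuns": "2"},
    {"IPLYear": 2016, "Bowler": "A", "ExtraType": "noballs", "ExtraRuns": " "},
    {"IPLYear": 2017, "Bowler": "B", "ExtraType": "wides", "ExtraRuns": None},
]


def to_runs(value):
    """Mirror string.IsNullOrWhiteSpace(x) ? 0 : Convert.ToInt32(x)."""
    if value is None or value.strip() == "":
        return 0
    return int(value)


def extra_runs(rows):
    """Sum ExtraRuns per (IPLYear, Bowler, ExtraType), like the GROUP BY."""
    totals = defaultdict(int)
    for row in rows:
        key = (row["IPLYear"], row["Bowler"], row["ExtraType"])
        totals[key] += to_runs(row["ExtraRuns"])
    return dict(totals)


for key, total in extra_runs(match_data).items():
    print(key, total)
```

The inline C# expression is what handles dirty input: the column is read as text ("schema on read"), and empty values are mapped to 0 before summing.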
How do we get there? Azure Data Factory
Your feedback is essential to me!
Demo / Q&A
Pricing
• Pay-as-you-go
  • 1 TB of storage for one month: $39.94
• Monthly commitment packages
  • 1 TB of storage for one month: $35
• Usage-based charges:
  • Write operations (per 10,000): $0.05
  • Read operations (per 10,000): $0.004
  • Delete operations: free
  • Transaction size limit: no limit
• Details: https://azure.microsoft.com/en-us/pricing/details/data-lake-store/
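The figures above combine into a rough monthly estimate. The sketch below uses the pay-as-you-go prices quoted on this slide and assumes a flat per-TB storage rate, which is a simplification of mine since the slide only quotes the price for 1 TB. `monthly_cost` is a hypothetical helper, and current prices should be taken from the pricing page.

```python
from decimal import Decimal

# Prices as quoted on the slide (USD); check the pricing page for current values.
STORAGE_PER_TB_MONTH = Decimal("39.94")   # pay-as-you-go
WRITE_PER_10K = Decimal("0.05")
READ_PER_10K = Decimal("0.004")


def monthly_cost(storage_tb: int, writes: int, reads: int) -> Decimal:
    """Estimate a month's bill: storage plus per-10,000 read/write operations."""
    storage = STORAGE_PER_TB_MONTH * storage_tb
    write_cost = WRITE_PER_10K * Decimal(writes) / Decimal(10_000)
    read_cost = READ_PER_10K * Decimal(reads) / Decimal(10_000)
    # Delete operations are free on the slide, so they add nothing here.
    return (storage + write_cost + read_cost).quantize(Decimal("0.01"))


# A hypothetical month: 1 TB stored, 1 million writes, 10 million reads.
print(monthly_cost(storage_tb=1, writes=1_000_000, reads=10_000_000))
```

`Decimal` is used instead of floats so that small per-operation prices add up without rounding drift.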


Editor's Notes

  • #5: On-prem: lots of data, limited space, servers to maintain, lots of processing power.
  • #6: Grow hardware on demand; upgrade instantly. Availability and reliability: multiple copies of data; downtime for maintenance and hardware failure cause business issues. Increase or decrease hardware on demand. Ability to fail fast: if an attempt fails, no hardware is needed. Ability to move to the latest technologies. Scalability: it takes time to scale.
  • #9: An Apache Hadoop file system compatible with the Hadoop Distributed File System (HDFS). Works with applications that support WebHDFS. Three copies of the data in a single region. IOPS: input/output operations per second.
  • #10: Automatically optimized for any throughput
  • #14: 250 AU max; 1 AU = 2 CPU cores, 6 GB RAM. Pay-as-you-go price: 1 AU for 1 hour, $2. Monthly commitment: 100 AU, $100.
  • #15: Declarative logic plus procedural logic: SQL to query, C# to customize. Case sensitive. C# data types and C# comparisons. Some commonly used SQL keywords, including WHILE, UPDATE, and MERGE, are not supported in U-SQL.
  • #16: A cloud integration service. Workflows are called pipelines; pipelines contain activities. Integration runtime: self-hosted. Activities: copy data, run SSIS packages, execute stored procedures, execute U-SQL queries. Price: based on the number of activity runs and data movement hours, or, for the SSIS runtime, on VM size and time.