SeaScale Meetup
Jan 2016
Azure Data Lake &
U-SQL
Michael Rys, @MikeDoesBigData
http://www.azure.com/datalake
{mrys, usql}@microsoft.com
Azure Data Lake
• Analytics: HDInsight (“managed clusters”) and Azure Data Lake Analytics
• Storage: Azure Data Lake Storage
ADLA complements HDInsight
Both target the same scenarios, tools, and customers.
• HDInsight: for developers familiar with open source (Java, Eclipse, Hive, etc.). Clusters offer customization, control, and flexibility in a managed Hadoop cluster.
• ADLA: enables customers to leverage existing experience with C#, SQL, and PowerShell. Offers convenience, efficiency, automatic scale, and management in a “job service” form factor.
Azure Data Lake
• Analytics: Azure Data Lake Analytics Service (U-SQL) and HDInsight (managed Hadoop clusters), both on YARN
• Store: Azure Data Lake Store, accessed via WebHDFS
Azure Data Lake Analytics
• All data
• Productivity from day one
• Easy and powerful data preparation
• Limitless scale
• Enterprise-grade
Azure Data Lake Analytics Service
• A new distributed analytics service
• Built on Apache YARN
• Scales dynamically with the turn of a dial
• Pay by the query
• Supports Azure AD for access control, roles, and integration with on-premises identity systems
• Built with U-SQL to unify the benefits of SQL with the power of C#
• Processes data across Azure
Work across all cloud data
Azure Data Lake Analytics works across Azure SQL DW, Azure SQL DB, Azure Storage Blobs, Azure Data Lake Store, and SQL DB in an Azure VM.
Azure Data Lake and U-SQL
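Limitations of SQL-based approaches (e.g. Hive):
• Hard to work with anything other than structured data
• Difficult to extend with custom code
Limitations of programming-language approaches (e.g. Spark):
• The user often has to care about scale and performance
• SQL is second-class, embedded within strings
• Often no code reuse or sharing across queries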
Get the benefits of both!
U-SQL makes it easy for you by unifying:
• Unstructured and structured data processing
• Declarative SQL and custom imperative code
• Local and remote queries
Increase productivity and agility from Day 1 and at Day 100 for YOU!
Extend U-SQL with C#/.NET
Built-in operators, functions, aggregates
C# expressions (in SELECT expressions)
User-defined aggregates (UDAGGs)
User-defined functions (UDFs)
User-defined operators (UDOs)
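To make the user-defined aggregate (UDAGG) contract concrete, here is a small Python model of it. It is a sketch, not the U-SQL API: in U-SQL a UDAGG is a C# class, and the speaker notes describe it as implementing `IAggregate`'s `Init()`, `Accumulate()`, and `Terminate()`. The Python names `Aggregate`, `MySum`, and `run_aggregate` are hypothetical, and `run_aggregate` only mimics how a runtime would drive an aggregate for one group.

```python
from typing import Generic, Iterable, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class Aggregate(Generic[T, R]):
    """Hypothetical Python mirror of the Init/Accumulate/Terminate contract."""

    def init(self) -> None:
        raise NotImplementedError

    def accumulate(self, value: T) -> None:
        raise NotImplementedError

    def terminate(self) -> R:
        raise NotImplementedError


class MySum(Aggregate[Optional[float], float]):
    """Hypothetical stand-in for MyAgg.MySum: sums values, ignoring nulls."""

    def init(self) -> None:
        self.total = 0.0  # fresh state for each group

    def accumulate(self, value: Optional[float]) -> None:
        if value is not None:  # outer joins can supply nulls
            self.total += value

    def terminate(self) -> float:
        return self.total


def run_aggregate(agg: Aggregate[T, R], values: Iterable[T]) -> R:
    """Mimic the runtime for one group: init once, accumulate per row, terminate once."""
    agg.init()
    for value in values:
        agg.accumulate(value)
    return agg.terminate()


if __name__ == "__main__":
    print(run_aggregate(MySum(), [10.0, None, 2.5]))
```

Because `init()` resets the state, one instance can be reused group after group. In U-SQL itself, the compiled C# class is registered in an assembly and invoked as `AGG<MyAgg.MySum>(...)`.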
U-SQL Language Philosophy
Declarative query and transformation language:
• Uses SQL’s SELECT FROM WHERE with GROUP BY/aggregation, joins, and SQL analytics functions
• Optimizable, scalable
Expression-flow programming style:
• Easy-to-use functional lambda composition
• Composable, globally optimizable
Operates on unstructured and structured data:
• Schema on read over files
• Relational metadata objects (e.g. database, table)
Extensible from the ground up:
• Type system is based on C#
• Expression language IS C#
• User-defined functions (U-SQL and C#)
• User-defined aggregators (C#)
• User-defined operators (UDOs) (C#)
U-SQL provides the parallelization and scale-out framework for user code:
• EXTRACTOR, OUTPUTTER, PROCESSOR, REDUCER, COMBINER, APPLIER
Federated query across distributed data sources
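The six operator kinds listed above differ mainly in the shape of data they consume and produce. The following Python sketch models those shapes as plain generator functions so the differences are visible. It is an illustration under assumptions, not the U-SQL C# interfaces, and every function name here (`extract_csv`, `process_rows`, `apply_rows`, `reduce_rows`, `combine_rows`, `output_csv`) is hypothetical. The reducer and combiner shapes handle each key group independently, which is the property that lets a framework spread that work across machines.

```python
import csv
import io
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]


def extract_csv(text: str, schema: List[Tuple[str, type]]) -> Iterator[Row]:
    """EXTRACTOR shape: raw text in, typed rows out (schema applied on read)."""
    for fields in csv.reader(io.StringIO(text)):
        yield {name: typ(value) for (name, typ), value in zip(schema, fields)}


def process_rows(rows: Iterable[Row], fn: Callable[[Row], Row]) -> Iterator[Row]:
    """PROCESSOR shape: exactly one output row per input row."""
    for row in rows:
        yield fn(row)


def apply_rows(rows: Iterable[Row],
               fn: Callable[[Row], Iterable[Row]]) -> Iterator[Row]:
    """APPLIER shape (as with CROSS APPLY): zero or more rows per input row,
    each joined back to the row that produced it."""
    for row in rows:
        for extra in fn(row):
            yield {**row, **extra}


def _group(rows: Iterable[Row], key: str) -> Dict[object, List[Row]]:
    groups: Dict[object, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return groups


def reduce_rows(rows: Iterable[Row], key: str,
                fn: Callable[[object, List[Row]], Iterable[Row]]) -> Iterator[Row]:
    """REDUCER shape: each key group is handled independently."""
    for k, group in _group(rows, key).items():
        yield from fn(k, group)


def combine_rows(left: Iterable[Row], right: Iterable[Row], key: str,
                 fn: Callable[[List[Row], List[Row]], Iterable[Row]]) -> Iterator[Row]:
    """COMBINER shape: two rowsets, matched group by group on a key."""
    right_groups = _group(right, key)
    for k, left_group in _group(left, key).items():
        yield from fn(left_group, right_groups.get(k, []))


def output_csv(rows: Iterable[Row], columns: List[str]) -> str:
    """OUTPUTTER shape: rows in, serialized text out."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        writer.writerow([row[c] for c in columns])
    return buf.getvalue()


if __name__ == "__main__":
    raw = "1,Seattle,a b\n2,New York,c\n"
    rows = extract_csv(raw, [("id", int), ("city", str), ("tags", str)])
    rows = process_rows(rows, lambda r: {**r, "city": str(r["city"]).upper()})
    rows = apply_rows(rows, lambda r: ({"tag": t} for t in str(r["tags"]).split()))
    rows = reduce_rows(rows, "id", lambda k, g: [{"id": k, "ntags": len(g)}])
    print(output_csv(rows, ["id", "ntags"]), end="")
```

The pipeline in the `__main__` block chains the shapes the way a U-SQL script chains rowset expressions: extract, transform, fan out, group, and write.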
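REFERENCE ASSEMBLY MyDB.MyAssembly; // user code: MyAgg, MyNamespace, MyData

// Managed table: U-SQL tables need a clustered index and a distribution scheme
CREATE TABLE T( cid int, first_order DateTime
              , last_order DateTime, order_count long // COUNT returns long
              , order_amount float
              , INDEX idx CLUSTERED (cid ASC) DISTRIBUTED BY HASH (cid) );

// Schema on read: the column list is applied while the files are read
@o = EXTRACT oid int, cid int, odate DateTime, amount float
     FROM "/input/orders.txt"
     USING Extractors.Csv();
@c = EXTRACT cid int, name string, city string
     FROM "/input/customers.txt"
     USING Extractors.Csv();

// C# expressions, a user-defined aggregate, and a UDF inside a declarative query
@j = SELECT c.cid, MIN(o.odate) AS firstorder
          , MAX(o.odate) AS lastorder, COUNT(o.oid) AS ordercnt
          , AGG<MyAgg.MySum>(o.amount) AS totalamount
     FROM @c AS c LEFT OUTER JOIN @o AS o ON c.cid == o.cid
     WHERE c.city.StartsWith("New")
           && MyNamespace.MyFunction(o.odate) > 10
     GROUP BY c.cid;

OUTPUT @j TO "/output/result.txt"
USING new MyData.Write();

INSERT INTO T SELECT * FROM @j;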
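The following Python model spells out what `@j` computes on a few in-memory rows, which helps when reasoning about the join and the filter. Assumptions: `MyNamespace.MyFunction` is modeled by a hypothetical `my_function` returning the day of the month, `AGG<MyAgg.MySum>` is modeled as a plain sum over `o.amount` (the customers rowset has no `amount` column), `MAX` is taken over `o.odate`, and an unmatched (null) order side fails the `WHERE` predicate, as a null comparison would in standard SQL. This models the result only, not how ADLA executes the script.

```python
from collections import defaultdict
from datetime import date
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]


def my_function(d: date) -> int:
    """Hypothetical stand-in for MyNamespace.MyFunction: the day of the month."""
    return d.day


def query_j(orders: List[Row], customers: List[Row],
            fn: Callable[[date], int] = my_function) -> List[Row]:
    # FROM @c AS c LEFT OUTER JOIN @o AS o ON c.cid == o.cid
    orders_by_cid: Dict[object, List[Row]] = defaultdict(list)
    for o in orders:
        orders_by_cid[o["cid"]].append(o)
    joined: List[Tuple[Row, Optional[Row]]] = []
    for c in customers:
        # A customer without orders is kept once, paired with a null order side.
        for o in orders_by_cid.get(c["cid"], [None]):
            joined.append((c, o))

    # WHERE c.city.StartsWith("New") && MyFunction(o.odate) > 10
    # Assumption: a null order side makes the predicate false.
    kept = [(c, o) for c, o in joined
            if str(c["city"]).startswith("New")
            and o is not None and fn(o["odate"]) > 10]

    # GROUP BY c.cid with MIN, MAX, COUNT, and a sum standing in for MySum
    groups: Dict[object, List[Row]] = defaultdict(list)
    for c, o in kept:
        groups[c["cid"]].append(o)
    return [
        {
            "cid": cid,
            "firstorder": min(o["odate"] for o in rows),
            "lastorder": max(o["odate"] for o in rows),
            "ordercnt": len(rows),
            "totalamount": sum(o["amount"] for o in rows),
        }
        for cid, rows in groups.items()
    ]


if __name__ == "__main__":
    orders = [
        {"oid": 1, "cid": 10, "odate": date(2016, 1, 5), "amount": 20.0},
        {"oid": 2, "cid": 10, "odate": date(2016, 1, 15), "amount": 30.0},
        {"oid": 3, "cid": 10, "odate": date(2016, 1, 20), "amount": 12.5},
        {"oid": 4, "cid": 20, "odate": date(2016, 1, 25), "amount": 99.0},
    ]
    customers = [
        {"cid": 10, "name": "Ann", "city": "New York"},
        {"cid": 20, "name": "Bob", "city": "Seattle"},
        {"cid": 30, "name": "Cy", "city": "Newark"},
    ]
    print(query_j(orders, customers))
```

Under these assumptions, customer 30 (city "Newark", no orders) is dropped even though the join is a LEFT OUTER JOIN: the `WHERE` clause tests a column from `@o`, so the null-extended rows are filtered out and the query behaves like an inner join.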
Intro blog entry: http://aka.ms/usql-intro
Blog entry on UDFs: http://aka.ms/usql-udf
U-SQL Reference Doc (beta): http://aka.ms/usql_reference
U-SQL Community & Team site: http://usql.io/
Videos: https://channel9.msdn.com/Series/AzureDataLake
Microsoft Confidential Material - covered under NDA
Additional Resources
• Blogs and community page:
  • http://usql.io
  • https://blogs.msdn.microsoft.com/azuredatalake/
  • http://blogs.msdn.com/b/visualstudio/
  • http://azure.microsoft.com/en-us/blog/topics/big-data/
  • https://channel9.msdn.com/Search?term=U-SQL#ch9Search
• Documentation:
  • http://aka.ms/usql_reference
  • https://azure.microsoft.com/en-us/documentation/services/data-lake-analytics/
• ADL forums and feedback:
  • http://aka.ms/adlfeedback
  • https://social.msdn.microsoft.com/Forums/azure/en-US/home?forum=AzureDataLake
  • http://stackoverflow.com/questions/tagged/u-sql
Natively unifies SQL’s declarativity and C#’s extensibility
Unifies querying structured and unstructured data
Unifies local and remote queries
Increase productivity and agility from Day 1 forward for YOU!
Sign up for an Azure Data Lake account and join the Public Preview at http://www.azure.com/datalake, and give us your feedback via http://aka.ms/adlfeedback or at http://aka.ms/u-sql-survey!

Editor's Notes

  • #7:
    • All data: unstructured, semi-structured, and structured. Domain-specific user-defined types using C#. Queries over Data Lake and Azure Blobs. Federated queries over operational and DW SQL stores, removing the complexity of ETL.
    • Productive from day one: effortless scale and performance without the need to manually tune or configure. Best developer experience throughout the development lifecycle for both novices and experts. Leverage your existing skills with SQL and .NET.
    • Easy and powerful data preparation: easy-to-use built-in connectors for common data formats. Simple and rich extensibility model for adding customer-specific data transformations, both existing and new.
    • No limits scale: scales on demand with no change to code. Automatically parallelizes SQL and custom code. Designed to process petabytes of data.
    • Enterprise grade: managing, securing, sharing, and discovery of familiar data and code objects (tables, functions, etc.). Role-based authorization of catalogs and storage accounts using AAD security. Auditing of catalog objects (databases, tables, etc.).
  • #8: A new distributed analytics service, built on Apache YARN. Dynamically scales: handles jobs of any scale instantly by simply setting the dial for how much power you need. You only pay for the cost of the query. Supports Azure Active Directory for access control, roles, and integration with on-premises identity systems. It also includes U-SQL, a language that unifies the benefits of SQL with the expressive power of C#. U-SQL’s scalable runtime processes data across multiple Azure data sources.
  • #9: ADLA allows you to compute on data anywhere and to join data from multiple cloud sources.
  • #16: Hard to operate on unstructured data: even Hive requires metadata to be created to operate on unstructured data. Adding custom Java functions, aggregators, and SerDes involves a lot of steps, often requires access to the server’s head node, and differs based on the type of operation. Requires many tools and steps. Some examples:
    • Hive UDAgg:
      • Code and compile .java into .jar
      • Extend the AbstractGenericUDAFResolver class: does type checking, argument checking, and overloading
      • Extend the GenericUDAFEvaluator class: implements the logic in 8 methods
      • Deploy: deploy the jar into the class path on the server, edit FunctionRegistry.java to register it as a built-in, and update the content of show functions with ant
    • Hive UDF (as of v0.13):
      • Code
      • Load the JAR into the head node or at a URI
      • CREATE FUNCTION USING JAR to register and load the jar into the classpath for every function (instead of registering the jar once and just using the functions)
  • #17: Spark supports custom “inputters and outputters” for defining custom RDDs. No UDAGGs. Simple integration of UDFs, but only for the duration of the program: no reuse or sharing. Cloud Dataflow? Requires the user to care about scale and performance.
    • Spark UDAgg: not yet supported (SPARK-3947)
    • Spark UDF:
      • Write an inline function:
        def westernState(state: String) = Seq("CA", "OR", "WA", "AK").contains(state)
      • For SQL usage, register the table:
        customerTable.registerTempTable("customerTable")
      • Register each UDF:
        sqlContext.udf.register("westernState", westernState _)
      • Call it:
        val westernStates = sqlContext.sql("SELECT * FROM customerTable WHERE westernState(state)")
  • #18: Offers auto-scaling and performance. Operates on unstructured data without needing tables. Easy to extend declaratively with custom code: a consistent model for UDOs, UDFs, and UDAggs. Easy to query remote sources, even without external tables.
    • U-SQL UDAgg:
      • Code and compile a .cs file: implement IAggregate’s 3 methods, Init(), Accumulate(), and Terminate(). C# takes care of type checking, generics, etc.
      • Deploy: with tooling, one-click registration of the assembly in the user database; by hand, copy the file to ADL and use CREATE ASSEMBLY to register the assembly
      • Use via AGG<MyNamespace.MyAggregate<T>>(a)
    • U-SQL UDF: code in C#, register the assembly once, and call the function by its C# name.
  • #20: Extensions require .NET assemblies to be registered with a database
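The notes claim that ADLA "scales on demand with no change to code" and "automatically parallelizes SQL and custom code". One standard technique behind this kind of claim is hash-distributing rows on the grouping key so each partition can be aggregated independently. The Python sketch below models that idea under stated assumptions: partitions are processed sequentially in one process as a stand-in for separate vertices, and `partition`, `aggregate_partition`, and `parallel_group_by` are hypothetical names, not part of ADLA's runtime. The same aggregate code yields the same result for any partition count.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Row = Tuple[Hashable, float]
Aggregate = Callable[[List[float]], float]


def partition(rows: Iterable[Row], n: int) -> List[List[Row]]:
    """Hash-distribute rows on the grouping key: all rows of a key share a partition."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for key, value in rows:
        parts[hash(key) % n].append((key, value))
    return parts


def aggregate_partition(part: List[Row], agg: Aggregate) -> Dict[Hashable, float]:
    """The work one vertex would do: group its rows and run the aggregate."""
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for key, value in part:
        groups[key].append(value)
    return {key: agg(values) for key, values in groups.items()}


def parallel_group_by(rows: Iterable[Row], agg: Aggregate, n: int) -> Dict[Hashable, float]:
    """Run the same aggregate over n partitions (sequentially here) and merge."""
    result: Dict[Hashable, float] = {}
    for part in partition(rows, n):
        # Partitions share no keys, so merging is a plain dictionary update.
        result.update(aggregate_partition(part, agg))
    return result


if __name__ == "__main__":
    rows = [("a", 1.0), ("b", 2.0), ("a", 3.0), ("c", 4.0), ("b", 5.0)]
    for n in (1, 2, 8):
        print(n, parallel_group_by(rows, sum, n))
```

Only the partition count `n` changes between runs; the aggregate itself is untouched, which is the property the "no change to code" claim relies on.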