Azure Data Lake and U-SQL

SeaScale Meetup, Jan 2016
Azure Data Lake & U-SQL
Michael Rys, @MikeDoesBigData
http://www.azure.com/datalake
{mrys, usql}@microsoft.com

[Diagram: Azure Data Lake. Analytics: HDInsight (“managed clusters”) and Azure Data Lake Analytics. Storage: Azure Data Lake Storage.]

ADLA complements HDInsight
Target the same scenarios, tools, and customers
HDInsight
• For developers familiar with open source: Java, Eclipse, Hive, etc.
• Clusters offer customization, control, and flexibility in a managed Hadoop cluster
ADLA
• Enables customers to leverage existing experience with C#, SQL & PowerShell
• Offers convenience, efficiency, automatic scale, and management in a “job service” form factor

[Diagram: Azure Data Lake. Analytics: Analytics Service (U-SQL) and HDInsight (managed Hadoop clusters), with YARN. Store: WebHDFS.]

Azure Data Lake
Analytics Service

• Enterprise-grade
• Limitless scale
• Productivity from day one
• Easy and powerful data preparation
• All data
Azure Data Lake Analytics

Azure Data Lake Analytics Service
A new distributed analytics service
• Built on Apache YARN
• Scales dynamically with the turn of a dial
• Pay by the query
• Supports Azure AD for access control, roles, and integration with on-prem identity systems
• Built with U-SQL to unify the benefits of SQL with the power of C#
• Processes data across Azure

Work across all cloud data
Azure Data Lake Analytics
• Azure SQL DW
• Azure SQL DB
• Azure Storage Blobs
• Azure Data Lake Store
• SQL DB in an Azure VM

• Hard to work with anything other than structured data
• Difficult to extend with custom code

• User often has to care about scale and performance
• SQL is second-class, embedded within strings
• Often no code reuse/sharing across queries

Get benefits of both!
Makes it easy for you by unifying:
• Unstructured and structured data processing
• Declarative SQL and custom imperative code
• Local and remote queries
• Increase productivity and agility from Day 1 and at Day 100 for YOU!

Extend U-SQL with C#/.NET
Built-in operators, functions, aggregates
C# expressions (in SELECT expressions)
User-defined aggregates (UDAGGs)
User-defined functions (UDFs)
User-defined operators (UDOs)
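As an illustration of C# expressions inside a SELECT, here is a minimal sketch. It assumes a customers file with the same columns as the `/input/customers.txt` example used elsewhere in this deck; the output path and the derived column names (`name_upper`, `city_length`) are hypothetical.

```usql
// Schema on read: declare column names and types while extracting the file.
@c =
    EXTRACT cid int,
            name string,
            city string
    FROM "/input/customers.txt"
    USING Extractors.Csv();

// Every scalar expression below is plain C#.
@shaped =
    SELECT cid,
           name.ToUpperInvariant() AS name_upper,   // C# string method
           city.Length AS city_length               // C# string property
    FROM @c
    WHERE city.StartsWith("New") && name.Length > 0; // C# boolean operators

// Hypothetical output location.
OUTPUT @shaped
TO "/output/customers_shaped.csv"
USING Outputters.Csv();
```

Because the expressions are evaluated with C# semantics, a null `name` or `city` would raise an exception here; real data would likely need a null check first.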

U-SQL Language Philosophy
Declarative Query and Transformation Language:
• Uses SQL’s SELECT FROM WHERE with GROUP BY/Aggregation, Joins, SQL Analytics functions
• Optimizable, Scalable
Expression-flow programming style:
• Easy to use functional lambda composition
• Composable, globally optimizable
Operates on Unstructured & Structured Data
• Schema on read over files
• Relational metadata objects (e.g. database, table)
Extensible from ground up:
• Type system is based on C#
• Expression language IS C#
• User-defined functions (U-SQL and C#)
• User-defined Aggregators (C#)
• User-defined Operators (UDO) (C#)
U-SQL provides the Parallelization and Scale-out Framework for user code
• EXTRACTOR, OUTPUTTER, PROCESSOR, REDUCER, COMBINER, APPLIER
Federated query across distributed data sources
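// Make the custom C# code (MyAgg, MyNamespace, MyData) available to the script.
REFERENCE ASSEMBLY MyDB.MyAssembly;

// Target table; definition abbreviated as on the slide (no index or distribution clause shown).
CREATE TABLE T( cid int, first_order DateTime
              , last_order DateTime, order_count int
              , order_amount float );

// Schema on read: column names and types are applied while extracting the files.
@o = EXTRACT oid int, cid int, odate DateTime, amount float
     FROM "/input/orders.txt"
     USING Extractors.Csv();

@c = EXTRACT cid int, name string, city string
     FROM "/input/customers.txt"
     USING Extractors.Csv();

// Join and aggregate; the join condition and predicates are C# expressions.
@j = SELECT c.cid, MIN(o.odate) AS firstorder
          , MAX(o.odate) AS lastorder, COUNT(o.oid) AS ordercnt
          , AGG<MyAgg.MySum>(o.amount) AS totalamount   // user-defined aggregate
     FROM @c AS c LEFT OUTER JOIN @o AS o ON c.cid == o.cid
     WHERE c.city.StartsWith("New")
           && MyNamespace.MyFunction(o.odate) > 10      // user-defined function
     GROUP BY c.cid;

// Write the rowset with a user-defined outputter, then load it into T.
OUTPUT @j TO "/output/result.txt"
USING new MyData.Write();

INSERT INTO T SELECT * FROM @j;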
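
The script above depends on custom code in `MyDB.MyAssembly`. For trying the same pattern without any user code, a minimal sketch using only built-in extractors, aggregates, and outputters might look like the following. It is a simplification, not the slide's script: it assumes the same two input files, swaps the user-defined aggregate for the built-in `SUM`, uses an inner join instead of the left outer join, declares `amount` as `double`, and writes to a hypothetical `/output/result.csv`.

```usql
// Orders: one row per order, read with the built-in CSV extractor.
@o =
    EXTRACT oid int,
            cid int,
            odate DateTime,
            amount double
    FROM "/input/orders.txt"
    USING Extractors.Csv();

// Customers: one row per customer.
@c =
    EXTRACT cid int,
            name string,
            city string
    FROM "/input/customers.txt"
    USING Extractors.Csv();

// Per-customer order statistics for cities whose name starts with "New".
@j =
    SELECT c.cid,
           MIN(o.odate) AS firstorder,
           MAX(o.odate) AS lastorder,
           COUNT(o.oid) AS ordercnt,
           SUM(o.amount) AS totalamount      // built-in aggregate in place of MyAgg.MySum
    FROM @c AS c
         INNER JOIN @o AS o
         ON c.cid == o.cid                   // C# equality operator
    WHERE c.city.StartsWith("New")
    GROUP BY c.cid;

// Hypothetical output path; built-in CSV outputter in place of MyData.Write.
OUTPUT @j
TO "/output/result.csv"
USING Outputters.Csv();
```

With an inner join, customers without orders drop out of the result, which is the main behavioral difference from the left outer join on the slide.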

Intro Blog entry: http://aka.ms/usql-intro
Blog entry on UDFs: http://aka.ms/usql-udf
U-SQL Reference Doc (beta): http://aka.ms/usql_reference
U-SQL Community & Team site: http://usql.io/
Videos: https://channel9.msdn.com/Series/AzureDataLake

Additional Resources
• Blogs and community page:
  • http://usql.io
  • https://blogs.msdn.microsoft.com/azuredatalake/
  • http://blogs.msdn.com/b/visualstudio/
  • http://azure.microsoft.com/en-us/blog/topics/big-data/
  • https://channel9.msdn.com/Search?term=U-SQL#ch9Search
• Documentation:
  • http://aka.ms/usql_reference
  • https://azure.microsoft.com/en-us/documentation/services/data-lake-analytics/
• ADL forums and feedback
  • http://aka.ms/adlfeedback
  • https://social.msdn.microsoft.com/Forums/azure/en-US/home?forum=AzureDataLake
  • http://stackoverflow.com/questions/tagged/u-sql

Natively unifies SQL’s declarativity and C#’s extensibility
Unifies querying of structured and unstructured data
Unifies local and remote queries
Increase productivity and agility from Day 1 forward for YOU!
Sign up for an Azure Data Lake account and join the Public Preview at http://www.azure.com/datalake, and give us your feedback via http://aka.ms/adlfeedback or at http://aka.ms/u-sql-survey!
