An Introduction to
Azure Data Factory
v1
Everything you need to know to start developing today
Who are these people?!
• Parinaz Kallick – Business Intelligence Consultant
 Working with BI for 10 years (Origins in databases and reporting &
 MSBI Stack)
 B.S in Computer Science
 MBA-IT
• Eric Bragas – Business Intelligence Consultant, MCP
 Working with Microsoft BI for 5+ years
 Azure and Power BI for 3+ years
 California native, based in San Francisco
 Eastern cuisine aficionado
Agenda
What is Data Factory?
How does it work?
Core Components
How to Develop
• Demo
Monitoring & Management
Use Cases
Challenges & Best Practices
What is Azure Data Factory (ADF)?
• "[Azure Data Factory] is a cloud-based data integration service that
allows you to create data-driven workflows in the cloud that
orchestrate and automate data movement and data transformation."
• In short: it's Azure's PaaS offering for time-series data integration
How Does it Work?
• Leverages cloud resources to Extract, Load, and Transform your data
 Storage - Azure Blob Storage, HDInsight, Azure SQL DW, etc.
 Compute - Hive Query, Azure SQL DW, etc.
• ELT over ETL
• Time-series paradigm, e.g. web logs, social sentiment, sensor data
v1 Supported Services
Components
• Pipeline - the unit of orchestration, and container for activities
• Activity - a data movement or transformation component
 e.g. Copy, HiveQuery, StoredProcedure
• Linked Service - connection manager
 e.g. Azure Blob Storage, Azure SQL DW
• Dataset - a data structure within a linked service
 e.g. a table or storage container
Components
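To make the four components concrete, here is a minimal Python sketch that assembles v1-style JSON definitions as plain dictionaries. All names (SalesBlobStorage, DailySalesFiles, StagingSales, LoadDailySales) and the connection-string placeholders are hypothetical, and the property layout follows the v1 JSON authoring style as an assumption to verify against the v1 documentation, not a definitive schema.

```python
import json


def linked_service(name, service_type, connection_string):
    """Linked service: the connection manager for a data store."""
    return {
        "name": name,
        "properties": {
            "type": service_type,
            "typeProperties": {"connectionString": connection_string},
        },
    }


def dataset(name, linked_service_name, dataset_type, type_properties, external=False):
    """Dataset: a named structure (folder, table) inside a linked service."""
    return {
        "name": name,
        "properties": {
            "type": dataset_type,
            "linkedServiceName": linked_service_name,
            "typeProperties": type_properties,
            "external": external,  # True when ADF does not produce the data itself
            "availability": {"frequency": "Day", "interval": 1},
        },
    }


def copy_pipeline(name, input_dataset, output_dataset, start, end):
    """Pipeline: a container holding one Copy activity."""
    activity = {
        "name": "CopyBlobToSql",
        "type": "Copy",
        "inputs": [{"name": input_dataset}],
        "outputs": [{"name": output_dataset}],
        "typeProperties": {
            "source": {"type": "BlobSource"},
            "sink": {"type": "SqlSink"},
        },
    }
    return {
        "name": name,
        "properties": {
            "activities": [activity],
            "start": start,
            "end": end,
            "isPaused": True,
        },
    }


# Hypothetical definitions for a blob-to-SQL copy.
blob_ls = linked_service("SalesBlobStorage", "AzureStorage", "<storage-connection-string>")
sql_ls = linked_service("SalesDataMart", "AzureSqlDatabase", "<sql-connection-string>")
files = dataset("DailySalesFiles", blob_ls["name"], "AzureBlob",
                {"folderPath": "sales/daily"}, external=True)
staging = dataset("StagingSales", sql_ls["name"], "AzureSqlTable",
                  {"tableName": "dbo.StagingSales"})
pipeline = copy_pipeline("LoadDailySales", files["name"], staging["name"],
                         "2017-10-01T00:00:00Z", "2017-10-08T00:00:00Z")

print(json.dumps(pipeline, indent=2))
```

The point of the sketch is the chain of references: the pipeline's activity names datasets, and each dataset names the linked service it lives in.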
Why is Data Factory Different than
Other Integration Tools (*cough* *cough* SSIS)
• Extract, Load, then Transform
 Leverage scale-out compute resources to do your transforms instead of a
VM running your integration service, which is bound by resource limits
• PaaS - pay-as-you-go
 Don't need a server constantly running and accruing charges
• Scheduling is time-series based and implicitly defined
 Major paradigm shift; kind of complex initially
• Built-in task scheduler
• Works with structured and unstructured data
• Destinations are called "sinks"?
Scheduling in ADFv1
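One rough way to picture "implicitly defined" scheduling: rather than writing a cron-style schedule, you give the pipeline an active period and each dataset an availability (frequency and interval), and the service derives a series of time slices from them. The Python sketch below only illustrates that idea. The function name daily_slices is hypothetical, and the real service handles more cases (offsets, styles, other frequencies) than this simplification.

```python
from datetime import datetime, timedelta


def daily_slices(start, end, interval_days=1):
    """Split the active period [start, end) into consecutive slice windows.

    Each slice is a (window_start, window_end) pair; a trailing partial
    window is dropped in this simplified model.
    """
    step = timedelta(days=interval_days)
    slices = []
    window_start = start
    while window_start + step <= end:
        slices.append((window_start, window_start + step))
        window_start += step
    return slices


# Hypothetical three-day active period with a daily availability.
slices = daily_slices(datetime(2017, 10, 1), datetime(2017, 10, 4))
for window_start, window_end in slices:
    print(window_start.date(), "->", window_end.date())
```

Each slice is processed (and can be re-run) independently, which is why the model suits time-series sources such as daily files.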
Developing Data Factories
Azure Portal
• Non-Microsoft clients
• Exploration
Visual Studio
• Mature development
environments
• Multiple
environments
• Team development –
easier collaboration
PowerShell
• Monitoring and
Management
• Quick setup and tear
down
Demo Architecture
Blob Storage -
Daily Sales
Files
Azure SQL -
Sales DataMart
Data Factory
Staging
Table
Summary
Table
Demo!
• Tools and extensions:
 Microsoft Azure Data Factory Tools for Visual Studio 2015
 Cloud Explorer for Visual Studio 2015
• Spin up an Azure Data Factory
 Azure Storage with files and empty Azure SQL DB should be ready to go
• Copy Azure Blob Storage to Azure SQL Database
 Use the SQL sink's sqlWriterCleanupScript to clear a slice's rows before each load
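The cleanup script in the last step is presumably there so that re-running a slice does not duplicate rows. The sketch below shows that delete-then-insert pattern in Python, using an in-memory SQLite database as a stand-in for the Azure SQL staging table. The table name, columns, and the load_slice helper are hypothetical, and this is an analogy for the pattern rather than how the Copy activity is implemented.

```python
import sqlite3


def load_slice(conn, slice_date, rows):
    """Load one daily slice idempotently: clear its rows, then insert."""
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM staging_sales WHERE sale_date = ?", (slice_date,))
        conn.executemany(
            "INSERT INTO staging_sales (sale_date, store, amount) VALUES (?, ?, ?)",
            [(slice_date, store, amount) for store, amount in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_sales (sale_date TEXT, store TEXT, amount REAL)")

rows = [("SF", 120.0), ("LA", 80.5)]
load_slice(conn, "2017-10-01", rows)
load_slice(conn, "2017-10-01", rows)  # simulated re-run of the same slice

count = conn.execute("SELECT COUNT(*) FROM staging_sales").fetchone()[0]
print(count)
```

Without the DELETE, the second run would leave four rows instead of two.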
How do we Monitor our New
Pipeline?
• Azure Portal > Data Factory > Monitor & Manage
• PowerShell
Use Cases
• Time-series, e.g. web logs, social sentiment
• Hybrid integrations
• Advanced Analytics workflows
• Cloud migration
When ADF is NOT the Best Option
• Required data sources are not supported
• Loading Azure SQL Data Warehouse
 PolyBase is more performant
• Extracting from a non-time series source
• Anytime before v2 is Generally Available!
Challenges and Best Practices
Challenges
• The scheduling component can be very challenging to work with
• The lack of expressions and variables within a control flow is a big
gap
Best Practices
• Use consistent naming conventions
• Always publish pipelines with isPaused: true
• Test thoroughly before promoting to production
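The isPaused practice is easy to check mechanically before publishing. The following Python sketch scans pipeline definitions, assumed to be already parsed from their JSON files into dictionaries, and reports any that are not explicitly paused. The function name and the sample definitions are hypothetical.

```python
def unpaused_pipelines(definitions):
    """Return the names of pipelines whose isPaused flag is not exactly True."""
    return [
        definition["name"]
        for definition in definitions
        if definition.get("properties", {}).get("isPaused") is not True
    ]


# Hypothetical definitions: one paused, one unpaused, one missing the flag.
definitions = [
    {"name": "LoadDailySales", "properties": {"isPaused": True}},
    {"name": "BuildSummary", "properties": {"isPaused": False}},
    {"name": "ArchiveFiles", "properties": {}},
]

offenders = unpaused_pipelines(definitions)
print(offenders)
```

A check like this could run as a pre-publish step, so a pipeline never starts processing slices the moment it is deployed.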
Azure Data Factory v2
High-level
 ADFv1 – a service designed for batch processing of time-series data
 ADFv2 – a general-purpose, hybrid data integration service with very flexible execution
patterns
New Features:
• Integration Runtime (publish SSIS
packages)
• Branching logic (On success, On failure, On
Completion, On skip)
• Web Development UI
• Expressions and Parameters
• System Variables
• Event and Scheduled Triggers
• Additional Activity Types
• Way more data sources, e.g. BigQuery,
Dynamics 365, and many others
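As an illustration of the branching logic, v2 activities declare a dependsOn list with dependency conditions. The Python sketch below builds a skeleton pipeline with success and failure branches, then uses a deliberately simplified evaluator to show which activities would become runnable. The pipeline and activity names are hypothetical, typeProperties are omitted, and the evaluator's rules (all dependencies must match, and "Completed" matches either outcome) are assumptions to check against the v2 documentation.

```python
def depends_on(activity, *conditions):
    """Build one dependsOn entry for a v2-style activity definition."""
    return {"activity": activity, "dependencyConditions": list(conditions)}


# Skeleton only: real activities also need typeProperties.
pipeline_v2 = {
    "name": "LoadSales",
    "properties": {
        "activities": [
            {"name": "CopyToStaging", "type": "Copy"},
            {"name": "BuildSummary", "type": "SqlServerStoredProcedure",
             "dependsOn": [depends_on("CopyToStaging", "Succeeded")]},
            {"name": "NotifyFailure", "type": "WebActivity",
             "dependsOn": [depends_on("CopyToStaging", "Failed")]},
        ],
    },
}


def _matches(outcome, conditions):
    """Simplified rule: exact match, or 'Completed' for any finished run."""
    if outcome is None:
        return False
    if outcome in conditions:
        return True
    return "Completed" in conditions and outcome in ("Succeeded", "Failed")


def next_activities(pipeline, outcomes):
    """Names of not-yet-run activities whose dependencies are all satisfied."""
    ready = []
    for activity in pipeline["properties"]["activities"]:
        dependencies = activity.get("dependsOn", [])
        if activity["name"] in outcomes or not dependencies:
            continue
        if all(_matches(outcomes.get(d["activity"]), d["dependencyConditions"])
               for d in dependencies):
            ready.append(activity["name"])
    return ready


print(next_activities(pipeline_v2, {"CopyToStaging": "Failed"}))
```

This is the control-flow capability the Challenges slide describes as missing in v1, where ordering is expressed only through dataset dependencies.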
Fin.

Editor's Notes

  • #7: All supported services in v1: https://docs.microsoft.com/en-us/azure/data-factory/v1/data-factory-create-datasets Supported services in v2: https://docs.microsoft.com/en-us/azure/data-factory/concepts-datasets-linked-services
  • #12: https://docs.microsoft.com/en-us/azure/data-factory/v1/data-factory-scheduling-and-execution
  • #16: ADF Tools: https://marketplace.visualstudio.com/items?itemName=AzureDataFactory.MicrosoftAzureDataFactoryToolsforVisualStudio2015 Cloud Explorer: https://marketplace.visualstudio.com/items?itemName=MicrosoftCloudExplorer.CloudExplorerforVisualStudio2015
  • #21: https://docs.microsoft.com/en-us/azure/data-factory/compare-versions https://www.purplefrogsystems.com/paul/2017/09/whats-new-in-azure-data-factory-version-2-adfv2/