Azure Data Factory — Aamir P

Hello readers!

I’m doing Azure Data Factory as part of my learning journey. Let us learn together in this article.

Feel free to correct me if anything is incorrect.

Azure Data Factory (ADF) is a cloud-based data integration service that lets you create, schedule, and orchestrate data workflows, moving and transforming data from diverse sources to destinations.

Why use ADF?

  1. It supports both ETL and ELT pipelines.

  2. Connects to a wide variety of data sources like SQL Server, Azure Blob Storage, REST APIs, and many more.

  3. You can build pipelines visually or integrate code using Data Flows.

  4. No infrastructure to maintain; Microsoft handles scaling and uptime.

  5. Allows you to coordinate data workflows — running multiple activities in sequence or parallel.

Architecture

Source → Linked Service → Dataset → Activity → Pipeline → Sink (Destination)

  1. Source/Sink: Where your data is coming from (source) and going to (sink).

  2. Linked Service: A linked service holds the connection information (for example, a connection string) and authentication details ADF needs to reach a data store or compute.

  3. Dataset: A dataset represents the data structure and location inside a data store. For example, a specific CSV file in Blob Storage or a table in SQL Server.

  4. Activity: Activities are individual tasks within a pipeline, such as Copy Activity, Data Flow Activity, or Web Activity.

  5. Pipeline: A pipeline is a logical grouping of activities that together perform a data task, for example, copying data from a Blob storage to an Azure SQL database.

  6. Trigger: Triggers automate pipeline execution based on a schedule, event, or tumbling window.

  7. Integration Runtime: The compute infrastructure that carries out data movement and transformation. There are three types:

     a. Azure IR: for cloud-based data movement.

     b. Self-hosted IR: for accessing on-premises or private-network data.

     c. Azure-SSIS IR: for running SSIS packages.

Features

  1. For complex transformations like joins, aggregations, or filtering, ADF offers Mapping Data Flows. This visual interface lets you create scalable Spark transformations without coding.

  2. You can make your pipelines dynamic by using parameters: values passed at runtime to datasets, linked services, or pipeline activities. This helps reuse pipelines for different files or databases (a small sketch follows this list).

  3. ADF has a monitoring dashboard to track pipeline and activity runs. You can set up alerts to get notified on failures or delays.

  4. Integrate ADF with Git (Azure DevOps or GitHub) to manage your pipeline code versions, collaborate with teams, and implement Continuous Integration/Continuous Deployment pipelines.
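
To make the parameter idea concrete, here is a minimal sketch using the azure-mgmt-datafactory Python SDK. The subscription ID, resource group (demo-rg), factory name (demo-adf), pipeline name, and parameter name are placeholders of my own, and exact model constructors can differ slightly between SDK versions.

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import PipelineResource, ParameterSpecification

# Assumed placeholders for this walkthrough
rg_name, df_name = "demo-rg", "demo-adf"
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Declare a pipeline-level parameter; datasets and activities can read it
# at runtime with the expression @pipeline().parameters.fileName
pipeline = PipelineResource(
    activities=[],  # add your Copy/Web activities here (see Step 4 below)
    parameters={"fileName": ParameterSpecification(type="String")})
adf_client.pipelines.create_or_update(rg_name, df_name, "CopyCsvPipeline", pipeline)

# Pass a concrete value at run time, so one pipeline serves many files
adf_client.pipelines.create_run(
    rg_name, df_name, "CopyCsvPipeline",
    parameters={"fileName": "sales_2024_06.csv"})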

Let us build a pipeline for some practical exposure:

Step 1: Create an Azure Data Factory Instance

  • Go to the Azure Portal.

  • Search for Data Factory and create a new instance.

  • Choose a resource group and region.

  • Wait for deployment.
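
If you prefer scripting this step instead of clicking through the portal, the azure-mgmt-datafactory Python SDK can create the instance as well. A minimal sketch, assuming the azure-identity and azure-mgmt-datafactory packages are installed; the subscription ID, resource group, factory name, and region are placeholders.

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

sub_id = "<subscription-id>"   # placeholder
rg_name = "demo-rg"            # existing resource group (placeholder)
df_name = "demo-adf"           # factory names must be globally unique

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), sub_id)

# Create (or update) the Data Factory instance in the chosen region
df = adf_client.factories.create_or_update(rg_name, df_name, Factory(location="eastus"))
print(df.name, df.provisioning_state)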

Step 2: Create Linked Services

  • Open your Data Factory studio.

  • Under Manage, create linked services to your data sources/destinations.

  • For example, create a linked service for Azure Blob Storage with your storage account credentials.

  • Create another linked service for your SQL database.
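
The same two linked services can also be defined from code. A minimal sketch with the Python SDK; the connection strings and the names BlobStorageLS and AzureSqlLS are placeholders chosen for this walkthrough.

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    LinkedServiceResource, AzureStorageLinkedService,
    AzureSqlDatabaseLinkedService, SecureString)

rg_name, df_name = "demo-rg", "demo-adf"   # from Step 1
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Blob Storage linked service (connection string wrapped as a SecureString)
blob_ls = LinkedServiceResource(properties=AzureStorageLinkedService(
    connection_string=SecureString(
        value="DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>")))
adf_client.linked_services.create_or_update(rg_name, df_name, "BlobStorageLS", blob_ls)

# Azure SQL Database linked service
sql_ls = LinkedServiceResource(properties=AzureSqlDatabaseLinkedService(
    connection_string=SecureString(
        value="Server=tcp:<server>.database.windows.net;Database=<db>;User ID=<user>;Password=<pwd>")))
adf_client.linked_services.create_or_update(rg_name, df_name, "AzureSqlLS", sql_ls)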

Step 3: Create Datasets

  • Define datasets that point to the actual data you want to move or process.

  • E.g., a dataset for CSV files in Blob Storage.

  • A dataset for a table in Azure SQL Database.
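
In code, a dataset wraps a linked-service reference plus the location and format of the data. A minimal sketch that defines the CSV and SQL-table datasets against the linked services from Step 2; folder, file, table, and dataset names are placeholders.

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    DatasetResource, AzureBlobDataset, AzureSqlTableDataset,
    LinkedServiceReference, TextFormat)

rg_name, df_name = "demo-rg", "demo-adf"
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# CSV file sitting in Blob Storage
blob_ds = DatasetResource(properties=AzureBlobDataset(
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="BlobStorageLS"),
    folder_path="input",
    file_name="sales.csv",
    format=TextFormat(column_delimiter=",", first_row_as_header=True)))
adf_client.datasets.create_or_update(rg_name, df_name, "InputCsvDS", blob_ds)

# Target table in Azure SQL Database
sql_ds = DatasetResource(properties=AzureSqlTableDataset(
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="AzureSqlLS"),
    table_name="dbo.Sales"))
adf_client.datasets.create_or_update(rg_name, df_name, "SalesTableDS", sql_ds)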

Step 4: Build a Pipeline with Activities

  • Go to the Author tab.

  • Create a new pipeline.

  • Drag a Copy Activity onto the canvas.

  • Configure the source dataset (e.g., Blob Storage CSV).

  • Configure the sink dataset (e.g., Azure SQL Table).
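
A minimal sketch of the same Copy Activity defined with the Python SDK, wired to the two datasets from Step 3; activity and pipeline names are placeholders.

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    PipelineResource, CopyActivity, DatasetReference, BlobSource, AzureSqlSink)

rg_name, df_name = "demo-rg", "demo-adf"
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Copy the Blob CSV (source) into the Azure SQL table (sink)
copy = CopyActivity(
    name="CopyCsvToSql",
    inputs=[DatasetReference(type="DatasetReference", reference_name="InputCsvDS")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="SalesTableDS")],
    source=BlobSource(),
    sink=AzureSqlSink())

pipeline = PipelineResource(activities=[copy])
adf_client.pipelines.create_or_update(rg_name, df_name, "CopyCsvPipeline", pipeline)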

Step 5: Validate and Debug

  • Validate your pipeline for errors.

  • Run the pipeline in debug mode to test.
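
Debug runs are a Studio feature, but from code you can start an on-demand run and poll it until it finishes, which works well for testing. A minimal sketch, reusing the placeholder names from the earlier steps.

import time

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

rg_name, df_name = "demo-rg", "demo-adf"
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Kick off a run and wait for a terminal status
run = adf_client.pipelines.create_run(rg_name, df_name, "CopyCsvPipeline", parameters={})
status = adf_client.pipeline_runs.get(rg_name, df_name, run.run_id)
while status.status in ("Queued", "InProgress"):
    time.sleep(15)
    status = adf_client.pipeline_runs.get(rg_name, df_name, run.run_id)
print(status.status, status.message)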

Step 6: Trigger Pipeline

  • Add a trigger to schedule when this pipeline should run (e.g., daily at midnight).

  • Publish all changes to save them.
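
A minimal sketch of an equivalent daily schedule defined in code. The trigger name and start time are placeholders, the trigger does nothing until it is started, and newer SDK versions expose begin_start where older ones use start.

from datetime import datetime, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    TriggerResource, ScheduleTrigger, ScheduleTriggerRecurrence,
    TriggerPipelineReference, PipelineReference)

rg_name, df_name = "demo-rg", "demo-adf"
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Run once per day, starting at midnight UTC
recurrence = ScheduleTriggerRecurrence(
    frequency="Day", interval=1,
    start_time=datetime(2025, 1, 1, 0, 0, tzinfo=timezone.utc),
    time_zone="UTC")

trigger = TriggerResource(properties=ScheduleTrigger(
    recurrence=recurrence,
    pipelines=[TriggerPipelineReference(
        pipeline_reference=PipelineReference(
            type="PipelineReference", reference_name="CopyCsvPipeline"),
        parameters={})]))

adf_client.triggers.create_or_update(rg_name, df_name, "DailyMidnightTrigger", trigger)
adf_client.triggers.begin_start(rg_name, df_name, "DailyMidnightTrigger").result()  # 'start' in older SDKs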

Use Cases

Typically, ADF is used for the following purposes:

  1. Data Migration: for example, moving existing SSIS packages and their data workloads into ADF.

  2. Data Warehousing: Feeding data warehouses with batch or streaming data.

  3. ETL/ELT Pipelines: Extracting data from multiple sources, transforming it, and loading it into analytical stores.

  4. Real-time Analytics: Triggering pipelines based on events to support near-real-time data processing.

  5. Data Governance: Enforcing data-handling policies as data moves between systems.

Monitoring and Logging

ADF provides a Monitoring tab where you can:

  • View pipeline runs and activity status (a query sketch follows this list)

  • Drill into activity inputs, outputs, and error messages

  • Set up alerts via Azure Monitor
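
The same run history can be pulled programmatically, which is handy for custom dashboards or scripts. A minimal sketch that lists pipeline runs from the last 24 hours, reusing the placeholder names from the walkthrough.

from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

rg_name, df_name = "demo-rg", "demo-adf"
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Query all pipeline runs updated in the last day
filters = RunFilterParameters(
    last_updated_after=datetime.now(timezone.utc) - timedelta(days=1),
    last_updated_before=datetime.now(timezone.utc))

runs = adf_client.pipeline_runs.query_by_factory(rg_name, df_name, filters)
for r in runs.value:
    print(r.pipeline_name, r.status, r.run_start, r.message)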

Using an Azure Logic App, you can send an email when a pipeline completes. This is done by calling the Logic App's HTTP endpoint from a Web Activity.
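
A minimal sketch of that notification pattern: a Web Activity that POSTs to a (hypothetical) Logic App HTTP trigger URL and only runs after the Copy Activity from Step 4 succeeds. The Logic App itself would be built separately to turn that request into an email.

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    PipelineResource, CopyActivity, DatasetReference, BlobSource, AzureSqlSink,
    WebActivity, ActivityDependency)

rg_name, df_name = "demo-rg", "demo-adf"
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

copy = CopyActivity(
    name="CopyCsvToSql",
    inputs=[DatasetReference(type="DatasetReference", reference_name="InputCsvDS")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="SalesTableDS")],
    source=BlobSource(), sink=AzureSqlSink())

# Hypothetical Logic App HTTP trigger URL that sends the email
notify = WebActivity(
    name="NotifyOnSuccess", method="POST",
    url="https://<your-logic-app-http-trigger-url>",
    body={"pipeline": "CopyCsvPipeline", "status": "Succeeded"},
    depends_on=[ActivityDependency(activity="CopyCsvToSql",
                                   dependency_conditions=["Succeeded"])])

adf_client.pipelines.create_or_update(
    rg_name, df_name, "CopyCsvPipeline",
    PipelineResource(activities=[copy, notify]))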

That’s it for the day! Thanks for reading. ADF is easy to learn and is in good demand in the market; learning it is highly useful, especially if you are a Data Warehouse Engineer.

Check out this link to know more about me

Let’s get to know each other! https://lnkd.in/gdBxZC5j

Get my books, podcasts, placement preparation, etc. https://linktr.ee/aamirp

Get my Podcasts on Spotify https://lnkd.in/gG7km8G5

Catch me on Medium https://lnkd.in/gi-mAPxH

Follow me on Instagram https://lnkd.in/gkf3KPDQ

Udemy (Python Course) https://lnkd.in/grkbfz_N

YouTube https://www.youtube.com/@knowledge_engine_from_AamirP

Subscribe to my Channel for more useful content.
