Airflow for Beginners
https://github.com/karpenkovarya/airflow_for_beginners
What is Airflow?
It is a tool to BUILD, SCHEDULE and MONITOR data pipelines.
Data pipeline: a set of data processing elements connected in series, where the output of one element is the input of the next one.
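The "output of one element is the input of the next" idea can be sketched in plain Python, without Airflow. This is only an illustration: the function names and the sample records are made up for the example and are not part of the talk's repository.

```python
from functools import reduce


def extract():
    """Produce raw records (hypothetical sample data)."""
    return [
        {"title": "How do I schedule a DAG?", "score": 5},
        {"title": "Unrelated question", "score": 0},
    ]


def keep_popular(records):
    """Keep only records with a positive score."""
    return [r for r in records if r["score"] > 0]


def to_titles(records):
    """Reduce each record to its title."""
    return [r["title"] for r in records]


def run_pipeline(source, steps):
    """Feed the source's output through each step, in order."""
    return reduce(lambda data, step: step(data), steps, source())


result = run_pipeline(extract, [keep_popular, to_titles])
print(result)
```

Each step only knows about the data handed to it, which is what makes the elements easy to reorder or replace.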
I. Create Questions table
II. Store data from Stack Overflow
III. Write filtered questions to S3
IV. Render HTML template
V. Send me an email
Building blocks of Airflow
Operator (Worker)
Knows how to perform a task and has the tools to do it.
Examples:
- Python Operator
- Postgres Operator
- Bash Operator
- Email Operator
DAG (Protocol / Instructions)
Describes the order of tasks and what to do if a task fails.
Example:
Run Task A; when it is finished, run Task B. If one of the tasks fails, stop the whole process and send me a notification.
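The "run A, then B, stop and notify on failure" example can be modelled with the standard library alone. The sketch below is a toy, not Airflow's scheduler: `run_in_order`, the task names and the `notify` callback are all assumptions made for illustration. It relies on `graphlib.TopologicalSorter` (Python 3.9+) to order tasks by their dependencies.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_in_order(tasks, upstream, notify):
    """Run tasks in dependency order; stop and notify on the first failure.

    tasks:    task name -> callable
    upstream: task name -> set of task names that must finish first
    notify:   callable that receives a failure message
    """
    graph = {name: upstream.get(name, set()) for name in tasks}
    finished = []
    for name in TopologicalSorter(graph).static_order():
        try:
            tasks[name]()
        except Exception as exc:
            notify(f"Task {name} failed: {exc}")
            break  # stop the whole process
        finished.append(name)
    return finished


def fail():
    raise RuntimeError("boom")


notifications = []
finished = run_in_order(
    tasks={"A": lambda: None, "B": fail, "C": lambda: None},
    upstream={"B": {"A"}, "C": {"B"}},
    notify=notifications.append,
)
print(finished, notifications)
```

Here Task A completes, Task B fails, and Task C never starts; the failure message is handed to `notify`, which in a real setup might send an email.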
Task (Specific job)
A job that is done by an Operator.
Examples:
- Load data from some API using the Python Operator
- Write data to the database using the MySQL Operator
Hooks
Interfaces to external platforms and databases. They implement a common interface (all hooks look very similar) and use Connections.
Examples:
- S3 Hook
- Slack Hook
- HDFS Hook
Connection
Credentials for external systems that can be securely stored in Airflow.
Examples:
- Postgres Connection = connection string to the Postgres database
- AWS Connection = AWS access keys
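How a Hook and a Connection fit together can be shown with a small toy model. Everything here (`ToyConnection`, `ToyHook`, `ToySqliteHook`, the `questions_db` ID) is hypothetical and is not Airflow's own API; SQLite stands in for Postgres so the sketch runs with the standard library only. The point is that task code refers to a connection ID, while the credentials live in one central place.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyConnection:
    """Stored credentials for one external system (toy model)."""
    conn_id: str
    uri: str


# Central registry, standing in for Airflow's stored Connections.
CONNECTIONS = {"questions_db": ToyConnection("questions_db", ":memory:")}


class ToyHook:
    """Common interface: every hook is built from a connection ID."""

    def __init__(self, conn_id):
        self.conn_id = conn_id

    def get_connection(self):
        return CONNECTIONS[self.conn_id]


class ToySqliteHook(ToyHook):
    """Hook for one kind of system; only this class knows about sqlite3."""

    def get_conn(self):
        return sqlite3.connect(self.get_connection().uri)


hook = ToySqliteHook("questions_db")
db = hook.get_conn()
db.execute("CREATE TABLE questions (title TEXT)")
db.execute("INSERT INTO questions VALUES (?)", ("How do I schedule a DAG?",))
rows = db.execute("SELECT title FROM questions").fetchall()
print(rows)
```

Swapping the database then means changing the stored connection, not the task code that uses the hook.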
Variables
Like environment variables. They can store arbitrary information and be used in Tasks.
Examples:
- Stack Overflow base URL
- Gmail Client ID and Secret
XComs
Let Tasks exchange small messages.
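A rough picture of what "small messages" means: one task publishes a short value, such as the location of a file, and a later task reads it. The `ToyXCom` class, the task functions and the S3 key below are assumptions for illustration only; in Airflow itself, tasks usually do this through the task instance's `xcom_push` and `xcom_pull` methods.

```python
class ToyXCom:
    """In-memory stand-in for a message store shared between tasks."""

    def __init__(self):
        self._store = {}

    def push(self, task_id, key, value):
        self._store[(task_id, key)] = value

    def pull(self, task_id, key):
        return self._store.get((task_id, key))


def write_to_s3(xcom):
    # The filtered questions themselves would go to S3;
    # only the small key name (hypothetical) is shared with later tasks.
    xcom.push("write_to_s3", "s3_key", "questions/filtered.json")


def render_template(xcom):
    key = xcom.pull("write_to_s3", "s3_key")
    return f"<p>Filtered questions are stored at {key}</p>"


xcom = ToyXCom()
write_to_s3(xcom)
html = render_template(xcom)
print(html)
```

Passing a reference rather than the data itself keeps the messages small, which matches the way the slide describes XComs.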
[Diagram: the five pipeline steps (I to V) annotated with the building blocks they use: Postgres Operator, Python Operator, Email Operator, Postgres Hook, S3 Hook, Postgres Connection, S3 Connection, XComs and Variables.]
What have we learned?
- What is Apache Airflow
- What is a data pipeline
- Main Airflow concepts (DAG, Task, Operator, Connection, etc.)
- First pipeline
Thank you!
🌻✨💛
📬 hello@varya.io

