What's Coming in Apache Airflow 2.0 - PyDataWarsaw 2019
What’s coming in Apache Airflow 2.0
Polidea
Apache Airflow
Airflow is a platform to programmatically author, schedule and monitor workflows.
Dynamic/Elegant
Extensible
Scalable
What’s on today?
What is the presentation about?
● The team @ Polidea
● What the Airflow?
● Where is Apache Airflow now?
● What’s coming in Apache Airflow 2.0
Team @ Polidea
Hi!
Jarek Potiuk
Principal Software Engineer @Polidea
Apache Airflow PMC member
Certified GCP Architect
ex-Googler, ex-CTO, ex-choir member
@higrys
Apache Airflow Development team @ Polidea
Jarek Potiuk Kamil Breguła Tomasz Urbaszek Karolina Rosół
Dariusz Aniszewski Szymon Przedwojski Antoni Smoliński
Tobiasz Kędzierski Michał Słowikowski
PMC
Past:
Apache Airflow Website team @ Polidea
Kamil Breguła Zuzanna Rykowska Kamil Gabryjelski
Magdalena Węgrzyńska Marta Strzałkowska Tomasz Urbaszek
Team @Polidea
70+ TALENTS
100+ PROJECTS DELIVERED
3m USERS OF OUR APPS
75% OF BUSINESS THROUGH REFERRALS
Polidea & Apache Airflow
Timeline
August 2018: 2 people
December 2019: 6 (9) people
Our tasks
● 130+ operators
● 18+ GCP services
● Oozie-To-Airflow
● New Apache Airflow Website
What we delivered extra
● Documentation improvements
● Breeze - improved dev environment
● Py2 -> Py3
● Pylint compatibility
● Pre-commit framework introduction
● CI environment reimplemented
● Operator scaffolding
● Tests converted to pytest
2 Apache Airflow Committers
Apache Airflow PMC member
Open-source friendly company
Apache Airflow
Why Apache Airflow and not one of these?
And many, many, many more...
Airflow is an Orchestrator
● Tells others what/when to do
● Synchronizes work between others
● Monitors what’s going on
● Intervenes if needed
● Does little of the actual work itself
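The orchestrator role above can be sketched in plain Python. This is a hypothetical illustration, not Airflow's API: `orchestrate` and its retry policy are assumptions made for the example, loosely in the spirit of the `retries`/`retry_delay` arguments Airflow tasks accept. The callable `submit` stands in for work done by some other system; the orchestrator only triggers it, watches the outcome and retries on failure.

```python
import time
from typing import Callable


def orchestrate(task_name: str, submit: Callable[[], str], max_retries: int = 2,
                retry_delay: float = 0.0) -> str:
    """Delegate work to another system, monitor the result, retry on failure.

    `submit` is a stand-in for a call to an external system (database,
    cloud service, cluster); the orchestrator does no real work itself.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            result = submit()  # tell another system what to do
            print(f"{task_name}: succeeded on attempt {attempt}")  # monitor
            return result
        except Exception as exc:  # intervene if needed
            if attempt > max_retries:
                print(f"{task_name}: giving up after {attempt} attempts ({exc})")
                raise
            print(f"{task_name}: attempt {attempt} failed ({exc}), retrying")
            time.sleep(retry_delay)
```

With `max_retries=2`, a job that fails once and then succeeds returns normally on the second attempt; a job that always fails is re-raised after three attempts.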
Airflow is Python
Arbitrary complex workflows as a program
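Because a workflow is a program, its shape can be computed, for example by generating one branch per input table in a loop. In Airflow this is written with operators and the `>>` dependency operator; the sketch below uses only the standard library (`graphlib`, Python 3.9+) as a stand-in, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: one extract/transform branch per source table,
# generated in a loop, all feeding a single "report" task.
SOURCE_TABLES = ["users", "orders", "payments"]


def build_workflow(tables: list[str]) -> dict[str, set[str]]:
    """Return a mapping task -> set of upstream tasks it depends on."""
    graph: dict[str, set[str]] = {"start": set()}
    for table in tables:
        graph[f"extract_{table}"] = {"start"}
        graph[f"transform_{table}"] = {f"extract_{table}"}
    graph["report"] = {f"transform_{t}" for t in tables}
    return graph


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """A valid run order: every task appears after all of its upstreams.

    TopologicalSorter raises graphlib.CycleError if the graph has a cycle.
    """
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for task in execution_order(build_workflow(SOURCE_TABLES)):
        print(task)
```

Adding a table to `SOURCE_TABLES` adds a whole branch without touching the dependency logic, which is the point of defining workflows in code rather than in a static configuration.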
Airflow has a usable UI
Airflow CLI
What does Airflow shine at?
● Regular batch ETL jobs (think CRON)
● Processing fixed intervals of data
● Managing complex dependencies
● Backfilling data
● Interfacing to hundreds of different systems
● Platform for others to generate DAG files
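"Processing fixed intervals of data" and "Backfilling data" rest on one idea: a schedule splits time into consecutive windows, and each window is processed once it has closed. Re-running history (a backfill, e.g. with the `airflow backfill` command in 1.10) means enumerating the past windows. The helper below is a hypothetical plain-Python sketch of that enumeration, assuming a daily schedule; it is not Airflow's scheduler code.

```python
from datetime import datetime, timedelta
from typing import Iterator


def data_intervals(start: datetime, end: datetime,
                   step: timedelta = timedelta(days=1)) -> Iterator[tuple[datetime, datetime]]:
    """Yield consecutive [interval_start, interval_end) windows between start and end.

    Only complete windows are yielded: a window counts as ready once its end
    is not later than `end`, mirroring "run after the interval closes".
    """
    current = start
    while current + step <= end:
        yield current, current + step
        current += step


if __name__ == "__main__":
    # Backfill three days of December 2019, one run per closed daily window.
    for window_start, window_end in data_intervals(datetime(2019, 12, 1), datetime(2019, 12, 4)):
        print(f"process data from {window_start:%Y-%m-%d} to {window_end:%Y-%m-%d}")
```

Since each run is tied to a fixed window rather than to "now", re-running a window produces the same input, which is what makes backfills safe to repeat.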
Apache Airflow 1.10: state of the pinwheel
Current versions
● 1.10.2, 1.10.3, 1.10.4, 1.10.5, 1.10.6 …
● 1.10.7 in the making
● Deployed in thousands of companies
● Usage on the rise
● 2.0 - in master
How to stay relevant?
● Cloud Native is coming
● APIs are the backbone of modern software
● User Interface matters
● Performance and reliability matter
● Many services, many changes
● Community over code
End of 2019 survey: 300 responses (!)
● Started by Tomasz Urbaszek
● Ran for the last 2 weeks
● Fresh off the press
● Some surprises found
● Going in the right direction
What do you use Airflow for?
Data processing (ETL) 97%
Artificial Intelligence and Machine Learning Pipelines 29%
Automating DevOps operations 21%
What can be improved?
Scheduler performance 61%
Web UI 58%
Logging, monitoring and alerting 47%
Examples, howtos, onboarding documentation 46%
Technical documentation 44%
Reliability 36%
REST API 31%
Authentication and authorization 29%
What would be the most interesting feature for you?
Production-ready Docker image 56%
Declarative way of writing DAGs 50%
Horizontal autoscaling 40%
Examples, howtos, onboarding documentation 46%
Asynchronous Operators 31%
Stateless web server 26%
Knative Executor 16%
I already have all I need 4%
Apache Airflow 2.0
Cloud Native is coming
Do you use Kubernetes-based deployments for Airflow?
No - we do not plan to use Kubernetes near term 29%
Yes - setup on our own via Helm Chart or similar 21%
Not yet - but we use Kubernetes in our organization and we could move 20%
Yes - via managed service in the cloud (Composer/Astronomer etc.) 15%
Not yet - but we plan to deploy Kubernetes in our organization soon 14%
Other 2%
Either use or can use Kubernetes in foreseeable future 69%
Do not have plans to use Kubernetes 29%
Cloud Native is coming: Scalability
● Knative Executor
● SIG-Knative => SIG Scalability
● Native Airflow Executor (WIP)
● Pub/Sub communication
● Horizontally auto-scalable
Cloud Native is coming: Deployability
● Native worker deployable at different providers
● “As a service” and “on-premises” friendly
● Generic Pub/Sub architecture for communication
● No DB communication between components
● Production-optimised Docker image
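The "generic Pub/Sub architecture, no DB communication between components" idea can be illustrated with in-process queues. This is a hypothetical sketch, not the (then work-in-progress) Airflow design: `queue.Queue` stands in for a real message broker, and the message fields are assumptions. The scheduler side only publishes task messages, workers only publish result messages, and nothing is shared through a database.

```python
import queue
import threading

STOP = None  # sentinel message telling a worker to shut down


def worker(tasks: queue.Queue, results: queue.Queue) -> None:
    """Consume task messages and publish one result message per task."""
    while True:
        message = tasks.get()
        if message is STOP:
            tasks.task_done()
            return
        # A real worker would execute the task here; we just report success.
        results.put({"task_id": message["task_id"], "state": "success"})
        tasks.task_done()


def run(task_ids: list[str], num_workers: int = 3) -> dict[str, str]:
    """Scheduler side: publish work, then collect states from result messages."""
    tasks: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for task_id in task_ids:  # publish task messages
        tasks.put({"task_id": task_id})
    for _ in threads:  # FIFO order: stop messages arrive after all tasks
        tasks.put(STOP)
    for t in threads:
        t.join()
    states: dict[str, str] = {}
    while not results.empty():  # safe: all workers have exited
        msg = results.get()
        states[msg["task_id"]] = msg["state"]
    return states
```

Because the components only agree on message shapes, the number of workers (`num_workers`) can change without the scheduler knowing where or how the workers run, which is what makes such a design friendly to both "as a service" and on-premises deployments.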
Cloud Native is coming: Monitoring
● Integrate with standard monitoring tools
● More metrics exported via StatsD
● Integration with Prometheus on Kubernetes
● Horizontal Scalability approach based on metrics
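A metrics-based scaling rule can be as simple as "enough worker slots for everything queued or running", clamped to a range. The function names, the slots-per-worker model and the clamping limits below are assumptions for illustration, not Airflow behaviour; `statsd_gauge` shows the plain-text StatsD gauge format (`name:value|g`) such a metric could be emitted in.

```python
import math


def desired_workers(queued_tasks: int, running_tasks: int, slots_per_worker: int,
                    min_workers: int = 1, max_workers: int = 10) -> int:
    """Workers needed so every queued and running task gets a slot.

    The result is clamped to [min_workers, max_workers] so the pool does not
    grow without bound (and only scales to zero if min_workers is 0).
    """
    if slots_per_worker <= 0:
        raise ValueError("slots_per_worker must be positive")
    needed = math.ceil((queued_tasks + running_tasks) / slots_per_worker)
    return max(min_workers, min(max_workers, needed))


def statsd_gauge(name: str, value: int) -> str:
    """Format a gauge reading in the plain-text StatsD line protocol."""
    return f"{name}:{value}|g"
```

For example, 30 queued plus 10 running tasks with 16 slots per worker gives ceil(40 / 16) = 3 workers.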
APIs are the backbone of modern software
APIs are taking over the world
● Modern API
● HTTP-based API used by the CLI and the webserver
● Pub/Sub API for Scheduler <> Workers communication
● Generic APIs - not tied to Kubernetes/other deployment options
● Better Authentication/Authorization
● Opens up multi-tenancy capabilities
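To make "HTTP-based API used by the CLI" concrete, the sketch below builds (without sending) a request that triggers a DAG run through the experimental REST endpoint that exists in Airflow 1.10 (`POST /api/experimental/dags/<dag_id>/dag_runs`). The 2.0 API described on the slide was still being designed, so its paths will differ; the base URL and the DAG id are assumptions.

```python
import json
import urllib.request
from typing import Optional

# Assumed webserver address; 8080 is only a common default.
BASE_URL = "http://localhost:8080"


def trigger_dag_request(dag_id: str, conf: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request that triggers a DAG run."""
    body = json.dumps({"conf": conf or {}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/experimental/dags/{dag_id}/dag_runs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = trigger_dag_request("example_dag", {"date": "2019-12-13"})
    # Sending needs a running webserver with the experimental API enabled:
    #     with urllib.request.urlopen(req) as resp: print(resp.read())
    print(req.get_method(), req.full_url)
```

A CLI built this way talks only to the API, never to the metadata database, which is what allows a stateless web server and proper authentication in front of it.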
User Interface matters
Which interface(s) of Airflow do you use as part of your current role?
Original Airflow Graphical User Interface 97%
CLI 40%
API (experimental) 20%
Custom Own Created UI 8%
UIs are getting better
● Make UI refresh like it’s 2020
● Modern design (possibly)
● Use APIs for communication instead of DB/file access
● Better authentication and authorisation
● Stateless web-server
● Better responsiveness
Performance and reliability matter
Performance and reliability are important
● Automated performance testing (CI - targeted)
● Monitoring performance characteristics
● Improve Webserver/Scheduler Performance
● Internal instrumentation and optimisations
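"Automated performance testing (CI - targeted)" usually boils down to timing a piece of code repeatedly and failing the build when it exceeds a budget. The helpers below are a hypothetical sketch using only the standard library; what to measure (for example, parsing a DAG folder or one scheduler loop) and the budget values are assumptions left to the project.

```python
import statistics
import time
from typing import Callable


def median_runtime(func: Callable[[], object], repeats: int = 5) -> float:
    """Median wall-clock time of func() in seconds; the median damps one-off noise."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def check_budget(name: str, func: Callable[[], object], budget_seconds: float) -> bool:
    """Return True if func stays within its time budget; CI could fail the job otherwise."""
    elapsed = median_runtime(func)
    print(f"{name}: median {elapsed:.4f}s (budget {budget_seconds:.4f}s)")
    return elapsed <= budget_seconds
```

Using the median of several runs rather than a single measurement keeps shared CI machines from producing flaky failures.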
Many services, fast changes
Fast evolving services
● Currently, operators are bound to Airflow releases
● Migration to 2.0 will take time
● Introducing new approach
○ move operators to new paths/namespaces
○ change import paths
○ backporting to 1.10
○ backportable to 1.10 (!)
○ future: per-provider packaging
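While operators move to new import paths and are backported to 1.10, a DAG file may need to run against both layouts. One common pattern is to try the new module path first and fall back to the old one. The helper below is a hypothetical sketch; the Airflow paths in the comment are placeholders, not real module names.

```python
import importlib
from types import ModuleType


def import_first(*module_paths: str) -> ModuleType:
    """Import and return the first of the given modules that exists.

    Only ModuleNotFoundError is caught, so a module that exists but fails
    while importing still raises its real error instead of being skipped.
    """
    errors = []
    for path in module_paths:
        try:
            return importlib.import_module(path)
        except ModuleNotFoundError as exc:
            errors.append(f"{path}: {exc}")
    raise ImportError("none of the candidate modules could be imported: " + "; ".join(errors))


# Hypothetical usage with placeholder paths (substitute the real new/old locations):
#     operators = import_first("airflow.<new.namespace>.operators.<name>",
#                              "airflow.contrib.operators.<name>")
```

Catching only `ModuleNotFoundError` is a deliberate choice: it distinguishes "this layout is not installed" from "this module is broken".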
Community over code
Community over code: Documentation
● Google Season of Docs - great programme!
● Onboarding, best practices, architecture, deployment options
● Better, clearer structure
● Both user and developer documentation improved
● Worked with technical writers from India and Russia
Community over code: New website: airflow.apache.org
Work sponsored by Google Cloud
Community over code: Development environment
● It’s a Breeze to develop Apache Airflow
● Get your environment up in 10 minutes
● Integration with IDE
● Well documented
● Team-work enabler
● Lets you run and debug DAGs
● Fully debuggable: DebugExecutor - cooperation with Databand.ai
First Warsaw Apache Airflow Workshop
Friday, December 13, 2019
5:30 PM to 9:30 PM
Polidea Sp. z o.o.
Przeskok 2 IV p. · Warsaw
https://t.co/TmWdWwfemI
Thanks!
hello@polidea.com
Prototypes, Widely Used, UX Design, Personal Growth, Individuality, Trust, Better, Faster, Ideation, API Design, Shipping, Building blocks, Backend, VR, Android, Learn more, Firmware, Decision, Coffee, Collaboration, Contact, CV, Save money, Open Source, Management, React Native, Testing, Team, Quality, iOS, Maintain, Security, Front End, Flutter, Task, Seamless UX, Research, Mobile, AR
