It's a Breeze to develop Apache Airflow (Apache Con Berlin)
Polidea
Airflow is a platform to programmatically author,
schedule and monitor workflows.
Dynamic/Elegant
Extensible
Scalable
● Developing Airflow is (was!) hard
● The road taken to developer productivity
● Improving the first-time experience for developers
● Focus on teamwork
Hi!
Principal Software Engineer @Polidea
Apache Airflow PMC member
Certified GCP Architect
ex-Googler, ex-CTO, ex-choir member
[Slide: Polidea company statistics: talents, projects delivered, users of our apps, business through referrals]
● 100+ operators
● 18+ GCP services
● Oozie-To-Airflow
● 1 Apache Airflow committer, 1 PMC member
● Documentation improvements
● Breeze - improved development environment
● Py2 -> Py3
● Pylint compatibility
● CI environment reimplemented
● Operator scaffolding
● Multiple backends: Postgres, MySQL, SQLite
● Multiple Python versions: (2.7), 3.5, 3.6, 3.7
● Multiple executors: Local/Sequential/Kubernetes
● Automated static code analysis
● Automated documentation building
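The combinations listed above multiply quickly. A minimal sketch of how such a matrix could be enumerated, assuming the values from the slide; the names `BACKENDS`, `PYTHON_VERSIONS`, `EXECUTORS` and `build_matrix` are hypothetical and are not taken from Airflow's CI scripts, and the slide does not say whether every combination is actually run.

```python
from itertools import product

# Values as listed on the slide; the variable names are illustrative only.
BACKENDS = ["postgres", "mysql", "sqlite"]
PYTHON_VERSIONS = ["3.5", "3.6", "3.7"]  # 2.7 appears in parentheses on the slide
EXECUTORS = ["Local", "Sequential", "Kubernetes"]


def build_matrix():
    """Return every (backend, python_version, executor) combination."""
    return list(product(BACKENDS, PYTHON_VERSIONS, EXECUTORS))


if __name__ == "__main__":
    matrix = build_matrix()
    print(f"{len(matrix)} combinations")
    for backend, python_version, executor in matrix[:3]:
        print(backend, python_version, executor)
```

Three backends, three Python versions and three executors already give 27 environments, which is why a slow setup per environment hurts so much.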
● Long time to set it up
● Frustrating experience for new developers
● High friction and a steep learning curve for the Airflow development environment
● Slow iteration speed
● Complicated development environment
● Scripts designed only for CI, not for the local environment
● Dependencies installed every time you start the environment
● Always a full database reset
● Minutes to run one test
● No guidance on how to iterate on tests
● Focus on developer productivity
● Faster development cycle
● Decrease developer frustration
● Improve the teamwork
● Easy for ad-hoc contributors to code & test
● AIP-10: Multi-layered and multi-stage official Airflow image
● AIP-7: Simplified Development Workflow
● AIP-26: Production-ready Airflow Docker image and Helm chart
● AIP-23: Migrate out of Travis CI
● AIP-4: Support for System Tests for external systems
● Local virtualenv
● Own Travis CI fork
● Docker Compose (Travis CI equivalent)
● Total time: 7 minutes
● Running one test only
● Failure at the end (!)
● Re-run - 10-20 seconds for DB
● Re-enter - same time (!)
● No bash history
● Docker images built from master automatically (DockerHub)
● Local images use cached images
● Tests and static checks run using Docker Compose/Docker environment
● Can be run on Kubernetes Cluster (Docker-In-Docker)
● Independent of the CI system
● Base for building the production image
● works out-of-the-box
● initializes DB when needed
● environment variables set
● sub-second test overhead
● ipdb debugging
● verbose output
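"Initializes DB when needed" contrasts with the full database reset on every start described earlier. A minimal sketch of that idea, assuming SQLite and a hypothetical marker table named `demo_marker`; this illustrates the check-before-initialise pattern only and is not how Breeze itself is implemented.

```python
import sqlite3


def ensure_initialized(conn):
    """Create the schema only if it is missing; return True if work was done."""
    row = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name = 'demo_marker'"
    ).fetchone()
    if row is not None:
        return False  # already initialised, so skip the expensive reset
    # Hypothetical marker table standing in for the real schema creation.
    conn.execute("CREATE TABLE demo_marker (initialized_at TEXT)")
    conn.execute("INSERT INTO demo_marker VALUES (datetime('now'))")
    conn.commit()
    return True


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(ensure_initialized(connection))  # first call does the work
    print(ensure_initialized(connection))  # second call is a cheap no-op
```

The second call costs a single metadata query, which is the kind of saving that turns a 10-20 second re-run into sub-second overhead.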
● entering the environment: ./breeze --backend sqlite --python 3.5
● last-used environment: ./breeze
● automated image management
● autocomplete of options
● sub-second test execution overhead
● host sources mounted to Docker container
● ports forwarded
● hints for ad-hoc developers
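Running `./breeze` without options re-enters the last-used environment. One way such behaviour can be modelled is to persist the chosen options and merge them with explicit overrides on the next run. The sketch below assumes a JSON state file; `resolve_options`, `DEFAULTS` and the file name are hypothetical and do not describe Breeze's actual implementation.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical defaults, used only when nothing was saved or passed explicitly.
DEFAULTS = {"backend": "sqlite", "python": "3.5"}


def resolve_options(state_file, **overrides):
    """Merge defaults, last-used values and explicit overrides, then persist."""
    saved = json.loads(state_file.read_text()) if state_file.exists() else {}
    explicit = {key: value for key, value in overrides.items() if value is not None}
    options = {**DEFAULTS, **saved, **explicit}  # later sources win
    state_file.write_text(json.dumps(options))
    return options


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        state = Path(tmp) / "last_used.json"
        # Comparable to: ./breeze --backend postgres --python 3.6
        print(resolve_options(state, backend="postgres", python="3.6"))
        # Comparable to: ./breeze (no options, reuses the previous choice)
        print(resolve_options(state))
```

Explicit options always win, so switching one dimension (for example only the Python version) keeps the remaining choices from the previous session.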
● run-tests tests.core<TAB><TAB> autocomplete
● bash history across sessions
● run static checks with Breeze
● build documentation with Breeze
● run licence checks with Breeze
● easy debugging (including debugging with IDE)
● pre-commit checks
● Docker image management
● Run-tests with DB initialisation
● Travis CI integration
● Run all tasks (docs/static/licence check ...)
● Pre-commit checks
● Comprehensive documentation - Google Season of Docs YAY!
● easy to use
○ pre-commit install
○ pre-commit run
○ pre-commit run mypy
○ pre-commit run --all-files
● run only for changed files (fast)
● catches errors early
● makes efficient use of committers' time
● promotes good practices
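The speed of pre-commit comes from running each check only on the changed files it applies to. A minimal sketch of that selection step, assuming a hypothetical hook table that maps a hook id to a path pattern; it illustrates the idea and is not pre-commit's own code or Airflow's actual hook configuration.

```python
import re

# Hypothetical hook table: hook id -> regular expression matched against paths.
HOOKS = {
    "mypy": r"\.py$",
    "check-yaml": r"\.ya?ml$",
}


def plan_hooks(changed_files):
    """Map each hook to the changed files it applies to, dropping idle hooks."""
    plan = {}
    for hook, pattern in HOOKS.items():
        matched = [path for path in changed_files if re.search(pattern, path)]
        if matched:
            plan[hook] = matched
    return plan


if __name__ == "__main__":
    print(plan_hooks(["airflow/models.py", "README.md"]))
```

A commit that touches no Python or YAML files yields an empty plan here, so nothing runs at all, whereas `pre-commit run --all-files` corresponds to passing every tracked file.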
● Production-ready Apache Airflow official image
● Simplifications (fewer images, easier scripts)
● Migrating out of Travis CI
○ GitLab CI (only CI) or GitHub Actions
○ Kubernetes Cluster on Google Kubernetes Engine (Thanks Google!)
● Automation of Performance Tests
● Automation of Release Tests
Polidea
hello@polidea.com

Uploaded by Jarek Potiuk

