Dagster @ Rohde & Schwarz MNT
Community Meeting May, 2021
Introduction
Simon, Data Engineer & Author
Working at Rohde & Schwarz, writing at sspaeti.com. Early user of Dagster.
● Rohde & Schwarz (company): Specialized in electronic test equipment, broadcast & media, cybersecurity, radio monitoring and radiolocation, and radio communication.
● SmartAnalytics (product): Actionable benchmarking, optimization and monitoring intelligence from drive test data in mobile network testing (MNT).
● sspaeti.com (blog): Genuine news about the data ecosystem. Topics: #dataengineering #bigdata #python #opensource #ETL
What Do We Do?
Our tools help to improve the quality and performance of mobile networks
Article Hello Africa! R&S®Freerider 4 Backpack
QualiPoc Android
SmartAnalytics
Source: Iberdrola.com
Architecture - Where We Come From
SmartAnalytics
Custom ETL
(C# and SQL)
Motivation for using
Dagster
Bring the ETL into the cloud and
manage it in one central place. Be
#bigdata ready.
● on-prem → cloud
● scale-up → scale-out
● generally overcome limits
in ETL processing and query
time
Architecture - Cloud-Native with Dagster
Event-Driven with Sensors
→ Run-History of Sensors
Event-Driven with Sensors
→ Listening on S3-Folder
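The slide shows a screenshot rather than code. As a rough illustration of what a sensor listening on an S3 folder has to do, the sketch below uses plain Python rather than the Dagster API: on every tick it compares the current folder listing with the keys it has already seen and requests one run per new file. `RunRequest` here is a hypothetical stand-in dataclass, `new_file_sensor` and the `uploads/` keys are made up for illustration, and a real sensor would fetch the listing from S3.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class RunRequest:
    """Hypothetical stand-in for a request to launch one pipeline run."""
    run_key: str  # unique per file, so the same upload is not processed twice
    s3_key: str   # object the import pipeline should read


def new_file_sensor(listed_keys: Iterable[str], seen_keys: Set[str]) -> List[RunRequest]:
    """Return one run request per key that has not been seen before.

    `listed_keys` is whatever the folder listing returned on this tick;
    `seen_keys` is the sensor's memory and is updated in place.
    """
    requests = []
    for key in sorted(listed_keys):
        if key in seen_keys:
            continue  # already triggered a run for this file
        seen_keys.add(key)
        requests.append(RunRequest(run_key=key, s3_key=key))
    return requests


if __name__ == "__main__":
    seen: Set[str] = set()
    # First tick: two uploads are waiting in the (made-up) folder.
    print(new_file_sensor(["uploads/a.zip", "uploads/b.zip"], seen))
    # Second tick: only the newly arrived file triggers a run.
    print(new_file_sensor(["uploads/a.zip", "uploads/b.zip", "uploads/c.zip"], seen))
```

Using the file key as the run key is one simple way to make the trigger idempotent; whether the team's sensor does the same is not stated on the slide.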
Import-Pipeline
File-Upload ⇒ ETL ⇒ Delta ⇒ Druid
Assets - Link Data to Computations
● Overview of ETL file size,
duration & time
● Assets for
persistent Delta &
Druid tables, to see
which pipelines
changed them
Example of adding Assets
→ Simply yield the Metadata-Entries
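The code on this slide is not part of the transcript. The pattern it names, a step that yields a record of what it wrote together with metadata before yielding its result, can be sketched in plain Python as follows. `Materialization`, `Output`, `load_to_delta`, the table name and the metadata keys are all hypothetical stand-ins chosen for illustration; they are not the Dagster classes.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Union


@dataclass
class Materialization:
    """Hypothetical event: 'this step wrote that table', plus metadata."""
    asset_key: str
    metadata: Dict[str, Union[str, int, float]] = field(default_factory=dict)


@dataclass
class Output:
    """Hypothetical event carrying the step's return value."""
    value: object


def load_to_delta(file_name: str, rows: List[dict]) -> Iterator[Union[Materialization, Output]]:
    """Pretend to write `rows` to a Delta table and report what happened."""
    table = "delta/measurements"  # made-up table name
    # ... the actual write to the table would happen here ...
    yield Materialization(
        asset_key=table,
        metadata={"source_file": file_name, "row_count": len(rows)},
    )
    # The step's regular result follows the metadata event.
    yield Output(table)


if __name__ == "__main__":
    for event in load_to_delta("drive_test_01.zip", [{"rsrp": -95}, {"rsrp": -101}]):
        print(event)
```

Because the metadata travels with the event, a UI can later show per table which run touched it and with what file size or row count, which is the overview the previous slide describes.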
Advantages in using Dagster
Replaced Custom-Built Engine
We could replace our self-built engine.
Implications:
● Stable and tested
● Massive out-of-the-box features
○ Restart capabilities, backfill,
dependency management,
state management of running jobs,
support for different modes, easily
testable
Feature-rich UI - Dagit
A beautiful UI that helps users and
engineers get a fast overview and carry out
operations.
Implications:
● Everyone can see what’s going on in
the system:
○ Current jobs
○ State in the past
○ Rich metadata
Advantages in using Dagster
Problem solving
Problems and errors are straightforward to
spot, even given the complex big data
architecture.
Implications:
● Fixing errors during development is
fast and easy
● Error reports come with a good
amount of context
Easy to learn Dagster
Users who haven’t used Dagster before can get
started fast. The underlying concepts make sense to
new users.
Implications:
● Developers get up to speed fast
● It’s pleasant to write pipelines
Advantages in using Dagster
Self-Documented
Pipelines are documented directly within
Dagit. Each step is explained by its solid
and by rich metadata, e.g. an attached SQL statement or
Assets.
Implications:
● Users and customers can easily
understand what’s going on
● Easy to model pipelines
Reusable code
Existing Python microservices could be
transferred with minimal effort.
With `resources` and `solids` we can re-use
all our code in an easy way.
Implications:
● Easy to consolidate code into Dagster
● No code duplication (DRY principle)
● Stable and tested functions
● Less boilerplate compared with
implementing multiple microservices
● Functional by design
Example of Re-usable Code with Resources
Define once
And use everywhere with context
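The slide's code is not included in the transcript. The idea of "define once, use everywhere with context" can be sketched in plain Python as follows: a client is constructed in one place, attached to a context object, and every step reaches it through that context instead of building its own. `InMemoryStorage`, `Context`, `upload_file` and `list_uploads` are hypothetical names used only for this illustration and are not Dagster's own classes.

```python
from dataclasses import dataclass
from typing import Dict, List


class InMemoryStorage:
    """Hypothetical resource: a storage client that is defined once."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def list_keys(self) -> List[str]:
        return sorted(self.objects)


@dataclass
class Context:
    """Hypothetical step context that hands the shared resources to every step."""
    resources: Dict[str, InMemoryStorage]


def upload_file(context: Context, key: str, data: bytes) -> None:
    # The step does not know how the storage was configured; it just uses it.
    context.resources["storage"].put(key, data)


def list_uploads(context: Context) -> List[str]:
    # A second step re-uses the very same resource via the context.
    return context.resources["storage"].list_keys()


if __name__ == "__main__":
    ctx = Context(resources={"storage": InMemoryStorage()})
    upload_file(ctx, "uploads/b.zip", b"...")
    upload_file(ctx, "uploads/a.zip", b"...")
    print(list_uploads(ctx))
```

Swapping the in-memory client for a real one (or a test double) then only requires changing what is put into the context, not the steps themselves.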
Advantages in using Dagster
Kubernetes deployment
An easy way to schedule pods from our
pipelines.
Implications:
● Based on Dockerfiles, which allows us
to run SQL Server pods and, at the
same time, pods with Spark configured
Python based (& SQL supportive)
Python is the language of data and easy to
understand for analysts and engineers. Prepared
SQL statements are easy to inject.
Implications:
● Easier for non-engineers to adopt
● Possible to use a wide range of Python
packages, especially for ML
Next Steps
Testing
● Add Unit and Smoke Tests to improve
stability
Documentation
● Use Assets more intensively / automated(?)
● Integrate with new data lineage feature
Guidelines
● Extend our Dagster guidelines and best
practices to align on common patterns
Pipelines
● Try dynamic orchestration for overall pipeline
● Add partitions by file_name
Questions?
Thanks for listening! Feel free to
reach out to me on Dagster-Slack
or anywhere else.
SmartAnalytics
Mobile Network Testing - MNT
sspaeti.com
Simon Späti
@sspaeti
