M3D - Metadata Driven Development
Content
• What is M3D?
• M3D Components
• Why we need it?
• M3D @ Adidas
• What’s to come?
What is it?
Framework
Cloud and platform agnostic
Easy development of new features
Multiple source/target systems
Creation of data lake environments
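To make the "metadata driven" idea concrete, here is a minimal Python sketch, assuming a table is described by a small metadata record from which its DDL is generated. The names `TableSpec`, `ColumnSpec` and `create_table_ddl`, the Hive-style DDL and the S3 location are illustrative assumptions, not M3D's actual metadata format or API; `emr_database` and `table_name` are the placeholder values used in the deck's example commands.

```python
from dataclasses import dataclass


# Hypothetical metadata records; the field names are illustrative only.
@dataclass(frozen=True)
class ColumnSpec:
    name: str
    data_type: str


@dataclass(frozen=True)
class TableSpec:
    database: str
    table: str
    columns: tuple[ColumnSpec, ...]
    location: str


def create_table_ddl(spec: TableSpec) -> str:
    """Render a Hive-style CREATE EXTERNAL TABLE statement from the metadata."""
    cols = ",\n  ".join(f"{c.name} {c.data_type}" for c in spec.columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {spec.database}.{spec.table} (\n"
        f"  {cols}\n"
        ")\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{spec.location}'"
    )


spec = TableSpec(
    database="emr_database",
    table="table_name",
    columns=(ColumnSpec("id", "BIGINT"), ColumnSpec("name", "STRING")),
    location="s3://example-bucket/lake/table_name/",
)
print(create_table_ddl(spec))
```

The point of the design is that adding a table means adding a metadata record, not writing new DDL by hand.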
M3D Components
M3D Engine: https://github.com/adidas/m3d-engine
M3D API: https://github.com/adidas/m3d-api
Features in action!
Creation of data lake environments
[Diagram: the Inbound layer, holding raw files]
Features in action!
Creation of data lake environments
[Diagram: the Inbound and Landing layers, holding raw files]
Features in action!
Creation of data lake environments
[Diagram: the Inbound, Landing and Lake layers; the Lake layer holds Parquet files]
Features in action!
Creation of data lake environments
[Diagram: the Inbound, Landing, Lake and Lake Out layers; the Lake Out layer holds views]
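The slides build up one environment layer by layer (Inbound, Landing, Lake, Lake Out). A minimal sketch of how every location for a table could be derived from a few metadata values, assuming a simple directory-per-layer convention; the path scheme, the view naming and the `table_layout` function are hypothetical, not M3D's actual layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableLayout:
    inbound: str        # raw files, per the slides
    landing: str        # raw files, per the slides
    lake: str           # Parquet files, per the slides
    lake_out_view: str  # view exposed to consumers, per the slides


def table_layout(root: str, environment: str, database: str, table: str) -> TableLayout:
    """Derive every layer location for one table from metadata alone.

    The directory scheme and the view naming are hypothetical conventions.
    """
    base = f"{root.rstrip('/')}/{environment}"
    return TableLayout(
        inbound=f"{base}/inbound/{database}/{table}/",
        landing=f"{base}/landing/{database}/{table}/",
        lake=f"{base}/lake/{database}/{table}/",
        lake_out_view=f"{database}_lake_out.{table}",
    )


print(table_layout("s3://example-bucket/", "test", "emr_database", "table_name"))
```

Because every location is computed rather than configured by hand, a new environment (for example `test` vs. a production one) only changes the `environment` argument.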
Features in action!
[Diagram: the Inbound, Landing, Lake and Lake Out layers]
Multiple source/target systems: M3D connects to targets such as Exasol, Target Y and Target Z
Cloud and platform agnostic: AWS, Cloudera
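One way a single framework can serve several target systems is to dispatch on the `destination_system` value that appears in the example commands later in the deck. This is a minimal sketch under that assumption; the registry, the loader functions and their return strings are hypothetical placeholders, not M3D code.

```python
from typing import Callable

# Hypothetical registry mapping a destination_system name to a loader function.
LOADERS: dict[str, Callable[[str, str], str]] = {}


def register(system: str):
    """Decorator that registers a loader for one destination system."""
    def decorator(fn: Callable[[str, str], str]) -> Callable[[str, str], str]:
        LOADERS[system] = fn
        return fn
    return decorator


@register("emr")
def load_to_emr(database: str, table: str) -> str:
    # Placeholder: a real loader would submit a Spark job to the cluster.
    return f"EMR load of {database}.{table}"


@register("exasol")
def load_to_exasol(database: str, table: str) -> str:
    # Placeholder: a real loader would run an import into Exasol.
    return f"Exasol load of {database}.{table}"


def load(system: str, database: str, table: str) -> str:
    """Look up the loader for `system` and run it; unknown systems are rejected."""
    try:
        loader = LOADERS[system]
    except KeyError:
        raise ValueError(f"unsupported destination_system: {system!r}") from None
    return loader(database, table)


print(load("exasol", "emr_database", "table_name"))
```

With this shape, supporting another target means registering one more function; the calling code does not change.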
Features in action!
[Diagram: the Inbound, Landing and Lake layers; the Lake layer holds Parquet files]
Easy development of new features
Easily define new readers/writers based on traits
Introduce transformations on datasets
Add new APIs to M3D API
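The slide mentions defining readers and writers "based on traits". Traits are a Scala language feature; Python has no direct equivalent, so this minimal sketch uses abstract base classes in their place. The class names and the `uppercase_column` transformation are hypothetical and do not mirror M3D's real interfaces.

```python
import csv
import io
import json
from abc import ABC, abstractmethod


# Hypothetical "traits": abstract base classes every reader/writer must implement.
class Reader(ABC):
    @abstractmethod
    def read(self, raw: str) -> list[dict]:
        """Parse raw text into a list of row dictionaries."""


class Writer(ABC):
    @abstractmethod
    def write(self, rows: list[dict]) -> str:
        """Serialise row dictionaries into text."""


class CsvReader(Reader):
    def read(self, raw: str) -> list[dict]:
        return list(csv.DictReader(io.StringIO(raw)))


class JsonLinesReader(Reader):
    def read(self, raw: str) -> list[dict]:
        # One JSON object per non-empty line.
        return [json.loads(line) for line in raw.splitlines() if line.strip()]


class JsonLinesWriter(Writer):
    def write(self, rows: list[dict]) -> str:
        return "\n".join(json.dumps(row, sort_keys=True) for row in rows)


def uppercase_column(rows: list[dict], column: str) -> list[dict]:
    """An example dataset transformation: upper-case one string column."""
    return [{**row, column: row[column].upper()} for row in rows]


# Usage: read CSV, transform, write JSON lines.
rows = uppercase_column(CsvReader().read("id,name\n1,alice\n2,bob\n"), "name")
print(JsonLinesWriter().write(rows))
```

A new format then only needs one new `Reader` or `Writer` subclass; transformations stay independent of how data is read or written.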
Why We Need It?
Simpler definition of new tables in the lake
Easy scheduling of loads
Clear semantics defined globally
Single extraction from source systems
Simple definition of views as the interface for applications
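Views as the interface for applications can also be generated from metadata. A minimal sketch, assuming a view is described by its name, a source table and a mapping from lake columns to application-facing names; `create_view_sql` and the table names are hypothetical, and the `CREATE OR REPLACE VIEW` form shown is accepted by, for example, Spark SQL, though exact syntax varies by target system.

```python
def create_view_sql(view: str, source_table: str, columns: dict[str, str]) -> str:
    """Render a view that exposes selected lake columns under application-facing names.

    `columns` maps each source column to the name the view should expose.
    """
    if not columns:
        raise ValueError("a view needs at least one column")
    select_list = ",\n  ".join(f"{src} AS {alias}" for src, alias in columns.items())
    return f"CREATE OR REPLACE VIEW {view} AS\nSELECT\n  {select_list}\nFROM {source_table}"


print(create_view_sql("lake_out.sales_v", "lake.sales", {"cust_id": "customer_id", "amt": "amount"}))
```

Keeping the column mapping in metadata means applications depend on the view's names, while the underlying lake table can evolve.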
M3D @ Adidas
We use the so-called “Cockpit” to define global semantics
Orchestration with Jenkins as a Service
Loading data to the lake (example):
Place files in the Inbound layer.
Start an EMR cluster:
    python m3d_main.py -function create_emr_cluster \
      -core_instance_type m4.large \
      -master_instance_type m4.large \
      -core_instance_count 3 \
      -destination_system emr \
      -destination_database emr_database \
      -destination_environment test \
      -config config.json \
      -emr_version emr-5.23.0
Create the table on the started cluster:
    python m3d_main.py -function create_table \
      -config config.json \
      -destination_system emr \
      -destination_database emr_database \
      -destination_environment test \
      -destination_table table_name \
      -emr_cluster_id id-of-started-cluster
Load the table:
    python m3d_main.py -function load_table \
      -config config.json \
      -destination_system emr \
      -destination_database emr_database \
      -destination_environment test \
      -destination_table table_name \
      -load_type FullLoad \
      -emr_cluster_id id-of-started-cluster
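The three commands above share most of their flags, so a scheduler (Jenkins here) could generate them instead of hard-coding them. A minimal sketch: `m3d_main.py` and every flag and value come from the example commands, while the helper functions `m3d_command`, `create_cluster_command` and `load_pipeline` are hypothetical. In practice the cluster id would come from the `create_emr_cluster` step; here it is the deck's placeholder.

```python
import shlex

# Flags shared by the example commands; values are the deck's placeholders.
COMMON = {
    "config": "config.json",
    "destination_system": "emr",
    "destination_database": "emr_database",
    "destination_environment": "test",
}


def m3d_command(function: str, **params: str) -> list[str]:
    """Build an argv list for m3d_main.py in the '-name value' style used above."""
    argv = ["python", "m3d_main.py", "-function", function]
    for name, value in params.items():
        argv += [f"-{name}", str(value)]
    return argv


def create_cluster_command() -> list[str]:
    return m3d_command(
        "create_emr_cluster",
        core_instance_type="m4.large",
        master_instance_type="m4.large",
        core_instance_count="3",
        **COMMON,
        emr_version="emr-5.23.0",
    )


def load_pipeline(table: str, cluster_id: str) -> list[list[str]]:
    """create_table followed by a full load of the same table."""
    return [
        m3d_command("create_table", **COMMON, destination_table=table,
                    emr_cluster_id=cluster_id),
        m3d_command("load_table", **COMMON, destination_table=table,
                    load_type="FullLoad", emr_cluster_id=cluster_id),
    ]


if __name__ == "__main__":
    for argv in [create_cluster_command(), *load_pipeline("table_name", "id-of-started-cluster")]:
        # Print a shell-ready command; pass argv to subprocess.run(argv, check=True) to execute it.
        print(shlex.join(argv))
```

Building argument lists rather than shell strings avoids quoting problems when table names or paths come from metadata.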
M3D @ Adidas
9 TB in-memory lake size
150 TB lake size on the Big Data Platform
500+ tables in the data lake
More than 900 views consumed by products
What’s to come?
Airflow as orchestration for M3D
Publish M3D API as a library
UI for quick setup of M3D API/Engine
PySpark version of the M3D Engine
Thank you!
Questions?
