maayang@wix.com linkedin/maayan-gad github.com/maaayniti
Maayan Gad - Data Engineering Guild
June 2023
Solving Data Engineers Velocity
With Wix’s Data Warehouse Automation
Hi, I’m Maayan
→ Senior Big Data Engineer by day
→ A metal a cappella singer by night
Solving Data Engineers Velocity | June 2023
→ Have been designing and developing Big Data platforms for 6 years
Agenda
→ DE work today
→ Wix’s Solution
→ The 3 Components of the Solution
→ Demo of DWH Automation
→ Additional DWHA Capabilities
→ What’s next?
Maintaining SQL is fun, right?
And what about more complex changes?
Wix’s Solution
BI Bank
→ Unifies the semantics
→ Defines the sources
Data Warehouse Automation
→ Unifies DWH table definitions
→ Handles differences, so maintenance becomes easy
Metric Collector
→ Unifies the way we collect the sources
→ Efficient
→ Handles the aggregations
Wix Engineering Locations
→ EU: Vilnius, Krakow, Berlin, Amsterdam
→ Ukraine: Kyiv, Dnipro, Lviv
→ Israel: Tel-Aviv, Be’er Sheva, Haifa
→ ROW: USA, Canada
BI Bank
Rules
→ The source - the “from” and “where” of the query
KPIs
→ Adds a domain to a set of rules, e.g. sites, users
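The two BI Bank concepts above can be pictured as small data structures. This is only a minimal sketch: `Rule`, `Kpi`, `to_sql`, and every table and column name below are hypothetical and are not taken from Wix's BI Bank.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A source definition: the "from" and "where" of a query (hypothetical model)."""
    name: str
    source_table: str
    where: str = ""

    def to_sql(self, select: str = "*") -> str:
        # Render the rule as the FROM/WHERE part of a query.
        sql = f"SELECT {select} FROM {self.source_table}"
        return f"{sql} WHERE {self.where}" if self.where else sql


@dataclass(frozen=True)
class Kpi:
    """Attaches a domain (e.g. sites, users) to a set of rules (hypothetical model)."""
    name: str
    domain: str
    rules: tuple


# Illustrative definitions only; these names are made up for the example.
premium_purchase = Rule("premium_purchase", "events.purchases", "plan = 'premium'")
site_published = Rule("site_published", "events.sites", "action = 'publish'")
active_sites = Kpi("active_sites", domain="sites", rules=(site_published,))
```

Keeping the "from" and "where" in one shared definition is what lets several platforms reuse the same semantics instead of copying SQL.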
Metric Collector
• Helps us read from each source only once
• Used in multiple data platforms inside the company
1 Get a list of metrics objects
2 Extract all relevant sources
3 Load all sources in parallel
4 Generate DFs per source
5 Validate Metrics (optional)
Store to S3 / Return a DF
Regular OR Union all DFs
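The collector steps can be sketched roughly as follows. This is an illustration under stated assumptions, not Wix's implementation: `Metric`, `collect`, `load_source`, and the `union` flag are hypothetical names, plain lists of rows stand in for DataFrames, and the S3 output option is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    source: str  # hypothetical: the source this metric reads from


def load_source(source):
    # Stand-in loader (assumption): a real collector would read the source here.
    return [{"source": source, "row": i} for i in range(2)]


def collect(metrics, loader=load_source, validate=None, union=False):
    # Steps 1-2: take the metric objects and extract the distinct sources,
    # so each source is read only once even when metrics share it.
    sources = sorted({m.source for m in metrics})
    # Step 3: load all sources in parallel.
    with ThreadPoolExecutor() as pool:
        # Step 4: one "DF" (here, a list of rows) per source.
        loaded = dict(zip(sources, pool.map(loader, sources)))
    # Step 5: optional validation hook, called once per metric.
    if validate is not None:
        for metric in metrics:
            validate(metric, loaded[metric.source])
    # Return either the per-source results or a single union of all of them.
    if union:
        return [row for source in sources for row in loaded[source]]
    return loaded
```

Deduplicating sources before loading is the part that corresponds to "read from each source only once".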
DWH Automation
Standard flow
1 Understands the difference between yesterday’s run and today’s run from Iceberg metadata
2 Reads the raw data using the Metric Collector
3 Aggregates
4 Adds table-type-related logic
● LEAD/LAG for SCD
● History Join for Dim
5 Writes - data+metadata
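To illustrate the LEAD idea for a slowly changing (type 2) table, here is a minimal sketch in plain Python rather than SQL. The input layout (`key`, `value`, `date`), the function name `build_scd2`, and the status values `"current"` and `"history"` are assumptions made for the example; only the start, stop, and status columns come from the slides.

```python
from itertools import groupby
from operator import itemgetter


def build_scd2(rows):
    """Turn dated observations into SCD type 2 versions (hypothetical schema)."""
    out = []
    ordered = sorted(rows, key=itemgetter("key", "date"))
    for key, group in groupby(ordered, key=itemgetter("key")):
        versions = []
        for row in group:
            # Keep a row only when the value actually changed.
            if not versions or versions[-1]["value"] != row["value"]:
                versions.append(row)
        for i, row in enumerate(versions):
            # Equivalent of LEAD(date): the next version's start closes this one.
            stop = versions[i + 1]["date"] if i + 1 < len(versions) else None
            out.append({
                "key": key,
                "value": row["value"],
                "start": row["date"],
                "stop": stop,
                "status": "current" if stop is None else "history",
            })
    return out
```

In SQL the same step would typically be a window function partitioned by the key and ordered by date.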
DWH Automation
Difference deepdive
→ Aggregation column additions, renames and deletions
→ Changes in the sources, a.k.a. the BI Bank rules and KPIs
→ Changes in the table configuration:
● Table name
● Start time/days back
● Group by (PK) columns
● Owner
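A rough sketch of what comparing yesterday's and today's table definitions could look like. The dictionary layout, the key names, and `diff_table_config` are all hypothetical; the slides say only that the difference is derived from Iceberg metadata, not how. Note that a plain set comparison like this one reports a rename as one removal plus one addition.

```python
def diff_table_config(previous, current):
    """Compare two table definitions given as plain dicts (assumed layout)."""
    changes = {}
    prev_cols, cur_cols = set(previous["columns"]), set(current["columns"])
    if cur_cols - prev_cols:
        changes["columns_added"] = sorted(cur_cols - prev_cols)
    if prev_cols - cur_cols:
        changes["columns_removed"] = sorted(prev_cols - cur_cols)
    # Scalar and list settings: record (old, new) whenever they differ.
    for key in ("table_name", "days_back", "group_by", "owner", "sources"):
        if previous.get(key) != current.get(key):
            changes[key] = (previous.get(key), current.get(key))
    return changes


# Illustrative values only.
yesterday = {
    "table_name": "sites_daily",
    "columns": ["visits", "purchases"],
    "days_back": 30,
    "group_by": ["site_id"],
    "owner": "team-a",
    "sources": ["site_events"],
}
today = dict(yesterday, columns=["visits", "revenue"], owner="team-b")
changes = diff_table_config(yesterday, today)
```

An empty result means nothing changed, so a run could continue as a regular incremental load.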
Now, what used to take hours takes minutes or seconds.
Demo time!
Additional DWHA Capabilities
Table types
→ Fact
→ Slowly changing type 2
→ Dimension
→ Fact Unpivot
Scale
→ 5,000-10,000 GB processed per day
→ 370,000,000,000 rows for the largest table
→ Hundreds of tables
Handles Differences in
→ Column name, addition, deletion
→ Primary key change
→ Table start time/days back
→ Source change
→ Table name, owner
Additional Columns
→ Update date, incremental date
→ Slowly changing - start, stop, status
What’s Next?
We are now working to create a DWH Automation UI
Thank You!
Any questions?

