"Building Data Warehouse with Google Cloud Platform",  Artem Nikulchenko
Chief Software Architect at Cloud Works (Teamwork Commerce)
Google Developers Expert
Cloud Champion Innovator
GDG Cloud Kharkiv Organizer
Certified Google Cloud Architect
What is a Data Warehouse?
"A data warehouse is an enterprise system used for the analysis and reporting of structured and semi-structured data from multiple sources, such as point-of-sale transactions, marketing automation, customer relationship management, and more. A data warehouse is suited for ad hoc analysis as well as custom reporting. A data warehouse can store both current and historical data in one place and is designed to give a long-range view of data over time, making it a primary component of business intelligence." (Wikipedia)
Do you need a Data Warehouse?
- Reports are running too slow
- Reports interfere with transactional workflows
- Data is dispersed across multiple databases (and some of it is not even in a database…)
- The system accumulates a lot of historical data that is not needed for day-to-day workflows
None of the above? Are you sure you need a DW?
- You just need a reporting tool? Data Studio (GCP), Looker (GCP), Tableau, etc.
- Your reports are a little slow? Have you tried ROLAP?
- All your data is in PostgreSQL? There is a surprise for you at the end of the talk!
Star schema
The star schema is the simplest style of data mart schema and is the approach most widely used to develop data warehouses and dimensional data marts.
Example: a SALES fact table surrounded by four dimension tables.
- SALES (fact table): Product ID, Order ID, Customer ID, Employee ID, Total, Quantity, Discount
- Product dimension: Product ID, Product Name, Product Category, Unit Price
- Customer dimension: Customer ID, Customer Name, Address, City, Zip
- Time dimension: Order ID, Order Date, Year, Quarter, Month
- Employee (Emp) dimension: Emp ID, Emp Name, Title, Department, Region
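To make the star-schema idea concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for a warehouse. The table and column names (`sales_fact`, `product_dim`, `customer_dim`) and the sample rows are hypothetical, loosely modelled on the slide, and only two of the four dimensions are included. The point is the query shape: join the fact table to its dimensions on their IDs, then aggregate the measures.

```python
import sqlite3

# Hypothetical star schema modelled on the slide: a SALES fact table keyed to
# dimension tables by ID. Only the Product and Customer dimensions are shown.
SCHEMA = """
CREATE TABLE product_dim (
    product_id       INTEGER PRIMARY KEY,
    product_name     TEXT,
    product_category TEXT,
    unit_price       REAL
);
CREATE TABLE customer_dim (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT,
    city          TEXT
);
CREATE TABLE sales_fact (
    order_id    INTEGER,
    product_id  INTEGER REFERENCES product_dim (product_id),
    customer_id INTEGER REFERENCES customer_dim (customer_id),
    quantity    INTEGER,
    total       REAL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO product_dim VALUES (?, ?, ?, ?)",
        [(1, "Laptop", "Electronics", 1000.0),
         (2, "Mouse", "Electronics", 20.0),
         (3, "Desk", "Furniture", 250.0)],
    )
    conn.executemany(
        "INSERT INTO customer_dim VALUES (?, ?, ?)",
        [(10, "Alice", "Kharkiv"), (11, "Bob", "Kyiv")],
    )
    conn.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
        [(100, 1, 10, 1, 1000.0),
         (100, 2, 10, 2, 40.0),
         (101, 3, 11, 1, 250.0),
         (102, 2, 11, 5, 100.0)],
    )
    return conn


def revenue_by_category_and_city(conn: sqlite3.Connection):
    """Typical star-schema query: join the fact to its dimensions, then aggregate."""
    return conn.execute(
        """
        SELECT p.product_category, c.city, SUM(f.total) AS revenue
        FROM sales_fact AS f
        JOIN product_dim  AS p ON p.product_id  = f.product_id
        JOIN customer_dim AS c ON c.customer_id = f.customer_id
        GROUP BY p.product_category, c.city
        ORDER BY p.product_category, c.city
        """
    ).fetchall()


if __name__ == "__main__":
    for row in revenue_by_category_and_city(build_demo_db()):
        print(row)
```

With the sample rows, the query returns ("Electronics", "Kharkiv", 1040.0), ("Electronics", "Kyiv", 100.0) and ("Furniture", "Kyiv", 250.0). Because every dimension hangs directly off the fact table, each extra attribute in a report costs at most one more join.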
Traditional Data Warehouse: Extract-Transform-Load (ETL)
- Extract data from sources
- Transform in an intermediate tool
- Load into the Data Warehouse DB
Diagram: data sources (flat files, JSON files, cloud sources) → Extract → Transform → Load → Data Warehouse
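A minimal sketch of the ETL flow, again using `sqlite3` as a stand-in for the warehouse. The two inline sources (a CSV string and a JSON string) and the `orders` table are hypothetical. Note where the cleanup happens: in Python, in the intermediate step, before anything reaches the database.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sources standing in for the "flat files" and "JSON files" on the slide.
CSV_SOURCE = "order_id,amount,currency\n1,100.0,usd\n2,50.5,eur\n"
JSON_SOURCE = '[{"order_id": 3, "amount": "20", "currency": "USD"}]'


def extract():
    """Extract: read raw records from each source."""
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    rows += json.loads(JSON_SOURCE)
    return rows


def transform(rows):
    """Transform outside the warehouse: normalise types and values in application code."""
    return [
        (int(r["order_id"]), float(r["amount"]), str(r["currency"]).upper())
        for r in rows
    ]


def load(conn, rows):
    """Load: only the already-cleaned rows reach the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract()))
    print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
```

One consequence visible in this sketch: the raw records are never stored, so if the transformation rules change, the data has to be extracted again.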
Traditional Data Warehouse: what are the issues?
- High upfront cost
- High maintenance cost
- Complex ETL process
- Proprietary query language
- No automated scaling
Cloud Data Warehouse: what is the difference?
- No upfront costs (pay-per-use)
- Fully managed service
- Automatic scaling (due to the separation of storage and compute)
- ELT instead of ETL (transformations done in SQL)
- Support for a standard SQL dialect
Google BigQuery: petabyte-scale multi-cloud DW
- Dremel: the execution engine
- Colossus: distributed storage
- Borg: compute
- Jupiter: the network
Google BigQuery: petabyte-scale multi-cloud DW
- Take all Wikipedia page views in 2022
- Ask which pages were the most popular
- Get the result within a minute
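The Wikipedia demo can be approximated with a query along these lines. This is a sketch, not the query from the talk: it assumes the public table `bigquery-public-data.wikipedia.pageviews_2022` with columns `datehour`, `wiki`, `title` and `views`, assumes the `wiki` column holds project codes such as `en`, and uses the `google-cloud-bigquery` Python client with default credentials. A dry run is issued first because a full-year scan is large and on-demand queries are billed by bytes processed.

```python
# Assumed public table; check the dataset in the BigQuery console before relying on it.
PAGEVIEWS_TABLE = "bigquery-public-data.wikipedia.pageviews_2022"


def top_pages_sql(table: str = PAGEVIEWS_TABLE, limit: int = 10) -> str:
    """Build the 'most popular pages in 2022' query.

    The datehour range also acts as a partition filter, which limits the bytes scanned.
    The wiki project is passed as the @wiki query parameter rather than inlined.
    """
    return f"""
SELECT title, SUM(views) AS total_views
FROM `{table}`
WHERE datehour >= TIMESTAMP('2022-01-01')
  AND datehour < TIMESTAMP('2023-01-01')
  AND wiki = @wiki
GROUP BY title
ORDER BY total_views DESC
LIMIT {int(limit)}
"""


def run_top_pages(wiki: str = "en", limit: int = 10):
    """Dry-run, then run, the query with the google-cloud-bigquery client."""
    # Imported here so top_pages_sql() can be used without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses Application Default Credentials
    sql = top_pages_sql(limit=limit)
    params = [bigquery.ScalarQueryParameter("wiki", "STRING", wiki)]

    # A dry run reports how many bytes the query would process without running it.
    dry_run = client.query(
        sql,
        job_config=bigquery.QueryJobConfig(
            dry_run=True, use_query_cache=False, query_parameters=params
        ),
    )
    print(f"Query would process {dry_run.total_bytes_processed / 1e9:.1f} GB")

    job = client.query(sql, job_config=bigquery.QueryJobConfig(query_parameters=params))
    return [(row["title"], row["total_views"]) for row in job.result()]
```

Passing the project code as a query parameter keeps the SQL text stable and avoids string interpolation of user input; only the integer `limit` is formatted into the query.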
ELT process: Extract → Load → Transform → Use
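A minimal ELT sketch: the raw rows are loaded untouched into a staging table, and the cleanup (casting, normalising, de-duplicating) happens afterwards in SQL inside the database. `sqlite3` stands in for BigQuery; the `raw_orders` and `orders` tables and their contents are made up for the example.

```python
import sqlite3

# Hypothetical raw export, landed as-is: every field is still a string,
# casing and whitespace are inconsistent, and one row is duplicated.
RAW_ROWS = [
    ("1", "100.0", "usd"),
    ("2", "50.5", " eur "),
    ("1", "100.0", "usd"),
]


def extract_and_load(conn: sqlite3.Connection) -> None:
    """Extract + Load: copy the source rows into a staging table unchanged."""
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT, currency TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ROWS)


def transform(conn: sqlite3.Connection) -> None:
    """Transform: done afterwards, inside the database, in plain SQL."""
    conn.execute(
        """
        CREATE TABLE orders AS
        SELECT DISTINCT
            CAST(order_id AS INTEGER) AS order_id,
            CAST(amount AS REAL)      AS amount,
            UPPER(TRIM(currency))     AS currency
        FROM raw_orders
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    extract_and_load(conn)
    transform(conn)
    print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
```

Because `raw_orders` is kept, the transformation can be changed and re-run later without going back to the source system.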
Moving data into BigQuery (Extract, Load)
- BigQuery Data Transfer Service, which supports:
  ○ Google Software as a Service (SaaS) apps: Campaign Manager, Cloud Storage, Google Ad Manager, Google Ads, Google Merchant Center (beta), Google Play, Search Ads 360 (beta), YouTube Channel reports, YouTube Content Owner reports
  ○ External cloud storage providers: Amazon S3
  ○ Data warehouses: Teradata, Amazon Redshift
- Federated query (for PostgreSQL and MySQL)
- Cloud Data Fusion
- Airbyte (open-source ELT tool)
- Existing ETL tools
- BigQuery Omni
- Custom solution using the BigQuery API
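For the custom-solution route, a minimal batch-load sketch with the `google-cloud-bigquery` Python client might look like this. `to_json_rows` turns source values such as `datetime` and `Decimal` into JSON-friendly strings, and `load_batch` appends them to a table with a load job. The table ID you pass in and the choice to rely on schema autodetection are assumptions; a real pipeline would usually pin the schema explicitly.

```python
import datetime as dt
import decimal
from typing import Any, Dict, Iterable, List


def _to_json_value(value: Any) -> Any:
    # Dates/times become ISO-8601 strings; Decimals become strings so no precision is lost.
    if isinstance(value, (dt.datetime, dt.date)):
        return value.isoformat()
    if isinstance(value, decimal.Decimal):
        return str(value)
    return value


def to_json_rows(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Convert source rows into JSON-serialisable dicts for load_table_from_json."""
    return [{key: _to_json_value(val) for key, val in row.items()} for row in rows]


def load_batch(table_id: str, rows: Iterable[Dict[str, Any]]) -> int:
    """Append rows to `table_id` with a BigQuery load job; return rows loaded."""
    from google.cloud import bigquery  # only needed when actually loading

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        autodetect=True,  # assumption: let BigQuery infer the schema
    )
    job = client.load_table_from_json(to_json_rows(rows), table_id, job_config=job_config)
    job.result()  # block until the job finishes; raises if it failed
    return job.output_rows
```

For example, `load_batch("my-project.dw.orders", rows)` (a hypothetical table ID) returns the number of rows BigQuery reports as loaded.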
Things to think about: preparing your data
- PK (primary key)
- Data-modification column (e.g. a "last updated" timestamp)
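Why both the PK and the data-modification column matter can be sketched as an incremental extraction: each run pulls only rows whose `updated_at` is at or after the last watermark, and the PK lets the target keep just the newest version of each row even when a row is delivered twice. The `orders(id, status, updated_at)` source table is hypothetical, `sqlite3` stands in for the source database, and timestamps are ISO-8601 strings in one fixed format so they compare correctly as text.

```python
import sqlite3
from typing import Dict, List, Tuple

Row = Tuple[int, str, str]  # (id, status, updated_at) in the hypothetical source table


def extract_changes(conn: sqlite3.Connection, watermark: str) -> Tuple[List[Row], str]:
    """Return rows modified at or after `watermark`, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at >= ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    new_watermark = rows[-1][2] if rows else watermark
    return rows, new_watermark


def upsert_by_pk(replica: Dict[int, Tuple[str, str]], rows: List[Row]) -> Dict[int, Tuple[str, str]]:
    """Keep only the newest version of each row, keyed by its primary key."""
    for row_id, status, updated_at in rows:
        current = replica.get(row_id)
        if current is None or updated_at >= current[1]:
            replica[row_id] = (status, updated_at)
    return replica


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT)")
    src.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "new", "2024-01-01T10:00:00"), (2, "new", "2024-01-01T11:00:00")],
    )
    replica: Dict[int, Tuple[str, str]] = {}
    batch, watermark = extract_changes(src, "1970-01-01T00:00:00")  # first run: all rows
    upsert_by_pk(replica, batch)

    src.execute("UPDATE orders SET status = 'paid', updated_at = '2024-01-02T09:00:00' WHERE id = 1")
    batch, watermark = extract_changes(src, watermark)  # next run: changed + boundary rows
    print(upsert_by_pk(replica, batch))
```

Using `>=` rather than `>` deliberately re-reads rows that share the watermark timestamp, so a row written later with the same timestamp is not skipped; the PK-based upsert makes those repeats harmless.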
Things to think about: batch vs streaming import
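The batch vs streaming choice is largely about how often rows are handed to BigQuery. The sketch below is a hypothetical micro-batcher: with `max_rows=1` every row is sent immediately (streaming-like), while a large `max_rows` accumulates rows into bigger batches. The `flush_fn` callback is where a real pipeline would call the BigQuery client, for example `insert_rows_json` for streaming or a load job for batches; that wiring is an assumption and is left out.

```python
from typing import Any, Callable, Dict, List


class MicroBatcher:
    """Buffer rows and hand them to `flush_fn` in chunks of at most `max_rows`."""

    def __init__(self, flush_fn: Callable[[List[Dict[str, Any]]], None], max_rows: int = 500):
        if max_rows < 1:
            raise ValueError("max_rows must be >= 1")
        self._flush_fn = flush_fn
        self._max_rows = max_rows
        self._buffer: List[Dict[str, Any]] = []

    def add(self, row: Dict[str, Any]) -> None:
        # Flush as soon as the buffer is full; max_rows=1 means "send every row now".
        self._buffer.append(row)
        if len(self._buffer) >= self._max_rows:
            self.flush()

    def flush(self) -> None:
        # Send whatever is buffered, e.g. on shutdown or from a periodic timer.
        if self._buffer:
            rows, self._buffer = self._buffer, []
            self._flush_fn(rows)


if __name__ == "__main__":
    sent: List[List[Dict[str, Any]]] = []
    batcher = MicroBatcher(sent.append, max_rows=2)
    for i in range(5):
        batcher.add({"id": i})
    batcher.flush()
    print([len(chunk) for chunk in sent])
```

Streaming makes data queryable sooner but its ingestion is billed, while batch load jobs are generally free to run, so larger batches tend to be cheaper when some latency is acceptable; check current BigQuery pricing and quotas before relying on this.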
Things to think about: handling data modifications
- Update instantly (not a good idea)
- Batch update
- Views (or materialized views)
- …a mix of the above
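The views approach to handling modifications can be sketched like this: every change is appended to a change table instead of being updated in place, and a view exposes only the latest version of each row, hiding rows whose latest change is a delete. `sqlite3` stands in for BigQuery and all names are hypothetical; in BigQuery the same filter can be written with `QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) = 1`. The window function used here needs SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical append-only change table plus a "current state" view on top of it.
SETUP = """
CREATE TABLE orders_changes (
    id         INTEGER,
    status     TEXT,
    updated_at TEXT,
    is_deleted INTEGER DEFAULT 0
);
-- Keep only the newest change per id, and drop ids whose newest change is a delete.
CREATE VIEW orders_current AS
SELECT id, status, updated_at
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
    FROM orders_changes
)
WHERE rn = 1 AND is_deleted = 0;
"""


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    conn.executemany(
        "INSERT INTO orders_changes (id, status, updated_at, is_deleted) VALUES (?, ?, ?, ?)",
        [
            (1, "new", "2024-01-01T10:00:00", 0),
            (2, "new", "2024-01-01T11:00:00", 0),
            (1, "paid", "2024-01-02T09:00:00", 0),  # update of order 1
            (2, "new", "2024-01-03T08:00:00", 1),   # delete of order 2
        ],
    )
    return conn.execute("SELECT * FROM orders_current ORDER BY id").fetchall()


if __name__ == "__main__":
    print(demo())
```

Inserts stay cheap because nothing is updated in place; the cost moves to read time. A periodic batch job that compacts the change table (for example with a `MERGE` into a current-state table) is one possible way to combine this with batch updates.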
Massaging data in BigQuery (Transform)
- Scheduled queries
- Cloud Tasks
- Composer (Airflow)
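Whatever triggers the transformation (a scheduled query, Cloud Tasks, or a Composer/Airflow DAG), the transform itself should be safe to run more than once, since jobs may be retried. A minimal sketch of an idempotent transform: recompute a summary table from scratch on every run. `sqlite3` stands in for BigQuery and the `sales` and `daily_revenue` tables are hypothetical; in BigQuery, one common way to express this is a single `CREATE OR REPLACE TABLE ... AS SELECT ...` statement.

```python
import sqlite3


def rebuild_daily_revenue(conn: sqlite3.Connection) -> None:
    """Recompute the summary from the source table so re-runs give the same result.

    SQLite has no CREATE OR REPLACE TABLE, so the sketch drops and recreates the table.
    """
    conn.execute("DROP TABLE IF EXISTS daily_revenue")
    conn.execute(
        """
        CREATE TABLE daily_revenue AS
        SELECT order_date, SUM(total) AS revenue, COUNT(*) AS orders
        FROM sales
        GROUP BY order_date
        """
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (order_date TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("2024-01-01", 10.0), ("2024-01-01", 5.0), ("2024-01-02", 7.5)],
    )
    rebuild_daily_revenue(conn)
    rebuild_daily_revenue(conn)  # a retry: the result does not change
    print(conn.execute("SELECT * FROM daily_revenue ORDER BY order_date").fetchall())
```

An incremental `INSERT` of "today's" totals would double-count on a retry; rebuilding (or replacing one partition) avoids that at the cost of recomputing more data.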
Using data in BigQuery (Use)
- Data Studio
- Looker
- ML models (BigQuery ML or Vertex AI)
- …or any other tool you like
Teamwork Example
Google BigQuery: tons of cool features
- Embedded ML and predictive modeling
- Interactive data analysis with BI Engine
- Multicloud data analysis with BigQuery Omni
- Federated queries and a logical DW
Bonus: AlloyDB
A fully managed PostgreSQL-compatible database service for your most demanding enterprise database workloads.
- Fully compatible with PostgreSQL, providing flexibility and true portability for your workloads
- Superior performance: 4X faster than standard PostgreSQL for transactional workloads
- Fast, real-time insights: up to 100X faster analytical queries than standard PostgreSQL
https://cloud.google.com/alloydb
Join in
PayPal: nikulchenko@gmail.com
Revolut: https://revolut.me/artemwvzv
Card: 5375 4141 2884 6630
Taziky ("Тазики") provide vehicles for the Armed Forces of Ukraine (ZSU).
More than 170 "taziky" handed over so far. Full throttle ahead!
Telegram: https://t.me/rooh_uk
Thank You!
Artem Nikulchenko
https://www.linkedin.com/in/artem-nikulchenko/
https://medium.com/@an_14796
