Data Architecture at VEX
Talk Of The Minds
Wout Scheepers
Vente-Exclusive.com: Market leader in the Benelux
Members (map): 2,7M (54% of members), 2,3M (45% of members), 40K (1% of members)
The Benelux market = 29M people with high purchasing power
Key figures 2016
> 6 M members in the Benelux
Up to 230 000 unique visitors per day
> 2 500 partner brands in Europe
Founded in 2007
> 300 staff in Brussels & Amsterdam
€ 126 M turnover in 2016* (+ 54% vs. 2015)
* NET turnover: VAT excluded, after forced cancellations, user cancellations, discounts, shipping fees and returns excluded
Meet the IT team
5 squads (~50 people)
Customer facing: shop front- and backend
Logistics: warehouse software, deliveries
ESPN: back office for employees to configure shop, manage sales, customer care, …
Operations: company-wide IT support, shop issues
Data: business intelligence
Meet the data team
1 product manager
3 data engineers
2 data scientists
1 Tableau expert
Our responsibilities
● Provide the business with valuable information for decision making (KPIs & dashboards)
● Provide analysts with uniform, queryable data (data warehouse)
● Relevance (recommendation, sale ranking, competitor pricing...)
A major growth supported by strategic alliances
FRANCE
SPAIN
ITALY
UK
SPAIN
ITALY
SWITZERLAND
POLAND
BELGIUM
NETHERLANDS
LUXEMBOURG
GERMANY
AUSTRIA
DENMARK
Geographic expansion to Germany, Austria & Scandinavia
COPENHAGEN
We will need to scale our multichannel e-commerce platform
More customers & sales → horizontal scaling
More geographic locations → geographical scaling
https://cloud.google.com/kubernetes-engine/kubernetes-comic/
Scaling the business
Monolith architecture
● One large application
● Single production database
● Dedicated machines
Drawbacks
● Integration nightmares (hope all parts keep working together)
● Deployment nightmares (hope the platform does not go down)
Scaling the business
Scaling the monolith...
Horizontal: Brussels
Geographical: Brussels, Amsterdam
Leads to...
● Inconsistencies
● Inefficient resource allocation
… while we already had BI challenges to fix
1. Discovery
Reporting and production data mixed in a single database
→ Hard for analysts to find the right reporting data
Fix: uniform data warehouse
2. Consistency
● No single definition of KPIs
● Analysts write different calculations from different data sources
→ Inconsistencies
Fix: precalculated KPIs
3. Efficiency
● Redundant recalculations
● Redeveloping queries
→ Waste of human and computing resources
Fix: precalculated KPIs
4. Availability
Increasing use cases for real-time data
→ Hard to deliver without affecting system performance
Fix: streaming KPI pipelines
Scaling the business
Our solution: microservices (from monolith architecture to microservice architecture)
• Small, modular services
• Each is a unique process that serves a business goal
• Independently deployable
Scaling the business: microservices
Diagram: production database, MongoDB database, Cloud SQL database
Microservice challenges
• Management overhead
• Need well-defined communication between services
+ Big challenge for Business Intelligence
• Need to collect and merge data from multiple sources
• NoSQL databases are not suitable for analytical queries
Original platform architecture
● Monolithic .NET application
● Single production database
● Dedicated machines
● Data copied to reporting server nightly
● Most analysis in SQL & Excel
Diagram: .NET application, production database, reporting database, SQL, Microsoft Excel
Current architecture: adopted GCP for large data sources
Diagram: .NET application, production database, reporting database, Cloud Storage, BigQuery, connected by nightly table transfers
Apache Airflow
Batch orchestration
Open source, developed at Airbnb
Extract, Transform, Load (ETL)
DAGs (directed acyclic graphs) define the sequence of tasks
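To make the DAG idea concrete, here is a minimal sketch in plain Python. It does not use Airflow itself; the task names are hypothetical and only illustrate how declaring dependencies is enough to derive a valid run order for a nightly batch.

```python
from graphlib import TopologicalSorter

# Hypothetical nightly ETL tasks; each task maps to the set of tasks it depends on.
dag = {
    "extract_orders": set(),
    "extract_members": set(),
    "upload_to_storage": {"extract_orders", "extract_members"},
    "load_into_warehouse": {"upload_to_storage"},
    "refresh_kpis": {"load_into_warehouse"},
}


def run_order(dag):
    """Return one valid execution order in which every task follows its dependencies."""
    return list(TopologicalSorter(dag).static_order())


order = run_order(dag)
print(order)
```

An orchestrator adds scheduling, retries and monitoring on top of this ordering; the sketch only covers the sequencing part.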
Data Architecture at Vente-Exclusive.com - TOTM Exellys
Current architecture: adopted GCP for large data sources
Diagram: .NET application, production database, reporting database, Cloud Storage, BigQuery, nightly table transfers, plus channel interactions as an additional data source
Google BigQuery
● Analytics data warehouse
● Zero configuration: no worries about memory, network, CPU or disk
● Petabyte scale
● VEX: ~16 TB of data, ~700 TB queried in one month
Google BigQuery
● Based on Google Dremel
● Parallel query execution:
1. Columnar Storage
→ high compression ratio and scan
throughput
2. Tree Architecture
→ dispatching queries and aggregating
results across thousands of machines
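The two ideas above can be sketched in a few lines of plain Python. This is a toy illustration with made-up data, not how BigQuery is implemented: `to_columns` shows why a columnar layout lets a query read only the column it needs, and `tree_sum` shows partial results being merged level by level, as a serving tree would.

```python
# Row-oriented records, as a production database would store them (made-up data).
rows = [
    {"order_id": 1, "country": "BE", "amount": 40.0},
    {"order_id": 2, "country": "NL", "amount": 25.0},
    {"order_id": 3, "country": "BE", "amount": 10.0},
    {"order_id": 4, "country": "LU", "amount": 5.0},
]


def to_columns(rows):
    """Pivot rows into one list per column, so a query can scan only what it needs."""
    return {name: [r[name] for r in rows] for name in rows[0]}


def tree_sum(values, fanout=2):
    """Sum values level by level: leaves compute partial sums, parents merge them."""
    level = list(values)
    while len(level) > 1:
        level = [sum(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0] if level else 0


columns = to_columns(rows)
total = tree_sum(columns["amount"])  # only the "amount" column is read
print(total)
```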
Hope you are not easily impressed
How long would it take to read 80 GB from a hard drive at 100 MB/s?
~ 80 000 / 100 = 800 s = 13.33 min
What if we use an SSD (700 MB/s)?
~ 80 000 / 700 = 114 s ≈ 1.9 min
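The arithmetic can be checked with a short script. It assumes 1 GB = 1000 MB, as the slide does, and a purely sequential read at the stated throughput.

```python
def read_time_seconds(size_gb, throughput_mb_per_s):
    """Sequential read time, using 1 GB = 1000 MB as on the slide."""
    return size_gb * 1000 / throughput_mb_per_s


hdd = read_time_seconds(80, 100)
ssd = read_time_seconds(80, 700)
print(f"HDD: {hdd:.0f} s = {hdd / 60:.2f} min")  # HDD: 800 s = 13.33 min
print(f"SSD: {ssd:.0f} s = {ssd / 60:.2f} min")  # SSD: 114 s = 1.90 min
```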
Business intelligence managed using Tableau
Diagram: .NET application, production database, reporting database, Cloud Storage, BigQuery, nightly table transfers, channel interactions, SparkPost (e-mail), ADYEN (payments), Tableau Server, Microsoft Excel
Personalization using PySpark on Dataproc, and BigQuery
Diagram: .NET applications, production database, reporting database, Cloud Storage, BigQuery, nightly transfers, Cloud Dataproc, Tableau Server, Microsoft Excel, SparkPost (e-mail), ADYEN (payments), channel interactions
Relevance calculations, e.g. sale ranking
Why the cloud?
We use Google Cloud Platform (PaaS)
Main advantages, and what they enable us to do:
● Managed products: state-of-the-art developer products that integrate well
● Managed infrastructure: no worrying about infrastructure
→ Focus on solving the application challenges at hand
Also, we only pay for the resources we use!
Disadvantage: we depend on Google...
New microservice architecture
● Monolithic application is decomposed into microservices
● New architecture allows us to scale quickly, both horizontally and geographically
● Team standardized on .NET Core, Angular 2 and (mostly) MongoDB
● Each service has its own (No)SQL database
Diagram: production microservices on Kubernetes: front-end (Angular), back-office (Angular), back-end (.NET Core), databases (MongoDB)
Containers?
A lightweight, stand-alone, executable package of a piece of software that includes everything needed to run it: code, runtime, system tools, system libraries, settings
Kubernetes?
Open-source system for automating deployment, scaling, and management of containerized applications
Example: the Pokémon GO launch
CI/CD
Continuous integration/continuous deployment
→ New features can be developed and deployed independently
E.g. payment microservice rolling update: instances running P1.0 are replaced step by step by P1.1 until only P1.1 remains
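The rolling update of the payment service can be simulated in plain Python. This is only a sketch of the idea: the replica count of three is an assumption, and in practice Kubernetes performs the replacement itself rather than application code.

```python
def rolling_update(pods, new_version):
    """Replace pods one at a time, yielding the fleet after every step."""
    fleet = list(pods)
    yield list(fleet)
    for i in range(len(fleet)):
        fleet[i] = new_version  # a new pod takes over, then the old one is retired
        yield list(fleet)


steps = list(rolling_update(["P1.0", "P1.0", "P1.0"], "P1.1"))
for step in steps:
    print(step)
```

At every step the service keeps its full number of instances, which is why users should not notice the deployment.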
Microservices pros/cons
+ Scaling
+ CI/CD
+ If something breaks, fix it in one place
- Container management and deployment: bumpy road
- Communication between services → contracts
- Business intelligence: data collection + aggregation
Back-Office Sale Progress
One back-office screen requires information from multiple services
Product catalog, Pricing, Stock, Orders, Clicks
Data collection using Event Sourcing
Diagram: production microservices on Kubernetes (front-end in Angular, back-office in Angular, back-end in .NET Core, MongoDB databases) and Cloud Pub/Sub for event sourcing
Event Sourcing with PubSub
Message-oriented middleware
Many-to-many, asynchronous
• Data is published onto a
topic
• Data can be pulled through
a subscription on this topic
Open source alternative: Kafka
Example: the Membership microservice publishes “member-created”; the Messaging microservice receives it and sends a welcome email
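The topic/subscription model can be sketched with a toy in-memory broker. This is not the Cloud Pub/Sub client library; the `Broker` class, the subscription names and the e-mail address are hypothetical, and only the "member-created" event comes from the slide.

```python
from collections import defaultdict, deque


class Broker:
    """Toy in-memory broker: topics fan out to subscriptions, which are pulled."""

    def __init__(self):
        self._subscriptions = defaultdict(dict)  # topic -> {subscription name: queue}

    def subscribe(self, topic, name):
        self._subscriptions[topic][name] = deque()

    def publish(self, topic, message):
        # Every subscription on the topic gets its own copy of the message.
        for queue in self._subscriptions[topic].values():
            queue.append(message)

    def pull(self, topic, name):
        queue = self._subscriptions[topic][name]
        return queue.popleft() if queue else None


broker = Broker()
broker.subscribe("member-created", "messaging")
broker.subscribe("member-created", "data-team")

# The membership service publishes; it does not know who is listening.
broker.publish("member-created", {"member_id": 42, "email": "new.member@example.com"})

# The messaging service pulls from its own subscription and reacts.
event = broker.pull("member-created", "messaging")
welcome = f"Welcome email sent to {event['email']}"
print(welcome)
```

Because each subscription has its own queue, a second consumer such as a data pipeline can read the same events without the publishing service changing.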
Data collection using Event Sourcing
Diagram: production microservices on Kubernetes and Cloud Pub/Sub (event sourcing), now with an entity builder (Dataflow streaming), Cloud Bigtable and BigQuery
Google Dataflow / Apache Beam
Unified model for streaming and batch pipelines
for processing large datasets
Focus on logical composition instead of physical orchestration
→ focus on what instead of how
Useful abstractions: distribution, coordinating workers, data
sharding, ...
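The "unified model" idea can be illustrated without the Beam SDK: the same logical composition of steps is applied to a bounded list and to an unbounded generator. This is a plain-Python sketch with a made-up event format, not Beam or Dataflow code.

```python
from itertools import islice


def pipeline(events):
    """Logical composition: parse, filter, transform. Works on any iterable."""
    parsed = (e.strip().split(",") for e in events)
    orders = (fields for fields in parsed if fields[0] == "order")
    return (float(fields[1]) for fields in orders)


# Bounded input (batch).
batch = ["order,10.0", "click,0", "order,5.5"]


def stream():
    """Unbounded input (simulated): yields events forever."""
    while True:
        yield "order,1.0"
        yield "click,0"


batch_total = sum(pipeline(batch))
stream_sample = list(islice(pipeline(stream()), 3))
print(batch_total, stream_sample)
```

The pipeline definition says what happens to each event; how the work is distributed over workers is left to the runner.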
Google BigTable
● Massively Scalable NoSQL
● Key value store
● 3 dimensions: rows, columns, time
● Simultaneously read and write
● Large throughput, minimal latency
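The three dimensions can be pictured with a toy in-memory model. This is not the Bigtable client API; `TinyTable`, the row key format and the column name are hypothetical and only show how a cell is addressed by row, column and timestamp.

```python
from collections import defaultdict


class TinyTable:
    """Toy model of a table indexed by row key, column and timestamp."""

    def __init__(self):
        # row key -> column -> list of (timestamp, value), newest first
        self._cells = defaultdict(lambda: defaultdict(list))

    def write(self, row, column, timestamp, value):
        versions = self._cells[row][column]
        versions.append((timestamp, value))
        versions.sort(reverse=True)  # keep the newest version first

    def read_latest(self, row, column):
        versions = self._cells[row][column]
        return versions[0][1] if versions else None

    def history(self, row, column):
        return list(self._cells[row][column])


table = TinyTable()
table.write("member#42", "profile:country", "2017-06-02T20:30", "BE")
table.write("member#42", "profile:country", "2017-12-07T12:17", "NL")

latest = table.read_latest("member#42", "profile:country")
history = table.history("member#42", "profile:country")
print(latest)
print(history)
```

Reading by row key returns the current value directly, while older versions stay available as history.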
Example BigTable schema
member_id auth profile ...member channel ip address ...
member@20170602 20:30
member@20170604 08:26
member@20171207 12:17
member@20171014 14:57
Why BigTable?
● Fast lookups & writes
→ essential for our real-time pipelines!
● Bonus points: store complete history
What is the main difference with BigQuery?
Data is looped back through microservices
Diagram: production microservices on Kubernetes (front-end in Angular, back-office in Angular, back-end in .NET Core, MongoDB databases), Cloud Pub/Sub (event sourcing), entity builder (Dataflow streaming), Cloud Bigtable, data egress (Python / Go)
Large data streams are stored in BigQuery
Diagram: production microservices on Kubernetes (front-end in Angular, back-office in Angular, back-end in .NET Core, MongoDB databases), Cloud Pub/Sub (event sourcing), entity builder (Dataflow streaming), Cloud Bigtable, data egress (Python / Go), data ingress (Python / Go), external events on Cloud Pub/Sub from SparkPost (e-mail) and ADYEN (payments), channel interactions, Cloud Dataflow, BigQuery
How to get from production data to analyst data?
(1) Production data: microservices (MongoDB, Cloud SQL), external data (SparkPost, ADYEN), Cloud Bigtable
(2) Analyst data: BI infrastructure in BigQuery, with entities (raw entity data) and calculations (processed data)
Example: real-time data enrichment
Diagram: Cloud Pub/Sub, Cloud Bigtable, Cloud Dataflow, entity information
Example: unique visitors per country
Diagram: Cloud Bigtable
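Both examples can be combined in one small sketch: raw events are enriched with entity information through a key lookup, then unique visitors are counted per country. The dictionaries below are made-up stand-ins for the event stream and the entity store; the field names are assumptions.

```python
from collections import defaultdict

# Stand-in for the entity store: member_id -> entity information (made-up data).
entities = {
    "m1": {"country": "BE"},
    "m2": {"country": "NL"},
    "m3": {"country": "BE"},
}

# Stand-in for the event stream: page views carrying only a member id.
events = [
    {"member_id": "m1", "page": "/sales"},
    {"member_id": "m2", "page": "/sales"},
    {"member_id": "m1", "page": "/cart"},
    {"member_id": "m3", "page": "/sales"},
]


def enrich(event, entities):
    """Attach entity information to a raw event via a key lookup."""
    return {**event, **entities.get(event["member_id"], {"country": "unknown"})}


def unique_visitors_per_country(events, entities):
    visitors = defaultdict(set)
    for event in events:  # a streaming pipeline would process events as they arrive
        enriched = enrich(event, entities)
        visitors[enriched["country"]].add(enriched["member_id"])
    return {country: len(members) for country, members in visitors.items()}


counts = unique_visitors_per_country(events, entities)
print(counts)  # {'BE': 2, 'NL': 1}
```

The lookup per event is the step that needs a fast key-value store, since it runs once for every incoming event.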
Demo time
Key take-aways
Cloud enables us to do a lot in a short amount of time
Microservices have trade-offs.
For us, scaling is worth it.
Good tooling is very important.
Also make your own tools that are business specific.
Interesting references
● Inside look at Google BigQuery
https://cloud.google.com/files/BigQueryTechnicalWP.pdf
● Comic: CI/CD with Kubernetes
https://cloud.google.com/kubernetes-engine/kubernetes-comic/
● The Children's Illustrated Guide to Kubernetes
https://deis.com/blog/2016/kubernetes-illustrated-guide/
● Netflix microservice architecture
https://www.youtube.com/watch?v=57UK46qfBLY
● Streaming pipelines with Google Dataflow
https://youtu.be/JZPTQrNKsqI
wout.scheepers@exellys.com
