PostgreSQL - Scaling in Fashion, Valentine Gogichashvili (Zalando SE)
Scaling fashionably 
How PostgreSQL helped 
Zalando to become one of the 
biggest online fashion retailers 
in Europe
About me 
Valentine Gogichashvili 
Database Engineer @Zalando 
twitter: @valgog 
google+: +valgog 
email: valentine.gogichashvili@zalando.de
One of Europe's largest 
online fashion retailers 
15 countries 
3 fulfillment centers 
13.7+ million active customers 
1.8 billion € revenue 2013 
150,000+ products 
640+ million visits in first half-year 2014
Some more numbers 
200+ deployment units (WARs) 
1300+ production tomcat instances 
80+ database master instances 
90+ different database schemas 
300+ developers and 200+ QA and PM 
10 database engineers
Even more numbers 
● > 4.0 TB of PostgreSQL data 
● Biggest instances (not counted before) 
○ eventlogdb (3TB) 
■ 20 GB per week 
○ riskmgmtdb (5TB) 
■ 12 GB per day
Biggest challenges 
● Constantly growing 
● Fast development cycles 
● No downtimes are tolerated
Agenda 
How we 
● access data 
● change data models without downtimes 
● shard without limits 
● monitor
Accessing data 
- customer 
- bank account 
- order -> bank account 
- order position 
- return order -> order 
- return position -> order position 
- financial document 
- financial transaction -> order
Accessing data 
NoSQL 
▶ map your object hierarchy to a document 
▶ (de-)serialization is easy 
▶ transactions are not needed 
▷ No SQL 
▷ implicit schemas are tricky
Accessing data 
ORM 
▶ is well known to developers 
▶ CRUD operations are easy 
▶ all business logic inside your application 
▶ developers are in their comfort zone 
▷ error prone transaction management 
▷ you have to reflect your tables in your code 
▷ all business logic inside your application 
▷ schema changes are not easy
Accessing data 
Are there alternatives to ORM? 
Stored Procedures 
▶ return/receive entity aggregates 
▶ clear transaction scope 
▶ more data consistency checks 
▶ independent from underlying data schema
Java Application -> JDBC -> Database Tables
Java Application -> JDBC -> Stored Procedure API -> Database Tables
Java Sproc Wrapper
Java Application -> Sproc Wrapper -> JDBC -> Stored Procedure API -> Database Tables
Java Sproc Wrapper
JAVA
    @SProcService
    public interface CustomerSProcService {
        @SProcCall
        int registerCustomer(@SProcParam String email,
                             @SProcParam Gender gender);
    }
SQL
    CREATE FUNCTION register_customer(p_email text,
                                      p_gender z_data.gender)
    RETURNS int
    AS $$
        INSERT INTO z_data.customer (c_email, c_gender)
        VALUES (p_email, p_gender)
        RETURNING c_id
    $$ LANGUAGE 'sql' SECURITY DEFINER;
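The Java method registerCustomer and the SQL function register_customer differ only in naming style, which suggests a simple name mapping between the two sides. The following is a minimal sketch of such a mapping, not the library's actual implementation: the class SprocCallSketch, its methods, and the choice of `SELECT * FROM f(?, ...)` as the call form are assumptions made for illustration.

```java
import java.util.Collections;

// Hypothetical helper, not part of java-sproc-wrapper: shows one way a wrapper
// could derive a parameterized SQL call from a Java method name.
final class SprocCallSketch {

    // Converts camelCase to snake_case, e.g. registerCustomer -> register_customer.
    static String toSnakeCase(String camelCase) {
        StringBuilder sb = new StringBuilder();
        for (char c : camelCase.toCharArray()) {
            if (Character.isUpperCase(c)) {
                if (sb.length() > 0) {
                    sb.append('_');
                }
                sb.append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds a statement with one "?" placeholder per parameter,
    // in a form that could be passed to JDBC's Connection.prepareStatement.
    static String buildCall(String javaMethodName, int parameterCount) {
        String placeholders = String.join(", ", Collections.nCopies(parameterCount, "?"));
        return "SELECT * FROM " + toSnakeCase(javaMethodName) + "(" + placeholders + ")";
    }
}
```

Under these assumptions, `SprocCallSketch.buildCall("registerCustomer", 2)` yields a statement that targets register_customer with two bind parameters, so the application never concatenates values into SQL.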
Java Sproc Wrapper
SQL
    CREATE FUNCTION find_orders(p_email text,
                                OUT order_id int,
                                OUT order_created timestamptz,
                                OUT shipping_address order_address)
    RETURNS SETOF record
    AS $$
        SELECT o_id, o_created, ROW(oa_street, oa_city, oa_country)::order_address
          FROM z_data."order"
          JOIN z_data.order_address ON oa_order_id = o_id
          JOIN z_data.customer ON c_id = o_customer_id
         WHERE c_email = p_email
    $$ LANGUAGE 'sql' SECURITY DEFINER;
JAVA
    @SProcCall
    List<Order> findOrders(@SProcParam String email);
Stored Procedures 
for developers 
▷ CRUD operations need too much code 
▷ Developers have to learn SQL 
▷ Developers can write bad SQL 
▷ Code reviews are needed 
▶ Use-case driven 
▶ Developers have to learn SQL 
▶ Developers learn how to write good SQL
Horror story 
▷ Never map your data manually 
▷ Educate developers 
▷ Educate yourself
Stored Procedure
API versioning
● Applications run with search_path = api_v14_23, public; only the api_v14_23 schema sits on top of the database tables
● api_v14_24 is deployed next to api_v14_23; applications still use search_path = api_v14_23, public;
● During rollout, old instances use search_path = api_v14_23, public; and new instances use search_path = api_v14_24, public;
● All instances use search_path = api_v14_24, public; both API schemas still exist
● Only api_v14_24 remains on top of the database tables
Stored Procedure 
API versioning 
▶ Tests are done to the whole API version 
▶ No API migrations needed 
▶ Deployments are fully automated
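Pinning an application release to its own API schema comes down to running one SET search_path statement per connection. The sketch below shows how that statement could be derived from a release number. The class ApiVersionSketch, its methods, and the "14.24" release string format are hypothetical; only the schema names api_v14_23 and api_v14_24 and the search_path values come from the slides.

```java
// Hypothetical helper: the release string format ("14.24") is an assumption,
// chosen so that it maps onto the schema names shown in the slides.
final class ApiVersionSketch {

    // Maps a release such as "14.24" to the schema name "api_v14_24".
    static String schemaFor(String release) {
        // Reject anything that is not <digits>.<digits>, since the value ends up in SQL text.
        if (!release.matches("\\d+\\.\\d+")) {
            throw new IllegalArgumentException("unexpected release format: " + release);
        }
        return "api_v" + release.replace('.', '_');
    }

    // Statement a connection pool could run on every new connection
    // so that unqualified function calls resolve to this release's API schema.
    static String searchPathStatement(String release) {
        return "SET search_path TO " + schemaFor(release) + ", public";
    }
}
```

Because each release resolves its stored procedures through its own schema, old and new application instances can run against the same database at the same time.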
Agenda 
How we 
● access data 
● change data models without downtimes 
● shard without limits 
● monitor
Easy schema changes 
PostgreSQL 
▶ Schema changes with minimal locks with: 
ADD/RENAME/DROP COLUMN 
ADD/DROP DEFAULT VALUE 
▶ CREATE/DROP INDEX CONCURRENTLY 
▷ Constraints are still difficult to ALTER 
(will be much better in PostgreSQL 9.4)
Easy schema changes 
Stored Procedure API layer 
▶ Can fill missing data on the fly 
▶ Helps to change data structure 
without application noticing it
Easy schema changes 
● Read and write to old structure 
● Write to both structures, old and new. 
Try to read from new, fallback to old 
● Migrate data 
● Read from new, write to old and new
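The phases above can be sketched with two in-memory maps standing in for the old and new table structures. This is an illustration only: in the setup described in the talk, this logic would sit in the stored procedure API layer, and the class DualWriteSketch, the Phase names, and the map-based stores are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// In-memory stand-ins for the old and new structures; all names are hypothetical.
final class DualWriteSketch {
    enum Phase { OLD_ONLY, WRITE_BOTH_READ_NEW_WITH_FALLBACK, READ_NEW_WRITE_BOTH }

    final Map<Integer, String> oldStore = new HashMap<>();
    final Map<Integer, String> newStore = new HashMap<>();
    Phase phase = Phase.OLD_ONLY;

    void write(int id, String value) {
        // The old structure is written in every phase, so a rollback stays possible.
        oldStore.put(id, value);
        if (phase != Phase.OLD_ONLY) {
            newStore.put(id, value);
        }
    }

    Optional<String> read(int id) {
        switch (phase) {
            case OLD_ONLY: {
                return Optional.ofNullable(oldStore.get(id));
            }
            case WRITE_BOTH_READ_NEW_WITH_FALLBACK: {
                // Rows not yet migrated exist only in the old structure.
                String fromNew = newStore.get(id);
                return Optional.ofNullable(fromNew != null ? fromNew : oldStore.get(id));
            }
            default: {
                // READ_NEW_WRITE_BOTH: migration is complete, the new structure is authoritative.
                return Optional.ofNullable(newStore.get(id));
            }
        }
    }

    // "Migrate data": copy rows that exist only in the old structure.
    void migrate() {
        oldStore.forEach(newStore::putIfAbsent);
    }
}
```

The caller's view of read and write never changes between phases, which is the point of hiding the data structure behind an API layer.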
Easy schema changes 
Schema changes using SQL script files 
● SQL scripts written by developers (DBDIFFs) 
● registering DBDIFFs with Versioning 
● should be reviewed by DB guys 
● DB guys are rolling DB changes on the live system
Easy schema changes
DBDIFF SQL
    BEGIN;
    SELECT _v.register_patch('ZEOS-5430.order');
    CREATE TABLE z_data.order_address (
        oa_id serial,  -- serial already implies an integer column
        oa_country z_data.country,
        oa_city varchar(64),
        oa_street varchar(128), ...
    );
    ALTER TABLE z_data."order" ADD o_shipping_address_id int
        REFERENCES z_data.order_address (oa_id);
    COMMIT;
Easy schema changes
DBDIFF SQL
    BEGIN;
    SELECT _v.register_patch('ZEOS-5430.order');
    \i order/database/order/10_tables/10_order_address.sql
    SET statement_timeout TO '3s';
    ALTER TABLE z_data."order" ADD o_shipping_address_id int
        REFERENCES z_data.order_address (oa_id);
    COMMIT;
Easy schema changes 
No downtime due to migrations or 
deployment since we use PostgreSQL
Easy schema changes 
One downtime due to migrations or 
deployment since we use PostgreSQL
Horror story 
▷ Invest in staging environments 
▷ Do not become a bottleneck for developers 
▷ Educate developers 
▷ Educate yourself
Agenda 
How we 
● access data 
● change data models without downtimes 
● shard without limits 
● monitor
One big database 
▶ Joins between any entities 
▶ Perfect for BI 
▶ Simple access strategy 
▶ Fewer machines to manage
One big database 
▷ Data does not fit into memory 
▷ OLTP becomes slower 
▷ Longer data migration times 
▷ Database maintenance tasks take longer
Sharded database 
▶ Data fits into memory 
▶ I/O bottleneck is wider
▶ OLTP is fast again 
▶ Data migrations are faster 
▶ Database maintenance tasks are faster
Sharded database 
▷ Joins only between entity aggregates
▷ BI needs more tooling
▷ Accessing data needs more tooling 
▷ Managing more servers needs more tooling
Sharded database 
▷ Need more tooling
Sharding without limits
Java Application -> Sproc Wrapper -> Stored Procedure API -> Database Tables
Java Application -> Sproc Wrapper -> Shard 1, Shard 2, Shard 3, ... Shard N
Sharding with Java Sproc Wrapper
JAVA
    @SProcCall
    int registerCustomer(@SProcParam @ShardKey CustomerNumber customerNumber,
                         @SProcParam String email,
                         @SProcParam Gender gender);
JAVA
    @SProcCall
    Article getArticle(@SProcParam @ShardKey Sku sku);
JAVA
    @SProcCall(runOnAllShards = true, parallel = true)
    List<Order> findOrders(@SProcParam String email);
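A call marked runOnAllShards = true, parallel = true has no shard key, so the same procedure has to run on every shard and the partial results have to be combined. The following is a rough sketch of that fan-out pattern, not the wrapper's real code: the class AllShardsSketch, the generic shard type S, and the use of CompletableFuture on the common pool are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical fan-out helper; a shard can be anything the query function understands,
// for example a DataSource in a real application.
final class AllShardsSketch {

    // Runs the same query against every shard concurrently and
    // concatenates the partial results in shard order.
    static <S, R> List<R> runOnAllShards(List<S> shards, Function<S, List<R>> query) {
        List<CompletableFuture<List<R>>> futures = new ArrayList<>();
        for (S shard : shards) {
            // supplyAsync without an executor uses the common ForkJoinPool.
            futures.add(CompletableFuture.supplyAsync(() -> query.apply(shard)));
        }
        List<R> merged = new ArrayList<>();
        for (CompletableFuture<List<R>> future : futures) {
            // join() blocks until that shard has answered and rethrows its failure, if any.
            merged.addAll(future.join());
        }
        return merged;
    }
}
```

The total latency is roughly that of the slowest shard rather than the sum over all shards, which is why this strategy remains usable as a fallback when no shard key is available.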
Sharding with Java Sproc Wrapper 
Entity lookup strategies 
● search on all shards (in parallel) 
● hash lookups 
● unique shard aware ID 
○ Virtual Shard IDs (pre-sharding)
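Hash lookups, shard-aware IDs, and virtual shard IDs can be combined in a few lines. The sketch below is one possible scheme under stated assumptions, not the scheme used at Zalando: 64 virtual shards, the virtual shard stored in the low 6 bits of a generated ID, and a modulo mapping from virtual to physical shards are all chosen for illustration, as are the class and method names.

```java
// Hypothetical scheme: 64 virtual shards, fixed up front (pre-sharding).
final class VirtualShardSketch {
    static final int SHARD_BITS = 6;
    static final int VIRTUAL_SHARDS = 1 << SHARD_BITS; // 64

    // Hash lookup: derive the virtual shard from a natural key such as an e-mail address.
    static int virtualShardOfKey(String key) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), VIRTUAL_SHARDS);
    }

    // Shard-aware ID: keep the virtual shard in the low bits of the generated ID.
    static long shardAwareId(long sequenceValue, int virtualShard) {
        return (sequenceValue << SHARD_BITS) | virtualShard;
    }

    // Recover the virtual shard from an ID without any lookup table.
    static int virtualShardOfId(long shardAwareId) {
        return (int) (shardAwareId & (VIRTUAL_SHARDS - 1));
    }

    // Map a virtual shard onto however many physical databases currently exist.
    static int physicalShardOf(int virtualShard, int physicalShards) {
        return virtualShard % physicalShards;
    }
}
```

With the modulo mapping, doubling the number of physical databases splits each existing database in two: a virtual shard on database p either stays on p or moves to p plus the old database count, and the IDs themselves never change.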
Agenda 
How we 
● access data 
● change data models without downtimes 
● shard without limits 
● monitor
Monitoring
pg_view
Monitoring 
● Nagios/Icinga was replaced by ZMON2, a custom monitoring infrastructure
● Dedicated 24x7 monitoring team
PGObserver
Links 
SProcWrapper – Java library for stored procedure access 
github.com/zalando/java-sproc-wrapper 
PGObserver – monitoring web tool for PostgreSQL 
github.com/zalando/PGObserver 
pg_view – top-like command line activity monitor 
github.com/zalando/pg_view
Thank you!
