PostgreSQL at Zalando
SQL in Fashion
About me
Valentine Gogichashvili
Head of Data Engineering @ZalandoTech
twitter: @valgog
google+: +valgog
email: valentine.gogichashvili@zalando.de
Jena University Talk 2016.03.09 -- SQL at Zalando Technology
15 countries
4 fulfillment centers
15+ million active customers
2.9 billion € revenue 2015
150,000+ products
9,000+ employees
One of Europe's largest online fashion retailers
Zalando Technology
BERLIN
DORTMUND
DUBLIN
HELSINKI
ERFURT
MÖNCHENGLADBACH
HAMBURG
Zalando Technology
900+ TECHNOLOGISTS
Rapidly growing international team
http://tech.zalando.de
THE HISTORY
Once upon a time...
Started as a tiny online shop
Prototyped on Magento (PHP)
Used MySQL as a database
Web Application
Backend
Database
REBOOT
5½ years ago
● Java
○ macro service architecture with SOAP as RPC layer
● PostgreSQL
○ Heavy usage of Stored Procedures
○ 4 databases + 1 sharded database on 2 shards
● Python for tooling (e.g. code deploy automation)
REBOOT
[Diagram: a Java web frontend calls several Java backend "macro" services; each backend has its own PostgreSQL database (labelled PostgreSQL 9.0 RC1)]
SQL
● human readable query language
● has stood the test of time
● allows automatic optimization of access to data
SQL
● window functions for moving averages etc.
● recursive SQL for searching trees and networks
● non-blocking locking
● DDL locking (schema guarantees)
PostgreSQL
The world's most advanced open-source database
PostgreSQL
● Minimal DDL locks (easy schema changes)
● Nearest neighbour searching using an index
● Block-range indexes
● Serializability (true theoretical serializability)
● JSONB - binary JSON storage, fully indexable
● Data sampling with guarantees on execution time
PostgreSQL
● Read scalability: "Hot standby" replicas
● Synchronous Replication
● Cascading Replication
● Logical Decoding to allow Change Data Capture
● BDR for Multi-Master Scalability
● Write scalability/sharding (is on the way)
PostgreSQL
● Multiple data models (tables, JSON, ROWs, arrays)
● ACID
● Eventual Consistency (on-demand)
Stored Procedures
● clean transaction scope
● very clean data
● processing close to data
● no need for classical ORM mappers
Stored Procedures
● no debugger in Eclipse or IntelliJ
● difficult for projects heavy on CRUD
● versioning automation is needed
Stored Procedures
Java Sproc Wrapper
● very easy to use
● proxies stored procedures as Java method calls
● supports complex type mapping
● supports transparent sharding
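The idea of proxying stored procedures as Java method calls can be illustrated with a small sketch. This is not the Java Sproc Wrapper's actual implementation: the class `SprocProxySketch`, the interface `CustomerService`, and the helpers `toSnakeCase`, `buildCall`, and `createProxy` are hypothetical names, and the camelCase to snake_case naming convention is an assumption based on the examples in this talk. The proxy only returns the SQL text it would run, so the sketch needs no database.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Collections;

class SprocProxySketch {

    /** Hypothetical service interface; it returns the generated SQL for demonstration. */
    public interface CustomerService {
        String registerCustomer(String email, String gender);
    }

    /** Converts a camelCase Java method name to a snake_case function name. */
    public static String toSnakeCase(String methodName) {
        StringBuilder sb = new StringBuilder();
        for (char c : methodName.toCharArray()) {
            if (Character.isUpperCase(c)) {
                sb.append('_').append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds the parameterized SQL for a call with the given number of arguments. */
    public static String buildCall(String methodName, int argCount) {
        String placeholders = String.join(", ", Collections.nCopies(argCount, "?"));
        return "SELECT * FROM " + toSnakeCase(methodName) + "(" + placeholders + ")";
    }

    /** Creates a dynamic proxy that returns the SQL text instead of executing it. */
    public static CustomerService createProxy() {
        InvocationHandler handler = (proxy, method, args) ->
                buildCall(method.getName(), args == null ? 0 : args.length);
        return (CustomerService) Proxy.newProxyInstance(
                CustomerService.class.getClassLoader(),
                new Class<?>[] {CustomerService.class},
                handler);
    }

    public static void main(String[] args) {
        CustomerService service = createProxy();
        // Prints the SQL that a real wrapper would send to the database.
        System.out.println(service.registerCustomer("a@example.com", "FEMALE"));
    }
}
```

A real wrapper would additionally bind the arguments, execute the statement over a connection, and map the result back to Java types.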
Stored Procedures
CREATE FUNCTION register_customer(p_email  text,
                                  p_gender z_data.gender)
RETURNS int
AS $$
    INSERT INTO z_data.customer (c_email, c_gender)
         VALUES (p_email, p_gender)
      RETURNING c_id
$$
LANGUAGE 'sql' SECURITY DEFINER;
SQL
Stored Procedures
@SProcService
public interface CustomerSProcService {
    @SProcCall
    int registerCustomer(@SProcParam String email,
                         @SProcParam Gender gender);
}
JAVA
Stored Procedures
@SProcCall
List<Order> findOrders(@SProcParam String email);
JAVA
CREATE FUNCTION find_orders(p_email text,
                            OUT order_id         int,
                            OUT order_created    timestamptz,
                            OUT shipping_address order_address)
RETURNS SETOF record
AS $$
    SELECT o_id, o_created,
           ROW(oa_street, oa_city, oa_country)::order_address
      FROM z_data."order"
      JOIN z_data.order_address ON oa_order_id = o_id
      JOIN z_data.customer ON c_id = o_customer_id
     WHERE c_email = p_email
$$
LANGUAGE 'sql' SECURITY DEFINER;
SQL
Stored Procedures
@SProcCall
int registerCustomer(@SProcParam @ShardKey CustomerNumber customerNumber,
                     @SProcParam String email,
                     @SProcParam Gender gender);
JAVA
@SProcCall
Article getArticle(@SProcParam @ShardKey Sku sku);
JAVA
@SProcCall(runOnAllShards = true, parallel = true)
List<Order> findOrders(@SProcParam String email);
JAVA
Stored Procedures
Virtual Shard IDs (pre-sharding)
md5(partitioning_key)
bit position: 7 6 5 4 3 2 1 0
bit value:    0 1 0 1 1 0 0 1
[Diagram: Java Application -> SprocWrapper -> PostgreSQL shards (labelled PostgreSQL 9.0 RC1)]
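A minimal sketch of how a virtual shard ID could be derived from md5(partitioning_key). The class `VirtualShardSketch` and both method names are hypothetical. Taking the low bits of the last digest byte, and mapping virtual IDs to physical shards by modulo, are assumptions made for illustration; the slide does not say which bits the SprocWrapper uses or how it assigns virtual IDs to databases.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class VirtualShardSketch {

    /** Returns the low `bits` bits (1 to 8) of the last MD5 byte as the virtual shard ID. */
    public static int virtualShardId(String partitioningKey, int bits) {
        if (bits < 1 || bits > 8) {
            throw new IllegalArgumentException("bits must be between 1 and 8");
        }
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(partitioningKey.getBytes(StandardCharsets.UTF_8));
            int lastByte = digest[digest.length - 1] & 0xFF; // unsigned value 0..255
            return lastByte & ((1 << bits) - 1);             // keep only the low bits
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Maps a virtual shard ID to a physical shard index (assumed: simple modulo). */
    public static int physicalShard(int virtualShardId, int physicalShardCount) {
        return virtualShardId % physicalShardCount;
    }

    public static void main(String[] args) {
        int virtualId = virtualShardId("customer-42", 3); // 8 virtual shards
        System.out.println("virtual shard " + virtualId
                + " -> physical shard " + physicalShard(virtualId, 2));
    }
}
```

Because there are more virtual shard IDs than physical databases, a key's virtual ID never changes; only the virtual-to-physical mapping has to be updated when a shard is split.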
Stored Procedures
Schema based stored procedure versioning
● uses search_path on the clients
● new schema for every application version
● automated deployments
Stored Procedures
[Diagram sequence: stored procedure API schemas sit on top of the shared database tables during a rollout]
● only api_v16_01 is deployed
● clients run with search_path = api_v16_01, public;
● api_v16_02 is deployed next to api_v16_01; clients still use search_path = api_v16_01, public;
● new application instances use search_path = api_v16_02, public; while old instances still use search_path = api_v16_01, public;
● all clients use search_path = api_v16_02, public;
● api_v16_01 is dropped; only api_v16_02 remains
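On the client side, the schema-per-version rollout above comes down to choosing the right search_path for the application version. The sketch below shows one way to generate it. The class `SearchPathSketch`, its method names, and the reading of `api_v16_02` as two zero-padded numeric parts are assumptions for illustration, not Zalando's deployment tooling.

```java
import java.util.Locale;

class SearchPathSketch {

    /** Builds an API schema name such as api_v16_02 from two version parts. */
    public static String apiSchema(int major, int minor) {
        return String.format(Locale.ROOT, "api_v%02d_%02d", major, minor);
    }

    /** Builds the statement a client would run after connecting. */
    public static String searchPathStatement(String apiSchema) {
        // Accept only plain lowercase identifiers, since the name is concatenated into SQL.
        if (!apiSchema.matches("[a-z_][a-z0-9_]*")) {
            throw new IllegalArgumentException("unexpected schema name: " + apiSchema);
        }
        return "SET search_path TO " + apiSchema + ", public";
    }

    public static void main(String[] args) {
        // An application built against version 16.02 resolves its stored procedures
        // in api_v16_02 first and falls back to public for everything else.
        System.out.println(searchPathStatement(apiSchema(16, 2)));
    }
}
```

Old and new application versions can then run side by side against the same tables, each resolving unqualified function names in its own API schema.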
https://github.com/zalando/PGObserver
Schema Management
Schema management
DBDIFF database schema management
● schema changes must be documented
● atomic changes per feature
● locks should be minimal
Schema management
BEGIN;
    SELECT _v.register_patch('ZEOS-15430.order');
    CREATE TABLE z_data.order_address (
        oa_id      serial PRIMARY KEY,
        oa_country z_data.country,
        oa_city    varchar(64),
        oa_street  varchar(128), ...
    );
    ALTER TABLE z_data."order" ADD o_shipping_address_id int
        REFERENCES z_data.order_address (oa_id);
COMMIT;
DBDIFF SQL
Schema management
BEGIN;
    SELECT _v.register_patch('ZEOS-15430.order');
    \i order/database/order/10_tables/10_order_address.sql
    ALTER TABLE z_data."order" ADD o_shipping_address_id int
        REFERENCES z_data.order_address (oa_id);
COMMIT;
DBDIFF SQL
Schema management
BEGIN;
    SELECT _v.register_patch('ZEOS-15430.order');
    \i order/database/order/10_tables/10_order_address.sql
    SET statement_timeout TO '3s';
    ALTER TABLE z_data."order" ADD o_shipping_address_id int
        REFERENCES z_data.order_address (oa_id);
COMMIT;
DBDIFF SQL
Schema management
pg_view (https://github.com/zalando/pg_view)
● helps to monitor locks and load in real-time
● used during all DB schema change rollouts
nice_updater (https://github.com/zalando/acid-tools)
● runs big migrations controlling database/system load
● used during automatic data migrations
High Availability
Patroni - High Availability Runner
● etcd or ZooKeeper for master election
Spilo - PostgreSQL AWS appliance
● Zalando Patroni for High Availability
● Docker for packaging
● Zalando STUPS for audit compliance
Monitoring: ZMON
Open Source at Zalando Technology
● Tech Blog of Zalando Technology
● https://zalando.github.io/ - Open Source Projects
○ Java Sproc Wrapper
○ PGObserver
○ pg_view
○ Patroni
○ Spilo
○ ...
Questions?
Where to Find Us:
Tech Blog: tech.zalando.com
GitHub: github.com/zalando
Twitter: @ZalandoTech
Instagram: zalandotech
Jobs: http://tech.zalando.com/jobs
