Skypicker.com
Travel anywhere, anytime.
Skypicker
• flight ticket search & booking engine
• covering markets in Europe, Russia and China
• hundreds of TBs of airline data processed monthly
• selling thousands of tickets daily
• covering low-cost carriers (LCCs) only
• legacy carriers (LCs) in progress!
API
• millions of daily searches
• average response time <1s (we are slowing down
some queries)
• Built on top of PostgreSQL
• used worldwide =)
• a huge home-grown data processing framework running under the API
SP databases
• 5 db clusters, >0.5TB memory each
• the main one handling 20M updates / hour (this is caused by airline ticket price changes)
• main table has 1 billion+ rows
• basically unlimited read scaling with the replication feature
• managed mostly by Ansible and custom bash scripts for semi-automatic failover
• there are tools like repmgr though
PostgreSQL
• our silver bullet for everything
• for low-cost big data, there is no better combo than PG+Redshift, yet.
• don't even try with Hadoop/Cassandra, they will make your wallet cry
HW tips
• running on bare metal, because of the SSD RAID 10 on Intel 3700 series
• the 3500 series will kill you. After a few weeks of 24/7 load, there will be a huge performance drop, always
• (it's a feature!)
Why not cloud
• AWS is not the best fit for a high-performance PG cluster; the I/O is unstable and unpredictable
• the joyent.com cloud is fine ($8k/month/instance)
• bare metal from Rackspace also works well ($1,800/month). The traffic can get expensive here…
How we found these things out
• …randomly…by fucking things up…
• But! We multiplied our master db performance 5
times in the past 6 months
• From 15M to nearly 75M
Prove it!
• October 2014, 15M
• April 2015, 75M
Replication for dummies
• simple master-slave
• a good way to die
Cascaded replication
Pros & cons
• adds some replication delay (-)
• nobody cares (+)
• because it scales! (++)
How the data flows
1. new price for the flight pushed to a queue
2. picked up by a worker
3. inserted into the db
4. (magic happens here, shitload of updates)
5. copied to slave servers for select statements
6. search query on slave server
7. …booking made…?
8. profit!
Queue over the db
• our data processing framework pushes the data to a Redis queue
• workers pick up the data and insert it into the DB
• load can be easily balanced here
• you won't lose any data if you need to restart your db (this can also be achieved with PgBouncer)
• monitor the size of the queue and keep it near 0 =)
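A minimal sketch of the queue-in-front-of-the-db idea, not our production code. Assumptions: the stdlib `queue.Queue` stands in for the Redis queue, an in-memory `sqlite3` database stands in for PostgreSQL, and the `flight_prices` table, the `drain` function and the sample routes are hypothetical names made up for illustration.

```python
import queue
import sqlite3


def drain(q, conn, batch_size=100):
    """Pop up to batch_size price updates and write them in one transaction.

    If the database write fails, the batch is put back on the queue,
    so a db restart delays the data instead of losing it.
    """
    batch = []
    while len(batch) < batch_size:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    if not batch:
        return 0
    try:
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                "INSERT OR REPLACE INTO flight_prices (flight_id, price) "
                "VALUES (?, ?)",
                batch,
            )
    except sqlite3.Error:
        for item in batch:
            q.put(item)  # database unavailable: keep the data queued
        return 0
    return len(batch)


def demo():
    conn = sqlite3.connect(":memory:")  # stand-in for the master db
    conn.execute(
        "CREATE TABLE flight_prices (flight_id TEXT PRIMARY KEY, price REAL)"
    )
    q = queue.Queue()  # stand-in for the Redis queue
    for update in [("PRG-LON", 49.0), ("BRQ-MIL", 35.5), ("PRG-LON", 45.0)]:
        q.put(update)
    written = drain(q, conn)
    rows = conn.execute(
        "SELECT flight_id, price FROM flight_prices ORDER BY flight_id"
    ).fetchall()
    return written, q.qsize(), rows
```

Calling `demo()` drains three updates, leaves the queue empty, and keeps only the latest price per flight. The queue size returned here is the number you would monitor and keep near 0.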
HAProxy
• probably the most stable piece of software ever
made. TCP balancer
• has a health check for PG
• if a slave goes down, nobody will notice
• (just don't forget to have an alert for it)
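To illustrate why nobody notices a dead slave, here is a toy round-robin picker that skips servers failing a health check. This is only a sketch of the behaviour, not how HAProxy is implemented; the `ReplicaBalancer` class, the `is_healthy` callback and the server names are hypothetical.

```python
import itertools


class ReplicaBalancer:
    """Round-robin over replicas, skipping the ones that fail a health check."""

    def __init__(self, replicas, is_healthy):
        self.replicas = list(replicas)
        self.is_healthy = is_healthy  # callable: replica name -> bool
        self._cycle = itertools.cycle(self.replicas)

    def pick(self):
        # Try each replica at most once per call.
        for _ in range(len(self.replicas)):
            candidate = next(self._cycle)
            if self.is_healthy(candidate):
                return candidate
        raise RuntimeError("no healthy replica available")


def demo():
    down = {"replica2"}  # pretend this slave just died
    balancer = ReplicaBalancer(
        ["replica1", "replica2", "replica3"],
        is_healthy=lambda name: name not in down,
    )
    return [balancer.pick() for _ in range(4)]
```

In `demo()` the queries keep flowing to `replica1` and `replica3` while `replica2` is down, which is exactly why you need a separate alert for the failure.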
PgBouncer
• small shit
• useful when you are doing thousands of connections
to your db
• cut the server load in half
• boosted the writes by 30%
Optimization steps (for dummies)
1. optimize your queries with EXPLAIN
2. do some pg config changes
3. buy better hw
4. goto 2
Little bit advanced!
• table partitioning (this is the game changer)
• partial indexes
• turn off vacuum, use pg_repack for rebuilding the tables
• run ANALYZE often
• turn on the genetic query planner (GEQO)
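As a rough sketch of table partitioning: in the PostgreSQL 9.4 era it is done with table inheritance plus CHECK constraints, so the planner can skip child tables that cannot match a query. The helper below only generates the DDL for one monthly child table. The `flights` table, the `departure` column and the `month_partition_ddl` function are hypothetical names, not our schema.

```python
from datetime import date


def month_partition_ddl(parent, column, year, month):
    """Build DDL for one monthly child table (inheritance-style partitioning)."""
    start = date(year, month, 1)
    # First day of the following month; December rolls over to next year.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    child = f"{parent}_{start:%Y_%m}"
    return (
        f"CREATE TABLE {child} (\n"
        f"    CHECK ({column} >= DATE '{start}' AND {column} < DATE '{end}')\n"
        f") INHERITS ({parent});"
    )


if __name__ == "__main__":
    print(month_partition_ddl("flights", "departure", 2015, 4))
```

New rows still have to be routed to the right child table, typically by a trigger on the parent or by the application itself.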
Redshift
• Dummy PG database from Amazon made for science & some SQL
• it's costly to download the data from AWS after it has been processed
• you should also try Snowflake, Vertica
Redshift flow
• using the PG FDW (foreign data wrapper) feature to connect RS remotely to our slave database
• download data
• process it
• push it to master db
PostgreSQL replication
• no battle-tested master-master solution, yet (9.4)
• it's async - don't forget to monitor the delay between your master and slaves
• cascading replication for unlimited scaling
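One way to put a number on the delay is to compare WAL positions. The sketch below assumes you have already fetched the positions as text, for example from `pg_current_xlog_location()` on the master and `pg_last_xlog_replay_location()` on a slave (the 9.4 function names; newer releases renamed them). The helper names are hypothetical, and PostgreSQL can also do this arithmetic server-side with `pg_xlog_location_diff()`.

```python
def lsn_to_bytes(lsn):
    """Convert a WAL position like '0/3000060' into an absolute byte offset."""
    hi, lo = lsn.split("/")
    # The part before the slash is the high 32 bits, the part after is the low 32 bits.
    return (int(hi, 16) << 32) + int(lo, 16)


def replication_lag_bytes(master_lsn, replica_lsn):
    """Bytes of WAL the replica still has to replay (never negative)."""
    return max(0, lsn_to_bytes(master_lsn) - lsn_to_bytes(replica_lsn))


if __name__ == "__main__":
    print(replication_lag_bytes("0/3000060", "0/3000000"))
```

With cascading replication, the same check can be run per hop to see where the delay builds up.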
PostgreSQL config tuning
• "12-Step Program for Scaling Web Applications on PostgreSQL" from Wanelo.com
• they cover every aspect of config optimization and we don't want to copy it here =)
What are our pains, why we are here
• our data will grow 10 times by adding legacy carriers
in the next 2 months
• we need DB masters and developers who will help
us to manage this growth
We are hiring!
• We offer
• many money
• skills
Get in touch at jk@skypicker.com

Rubyslava + PyVo #48
