NANCY CLI:
a unified way to manage Database Experiments in clouds
Postgres.AI
Nikolay Samokhvalov
twitter: @postgresmen
email: ru@postgresql.org
About me
Postgres experience: 12+ years (database systems: 17+)
Founder and CTO of 3 startups (total 30M+ users), all based on Postgres
Founder of #RuPostgres (1700+ members on Meetup.com, 2nd largest globally)
Re-launched consulting practice in the SF Bay Area http://PostgreSQL.support
Founder of Postgres.AI – the Postgres platform to automate what is not yet automated
Twitter: @postgresmen
Email: ru@postgresql.org
Part 0. Pre-story
Pre-story. Finding the largest tables in a database
How many times did you google things like this?
Finding the largest tables, a semi-automated way
postgres_dba – The missing set of useful tools for Postgres https://github.com/NikolayS/postgres_dba
Finding the largest tables, a semi-automated way
Report #2: sizes of the tables in the current database
Installation of postgres_dba
Installation is trivial:
Important: psql version 10 is needed.
(install postgresql-client-10 package, see README)
Server version may be older
(Use ssh tunnel to connect to remote servers, see README)
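The commands themselves were shown on the slide as a screenshot; a minimal sketch based on the project README (adjust the path):

    git clone https://github.com/NikolayS/postgres_dba.git
    # add this line to ~/.psqlrc (with your actual path):
    #   \set dba '\\i /path/to/postgres_dba/start.psql'
    # then type :dba inside a psql v10+ session to open the menu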
Part 1. Why do we need
automated DB experiments?
How do we do performance improvements nowadays?
- Let’s do default_statistics_target = 1000!
- Let’s do random_page_cost = 1!
- I’ve heard that setting shared_buffers to ¼ of RAM doesn’t rock anymore,
¾ is much better!
- Let’s add this index here!
- Let’s use partitioning for this table!
- Let’s not allow >100,000 dead tuples in a table!
- Etc, etc, etc...
The Whole Truth
For each of these proposals, the real questions are:
● Is this value the best for our database & workload?
● Does it give better (or at least not worse) performance for all queries?
● Does it give a real gain for our database & workload?
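Any of these questions can at least be sanity-checked by hand on a database clone before automating anything. A rough sketch (table and predicate are placeholders):

    psql -d test_clone <<'SQL'
    SET default_statistics_target = 1000;   -- session-level "what-if"
    ANALYZE my_table;                       -- note: the new stats persist, so use a clone!
    EXPLAIN (ANALYZE, BUFFERS)
      SELECT * FROM my_table WHERE created_at > now() - interval '1 day';
    SQL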
...and more
- postgres_dba shows that the bloat level is 35% for this
1B-rows table. Is it good or bad? How bad? (Or how good?)
- When do I need to add more RAM to my database server to
keep good performance characteristics?
- What will happen when more users come to use my app?
- Will i3.xlarge handle 3,000 UPDATEs per second?
- Etc, etc, etc
How do we make changes now?
Option #1
Oh, we see (from monitoring, pg_stat_statements, pgBadger, etc) that this
query is slow on production. Let’s fix that!
⇒ the problem is already there...
Option #2
DBA: well, this will be slow. Add this index! And we can do even better, let’s use
a partial index here!
⇒ good if the DBA is really experienced and/or has verified the ideas on a DB
clone. But how often is that the case? And how many queries were checked?
Towards the better future
Option 1: we’re going to change something. Let’s verify *all* query groups from
pg_stat_statements and see how performance changes – using some
“what-if” API
Option 2: a human or an artificial DBA has an idea for improvement.
Let’s verify it! Again, with the same “what-if” API
⇒ continuous database administration
Bonus: let’s use the same “what-if” API to get more knowledge for the AI we are
building (to train ML models)
So, why do we need to automate DB experiments?
If we see that our change is good for one or several queries, it doesn’t mean
that it is so for all queries.
Without performing deep SQL query analysis (analyzing “all” query groups
in pg_stat_statements’ Top-N) we are blind.
And database administration remains black magic.
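For reference, the per-query-group data such an analysis starts from; a sketch assuming pg_stat_statements is installed (column names as in Postgres 10):

    psql -d mydb <<'SQL'
    -- Top-20 query groups by total execution time
    SELECT queryid, calls, total_time, mean_time,
           left(query, 60) AS query_sample
    FROM pg_stat_statements
    ORDER BY total_time DESC
    LIMIT 20;
    SQL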
Part 2. Existing works and tools
Existing works and solutions
Andy Pavlo, CMU:
● “What is a Self-Driving Database Management System?” – a great overview
of the history and existing works, an entry point to learn what has been done
(good research papers, Microsoft works, etc)
● PelotonDB, ottertune – great research projects
Oracle’s RAT (Real Application Testing)
– Database Replay + SQL Performance Analyzer:
● Real Application Testing, Oracle 18c
● Sample report
● “Oracle Real Application Testing Delivers 224% ROI”
DIY automated pipeline for DB optimization
How to automate database optimization using ecosystem tools and AWS?
Analyze:
● pg_stat_statements
● auto_explain
● pgBadger to parse logs, use JSON output
● pg_query to group queries better
● pg_stat_kcache to analyze FS-level ops
Configuration:
● annotated.conf, pgtune, pgconfigurator, postgresqlco.nf
● ottertune
Suggested indexes (internal “what-if” API w/o actual execution):
● (useful: pgHero, POWA, HypoPG, dexter, plantuner)
Conduct experiments:
● pgreplay to replay logs (it expects a different log_line_prefix – you need to handle that)
● EC2 spot instances
Machine learning:
● MADlib
This pipeline is the basis for Nancy.
Caveats. pgBadger:
● grouping queries could be implemented better (see pg_query)
● makes all queries lower-cased (hurts "camelCased" names)
● doesn’t really support plans (auto_explain)
Also, pgreplay and pgBadger are not friends –
they require different log formats.
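To make the log-format friction concrete, a sketch of the two invocations (flag spellings from the tools’ documentation of that era – verify with --help):

    # pgBadger can parse CSV logs and emit its extended report as JSON
    pgbadger -f csv -x json -o report.json postgresql.csv
    # pgreplay also reads CSV logs, but expects a different logging configuration;
    # -f = parse only, writing a replay file instead of replaying immediately
    pgreplay -f -c -o workload.bin postgresql.csv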
Part 3. Postgres.ai basics
Already automated:
● Setup/tune hardware, OS, FS
● Provision Postgres instances
● Create replicas
● High Availability:
detect failures and switch to replicas
● Create backups
● Basic monitoring
Little to zero level of automation:
● Postgres parameters tuning
● Query analysis and optimization
● Index set optimization
● Detailed monitoring
● Verify optimization ideas
● Benchmarks
● Regression & performance CI-like testing
⇒ these can be done with Database Experiments
What is a Database Experiment?
A Database Experiment is a set of actions
to perform deep SQL query analysis
for a specified database
against a specified workload
in a specified environment
with an optional change of the database and environment (called a “delta”).
An experiment may consist of one or more experimental runs.
To analyze the impact of some delta, we need at least two runs, one of them
being a “clean run” (without the delta).
What is a Database Experiment?
The input of an experimental run:
● Environment
○ Location (on-premise or GCP or AWS), hardware (CPU, RAM, disks)
○ System (OS, file system)
○ Postgres version
○ Postgres configuration
● Database snapshot. Can be:
○ A dump (regular or in directory format)
○ A physical archive (pg_basebackup or pgBackRest/WAL-E/WAL-G/…)
○ A replica promoted for experiments
○ Some synthetic one, a generated database (“create table as …”, pgbench -i, etc)
● Workload. Can be:
○ Synthetic (custom SQL), single-threaded
○ Synthetic (custom SQL), multi-threaded (with pgbench)
○ “Real workload” (based on logs)
● [Optional] Delta:
○ Configuration change(s) (e.g.: shared_buffers = 16GB)
○ Some DDL (e.g.: `create index …`). The corresponding “undo” DDL (e.g.: `drop index …`) is required in this
case to enable serialization of experiments
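Looking ahead to Nancy CLI (Part 4), one experimental run maps onto a single CLI call. A sketch with option names as they appeared in the Nancy README at the time (paths and values are placeholders; check `nancy run --help`):

    nancy run \
      --run-on aws --aws-ec2-type i3.large \
      --db-dump /path/to/dump.sql.gz \
      --workload-real /path/to/workload.bin \
      --delta-config "shared_buffers = 16GB" \
      --delta-ddl-do "create index i_hypothesis on t1 (col1)" \
      --delta-ddl-undo "drop index i_hypothesis"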
What is a Database Experiment?
The output of an experimental run:
● the contents of basic pg_stat_*** (e.g. pg_stat_user_tables)
● the contents of pg_stat_statements
● the contents of pg_stat_kcache
● the PostgreSQL detailed log (with auto_explain turned on)
● the pgBadger extended report in JSON format
● the Postgres config at the time of applying the workload
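Most of these artifacts can be captured with plain psql once the workload finishes; a sketch:

    psql -d mydb -c "COPY (SELECT * FROM pg_stat_statements) TO STDOUT CSV HEADER" > pg_stat_statements.csv
    psql -d mydb -c "COPY (SELECT * FROM pg_stat_user_tables) TO STDOUT CSV HEADER" > pg_stat_user_tables.csv
    psql -d mydb -c "SHOW ALL" > postgres_config.txt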
AI-based cloud-friendly platform to automate database administration
Steve
AI-based expert in capacity planning and
database tuning
Joe
AI-based expert in query optimization and
Postgres indexes
Nancy
AI-based expert in database experiments.
Conducts experiments and presents
results to human and artificial DBAs
Sign up for early access:
http://Postgres.ai
Demo 1
Postgres.AI GUI live demonstration
Postgres.AI architecture (diagram):
● Metastorage + GUI
● Databases being observed
● AWS S3 (dump/backup)
● nancy prepare-workload
● Nancy, Steve, Joe – on AWS EC2 docker machines
● Nancy CLI
A human engineer can use:
● GUI
● CLI
● Chat
Part 4. Nancy CLI
Demo 2
Nancy CLI live demonstration (local + AWS)
Meet Nancy CLI (open source)
Nancy CLI https://github.com/postgres-ai/nancy
● custom docker image (Postgres with extensions & tools)
● nancy prepare-workload to convert Postgres logs (currently .csv only)
into a binary workload file
● nancy run to run experiments
● able to run locally (on any machine) or in an EC2 spot instance (low price!),
including i3.*** instances (with NVMe)
● fully automated management of EC2 spots
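On the source server, the .csv log that nancy prepare-workload consumes can be produced with standard Postgres settings; a sketch (logging every statement is expensive – do it on a clone/replica or for a limited time window):

    psql -c "ALTER SYSTEM SET log_destination = 'csvlog'"
    psql -c "ALTER SYSTEM SET logging_collector = on"          # requires a restart
    psql -c "ALTER SYSTEM SET log_min_duration_statement = 0"  # log every statement
    psql -c "SELECT pg_reload_conf()"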
What’s inside the docker container?
Source: https://github.com/postgres-ai/nancy/tree/master/docker
Image: https://hub.docker.com/r/postgresmen/postgres-with-stuff/
Inside:
● Ubuntu 16.04
● Postgres (now 9.6 or 10)
● postgres_dba (for manual debugging)
● pg_stat_statements enabled
● auto_explain enabled (all queries, with timing)
● pgreplay
● pgBadger
● pg_stat_kcache (soon)
● additional utilities
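The “all queries, with timing” part corresponds to settings like these in the image’s postgresql.conf; a sketch, not the image’s exact file:

    cat >> "$PGDATA/postgresql.conf" <<'EOF'
    shared_preload_libraries = 'pg_stat_statements, auto_explain'
    auto_explain.log_min_duration = 0   # log the plan of every statement
    auto_explain.log_analyze = on       # EXPLAIN ANALYZE, not just EXPLAIN
    auto_explain.log_timing = on        # include per-node timing
    EOF
    # shared_preload_libraries requires a server restart to take effect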
Part 5. The future of Nancy CLI
Various ways to create an experimental database
● plain text pg_dump
○ restoration is very slow (only 1 vCPU utilized)
○ “logical” – physical structure is lost (cannot experiment with bloat, etc)
○ small (if compressed)
○ “snapshot” only
● pg_dump with either -Fd (“directory”) or -Fc (“custom”):
○ restoration is faster (multiple vCPUs, -j option; see the sketch after this list)
○ “logical” (again: bloat, physical layout is “lost”)
○ small (because compressed)
○ “snapshot” only
● pg_basebackup + WALs, point-in-time recovery (PITR), possibly with help from WAL-E, WAL-G, pgBackRest
○ less reliable, sometimes there are issues (especially if 3rd-party tools are involved – e.g. WAL-E & WAL-G don’t
support tablespaces, there are bugs sometimes, etc)
○ “physical”: bloat and physical structure is preserved
○ not small – ~ size of the DB
○ can “walk in time” (PITR)
○ requires warm-up procedure (data is not in the memory!)
● AWS RDS: create a replica + promote it
○ no Spots :-/
○ Lazy Load is tricky (it looks like the DB is there but it’s very slow – warm-up is needed)
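A sketch of the second option, usually the best speed/size trade-off when a “snapshot only” logical copy is acceptable:

    pg_dump -Fd -j 8 -f /backups/mydb.dump mydb             # parallel dump, directory format
    createdb mydb_experiment
    pg_restore -j 8 -d mydb_experiment /backups/mydb.dump   # parallel restore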
How can we speed up experimental runs?
● Prepare the EC2 instance(s) in advance and keep them
● Prepare EBS volume(s) only (perhaps using an instance of a different
type) and keep them ready. When attached to a new instance, do a warm-up
(see the pg_prewarm sketch below)
● Resource re-usage:
○ reuse the docker container
○ reuse the EC2 instance
○ serialize experimental runs (DDL Do/Undo; VACUUM FULL; cleanup)
● Partial database snapshots (dump/restore only needed tables)
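One way to do the warm-up mentioned above is the pg_prewarm contrib module; a sketch that loads all ordinary tables and indexes of the public schema into the cache:

    psql -d mydb <<'SQL'
    CREATE EXTENSION IF NOT EXISTS pg_prewarm;
    SELECT pg_prewarm(c.oid::regclass)
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE n.nspname = 'public' AND c.relkind IN ('r', 'i');
    SQL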
The future development of Nancy CLI
● Speed up DB creation
● Support GCP
● More artifacts delivered: pg_stat_kcache, etc
● nancy see-report to print the summary + top-30 queries
● nancy compare-reports to print the “diff” for 2+ reports (the summary + numbers for
top-30 queries, ordered by total time based on the 1st report)
● Postgres 11
● pgbench -i for database initialization
● pgbench to generate multithreaded synthetic workload
● Workload analysis: automatically detect “N+1 SELECT” when running workload
● Better support for the serialization of experimental runs
● Better support for multiple runs:
○ interval with step
○ gradient descent
● Provide cost estimation (time + money)
● Rewrite in Python or Go
Contributions welcome!
Thank you!
Nikolay Samokhvalov
ru@postgresql.org
twitter: @postgresmen
Postgres.ai
https://github.com/postgres-ai/nancy