SlideShare a Scribd company logo
Postgres Demystified
                        Craig Kerstiens
                        @craigkerstiens
                        http://www.craigkerstiens.com
https://speakerdeck.com/u/craigkerstiens/p/postgres-demystified
Postgres Demystified
Postgres Demystified
Postgres Demystified
                   We’re
                   Hiring
Getting Setup
            Postgres.app
Agenda
     Brief History
Developing w/ Postgres
 Postgres Performance
       Querying
Postgres History
                  Post Ingress
 Postgres
             Around since 1989/1995
PostgresQL
             Community Driven/Owned
MVCC

Each query sees transactions committed before it
   Locks for writing don’t conflict with reading
Why Postgres
Why Postgres

“ its the emacs of databases”
Developing w/ Postgres
Basics
psql is your friend
Basics
psql is your friend
  #   dt
  #   d
  #   d tablename
  #   x
  #   e
Datatypes
                                                  UUID
                            date                          boolean
    timestamptz array                 interval integer
      smallint line        bigint XML
                                             enum char
                                 money
             serial bytea                        point        float
inet                        polygon numeric                circle
            cidr   varchar          tsquery
                               text                timetz
  path              time                timestamp box
            macaddr
                          tsvector
Datatypes
                                                   UUID
                             date                          boolean
    timestamptz array                  interval integer
      smallint line         bigint XML
                                              enum char
                                  money
              serial bytea                        point        float
inet                         polygon numeric                circle
             cidr   varchar          tsquery
                                text                timetz
  path               time                timestamp box
            macaddr
                           tsvector
Datatypes
                                                   UUID
                             date                          boolean
    timestamptz array                  interval integer
      smallint line         bigint XML
                                              enum char
                                  money
              serial bytea                        point        float
inet                         polygon numeric                circle
             cidr   varchar          tsquery
                                text                timetz
  path               time                timestamp box
            macaddr
                           tsvector
Datatypes
                                                  UUID
                            date                          boolean
    timestamptz array                 interval integer
      smallint line        bigint XML
                                             enum char
                                 money
             serial bytea                        point         float
inet                        polygon numeric                 circle
            cidr   varchar          tsquery
                               text                timetz
  path              time                timestamp
            macaddr                                    box
                          tsvector
Datatypes
CREATE TABLE items (
    id serial NOT NULL,
    name varchar (255),
    tags varchar(255) [],
	

 created_at timestamp
);
Datatypes
CREATE TABLE items (
    id serial NOT NULL,
    name varchar (255),
    tags varchar(255) [],
	

 created_at timestamp
);
Datatypes
INSERT INTO items
VALUES (1, 'Ruby Gem', '{“Programming”,”Jewelry”}', now());

INSERT INTO items
VALUES (2, 'Django Pony', '{“Programming”,”Animal”}', now());
Datatypes
                                                  UUID
                            date                          boolean
    timestamptz array                 interval integer
      smallint line        bigint XML
                                             enum char
                                 money
             serial bytea                        point        float
inet                        polygon numeric                circle
            cidr   varchar          tsquery
                               text                timetz
  path              time                timestamp box
            macaddr
                          tsvector
Extensions
 dblink      hstore                trigram
                      uuid-ossp            pgstattuple
      citext                       pgrowlocks
                    pgcrypto
  isn         ltree            fuzzystrmatch
       cube         earthdistance dict_int
             tablefunc
unaccent                      btree_gist      dict_xsyn
Extensions
 dblink       hstore               trigram
                       uuid-ossp           pgstattuple
      citext                       pgrowlocks
                    pgcrypto
  isn         ltree            fuzzystrmatch
       cube         earthdistance dict_int
             tablefunc
unaccent                      btree_gist      dict_xsyn
NoSQL in your SQL
CREATE EXTENSION hstore;
CREATE TABLE users (
   id integer NOT NULL,
   email character varying(255),
   data hstore,
   created_at timestamp without time zone,
   last_login timestamp without time zone
);
hStore
INSERT INTO users
VALUES (
  1,
  'craig.kerstiens@gmail.com',
  'sex => "M", state => “California”',
  now(),
  now()
);
JSON
SELECT
  '{"id":1,"email": "craig.kerstiens@gmail.com",}'::json;


                          V8 w/ PLV8

                                                            9.2
JSON
SELECT
  '{"id":1,"email": "craig.kerstiens@gmail.com",}'::json;


                          V8 w/ PLV8

                                                            9.2
JSON
SELECT
  '{"id":1,"email": "craig.kerstiens@gmail.com",}'::json;



create or replace function
                          V8 w/ PLV8
js(src text) returns text as $$
  return eval(
  "(function() { " + src + "})"
  )();
$$ LANGUAGE plv8;                                           9.2
JSON
SELECT
  '{"id":1,"email": "craig.kerstiens@gmail.com",}'::json;



create or replace function
                          V8 w/ PLV8
js(src text) returns text as $$
  return eval(                               Bad Idea
  "(function() { " + src + "})"
  )();
$$ LANGUAGE plv8;                                           9.2
Range Types



              9.2
Range Types
CREATE TABLE talks (room int, during tsrange);
INSERT INTO talks VALUES
  (3, '[2012-09-24 13:00, 2012-09-24 13:50)');




                                                 9.2
Range Types
CREATE TABLE talks (room int, during tsrange);
INSERT INTO talks VALUES
  (3, '[2012-09-24 13:00, 2012-09-24 13:50)');


ALTER TABLE talks ADD EXCLUDE USING gist (during WITH &&);
INSERT INTO talks VALUES
   (1108, '[2012-09-24 13:30, 2012-09-24 14:00)');
ERROR: conflicting key value violates exclusion constraint
"talks_during_excl"
                                                       9.2
Full Text Search
Full Text Search
TSVECTOR - Text Data
TSQUERY - Search Predicates

Specialized Indexes and Operators
Datatypes
                                                  UUID
                            date                          boolean
    timestamptz array                 interval integer
      smallint line        bigint XML
                                             enum char
                                 money
             serial bytea                        point         float
inet                        polygon numeric                 circle
            cidr   varchar          tsquery
                               text                timetz
  path              time                timestamp
            macaddr                                    box
                          tsvector
PostGIS
PostGIS
1. New datatypes   i.e. (2d/3d boxes)
PostGIS
1. New datatypes   i.e. (2d/3d boxes)

2. New operators   i.e. SELECT foo && bar ...
PostGIS
1. New datatypes     i.e. (2d/3d boxes)

2. New operators     i.e. SELECT foo && bar ...

3. Understand relationships and distance
                     i.e. person within location, nearest distance
Postgres demystified
Performance
Sequential Scans
Sequential Scans

They’re Bad
Sequential Scans

They’re Bad (most of the time)
Indexes
Indexes

They’re Good
Indexes

They’re Good (most of the time)
Indexes
B-Tree
Generalized Inverted Index (GIN)
Generalized Search Tree (GIST)
K Nearest Neighbors (KNN)
Space Partitioned GIST (SP-GIST)
Indexes
B-Tree
  Default
  Usually want this
Indexes
Generalized Inverted Index (GIN)
  Use with multiple values in 1 column
  Array/hStore
Indexes
Generalized Search Tree (GIST)
  Full text search
  Shapes
Understanding Query Perf
           Given
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
Explain
# EXPLAIN SELECT last_name FROM employees WHERE salary >=
50000;
                         QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6)
  Filter: (salary >= 50000)
(3 rows)
Explain
# EXPLAIN SELECT last_name FROM employees WHERE salary >=
50000;
     Startup Cost
                         QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6)
  Filter: (salary >= 50000)
(3 rows)
Explain
# EXPLAIN SELECT last_name FROM employees WHERE salary >=
50000;                                             Max Time
     Startup Cost
                         QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6)
  Filter: (salary >= 50000)
(3 rows)
Explain
# EXPLAIN SELECT last_name FROM employees WHERE salary >=
50000;                                             Max Time
     Startup Cost
                         QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6)
  Filter: (salary >= 50000)
(3 rows)
                                                 Rows Returned
Explain
# EXPLAIN ANALYZE SELECT last_name FROM employees WHERE
salary >= 50000;
                         QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual
time=2.401..295.247 rows=1428 loops=1)
  Filter: (salary >= 50000)
Total runtime: 295.379
(3 rows)
Explain
# EXPLAIN ANALYZE SELECT last_name FROM employees WHERE
salary >= 50000;
     Startup Cost        QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual
time=2.401..295.247 rows=1428 loops=1)
  Filter: (salary >= 50000)
Total runtime: 295.379
(3 rows)
Explain
# EXPLAIN ANALYZE SELECT last_name FROM employees WHERE
salary >= 50000;
     Startup Cost        QUERY PLAN
                                 Max Time
-------------------------------------------------------------------------
Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual
time=2.401..295.247 rows=1428 loops=1)
  Filter: (salary >= 50000)
Total runtime: 295.379
(3 rows)
Explain
# EXPLAIN ANALYZE SELECT last_name FROM employees WHERE
salary >= 50000;
     Startup Cost        QUERY PLAN
                                 Max Time
-------------------------------------------------------------------------
Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual
time=2.401..295.247 rows=1428 loops=1)
  Filter: (salary >= 50000)
Total runtime: 295.379
(3 rows)                           Rows Returned
Explain
# EXPLAIN ANALYZE SELECT last_name FROM employees WHERE
salary >= 50000;
     Startup Cost        QUERY PLAN
                                 Max Time
-------------------------------------------------------------------------
Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual
time=2.401..295.247 rows=1428 loops=1)
  Filter: (salary >= 50000)
Total runtime: 295.379
(3 rows)                           Rows Returned
# CREATE INDEX idx_emps ON employees (salary);
# CREATE INDEX idx_emps ON employees (salary);
# EXPLAIN ANALYZE SELECT last_name FROM employees WHERE
salary >= 50000;
                         QUERY PLAN
-------------------------------------------------------------------------
Index Scan using idx_emps on employees (cost=0.00..8.49 rows=1
width=6) (actual time = 0.047..1.603 rows=1428 loops=1)
  Index Cond: (salary >= 50000)
Total runtime: 1.771 ms
(3 rows)
# CREATE INDEX idx_emps ON employees (salary);
# EXPLAIN ANALYZE SELECT last_name FROM employees WHERE
salary >= 50000;
                         QUERY PLAN
-------------------------------------------------------------------------
Index Scan using idx_emps on employees (cost=0.00..8.49 rows=1
width=6) (actual time = 0.047..1.603 rows=1428 loops=1)
  Index Cond: (salary >= 50000)
Total runtime: 1.771 ms
(3 rows)
Indexes Pro Tips
Indexes Pro Tips
CREATE INDEX CONCURRENTLY
Indexes Pro Tips
CREATE INDEX CONCURRENTLY
CREATE INDEX WHERE foo=bar
Indexes Pro Tips
CREATE INDEX CONCURRENTLY
CREATE INDEX WHERE foo=bar
SELECT * WHERE foo LIKE ‘%bar% is BAD
Indexes Pro Tips
CREATE INDEX CONCURRENTLY
CREATE INDEX WHERE foo=bar
SELECT * WHERE foo LIKE ‘%bar% is BAD
SELECT * WHERE Food LIKE ‘bar%’ is OKAY
Extensions
 dblink      hstore                trigram
                      uuid-ossp            pgstattuple
      citext                       pgrowlocks
                    pgcrypto
  isn         ltree            fuzzystrmatch
       cube         earthdistance dict_int
             tablefunc
unaccent                      btree_gist      dict_xsyn
Postgres demystified
Cache Hit Rate
SELECT
       'index hit rate' as name,
       (sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read) as ratio
     FROM pg_statio_user_indexes
     union all
     SELECT
      'cache hit rate' as name,
       case sum(idx_blks_hit)
         when 0 then 'NaN'::numeric
         else to_char((sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read),
'99.99')::numeric
       end as ratio
     FROM pg_statio_user_indexes;)
Index Hit Rate
SELECT
 relname,
 100 * idx_scan / (seq_scan + idx_scan),
 n_live_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;
Index Hit Rate
        relname      | percent_of_times_index_used | rows_in_table
---------------------+-----------------------------+---------------
 events              |                           0 |        669917
 app_infos_user_info |                           0 |        198218
 app_infos           |                          50 |        175640
 user_info           |                           3 |         46718
 rollouts            |                           0 |         34078
 favorites           |                           0 |          3059
 schema_migrations   |                           0 |             2
 authorizations      |                           0 |             0
 delayed_jobs        |                          23 |             0
pg_stats_statements



                      9.2
pg_stats_statements
$ select * from pg_stat_statements where query ~ 'from users where email';
userid                      │   16384
dbid                        │   16388
query                       │   select * from users where email = ?;
calls                       │   2
total_time                  │   0.000268
rows                        │   2
shared_blks_hit             │   16
shared_blks_read            │   0
shared_blks_dirtied         │   0
shared_blks_written         │   0
local_blks_hit              │   0
local_blks_read             │   0
local_blks_dirtied          │   0



                                                                             9.2
local_blks_written          │   0
temp_blks_read              │   0
temp_blks_written           │   0
time_read                   │   0
time_write                  │   0
pg_stats_statements



                      9.2
pg_stats_statements
SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit /
             nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
         FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;
----------------------------------------------------------------------
query                 | UPDATE pgbench_branches SET bbalance
= bbalance + ? WHERE bid = ?;
calls                 | 3000
total_time | 9609.00100000002
rows                  | 2836
hit_percent | 99.9778970000200936
                                                               9.2
Postgres demystified
Querying
Window Functions
Example:
 Biggest spender by state
Window Functions
SELECT
 email,
 users.data->'state',
 sum(total(items)),
 rank() OVER
  (PARTITION BY users.data->'state'
  ORDER BY sum(total(items)) desc)
FROM
 users, purchases
WHERE purchases.user_id = users.id
GROUP BY 1, 2;
Window Functions
SELECT
 email,
 users.data->'state',
 sum(total(items)),
 rank() OVER
  (PARTITION BY users.data->'state'
  ORDER BY sum(total(items)) desc)
FROM
 users, purchases
WHERE purchases.user_id = users.id
GROUP BY 1, 2;
Extensions
 dblink      hstore                trigram
                      uuid-ossp            pgstattuple
      citext                       pgrowlocks
                    pgcrypto
  isn         ltree            fuzzystrmatch
       cube         earthdistance dict_int
             tablefunc
unaccent                      btree_gist      dict_xsyn
Fuzzystrmatch
Fuzzystrmatch
SELECT soundex('Craig'), soundex('Will'), difference('Craig', 'Will');
Fuzzystrmatch
SELECT soundex('Craig'), soundex('Will'), difference('Craig', 'Will');


SELECT soundex('Craig'), soundex('Greg'), difference('Craig', 'Greg');
SELECT soundex('Willl'), soundex('Will'), difference('Willl', 'Will');
Moving Data Around
copy (SELECT * FROM users) TO ‘~/users.csv’;

copy users FROM ‘~/users.csv’;
db_link
SELECT dblink_connect('myconn', 'dbname=postgres');
SELECT * FROM dblink('myconn','SELECT * FROM foo') AS t(a int, b text);

   a |      b
-------+------------
   1 | example
   2 | example2
Foreign Data Wrappers
oracle             mysql             odbc
         twitter          sybase redis           jdbc
files                couch             s3
        www                ldap
                                      informix
          mongodb
Foreign Data Wrappers
CREATE EXTENSION redis_fdw;

CREATE SERVER redis_server
	

 FOREIGN DATA WRAPPER redis_fdw
	

 OPTIONS (address '127.0.0.1', port '6379');

CREATE FOREIGN TABLE redis_db0 (key text, value text)
	

 SERVER redis_server
	

 OPTIONS (database '0');

CREATE USER MAPPING FOR PUBLIC
	

 SERVER redis_server
	

 OPTIONS (password 'secret');
Query Redis from Postgres
SELECT *
FROM redis_db0;

SELECT
 id,
 email,
 value as visits
FROM
 users,
 redis_db0
WHERE ('user_' || cast(id as text)) = cast(redis_db0.key as text)
   AND cast(value as int) > 40;
All are not equal
oracle             mysql             odbc
         twitter          sybase redis           jdbc
files                couch             s3
        www                ldap
                                      informix
          mongodb
Production Ready FDWs
oracle             mysql             odbc
         twitter          sybase redis           jdbc
files                couch             s3
        www                ldap
                                      informix
          mongodb
Readability
Readability
WITH top_5_products AS (
  SELECT products.*, count(*)
  FROM products, line_items
  WHERE products.id = line_items.product_id
  GROUP BY products.id
  ORDER BY count(*) DESC
  LIMIT 5
)

SELECT users.email, count(*)
FROM users, line_items, top_5_products
WHERE line_items.user_id = users.id
 AND line_items.product_id = top_5_products.id
GROUP BY 1
ORDER BY 1;
Common Table Expressions
WITH top_5_products AS (
  SELECT products.*, count(*)
  FROM products, line_items
  WHERE products.id = line_items.product_id
  GROUP BY products.id
  ORDER BY count(*) DESC
  LIMIT 5
)

SELECT users.email, count(*)
FROM users, line_items, top_5_products
WHERE line_items.user_id = users.id
 AND line_items.product_id = top_5_products.id
GROUP BY 1
ORDER BY 1;
Common Table Expressions
WITH top_5_products AS (
  SELECT products.*, count(*)
  FROM products, line_items
  WHERE products.id = line_items.product_id
  GROUP BY products.id
  ORDER BY count(*) DESC
  LIMIT 5
)

SELECT users.email, count(*)
FROM users, line_items, top_5_products
WHERE line_items.user_id = users.id
 AND line_items.product_id = top_5_products.id
GROUP BY 1
ORDER BY 1;
Common Table Expressions
WITH top_5_products AS (
  SELECT products.*, count(*)
  FROM products, line_items
  WHERE products.id = line_items.product_id
  GROUP BY products.id
  ORDER BY count(*) DESC
  LIMIT 5
)

SELECT users.email, count(*)
FROM users, line_items, top_5_products
WHERE line_items.user_id = users.id
 AND line_items.product_id = top_5_products.id
GROUP BY 1
                              ’
ORDER BY 1;
Common Table Expressions
WITH top_5_products AS (
  SELECT products.*, count(*)
  FROM products, line_items
  WHERE products.id = line_items.product_id
  GROUP BY products.id
  ORDER BY count(*) DESC
  LIMIT 5
)

SELECT users.email, count(*)
FROM users, line_items, top_5_products
WHERE line_items.user_id = users.id
 AND line_items.product_id = top_5_products.id
GROUP BY 1
                Don’t do this in production
                              ’
ORDER BY 1;
Brief History
Developing w/ Postgres
 Postgres Performance
       Querying
Extras
Extras
Listen/Notify
Extras
Listen/Notify
Per Transaction Synchronous Replication
Extras
Listen/Notify
Per Transaction Synchronous Replication
SELECT for UPDATE
Postgres - TLDR
         Datatypes                Fast Column Addition
    Conditional Indexes               Listen/Notify
     Transactional DDL              Table Inheritance
  Foreign Data Wrappers     Per Transaction sync replication
Concurrent Index Creation          Window functions
        Extensions                  NoSQL inside SQL
Common Table Expressions               Momentum
Questions?
                        Craig Kerstiens
                        @craigkerstiens
                        http://www.craigkerstiens.com
https://speakerdeck.com/u/craigkerstiens/p/postgres-demystified

More Related Content

PPT
Using QString effectively
PDF
04 - Qt Data
KEY
Esperwhispering
PDF
Encryption Boot Camp on the JVM
PDF
Monolith to Reactive Microservices
PDF
How does cryptography work? by Jeroen Ooms
PDF
Better Web Clients with Mantle and AFNetworking
PDF
Java script objects 2
 
Using QString effectively
04 - Qt Data
Esperwhispering
Encryption Boot Camp on the JVM
Monolith to Reactive Microservices
How does cryptography work? by Jeroen Ooms
Better Web Clients with Mantle and AFNetworking
Java script objects 2
 

Similar to Postgres demystified (20)

PDF
Going beyond Django ORM limitations with Postgres
PDF
Heroku Postgres SQL Tips, Tricks, Hacks
PDF
Heroku Postgres Cloud Database Webinar
PDF
Postgresql 9.3 overview
PDF
Drizzles Approach To Improving Performance Of The Server
PPTX
PostgreSQL.pptx
PDF
Transition from relational to NoSQL Philly DAMA Day
PDF
Navigating the Transition from relational to NoSQL - CloudCon Expo 2012
PDF
Introduction to NoSQL and Couchbase
PDF
On Beyond (PostgreSQL) Data Types
PDF
Extensions on PostgreSQL
PDF
50 Ways To Love Your Project
PDF
Exalead managing terrabytes
PDF
Introduction to Tokyo Products
PDF
Introduction to tokyo products
PPTX
PDF
PostgreSQL Modules Tutorial - chkpass, hstore, fuzzystrmach, isn
PDF
CMF: a pain in the F @ PHPDay 05-14-2011
PDF
Breaking the-database-type-barrier-replicating-across-different-dbms
Going beyond Django ORM limitations with Postgres
Heroku Postgres SQL Tips, Tricks, Hacks
Heroku Postgres Cloud Database Webinar
Postgresql 9.3 overview
Drizzles Approach To Improving Performance Of The Server
PostgreSQL.pptx
Transition from relational to NoSQL Philly DAMA Day
Navigating the Transition from relational to NoSQL - CloudCon Expo 2012
Introduction to NoSQL and Couchbase
On Beyond (PostgreSQL) Data Types
Extensions on PostgreSQL
50 Ways To Love Your Project
Exalead managing terrabytes
Introduction to Tokyo Products
Introduction to tokyo products
PostgreSQL Modules Tutorial - chkpass, hstore, fuzzystrmach, isn
CMF: a pain in the F @ PHPDay 05-14-2011
Breaking the-database-type-barrier-replicating-across-different-dbms
Ad

Recently uploaded (20)

PDF
Modernizing your data center with Dell and AMD
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PPT
Teaching material agriculture food technology
PDF
Review of recent advances in non-invasive hemoglobin estimation
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PPTX
Cloud computing and distributed systems.
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PDF
Approach and Philosophy of On baking technology
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PDF
Encapsulation theory and applications.pdf
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PDF
cuic standard and advanced reporting.pdf
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PDF
Spectral efficient network and resource selection model in 5G networks
PDF
CIFDAQ's Market Insight: SEC Turns Pro Crypto
Modernizing your data center with Dell and AMD
Chapter 3 Spatial Domain Image Processing.pdf
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Teaching material agriculture food technology
Review of recent advances in non-invasive hemoglobin estimation
20250228 LYD VKU AI Blended-Learning.pptx
Cloud computing and distributed systems.
Digital-Transformation-Roadmap-for-Companies.pptx
Advanced methodologies resolving dimensionality complications for autism neur...
Diabetes mellitus diagnosis method based random forest with bat algorithm
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
Approach and Philosophy of On baking technology
The Rise and Fall of 3GPP – Time for a Sabbatical?
Encapsulation theory and applications.pdf
Building Integrated photovoltaic BIPV_UPV.pdf
cuic standard and advanced reporting.pdf
Per capita expenditure prediction using model stacking based on satellite ima...
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
Spectral efficient network and resource selection model in 5G networks
CIFDAQ's Market Insight: SEC Turns Pro Crypto
Ad

Postgres demystified

  • 1. Postgres Demystified Craig Kerstiens @craigkerstiens http://www.craigkerstiens.com https://speakerdeck.com/u/craigkerstiens/p/postgres-demystified
  • 4. Postgres Demystified We’re Hiring
  • 5. Getting Setup Postgres.app
  • 6. Agenda Brief History Developing w/ Postgres Postgres Performance Querying
  • 7. Postgres History Post Ingress Postgres Around since 1989/1995 PostgresQL Community Driven/Owned
  • 8. MVCC Each query sees transactions committed before it Locks for writing don’t conflict with reading
  • 10. Why Postgres “ its the emacs of databases”
  • 13. Basics psql is your friend # dt # d # d tablename # x # e
  • 14. Datatypes UUID date boolean timestamptz array interval integer smallint line bigint XML enum char money serial bytea point float inet polygon numeric circle cidr varchar tsquery text timetz path time timestamp box macaddr tsvector
  • 15. Datatypes UUID date boolean timestamptz array interval integer smallint line bigint XML enum char money serial bytea point float inet polygon numeric circle cidr varchar tsquery text timetz path time timestamp box macaddr tsvector
  • 16. Datatypes UUID date boolean timestamptz array interval integer smallint line bigint XML enum char money serial bytea point float inet polygon numeric circle cidr varchar tsquery text timetz path time timestamp box macaddr tsvector
  • 17. Datatypes UUID date boolean timestamptz array interval integer smallint line bigint XML enum char money serial bytea point float inet polygon numeric circle cidr varchar tsquery text timetz path time timestamp macaddr box tsvector
  • 18. Datatypes CREATE TABLE items ( id serial NOT NULL, name varchar (255), tags varchar(255) [], created_at timestamp );
  • 19. Datatypes CREATE TABLE items ( id serial NOT NULL, name varchar (255), tags varchar(255) [], created_at timestamp );
  • 20. Datatypes INSERT INTO items VALUES (1, 'Ruby Gem', '{“Programming”,”Jewelry”}', now()); INSERT INTO items VALUES (2, 'Django Pony', '{“Programming”,”Animal”}', now());
  • 21. Datatypes UUID date boolean timestamptz array interval integer smallint line bigint XML enum char money serial bytea point float inet polygon numeric circle cidr varchar tsquery text timetz path time timestamp box macaddr tsvector
  • 22. Extensions dblink hstore trigram uuid-ossp pgstattuple citext pgrowlocks pgcrypto isn ltree fuzzystrmatch cube earthdistance dict_int tablefunc unaccent btree_gist dict_xsyn
  • 23. Extensions dblink hstore trigram uuid-ossp pgstattuple citext pgrowlocks pgcrypto isn ltree fuzzystrmatch cube earthdistance dict_int tablefunc unaccent btree_gist dict_xsyn
  • 24. NoSQL in your SQL CREATE EXTENSION hstore; CREATE TABLE users ( id integer NOT NULL, email character varying(255), data hstore, created_at timestamp without time zone, last_login timestamp without time zone );
  • 25. hStore INSERT INTO users VALUES ( 1, 'craig.kerstiens@gmail.com', 'sex => "M", state => “California”', now(), now() );
  • 26. JSON SELECT '{"id":1,"email": "craig.kerstiens@gmail.com",}'::json; V8 w/ PLV8 9.2
  • 27. JSON SELECT '{"id":1,"email": "craig.kerstiens@gmail.com",}'::json; V8 w/ PLV8 9.2
  • 28. JSON SELECT '{"id":1,"email": "craig.kerstiens@gmail.com",}'::json; create or replace function V8 w/ PLV8 js(src text) returns text as $$ return eval( "(function() { " + src + "})" )(); $$ LANGUAGE plv8; 9.2
  • 29. JSON SELECT '{"id":1,"email": "craig.kerstiens@gmail.com",}'::json; create or replace function V8 w/ PLV8 js(src text) returns text as $$ return eval( Bad Idea "(function() { " + src + "})" )(); $$ LANGUAGE plv8; 9.2
  • 30. Range Types 9.2
  • 31. Range Types CREATE TABLE talks (room int, during tsrange); INSERT INTO talks VALUES (3, '[2012-09-24 13:00, 2012-09-24 13:50)'); 9.2
  • 32. Range Types CREATE TABLE talks (room int, during tsrange); INSERT INTO talks VALUES (3, '[2012-09-24 13:00, 2012-09-24 13:50)'); ALTER TABLE talks ADD EXCLUDE USING gist (during WITH &&); INSERT INTO talks VALUES (1108, '[2012-09-24 13:30, 2012-09-24 14:00)'); ERROR: conflicting key value violates exclusion constraint "talks_during_excl" 9.2
  • 34. Full Text Search TSVECTOR - Text Data TSQUERY - Search Predicates Specialized Indexes and Operators
  • 35. Datatypes UUID date boolean timestamptz array interval integer smallint line bigint XML enum char money serial bytea point float inet polygon numeric circle cidr varchar tsquery text timetz path time timestamp macaddr box tsvector
  • 37. PostGIS 1. New datatypes i.e. (2d/3d boxes)
  • 38. PostGIS 1. New datatypes i.e. (2d/3d boxes) 2. New operators i.e. SELECT foo && bar ...
  • 39. PostGIS 1. New datatypes i.e. (2d/3d boxes) 2. New operators i.e. SELECT foo && bar ... 3. Understand relationships and distance i.e. person within location, nearest distance
  • 44. Sequential Scans They’re Bad (most of the time)
  • 48. Indexes B-Tree Generalized Inverted Index (GIN) Generalized Search Tree (GIST) K Nearest Neighbors (KNN) Space Partitioned GIST (SP-GIST)
  • 49. Indexes B-Tree Default Usually want this
  • 50. Indexes Generalized Inverted Index (GIN) Use with multiple values in 1 column Array/hStore
  • 51. Indexes Generalized Search Tree (GIST) Full text search Shapes
  • 52. Understanding Query Perf Given SELECT last_name FROM employees WHERE salary >= 50000;
  • 53. Explain # EXPLAIN SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN ------------------------------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) Filter: (salary >= 50000) (3 rows)
  • 54. Explain # EXPLAIN SELECT last_name FROM employees WHERE salary >= 50000; Startup Cost QUERY PLAN ------------------------------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) Filter: (salary >= 50000) (3 rows)
  • 55. Explain # EXPLAIN SELECT last_name FROM employees WHERE salary >= 50000; Max Time Startup Cost QUERY PLAN ------------------------------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) Filter: (salary >= 50000) (3 rows)
  • 56. Explain # EXPLAIN SELECT last_name FROM employees WHERE salary >= 50000; Max Time Startup Cost QUERY PLAN ------------------------------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) Filter: (salary >= 50000) (3 rows) Rows Returned
  • 57. Explain # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN ------------------------------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual time=2.401..295.247 rows=1428 loops=1) Filter: (salary >= 50000) Total runtime: 295.379 (3 rows)
  • 58. Explain # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; Startup Cost QUERY PLAN ------------------------------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual time=2.401..295.247 rows=1428 loops=1) Filter: (salary >= 50000) Total runtime: 295.379 (3 rows)
  • 59. Explain # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; Startup Cost QUERY PLAN Max Time ------------------------------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual time=2.401..295.247 rows=1428 loops=1) Filter: (salary >= 50000) Total runtime: 295.379 (3 rows)
  • 60. Explain # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; Startup Cost QUERY PLAN Max Time ------------------------------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual time=2.401..295.247 rows=1428 loops=1) Filter: (salary >= 50000) Total runtime: 295.379 (3 rows) Rows Returned
  • 61. Explain # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; Startup Cost QUERY PLAN Max Time ------------------------------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual time=2.401..295.247 rows=1428 loops=1) Filter: (salary >= 50000) Total runtime: 295.379 (3 rows) Rows Returned
  • 62. # CREATE INDEX idx_emps ON employees (salary);
  • 63. # CREATE INDEX idx_emps ON employees (salary); # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN ------------------------------------------------------------------------- Index Scan using idx_emps on employees (cost=0.00..8.49 rows=1 width=6) (actual time = 0.047..1.603 rows=1428 loops=1) Index Cond: (salary >= 50000) Total runtime: 1.771 ms (3 rows)
  • 64. # CREATE INDEX idx_emps ON employees (salary); # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN ------------------------------------------------------------------------- Index Scan using idx_emps on employees (cost=0.00..8.49 rows=1 width=6) (actual time = 0.047..1.603 rows=1428 loops=1) Index Cond: (salary >= 50000) Total runtime: 1.771 ms (3 rows)
  • 66. Indexes Pro Tips CREATE INDEX CONCURRENTLY
  • 67. Indexes Pro Tips CREATE INDEX CONCURRENTLY CREATE INDEX WHERE foo=bar
  • 68. Indexes Pro Tips CREATE INDEX CONCURRENTLY CREATE INDEX WHERE foo=bar SELECT * WHERE foo LIKE ‘%bar% is BAD
  • 69. Indexes Pro Tips CREATE INDEX CONCURRENTLY CREATE INDEX WHERE foo=bar SELECT * WHERE foo LIKE ‘%bar% is BAD SELECT * WHERE Food LIKE ‘bar%’ is OKAY
  • 70. Extensions dblink hstore trigram uuid-ossp pgstattuple citext pgrowlocks pgcrypto isn ltree fuzzystrmatch cube earthdistance dict_int tablefunc unaccent btree_gist dict_xsyn
  • 72. Cache Hit Rate SELECT 'index hit rate' as name, (sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read) as ratio FROM pg_statio_user_indexes union all SELECT 'cache hit rate' as name, case sum(idx_blks_hit) when 0 then 'NaN'::numeric else to_char((sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read), '99.99')::numeric end as ratio FROM pg_statio_user_indexes;)
  • 73. Index Hit Rate SELECT relname, 100 * idx_scan / (seq_scan + idx_scan), n_live_tup FROM pg_stat_user_tables ORDER BY n_live_tup DESC;
  • 74. Index Hit Rate relname | percent_of_times_index_used | rows_in_table ---------------------+-----------------------------+--------------- events | 0 | 669917 app_infos_user_info | 0 | 198218 app_infos | 50 | 175640 user_info | 3 | 46718 rollouts | 0 | 34078 favorites | 0 | 3059 schema_migrations | 0 | 2 authorizations | 0 | 0 delayed_jobs | 23 | 0
  • 76. pg_stats_statements $ select * from pg_stat_statements where query ~ 'from users where email'; userid │ 16384 dbid │ 16388 query │ select * from users where email = ?; calls │ 2 total_time │ 0.000268 rows │ 2 shared_blks_hit │ 16 shared_blks_read │ 0 shared_blks_dirtied │ 0 shared_blks_written │ 0 local_blks_hit │ 0 local_blks_read │ 0 local_blks_dirtied │ 0 9.2 local_blks_written │ 0 temp_blks_read │ 0 temp_blks_written │ 0 time_read │ 0 time_write │ 0
  • 78. pg_stats_statements SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5; ---------------------------------------------------------------------- query | UPDATE pgbench_branches SET bbalance = bbalance + ? WHERE bid = ?; calls | 3000 total_time | 9609.00100000002 rows | 2836 hit_percent | 99.9778970000200936 9.2
  • 82. Window Functions SELECT email, users.data->'state', sum(total(items)), rank() OVER (PARTITION BY users.data->'state' ORDER BY sum(total(items)) desc) FROM users, purchases WHERE purchases.user_id = users.id GROUP BY 1, 2;
  • 83. Window Functions SELECT email, users.data->'state', sum(total(items)), rank() OVER (PARTITION BY users.data->'state' ORDER BY sum(total(items)) desc) FROM users, purchases WHERE purchases.user_id = users.id GROUP BY 1, 2;
  • 84. Extensions dblink hstore trigram uuid-ossp pgstattuple citext pgrowlocks pgcrypto isn ltree fuzzystrmatch cube earthdistance dict_int tablefunc unaccent btree_gist dict_xsyn
  • 87. Fuzzystrmatch SELECT soundex('Craig'), soundex('Will'), difference('Craig', 'Will'); SELECT soundex('Craig'), soundex('Greg'), difference('Craig', 'Greg'); SELECT soundex('Willl'), soundex('Will'), difference('Willl', 'Will');
  • 88. Moving Data Around copy (SELECT * FROM users) TO ‘~/users.csv’; copy users FROM ‘~/users.csv’;
  • 89. db_link SELECT dblink_connect('myconn', 'dbname=postgres'); SELECT * FROM dblink('myconn','SELECT * FROM foo') AS t(a int, b text); a | b -------+------------ 1 | example 2 | example2
  • 90. Foreign Data Wrappers oracle mysql odbc twitter sybase redis jdbc files couch s3 www ldap informix mongodb
  • 91. Foreign Data Wrappers CREATE EXTENSION redis_fdw; CREATE SERVER redis_server FOREIGN DATA WRAPPER redis_fdw OPTIONS (address '127.0.0.1', port '6379'); CREATE FOREIGN TABLE redis_db0 (key text, value text) SERVER redis_server OPTIONS (database '0'); CREATE USER MAPPING FOR PUBLIC SERVER redis_server OPTIONS (password 'secret');
  • 92. Query Redis from Postgres SELECT * FROM redis_db0; SELECT id, email, value as visits FROM users, redis_db0 WHERE ('user_' || cast(id as text)) = cast(redis_db0.key as text) AND cast(value as int) > 40;
  • 93. All are not equal oracle mysql odbc twitter sybase redis jdbc files couch s3 www ldap informix mongodb
  • 94. Production Ready FDWs oracle mysql odbc twitter sybase redis jdbc files couch s3 www ldap informix mongodb
  • 96. Readability WITH top_5_products AS ( SELECT products.*, count(*) FROM products, line_items WHERE products.id = line_items.product_id GROUP BY products.id ORDER BY count(*) DESC LIMIT 5 ) SELECT users.email, count(*) FROM users, line_items, top_5_products WHERE line_items.user_id = users.id AND line_items.product_id = top_5_products.id GROUP BY 1 ORDER BY 1;
  • 97. Common Table Expressions WITH top_5_products AS ( SELECT products.*, count(*) FROM products, line_items WHERE products.id = line_items.product_id GROUP BY products.id ORDER BY count(*) DESC LIMIT 5 ) SELECT users.email, count(*) FROM users, line_items, top_5_products WHERE line_items.user_id = users.id AND line_items.product_id = top_5_products.id GROUP BY 1 ORDER BY 1;
  • 98. Common Table Expressions WITH top_5_products AS ( SELECT products.*, count(*) FROM products, line_items WHERE products.id = line_items.product_id GROUP BY products.id ORDER BY count(*) DESC LIMIT 5 ) SELECT users.email, count(*) FROM users, line_items, top_5_products WHERE line_items.user_id = users.id AND line_items.product_id = top_5_products.id GROUP BY 1 ORDER BY 1;
  • 99. Common Table Expressions WITH top_5_products AS ( SELECT products.*, count(*) FROM products, line_items WHERE products.id = line_items.product_id GROUP BY products.id ORDER BY count(*) DESC LIMIT 5 ) SELECT users.email, count(*) FROM users, line_items, top_5_products WHERE line_items.user_id = users.id AND line_items.product_id = top_5_products.id GROUP BY 1 ’ ORDER BY 1;
  • 100. Common Table Expressions WITH top_5_products AS ( SELECT products.*, count(*) FROM products, line_items WHERE products.id = line_items.product_id GROUP BY products.id ORDER BY count(*) DESC LIMIT 5 ) SELECT users.email, count(*) FROM users, line_items, top_5_products WHERE line_items.user_id = users.id AND line_items.product_id = top_5_products.id GROUP BY 1 Don’t do this in production ’ ORDER BY 1;
  • 101. Brief History Developing w/ Postgres Postgres Performance Querying
  • 102. Extras
  • 105. Extras Listen/Notify Per Transaction Synchronous Replication SELECT for UPDATE
  • 106. Postgres - TLDR Datatypes Fast Column Addition Conditional Indexes Listen/Notify Transactional DDL Table Inheritance Foreign Data Wrappers Per Transaction sync replication Concurrent Index Creation Window functions Extensions NoSQL inside SQL Common Table Expressions Momentum
  • 107. Questions? Craig Kerstiens @craigkerstiens http://www.craigkerstiens.com https://speakerdeck.com/u/craigkerstiens/p/postgres-demystified