Postgres demystified

Postgres Demystified
Craig Kerstiens
@craigkerstiens
http://www.craigkerstiens.com
https://speakerdeck.com/u/craigkerstiens/p/postgres-demystified

Postgres Demystified
We’re
Hiring

Getting Setup
Postgres.app

Agenda
Brief History
Developing w/ Postgres
Postgres Performance
Querying

Postgres History
Post Ingress
Postgres
Around since 1989/1995
PostgresQL
Community Driven/Owned

MVCC

Each query sees transactions committed before it
Locks for writing don’t conflict with reading

Why Postgres

“ its the emacs of databases”

Basics
psql is your friend
# dt
# d
# d tablename
# x
# e

Datatypes
UUID
date boolean
timestamptz array interval integer
smallint line bigint XML
enum char
money
serial bytea point float
inet polygon numeric circle
cidr varchar tsquery
text timetz
path time timestamp box
macaddr
tsvector

Datatypes
UUID
date boolean
timestamptz array interval integer
smallint line bigint XML
enum char
money
serial bytea point float
inet polygon numeric circle
cidr varchar tsquery
text timetz
path time timestamp
macaddr box
tsvector

Datatypes
CREATE TABLE items (
id serial NOT NULL,
name varchar (255),
tags varchar(255) [],

created_at timestamp
);

Datatypes
INSERT INTO items
VALUES (1, 'Ruby Gem', '{“Programming”,”Jewelry”}', now());

INSERT INTO items
VALUES (2, 'Django Pony', '{“Programming”,”Animal”}', now());

Extensions
dblink hstore trigram
uuid-ossp pgstattuple
citext pgrowlocks
pgcrypto
isn ltree fuzzystrmatch
cube earthdistance dict_int
tablefunc
unaccent btree_gist dict_xsyn

NoSQL in your SQL
CREATE EXTENSION hstore;
CREATE TABLE users (
id integer NOT NULL,
email character varying(255),
data hstore,
created_at timestamp without time zone,
last_login timestamp without time zone
);

hStore
INSERT INTO users
VALUES (
1,
'craig.kerstiens@gmail.com',
'sex => "M", state => “California”',
now(),
now()
);

JSON
SELECT
'{"id":1,"email": "craig.kerstiens@gmail.com",}'::json;

V8 w/ PLV8

9.2

JSON
SELECT

create or replace function
V8 w/ PLV8
js(src text) returns text as $$
return eval(
"(function() { " + src + "})"
)();
$$ LANGUAGE plv8; 9.2

JSON
SELECT

create or replace function
V8 w/ PLV8
js(src text) returns text as $$
return eval( Bad Idea
"(function() { " + src + "})"
)();
$$ LANGUAGE plv8; 9.2

Range Types
CREATE TABLE talks (room int, during tsrange);
INSERT INTO talks VALUES
(3, '[2012-09-24 13:00, 2012-09-24 13:50)');

9.2

Range Types
CREATE TABLE talks (room int, during tsrange);
(3, '[2012-09-24 13:00, 2012-09-24 13:50)');

ALTER TABLE talks ADD EXCLUDE USING gist (during WITH &&);
(1108, '[2012-09-24 13:30, 2012-09-24 14:00)');
ERROR: conﬂicting key value violates exclusion constraint
"talks_during_excl"
9.2

Full Text Search
TSVECTOR - Text Data
TSQUERY - Search Predicates

Specialized Indexes and Operators

PostGIS
1. New datatypes i.e. (2d/3d boxes)

PostGIS

2. New operators i.e. SELECT foo && bar ...

PostGIS

2. New operators i.e. SELECT foo && bar ...

3. Understand relationships and distance
i.e. person within location, nearest distance

Sequential Scans

They’re Bad

Sequential Scans

They’re Bad (most of the time)

Indexes

They’re Good (most of the time)

Indexes
B-Tree
Generalized Inverted Index (GIN)
Generalized Search Tree (GIST)
K Nearest Neighbors (KNN)
Space Partitioned GIST (SP-GIST)

Indexes
B-Tree
Default
Usually want this

Indexes
Generalized Inverted Index (GIN)
Use with multiple values in 1 column
Array/hStore

Indexes
Generalized Search Tree (GIST)
Full text search
Shapes

Understanding Query Perf
Given
SELECT last_name
FROM employees
WHERE salary >= 50000;

Explain
# EXPLAIN SELECT last_name FROM employees WHERE salary >=
50000;
QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6)
Filter: (salary >= 50000)
(3 rows)

Explain
50000;
Startup Cost
QUERY PLAN
-------------------------------------------------------------------------
(3 rows)

Explain
50000; Max Time
Startup Cost
QUERY PLAN
-------------------------------------------------------------------------
(3 rows)

Explain
50000; Max Time
Startup Cost
QUERY PLAN
-------------------------------------------------------------------------
(3 rows)
Rows Returned

Explain
# EXPLAIN ANALYZE SELECT last_name FROM employees WHERE
salary >= 50000;
QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual
time=2.401..295.247 rows=1428 loops=1)
Total runtime: 295.379
(3 rows)

Explain
salary >= 50000;
Startup Cost QUERY PLAN
-------------------------------------------------------------------------
time=2.401..295.247 rows=1428 loops=1)
(3 rows)

Explain
salary >= 50000;
Max Time
-------------------------------------------------------------------------
time=2.401..295.247 rows=1428 loops=1)
(3 rows)

Explain
salary >= 50000;
Max Time
-------------------------------------------------------------------------
time=2.401..295.247 rows=1428 loops=1)
(3 rows) Rows Returned

# CREATE INDEX idx_emps ON employees (salary);

# CREATE INDEX idx_emps ON employees (salary);
salary >= 50000;
QUERY PLAN
-------------------------------------------------------------------------
Index Scan using idx_emps on employees (cost=0.00..8.49 rows=1
width=6) (actual time = 0.047..1.603 rows=1428 loops=1)
Index Cond: (salary >= 50000)
Total runtime: 1.771 ms
(3 rows)

Indexes Pro Tips
CREATE INDEX CONCURRENTLY

Indexes Pro Tips
CREATE INDEX WHERE foo=bar

Indexes Pro Tips
SELECT * WHERE foo LIKE ‘%bar% is BAD

Indexes Pro Tips
SELECT * WHERE foo LIKE ‘%bar% is BAD
SELECT * WHERE Food LIKE ‘bar%’ is OKAY

Cache Hit Rate
SELECT
'index hit rate' as name,
(sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read) as ratio
FROM pg_statio_user_indexes
union all
SELECT
'cache hit rate' as name,
case sum(idx_blks_hit)
when 0 then 'NaN'::numeric
else to_char((sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read),
'99.99')::numeric
end as ratio
FROM pg_statio_user_indexes;)

Index Hit Rate
SELECT
relname,
100 * idx_scan / (seq_scan + idx_scan),
n_live_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;

Index Hit Rate
relname | percent_of_times_index_used | rows_in_table
---------------------+-----------------------------+---------------
events | 0 | 669917
app_infos_user_info | 0 | 198218
app_infos | 50 | 175640
user_info | 3 | 46718
rollouts | 0 | 34078
favorites | 0 | 3059
schema_migrations | 0 | 2
authorizations | 0 | 0
delayed_jobs | 23 | 0

pg_stats_statements

9.2

pg_stats_statements
$ select * from pg_stat_statements where query ~ 'from users where email';
userid │ 16384
dbid │ 16388
query │ select * from users where email = ?;
calls │ 2
total_time │ 0.000268
rows │ 2
shared_blks_hit │ 16
shared_blks_read │ 0
shared_blks_dirtied │ 0
shared_blks_written │ 0
local_blks_hit │ 0
local_blks_read │ 0
local_blks_dirtied │ 0

9.2
local_blks_written │ 0
temp_blks_read │ 0
temp_blks_written │ 0
time_read │ 0
time_write │ 0

pg_stats_statements
SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit /
nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;
----------------------------------------------------------------------
query | UPDATE pgbench_branches SET bbalance
= bbalance + ? WHERE bid = ?;
calls | 3000
total_time | 9609.00100000002
rows | 2836
hit_percent | 99.9778970000200936
9.2

Window Functions
Example:
Biggest spender by state

Window Functions
SELECT
email,
users.data->'state',
sum(total(items)),
rank() OVER
(PARTITION BY users.data->'state'
ORDER BY sum(total(items)) desc)
FROM
users, purchases
WHERE purchases.user_id = users.id
GROUP BY 1, 2;

Fuzzystrmatch
SELECT soundex('Craig'), soundex('Will'), difference('Craig', 'Will');

Fuzzystrmatch
SELECT soundex('Craig'), soundex('Will'), difference('Craig', 'Will');

SELECT soundex('Craig'), soundex('Greg'), difference('Craig', 'Greg');
SELECT soundex('Willl'), soundex('Will'), difference('Willl', 'Will');

Moving Data Around
copy (SELECT * FROM users) TO ‘~/users.csv’;

copy users FROM ‘~/users.csv’;

db_link
SELECT dblink_connect('myconn', 'dbname=postgres');
SELECT * FROM dblink('myconn','SELECT * FROM foo') AS t(a int, b text);

a | b
-------+------------
1 | example
2 | example2

Foreign Data Wrappers
oracle mysql odbc
twitter sybase redis jdbc
files couch s3
www ldap
informix
mongodb

Foreign Data Wrappers
CREATE EXTENSION redis_fdw;

CREATE SERVER redis_server

FOREIGN DATA WRAPPER redis_fdw

OPTIONS (address '127.0.0.1', port '6379');

CREATE FOREIGN TABLE redis_db0 (key text, value text)

SERVER redis_server

OPTIONS (database '0');

CREATE USER MAPPING FOR PUBLIC

SERVER redis_server

OPTIONS (password 'secret');

Query Redis from Postgres
SELECT *
FROM redis_db0;

SELECT
id,
email,
value as visits
FROM
users,
redis_db0
WHERE ('user_' || cast(id as text)) = cast(redis_db0.key as text)
AND cast(value as int) > 40;

All are not equal
oracle mysql odbc
files couch s3
www ldap
informix
mongodb

Production Ready FDWs
oracle mysql odbc
files couch s3
www ldap
informix
mongodb

Readability
WITH top_5_products AS (
SELECT products.*, count(*)
FROM products, line_items
WHERE products.id = line_items.product_id
GROUP BY products.id
ORDER BY count(*) DESC
LIMIT 5
)

SELECT users.email, count(*)
FROM users, line_items, top_5_products
WHERE line_items.user_id = users.id
AND line_items.product_id = top_5_products.id
GROUP BY 1
ORDER BY 1;

Common Table Expressions
LIMIT 5
)

GROUP BY 1
ORDER BY 1;

LIMIT 5
)

GROUP BY 1
’
ORDER BY 1;

LIMIT 5
)

GROUP BY 1
Don’t do this in production
’
ORDER BY 1;

Brief History
Developing w/ Postgres
Postgres Performance
Querying

Extras
Listen/Notify
Per Transaction Synchronous Replication

Extras
Listen/Notify
Per Transaction Synchronous Replication
SELECT for UPDATE

Postgres - TLDR
Datatypes Fast Column Addition
Conditional Indexes Listen/Notify
Transactional DDL Table Inheritance
Foreign Data Wrappers Per Transaction sync replication
Concurrent Index Creation Window functions
Extensions NoSQL inside SQL
Common Table Expressions Momentum

Questions?
Craig Kerstiens
@craigkerstiens
http://www.craigkerstiens.com
https://speakerdeck.com/u/craigkerstiens/p/postgres-demystified

Postgres demystified

More Related Content

Similar to Postgres demystified (20)

Recently uploaded (20)

Postgres demystified