PostgreSQL Procedural Languages: Tips, Tricks & Gotchas
Who Am I?
● Jim Mlodgenski
– jimm@openscg.com
– @jim_mlodgenski
● Co-organizer of
– NYC PUG (www.nycpug.org)
– Philly PUG (www.phlpug.org)
● CTO, OpenSCG
– www.openscg.com
Stored procedures/functions
● Code that runs inside of the database
● Used for:
– Performance
– Security
– Convenience
functions=# SELECT airport FROM bird_strikes LIMIT 5;
airport
--------------------------
NEWARK LIBERTY INTL ARPT
UNKNOWN
DENVER INTL AIRPORT
CHICAGO O'HARE INTL ARPT
JOHN F KENNEDY INTL
(5 rows)
Source: http://wildlife.faa.gov/
Sample Data
functions=# SELECT count(*)
functions-# FROM bird_strikes
functions-# WHERE get_iata_code_from_abbr_name(airport) = 'LAX';
count
-------
850
(1 row)
Time: 13490.611 ms
Data Formatting Functions
functions=# EXPLAIN ANALYZE SELECT count(*) FROM bird_strikes ...
QUERY PLAN
------------------------------------------------------------------------
Aggregate (cost=29418.79..29418.80 rows=1 width=0) (actual
time=13463.628..13463.629 rows=1 loops=1)
-> Seq Scan on bird_strikes (cost=0.00..29417.55 rows=497 width=0)
(actual time=15.721..13463.293 rows=850 loops=1)
Filter: ((get_iata_code_from_abbr_name(airport))::text =
'LAX'::text)
Rows Removed by Filter: 98554
Planning time: 0.124 ms
Execution time: 13463.682 ms
(6 rows)
Check Performance
functions=# set track_functions = 'pl';
SET
functions=# select * from pg_stat_user_functions;
(No rows)
functions=# SELECT count(*) FROM bird_strikes ...
-[ RECORD 1 ]
count | 850
Track Function Usage
functions=# select * from pg_stat_user_functions;
-[ RECORD 1 ]----------------------------
funcid | 41247
schemaname | public
funcname | get_iata_code_from_name
calls | 88547
total_time | 12493.419
self_time | 12493.419
-[ RECORD 2 ]----------------------------
funcid | 41246
schemaname | public
funcname | get_iata_code_from_abbr_name
calls | 99404
total_time | 13977.674
self_time | 1484.255
Isolate Performance Issues
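A follow-up query along these lines can turn the counters above into a per-call cost. This is only a sketch: it assumes track_functions is still set to 'pl' (or 'all') so the view is populated, and the alias self_ms_per_call is made up for illustration.

```sql
-- Rank tracked functions by the time spent in their own bodies.
-- self_time excludes time spent in the functions they call.
SELECT funcname,
       calls,
       -- cast to numeric because round(double precision, int) does not exist
       round((self_time / NULLIF(calls, 0))::numeric, 4) AS self_ms_per_call  -- illustrative alias
FROM pg_stat_user_functions
ORDER BY self_time DESC;
```

With the figures shown above, get_iata_code_from_name works out to roughly 0.14 ms per call over 88547 calls, which accounts for most of the total time.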
CREATE OR REPLACE FUNCTION get_iata_code_from_abbr_name(abbr_name varchar)
RETURNS varchar AS
$$
DECLARE
working_name varchar;
code varchar := null;
BEGIN
working_name := upper(abbr_name);
IF working_name = 'UNKNOWN' THEN
RETURN null;
END IF;
working_name := replace(working_name, 'INTL', 'INTERNATIONAL');
working_name := replace(working_name, 'ARPT', 'AIRPORT');
working_name := replace(working_name, 'MUNI', 'MUNICIPAL');
working_name := replace(working_name, 'METRO', 'METROPOLITAN');
working_name := replace(working_name, 'NATL', 'NATIONAL');
working_name := replace(working_name, '-', ' ');
working_name := replace(working_name, '/', ' ');
working_name := working_name || '%';
code := get_iata_code_from_name(working_name);
RETURN code;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION get_iata_code_from_name(airport_name varchar)
RETURNS varchar AS
$$
DECLARE
working_name varchar;
code varchar := null;
BEGIN
working_name := upper(airport_name);
EXECUTE $__$ SELECT iata_code
FROM airports
WHERE upper(name) LIKE $1
$__$
INTO code
USING working_name;
RETURN code;
END;
$$ LANGUAGE plpgsql;
Debugger
http://git.postgresql.org/gitweb/?p=pldebugger.git
functions=# select * from pl_profiler ;
func_oid | line_number | line | exec_count | total_time | longest_time
----------+-------------+---------------------------------------------------------------------+------------+------------+--------------
41246 | 1 | | 0 | 0 | 0
41246 | 2 | DECLARE | 0 | 0 | 0
41246 | 3 | working_name varchar; | 0 | 0 | 0
41246 | 4 | code varchar := null; | 0 | 0 | 0
41246 | 5 | BEGIN | 0 | 0 | 0
41246 | 6 | working_name := upper(abbr_name); | 99404 | 210587 | 363
41246 | 7 | | 0 | 0 | 0
41246 | 8 | IF working_name = 'UNKNOWN' THEN | 99404 | 63406 | 97
41246 | 9 | RETURN null; | 10857 | 2744 | 15
41246 | 10 | END IF; | 0 | 0 | 0
41246 | 11 | | 0 | 0 | 0
41246 | 12 | working_name := replace(working_name, 'INTL', 'INTERNATIONAL'); | 88547 | 116474 | 145
41246 | 13 | working_name := replace(working_name, 'ARPT', 'AIRPORT'); | 88547 | 83015 | 91
41246 | 14 | working_name := replace(working_name, 'MUNI', 'MUNICIPAL'); | 88547 | 70676 | 74
41246 | 15 | working_name := replace(working_name, 'METRO', 'METROPOLITAN'); | 88547 | 67392 | 63
41246 | 16 | working_name := replace(working_name, 'NATL', 'NATIONAL'); | 88547 | 64681 | 70
41246 | 17 | | 0 | 0 | 0
41246 | 18 | working_name := replace(working_name, '-', ' '); | 88547 | 66771 | 62
41246 | 19 | working_name := replace(working_name, '/', ' '); | 88547 | 65054 | 66
41246 | 20 | working_name := working_name || '%'; | 88547 | 64892 | 207
41246 | 21 | | 0 | 0 | 0
41246 | 22 | code := get_iata_code_from_name(working_name); | 88547 | 12282997 | 3709
41246 | 23 | | 0 | 0 | 0
41246 | 24 | RETURN code; | 88547 | 33374 | 14
41246 | 25 | END; | 0 | 0 | 0
41247 | 1 | | 0 | 0 | 0
41247 | 2 | DECLARE | 0 | 0 | 0
41247 | 3 | working_name varchar; | 0 | 0 | 0
41247 | 4 | code varchar := null; | 0 | 0 | 0
41247 | 5 | BEGIN | 0 | 0 | 0
41247 | 6 | working_name := upper(airport_name); | 88547 | 170273 | 90
41247 | 7 | | 0 | 0 | 0
41247 | 8 | EXECUTE $__$ SELECT iata_code | 88547 | 11572604 | 3273
41247 | 9 | FROM airports | 0 | 0 | 0
41247 | 10 | WHERE upper(name) LIKE $1 | 0 | 0 | 0
41247 | 11 | $__$ | 0 | 0 | 0
41247 | 12 | INTO code | 0 | 0 | 0
41247 | 13 | USING working_name; | 0 | 0 | 0
41247 | 14 | | 0 | 0 | 0
41247 | 15 | RETURN code; | 88547 | 121574 | 27
41247 | 16 | END; | 0 | 0 | 0
(41 rows)
Profiler
https://bitbucket.org/openscg/plprofiler
● Be careful when one function calls another
– This can lead to performance problems that are difficult to diagnose
● Be careful when a function is used in a WHERE clause
– In a sequential scan, it may execute once per row in the table
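The counts shown earlier bear this out. As a quick check, using only the figures from the EXPLAIN ANALYZE and profiler output (nothing new is measured here):

```sql
SELECT 850 + 98554   AS rows_scanned,  -- rows kept + rows removed by the filter
       99404 - 10857 AS inner_calls;   -- outer calls minus early returns for 'UNKNOWN'
```

The first value equals the 99404 calls recorded for get_iata_code_from_abbr_name, one per row of bird_strikes. The second equals the 88547 calls recorded for get_iata_code_from_name.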
functions=# SELECT iso_region FROM airports LIMIT 5;
iso_region
------------
US-PA
US-AK
US-AL
US-AR
US-AZ
(5 rows)
Source: http://ourairports.com/data/
Sample Data
CREATE TYPE airport_regions AS (airport_name varchar,
airport_continent varchar,
airport_country varchar,
airport_state varchar);
CREATE OR REPLACE FUNCTION get_airport_regions()
RETURNS SETOF airport_regions AS
$$
BEGIN
RETURN QUERY SELECT name::varchar, continent::varchar,
iso_country::varchar,
split_part(iso_region, '-', 2)::varchar
FROM airports;
END;
$$ LANGUAGE plpgsql;
Set Returning Functions
functions=# SELECT b.num_wildlife_struck
FROM bird_strikes b, state_code s,
get_airport_regions() r
WHERE b.origin_state = s.name
AND s.abbreviation = r.airport_state
AND r.airport_continent = 'NA';
num_wildlife_struck
---------------------
…
Time: 48507.635 ms
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Nested Loop (cost=42.10..318.77 rows=1972 width=2) (actual time=43.468..38467.229 rows=60334427 loops=1)
-> Hash Join (cost=12.81..14.51 rows=1 width=9) (actual time=43.284..58.007 rows=21488 loops=1)
Hash Cond: ((s.abbreviation)::text = (r.airport_state)::text)
-> Seq Scan on state_code s (cost=0.00..1.50 rows=50 width=12) (actual time=0.007..0.045 rows=50
loops=1)
-> Hash (cost=12.75..12.75 rows=5 width=32) (actual time=43.264..43.264 rows=25056 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 857kB
-> Function Scan on get_airport_regions r (cost=0.25..12.75 rows=5 width=32) (actual
time=34.050..39.650 rows=25056 loops=1)
Filter: ((airport_continent)::text = 'NA'::text)
Rows Removed by Filter: 21150
-> Bitmap Heap Scan on bird_strikes b (cost=29.29..288.48 rows=1578 width=10) (actual
time=0.445..1.343 rows=2808 loops=21488)
Recheck Cond: ((origin_state)::text = (s.name)::text)
Heap Blocks: exact=31639334
-> Bitmap Index Scan on bird_strikes_state (cost=0.00..28.89 rows=1578 width=0) (actual
time=0.285..0.285 rows=2808 loops=21488)
Index Cond: ((origin_state)::text = (s.name)::text)
Planning time: 0.742 ms
Execution time: 40447.925 ms
(16 rows)
Time: 40449.209 ms
CREATE OR REPLACE FUNCTION get_airport_regions()
RETURNS SETOF airport_regions AS
$$
BEGIN
RETURN QUERY SELECT name::varchar, continent::varchar,
iso_country::varchar,
split_part(iso_region, '-', 2)::varchar
FROM airports;
END;
$$ LANGUAGE plpgsql
ROWS 46206
COST 600000;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Hash Join (cost=2081.87..7687.83 rows=91120 width=2) (actual time=51.589..7568.729 rows=60334427 loops=1)
Hash Cond: ((b.origin_state)::text = (s.name)::text)
-> Seq Scan on bird_strikes b (cost=0.00..4318.04 rows=99404 width=10) (actual time=0.006..14.207
rows=99404 loops=1)
-> Hash (cost=2081.15..2081.15 rows=58 width=9) (actual time=51.571..51.571 rows=21488 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 861kB
-> Hash Join (cost=1502.12..2081.15 rows=58 width=9) (actual time=37.574..48.385 rows=21488 loops=1)
Hash Cond: ((r.airport_state)::text = (s.abbreviation)::text)
-> Function Scan on get_airport_regions r (cost=1500.00..2077.57 rows=231 width=32) (actual
time=37.526..42.626 rows=25056 loops=1)
Filter: ((airport_continent)::text = 'NA'::text)
Rows Removed by Filter: 21150
-> Hash (cost=1.50..1.50 rows=50 width=12) (actual time=0.041..0.041 rows=50 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 3kB
-> Seq Scan on state_code s (cost=0.00..1.50 rows=50 width=12) (actual time=0.004..0.020
rows=50 loops=1)
Planning time: 0.722 ms
Execution time: 9572.353 ms
(15 rows)
Time: 9573.716 ms
● When set-returning functions are used as tables, the planner's row and cost estimates are usually far from the actual values
– Default ROWS: 1000
– Default COST: 100
● Note: COST is in units of cpu_operator_cost, which defaults to 0.0025
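The same attributes can be set on an existing function without redefining it, and the unit conversion is easy to check by hand. The sketch below assumes cpu_operator_cost is at its default of 0.0025. The ROWS value matches the 46206 rows the function actually returned in the plans (25056 kept plus 21150 filtered out).

```sql
-- Set the planner hints on the existing function.
ALTER FUNCTION get_airport_regions() ROWS 46206 COST 600000;

-- Convert COST into planner cost units (assumes cpu_operator_cost = 0.0025).
SELECT 100    * 0.0025 AS default_function_cost,  -- default COST 100
       600000 * 0.0025 AS tuned_function_cost;    -- COST 600000
```

These work out to 0.25 and 1500, which line up with the startup costs of the Function Scan node in the two plans (cost=0.25.. before, cost=1500.00.. after).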
● Do not use functions to mask a
bad data model
● Use functions to help load the data
into the correct format
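One way to read this advice against the earlier bird_strikes example is to run the formatting function once at load time and store the result. The following is a sketch only: the column name iata_code and the index name are hypothetical, and the slides do not show this step.

```sql
-- Hypothetical column holding the precomputed code.
ALTER TABLE bird_strikes ADD COLUMN iata_code varchar;

-- Pay the function's cost once per row, at load time.
UPDATE bird_strikes
SET iata_code = get_iata_code_from_abbr_name(airport);

-- Hypothetical index so lookups no longer need a sequential scan.
CREATE INDEX bird_strikes_iata_code_idx ON bird_strikes (iata_code);

-- The original query then becomes a plain column comparison.
SELECT count(*) FROM bird_strikes WHERE iata_code = 'LAX';
```

The trade-off is that the stored value must be refreshed when airport names or the airports table change.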
Table Partitioning
● Usually done for performance
● Uses check constraints and inherited tables
● Triggers are preferred over rules so COPY can be used
● Trigger functions used to move the data to the correct
child table
CREATE UNLOGGED TABLE trigger_test (key serial primary key,
value varchar,
insert_ts timestamp,
update_ts timestamp);
CREATE UNLOGGED TABLE trigger_test_0
(CHECK ( key % 5 = 0)) INHERITS (trigger_test);
CREATE UNLOGGED TABLE trigger_test_1
(CHECK ( key % 5 = 1)) INHERITS (trigger_test);
CREATE UNLOGGED TABLE trigger_test_2
(CHECK ( key % 5 = 2)) INHERITS (trigger_test);
CREATE UNLOGGED TABLE trigger_test_3
(CHECK ( key % 5 = 3)) INHERITS (trigger_test);
CREATE UNLOGGED TABLE trigger_test_4
(CHECK ( key % 5 = 4)) INHERITS (trigger_test);
CREATE OR REPLACE FUNCTION partition_trigger() RETURNS trigger AS $$
DECLARE
partition int;
BEGIN
partition = NEW.key % 5;
EXECUTE 'INSERT INTO trigger_test_' || partition || ' VALUES (($1).*)' USING NEW;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER partition_trigger BEFORE INSERT ON trigger_test
FOR EACH ROW EXECUTE PROCEDURE partition_trigger();
Dynamic Trigger
CREATE OR REPLACE FUNCTION partition_trigger() RETURNS trigger AS $$
BEGIN
CASE NEW.key % 5
WHEN 0 THEN
INSERT INTO trigger_test_0 VALUES (NEW.*);
WHEN 1 THEN
INSERT INTO trigger_test_1 VALUES (NEW.*);
WHEN 2 THEN
INSERT INTO trigger_test_2 VALUES (NEW.*);
WHEN 3 THEN
INSERT INTO trigger_test_3 VALUES (NEW.*);
WHEN 4 THEN
INSERT INTO trigger_test_4 VALUES (NEW.*);
END CASE;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
Case Statement
● 16% performance gain using CASE Statement
● Tested inserting 100,000 rows
[Bar chart: Performance of Partition Triggers (Dynamic Trigger vs. Case Trigger)]
Trigger Overhead
● Triggers get executed when an event
happens in the database
– INSERT, UPDATE, DELETE
● Event Triggers fire on DDL
– CREATE, DROP, ALTER
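For comparison with the row-level triggers used in the rest of this section, an event trigger might look like the sketch below. The names log_ddl and log_ddl_trigger are made up for illustration, and creating an event trigger normally requires superuser privileges.

```sql
-- Hypothetical function: report each completed DDL command.
CREATE FUNCTION log_ddl() RETURNS event_trigger AS $$
BEGIN
    -- TG_TAG holds the command tag, e.g. CREATE TABLE
    RAISE NOTICE 'DDL command: %', tg_tag;
END;
$$ LANGUAGE plpgsql;

-- Fire after any DDL command completes.
CREATE EVENT TRIGGER log_ddl_trigger ON ddl_command_end
    EXECUTE PROCEDURE log_ddl();
```

Unlike a row-level trigger function, this one returns event_trigger and has no NEW or OLD row to work with.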
CREATE UNLOGGED TABLE trigger_test (
key serial primary key,
value varchar,
insert_ts timestamp,
update_ts timestamp
);
INSERTS.pgbench
INSERT INTO trigger_test (value) VALUES ('hello');
pgbench -n -t 100000 -f INSERTS.pgbench functions
Inserts: 5191 TPS
CREATE FUNCTION empty_trigger() RETURNS trigger AS $$
BEGIN
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER empty_trigger BEFORE INSERT OR UPDATE ON
trigger_test
FOR EACH ROW EXECUTE PROCEDURE empty_trigger();
pgbench -n -t 100000 -f INSERTS.pgbench functions
Inserts: 4906 TPS (5.5% overhead)
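The overhead figure follows directly from the two pgbench results. A quick check, using only the TPS numbers above:

```sql
-- Relative drop in throughput after adding the empty trigger.
SELECT round(100.0 * (5191 - 4906) / 5191, 1) AS overhead_pct;
```

That is 285 fewer transactions per second out of 5191, or about 5.5%.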
Overhead of PL Languages
● PL/pgSQL
● C
● PL/Perl
● PL/TCL
● PL/Python
● PL/v8
● PL/Lua
● PL/R
● PL/sh
PL/pgSQL
CREATE FUNCTION empty_trigger() RETURNS
trigger AS $$
BEGIN
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
C
#include "postgres.h"
#include "commands/trigger.h"
PG_MODULE_MAGIC;
Datum empty_c_trigger(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(empty_c_trigger);
Datum
empty_c_trigger(PG_FUNCTION_ARGS)
{
TriggerData *tg;
HeapTuple ret;
tg = (TriggerData *) (fcinfo->context);
if (TRIGGER_FIRED_BY_UPDATE(tg->tg_event))
ret = tg->tg_newtuple;
else
ret = tg->tg_trigtuple;
return PointerGetDatum(ret);
}
PL/Python
CREATE FUNCTION empty_python_trigger()
RETURNS trigger AS
$$
return
$$ LANGUAGE plpythonu;
PL/Perl
CREATE FUNCTION empty_perl_trigger()
RETURNS trigger AS
$$
return;
$$ LANGUAGE plperl;
PL/TCL
CREATE FUNCTION empty_tcl_trigger()
RETURNS trigger AS
$$
return [array get NEW]
$$ LANGUAGE pltcl;
PL/v8
CREATE FUNCTION empty_v8_trigger()
RETURNS trigger AS
$$
return NEW;
$$
LANGUAGE plv8;
PL/R
CREATE FUNCTION empty_r_trigger()
RETURNS trigger AS
$$
return(pg.tg.new)
$$ LANGUAGE plr;
PL/Lua
CREATE FUNCTION empty_lua_trigger()
RETURNS trigger AS
$$
return
$$ LANGUAGE pllua;
PL/sh
CREATE FUNCTION empty_sh_trigger()
RETURNS trigger AS
$$
#!/bin/sh
exit 0
$$ LANGUAGE plsh;
[Bar chart: Percent overhead of triggers, on a 0% to 100% axis, with bars in the order C, PL/pgSQL, PL/Lua, PL/Python, PL/Perl, PL/v8, PL/TCL, PL/R, PL/sh]
● Think things through before adding server-side code
● Performance test your functions
● Don't use a procedural language just because it's cool
– Use the right tool for the job
Questions?
jimm@openscg.com

