ADVANCED
INT->BIGINT
CONVERSIONS
Robert Treat
introduction
Robert Treat
“I help people Postgres”
@robtreat2
xzilla.net
ground rules
ok to ask questions
slides will be online
feel free to take notes
warning!
all code is based on
poorly written notes
derived from code written
under duress
during real production
outages
ymmv
Overflow
Overview
tl;dr:
ERROR: integer out of range
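The whole failure mode in one line (try it on any Postgres):

select 2147483647::int + 1;
ERROR:  integer out of range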
Goldilocks & the 3 data types
SMALLINT: 2 bytes, up to +32,767
INTEGER: 4 bytes, up to +2,147,483,647
BIGINT: 8 bytes, up to +9,223,372,036,854,775,807
foreshadow: the min value is not zero, but the negative max (-2,147,483,648 for INTEGER)
More practically…
Postgres “SERIAL” data type
● most applications want auto-generated unique values to use as a surrogate
primary key* aka “id serial primary key”
● SERIAL type creates an integer column and a sequence and ties them together
● There is a “BIGSERIAL” type which ties to bigint, but it isn’t as widely known nor the
default in most schema creation tools (see the sketch below)
What about identity columns?
● “id integer primary key generated always as identity”
● OKAY… but you still might be wrong. We’ll come back to that later.
We are not going to debate logical vs surrogate keys in this talk,
nor are we going to discuss the merits of uuid based primary keys!!!
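For reference, a minimal sketch of what the shorthand expands to (table and sequence names follow Postgres defaults but are illustrative):

-- "id serial primary key" is roughly equivalent to:
create sequence t_id_seq;
create table t (id integer primary key default nextval('t_id_seq'));
alter sequence t_id_seq owned by t.id;

-- the bigint-backed alternatives:
create table t2 (id bigserial primary key);
create table t3 (id bigint primary key generated always as identity);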
Please keep in mind…
The nature of integer overflow problems typically means:
● Often surprising. Often have to be fixed under stress.
● May have taken years to get there. Institutional knowledge may be scarce.
● Lots of data. Like 2 billion rows of it, maybe. Makes everything harder.
Avoiding the
Problem (or not)
Can we eliminate the problem?
use bigint where “needed”
- usually surprising
- bugs in ORM
- artificial escalation
Can we eliminate the problem?
artificial escalation => errors and rollbacks
create table x (y serial primary key not null, z jsonb not null);
BEGIN; insert into x values (default,'{}'::jsonb);
insert into x values (default,'{}'::jsonb);
insert into x values (default,'{}'::jsonb); ROLLBACK;
select count(*) from x;
count
-----
0
select * from x_y_seq;
last_value | 3
log_cnt | 30
is_called | t
Can we eliminate the problem?
artificial escalation => insert … on conflict …
create table x (b int primary key not null, i serial);
INSERT INTO x (b) select 1 union all select 2 union all select 3 ON CONFLICT DO NOTHING;
INSERT INTO x (b) select 1 union all select 2 union all select 3 ON CONFLICT DO NOTHING;
INSERT INTO x (b) select 1 union all select 2 union all select 3 ON CONFLICT DO NOTHING;
INSERT INTO x (b) select 5 ON CONFLICT DO NOTHING;
select * from x;
b | i
---|---
1 | 1
2 | 2
3 | 3
5 | 10
select * from x_i_seq;
last_value | 10
log_cnt | 23
is_called | t
Can we eliminate the problem?
artificial escalation => on purpose
setval
alter sequence
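For example, a couple of deliberate jumps (sequence name borrowed from the earlier examples):

select setval('x_y_seq', 1000000000);           -- jump straight to 1 billion
alter sequence x_y_seq restart with 1000000000; -- same idea via DDL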
Can we eliminate the problem?
use bigint where “needed”
- usually surprising
- bugs in ORM
- artificial escalation
use bigint everywhere?
- more space on disk (heap)
- more space on disk (index)
- more ram
- more swap
- more network usage
But actually… other databases handle it this way (CockroachDB, for one)
We could use UUID based primary keys!
But I already told you we aren’t here for that.
Ok, we can’t stop it a priori…
but I bet we can monitor the problem away!
We work in complex distributed systems with incomplete mental models and constantly
changing inputs; the idea that we can test comprehensively enough to avoid
production outages is a fallacy.
select max(id) from mesa;
probably fine
what about foreign keys?
select max(parent_id) from child_table;
need to build extra indexes
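A rough sketch using those hypothetical names:

-- how much of the int range has been consumed so far
select max(parent_id)::float8 / 2147483647 as fraction_used
from child_table;

-- the supporting index, built without blocking writes
create index concurrently child_table_parent_id_idx
on child_table (parent_id);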
real world issues:
- in billion row systems, people often drop FK to work around
locking/performance issues.
- doesn’t account for integer arrays
- doesn’t account for externally referenced IDs
- or any normal int columns not part of FK
WITH
cols AS (
  -- gimme all the columns that are integer/bigint/int array
  select attrelid, attname, atttypid::regtype::text as type,
         relname, nspname
  from pg_attribute
  JOIN pg_class ON (attrelid=oid)
  JOIN pg_namespace ON (relnamespace=pg_namespace.oid)
  WHERE relkind='r'
  AND atttypid::regtype::text IN ('integer', 'bigint', 'integer[]')
),
intarrvals AS (
  -- now grab the min/max values from pg_stats that we have collected from analyze
  SELECT s.tablename, s.attname, cols.type, max(i), min(i)
  FROM pg_stats s
  JOIN cols ON (cols.type = 'integer[]' AND s.schemaname = cols.nspname AND s.tablename = cols.relname AND s.attname=cols.attname),
  unnest(histogram_bounds::text::text[]) a,
  unnest(a::int[]) i
  GROUP BY s.tablename, s.attname, cols.type
),
intvals AS (
  SELECT s.tablename, s.attname, cols.type, max(i), min(i)
  FROM pg_stats s
  JOIN cols ON (cols.type = 'integer' AND s.schemaname = cols.nspname AND s.tablename = cols.relname AND s.attname=cols.attname),
  unnest(histogram_bounds::text::int[]) i
  GROUP BY s.tablename, s.attname, cols.type
),
data AS (
  -- smash that data together...
  select * from intvals
  union all
  select * from intarrvals
)
-- ...and then tell me where each table stands
select tablename, attname, type, min, max from data;

Even with this query, be careful!
- only as good as your last analyze
- watch out for negatives
- still might not protect you from artificial escalation
Dealing with
Overflow
ERROR: integer out of range
💩
alter sequence @seqname
    minvalue -2147483648
    restart with -2147483648;
This will flip your sequence negative and begin counting upwards.
You now have ~2 billion values of headroom to FYS (fix your system).
Good luck! Oh yeah, this might break things if you do silly things
like rely on pk ordering. It might also break your apps, but we’ll
come back to that :-)
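For a sequence named m_y_seq, the effect looks like:

select nextval('m_y_seq');  -- -2147483648
select nextval('m_y_seq');  -- -2147483647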
column-swappin’
Table "public.m"
Column | Type | Nullable | Default
--------+---------+----------+------------------------------
y | integer | not null | nextval('m_y_seq'::regclass)
z | jsonb | |
Table "public.m"
Column | Type | Nullable | Default
--------+---------+----------+------------------------------
y | integer | not null | nextval('m_y_seq'::regclass)
z | jsonb | |
db=> alter table m add column fut_y bigint;
ALTER TABLE
Table "public.m"
Column | Type | Nullable | Default
--------+---------+----------+------------------------------
y | integer | not null | nextval('m_y_seq'::regclass)
z | jsonb | |
fut_y | bigint | |
db=> begin; alter table m rename to other_m;
db-> create view m as select coalesce(y::bigint,fut_y) as y, z from other_m; commit;
ALTER TABLE
CREATE VIEW
View "public.m"
Column | Type | Nullable | Default | Storage | Description
--------+--------+----------+---------+----------+-------------
y | bigint | | | plain |
z | jsonb | | | extended |
View definition:
SELECT COALESCE(other_m.y::bigint, other_m.fut_y) AS y,
other_m.z
FROM other_m;
*add trigger(s) for ins/upd/del on m { y := fut_y() }
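A minimal sketch of that trigger, assuming the names above (update/delete need
equivalent handlers, omitted here; use "execute procedure" on pre-11 versions):

create or replace function m_view_ins() returns trigger as $$
begin
    -- keep the old int column and the new bigint column in lockstep;
    -- fall back to the sequence when no id was supplied
    new.y := coalesce(new.y, nextval('m_y_seq'));
    insert into other_m (y, fut_y, z) values (new.y, new.y, new.z);
    return new;
end $$ language plpgsql;

create trigger m_ins instead of insert on m
    for each row execute function m_view_ins();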
⇒ backfill update other_m set fut_y = y;
db=> begin; drop view m;
db-> alter table other_m drop column y;
db-> alter table other_m rename column fut_y to y;
db-> alter table other_m rename to m; commit;
DROP VIEW
ALTER TABLE
ALTER TABLE
ALTER TABLE
Table "public.m"
Column | Type | Nullable | Default
--------+---------+----------+------------------------------
y | bigint | not null | nextval('m_y_seq'::regclass)
z | jsonb | |
table-swappin’
Table "public.m"
Column | Type | Nullable | Default
--------+---------+----------+------------------------------
x | bigint | not null | nextval('m_y_seq'::regclass)
y | integer | not null |
z | jsonb | |
db=> create table future_m (x bigint not null, y bigint not null, z jsonb);
CREATE TABLE
Table "public.future_m"
Column | Type | Nullable | Default
--------+---------+----------+------------------------------
x | bigint | not null |
y | bigint | not null |
z | jsonb | |
db=> begin; alter table m rename to orig_m;
db-> create view m as select
db-> x, coalesce(o.y::bigint,f.y) as y, z
db-> from orig_m o join future_m f using (x); commit;
ALTER TABLE
CREATE VIEW
View "public.m"
Column | Type | Nullable | Default | Storage | Description
--------+--------+----------+---------+----------+-------------
x | bigint | | | plain |
y | bigint | | | plain |
z | jsonb | | | extended |
*add trigger(s) on m => ins/upd/del orig_m where x=$1
*add trigger(s) on orig_m => ins/upd/del future_m where x=$1
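A rough sketch of the orig_m ⇒ future_m trigger, assuming a unique index on
future_m (x) for the conflict handling (update/delete handling omitted):

create or replace function orig_m_sync() returns trigger as $$
begin
    -- mirror every write into the bigint copy of the table
    insert into future_m (x, y, z) values (new.x, new.y, new.z)
        on conflict (x) do update set y = excluded.y, z = excluded.z;
    return new;
end $$ language plpgsql;

create trigger orig_m_sync after insert or update on orig_m
    for each row execute function orig_m_sync();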
⇒ backfill: update orig_m set y = y; (a no-op touch so the sync trigger copies every row)
db=> begin; drop view m;
db-> alter table future_m rename to m; commit;
DROP VIEW
ALTER TABLE
Table "public.m"
Column | Type | Nullable | Default
--------+---------+----------+------------------------------
x | bigint | not null | nextval('m_y_seq'::regclass)
y | bigint | not null |
z | jsonb | |
tip: you can play the same tricks as the view and new-table approaches
using logical replication or FDWs, it is just a bit more complex.
tip: I glossed over a lot of things like trigger
code, foreign keys, constraints, and similar
trickery. You can work it out, just takes more
time/effort.
Other Problems
to Consider
Won’t somebody think of the children?
By children we mean app code, because developers (just kidding!)
● was your app based on the original ORM schema
definition (i.e. int)?
● what number types does your language support?
○ unsigned int? (0 to 4,294,967,295, oh my!)
● modern systems are like ogres… they have layers
○ api?
○ compiled?
CREATE OR REPLACE FUNCTION public.generate_pk_id()
RETURNS bigint AS
$BODY$
DECLARE
    per_mil int;
BEGIN
    -- (random() * 100)::int rounds, so the = 100 branch below fires
    -- only when random() >= 0.995, i.e. roughly 0.5% of calls
    SELECT (random() * 100.0::FLOAT8)::INT INTO per_mil;
    CASE
        WHEN per_mil = 100 THEN
            -- the rare winner draws from one sequence...
            RETURN nextval('pk_id_seq'::regclass);
        ELSE
            -- ...everything else draws from another
            RETURN nextval('ex_pk_id_seq'::regclass);
    END CASE;
END
$BODY$
LANGUAGE plpgsql VOLATILE;
THANK YOU!
@robtreat2
Additional Readings
● https://doordash.engineering/2020/10/21/hot-swapping-production-data-tables/
● https://doordash.engineering/2022/01/19/making-applications-compatible-with-postgres-tables-bigint-update/
● https://medium.com/doctolib/how-to-change-a-column-type-in-your-productions-postgresql-database-35d6fa194cb8