Running a managed service on
Kubernetes and PostgreSQL
What we learned at Timescale
DoK Day Europe, May 16th, 2022
Oleksii Kliukin
TimescaleDB is
PostgreSQL, supercharged for
time-series data and analytics.
TimescaleDB is
a free and open PostgreSQL
extension
Timescale Cloud is a hosted
and fully managed TimescaleDB
running on Kubernetes
Agenda
01 Architecture overview 🏗
02 Challenges 🙃
01
Architecture overview
What we've learned from running a PostgreSQL managed service on Kubernetes
Timescale Cloud DB on Kubernetes / AWS
02
Challenges
And solutions
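• Timescale continuous/real-time aggregates may require a lot of
memory, so out-of-memory (OOM) conditions are not uncommon when
memory limits are set low.
• OOM behavior assumed by PostgreSQL developers: the allocation
fails and only the current transaction is aborted:
ERROR: out of memory on a request of 1024 bytes
• Linux OOM killer: sends SIGKILL to an effectively arbitrary Postgres
process (the one with the highest OOM score in the container's cgroup)
• A backend process is killed: the postmaster treats it as a crash,
terminates all other backends and runs crash recovery, so every
connection is disrupted and has to reconnect
• The postmaster is killed: unclean shutdown; in extreme cases the
instance fails to start up again
Challenges
OOM killer
OOM causes abrupt shutdown of PostgreSQL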
$ kubectl get pod tinyforkv01-an-0 -o json | jq '.spec.containers[0].resources'
{
  "limits": {
    "cpu": "4",
    "memory": "1Gi"
  },
  "requests": {
    "cpu": "4",
    "memory": "1Gi"
  }
}
Image source: user MesserWoland, https://en.m.wikipedia.org/wiki/File:The_death.svg
• Regular PostgreSQL: disable memory overcommit (strict accounting)
and enable swap
• vm.overcommit_memory = 2
• This can't be set individually per container: vm.overcommit_memory
is a node-wide sysctl
• A node typically runs some pods (e.g. daemonsets for logging) that
are incompatible with this setting
https://github.com/kubernetes/kubernetes/issues/90973
• Solution: the OOMGuard library collects statistics on memory
usage by overriding malloc
• Use LD_PRELOAD to load it into Postgres
processes
• Can just report statistics, or actually block allocations going above
the predefined threshold, failing them the way a regular malloc does
(returning NULL)
• The OOM_GUARD_LIMIT threshold is derived from the container
memory limit, accounting for shared_buffers and OS overhead.
• PostgreSQL: provide memory allocation hooks to do internal
accounting and deny allocations via extensions.
• Linux/Kubernetes: configure oom_score_adj and
vm.overcommit_memory per cgroup on the Linux/Kubernetes layer.
• Improved debugging experience (locating debug symbols from
the container when running perf or gdb on the host)
Challenges
OOM killer
Wishlist
• New versions of the TimescaleDB extension are released regularly
• A new timescale-docker-ha image is built once the extension is
released
• We want to make the latest extension available to our customers
immediately (but not necessarily auto-upgrade them)
• Changing a pod's Docker image requires a pod restart
• Planned customer downtime may only happen during the
maintenance window, only a few times a year
Challenges
Fewer downtimes
Extension updates require a pod bounce
• Solution: hot-forge
• A binary inside the container that fetches pre-packaged bundles and
installs them into the container
• The bundles are delivered over a regular Postgres connection (via
COPY TO PROGRAM)
• The bundles are written to a persistent volume and linked into the
container filesystem
• Mostly adds new files (although it can potentially replace or delete
existing files in the container)
• Allow restarting individual containers in a pod with a changed
Docker image
• Support a “mutable” area inside the pod to deliver updates.
Challenges
Fewer downtimes
Wishlist
• Etcd is the core of a Kubernetes cluster
• Consists of multiple nodes (we run 5), so it should be resilient?
• Can degrade during master node updates
• Performance issues (EBS burst balance, too many objects)
• Patroni depends on it: without the Kubernetes API, instances become
read-only
Challenges
Operating Etcd
Etcd is a 5-node single point of failure
Image source: Andy Langager, https://www.flickr.com/photos/andylangager/8631388149
• Solution: no silver bullet
• Many small clusters in each region instead of a single big one
• Etcd performance monitoring
• Fire drills on ephemeral clusters
• Solution: experimental Patroni static_primary mode:
• Enforce a single primary by rejecting connections from other
nodes
• Do not demote when the Kubernetes API is not available
• Operational instructions for when Etcd is down
• Better observability inside Etcd
• Patroni “isolated” mode scalable to any number of pods
Challenges
Operating Etcd
Wishlist
• New large (1TB+) encrypted EBS volumes appear to contain an existing
partition table (detected as an Atari partition)
• Kubernetes refuses to format them
• The pod is stuck at startup
Challenges
AWS bugs
Encrypted EBS volumes
Image source: Bill Bertram, https://commons.wikimedia.org/wiki/File:Atari_1040STf.jpg
• Solution: create a small 1GB encrypted volume
• Snapshot it into a “golden snapshot”
• Create new encrypted volumes from the golden snapshot
• Need to resize the filesystem in the init container (as of
Kubernetes 1.19)
• Recent fix by AWS: https://github.com/kubernetes/kubernetes/issues/86064
Challenges
AWS bugs
Big encrypted EBS volumes
$ kubectl get statefulset l0c154j810-an -o json | jq '.spec.volumeClaimTemplates[0].spec'
{
  "accessModes": [
    "ReadWriteOnce"
  ],
  "dataSource": {
    "apiGroup": "snapshot.storage.k8s.io",
    "kind": "VolumeSnapshot",
    "name": "golden-snapshot--wsboilqtlr"
  },
  "resources": {
    "requests": {
      "storage": "2500Gi"
    }
  },
  "storageClassName": "ebs-sc",
  "volumeMode": "Filesystem"
}
• Fewer bugs :-)
• Improved support for VolumeSnapshots, e.g. provisioning
volumes across namespaces, resizing a filesystem when
provisioning from a snapshot
Challenges
AWS bugs
Wishlist
• AWS EBS and other PersistentVolume implementations only
allow increasing the volume size.
• A volume autoscaler (a Timescale service) may grow the volume
during data ingestion
• When the data is subsequently compressed, the customer no longer
needs the bigger volume but still pays for it
Challenges
Volume resize
Volume size can only be increased, not decreased
• Solution: provide the ability to fork a service
• A fork is a clone of a service with possibly different CPU and
storage specs
• A fork is implemented by restoring a new instance from the
original service's backup, stored in S3
• Native volume downsize
• Kubernetes support, possibly with custom checks to
determine whether a downsize is possible
• Support for volume resizing in a StatefulSet
Challenges
Volume resize
Wishlist
Thank you!
#AlwaysBeLaunching
Feedback and Questions:
oleksii@timescale.com
Twitter: @hintbits
