Managing Geospatial Open Data Serverlessly: paddelbuch.ch
Chris Bingham - CTO, AWS Ambassador
cloudypandas.ch
Slides, Blogs, and Code Available Now!
About Me
• I’ve worked in IT for over 20 years, 12 in the cloud
• I specialise in AWS and am an AWS Ambassador
• I paddle a bright yellow sea kayak!
• I’m presently CTO at Fujitsu Switzerland
• No, Fujitsu does not make cameras… or washing machines… 😉
7 Key Focus Areas: Achieving a Sustainable World
Provide digital services originating from cross-industry societal issues, which contribute to customer growth.
• Vertical areas: 4 cross-industry areas that solve societal issues
  • Sustainable Manufacturing, Consumer Experience, Healthy Living, Trusted Society
• Horizontal areas: 3 technical areas that support cross-industry
  • Digital Shifts, Business Applications, Hybrid IT
• Key technologies: Computing, Network, AI, Data & Security, Converging Technologies
In The Beginning…
Image source: https://xkcd.com/927/
What is Paddel Buch?
• tl;dr – Rivermap.org, but for non-whitewater paddle sports
• Aims
  • To make freely available the geospatial data needed to plan safe and legal trips on lakes and on river sections up to whitewater grade II (WW2)
  • To make all that data freely and programmatically accessible for reuse, including commercial reuse
  • To provide a reliable central store of this information for all stakeholders
• So it’s a database + website + API
Hello, Paddel Buch!
Goals: 0 Ops, 0 CHF/m
Immediate Fail!
• Domains cost money!!!
  • USD 13/year for .ch via AWS Route 53
Image source: https://knowyourmeme.com/photos/918810-funny-error-messages
Slightly More Realistic Goals: 0 Ops, 1 CHF/m (approx.)
Version 1.0
• Built with Gatsby Cloud, GraphCMS, and Mapbox
• Why?... Because I found a tutorial!
• First successful build occurred at a campsite on the Aare, during a kayak trip
• Established the visual design and layout that persists today
• Initial data ingestion via Databricks
• GraphQL API provided by GraphCMS
Version 1.0 Architecture
Actuals: 0 Ops, 1 CHF/m (approx.)
Ah Fudgesicles, I Gotta Migrate!
• Gatsby Cloud’s free tier changed; Paddel Buch was too big!
• This meant losing both hosting and the CI/CD build environment
• Naturally I looked to AWS for a zero-ops, zero-cost replacement 😉
• Amplify provides a nice alternative
• But some code changes were needed…
A Simple Little Code Change
frontend-deploy.yaml
```yaml
Resources:
  PaddelBuchApp:
    Type: AWS::Amplify::App
    Properties:
      # …
      CustomHeaders: |-
        customHeaders:
          - pattern: '*'
            headers:
              - key: X-Frame-Options
                value: DENY
              - key: X-XSS-Protection
                value: 1; mode=block
              - key: X-Content-Type-Options
                value: nosniff
              # ...
```
gatsby-config.js
```javascript
// Entry in the Gatsby plugins array
{
  resolve: `gatsby-plugin-gatsby-cloud`,
  options: {
    mergeSecurityHeaders: false,
    mergeCachingHeaders: true,
    mergeLinkHeaders: true,
    allPageHeaders: [
      "X-Frame-Options: DENY",
      "X-XSS-Protection: 1; mode=block",
      "X-Content-Type-Options: nosniff",
      // ...
    ]
  }
}
```
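Both snippets express the same headers, so the mapping between the two formats is mechanical. As a minimal sketch (a hypothetical helper, `toAmplifyCustomHeaders`, not code from the project), the `allPageHeaders` strings used by gatsby-plugin-gatsby-cloud can be turned into the `customHeaders` structure that the Amplify YAML describes:

```javascript
// Hypothetical helper, not part of Paddel Buch: turns gatsby-plugin-gatsby-cloud
// style "Name: value" strings into the { customHeaders: [{ pattern, headers }] }
// structure used in Amplify's CustomHeaders YAML.
function toAmplifyCustomHeaders(allPageHeaders, pattern = "*") {
  const headers = allPageHeaders.map((line) => {
    // Split on the first colon only, so values that contain ":" survive intact.
    const idx = line.indexOf(":");
    if (idx === -1) {
      throw new Error(`Not a "Name: value" header: ${line}`);
    }
    return { key: line.slice(0, idx).trim(), value: line.slice(idx + 1).trim() };
  });
  return { customHeaders: [{ pattern, headers }] };
}

const converted = toAmplifyCustomHeaders([
  "X-Frame-Options: DENY",
  "X-XSS-Protection: 1; mode=block",
  "X-Content-Type-Options: nosniff",
]);
console.log(JSON.stringify(converted, null, 2));
```

The result is printed as JSON to avoid a YAML dependency; the point is the shape of the data, which matches the key/value list in the Amplify template.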
Post Migration Architecture
Actuals: 0 Ops (approx.), 2 CHF/m (approx.)
Now About Those Licenses…
• Open data needs clear licensing
• Not so fast! All that OSM data came with ODbL attached
• Fortunately, ODbL allows specific items to have different licenses attached
• So we can license our own data under Creative Commons CC BY-SA 4.0, for ease
• So data lineage tracking is suddenly super important
• So is providing an API for data access
• Therefore, the schema needs to change!
• Side note: thanks to the Rivermap.org team for their guidance here
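One way to picture the lineage requirement, as a minimal sketch under assumed field names (`source`, `slug`) and a hypothetical `LICENSES` table: every record carries its origin, and the published license and attribution are derived from it, so OSM-derived items keep ODbL while the project's own data goes out under CC BY-SA 4.0.

```javascript
// Hypothetical lineage sketch: field names and source keys are assumptions,
// not Paddel Buch's real schema. License IDs are SPDX identifiers.
const LICENSES = {
  osm: { spdx: "ODbL-1.0", attribution: "© OpenStreetMap contributors" },
  paddelbuch: { spdx: "CC-BY-SA-4.0", attribution: "Paddel Buch" },
};

function withLicense(record) {
  const license = LICENSES[record.source];
  if (!license) {
    // Refuse to publish anything whose lineage is unknown.
    throw new Error(`Unknown data source for ${record.slug}: ${record.source}`);
  }
  return { ...record, license: license.spdx, attribution: license.attribution };
}

// Group published records by license so each group ships with the right terms.
function groupByLicense(records) {
  const groups = {};
  for (const r of records.map(withLicense)) {
    if (!groups[r.license]) groups[r.license] = [];
    groups[r.license].push(r.slug);
  }
  return groups;
}

console.log(groupByLicense([
  { slug: "spot-a", source: "osm" },
  { slug: "spot-b", source: "paddelbuch" },
  { slug: "spot-c", source: "osm" },
]));
```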
Data Schema is Evolving!
So, It Happened Again…
• Hygraph (formerly GraphCMS) changed:
  • They stopped development of the “open source” Gatsby plugin = instant abandonware
  • The free tier shrank; Paddel Buch was too big!
• So a second migration is needed… and this time, it’s data!
• Selected Contentful as the new CMS
• Contentful and Hygraph have different rich text formats
• So data transformation and schema evolution are needed too
• What’s the zero-ops, zero-cost way to migrate and transform an entire CMS?
Which Tools to Use
• Naturally, we’re going AWS and serverless 😁
• Main structure:
  • Python Glue job to migrate the data
  • Node.js Lambda function to convert the rich text formats
  • SSM Parameter Store for, well, parameterisation!
  • Step Functions to orchestrate and parallelise it all
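The rich text conversion step could look roughly like this. It is a sketch, not the project's Lambda: it assumes Hygraph's Slate-style raw AST (`children`, element `type`, text leaves with boolean mark flags) and targets Contentful's rich text document format (`nodeType`, `content`, `marks`, `data`), covering only paragraphs, headings, links, and basic marks. The event shape `{ raw }` is also an assumption.

```javascript
// Sketch of a Hygraph -> Contentful rich text conversion (subset only).
const HEADINGS = {
  "heading-one": "heading-1", "heading-two": "heading-2", "heading-three": "heading-3",
  "heading-four": "heading-4", "heading-five": "heading-5", "heading-six": "heading-6",
};
const MARKS = ["bold", "italic", "underline", "code"];

function convertLeaf(leaf) {
  return {
    nodeType: "text",
    value: leaf.text,
    // Hygraph stores marks as boolean flags on the leaf; Contentful uses a list.
    marks: MARKS.filter((m) => leaf[m]).map((type) => ({ type })),
    data: {},
  };
}

function convertNode(node) {
  if (typeof node.text === "string") return convertLeaf(node);
  const content = (node.children || []).map(convertNode);
  if (node.type === "paragraph") return { nodeType: "paragraph", data: {}, content };
  if (HEADINGS[node.type]) return { nodeType: HEADINGS[node.type], data: {}, content };
  if (node.type === "link") return { nodeType: "hyperlink", data: { uri: node.href }, content };
  // Fail loudly so unsupported content is caught during the migration run.
  throw new Error(`Unsupported Hygraph node type: ${node.type}`);
}

function hygraphToContentful(raw) {
  return { nodeType: "document", data: {}, content: raw.children.map(convertNode) };
}

// Lambda entry point (assumed event shape: { raw: <Hygraph rich text AST> }).
exports.handler = async (event) => hygraphToContentful(event.raw);
```

Throwing on unknown node types, rather than silently dropping them, is a deliberate choice here: in a one-off migration it is better for Step Functions to surface a failed item than to lose content.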
The Migration Process
• Start with dimension tables
  • The ones everything else references
• Cycle through the fact tables based on cross-references
  • Most referenced one first
• Parallelise as much as possible
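The ordering rules above amount to a layered topological sort. A minimal sketch with hypothetical table names: each stage holds tables whose references have all been migrated already, so a stage's tables could run as parallel branches (for example in a Step Functions Parallel state), and within a stage the most referenced table is listed first.

```javascript
// Hypothetical sketch. refs maps each table to the tables it references.
function migrationStages(refs) {
  const tables = Object.keys(refs);
  // How often each table is referenced by others ("most referenced first").
  const refCount = Object.fromEntries(tables.map((t) => [t, 0]));
  for (const t of tables) {
    for (const r of refs[t]) {
      if (!(r in refCount)) throw new Error(`${t} references unknown table ${r}`);
      refCount[r] += 1;
    }
  }

  const done = new Set();
  const stages = [];
  while (done.size < tables.length) {
    // A table is ready once everything it references has been migrated.
    const ready = tables.filter((t) => !done.has(t) && refs[t].every((r) => done.has(r)));
    if (ready.length === 0) {
      throw new Error("Circular references: these tables need a two-pass migration");
    }
    ready.sort((a, b) => refCount[b] - refCount[a]);
    ready.forEach((t) => done.add(t));
    stages.push(ready);
  }
  return stages;
}

console.log(migrationStages({
  spots: ["spotTypes", "waterways"],
  obstacles: ["waterways"],
  spotTypes: [],
  waterways: [],
}));
```

In this example the dimension tables `waterways` and `spotTypes` form the first stage (with `waterways` first, as it is referenced twice), and `spots` and `obstacles` form the second.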
Outcome: 40-50 minutes to migrate, at a cost of USD 0.10
Post Post Migration Changes
• Frontend rewrite
  • No UI/UX changes
  • Just a complete replumbing of the data connections
• We lost the API!
  • That’s an open data license problem
• My backups for Contentful are super hacky!
  • Wonky webhooks to dump individual items to S3, triggered on write
Actuals: 0 Ops, 2 CHF/m (approx./very approx.)
The Big Push
• All the value is in the data… we need more data!
• So it’s time for field work!
• 2-ish months of travel around Switzerland
• Over 10’000km travelled
• Every major lake (almost) and many smaller ones surveyed
• Approx. 3x increase in spots 🎉
• Spots are the core data type; the most valuable and relevant to paddlers
• Lots of additional navigational data gathered
Managing Geospatial Open Data Serverlessly [Cloud Native Bern Meetup | May 2025]
Replacing the Lost API
• Legal requirement inherited from OSM via the ODbL
• So what’s the zero-ops, zero-cost way to provide an API?
• Static APIs!
  • An “API” without all those inconvenient active, server-side bits
  • Basically just flat files for download…
  • … but in nice, programmatically consumable formats
• Fortunately, there’s a Gatsby plugin for that
Static API Code
gatsby-config.js
```javascript
const api_spottypes_de = require("./src/api/spottypes_de");
// ...
{
  resolve: "@binghamchris/gatsby-plugin-json-pages",
  options: {
    pages: [
      api_spots_de,
      // ...
    ]
  }
}
// ...
```
spottypes_de.js (imported into gatsby-config.js)
```javascript
const api_spots_de = {
  // Output path of the generated static endpoint
  fileName: "api/spots-de",
  // Build-time GraphQL query: German-locale spots that are not rejected
  query: `
    query {
      allContentfulSpot(
        limit: 999,
        filter: {rejected: {ne: true}, node_locale: {eq: "de"}},
        sort: {slug: ASC}
      ) {
        nodes {
          slug
          # ...
        }
      }
    }`,
  // Publish just the array of spot nodes
  transformer: ({
    data: {
      allContentfulSpot: { nodes },
    },
  }) => nodes,
};
module.exports = api_spots_de;
```
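Conceptually, a static API endpoint is just a build-time query result written to a file. A minimal sketch of that idea, using a hypothetical `writeStaticEndpoint` function and an assumed `.json` suffix (this is not how `@binghamchris/gatsby-plugin-json-pages` is actually implemented):

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical sketch, NOT the plugin's real implementation: one endpoint
// definition ({ fileName, transformer }) plus its GraphQL result becomes one
// JSON file under the site's public directory.
function writeStaticEndpoint(page, queryResult, publicDir) {
  const outFile = path.join(publicDir, `${page.fileName}.json`);
  fs.mkdirSync(path.dirname(outFile), { recursive: true });
  fs.writeFileSync(outFile, JSON.stringify(page.transformer(queryResult)));
  return outFile;
}

// Usage with a stubbed query result shaped like the spots query.
const page = {
  fileName: "api/spots-de",
  transformer: ({ data: { allContentfulSpot: { nodes } } }) => nodes,
};
const publicDir = fs.mkdtempSync(path.join(os.tmpdir(), "static-api-"));
const outFile = writeStaticEndpoint(
  page,
  { data: { allContentfulSpot: { nodes: [{ slug: "example-spot" }] } } },
  publicDir
);
console.log(outFile, fs.readFileSync(outFile, "utf8"));
```

Because the files are produced at build time, serving them needs nothing beyond the static hosting the site already uses.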
Adding a Nice “API” Page
Fixing the Dodgy Backup
• What’s the zero-ops, zero-cost way to back up Contentful?
• EDA (event-driven architecture)!
• Contentful’s backup tool is an npm package – sounds like a job for Lambda
• Changes in the data trigger Amplify builds via webhooks
• Amplify emits messages via SNS on build success
• But not all Amplify builds are triggered by data changes
• Fortunately, one of our static API endpoints provides the timestamp for the last data change
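The "has the data changed?" check could be sketched as below. Everything here is an assumption for illustration: the `makeChangeCheckHandler` factory, the `lastChangeUrl` endpoint and its `{ lastUpdated }` shape, and the injected helpers. Only the SNS record shape (`Records[].EventSource === "aws:sns"`) follows the standard Lambda event format.

```javascript
// Hypothetical change-check Lambda: compares the last data change (from a
// static API endpoint) with the last backup and only then starts a backup.
function makeChangeCheckHandler({ lastChangeUrl, fetchJson, getLastBackup, startBackup }) {
  return async (event) => {
    // Invoked via SNS on every successful Amplify build; only the fact that
    // a build finished matters, not the message body.
    const records = (event.Records || []).filter((r) => r.EventSource === "aws:sns");
    if (records.length === 0) return { backup: false, reason: "no SNS records" };

    const { lastUpdated } = await fetchJson(lastChangeUrl);
    const lastBackup = await getLastBackup();
    if (lastBackup && Date.parse(lastUpdated) <= Date.parse(lastBackup)) {
      // Build was not caused by a data change: stop here, cheaply.
      return { backup: false, reason: "data unchanged" };
    }
    // Hand over to the larger backup function, which would record the new
    // timestamp only after the export has succeeded.
    await startBackup(lastUpdated);
    return { backup: true, lastUpdated };
  };
}

// Example wiring (Node 18+ has a global fetch); the URL is a placeholder.
const handler = makeChangeCheckHandler({
  lastChangeUrl: "https://example.com/api/last-change.json",
  fetchJson: async (url) => (await fetch(url)).json(),
  getLastBackup: async () => null,
  startBackup: async (ts) => console.log("would start backup for", ts),
});
```

Injecting `fetchJson` and friends keeps the function small and testable; in a real deployment they would wrap `fetch`, a Parameter Store or S3 read, and an asynchronous Lambda invoke.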
Cost Optimising an EDA System
• Create space for cost optimisation by having 2 stages:
  1. Determine if the data has changed
  2. If yes, perform a backup
• This lets us cost optimise our Lambdas
  • Keep #1 small to minimise cost for triggers that don’t result in a backup
  • Tune the memory size of #2 to balance runtime against cost per millisecond
• Backup is quite WORN-ish (write once, read never), so S3 storage classes are important
  • Initial storage, probably rarely read = S3 Standard-IA
  • Longer-term storage, probably never read = S3 Glacier Deep Archive
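To make the two cost levers concrete, a minimal sketch: the request parameters a backup function might use to write directly into Standard-IA, a lifecycle rule that moves backups to Glacier Deep Archive, and the GB-second arithmetic behind Lambda sizing. The prefix, key format, 90-day threshold, and default price are all assumptions.

```javascript
// Hypothetical S3 settings for the WORN-ish backups. Object shapes follow the
// S3 PutObject and PutBucketLifecycleConfiguration request parameters.
const BACKUP_PREFIX = "backups/";

// The backup function uploads each export directly into Standard-IA...
function backupPutObjectParams(bucket, timestamp, body) {
  return {
    Bucket: bucket,
    Key: `${BACKUP_PREFIX}contentful-export-${timestamp}.json`,
    Body: body,
    ContentType: "application/json",
    StorageClass: "STANDARD_IA",
  };
}

// ...and a lifecycle rule later moves it to Glacier Deep Archive
// (90 days is past Standard-IA's 30-day minimum storage duration).
const lifecycleConfiguration = {
  Rules: [
    {
      ID: "backups-to-deep-archive",
      Status: "Enabled",
      Filter: { Prefix: BACKUP_PREFIX },
      Transitions: [{ Days: 90, StorageClass: "DEEP_ARCHIVE" }],
    },
  ],
};

// Rough Lambda compute cost per invocation: GB-seconds x price per GB-second.
// The default price is an assumption; check current pricing for your region.
function lambdaComputeCostUsd(memoryMb, durationMs, usdPerGbSecond = 0.0000166667) {
  return (memoryMb / 1024) * (durationMs / 1000) * usdPerGbSecond;
}

// A small, short-lived check vs. a bigger backup run (illustrative numbers).
console.log(lambdaComputeCostUsd(128, 300), lambdaComputeCostUsd(1024, 20000));
```

With the AWS SDK for JavaScript v3, these objects would be passed to `PutObjectCommand` and to `PutBucketLifecycleConfigurationCommand` (as `LifecycleConfiguration`) from `@aws-sdk/client-s3`. The cost function shows why memory tuning is a trade-off: doubling memory doubles the price per millisecond, so it breaks even only when the runtime halves.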
Event Driven Backup Architecture
Current Actuals: 0 Ops, 2 CHF/m (approx./very approx.)
Open Source Peril!
• Paddel Buch would not be possible without OSS 🙏
• But build failures stalk(ed) the project from:
  • Packages being abandoned
    • Forced a switch of CMS and a full-blown data migration
  • Maintainers perhaps not maintaining
    • Had to fork and modify to keep the project running
  • npm packages just disappearing entirely without warning
    • Had to republish (with attribution) to keep the project running
• Supply chain management is crucial when working with OSS
Thanks for your time!
Questions, perhaps Answers
Chris Bingham
cloudypandas.ch
