Managing Geospatial Open Data Serverlessly [Cloud Native Bern Meetup | May 2025]

Managing Geospatial Open Data Serverlessly: paddelbuch.ch
Chris Bingham - CTO, AWS Ambassador
cloudypandas.ch

Slides, Blogs, and Code Available Now!

About Me
• I’ve worked in IT for over 20 years, 12 in the cloud
• I specialise in AWS and am an AWS Ambassador
• I paddle a bright yellow sea kayak!
• I’m presently CTO at Fujitsu Switzerland
• No, Fujitsu does not make cameras… or washing machines… 😉

7 Key Focus Areas
Achieving a Sustainable World
Provide digital services originating from cross-industry societal issues, which contribute to customer growth.
• Vertical areas (4 cross-industry areas that solve societal issues): Sustainable Manufacturing, Consumer Experience, Healthy Living, Trusted Society
• Horizontal areas (3 technical areas that support cross-industry): Digital Shifts, Business Applications, Hybrid IT
• Key technologies: Computing, Network, AI, Data & Security, Converging Technologies

In The Beginning…
Image source: https://xkcd.com/927/

What is Paddel Buch?
• tl;dr – Rivermap.org, but for non-whitewater paddle sports
• Aims
• To make available for free the geospatial data needed to plan safe and legal trips on lakes and river sections up to WW2 (whitewater grade II)
• To make all that data freely and programmatically accessible for reuse, including commercial reuse
• To provide a reliable central store of this information for all stakeholders
• So it’s a database + website + API

Immediate Fail!
• Domains cost money!!!
• USD 13/year for .ch via AWS Route 53
Image source: https://knowyourmeme.com/photos/918810-funny-error-messages

Slightly More Realistic Goals
• 0 Ops
• 1 CHF/month (approx.)

Version 1.0
• Built with Gatsby Cloud, GraphCMS, and Mapbox
• Why?... Because I found a tutorial!
• First successful build occurred at a campsite on the Aare, during a kayak trip
• Established the visual design and layout that persists today
• Initial data ingestion via Databricks
• GraphQL API provided by GraphCMS

Actuals
• 0 Ops
• 1 CHF/month (approx.)

Ah Fudgesicles, I Gotta Migrate!
• Gatsby Cloud’s free tier changed; Paddel Buch was too big!
• This meant losing both hosting and the CI/CD build environment
• Naturally I looked to AWS for a zero-ops, zero-cost replacement 😉
• Amplify provides a nice alternative
• But some code changes were needed…

A Simple Little Code Change
gatsby-config.js
{
  resolve: `gatsby-plugin-gatsby-cloud`,
  options: {
    mergeSecurityHeaders: false,
    mergeCachingHeaders: true,
    mergeLinkHeaders: true,
    allPageHeaders: [
      "X-Frame-Options: DENY",
      "X-XSS-Protection: 1; mode=block",
      "X-Content-Type-Options: nosniff",
      ...
    ]
  }
}
frontend-deploy.yaml
Resources:
  PaddelBuchApp:
    Type: AWS::Amplify::App
    Properties:
      …
      CustomHeaders: |-
        customHeaders:
          - pattern: '*'
            headers:
              - key: X-Frame-Options
                value: DENY
              - key: X-XSS-Protection
                value: 1; mode=block
              - key: X-Content-Type-Options
                value: nosniff
              ...
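The two snippets on this slide carry the same security headers in two shapes: "Name: value" strings for the Gatsby Cloud plugin, and key/value pairs for Amplify. A minimal sketch of that mapping follows, assuming the headers are simple "Name: value" strings; the helper name toAmplifyHeaders is hypothetical and is not part of Gatsby, Amplify, or the project's code.

```javascript
// Hypothetical helper: turn "Name: value" strings (the allPageHeaders shape)
// into { key, value } pairs (the shape listed under customHeaders).
function toAmplifyHeaders(headerLines) {
  return headerLines.map((line) => {
    // Split on the first colon only, so values may contain further colons.
    const idx = line.indexOf(":");
    if (idx === -1) {
      throw new Error(`Not a "Name: value" header: ${line}`);
    }
    return {
      key: line.slice(0, idx).trim(),
      value: line.slice(idx + 1).trim(),
    };
  });
}

const amplifyHeaders = toAmplifyHeaders([
  "X-Frame-Options: DENY",
  "X-XSS-Protection: 1; mode=block",
  "X-Content-Type-Options: nosniff",
]);
console.log(amplifyHeaders);
```

The resulting pairs still have to be written into the template by hand or by a script of your choice; the sketch only covers the reshaping step.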

Actuals
• 0 Ops
• 2 CHF/month (approx.) (approx.)

Now About Those Licenses…
• Open data needs clear licensing
• Not so fast! All that OSM data came with ODbL attached
• Fortunately, ODbL allows specific items to have different licenses attached
• So we can license our own data under Creative Commons (CC-BY-SA-4.0) for ease
• So data lineage tracking is suddenly super important
• So is providing an API for data access
• Therefore, the schema needs to be changed!
• Side note: thanks to the Rivermap.org team for their guidance here

So, It Happened Again…
• Hygraph (formerly GraphCMS) changed:
• They stopped dev on the “open source” Gatsby plugin = instant abandonware
• The free tier shrank; Paddel Buch was too big!
• So a second migration is needed… and this time, it’s data!
• Selected Contentful as the new CMS
• Contentful and Hygraph have different rich text formats
• So data transformation and schema evolution are needed too
• What’s the zero-ops, zero-cost way to migrate and transform an entire CMS?

Which Tools to Use
• Naturally, we’re going AWS and serverless 😁
• Main structure:
• Python Glue job to migrate the data
• Node.js Lambda function to convert the rich text formats
• SSM Parameter Store for, well, parameterisation!
• Step Functions to orchestrate and parallelise it all
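To illustrate what the rich text conversion step involves, here is a minimal sketch in Node.js. It assumes a simplified Slate-style input tree (elements with type and children, text leaves with boolean flags such as bold) and a simplified Contentful-style output tree (nodes with nodeType, content, and marks). The type mapping, the mark list, and the function names are illustrative assumptions; the project's actual Lambda code is not shown in these slides.

```javascript
// Assumed, simplified element mapping; a real migration would cover more types.
const TYPE_MAP = {
  paragraph: "paragraph",
  "heading-one": "heading-1",
  "heading-two": "heading-2",
};

// Assumed set of boolean formatting flags on text leaves.
const MARKS = ["bold", "italic", "underline", "code"];

function convertNode(node) {
  // Leaf: a text run, with formatting flags turned into a marks array.
  if (typeof node.text === "string") {
    return {
      nodeType: "text",
      value: node.text,
      marks: MARKS.filter((m) => node[m] === true).map((type) => ({ type })),
      data: {},
    };
  }
  // Element: fail loudly on unmapped types rather than silently losing content.
  const nodeType = TYPE_MAP[node.type];
  if (!nodeType) {
    throw new Error(`No mapping for element type: ${node.type}`);
  }
  return {
    nodeType,
    data: {},
    content: (node.children ?? []).map(convertNode),
  };
}

// Wrap the converted top-level elements in a document node.
function toContentfulDocument(root) {
  return {
    nodeType: "document",
    data: {},
    content: (root.children ?? []).map(convertNode),
  };
}

const converted = toContentfulDocument({
  children: [{ type: "paragraph", children: [{ text: "Hello", bold: true }] }],
});
console.log(JSON.stringify(converted, null, 2));
```

Throwing on unknown element types is a deliberate choice in this sketch: during a one-off migration it is usually better to stop and extend the mapping than to drop content quietly.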

The Migration Process
• Start with dimension tables
• The ones everything else references
• Cycle through the fact tables based on cross references
• Most referenced one first
• Parallelise as much as possible
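The ordering described above can be sketched as a small planning function: group tables into "waves" so that each wave only references tables from earlier waves, and everything inside one wave can run in parallel. This is an illustrative sketch, not the project's Step Functions definition; the table names and the references map are hypothetical, and every referenced table is assumed to appear as a key in the map.

```javascript
// references maps each table to the tables it references (all hypothetical names).
function planMigrationWaves(references) {
  // How many other tables point at this one ("most referenced one first").
  const inbound = (table) =>
    Object.values(references).filter((deps) => deps.includes(table)).length;

  const remaining = new Set(Object.keys(references));
  const done = new Set();
  const waves = [];

  while (remaining.size > 0) {
    // A table is ready once everything it references has been migrated.
    const ready = [...remaining].filter((table) =>
      references[table].every((dep) => done.has(dep))
    );
    if (ready.length === 0) {
      throw new Error(
        "Unresolvable references between: " + [...remaining].join(", ")
      );
    }
    // Within a wave, list the most-referenced tables first.
    ready.sort((a, b) => inbound(b) - inbound(a) || a.localeCompare(b));
    for (const table of ready) {
      remaining.delete(table);
      done.add(table);
    }
    waves.push(ready);
  }
  return waves;
}

const waves = planMigrationWaves({
  spotType: [],
  waterway: [],
  spot: ["spotType", "waterway"],
  obstacle: ["waterway", "spot"],
});
console.log(waves);
```

In this example the first wave holds the two tables that reference nothing (the dimension tables), and the later waves hold the tables that depend on them.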

Outcome
• 40-50 minutes to migrate
• 0.10 USD

Post Post Migration Changes
• Frontend rewrite
• No UI/UX changes
• Just a complete replumbing of the data connections
• We lost the API!
• That’s an open data license problem
• My backups for Contentful are super hacky!
• Wonky webhooks to dump individual items to S3, triggered on write

Actuals
• 0 Ops
• 2 CHF/month (approx.) (very approx.)

The Big Push
• All the value is in the data… we need more data!
• So it’s time for field work!
• 2-ish months of travel around Switzerland
• Over 10’000 km travelled
• Every major lake (almost) and many smaller ones surveyed
• Approx. 3x increase in spots 🎉
• Spots are the core data type; the most valuable and relevant to paddlers
• Lots of additional navigational data gathered

Replacing the Lost API
• Legal requirement inherited from OSM via the ODbL
• So what’s the zero-ops, zero-cost way to provide an API?
• Static APIs!
• An “API” without all those inconvenient active, server-side bits
• Basically just flat files for download…
• … but in nice, programmatically consumable formats
• Fortunately, there’s a Gatsby Plugin for that
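The static API idea can be sketched as a build step that writes query results to flat JSON files, which the web host then serves like any other static asset. The sketch below is a rough illustration under stated assumptions, not the plugin's actual implementation: writeStaticEndpoint, the temporary output directory, and the sample data are all hypothetical.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical build-time step: runs once per build, never per request.
function writeStaticEndpoint(outDir, { fileName, data, transformer }) {
  const target = path.join(outDir, `${fileName}.json`);
  // Create nested folders such as "api/" if they do not exist yet.
  fs.mkdirSync(path.dirname(target), { recursive: true });
  // The transformer reshapes the raw query result into the published payload.
  fs.writeFileSync(target, JSON.stringify(transformer(data)));
  return target;
}

// Example run against a throwaway directory with made-up data.
const outDir = fs.mkdtempSync(path.join(os.tmpdir(), "static-api-"));
const written = writeStaticEndpoint(outDir, {
  fileName: "api/spots-de",
  data: { allContentfulSpot: { nodes: [{ slug: "example-spot" }] } },
  transformer: ({ allContentfulSpot: { nodes } }) => nodes,
});
console.log(written, fs.readFileSync(written, "utf8"));
```

Because the output is just a file, there is nothing to run, scale, or patch at request time, which is what keeps this approach in line with the zero-ops goal.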

Static API Code
gatsby-config.js
const api_spottypes_de = require("./src/api/spottypes_de");
...
{
  resolve: "@binghamchris/gatsby-plugin-json-pages",
  options: {
    pages: [
      api_spots_de,
      ...
    ]
  }
}
...
spottypes_de.js (imported by gatsby-config.js)
const api_spots_de = {
  fileName: "api/spots-de",
  query: `
    query {
      allContentfulSpot(
        limit: 999,
        filter: {rejected: {ne: true}, node_locale: {eq: "de"}},
        sort: {slug: ASC}
      ) {
        nodes {
          slug
          ...
        }
      }
    }`,
  transformer: ({
    data: {
      allContentfulSpot: { nodes },
    },
  }) => nodes,
}
module.exports = api_spots_de;

Fixing the Dodgy Backup
• What’s the zero-ops, zero-cost way to back up Contentful?
• EDA (event-driven architecture)!
• Contentful’s backup tool is an npm package – sounds like a job for Lambda
• Changes in the data trigger Amplify builds via webhooks
• Amplify emits messages via SNS on build success
• But not all Amplify builds are triggered by data changes
• Fortunately, one of our static API endpoints provides the timestamp for the last data change
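The "has the data changed?" decision can be sketched as a simple timestamp comparison. This is a minimal sketch, assuming both values are ISO 8601 strings; how the last-change timestamp is fetched from the static API and where the last-backup timestamp is stored are not shown in these slides, so the function below takes both as plain arguments. The name shouldBackup is hypothetical.

```javascript
// Back up only when the data changed after the most recent backup.
function shouldBackup(lastDataChange, lastBackup) {
  const changedAt = Date.parse(lastDataChange);
  if (Number.isNaN(changedAt)) {
    throw new Error("Invalid lastDataChange timestamp");
  }
  // No previous backup recorded: always back up.
  if (lastBackup == null) {
    return true;
  }
  const backedUpAt = Date.parse(lastBackup);
  if (Number.isNaN(backedUpAt)) {
    throw new Error("Invalid lastBackup timestamp");
  }
  return changedAt > backedUpAt;
}

// A build triggered by a code change leaves the data timestamp untouched,
// so no backup is needed in the second case.
console.log(shouldBackup("2025-05-02T10:00:00Z", "2025-05-01T10:00:00Z"));
console.log(shouldBackup("2025-05-01T10:00:00Z", "2025-05-02T10:00:00Z"));
```

Keeping this check free of heavy dependencies is what allows the first-stage function to stay small and cheap.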

Cost Optimising an EDA System
• Create space for cost optimisation by having 2 stages:
1. Determine if the data has changed
2. If yes, perform a backup
• This lets us cost optimise our Lambdas
• Keep #1 small to minimise cost for triggers that don’t result in a backup
• Tune the size of #2 to balance runtime and cost per millisecond
• Backup is quite WORN-ish (write once, read never), so S3 storage classes are important
• Initial storage, probably rarely read = S3 Standard-IA
• Longer term storage, probably never read = S3 Glacier Deep Archive
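The runtime-versus-size trade-off for the backup function can be illustrated with a little arithmetic: the compute cost of one invocation is roughly memory (in GB) × billed duration (in seconds) × price per GB-second. The sketch below assumes made-up durations for three memory sizes and an illustrative price constant; neither comes from the talk, request charges are ignored, and current pricing should be checked before relying on any figure.

```javascript
// Compute cost of one invocation, excluding per-request charges.
function invocationCost(memoryMb, durationMs, pricePerGbSecond) {
  return (memoryMb / 1024) * (durationMs / 1000) * pricePerGbSecond;
}

// Assumption: illustrative USD price per GB-second, not an authoritative figure.
const PRICE_PER_GB_SECOND = 0.0000166667;

// Assumption: hypothetical measurements; more memory runs faster, with diminishing returns.
const candidates = [
  { memoryMb: 512, durationMs: 60000 },
  { memoryMb: 1024, durationMs: 28000 },
  { memoryMb: 2048, durationMs: 20000 },
];

const costed = candidates.map((c) => ({
  ...c,
  cost: invocationCost(c.memoryMb, c.durationMs, PRICE_PER_GB_SECOND),
}));

// Pick the configuration with the lowest cost per invocation.
const cheapest = costed.reduce((best, c) => (c.cost < best.cost ? c : best));
console.log(costed, cheapest);
```

With these made-up numbers the middle size wins, because doubling the memory more than halves the runtime once but not twice; real measurements may of course point elsewhere.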

Event Driven Backup Architecture

Current Actuals
• 0 Ops
• 2 CHF/month (approx.) (very approx.)

Open Source Peril!
• Paddel Buch would not be possible without OSS 🙏
• But, build failures stalk(ed) the project from:
• Packages being abandoned
• Forced a switch of CMS and a full blown data migration
• Maintainers perhaps not maintaining
• Had to fork and modify to keep the project running
• npm packages just disappearing entirely without warning
• Had to republish (with attribution) to keep the project running
• Supply chain management is crucial when working with OSS

Thanks for your time!
Questions, perhaps Answers
Chris Bingham
cloudypandas.ch
