Public Project Challenge: Parkrun meets Data Analytics
If you haven't heard of Parkrun...buckle in.
This is my personal second love, closely behind data recruitment 😉
(It also happens to involve a lot of data and an incredible ETL process)
What the heck is Parkrun?
Parkrun is a global, pseudo-organised 5-kilometre run that takes place every Saturday morning at multiple locations. In Perth alone, there are over 20 locations in the central city area, and more down south...even in Albany! All of these run concurrently on a Saturday morning.
Parkrun doesn't just offer the chance to run but has something equally good: a database with 36 million timed runs (or walks), anonymised demographic data and more.
The Parkrun Model is pretty simple (as described in this article)
You register online and get your unique Parkrun barcode.
You show up at your local Parkrun, with your barcode in hand.
You run or walk the 5 km course.
As each runner crosses the line, a volunteer timer clicks a stopwatch to record a finish.
Another volunteer hands the runner a finishing token with a position number on it.
Volunteers then scan the runner's barcode together with the barcode on the finishing token.
Behind the scenes, this links your ID to a timestamp and position.
Data is uploaded, cleaned, joined and distributed globally within hours.
You get your result via email - with stats like personal bests, age-grade performance, and lifetime Parkrun count.
It's a simple process...think of this as the "business logic".
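The linking step above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Parkrun actually does it: the names `link_results`, `Result` and `format_time`, the athlete IDs and the times are all made up, and it assumes the stopwatch produces one elapsed time per finisher in finishing order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    position: int
    athlete_id: str
    seconds: int


def link_results(timer_clicks, scans):
    """Join stopwatch clicks to barcode scans on finishing position.

    timer_clicks: elapsed seconds, one per finisher, in finishing order.
    scans: (athlete_id, position) pairs from scanning barcode + token.
    """
    results = []
    for athlete_id, position in scans:
        # Skip tokens that have no matching stopwatch click.
        if 1 <= position <= len(timer_clicks):
            results.append(Result(position, athlete_id, timer_clicks[position - 1]))
    return sorted(results, key=lambda r: r.position)


def format_time(seconds):
    """Render elapsed seconds as M:SS."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"


# Made-up sample data: three finishers, scanned out of order.
timer_clicks = [1125, 1190, 1302]
scans = [("A200", 2), ("A100", 1), ("A300", 3)]

results = link_results(timer_clicks, scans)
for r in results:
    print(r.position, r.athlete_id, format_time(r.seconds))
```

The join key is the position number on the token, which is why the stopwatch and the scanners never need to know about each other on the day.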
The system collates thousands of running datapoints globally, every single Saturday, and stores them online -> https://www.parkrun.com.au/applecross/results/latestresults/
I think about the data flows related to Parkrun a lot, and here is a data model, possibly incorrect, that I've dreamt up.
[Figure: Parkrun Data Model]
The Challenge
If you’re a data analyst, engineer or even just a data nerd who’s laced up for a Parkrun before…here's your mission:
Rebuild or Reimagine the Parkrun Data Model
Can you reverse-engineer the backend of a global event that...
Serves millions of users
Runs on volunteer infrastructure
Delivers real-time-ish feedback
And processes a quarter-billion finish tokens a year?
Your goal:
Design a data warehouse or analytics-ready schema for Parkrun. You can:
Build a star schema for analytics
Sketch out an event-driven pipeline
Simulate the ETL flow from barcode scan to published result
Bonus points: Build a BI dashboard for Parkrun HQ
Create a project portfolio that helps both runners and the Parkrun technical team to understand their data.
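If you want a starting point for the star schema option, here is a minimal sketch using Python's built-in sqlite3 module. Every table name, column and sample row below is my own assumption for illustration, not Parkrun's real model: one fact table of results surrounded by event, athlete and date dimensions.

```python
import sqlite3

# Hypothetical star schema: one fact table, three dimensions.
SCHEMA = """
CREATE TABLE dim_event (
    event_key   INTEGER PRIMARY KEY,
    event_name  TEXT NOT NULL,
    country     TEXT NOT NULL
);
CREATE TABLE dim_athlete (
    athlete_key INTEGER PRIMARY KEY,
    age_group   TEXT,
    gender      TEXT
);
CREATE TABLE dim_date (
    date_key    INTEGER PRIMARY KEY,
    run_date    TEXT NOT NULL
);
CREATE TABLE fact_result (
    event_key      INTEGER NOT NULL REFERENCES dim_event(event_key),
    athlete_key    INTEGER NOT NULL REFERENCES dim_athlete(athlete_key),
    date_key       INTEGER NOT NULL REFERENCES dim_date(date_key),
    position       INTEGER NOT NULL,
    finish_seconds INTEGER NOT NULL,
    PRIMARY KEY (event_key, date_key, position)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up sample rows, purely to show the shape of the data.
conn.execute("INSERT INTO dim_event VALUES (1, 'Applecross', 'AU')")
conn.executemany(
    "INSERT INTO dim_athlete VALUES (?, ?, ?)",
    [(1, "30-34", "F"), (2, "40-44", "M")],
)
conn.execute("INSERT INTO dim_date VALUES (20250705, '2025-07-05')")
conn.executemany(
    "INSERT INTO fact_result VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 20250705, 1, 1100), (1, 2, 20250705, 2, 1300)],
)

# Example analytics query: average finish time per event.
rows = conn.execute(
    """
    SELECT e.event_name, AVG(f.finish_seconds)
    FROM fact_result AS f
    JOIN dim_event AS e ON e.event_key = f.event_key
    GROUP BY e.event_name
    """
).fetchall()
print(rows)
```

The grain here is one row per finisher per event per Saturday. Volunteer roles would probably want their own fact table, which I've left out to keep the sketch short.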
To help, here’s what we know:
~36 million records in the DB
Weekly data ingested globally
Standardised structure + format
Publicly viewable endpoints (like this: )
Event-level, participant-level and volunteer-level data
Will require web scraping, as no public API is available yet
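For the scraping side, the parsing step might look something like the sketch below, using only Python's standard library. The HTML snippet is invented for illustration and the real results pages will almost certainly be structured differently, so treat `TableRows` and `SAMPLE_HTML` as hypothetical names and adapt the parser to whatever markup you actually find.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Invented markup standing in for a saved results page.
SAMPLE_HTML = """
<table>
  <tr><th>Position</th><th>Time</th></tr>
  <tr><td>1</td><td>18:45</td></tr>
  <tr><td>2</td><td>19:50</td></tr>
</table>
"""

parser = TableRows()
parser.feed(SAMPLE_HTML)

# First row is the header; the rest become one dict per finisher.
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]
print(records)
```

From here the records can be loaded into pandas or straight into a warehouse table.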
Submit your attempt:
Create a GitHub repo to share your schema, sketch, ERD, warehouse design or dashboard.
Shoot me an email with the subject "Parkrun Project Challenge..." and I'll review the submissions, repost the top ones and share the winner at the end of July.
e: douglas@analyticsrecruitment.com.au
An example of a great project readme is here if you're looking for inspiration.
Extra brownie points if you build it while wearing your barcode and a pair of Asics.
I'm creating this informal challenge because:
I want to inspire budding data professionals who are looking for a project
I love Parkrun and want to see more awareness of it, and that can start with data
My long-term vision is to support Parkrun myself one day on the data awareness side
Looking forward to seeing your dashboards!
Principal BI Developer | Data Visualisation Consultant | Certified Tableau Specialist | Power BI | Public Health Data | Pharmacist
1mo: Keen to team up if anyone is looking to build a dashboard on top of a data engineering solution :) Great idea, Doug!
Financial Data Analyst at Duratec
1mo: Time to get a candidate on the inside and validate that data model 😉
Data Systems Analyst | Business Intelligence Insights through Process Automation | Transforming One Table at a Time
1mo: Which pandas feature should I use? I haven't been able to scrape the data using Python. Do we have access to the database? I have tried doing something similar on my own before and haven't found the time to finish it yet.
Hands-On Technology Executive | FINSIA Fellow | Ultra Runner
1mo: Great idea! Yours and Parkrun's. Parkrun can be walked, as I used to do with my kids. It's a great way of meeting people in your local community and a great idea around mental health: 'act belong commit'. Just turn up and walk. If you're lucky, your local Parkrun has a coffee van at the end of the route :)