Public Project Challenge: Parkrun meets Data Analytics


If you haven't heard of Parkrun...buckle in.

This is my personal second love, closely behind data recruitment 😉

(It also happens to involve a lot of data and an incredible ETL process)

What the heck is Parkrun?

Parkrun is a global, pseudo-organised 5 kilometre run that takes place every Saturday morning at multiple locations. In Perth alone there are over 20 locations in the central city area, with more down south...even in Albany! All of these run concurrently on a Saturday morning.

Parkrun doesn't just offer the chance to run. It also has something equally good: a database of 36 million timed runs (or walks), anonymised demographic data and more.

Parkrun events across Perth; green ticks mark the ones I've personally completed.

The Parkrun Model is pretty simple (as described in this article)

  1. You register online and get your unique Parkrun barcode.

  2. You show up at your local Parkrun, with your barcode in hand.

  3. You run or walk the 5 km course.

  4. On completion, a volunteer timer clicks a stopwatch to record each finisher.

  5. Another volunteer hands the runner a finishing token with a position number on it.

  6. Volunteers then scan the runner's barcode together with the finish token's barcode.

  7. Behind the scenes, this links your ID to a timestamp and position.

  8. Data is uploaded, cleaned, joined and distributed globally within hours.

  9. You get your result via email, with stats like personal bests, age-grade performance, and lifetime Parkrun count.

It's a simple process...think of this as the "business logic".
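The linking in steps 4 to 7 can be sketched in a few lines of Python. This is a minimal sketch, not Parkrun's actual implementation: it assumes the stopwatch produces an ordered list of elapsed times (index 0 is position 1) and the scanner produces (token position, athlete barcode) pairs. The function names, barcodes and achievement labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    position: int
    athlete_id: str
    elapsed_seconds: int


def link_results(timer_times, token_scans):
    """Join stopwatch times to token scans on finish position.

    timer_times: elapsed seconds in finishing order (index 0 is position 1).
    token_scans: (position, athlete_id) pairs from the barcode scanner.
    Scans with no matching stopwatch time are skipped, and positions
    without a scan (unknown finishers) simply produce no row.
    """
    time_by_position = dict(enumerate(timer_times, start=1))
    return [
        Result(position, athlete_id, time_by_position[position])
        for position, athlete_id in sorted(token_scans)
        if position in time_by_position
    ]


def achievements(best_times, results):
    """Label first-timers and new personal bests.

    best_times: hypothetical lookup of athlete_id -> previous best in seconds.
    """
    labels = {}
    for r in results:
        previous = best_times.get(r.athlete_id)
        if previous is None:
            labels[r.athlete_id] = "first_timer"
        elif r.elapsed_seconds < previous:
            labels[r.athlete_id] = "new_pb"
    return labels


if __name__ == "__main__":
    # Made-up example: three finishers, scans arrive out of order.
    results = link_results([1085, 1130, 1201], [(2, "A200"), (1, "A100"), (3, "A300")])
    print(achievements({"A100": 1100, "A200": 1100}, results))
    # prints {'A100': 'new_pb', 'A300': 'first_timer'}
```

Joining on position is what makes the two volunteer roles independent: the timer never needs to know who finished, and the scanner never needs to know the time.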

The system collates thousands of running datapoints globally, every single Saturday and stores them online -> https://www.parkrun.com.au/applecross/results/latestresults/

I think about the data flows related to Parkrun a lot, and here is a data model I've dreamt up (possibly incorrectly):

Attempted Parkrun data model (possibly not correct!)

The Challenge

If you’re a data analyst, engineer or even just a data nerd who’s laced up for a Parkrun before…here's your mission:

Rebuild or Reimagine the Parkrun Data Model

Can you reverse-engineer the backend of a global event that...

  • Serves millions of users

  • Runs on volunteer infrastructure

  • Delivers real-time-ish feedback

  • And processes a quarter-billion finish tokens a year?

Your goal:

Design a data warehouse or analytics-ready schema for Parkrun. You can:

  • Build a star schema for analytics

  • Sketch out an event-driven pipeline

  • Simulate the ETL flow from barcode scan to published result

  • Bonus points: Build a BI dashboard for Parkrun HQ

  • Create a project portfolio that helps both runners and the Parkrun technical team understand their data.
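One possible star schema, sketched with Python's built-in sqlite3 so it runs without a database server. The table and column names, the yyyymmdd date key and the sample rows are all hypothetical; they are a starting point for the challenge, not Parkrun's real schema.

```python
import sqlite3

# Hypothetical star schema: one fact table of finishes, three dimensions.
SCHEMA = """
CREATE TABLE dim_athlete (
    athlete_key INTEGER PRIMARY KEY,
    barcode     TEXT NOT NULL UNIQUE,
    gender      TEXT,
    age_group   TEXT
);
CREATE TABLE dim_event (
    event_key  INTEGER PRIMARY KEY,
    event_name TEXT NOT NULL,
    country    TEXT NOT NULL
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,  -- yyyymmdd
    run_date TEXT NOT NULL,
    year     INTEGER NOT NULL
);
CREATE TABLE fact_result (
    event_key       INTEGER NOT NULL REFERENCES dim_event (event_key),
    date_key        INTEGER NOT NULL REFERENCES dim_date (date_key),
    athlete_key     INTEGER NOT NULL REFERENCES dim_athlete (athlete_key),
    position        INTEGER NOT NULL,
    elapsed_seconds INTEGER NOT NULL,
    PRIMARY KEY (event_key, date_key, position)
);
"""


def build_warehouse():
    """Create the schema in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def load_sample(conn):
    """Insert a few made-up rows for illustration."""
    conn.executemany(
        "INSERT INTO dim_athlete VALUES (?, ?, ?, ?)",
        [(1, "A100", "F", "35-39"), (2, "A200", "M", "25-29")],
    )
    conn.executemany(
        "INSERT INTO dim_event VALUES (?, ?, ?)",
        [(1, "Applecross", "Australia"), (2, "Albany", "Australia")],
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(20250705, "2025-07-05", 2025), (20250712, "2025-07-12", 2025)],
    )
    conn.executemany(
        "INSERT INTO fact_result VALUES (?, ?, ?, ?, ?)",
        [
            (1, 20250705, 1, 1, 1085),
            (1, 20250705, 2, 2, 1130),
            (2, 20250712, 2, 1, 1250),
        ],
    )
    conn.commit()


def finishers_by_event(conn):
    """A typical dashboard query: finisher count and mean time per event."""
    return conn.execute(
        """
        SELECT e.event_name, COUNT(*) AS finishers, AVG(f.elapsed_seconds) AS avg_seconds
        FROM fact_result AS f
        JOIN dim_event AS e ON e.event_key = f.event_key
        GROUP BY e.event_name
        ORDER BY e.event_name
        """
    ).fetchall()


if __name__ == "__main__":
    conn = build_warehouse()
    load_sample(conn)
    print(finishers_by_event(conn))
    # prints [('Albany', 1, 1250.0), ('Applecross', 2, 1107.5)]
```

Keeping one row per finisher per event in the fact table turns most dashboard questions (finishers per event, average time by age group, attendance trends) into simple GROUP BY queries over the dimensions. Volunteer roles could live in a second fact table that shares the same event and date dimensions.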

To help, here’s what we know:

  • ~36 million records in the DB

  • Weekly data ingested globally

  • Standardised structure + format

  • Publicly viewable endpoints (like the results page linked above)

  • Event-level, participant-level and volunteer-level data

  • Will require web scraping, as a public API is not available yet
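Because there is no public API, a first step is turning a results page into rows. The sketch below uses only Python's standard library (html.parser and urllib.request). It assumes each finisher appears as a `<tr>` with the class Results-table-row and data-* attributes such as data-position and data-name; that markup is an assumption, so inspect the live page and adjust the class and attribute names before relying on it. The User-Agent string is a hypothetical placeholder.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class ResultsRowParser(HTMLParser):
    """Collect the data-* attributes of <tr> rows carrying a given class.

    Assumption: one finisher per <tr class="Results-table-row" data-...>.
    """

    def __init__(self, row_class="Results-table-row"):
        super().__init__()
        self.row_class = row_class
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag != "tr":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if self.row_class in classes:
            # Strip the "data-" prefix so keys read like column names.
            self.rows.append(
                {k[len("data-"):]: v for k, v in attr_map.items() if k.startswith("data-")}
            )


def parse_results(html):
    """Return one dict per finisher row found in the HTML."""
    parser = ResultsRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch_html(url):
    """Download a page; the User-Agent value is a hypothetical placeholder."""
    req = Request(url, headers={"User-Agent": "parkrun-data-project (hobby)"})
    with urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

The list of dicts returned by `parse_results` can go straight into `pandas.DataFrame(rows)` for cleaning. If a scripted request doesn't return the same markup a browser sees, save the page from your browser and pass the saved HTML to `parse_results` instead.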


Submit your attempt:

Create a GitHub repo to share your schema, sketch, ERD, warehouse design or dashboard.

Shoot me an email with the subject "Parkrun Project Challenge..." and I'll review and repost the top ones, then share the winner at the end of July.

e: douglas@analyticsrecruitment.com.au

An example of a great project README is here if you're looking for inspiration.

Extra brownie points if you build it while wearing your barcode and a pair of Asics.


I'm creating this informal challenge because:

  1. I want to inspire budding data professionals who are looking for a project

  2. I love Parkrun and want to see more awareness of it, and that can start with data

  3. I have a long-term vision of supporting Parkrun myself one day on the data awareness side

Looking forward to seeing your dashboards!

Jane Mamas

Principal BI Developer | Data Visualisation Consultant | Certified Tableau Specialist | Power BI | Public Health Data | Pharmacist


Keen to team up if anyone is looking to build a dashboard on top of a data engineering solution :) Great idea, Doug!

Tas Mitaros

Financial Data Analyst at Duratec


Time to get a candidate on the inside and validate that data model 😉

Dion O.

Data Systems Analyst | Business Intelligence Insights through Process Automation | Transforming One Table at a Time


Which pandas approach should I use? I haven't been able to scrape the data using Python. Do we have access to the database? I've tried doing something similar for myself before and haven't found the time to finish it yet.

David Martin F FIN

Hands-On Technology Executive | FINSIA Fellow | Ultra Runner


Great idea! Yours and Parkrun's. Parkrun can be walked, as I used to with my kids. It's a great way of meeting people in your local community and a great idea around mental health ('act belong commit'). Just turn up and walk. If you're lucky, your local Parkrun has a coffee van at the end of the route :)
