Case Study Walkthrough: Turning Unstructured Survey Data into Community Health Insights

Jay Burgess, MScDS, M.Ed, MBA

Chief Revenue Architect | Architecting Predictable, Profitable Revenue Systems | Closing Revenue Gaps with AI | Building Revenue OS Systems | Ex-Walmart, Adobe | Margin & Value Optimization

Published Apr 22, 2025

Project: Community Health Insights Dashboard
Stack: Python (spaCy, Scikit-learn), Azure ML, Power BI
Client: Public sector agency (anonymized for confidentiality)
Goal: Convert open-ended resident survey data + public feedback into actionable insights for policy decision-making.

Step 1: Collecting the Data

Sources:

Open-text responses from health surveys
311 community complaints
Notes from social workers and service case files

Challenge: All three sources were unstructured free text, full of typos, slang, and overlapping topics (e.g., “rats,” “mold,” and “food storage” all related to housing conditions).

Step 2: Preprocessing with Python

To clean and prepare the data, I built a pipeline using:

import spacy
from sklearn.feature_extraction.text import TfidfVectorizer

nlp = spacy.load("en_core_web_sm")

def clean_text(text):
    # Lowercase and parse with spaCy, then keep the lemmas of
    # alphabetic tokens that are not stop words
    doc = nlp(text.lower())
    tokens = [token.lemma_ for token in doc if not token.is_stop and token.is_alpha]
    return " ".join(tokens)

Removed noise (stop words, punctuation, numbers)
Lemmatized words to unify variations (e.g., “sleeping” → “sleep”)
Prepared for vectorization and classification
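
To make the vectorization step concrete, here is a minimal sketch of TF-IDF applied to text that has already been cleaned. The sample strings are invented stand-ins for the output of clean_text, and the vectorizer uses default settings because a four-document toy corpus would not survive a min_df=5 cutoff.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented examples of what cleaned, lemmatized responses might look like.
cleaned = [
    "rat kitchen landlord ignore",
    "mold bathroom child cough",
    "food pantry close weekend",
    "rat mold apartment",
]

# Default settings (assumption): real data would likely use max_df/min_df cutoffs.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(cleaned)

# X is a sparse matrix with one row per response and one column per term.
print(X.shape)
print(sorted(vectorizer.vocabulary_)[:5])
```

Each row of X is the numeric representation a downstream classifier would consume.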

Step 3: Classifying Themes with NLP

I used a custom multi-label classifier built with Scikit-learn, so a single response could be tagged with more than one theme, such as:

Housing Insecurity
Mental Health
Food Access
Safety & Policing
Employment Barriers

Model training included:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline

# One binary logistic regression per theme, trained on TF-IDF features
pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(max_df=0.8, min_df=5)),
    ('clf', OneVsRestClassifier(LogisticRegression())),
])
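
For readers who want to see how a pipeline like this could be fit on multi-label data, here is a minimal sketch. The texts and labels are invented, MultiLabelBinarizer is my assumption for turning theme lists into an indicator matrix (the original project may have encoded labels differently), and the vectorizer drops the min_df=5 cutoff because the toy corpus is too small for it.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Invented cleaned responses, each tagged with one or more themes.
texts = [
    "rat mold apartment landlord",
    "evict rent shelter",
    "anxious sleep stress",
    "depress counsel wait",
    "food pantry close hungry",
    "grocery far bus food",
    "mold stress sleep apartment",
    "rent food hungry",
]
labels = [
    ["Housing Insecurity"],
    ["Housing Insecurity"],
    ["Mental Health"],
    ["Mental Health"],
    ["Food Access"],
    ["Food Access"],
    ["Housing Insecurity", "Mental Health"],
    ["Housing Insecurity", "Food Access"],
]

# Convert lists of theme names into a 0/1 matrix, one column per theme.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)

model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", OneVsRestClassifier(LogisticRegression(max_iter=1000))),
])
model.fit(texts, Y)

# Predict themes for a new response and map the 0/1 row back to theme names.
pred = model.predict(["mold apartment rent"])
print(mlb.inverse_transform(pred))
```

With so little data the predictions are not meaningful; the point is the shape of the workflow: binarize labels, fit, predict, then invert back to theme names.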

Evaluation metrics:

F1 Score: 0.78 avg
Precision: 0.81
Recall: 0.74
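
Averaged scores in a multi-label setting depend on how the averaging is done. The sketch below uses invented indicator matrices (not the project's data) to show how micro and macro averages can be computed with Scikit-learn; which averaging the figures above used is not stated, so treat this only as an illustration.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

# Toy indicator matrices: rows are responses, columns are themes.
y_true = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
])
y_pred = np.array([
    [1, 0, 0],
    [0, 1, 1],   # one false positive on the third theme
    [1, 0, 0],   # one false negative on the second theme
    [0, 0, 1],
])

# Micro: pool every (response, theme) decision before scoring.
micro_p = precision_score(y_true, y_pred, average="micro")
micro_r = recall_score(y_true, y_pred, average="micro")
micro_f1 = f1_score(y_true, y_pred, average="micro")

# Macro: score each theme separately, then take the unweighted mean.
macro_f1 = f1_score(y_true, y_pred, average="macro")

print(f"micro P={micro_p:.2f} R={micro_r:.2f} F1={micro_f1:.2f}")
print(f"macro F1={macro_f1:.2f}")
```

Macro averaging weights rare themes as heavily as common ones, which matters when some themes have few examples.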

📊 Step 4: Bringing It to Life in Power BI

The structured output (themes + location + sentiment) was exported into a SQL backend feeding a live Power BI dashboard. The dashboard allowed decision-makers to:

Filter by district, theme, or urgency
View weekly trends in reported issues
Download summary reports for budget planning
Compare current insights with historical baselines
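
As a rough illustration of the hand-off to the dashboard, the sketch below loads tagged responses into a SQL table and runs the kind of weekly aggregation a report could sit on. Everything here is an assumption: an in-memory SQLite database stands in for the real backend, and the table name, column names, and rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the actual SQL backend (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tagged_responses (
        received_on TEXT,  -- ISO date string
        district    TEXT,
        theme       TEXT,
        sentiment   REAL
    )
    """
)

# Invented rows representing classifier output joined with location and sentiment.
rows = [
    ("2025-01-06", "North", "Housing Insecurity", -0.6),
    ("2025-01-07", "North", "Housing Insecurity", -0.4),
    ("2025-01-08", "South", "Food Access", -0.2),
    ("2025-01-14", "North", "Housing Insecurity", -0.7),
]
conn.executemany("INSERT INTO tagged_responses VALUES (?, ?, ?, ?)", rows)

# Count reports per week, district, and theme.
weekly = conn.execute(
    """
    SELECT strftime('%Y-%W', received_on) AS week,
           district,
           theme,
           COUNT(*) AS reports
    FROM tagged_responses
    GROUP BY week, district, theme
    ORDER BY week, district, theme
    """
).fetchall()

for row in weekly:
    print(row)
```

A BI tool can then filter and chart a result set like this without touching the raw text.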

🔥 Impact

✅ Reduced manual review time by 80%
✅ Enabled real-time insight reporting for 5+ city departments
✅ Informed funding allocation for new health programs and outreach teams
