Case Study Walkthrough: Turning Unstructured Survey Data into Community Health Insights

Project: Community Health Insights Dashboard
Stack: Python (spaCy, Scikit-learn), Azure ML, Power BI
Client: Public sector agency (anonymized for confidentiality)
Goal: Convert open-ended resident survey data + public feedback into actionable insights for policy decision-making.


Step 1: Collecting the Data

Sources:

  • Open-text responses from health surveys
  • 311 community complaints
  • Notes from social workers and service case files

Challenge: All three sources were unstructured free text, full of typos and slang, and their topics overlapped (e.g., “rats,” “mold,” and “food storage” all related to housing conditions).


Step 2: Preprocessing with Python

To clean and prepare the data, I built a pipeline using:

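import spacy
from sklearn.feature_extraction.text import TfidfVectorizer

# Small English model; provides the tokenizer, stop-word flags, and lemmatizer
nlp = spacy.load("en_core_web_sm")

def clean_text(text):
    # Lowercase, then keep only the lemmas of alphabetic, non-stop-word tokens
    doc = nlp(text.lower())
    tokens = [token.lemma_ for token in doc if not token.is_stop and token.is_alpha]
    return " ".join(tokens)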

  • Removed noise (stop words, punctuation, numbers)
  • Lemmatized words to unify variations (e.g., “sleeping” → “sleep”)
  • Prepared the cleaned text for vectorization and classification


Step 3: Classifying Themes with NLP

I used a custom multi-label classifier built with Scikit-learn, so a single response could be tagged with several themes, such as:

  • Housing Insecurity
  • Mental Health
  • Food Access
  • Safety & Policing
  • Employment Barriers

Model training included:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline

# TF-IDF features feed one binary logistic regression per theme
pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(max_df=0.8, min_df=5)),
    ('clf', OneVsRestClassifier(LogisticRegression())),
])
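
To show how a pipeline of this shape might be fitted on multi-label data, here is a minimal sketch. The texts and labels are invented for illustration, MultiLabelBinarizer is my assumption for how the theme lists could be encoded, and min_df is lowered to 1 because six documents cannot satisfy min_df=5.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical cleaned responses and their theme tags
texts = [
    "mold apartment landlord ignore",
    "rat kitchen food storage spoil",
    "anxious sleep lose job",
    "food pantry close hungry",
    "evict rent stress anxious",
    "grocery store far hungry",
]
labels = [
    ["Housing Insecurity"],
    ["Housing Insecurity", "Food Access"],
    ["Mental Health"],
    ["Food Access"],
    ["Housing Insecurity", "Mental Health"],
    ["Food Access"],
]

# Turn the lists of themes into a 0/1 indicator matrix, one column per theme
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)

toy_pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(min_df=1)),  # min_df=1 only for this toy data
    ("clf", OneVsRestClassifier(LogisticRegression())),
])
toy_pipeline.fit(texts, Y)

# Predict returns an indicator row; inverse_transform maps it back to theme names
pred = toy_pipeline.predict(["landlord ignore mold"])
predicted_themes = mlb.inverse_transform(pred)
print(mlb.classes_)
print(predicted_themes)
```

With so little data the predicted tags are not meaningful. The point is the shape of the workflow: lists of themes in, an indicator matrix for training, and theme names back out.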

Evaluation metrics:

  • F1 score: 0.78 (average)
  • Precision: 0.81
  • Recall: 0.74
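
For readers who want to compute metrics like these on their own multi-label output, here is a small sketch using scikit-learn's metric functions. The indicator matrices are made up, and micro-averaging is an assumption on my part for the example; the averaging method behind the numbers above is not specified here.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical indicator matrices: rows are responses, columns are themes
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0],
                   [0, 1, 1]])

# Micro-averaging pools true/false positives over all themes before dividing
precision = precision_score(y_true, y_pred, average="micro")
recall = recall_score(y_true, y_pred, average="micro")
f1 = f1_score(y_true, y_pred, average="micro")

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: precision=0.80 recall=0.67 f1=0.73
```

In this toy data there are 4 true positives, 1 false positive, and 2 false negatives, which gives precision 4/5 and recall 4/6. Switching to average="macro" would instead average the per-theme scores, which weights rare themes more heavily.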


📊 Step 4: Bringing It to Life in Power BI

The structured output (themes + location + sentiment) was exported into a SQL backend feeding a live Power BI dashboard. The dashboard allowed decision-makers to:

  • Filter by district, theme, or urgency
  • View weekly trends in reported issues
  • Download summary reports for budget planning
  • Compare current insights with historical baselines
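
The export and the weekly trend view can be sketched with Python's built-in sqlite3 module. This is an illustration only: the table name, columns, sample rows, and sentiment values are hypothetical, and SQLite stands in for whatever SQL backend a real deployment would use.

```python
import sqlite3

# Hypothetical structured output: id, district, theme, sentiment, report date
rows = [
    (1, "District 1", "Housing Insecurity", -0.6, "2024-03-04"),
    (2, "District 1", "Food Access", -0.2, "2024-03-05"),
    (3, "District 2", "Housing Insecurity", -0.8, "2024-03-06"),
    (4, "District 2", "Housing Insecurity", -0.4, "2024-03-12"),
]

conn = sqlite3.connect(":memory:")  # in-memory database for the sketch
conn.execute(
    """
    CREATE TABLE tagged_responses (
        response_id INTEGER,
        district    TEXT,
        theme       TEXT,
        sentiment   REAL,
        reported_on TEXT
    )
    """
)
conn.executemany("INSERT INTO tagged_responses VALUES (?, ?, ?, ?, ?)", rows)

# Count reports per theme per calendar week (weeks start on Monday with %W)
weekly = conn.execute(
    """
    SELECT strftime('%Y-%W', reported_on) AS week, theme, COUNT(*) AS reports
    FROM tagged_responses
    GROUP BY week, theme
    ORDER BY week, theme
    """
).fetchall()

for week, theme, reports in weekly:
    print(week, theme, reports)
```

A query like this is the kind of aggregate a dashboard could read directly, with district added to the GROUP BY when the view is filtered by location.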


🔥 Impact

  • Reduced manual review time by 80%
  • Enabled real-time insight reporting for 5+ city departments
  • Informed funding allocation for new health programs and outreach teams
