Building a Real-Time, Low-Cost Geospatial Air Pollution Monitoring System for Colombo Using AI and IoT

Mohamed Rimsan Fathima Rashadha | BSc (Hons) in Artificial Intelligence & Data Science (2025)

1. Introduction

Air pollution is an invisible killer in many developing cities, and Colombo is no exception. Yet when you visit platforms like the World Air Quality Index (WAQI), you’ll notice fewer than 3 real-time air quality entries from Colombo, leaving entire communities without critical environmental data.

To address this, I developed a low-cost, real-time geospatial air pollution monitoring system tailored for Colombo using a Raspberry Pi, low-cost sensors, and a machine learning pipeline deployed on Google Cloud TPUs. The system offers high spatial resolution, live anomaly detection, and interactive dashboards for real-time visualization. Additionally, it includes a Geospatial LSTM model designed to predict AQI trends per city by learning from past pollutant patterns and geolocation data, enabling spatio-temporal forecasting for up to 6 days in advance. Together, these components form a comprehensive toolkit for public health monitoring, early warning systems, and environmental intelligence in urban settings.

2. Hardware Setup & Sensor Integration

Each sensor node is built around a Raspberry Pi 4B, serving as the central computing unit. The system integrates multiple sensors through a GPIO extension board and breadboard setup, using jumper wires for modular connectivity.

Power Management: Sensors are powered through the Pi’s 3.3V/5V rails based on voltage requirements. A common GND rail ensures consistent grounding.

Sensor Interfaces:

  • UART/Serial: SDS011 (PM2.5/PM10) and GPS module
  • I2C: BMP280 (pressure)
  • Single-wire digital (GPIO): DHT22 (humidity/temperature)
  • ADC (ADS1015, itself on the I2C bus): MQ-series & MiCS sensors (CO, NO₂, SO₂, O₃)

[Figure: Complete Set of Sensors Used in the Air Quality Monitoring System]
[Figure: Hardware Setup: Raspberry Pi + Sensors]

3. Technology Stack

4. Real-Time Sensor Data Acquisition

Using a modular EnvironmentalMonitor class in Python, each sensor is initialized and read periodically (every 60 seconds):

  • read_sds011() – Airborne particulate matter (PM2.5 & PM10)

  • read_gps() – Real-time geolocation (latitude, longitude)

  • read_dht() – Humidity and ambient temperature

  • read_bmp() – Atmospheric pressure

  • read_adc(channel) – Gas concentrations via MQ/MiCS analog output
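
To illustrate what a reader such as read_sds011() has to do once bytes arrive over UART, here is a minimal sketch of decoding one 10-byte SDS011 data frame. The function name parse_sds011_frame is hypothetical and not taken from the project code; the frame layout (0xAA header, 0xC0 command, six data bytes, checksum, 0xAB tail) follows the sensor's published serial protocol. On a real node the bytes would come from the serial port rather than a literal.

```python
from typing import Optional, Tuple

FRAME_LEN = 10
HEADER, DATA_CMD, TAIL = 0xAA, 0xC0, 0xAB


def parse_sds011_frame(frame: bytes) -> Optional[Tuple[float, float]]:
    """Decode one SDS011 data frame into (pm25, pm10) in µg/m³.

    Returns None if the frame is malformed or fails its checksum, so the
    caller can skip the reading instead of storing garbage.
    """
    if len(frame) != FRAME_LEN:
        return None
    if frame[0] != HEADER or frame[1] != DATA_CMD or frame[9] != TAIL:
        return None
    # The checksum is the low byte of the sum of the six data bytes (2..7).
    if sum(frame[2:8]) & 0xFF != frame[8]:
        return None
    # Each value is a little-endian 16-bit integer in units of 0.1 µg/m³.
    pm25 = int.from_bytes(frame[2:4], "little") / 10.0
    pm10 = int.from_bytes(frame[4:6], "little") / 10.0
    return pm25, pm10
```

Returning None on a bad frame keeps a single corrupted read from polluting the per-minute record.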

Each reading is enriched using a stat_record() function to compute descriptive statistics (min, max, median, variance), improving the robustness of the dataset even under sensor anomalies or failures.
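
A minimal sketch of what such a summary step could look like is shown here. The signature, the returned field names, and the idea of passing several raw samples per interval (with None marking a failed read) are assumptions for illustration; the original stat_record() may differ.

```python
import statistics
from typing import Dict, Iterable, Optional


def stat_record(samples: Iterable[Optional[float]]) -> Dict[str, Optional[float]]:
    """Summarise a burst of raw samples, ignoring failed reads (None)."""
    valid = [s for s in samples if s is not None]
    if not valid:
        # Every read failed: keep the record but mark the statistics as missing.
        return {"count": 0, "min": None, "max": None,
                "median": None, "variance": None}
    return {
        "count": len(valid),
        "min": min(valid),
        "max": max(valid),
        "median": statistics.median(valid),
        # Population variance, which is defined (0.0) even for one sample.
        "variance": statistics.pvariance(valid),
    }
```

Storing the median alongside min and max means one outlier in a burst shifts the extremes but not the central value, which is what makes the record usable when a sensor glitches.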

5. Data Collection & Processing Pipeline

  • Sensor nodes were installed across selected urban locations in Colombo.
  • Each node collected PM2.5, PM10, CO, NO₂, SO₂, O₃, humidity, pressure, and geolocation data every 60 seconds.
  • Data was streamed to Google BigQuery, enabling structured storage and efficient SQL-based queries.
  • Edge preprocessing included basic filtering and formatting, while centralized cloud preprocessing handled missing value imputation, feature engineering, spatial clustering, and scaling.
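
The streaming step can be sketched as follows. This is an illustration under stated assumptions: the table ID, the column names, and the helper names build_row and stream_rows are hypothetical, since the actual schema is only shown as an image. The one library call used, insert_rows_json, is the streaming-insert method of google.cloud.bigquery.Client, which returns a list of per-row errors (empty on success). The client is passed in as a parameter so the block itself has no cloud dependency.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List

# Hypothetical table ID; replace with the real project, dataset and table.
TABLE_ID = "my-project.air_quality.sensor_readings"


def build_row(node_id: str, reading: Dict[str, float], ts: datetime) -> Dict[str, Any]:
    """Flatten one 60-second reading into a JSON-serialisable row.

    `ts` should be timezone-aware; it is normalised to UTC before storage.
    """
    row: Dict[str, Any] = {
        "node_id": node_id,
        "timestamp": ts.astimezone(timezone.utc).isoformat(),
    }
    row.update(reading)
    return row


def stream_rows(client: Any, rows: List[Dict[str, Any]], table_id: str = TABLE_ID) -> None:
    """Send rows via the BigQuery streaming API and fail loudly on errors.

    `client` is expected to be a google.cloud.bigquery.Client.
    """
    errors = client.insert_rows_json(table_id, rows)
    if errors:
        raise RuntimeError(f"BigQuery rejected rows: {errors}")
```

Raising on a non-empty error list matters because streaming inserts report row-level failures in the return value rather than by throwing.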

[Figure: Methodology Flowchart]
[Figure: BigQuery Schema]
[Figure: BigQuery Schema Preview]

6. Real-Time Dashboards & GIS Insights

I built a set of interactive dashboards in Looker Studio to visualize trends, anomalies, and spatial pollution patterns:

[Figure: Bubble Map of AQII (Air Quality Impact Index) Across Colombo]
[Figure: PM2.5 & PM10 vs WHO Guidelines Over Time]
[Figure: Pollution Composite Index (PCI) Over Time]
[Figure: Risk Projection Scores Over Time]
[Figure: Satellite vs Ground NO₂ Correlation (Low R² = 0.037)]

All dashboards allow filtering by pollutant, city, time range, and anomaly status.

7. ML Models & TPU Deployment

7.1. Geospatial LSTM Model for AQI Prediction

I implemented a custom geospatial LSTM (Long Short-Term Memory) model that forecasts AQI values from historical patterns across space and time. The model takes temporal sequences of pollutant levels together with the GPS coordinates of each sensor, enabling spatio-temporal AQI prediction.

7.1.1. Key Features:

  • Model Type: Multi-feature LSTM with geospatial context
  • Input Features: PM2.5, PM10, CO, NO₂, SO₂, O₃, temperature, humidity, pressure, wind speed, GPS lat/long
  • Sequence Length: 10 time steps (10 days of past data per city, one record per day)
  • Hidden Units: 128
  • Dropout: 0.3
  • Batch Size: 64
  • Optimizer: Adam
  • Loss Function: Mean Squared Error (MSE)
  • Evaluation Metric: Root Mean Squared Error (RMSE) evaluated per city
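
The per-city evaluation metric listed above can be sketched in a few lines. The function names and the (city, actual, predicted) record layout are assumptions made for this example, not the project's evaluation code.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def rmse_per_city(records: Iterable[Tuple[str, float, float]]) -> Dict[str, float]:
    """Compute RMSE per city from (city, actual AQI, predicted AQI) records."""
    sq_err = defaultdict(float)
    counts = defaultdict(int)
    for city, actual, predicted in records:
        sq_err[city] += (actual - predicted) ** 2
        counts[city] += 1
    return {city: math.sqrt(sq_err[city] / counts[city]) for city in sq_err}


def mean_rmse(per_city: Dict[str, float]) -> float:
    """Average the per-city RMSE values, weighting every city equally."""
    return sum(per_city.values()) / len(per_city)
```

Averaging per-city values weights each city equally, so a city with many records cannot mask poor accuracy in a sparsely sampled one.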

7.1.2. Model Outcome:

  • Capable of forecasting AQI for the next 6 days for each sensor node or city using its past 10-day history
  • Trained on daily-segmented sequences per city, allowing for localized AQI forecasting
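
The 10-day-in, 6-day-out setup implies a sliding-window step that turns each city's daily series into training pairs. The sketch below shows one way to do that, assuming the model predicts all 6 future days directly from one window (the text does not say whether forecasting is direct or recursive). The function name make_windows and its arguments are hypothetical.

```python
from typing import List, Sequence, Tuple

LOOKBACK = 10  # days of history per sample (the sequence length above)
HORIZON = 6    # days to forecast


def make_windows(
    features: Sequence[Sequence[float]],
    aqi: Sequence[float],
    lookback: int = LOOKBACK,
    horizon: int = HORIZON,
) -> Tuple[List[List[List[float]]], List[List[float]]]:
    """Slice one city's daily series into (input, target) pairs.

    features[t] is the feature vector for day t; aqi[t] is that day's AQI.
    Each input covers `lookback` consecutive days and each target holds the
    AQI of the `horizon` days that immediately follow.
    """
    if len(features) != len(aqi):
        raise ValueError("features and aqi must have the same length")
    inputs, targets = [], []
    last_start = len(aqi) - lookback - horizon
    for start in range(last_start + 1):
        split = start + lookback
        inputs.append([list(row) for row in features[start:split]])
        targets.append(list(aqi[split:split + horizon]))
    return inputs, targets
```

Windows are built per city so that no sample mixes the history of one location with the future of another; a city with fewer than 16 days of data simply yields no samples.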

7.1.3. Key Achievement

Comprehensive exploratory data analysis including pollutant distribution visualization, time-series trend analysis, and correlation heatmaps guided data preprocessing and feature engineering. This significantly enhanced the model's ability to detect patterns across different pollutants and climatic variables.

[Figure: Time series Plot of AQI for Various Pollutants]
[Figure: Plot Distribution of AQI values for each pollutant]

The model achieved a validation RMSE of 8.2 AQI points, averaged across cities, indicating good predictive accuracy and temporal consistency.

7.2. Anomaly Detection Suite

To detect unusual pollution spikes and sensor errors, I built an anomaly detection suite with:

  • Autoencoder trained on clean baseline data to detect deviations.
  • Isolation Forest to isolate outliers in high-dimensional space.
  • Gaussian Mixture Model (GMM) to cluster normal vs abnormal behavior.

7.2.1. TPU Training Pipeline Highlights:

  • Autoencoder trained on Google Cloud TPUs using TPUStrategy
  • Training script Dockerized and deployed via a Vertex AI pipeline
  • Models registered and served on endpoints for continuous monitoring

[Figure: TPU Training Pipeline]
[Figure: Docker Registry Image]
[Figure: The model is deployed to an endpoint ready for prediction]

8. Model Evaluation: Key Achievements for Anomaly Detection Suite

8.1. Autoencoder

[Figure: Loss Curve]
[Figure: Error Distribution]
[Figure: Time Series of Error]
[Figure: PCA Anomaly Clusters]

Summary:

The autoencoder showed strong generalization, stable convergence, and effective anomaly separation using unsupervised reconstruction error. It flagged roughly 5% of readings as anomalies, balancing sensitivity against false positives.
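
One common way to turn reconstruction error into an anomaly flag is to threshold it at a high percentile. The sketch below assumes that approach with a 5% cut-off to match the rate reported here; the source does not state how the threshold was actually chosen, and both function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def reconstruction_error(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Mean squared error between an input vector and its reconstruction."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


def flag_anomalies(
    errors: Sequence[float], anomaly_fraction: float = 0.05
) -> Tuple[float, List[bool]]:
    """Flag the largest `anomaly_fraction` of errors as anomalous.

    Returns the threshold used and one boolean per input error. Errors that
    tie with the threshold are flagged too, so slightly more than the
    requested fraction can be marked when ties occur.
    """
    if not errors:
        return math.inf, []
    ranked = sorted(errors)
    n_flag = max(1, round(len(errors) * anomaly_fraction))
    threshold = ranked[-n_flag]
    return threshold, [e >= threshold for e in errors]
```

In a deployed setting the threshold would more likely be fixed from clean baseline errors and then applied to new readings, rather than recomputed on every batch.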

8.2. Gaussian Mixture Model (GMM)

[Figure: Log Likelihood Plot]
[Figure: AIC/BIC vs. Components]
[Figure: Responsibility Heatmap]
[Figure: Anomaly Score by City]

Summary:

GMM with 6 components achieved the highest log likelihood, clean clustering (as seen in heatmaps), and robust generalization, making it ideal for characterizing complex air quality conditions and distinguishing anomalies from normal environmental fluctuations.
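
For reference, the AIC and BIC criteria plotted against the number of components are the standard ones, where k is the number of free parameters, n the number of observations, and L̂ the maximized likelihood. The parameter count in the second line assumes full covariance matrices with K components in d dimensions; the covariance type used in this project is not stated.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}

k = (K - 1) + Kd + K\,\frac{d(d+1)}{2}
```

Lower values are better for both criteria. Because log likelihood alone never decreases as components are added, the penalty terms are what make the comparison across component counts meaningful.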

8.3. Isolation Forest

[Figure: Learning Curve]

Summary:

Isolation Forest achieved high clustering quality and accurate anomaly detection, aligning with the contamination threshold. It is computationally efficient and integrates well in ensemble settings.
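
As an illustration of an ensemble setting, the flags from the three detectors could be combined by majority vote. This is a sketch under assumptions: the source does not describe how (or whether) the detectors' outputs are merged, and majority_vote and the detector keys are hypothetical names.

```python
from typing import Dict, List, Sequence


def majority_vote(
    flags_by_detector: Dict[str, Sequence[bool]], min_votes: int = 2
) -> List[bool]:
    """Combine per-reading anomaly flags from several detectors.

    A reading is marked anomalous when at least `min_votes` detectors agree.
    """
    lengths = {len(flags) for flags in flags_by_detector.values()}
    if len(lengths) != 1:
        raise ValueError("all detectors must score the same readings")
    return [sum(votes) >= min_votes for votes in zip(*flags_by_detector.values())]
```

For example, with flags from an autoencoder, an Isolation Forest, and a GMM, min_votes=2 suppresses anomalies that only one model reports.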

8.4. Overall Achievements


10. Future Use: Forest Fire Detection & Environmental Monitoring

Due to its real-time anomaly detection, GPS integration, and CO/PM monitoring, this system can be adapted for early forest fire alerts, especially in dry zones where smoke rises before visible flames.

Use Cases:

  • Detecting sudden spikes in PM2.5/CO in forest-adjacent zones
  • Satellite + Ground Sensor Hybrid Mapping via Google Earth Engine
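
Detecting a sudden spike in PM2.5 or CO can be as simple as comparing each reading to a rolling baseline. The sketch below is one possible rule, not part of the existing system: the function name detect_spikes, the 30-reading window, and the 3x factor are all illustrative assumptions that would need tuning against real data.

```python
import statistics
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def detect_spikes(
    readings: Iterable[float], window: int = 30, factor: float = 3.0
) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for readings above `factor` x the rolling median.

    The baseline is the median of the previous `window` readings, so a single
    spike does not immediately inflate its own baseline. Nothing is reported
    until a full window of history is available.
    """
    history: Deque[float] = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(history) == window:
            baseline = statistics.median(history)
            if baseline > 0 and value > factor * baseline:
                yield i, value
        history.append(value)
```

Using the median rather than the mean keeps the baseline stable when a few elevated readings are already in the window.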

11. Impact on Environmentalists & ML Engineers

11.1. For Environmentalists:

  • Monitor pollution trends, spot industrial hotspots, and identify policy intervention points
  • Understand long-term climate trends using Looker dashboards and satellite-ground comparisons

11.2. For ML Engineers:

  • A reproducible MLOps pipeline from sensor to dashboard
  • Explore the integration of spatio-temporal modeling with LSTM for geospatial AQI prediction, a practical application of deep learning in environmental intelligence
  • Opportunity to experiment with real-world unsupervised anomaly detection
  • Ready-to-extend framework for other smart city or disaster response applications

12. Explore the Project
