Building a Real-Time, Low-Cost Geospatial Air Pollution Monitoring System for Colombo Using AI and IoT
Mohamed Rimsan Fathima Rashadha | BSc (Hons) in Artificial Intelligence & Data Science (2025)
1. Introduction
Air pollution is an invisible killer in many developing cities, and Colombo is no exception. Yet when you visit platforms like the World Air Quality Index (WAQI), you’ll notice fewer than three real-time air quality entries for Colombo, leaving entire communities without critical environmental data.
To address this gap, I developed a low-cost, real-time geospatial air pollution monitoring system for Colombo, built from a Raspberry Pi, inexpensive sensors, and a machine learning pipeline deployed on Google Cloud TPUs. The system offers high spatial resolution, live anomaly detection, and interactive dashboards for real-time visualization. It also includes a Geospatial LSTM model that predicts AQI trends per city by learning from past pollutant patterns and geolocation data, enabling spatio-temporal forecasting up to 6 days in advance. Together, these components form a comprehensive toolkit for public health monitoring, early warning systems, and environmental intelligence in urban settings.
2. Hardware Setup & Sensor Integration
Each sensor node is built around a Raspberry Pi 4B, serving as the central computing unit. The system integrates multiple sensors through a GPIO extension board and breadboard setup, using jumper wires for modular connectivity.
Power Management: Each sensor is powered from the Pi’s 3.3 V or 5 V rail, depending on its voltage requirement. A common GND rail ensures consistent grounding.
Sensor Interfaces:
3. Technology Stack
4. Real-Time Sensor Data Acquisition
A modular EnvironmentalMonitor class in Python initializes each sensor and reads it periodically (every 60 seconds):
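The original listing is not reproduced here, so what follows is only a minimal sketch of how such a polling class might be organized, not the project's actual code. The reader callables, the `read_once` and `run` methods, and the stub sensors are all hypothetical names chosen for illustration; real sensor drivers would replace the stubs.

```python
import time
from typing import Callable, Dict, List, Optional


class EnvironmentalMonitor:
    """Polls a set of named sensor readers at a fixed interval (illustrative sketch)."""

    def __init__(self, readers: Dict[str, Callable[[], float]], interval_s: float = 60.0):
        # readers maps a sensor name to a zero-argument function returning one value.
        self.readers = readers
        self.interval_s = interval_s

    def read_once(self) -> Dict[str, Optional[float]]:
        """Read every sensor once; a failing sensor yields None instead of crashing."""
        sample: Dict[str, Optional[float]] = {"timestamp": time.time()}
        for name, reader in self.readers.items():
            try:
                sample[name] = float(reader())
            except Exception:
                sample[name] = None
        return sample

    def run(self, cycles: int, sleep: Callable[[float], None] = time.sleep) -> List[dict]:
        """Collect `cycles` samples, waiting `interval_s` seconds between reads."""
        samples = []
        for i in range(cycles):
            samples.append(self.read_once())
            if i < cycles - 1:
                sleep(self.interval_s)
        return samples


def _fake_pm25() -> float:
    # Hypothetical stand-in for a real particulate sensor driver.
    return 12.5


def _broken_sensor() -> float:
    # Simulates a disconnected sensor.
    raise IOError("sensor not responding")


monitor = EnvironmentalMonitor({"pm25": _fake_pm25, "co": _broken_sensor})
samples = monitor.run(cycles=2, sleep=lambda s: None)  # skip real waiting in the demo
```

Passing the sleep function in as a parameter keeps the 60-second default for deployment while letting the loop be exercised instantly during testing.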
Each reading is enriched using a stat_record() function to compute descriptive statistics (min, max, median, variance), improving the robustness of the dataset even under sensor anomalies or failures.
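As a rough illustration of this enrichment step, the sketch below computes the four statistics named above over a short burst of raw readings. The signature, the `count` field, and the choice to skip failed reads (represented as `None`) are assumptions; the project's own `stat_record()` may differ.

```python
import statistics
from typing import Dict, Iterable, Optional


def stat_record(values: Iterable[Optional[float]]) -> Dict[str, Optional[float]]:
    """Summarize a burst of raw readings, ignoring failed (None) reads."""
    valid = [v for v in values if v is not None]
    if not valid:
        # Every read failed: return an explicit empty summary rather than raising.
        return {"count": 0, "min": None, "max": None, "median": None, "variance": None}
    return {
        "count": len(valid),
        "min": min(valid),
        "max": max(valid),
        "median": statistics.median(valid),
        # Population variance; a single valid reading has variance 0.
        "variance": statistics.pvariance(valid),
    }


summary = stat_record([10.0, 12.0, None, 14.0])
empty = stat_record([None, None])
```

Summarizing several reads per interval means one dropped or noisy read changes the record only slightly instead of corrupting it.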
5. Data Collection & Processing Pipeline
6. Real-Time Dashboards & GIS Insights
I built a set of interactive dashboards in Looker Studio to visualize trends, anomalies, and spatial pollution patterns:
All dashboards allow filtering by pollutant, city, time range, and anomaly status.
7. ML Models & TPU Deployment
7.1. Geospatial LSTM Model for AQI Prediction
I implemented a custom geospatial LSTM (Long Short-Term Memory) model to forecast AQI values from historical patterns across space and time. The model takes temporal sequences of pollutant levels together with GPS-tagged sensor data, enabling spatio-temporal AQI prediction.
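One common way to prepare data for a model of this kind is to slice each sensor's time series into fixed-length windows and append the node's coordinates to every time step. The sketch below shows that preparation only, under assumed names (`make_windows`, `window`, `horizon`) and with made-up readings; it is not the article's actual feature pipeline, and the LSTM itself is omitted.

```python
from typing import List, Sequence, Tuple


def make_windows(
    rows: Sequence[Sequence[float]],
    aqi: Sequence[float],
    lat: float,
    lon: float,
    window: int,
    horizon: int,
) -> Tuple[List[List[List[float]]], List[float]]:
    """Turn one sensor's time series into (sequence, target) pairs.

    rows[t] holds the pollutant readings at time step t. The node's
    coordinates are appended to every step so the model sees location
    alongside the temporal pattern.
    """
    if len(rows) != len(aqi):
        raise ValueError("rows and aqi must have the same length")
    X, y = [], []
    last_start = len(rows) - window - horizon
    for start in range(last_start + 1):
        seq = [list(r) + [lat, lon] for r in rows[start:start + window]]
        X.append(seq)
        # Target is the AQI `horizon` steps after the end of the window.
        y.append(aqi[start + window + horizon - 1])
    return X, y


# Illustrative data: two pollutant channels over six time steps.
rows = [[float(t), float(t) * 2] for t in range(6)]
aqi = [50.0 + t for t in range(6)]
X, y = make_windows(rows, aqi, lat=6.9271, lon=79.8612, window=3, horizon=1)
```

Each element of `X` has shape (window, pollutants + 2), which matches the (timesteps, features) layout that recurrent layers typically expect.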
7.1.1. Key Features:
7.1.2. Model Outcome:
7.1.3. Key Achievement
Comprehensive exploratory data analysis, including pollutant distribution visualization, time-series trend analysis, and correlation heatmaps, guided data preprocessing and feature engineering. This significantly enhanced the model's ability to detect patterns across different pollutants and climatic variables.
The model achieved a validation RMSE of 8.2, averaged across cities, demonstrating strong predictive accuracy and temporal consistency.
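For readers unfamiliar with the metric, the sketch below shows one way a city-averaged RMSE can be computed: an RMSE per city, then the unweighted mean of those values. Whether the 8.2 figure was computed this way or pooled over all samples is not stated, so treat the averaging scheme as an assumption; the city keys and numbers are placeholders.

```python
import math
from typing import Dict, List, Tuple


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mean_city_rmse(per_city: Dict[str, Tuple[List[float], List[float]]]) -> float:
    """Unweighted mean of per-city RMSE values (assumed averaging scheme)."""
    scores = [rmse(t, p) for t, p in per_city.values()]
    return sum(scores) / len(scores)


# Placeholder validation data, not results from the project.
validation = {
    "city_a": ([50.0, 60.0], [53.0, 57.0]),   # per-city RMSE = 3.0
    "city_b": ([80.0, 90.0], [85.0, 95.0]),   # per-city RMSE = 5.0
}
avg = mean_city_rmse(validation)
```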
7.2. Anomaly Detection Suite
To detect unusual pollution spikes and sensor errors, I built an anomaly detection suite with:
7.2.2. TPU Training Pipeline Highlights:
8. Model Evaluation: Key Achievements for Anomaly Detection Suite
8.1. Autoencoder
Summary:
The autoencoder showed strong generalization, stable convergence, and effective anomaly separation using unsupervised reconstruction error. It flagged about 5% of readings as anomalies and maintained a balance between sensitivity and false positives.
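To make the reconstruction-error idea concrete, the sketch below flags the highest-error fraction of samples. It assumes the roughly 5% rate comes from a quantile-style threshold on the error, which the article does not state. The errors are simulated rather than produced by a trained autoencoder, and the function names are illustrative.

```python
import random
from typing import List, Sequence, Tuple


def reconstruction_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between a reading and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def flag_anomalies(errors: Sequence[float], anomaly_rate: float = 0.05) -> Tuple[float, List[bool]]:
    """Flag the samples whose error falls in the top `anomaly_rate` fraction."""
    n_flag = max(1, round(len(errors) * anomaly_rate))
    threshold = sorted(errors)[-n_flag]
    return threshold, [e >= threshold for e in errors]


# Simulated errors stand in for the output of a trained model.
rng = random.Random(0)
errors = [rng.expovariate(1.0) for _ in range(1000)]
threshold, flags = flag_anomalies(errors, anomaly_rate=0.05)
```

Raising or lowering `anomaly_rate` is the knob that trades sensitivity against false positives.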
8.2. Gaussian Mixture Model (GMM)
Summary:
The GMM with 6 components achieved the highest log likelihood, clean clustering (as seen in heatmaps), and robust generalization, making it well suited to characterizing complex air quality conditions and distinguishing anomalies from normal environmental fluctuations.
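The component-count comparison described above can be sketched with scikit-learn's `GaussianMixture`. This example uses synthetic two-cluster data and selects the count with the highest held-out log likelihood; the data, the candidate range, the train/validation split, and the 5th-percentile anomaly cutoff are all assumptions for illustration, not the project's settings.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic stand-in for scaled pollutant features: two well-separated regimes.
clean = rng.normal(loc=0.0, scale=1.0, size=(300, 2))
polluted = rng.normal(loc=8.0, scale=1.0, size=(300, 2))
X = np.vstack([clean, polluted])
rng.shuffle(X)  # shuffle rows in place before splitting
X_train, X_val = X[:450], X[450:]

candidates = [1, 2, 3, 4]
val_loglik = {}
for k in candidates:
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X_train)
    # score() returns the average per-sample log likelihood.
    val_loglik[k] = gmm.score(X_val)

best_k = max(val_loglik, key=val_loglik.get)
best_gmm = GaussianMixture(n_components=best_k, random_state=0).fit(X_train)

# Readings with the lowest likelihood under the mixture are anomaly candidates.
scores = best_gmm.score_samples(X_val)
threshold = np.percentile(scores, 5)
is_anomaly = scores < threshold
```

Scoring on held-out data matters here because training log likelihood tends to keep improving as components are added.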
8.3. Isolation Forest
Summary:
The Isolation Forest achieved high clustering quality and accurate anomaly detection, with the share of flagged readings aligning with the contamination threshold. It is computationally efficient and integrates well in ensemble settings.
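The link between the contamination setting and the share of flagged readings can be seen in a small scikit-learn sketch. The data is synthetic, and the 0.05 contamination value is an assumption chosen to mirror the roughly 5% anomaly rate mentioned earlier; the article does not give the setting actually used.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Synthetic two-column readings (think PM2.5 and CO); values are illustrative only.
normal = rng.normal(loc=[35.0, 0.8], scale=[5.0, 0.1], size=(950, 2))
spikes = rng.normal(loc=[150.0, 4.0], scale=[10.0, 0.5], size=(50, 2))
X = np.vstack([normal, spikes])

forest = IsolationForest(contamination=0.05, random_state=0)
labels = forest.fit_predict(X)  # -1 marks an anomaly, 1 marks a normal point
flagged_fraction = float(np.mean(labels == -1))
```

Because the decision threshold is derived from `contamination`, the flagged fraction on the training data lands close to that value by construction, so it reflects the setting rather than serving as independent evidence of accuracy.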
8.4. Overall Achievements
10. Future Use: Forest Fire Detection & Environmental Monitoring
Because it combines real-time anomaly detection, GPS integration, and CO/PM monitoring, this system can be adapted for early forest fire alerts, especially in dry zones where smoke appears before flames are visible.
Use Case: Detecting sudden spikes in PM2.5/CO in forest-adjacent zones
Satellite + Ground Sensor Hybrid Mapping via Google Earth Engine
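A sudden-spike check of the kind described in the use case could be as simple as comparing each new reading against a rolling baseline. The sketch below is one possible approach, not part of the existing system: the `detect_spikes` name, the window length, and the z-score threshold are all assumptions that would need tuning on real data.

```python
import statistics
from collections import deque
from typing import Deque, List, Sequence


def detect_spikes(readings: Sequence[float], window: int = 30, z_threshold: float = 4.0) -> List[int]:
    """Return indices whose value exceeds the rolling baseline by z_threshold std devs."""
    history: Deque[float] = deque(maxlen=window)
    spikes: List[int] = []
    for i, value in enumerate(readings):
        if len(history) == window:
            mean = statistics.fmean(history)
            std = statistics.pstdev(history)
            if std > 0 and (value - mean) / std > z_threshold:
                spikes.append(i)
                continue  # keep the spike out of the baseline
        history.append(value)
    return spikes


# Illustrative PM2.5 trace: a steady background with one abrupt jump at index 40.
background = [20.0 + (i % 5) for i in range(40)]
pm25 = background + [180.0] + [20.0 + (i % 5) for i in range(10)]
spike_indices = detect_spikes(pm25, window=30, z_threshold=4.0)
```

Excluding flagged values from the baseline keeps a single smoke event from inflating the rolling statistics and masking the readings that follow it.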
11. Impact on Environmentalists & ML Engineers
11.1. For Environmentalists:
11.2. For ML Engineers:
12. Explore the Project