International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 05 | May 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 1852
A Deep Learning based Air Quality Prediction
B. Lakshmi Sravya1, A.S. MahaLakshmi2, D. Balaji Bhavya Swarupini3, B.V. Sai Jaswanth4
1,2,3,4 Lendi Institute of Engineering & Technology, Jonnada, Vizianagaram, Andhra Pradesh
---------------------------------------------------------------------***------------------------------------------------------------------
Abstract - Industries are a major source of air pollutants. Air pollution in the form of carbon dioxide and methane raises the
earth’s temperature; the less gasoline we burn, the more we reduce air pollution and the harmful effects of climate change.
Especially in metropolitan cities, the change in temperature combined with harmful chemicals may lead to dangerous levels of
air pollution. Air quality prediction techniques are therefore of major importance. Many machine learning algorithms have been
studied for estimating the air quality index. Applying deep learning models to these data can make a considerable
difference in predicting air quality. We propose an LSTM-based deep learning technique for evaluating hourly
ambient air quality. The proposed model outperformed existing models as measured by RMSE.
Key Words: Pollution, Prediction, LSTM, Deep Learning.
1. INTRODUCTION
Industrialization and urbanization have intensified environmental health risks and pollution, especially in developing
countries like India. Studies show that air pollution poses major health risks such as stroke, heart disease, lung cancer, and
chronic and acute respiratory diseases. According to a World Health Organization (WHO) report [1], 14 of the 15
most polluted cities in the world are in India (Delhi is among them), and an estimated 12.6 million people die from
environmental health risks annually. According to the WHO, 92% of the world’s population lives in areas where the air quality
is below WHO standards [2].
About 88% of premature deaths occur in low- and middle-income countries, where air pollution is escalating at an
alarming rate. India is the third largest producer of greenhouse gases after China and the United States [3]. Air pollution is so
severe that, according to a 2016 study conducted by the Indian Institute of Tropical Meteorology (IITM) and the Atmospheric
Chemistry Observations and Modelling Laboratory, National Centre for Atmospheric Research, Boulder, Colorado, USA [4], life
expectancy among Indians is reduced by 3.4 years on average, while among the residents of Delhi it is reduced by almost 6.3
years.
There are six prominent air pollutants: particulate matter (PM2.5 and PM10), carbon monoxide (CO),
ozone (O3), nitrogen dioxide (NO2), and sulphur dioxide (SO2). Table 1 shows the sources of these air pollutants and their major effects
on human health and the environment [5]. To track the rising pollution trend in India, the government of India has installed
pollutant-measuring sensors at various stations covering major pollution-prone areas. The government has taken multiple steps
to control pollution, such as metro facilities, increased public transport, and laws such as the even-odd system for
personal vehicles. Considering the current trend of pollution growth, these solutions are bound to fail in the future. Therefore,
forecasting air pollution and devising solutions to control it are the need of the hour.
1.1 CRITERIA POLLUTANTS
Table -1: Emission Sources and Major Effects

| Criteria pollutant | Emission sources (natural) | Emission sources (anthropogenic) | Major effects (health) | Major effects (environmental) |
|---|---|---|---|---|
| Sulphur dioxide (SO2) | Volcanic emissions | Burning of fossil fuels, metal melting etc. | Respiratory problems, heart and lung disorders, visual impairment | Acid rain |
| Nitrogen dioxide (NO2) | Lightning, forest fires etc. | Burning of fossil fuels, biomass and high-temperature combustion processes | Pulmonary disorders, increased susceptibility to respiratory infections | Precursor of ozone formation in the troposphere, aerosol formation |
| Particulate matter (PM) | Windblown dust, pollen spores, photochemically produced particles | Vehicular emissions, industrial combustion processes, construction industries | Respiratory problems, liver fibrosis, lung/liver cancer, heart stroke, bone problems | Visibility reduction |
| Carbon monoxide (CO) | Animal metabolism, forest fires, volcanic activity | Burning of carbonaceous fuels, emissions from IC engines | Anoxemia leading to various cardiovascular problems; infants, pregnant women and elderly people are at higher risk | Affects the amount of greenhouse gases, which are linked to climate change and global warming |
| Ozone (O3) | Present in the stratosphere at 10-50 km height | Hydrocarbons and NOx reacting with sunlight result in O3 formation | Respiratory problems, asthma, bronchitis etc. | O3 in the upper troposphere causes greenhouse effects, harmful effects on plants, death of plant tissues |
1.2 OBJECTIVE
The objective of this work is to identify these environmental problems and to apply different learning techniques that improve
the efficiency of predicting the air quality index.
2. RELATED WORK
Reddy et al. [6] investigate the use of an LSTM framework for forecasting future pollution based on time-series pollutant and
meteorological data from the Beijing area. The main aim of that paper is the application of an LSTM sequence-to-scalar model to forecast
pollution. Zheng et al. [7] address the issue of air quality inference based on the air quality reported by existing sensor stations.
Meteorological data, traffic flow, human mobility, and points of interest (POIs) are other features used to infer AQI at non-sensor
locations.
He et al. [10] and Roy et al. [8] provide methods to predict PM10 concentration. Pérez et al. [9] proposed a method to predict PM2.5
concentration for the next 24 hours. In our work, we provide a method to predict each pollutant concentration and the AQI up
to the next 12 hours.
Roy et al. [8] used data on mill tailings at Kolar Gold Fields for their experiments. Monitoring was carried out at the National
Institute of Rock Mechanics (NIRM), Kolar Gold Fields (KGF). Pérez et al. [9] performed their experiments for Malaysia. The data
was provided by the Malaysian Meteorological Department (MMD) and the Department of Environment (DOE).
He et al. [10] develop a hybrid methodology to forecast PM10. The paper combines Autoregressive Integrated Moving
Average (ARIMA) and ANN models to improve forecast accuracy: ARIMA models the linear component, and an
ANN model then handles the residuals from the ARIMA model. They report that the hybrid model can be an effective way to
improve PM10 forecasting accuracy compared with a single ARIMA model. Roy et al. [8] present an ANN-based approach as a
predictive and data analysis tool for the evaluation of air pollutants. The paper proposes a multilayer feed-forward network to
predict PM10 concentration using meteorological data. Pérez et al. [9] proposed a three-layer neural network to predict PM2.5
concentration. They used the previous 24 hours of PM2.5 data for prediction. In this work, we have developed a time-series-based
stacked LSTM model to forecast air pollutant concentration.
2.1 RESULTS AND DISCUSSIONS
The main aim of the study is to predict the PM2.5 level and assess air quality based on a data set consisting of daily
atmospheric conditions in a specific city. Deep learning is employed to predict future values of PM2.5 from previous
PM2.5 readings. This is done using Long Short-Term Memory (LSTM), an artificial recurrent neural network
(RNN) architecture used in the field of deep learning. The model is evaluated by the Root Mean Square Error (RMSE) of its predictions.
A low RMSE value indicates that the model's predictions are accurate.
Fig. 2.1 LSTM and Nerve Cell
2.2 METHODOLOGY
Step 1: The data is taken in the form of a CSV file (data.csv).
Step 2: After the input dataset is loaded, the data is preprocessed as follows:
• Null values are removed from the data frame, and NaN values are replaced with default values.
• Sometimes the data is qualitative, that is, some columns contain categories in text form. Text is harder
for models to process than numbers, since the models are based on mathematical equations and
calculations. Therefore, we have to encode the categorical data.
• The encoding is first fitted to the data, and the data is then transformed according to the fitted encoding.
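The preprocessing in Step 2 can be sketched as follows. This is a minimal, standard-library illustration and not the code used in our experiments: the helper names fill_missing and label_encode, the default value 0.0, and the sample readings and wind-direction labels are all assumptions made for the example.

```python
import math


def fill_missing(values, default=0.0):
    """Replace None and NaN entries with a default value (assumed 0.0 here)."""
    cleaned = []
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            cleaned.append(default)
        else:
            cleaned.append(v)
    return cleaned


def label_encode(categories):
    """Fit an integer code per distinct category, then transform the column."""
    # "fit": build the mapping from the sorted distinct categories
    mapping = {c: i for i, c in enumerate(sorted(set(categories)))}
    # "transform": apply the fitted mapping to every entry
    return [mapping[c] for c in categories], mapping


# Hypothetical column values, for illustration only
pm25 = fill_missing([129.0, float("nan"), 148.0, None, 159.0])
wind_codes, wind_mapping = label_encode(["SE", "NW", "SE", "cv", "NW"])
```

Integer codes are only one possible encoding; one-hot encoding may suit a category without a natural order better.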
Step 3: After preprocessing, the data is scaled to a fixed range, usually 0 to 1. The cost of having this bounded range, in
contrast to standardization, is that we end up with smaller standard deviations, which can suppress the effect of outliers.
Then, using the s_to_super function, the first column of row (t) is shifted to become the last column of row (t-1) and concatenated. This
transforms the preprocessed dataset into a time-lagged (supervised) dataset suitable for a recurrent model.
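A minimal sketch of Step 3 is given below, assuming a single pollutant series. The functions min_max_scale and to_supervised are hypothetical stand-ins written for this illustration; they are not the s_to_super function itself, and the sample values are invented.

```python
def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant series carries no range information; map it to zeros
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def to_supervised(series, n_lags=1):
    """Pair the n_lags previous readings (inputs) with the current reading (target)."""
    rows = []
    for t in range(n_lags, len(series)):
        rows.append((series[t - n_lags:t], series[t]))
    return rows


scaled = min_max_scale([10.0, 20.0, 30.0, 50.0])
frames = to_supervised(scaled, n_lags=1)
```

Each resulting row holds the reading at time t-1 as input and the reading at time t as target, which is the framing the recurrent model is trained on.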
Step 4: Now we need to split our dataset into two sets: a training set and a test set. We train our model
on the training set, i.e. the model tries to learn the correlations in the training set, and then
we test the model on the test set to check how accurately it can predict. A general rule of thumb is to allocate 80% of the
dataset to the training set and the remaining 20% to the test set. For this task, we import train_test_split from the model_selection
module of scikit-learn.
Step 5: To build our training and test sets, we create four sets: X_train (training part of the matrix of features), X_test
(test part of the matrix of features), Y_train (training part of the dependent variable associated with X_train, and
therefore with the same indices), and Y_test (test part of the dependent variable associated with X_test, and therefore
with the same indices). We assign to them the output of train_test_split, which takes the parameters arrays (X and Y) and test_size.
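The 80/20 split of Steps 4 and 5 can be illustrated without any library as shown below. This is a sketch under stated assumptions: chronological_split is a hypothetical helper, and it keeps the rows in time order, whereas scikit-learn's train_test_split shuffles by default unless shuffle=False is passed.

```python
def chronological_split(X, Y, test_size=0.2):
    """Split features X and targets Y, keeping the last test_size fraction for testing."""
    if len(X) != len(Y):
        raise ValueError("X and Y must have the same number of rows")
    n_test = int(round(len(X) * test_size))
    cut = len(X) - n_test
    # Rows before the cut are used for training, rows after it for testing
    return X[:cut], X[cut:], Y[:cut], Y[cut:]


# Toy data: ten rows with one feature each (illustrative values only)
X = [[i] for i in range(10)]
Y = [i + 1 for i in range(10)]
X_train, X_test, Y_train, Y_test = chronological_split(X, Y, test_size=0.2)
```

Keeping the time order avoids training on readings that come later than the ones being predicted, which is often preferred for forecasting tasks.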
Step 6: Next, we build a model to train on the data. The model used here is Long Short-Term Memory (LSTM).
Step 7: An LSTM has a control flow similar to that of a recurrent neural network. It processes data and passes on information as it
propagates forward. The differences lie in the operations within the LSTM’s cells. These operations allow the LSTM to
keep or forget information.
Step 8: The first step in an LSTM is to decide what information to throw away from the cell state. This decision is
made by a sigmoid layer called the “forget gate layer.” It outputs a value between 0 and 1, where 1 represents “keep this as it is”
and 0 represents “get rid of this.”
Fig 2.2 Forget Gate
The next step decides what new information to store in the cell state. It has two parts:
• First, a sigmoid layer called the “input gate layer” decides which values we’ll update.
• Next, a tanh layer creates a vector of new candidate values that could be added to the state. In the next step,
these two are combined to create an update to the state.
Fig 2.3 Input Gate
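Step 9: It is now time to update the old cell state, Ct−1, into the new cell state Ct. The previous steps have already decided what to do;
we only need to apply it: the old state is multiplied by the forget gate output, and the gated candidate values are added.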
Fig 2.4 Current state
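For reference, Steps 8 and 9 correspond to the following equations in the standard LSTM formulation, which we assume the cell described here follows. The symbols W and b (weights and biases), h_{t-1} (previous hidden state), x_t (current input), \sigma (sigmoid) and \odot (element-wise product) are notation introduced for this sketch and do not appear in the steps above.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{C}_t &= \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right) && \text{(candidate values)} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(new cell state)}
\end{aligned}
```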
Step 10: Finally, we need to decide what to output. The output is a filtered version of the updated cell state.
Fig 2.5 Output layer
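Steps 8 to 10 can be put together in a single forward step of an LSTM cell. The sketch below assumes the standard LSTM formulation with one scalar input and one scalar hidden unit; the function names, the parameter dictionary layout and the weight value 0.5 are illustrative assumptions, not the trained model from our experiments.

```python
import math


def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def make_params(value):
    """Build a parameter dict with every weight and bias set to the same value."""
    keys = [f"w{g}_{src}" for g in "fico" for src in "xh"] + [f"b{g}" for g in "fico"]
    return {k: value for k in keys}


def lstm_step(x_t, h_prev, c_prev, p):
    """One forward step of a single-unit LSTM cell (scalar input and state)."""
    f_t = sigmoid(p["wf_x"] * x_t + p["wf_h"] * h_prev + p["bf"])      # forget gate
    i_t = sigmoid(p["wi_x"] * x_t + p["wi_h"] * h_prev + p["bi"])      # input gate
    c_hat = math.tanh(p["wc_x"] * x_t + p["wc_h"] * h_prev + p["bc"])  # candidate values
    c_t = f_t * c_prev + i_t * c_hat                                   # new cell state
    o_t = sigmoid(p["wo_x"] * x_t + p["wo_h"] * h_prev + p["bo"])      # output gate
    h_t = o_t * math.tanh(c_t)                                         # filtered output
    return h_t, c_t


# Run the cell over a short scaled sequence with arbitrary illustrative weights
h, c = 0.0, 0.0
for x in [0.0, 0.25, 0.5, 1.0]:
    h, c = lstm_step(x, h, c, make_params(0.5))
```

In practice a deep learning library handles vector-valued states and learns the weights during training; the scalar version only makes the gate arithmetic visible.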
Step 11: The trained model is given the input data instances, and from those instances it predicts the
required output. Here the input instances are taken from the test_X data (the test part of the matrix of features).
Step 12: To obtain the predictions, we call model_predict().
Step 13: Finally, we calculate the Root Mean Square Error (RMSE). It is the standard deviation of the residuals (prediction errors).
Residuals measure how far the data points are from the regression line; RMSE measures how spread out these
residuals are. In other words, it tells you how concentrated the data is around the line of best fit. RMSE is
commonly used in climatology, forecasting, and regression analysis to verify experimental results. It is a standard way to
measure the error of a model in predicting quantitative data. Formally it is defined as follows:
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
where y_i is the observed value, \hat{y}_i is the predicted value, and n is the number of observations.
It indicates the absolute fit of the model to the data, that is, how close the observed data points are to the model's predicted values.
Whereas R-squared is a relative measure of fit, RMSE is an absolute measure of fit. Lower values of RMSE indicate a better fit.
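The RMSE computation of Step 13 can be sketched in a few lines. The function name rmse and the sample observed and predicted values are assumptions made for illustration; they are not results from our experiments.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error between observed and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each residual, average the squares, then take the square root
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative values: the residuals are 1, 0 and -2
score = rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
```

Because RMSE is expressed in the units of the target, a score computed on scaled data should be interpreted after inverting the scaling.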
Fig. 2.6 Data Visualization
Fig. 2.7 Predicted RMSE and P.M.2.5 Values
3. CONCLUSION
Deep learning is gradually emerging as a promising technique for forecasting non-linear time series
such as meteorological and pollution data. In this paper, we used a deep learning approach for the prediction of
air quality. We trained an LSTM network on AirNet data to predict future PM2.5 values and calculated the RMSE (Root Mean Square
Error). The analysis can be extended further by using methods such as a Convolutional Neural Network (CNN) to capture the irregular
changes in air pollution data. The relationships between different features can likewise be assessed, enabling
us to see whether there is any hidden parameter that correlates features that appear unrelated
at first glance.
REFERENCES
[1] IndiaToday.in. 2018. 14 of world’s most polluted 15 cities in India, Kanpur tops WHO list. India Today (2018).
https://www.indiatoday.in/education-today/gk-current-affairs/story/14-worlds-most-polluted-15-cities-india-kanpur-tops-who-list-1224730-2018-05-02
[2] weforum.org. 2016. 92% of us are breathing unsafe air. This map shows just how bad the problem is. (2016).
https://www.weforum.org/agenda/2016/09/92-of-the-world-s-population-lives-in-areas-with-unsafe-air-pollution-levels-this-interactive-map-shows-just-how-bad-the-problem-is/
[3] reuters.com. [n. d.]. India says is now third highest carbon emitter. ([n. d.]).
https://www.reuters.com/article/us-india-climate/india-says-is-now-third-highest-carbon-emitter-idUSTRE6932PE20101004
[4] hindustantimes.com. 2016. Air pollution shortens your life by 3.4 years, Delhiites worst hit. Hindustan Times (2016).
https://www.hindustantimes.com/mumbai/air-pollution-shortens-your-life-by-3-4-years/story-L9VOawHyX4PCMfCuAjv4ML.html
[5] Central Pollution Control Board. [n. d.]. ENVIS Centre on Control of Pollution Water, Air and Noise. ([n. d.]).
http://cpcbenvis.nic.in/envis_newsletter/Air%20pollution%20in%20Delhi.pdf
[6] Vikram Simha A Reddy, Pavan S. Yedavalli, Shrestha Mohanty, and Udit Nakhat. 2017. Deep Air: Forecasting Air Pollution in
Beijing, China.
[7] Yu Zheng, Furui Liu, and Hsun-Ping Hsieh. 2013. U-Air: when urban air quality inference meets big data. In KDD.
[8] Surendra Roy. 2012. Prediction of Particulate Matter Concentrations Using Artificial Neural Network. 2 (03 2012), 30–36.
[9] Patricio Perez and Jorge Reyes. 2001. Prediction of Particulate Air Pollution using Neural Techniques. Neural Computing &
Applications 10, 2 (01 May 2001), 165–171. https://doi.org/10.1007/s005210170008
[10] G. He and Qihong Deng. 2012. A Hybrid ARIMA and Neural Network Model to Forecast Particulate Matter Concentration in
Changsha, China.
More Related Content

PDF
IRJET - Prediction of Air Pollutant Concentration using Deep Learning
PDF
IRJET- Air Pollution Prediction System for Smart City using Data Mining T...
PDF
IRJET- Recognition of Future Air Quality Index using Artificial Neural Network
PDF
Air Pollution Prediction using Machine Learning
PDF
Analysis Of Air Pollutants Affecting The Air Quality Using ARIMA
PDF
IRJET- Prediction of Fine-Grained Air Quality for Pollution Control
PDF
IRJET- Analysis and Prediction of Air Quality
PDF
Analysis and Prediction of Air Quality in India
IRJET - Prediction of Air Pollutant Concentration using Deep Learning
IRJET- Air Pollution Prediction System for Smart City using Data Mining T...
IRJET- Recognition of Future Air Quality Index using Artificial Neural Network
Air Pollution Prediction using Machine Learning
Analysis Of Air Pollutants Affecting The Air Quality Using ARIMA
IRJET- Prediction of Fine-Grained Air Quality for Pollution Control
IRJET- Analysis and Prediction of Air Quality
Analysis and Prediction of Air Quality in India

Similar to A Deep Learning Based Air Quality Prediction (20)

PDF
Ae4102224236
PDF
A Smart air pollution detector using SVM Classification
PDF
Evaluating the Effect of Human Activity on Air Quality using Bayesian Network...
PDF
Prediction of Air Quality Influential Factors with AtmosphericAir Present Pol...
PDF
Air Quality Visualization
PDF
Atmospheric Pollutant Concentration Prediction Based on KPCA BP
DOC
Final Synopsis -Bharathi(21-4-23).doc
PDF
Air Pollution Prediction via Differential Evolution Strategies with Random Fo...
PDF
Prediction of Air Quality Index using Random Forest Algorithm
PDF
ONLINE SCALABLE SVM ENSEMBLE LEARNING METHOD (OSSELM) FOR SPATIO-TEMPORAL AIR...
PDF
ONLINE SCALABLE SVM ENSEMBLE LEARNING METHOD (OSSELM) FOR SPATIO-TEMPORAL AIR...
PDF
ONLINE SCALABLE SVM ENSEMBLE LEARNING METHOD (OSSELM) FOR SPATIO-TEMPORAL AIR...
PDF
HOSTILE GAS MONITORING SYSTEM USING IoT
PDF
Implementation of Integration VaaMSN and SEMAR for Wide Coverage Air Quality ...
PDF
EDD Project A35 group. final.pdf Department of ENTC
PDF
IRJET - Air Quality Index – A Study to Assess the Air Quality
PDF
Assessment of Variation in Concentration of Air Pollutants Within Monitoring ...
PDF
IRJET- Air Quality and Dust Level Monitoring using IoT
PDF
IRJET- Air Pollution Prediction using Machine Learning
PDF
Air Quality Monitoring and Control System in IoT
Ae4102224236
A Smart air pollution detector using SVM Classification
Evaluating the Effect of Human Activity on Air Quality using Bayesian Network...
Prediction of Air Quality Influential Factors with AtmosphericAir Present Pol...
Air Quality Visualization
Atmospheric Pollutant Concentration Prediction Based on KPCA BP
Final Synopsis -Bharathi(21-4-23).doc
Air Pollution Prediction via Differential Evolution Strategies with Random Fo...
Prediction of Air Quality Index using Random Forest Algorithm
ONLINE SCALABLE SVM ENSEMBLE LEARNING METHOD (OSSELM) FOR SPATIO-TEMPORAL AIR...
ONLINE SCALABLE SVM ENSEMBLE LEARNING METHOD (OSSELM) FOR SPATIO-TEMPORAL AIR...
ONLINE SCALABLE SVM ENSEMBLE LEARNING METHOD (OSSELM) FOR SPATIO-TEMPORAL AIR...
HOSTILE GAS MONITORING SYSTEM USING IoT
Implementation of Integration VaaMSN and SEMAR for Wide Coverage Air Quality ...
EDD Project A35 group. final.pdf Department of ENTC
IRJET - Air Quality Index – A Study to Assess the Air Quality
Assessment of Variation in Concentration of Air Pollutants Within Monitoring ...
IRJET- Air Quality and Dust Level Monitoring using IoT
IRJET- Air Pollution Prediction using Machine Learning
Air Quality Monitoring and Control System in IoT
Ad

More from Dereck Downing (20)

PDF
How To Write Dialogue A Master List Of Grammar Techniques
PDF
Writing Paper Service Educational Blog Secrets To Writing Blog Even
PDF
Scholarship Essay Essays Terrorism
PDF
Best Nursing Essay Writing Services - Essay Help Online
PDF
The Federalist Papers By Alexander Hamilton Over
PDF
Example Of Acknowledgment
PDF
Writing Paper Or Write My Paper -
PDF
Observation Report-1
PDF
Third Person Narrative Essay - First, Second, And Third-Person Points
PDF
007 Apa Essay Format Example Thatsnotus
PDF
Writing Paper - Etsy
PDF
Research Paper For Cheap, Papers Online Essay
PDF
Law Essay Example CustomEssayMeister.Com
PDF
Good Introductions For Research Papers. How To
PDF
How To Write Better Essays - Ebooksz
PDF
Handwriting Without Tears Worksheets Free Printable Fr
PDF
Inspirational Quotes For Writers. QuotesGram
PDF
What Is Abstract In Research. H
PDF
Anecdotes Are Commonly Use
PDF
Synthesis Journal Example. Synthesis Examples. 2022-1
How To Write Dialogue A Master List Of Grammar Techniques
Writing Paper Service Educational Blog Secrets To Writing Blog Even
Scholarship Essay Essays Terrorism
Best Nursing Essay Writing Services - Essay Help Online
The Federalist Papers By Alexander Hamilton Over
Example Of Acknowledgment
Writing Paper Or Write My Paper -
Observation Report-1
Third Person Narrative Essay - First, Second, And Third-Person Points
007 Apa Essay Format Example Thatsnotus
Writing Paper - Etsy
Research Paper For Cheap, Papers Online Essay
Law Essay Example CustomEssayMeister.Com
Good Introductions For Research Papers. How To
How To Write Better Essays - Ebooksz
Handwriting Without Tears Worksheets Free Printable Fr
Inspirational Quotes For Writers. QuotesGram
What Is Abstract In Research. H
Anecdotes Are Commonly Use
Synthesis Journal Example. Synthesis Examples. 2022-1
Ad

Recently uploaded (20)

PPTX
human mycosis Human fungal infections are called human mycosis..pptx
PDF
RMMM.pdf make it easy to upload and study
PDF
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
PDF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PDF
Microbial disease of the cardiovascular and lymphatic systems
PDF
Complications of Minimal Access Surgery at WLH
PDF
Anesthesia in Laparoscopic Surgery in India
PPTX
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
PDF
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
PDF
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
PPTX
PPH.pptx obstetrics and gynecology in nursing
PPTX
Institutional Correction lecture only . . .
PDF
Computing-Curriculum for Schools in Ghana
PPTX
master seminar digital applications in india
PDF
Pre independence Education in Inndia.pdf
PPTX
GDM (1) (1).pptx small presentation for students
PDF
STATICS OF THE RIGID BODIES Hibbelers.pdf
PPTX
Microbial diseases, their pathogenesis and prophylaxis
PPTX
Pharma ospi slides which help in ospi learning
PPTX
Renaissance Architecture: A Journey from Faith to Humanism
human mycosis Human fungal infections are called human mycosis..pptx
RMMM.pdf make it easy to upload and study
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
Microbial disease of the cardiovascular and lymphatic systems
Complications of Minimal Access Surgery at WLH
Anesthesia in Laparoscopic Surgery in India
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
PPH.pptx obstetrics and gynecology in nursing
Institutional Correction lecture only . . .
Computing-Curriculum for Schools in Ghana
master seminar digital applications in india
Pre independence Education in Inndia.pdf
GDM (1) (1).pptx small presentation for students
STATICS OF THE RIGID BODIES Hibbelers.pdf
Microbial diseases, their pathogenesis and prophylaxis
Pharma ospi slides which help in ospi learning
Renaissance Architecture: A Journey from Faith to Humanism

A Deep Learning Based Air Quality Prediction

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 05 | May 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 1852 A Deep Learning based Air Quality Prediction B. Lakshmi Sravya 1, A.S. MahaLakshmi 2, D.Balaji Bhavya Swarupini3, B.V. Sai Jaswanth4 1,2,3,4 Lendi Institute of Engineering & Technology, Jonnada, Vizianagaram, Andhra Pradesh ---------------------------------------------------------------------***------------------------------------------------------------------ Abstract - Industries are the major means of air pollutants. Air pollution in the form of carbon dioxide and methane raises the earth’s temperature, the less gasoline we burn, the better we do to reduce air pollution and harmful effects of climate change. Especially at metropolitan cities, the change in the temperature combined with harmful chemicals mayleadto dangeroussigns of air pollution. Quality of air prediction techniques has a major importance in the current learning world. Many machine learning algorithms done a lot of research in identifying the air quality index. Applying deep learning models on these data can showgreat difference in predicting the quality of air. We proposed an LSTM based deep learning technique in evaluating hourly based encompassing air quality. The proposed results outperformed the existing model results through predicting RMSE value. Key Words: Pollution, Prediction, LSTM, Deep Learning. 1. INTRODUCTION Industrialization and urbanization have intensified environmental healthrisksandpollution,especiallyindeveloping countries like India. Study shows that air pollution poses a major health risks such as stroke, heart disease, lung cancer, and chronic and acute respiratory diseases. 
According to the World Health Organization (WHO) report [1], 14 out of the top 15 most polluted cities in the world are in India (in which Delhi is among the top list), an estimated 12.6 million people die from environmental health risks annually. According to the WHO, 92% of the world’s population lives in areas wherethe airquality is below the WHO standards [2]. About 88% of premature deaths occur in the low and middle-income countries, where air pollution is escalatingatan alarming rate. India is the third largest producer of greenhouse gases after China and the United States [3]. The severity of air pollution is so much that as per 2016 study conducted by the Indian Institute of Tropical Meteorology(IITM)andAtmospheric Chemistry. Observations and Modelling Laboratory, National Centre forAtmospheric Research,Boulder,Colorado,USA[4],life expectancy among Indians reduces by 3.4 years on an average while among the residents of Delhi it reduces by almost 6.3 years. There are 6 prominent air pollutants present in the air, Particulate Matter (PM2.5 and PM10),CarbonMonoxide(CO), Ozone (O3), Nitrogen dioxides (NO2), Sulphur dioxide (SO2). Table1showsthesourcesofairpollutantsandtheir majoreffects on human health and environment [5]. To track the rising pollution trend in India, the government of India has installed pollutant’s measuring sensors at various stations covering major pollution prone areas. Multiple steps have beentaken bythe government to control pollution such as metro facility, increase in public transport, and laws such as even-odd system for personal vehicles. Considering the current trend of pollution growth, these solutions are bound to fail in future. Therefore, air pollution forecasting and generating solutions to control it are today’s need. 
1.1 CRITERIA POLLUTANTS Table -1: Emission Sources and Major Effects Criteria pollutants Emission sources Major effects Natural sources Anthropogenic sources Health effects Environmental effects Sulphur Dioxide (SO2) Volcanic emissions Burning of fossil fuels, metal melting etc. Respiratory problems, heart and lung disorders, visual impairment Acid rain Nitrogen dioxide (NO2) Lightning, forest fires etc. Burning of fossil fuels, biomass & high temperature Pulmonary disorders, increased susceptibility to respiratory infections Precursor of ozone formation in troposphere, aerosol formation.
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 05 | May 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 1853 combustion process Particulate matter (PM) Windblown dust, pollen spores, photochemically produced particles Vehicular emissions, industrial combustion processes, construction industries Respiratory problems, liver fibrosis, lung/liver cancer, heart stroke, bone problems Visibility reduction Carbon monoxide (CO) Animal metabolism, forest fires, volcanic activity Burning of carbonaceous fuels, emission from IC engines Anoxemia leading to various cardiovascular problems. infants, pregnant women and elderly people are at higher risk. Effects the amount of greenhouse gases which are linked to climate change and global warming. Ozone (O3) Present in stratosphere at 10-50 km height Hydrocarbons and NOx upon reacting with sunlight results in (O3) formation. Respiratory problems, asthma, bronchitis etc. O3 in upper troposphere causes green house effects, harmful effects on plants, death of plant tissues. 1.2 OBJECTIVE Figuring out these problems in the environment and applying techniques to improve the efficiencyandpredictingthe air index using different learning problems. 2. RELATED WORK Reddy et al. [6] investigate the use of LSTM framework for forecasting pollution in future based on time series pollutant and meteorological data in Beijing area. The main aim of this paper is the application of LSTM sequence to scalar model to forecast pollution. Zheng et al. [7] address the issue of air quality inference based on air quality reported by existing sensor stations. Meteorological data, traffic flow, human mobility, point of interests (POIs) are other features used to infer AQI at non-sensor locations. He et al. [7], Roy et al. [8] provide a method to predict PM10 concentration. Pérez et al. 
[9] proposed a method to predict PM2.5 concentration for the next 24 hours. In our work, we are providinga method to predict each pollutantconcentrationandAQIup to next 12 hours. Roy et al. [8] used Mill tailings at Kolar Gold Fields data for their experiments. Monitoring was carried out at the National Institute of Rock Mechanics (NIRM), Kolar Gold Fields(KGF). Pérez et al. [9] performedtheirexperimentsforMalaysia.Thedata was provided by Malaysian Meteorological Department (MMD) and Department of Environment (DOE). He et al. [10] develop a hybrid methodology to forecast PM10. The paper combines both Autoregressive Integrated Moving Average (ARIMA) and ANNmodelstoimproveforecastaccuracy.ThepaperusedARIMAtomodelthelinearcomponentandthen ANN model is used to take care of the residuals from ARIMA model. They report that hybrid model can be a effective way to improve PM10 forecasting accuracy compared to single ARIMA model. Roy et al. [8] present an ANN based approach as predictive and data analysis tool for the evaluation of air pollutant. The paper proposes a multilayer feed forward network to predict PM10 concentration using meteorological data. Pérez et al. [9] proposed a three layer neural network to predict PM2.5
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 05 | May 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 1854 concentration. They used previous 24 hours PM2.5 data for prediction. In this work, we have developed a time series based stacked LSTM model to forecast air pollutant concentration. 2.1 RESULTS AND DISCUSSIONS The main aim of the study is to predict PM2.5 level and detect air quality based on a data set consisting of daily atmospheric conditions in a specific city. Deep learning is employed to predict future values of PM2.5 based on the previous PM2.5 readings. This can be done by using Long Short-Term Memory (LSTM) which is an artificial recurrent neural network (RNN) architecture used in the field of deep learning. The output is the result of calculated Root Mean Square Error (RMSE). Low RMSE value indicates that the model has accurate results. Fig. 2.1 LSTM and Nerve Cell 2.2 METHODOLOGY Step 1: The data is taken in the form of csv file. (data.csv) Step 2: After the input dataset is given, the data will be preprocessed by  Removing Null values from a data frame and replace NaN values with default values.  Sometimes our data will be qualitative form, that is we have texts as our data. We can find categories in text form. Now it gets complicated for machines to understandtexts and process them, ratherthannumbers,since the models are based on mathematical equations and calculations. Therefore, we have to encode the categorical data.  Then it fit the model to the data, then transform the data according to the fitted model. Step 3: After the preprocessing, the data is scaled to a fixed range - usually 0 to 1. The cost of having this bounded range - in contrast to standardization - is that we will end up with smaller standard deviations, which can suppress the effect of outliers. 
Then, using the s_to_super function, the first column of row(t) is shifted to the last column of row(t-1) and concatenated. This transforms an ordinary preprocessed dataset into a recurrent dataset, in which each row pairs the values at time t-1 with the target at time t.
Step 4: We now split the dataset into two sets: a training set and a test set. The model is trained on the training set, where it learns the correlations in the data, and is then tested on the test set to check how accurately it predicts. A general rule of thumb is to allocate 80% of the dataset to the training set and the remaining 20% to the test set. For this task, we import train_test_split from the model_selection module of scikit-learn.
Step 5: To build the training and test sets, we create four sets: X_train (training part of the matrix of features), X_test (test part of the matrix of features), Y_train (training part of the dependent variable, with the same indices as X_train), and Y_test (test part of the dependent variable, with the same indices as X_test). They are assigned from the output of train_test_split, which takes the arrays (X and Y) and test_size as parameters.
Step 6: Next, we build a model to train on the data. The model used here is Long Short-Term Memory (LSTM).
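Steps 2 to 5 can be sketched on a toy series as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data values and all helper names are hypothetical, `to_supervised` only stands in for the paper's s_to_super function (whose source is not listed), and the split is a simple chronological one written in plain Python rather than a call to scikit-learn's train_test_split.

```python
# Minimal sketch of Steps 2-5 on made-up data (all names are hypothetical).

def fill_missing(values, default=0.0):
    """Replace missing readings (None) with a default value."""
    return [default if v is None else v for v in values]


def encode_categories(labels):
    """Map each distinct text label to an integer code (sorted order)."""
    codes = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [codes[label] for label in labels]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant series: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def to_supervised(values, n_lags=1):
    """Pair each reading with the n_lags readings that precede it.

    Returns (X, y): X[i] holds the lagged inputs, y[i] the value to predict.
    """
    X, y = [], []
    for t in range(n_lags, len(values)):
        X.append(values[t - n_lags:t])
        y.append(values[t])
    return X, y


def chronological_split(X, y, test_size=0.2):
    """Keep the most recent test_size share of the samples for testing."""
    n_test = round(len(X) * test_size)
    n_train = len(X) - n_test
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]


# Toy PM2.5 readings with one missing value, and a toy categorical column.
raw = [10.0, None, 30.0, 20.0, 50.0, 40.0, 30.0, 20.0, 10.0, 30.0, 50.0]
wind = encode_categories(["NW", "SE", "NW", "NE"])

scaled = min_max_scale(fill_missing(raw, default=10.0))
X, y = to_supervised(scaled, n_lags=1)
X_train, X_test, y_train, y_test = chronological_split(X, y, test_size=0.2)
print(len(X_train), len(X_test))
```

The split here keeps the time order, which is a common choice for forecasting because the test set then contains only readings that come after the training data. A shuffled 80/20 split, as produced by train_test_split with its default settings, is the alternative.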
Step 7: An LSTM has a control flow similar to that of a recurrent neural network. It processes data sequentially, passing information on as it propagates forward. The differences lie in the operations within the LSTM’s cells, which allow the LSTM to keep or forget information.
Step 8: The first step in an LSTM is to decide what information to throw away from the cell state. This decision is made by a sigmoid layer called the “forget gate layer.” It outputs a value between 0 and 1, where 1 represents “keep this as it is” and 0 represents “get rid of this.”
Fig 2.2 Forget Gate
The next step, deciding what new information to store in the cell state, has two parts:
• First, a sigmoid layer called the “input gate layer” decides which values we’ll update.
• Next, a tanh layer creates a vector of new candidate values that could be added to the state.
These two layers are then combined to create an update to the state.
Fig 2.3 Input Gate
Step 9: It is now time to update the old cell state, C_{t-1}, into the new cell state, C_t. The previous step has already created the update; we only need to apply it.
Fig 2.4 Current state
Step 10: Finally, we decide what to output, based on the updated cell state.
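The gate operations in Steps 8 to 10 can be sketched as a single time step of an LSTM cell. To keep the arithmetic readable, this sketch assumes a scalar input and a scalar state; the function name, the parameter layout, and the weight values are all hypothetical and are not taken from the paper's model.

```python
import math


def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of an LSTM cell with scalar input and scalar state.

    p maps each gate name to a (w_x, w_h, b) tuple of weights and bias.
    """
    def pre(name):
        # Weighted sum of the current input and the previous hidden state.
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # forget gate: share of c_prev to keep
    i = sigmoid(pre("input"))         # input gate: share of the candidate to add
    c_tilde = math.tanh(pre("cand"))  # candidate value for the cell state
    c = f * c_prev + i * c_tilde      # new cell state C_t
    o = sigmoid(pre("output"))        # output gate: share of the state to expose
    h = o * math.tanh(c)              # new hidden state (the cell's output)
    return h, c


# Hypothetical weights, chosen only so that every gate evaluates to 0.5.
params = {
    "forget": (0.0, 0.0, 0.0),
    "input": (0.0, 0.0, 0.0),
    "cand": (1.0, 0.0, 0.0),
    "output": (0.0, 0.0, 0.0),
}
h, c = lstm_step(x=0.0, h_prev=0.0, c_prev=1.0, p=params)
print(h, c)
```

With these weights, the forget gate keeps half of the old cell state and the candidate contributes nothing, so the new cell state is half of the old one. In a trained network the weights are learned, and the same computation runs on vectors rather than scalars.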
Fig 2.5 Output layer
Step 11: The prediction step passes the input data instances to the trained model, which predicts the required output from them. Here the input instances come from test_X (the test part of the matrix of features).
Step 12: To generate predictions, we use model_predict().
Step 13: Finally, we calculate the Root Mean Square Error (RMSE), which is the standard deviation of the residuals (prediction errors). Residuals measure how far data points are from the regression line; RMSE measures how spread out these residuals are. In other words, it tells you how concentrated the data is around the line of best fit. RMSE is commonly used in climatology, forecasting, and regression analysis to verify experimental results, and it is a standard way to measure the error of a model in predicting quantitative data. Formally, it is defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
where $y_i$ is the observed value, $\hat{y}_i$ is the predicted value, and $n$ is the number of observations.
RMSE indicates the absolute fit of the model to the data, that is, how close the observed data points are to the model's predicted values. Whereas R-squared is a relative measure of fit, RMSE is an absolute measure of fit. Lower values of RMSE indicate a better fit.
Fig. 2.6 Data Visualization
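The RMSE calculation described in Step 13 can be sketched in a few lines. The function name and the sample values below are illustrative assumptions, not results from the paper.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up observed and predicted values; every residual has magnitude 2.
observed = [3.0, 5.0, 7.0, 9.0]
predicted = [1.0, 7.0, 5.0, 11.0]
print(rmse(observed, predicted))
```

Because RMSE is expressed in the same units as the target, a value computed on scaled data has to be interpreted on the scaled range, or the predictions have to be transformed back to the original units first.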
Fig. 2.7 Predicted RMSE and PM2.5 Values
3. CONCLUSION
Deep learning is gradually developing into a promising technique for forecasting non-linear time series such as meteorological and pollution data. In this paper, we used different deep learning models for the prediction of air quality. Here we trained an LSTM network on AirNet data to predict future PM2.5 and calculated the RMSE (Root Mean Square Error). The analysis can be further extended by utilizing methods like Convolutional Neural Networks (CNN) to catch the uneven changes in the air pollution data. The connection between different features can likewise be assessed, enabling us to see whether any hidden parameter correlates features that appear unrelated at first glance.
REFERENCES
[1] IndiaToday.in. 2018. 14 of world’s most polluted 15 cities in India, Kanpur tops WHO list. India Today (2018). https://www.indiatoday.in/education-today/gk-current-affairs/story/14-worlds-most-polluted-15-cities-india-kanpur-tops-who-list-1224730-2018-05-02
[2] weforum.org. 2016. 92% of us are breathing unsafe air. This map shows just how bad the problem is. (2016). https://www.weforum.org/agenda/2016/09/92-of-the-world-s-population-lives-in-areas-with-unsafe-air-pollution-levels-this-interactive-map-shows-just-how-bad-the-problem-is/
[3] reuters.com. [n. d.]. India says is now third highest carbon emitter. ([n. d.]). https://www.reuters.com/article/us-india-climate/india-says-is-now-third-highest-carbon-emitter-idUSTRE6932PE20101004
[4] hindustantimes.com. 2016. Air pollution shortens your life by 3.4 years, Delhiites worst hit. Hindustan Times (2016). https://www.hindustantimes.com/mumbai/air-pollution-shortens-your-life-by-3-4-years/story-L9VOawHyX4PCMfCuAjv4ML.html
[5] Central Pollution Control Board. [n. d.]. ENVIS Centre on Control of Pollution Water, Air and Noise. ([n. d.]). http://cpcbenvis.nic.in/envis_newsletter/Air%20pollution%20in%20Delhi.pdf
[6] Vikram Simha A Reddy, Pavan S. Yedavalli, Shrestha Mohanty, and Udit Nakhat. 2017. DeepAir: Forecasting Air Pollution in Beijing, China.
[7] Yu Zheng, Furui Liu, and Hsun-Ping Hsieh. 2013. U-Air: when urban air quality inference meets big data. In KDD.
[8] Surendra Roy. 2012. Prediction of Particulate Matter Concentrations Using Artificial Neural Network. 2 (03 2012), 30–36.
[9] Patricio Perez and Jorge Reyes. 2001. Prediction of Particulate Air Pollution using Neural Techniques. Neural Computing & Applications 10, 2 (01 May 2001), 165–171. https://doi.org/10.1007/s005210170008
[10] G. He and Qihong Deng. 2012. A Hybrid ARIMA and Neural Network Model to Forecast Particulate Matter Concentration in Changsha, China.