Environmental Quality Prediction and Its Deployment Project
1. CONTENTS
Abstract
Objectives of the proposed system
Design Methodology
Model and Mechanism
Tools Used for Test Model
Applications
Conclusion
References
S V COLLEGE OF ENGINEERING, TIRUPATI
2. ENVIRONMENTAL QUALITY PREDICTION & ITS DEPLOYMENT
Abstract
The environment is the basis of human survival. In modern times,
environmental degradation has increased significantly compared with
the last few centuries. Meteorological and traffic factors, the burning
of fossil fuels, deforestation, industrial activity, and the mass
development of civilization have played a significant role in
environmental quality. The deposition of harmful gases in the air, mass
deforestation, and industrial factors are affecting the quality of
people's lives around the world. Many researchers have begun to use big
data analytics now that environmental sensing networks and sensor data
are available. In this project, we implement machine learning models to
detect and predict environmental quality. Time series models will be
employed for better prediction of environmental quality.
3. OBJECTIVES OF PROPOSED SYSTEM
The proposed system applies machine learning. It is based on
classification techniques from supervised machine learning. For better
accuracy, the supervised learning algorithms Logistic Regression, Naive
Bayes, KNN, Random Forest, Decision Tree, and Support Vector Machine are
compared.
• Add new heuristic characteristics to the machine learning techniques
to decrease false positives in predicting air quality.
• Identify the best supervised machine learning model for predicting
air quality with higher efficacy than the existing systems.
• Use different learning techniques: Logistic Regression, Naive Bayes,
KNN, Random Forest, Decision Tree, and Support Vector Machine.
5. MODEL AND MECHANISM
Data Validation
Raw data is converted into an understandable format. Outliers are
removed. Missing values are filled with 'NaN'. The data is read into a
variable and described (shape, count, mean, std, etc.). The libraries
used are pandas and numpy.
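The validation steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the column names and readings are hypothetical, and the 1.5 x IQR rule is only one common way to flag outliers (the source does not say which rule was used).

```python
import pandas as pd

# Hypothetical readings; column names and values are illustrative only.
data = pd.DataFrame({
    "city": ["X", "Y", "Z", "X", "Y", "Z"],
    "pm25": [42.0, 310.0, None, 55.0, 5000.0, 118.0],
    "no2": [18.0, 64.0, 40.0, None, 70.0, 39.0],
})

# Describe the data: shape plus count, mean, std, etc. per numeric column.
print(data.shape)
print(data.describe())

# Missing entries are held as NaN; count them per column.
missing_per_column = data.isna().sum()
print(missing_per_column)

# Assumed outlier rule: keep pm25 values within 1.5 * IQR of the quartiles.
q1, q3 = data["pm25"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = data["pm25"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
cleaned = data[in_range | data["pm25"].isna()]  # NaN rows are kept for now
print(cleaned)
```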
Exploratory data analysis and visualization
In this step the data is presented as bar plots, pie charts, heat
maps, box plots, scatter plots, etc. This is done using the matplotlib
and seaborn libraries.
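A small sketch of such plots is given below. It uses only matplotlib so that it stays self-contained; seaborn offers equivalent, more compact helpers. The data values and column names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical pollutant readings for illustration.
data = pd.DataFrame({
    "pm25": [42.0, 310.0, 120.0, 55.0, 280.0, 118.0],
    "no2": [18.0, 64.0, 40.0, 22.0, 70.0, 39.0],
    "aqi": [50, 300, 120, 60, 290, 115],
})

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

# Box plot: spread of each pollutant (one box per column).
axes[0].boxplot(data[["pm25", "no2"]].to_numpy())
axes[0].set_xticks([1, 2])
axes[0].set_xticklabels(["pm25", "no2"])
axes[0].set_title("Box plot")

# Scatter plot: relation between one pollutant and the AQI.
axes[1].scatter(data["pm25"], data["aqi"])
axes[1].set_xlabel("pm25")
axes[1].set_ylabel("aqi")
axes[1].set_title("Scatter plot")

# Heat map: pairwise correlation between the columns.
corr = data.corr()
image = axes[2].imshow(corr.to_numpy(), cmap="viridis", vmin=-1, vmax=1)
axes[2].set_xticks(range(len(corr.columns)))
axes[2].set_xticklabels(corr.columns)
axes[2].set_yticks(range(len(corr.columns)))
axes[2].set_yticklabels(corr.columns)
axes[2].set_title("Correlation heat map")
fig.colorbar(image, ax=axes[2])

fig.tight_layout()
```

In a notebook, the figure can be displayed by dropping the `matplotlib.use("Agg")` line, or written to disk with `fig.savefig(...)`.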
6. MODEL AND MECHANISM
Preprocessing Technique
In this step, object-type (categorical) data is converted into
numerical type using the LabelEncoder class from the preprocessing
module of the sklearn library. LabelEncoder provides
fit_transform(data), which learns the categories and returns their
integer codes (this is encoding, not scaling).
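A minimal sketch of the encoding step, assuming a hypothetical column of city names:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical column; the city names are illustrative.
cities = ["Tirupati", "Chennai", "Tirupati", "Delhi"]

encoder = LabelEncoder()
codes = encoder.fit_transform(cities)  # learn the categories, return integer codes

print(list(encoder.classes_))  # categories in sorted order
print(list(codes))             # one integer code per input value

# The mapping is reversible, which helps when reporting results.
restored = list(encoder.inverse_transform(codes))
```

LabelEncoder assigns codes in sorted category order, so the integers carry no meaning beyond identity.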
Logistic Regression Algorithm
In logistic regression the output is represented as '0' or '1'.
AQI (dependent or output)    CLASS (AQI represented as)
0-50 (Good)                  1
50-100 (Satisfactory)        1
100-200 (Moderate)           0
200-300 (Poor)               0
Above 300 (Very Poor)        0
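The table can be expressed as a small labelling function. This is a sketch: the function name is hypothetical, and because the table's ranges share their boundary values, treating an AQI of exactly 100 as class 1 is an assumption.

```python
def aqi_to_class(aqi: float) -> int:
    """Return 1 for Good/Satisfactory air (AQI up to 100), otherwise 0."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    # Assumption: the shared boundary value 100 is counted as Satisfactory.
    return 1 if aqi <= 100 else 0


# One sample value from each row of the table.
labels = [aqi_to_class(value) for value in [35, 80, 150, 250, 320]]
print(labels)  # [1, 1, 0, 0, 0]
```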
7. MODEL AND MECHANISM
Creating feature matrix
The independent (input) columns and the dependent (output) column
are separated and read into the variables x and y.
Splitting the data for training and testing (x_train, x_test, y_train,
y_test)
The model is trained so that it can learn the patterns or
relationships between input and output. Here 70% of the data set is
used for training. In training, both the input and output features are
given to the model.
The remaining 30% of the data is used for testing. Only the input
features are given to the model, and its predictions are compared with
the y_test column. This is achieved using train_test_split from the
model_selection module of the sklearn library.
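The feature matrix and the 70/30 split can be sketched as follows. The data frame and its column names are hypothetical, and random_state is set only to make the illustration repeatable.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data set: two pollutant columns and a binary AQI class.
data = pd.DataFrame({
    "pm25": [20, 35, 48, 60, 90, 130, 180, 220, 280, 340],
    "no2": [10, 15, 18, 22, 30, 41, 55, 62, 70, 85],
    "aqi_class": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
})

# Feature matrix: independent columns in x, dependent column in y.
x = data[["pm25", "no2"]]
y = data["aqi_class"]

# 70% of the rows for training, 30% for testing.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0
)
print(len(x_train), len(x_test))  # 7 3
```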
8. MODEL AND MECHANISM
Training and Testing
The library, module, class, and functions required are sklearn,
linear_model, LogisticRegression, fit(), and predict().
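A minimal sketch of training and prediction with these calls. The single input feature (an AQI-like value) and the labels are hypothetical; max_iter is raised only so the solver converges comfortably on unscaled data.

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: one feature, binary class (1 = acceptable air).
x_train = [[20.0], [35.0], [60.0], [90.0], [150.0], [220.0], [280.0], [340.0]]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]

model = LogisticRegression(max_iter=1000)
model.fit(x_train, y_train)  # training: inputs and outputs are both given

# Testing: only inputs are given, the model returns the predicted class.
predictions = model.predict([[40.0], [300.0]])
print(list(predictions))
```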
Accuracy
Classification report:
Accuracy is the ratio of correct predictions to the total number of
predictions. In the classification report, the weighted average is
weighted by the number of samples in each class.
Accuracy = (tp + tn) / (tp + tn + fp + fn)
                  Actual Value    Prediction
True positive     1               1
True negative     0               0
False positive    0               1
False negative    1               0
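As a worked example of the accuracy formula and the four outcomes in the table, the following sketch counts each outcome from a pair of hypothetical label lists and applies the formula.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy = (tp + tn) / (tp + tn + fp + fn)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    return (tp + tn) / total


# Hypothetical actual and predicted classes.
actual = [1, 0, 0, 1, 1, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 1]
pairs = list(zip(actual, predicted))

tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives

print(tp, tn, fp, fn)            # 3 2 2 1
print(accuracy(tp, tn, fp, fn))  # 0.625
```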
9. MODEL AND MECHANISM
Cross validation test result:
Cross validation makes sure that the model is trained on most of
the patterns in the data. If the data is divided into 'n' folds, there
will be 'n' accuracy values; the mean of those values gives the
cross-validated accuracy. This is done using cross_val_score.
Confusion matrix:
The sklearn library has a module named metrics, from which
confusion_matrix is imported.
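Both checks can be sketched as below. The data here is synthetic (generated with make_classification) because the project data set is not part of this document, and the choice of 5 folds is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the air quality data (assumption for illustration).
x, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000)

# Cross validation: one accuracy value per fold, then their mean.
scores = cross_val_score(model, x, y, cv=5)
mean_accuracy = scores.mean()
print(scores, mean_accuracy)

# Confusion matrix on a held-out 30% test split.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0
)
model.fit(x_train, y_train)
matrix = confusion_matrix(y_test, model.predict(x_test))

# Rows are actual classes, columns are predicted classes.
tn, fp, fn, tp = matrix.ravel()
print(matrix)
```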
10. MODEL AND MECHANISM
Naïve Bayes Algorithm
This technique is based on Bayes' theorem with an assumption of
independence among features. The library, module, class, and functions
used are sklearn, naive_bayes, GaussianNB, fit(), and predict().
P(A|B) = P(A) · P(B|A) / P(B)
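A small numeric illustration of the formula, using made-up probabilities: A is "air quality is poor" and B is "traffic is heavy". None of these numbers come from the project data.

```python
def posterior(prior_a: float, likelihood_b_given_a: float, evidence_b: float) -> float:
    """Bayes' theorem: P(A|B) = P(A) * P(B|A) / P(B)."""
    return prior_a * likelihood_b_given_a / evidence_b


# Hypothetical probabilities.
p_poor = 0.3                  # P(A)
p_heavy_given_poor = 0.8      # P(B|A)
p_heavy_given_not_poor = 0.2  # P(B|not A)

# Total probability of heavy traffic, P(B).
p_heavy = p_poor * p_heavy_given_poor + (1 - p_poor) * p_heavy_given_not_poor

p_poor_given_heavy = posterior(p_poor, p_heavy_given_poor, p_heavy)
print(round(p_poor_given_heavy, 4))  # 0.6316
```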
Decision Tree Algorithm
It is a tree-structured classifier, where internal nodes represent
the features of a dataset, branches represent the decision rules, and
each leaf node represents the outcome. The library, module, class, and
functions used are sklearn, tree, DecisionTreeClassifier, fit(), and
predict().
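The node, branch, and leaf structure can be made visible with a short sketch. The single feature named "aqi" and the labels are hypothetical; export_text prints the learned rules as text.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical data: one feature (an AQI-like value) and a binary class.
x = [[20], [45], [80], [95], [130], [210], [290], [350]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

tree = DecisionTreeClassifier(random_state=0)
tree.fit(x, y)

# Internal nodes test a feature, branches are the rules, leaves are classes.
rules = export_text(tree, feature_names=["aqi"])
print(rules)

predictions = tree.predict([[60], [250]])
print(list(predictions))
```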
11. MODEL AND MECHANISM
Random Forest Algorithm
Place    Average pollutants
X        50
Y        300
Z        120
12. MODEL AND MECHANISM
Support Vector Machine Algorithm
In the SVM algorithm, a hyperplane is created that separates the
n-dimensional feature space into classes, so that a new data point can
easily be placed in the correct category. The library, module, class,
and functions used are sklearn, svm, SVC, fit(), and predict().
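A minimal sketch of a separating hyperplane with SVC. The two-dimensional points are hypothetical, and the linear kernel is an assumption made so that the hyperplane can be read directly from the fitted model; the source does not state which kernel was used.

```python
from sklearn.svm import SVC

# Hypothetical, linearly separable points in two dimensions.
points = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]]
labels = [0, 0, 0, 1, 1, 1]

model = SVC(kernel="linear")  # assumed kernel, see note above
model.fit(points, labels)

# With a linear kernel the hyperplane is weights . x + bias = 0.
weights = model.coef_[0]
bias = model.intercept_[0]
print(weights, bias)

# New points are assigned to the side of the hyperplane they fall on.
predictions = model.predict([[1.5, 1.5], [4.5, 4.5]])
print(list(predictions))
```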
13. MODEL AND MECHANISM
K-Nearest Neighbors Algorithm
This algorithm uses the Euclidean distance formula. The library,
module, class, and functions used are sklearn, neighbors,
KNeighborsClassifier, fit(), and predict().
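To show the role of the Euclidean distance, here is a from-scratch sketch of the nearest-neighbour vote. It is not the KNeighborsClassifier implementation; the function names, the points, and k=3 are all illustrative.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training points with binary labels.
train_points = [(1, 1), (2, 1), (1, 2), (4, 4), (5, 4), (4, 5)]
train_labels = [1, 1, 1, 0, 0, 0]

print(euclidean((0, 0), (3, 4)))                               # 5.0
print(knn_predict(train_points, train_labels, (1.5, 1.5)))     # 1
print(knn_predict(train_points, train_labels, (4.5, 4.2)))     # 0
```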
14. MODEL AND MECHANISM
Accuracy Results
Algorithm                 Accuracy percentage
Logistic Regression       85.71
Gaussian Naïve Bayes      78.57
Decision Tree             71.42
Random Forest             92.85
Support Vector Machine    76.92
KNeighbors Classifier     78.57
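A comparison like the one in the table can be produced with a loop over the six classifiers. The sketch below uses synthetic data and default hyperparameters, both assumptions, so its accuracies will not match the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the air quality data (assumption for illustration).
x, y = make_classification(n_samples=300, n_features=6, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Gaussian Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Support Vector Machine": SVC(),
    "KNeighbors Classifier": KNeighborsClassifier(),
}

# Train each model on the same split and record its test accuracy in percent.
results = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    results[name] = 100 * accuracy_score(y_test, model.predict(x_test))

for name, score in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<24}{score:6.2f}")
```

Using one shared train/test split keeps the comparison fair, since every model is scored on the same unseen rows.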
16. TOOLS USED FOR TEST MODEL
Anaconda
Jupyter
Amazon Web Services
APPLICATIONS
Can be used by the general public
Industrial areas
In cities
17. CONCLUSION
Prevention of air pollution is the need of the hour, so an effective
machine learning system was established with the help of prediction
models. Prediction of pollution events has become a most important
issue in major cities in India due to population growth and the
associated impact of traffic volumes. Data from a variety of
heterogeneous sources were used, which involved collection and
cleansing for use in machine learning algorithms. The number of model
parameters and optimized outputs was reduced with the help of structure
regularization, which in turn reduced model complexity. The Random
Forest algorithm gave the best results among all the algorithms, with
an overall accuracy of 99.8.
18. REFERENCES
[1] Acharjya, Debi Prasanna, and Kauser Ahmed (2019), "A survey on big
data analytics: challenges, open research issues and tools,"
International Journal of Advanced Computer Science and Applications,
vol. 7, no. 2, pp. 511-518.
[2] A. Gnana Soundari, J. Gnana Jeslin, Akshaya A.C (2019), "Indian Air
Quality Prediction and Analysis Using Machine Learning," International
Journal of Computer Applications Technology and Research, vol. 8,
no. 09, pp. 367-370.
[3] Abed Al Ahad M, Sullivan F, Demsar U, Melhem M, Kulu H (2020), "The
Effect of Air-pollution and Weather Exposure on Mortality and Hospital
Admission and Implications for Further Research: A Systematic Scoping
Review," PLoS ONE 15(10): e0241415.
[4] D. Qin, J. Yu, G. Zou, R. Yong, Q. Zhao and B. Zhang (2019), "A Novel
Combined Prediction Scheme Based on CNN and LSTM for Urban PM2.5
Concentration," IEEE Access, vol. 7, pp. 20050-20059.