e-ISSN: 2582-5208
International Research Journal of Modernization in Engineering Technology and Science
( Peer-Reviewed, Open Access, Fully Refereed International Journal )
Volume:03/Issue:08/August-2021 Impact Factor- 5.354 www.irjmets.com
www.irjmets.com @International Research Journal of Modernization in Engineering, Technology and Science
[381]
COMPARATIVE ANALYSIS OF DIFFERENT MACHINE LEARNING
ALGORITHMS FOR PLANT DISEASE DETECTION
Rasha Riyas*1, Jismy Joseph*2
*1Scholar, Department of MCA, SCMS School Of Technology And Management, Ernakulam,
Kerala, India.
*2Associate Professor, Department of MCA, SCMS School Of Technology And Management, Ernakulam,
Kerala, India.
ABSTRACT
For the agriculture sector, detecting and identifying plant diseases at an early stage is extremely important and
still very challenging. Machine learning is an application of AI that helps achieve this purpose effectively. It
uses a group of algorithms to analyze and interpret data, learn from it, and make informed decisions. In this
project, a dataset of healthy and diseased plant leaf images is used, and image processing is applied to extract
the features of each image. We then model this dataset with different machine learning algorithms such as
Random Forest, Support Vector Machine and Naïve Bayes. The aim is to carry out a comparative study to identify
which of these algorithms predicts diseases with the utmost accuracy. We compare factors such as precision,
accuracy, error rate and prediction time across the algorithms, and draw conclusions from this comparison.
Keywords: Machine Learning, Support Vector Machine, Naïve Bayes, Random Forest, AI-Artificial Intelligence,
precision
I. INTRODUCTION
Machine learning is an ever-growing field that gives machines the ability to interpret and act on varied data
without being explicitly programmed. In agriculture, accurate detection of crop diseases helps ensure product
quality and increase production. Compared with humans, properly trained machines can detect plant diseases more
effectively and faster, since many symptoms are too small to see with the naked eye. The potential of machine
learning for plant disease detection is already well known, but it is less clear which algorithm detects diseases
best. A comparative study helps in choosing the most suitable technology for such a system. This project
therefore applies different machine learning algorithms to predict diseases and compares their accuracies to
conclude which best serves the purpose.
The simplest way to detect plant diseases is through image processing, that is, performing operations on an
image to obtain a version that highlights its features and makes it easier to interpret. Other detection methods
exist, such as visual analysis and optical sensors, but some do not give accurate output and others are costly
and difficult to implement. Image processing is therefore our chosen approach for building a simple, robust and
accurate plant disease detection system.
The Plant Disease Detection System implemented for this project uses different machine learning algorithms to
predict whether a plant image is diseased or healthy. The algorithms used are Random Forest, Support Vector
Machine, Naïve Bayes, Decision Tree, K-Nearest Neighbor, Logistic Regression and Linear Discriminant Analysis.
They are compared on accuracy, precision and other parameters to find which is best suited to the system.
II. METHODOLOGY
In this project, plant disease detection is carried out using the machine learning algorithms listed in the
previous section. The process consists of data collection, data pre-processing, model training, model testing
and reporting of the final comparison result. The project is implemented in Python 3 in a Jupyter Notebook, run
through Anaconda Navigator. The basic steps to implement the model are listed below.
Image Pre-processing
This first stage loads the images and converts them into a format in which their features are easier to
distinguish and extract. The main steps are:
a. Load the dataset
b. Convert the image format from RGB to BGR
c. Convert the image format from BGR to HSV
d. Segment the image
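The colour conversion and segmentation steps can be sketched as follows. The paper does not name its image library or its segmentation thresholds, so this is only an illustration: it uses Python's standard colorsys module on a tiny in-memory image stored in BGR channel order, and the hue and saturation bounds for leaf-coloured pixels are assumptions, not the authors' values.

```python
import colorsys

# Hypothetical thresholds (not from the paper): a hue window covering
# brown through green, plus a minimum saturation that rejects grey or
# white background pixels.
HUE_MIN, HUE_MAX = 0.05, 0.45   # colorsys hue lies in [0, 1)
SAT_MIN = 0.25


def bgr_to_hsv(pixel):
    """Convert one (B, G, R) pixel with 0-255 channels to (h, s, v) in [0, 1]."""
    b, g, r = pixel
    return colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)


def segment_leaf(image_bgr):
    """Return a mask (rows of bools) that is True where a pixel looks like leaf."""
    mask = []
    for row in image_bgr:
        mask_row = []
        for pixel in row:
            h, s, _v = bgr_to_hsv(pixel)
            mask_row.append(HUE_MIN <= h <= HUE_MAX and s >= SAT_MIN)
        mask.append(mask_row)
    return mask


def apply_mask(image_bgr, mask):
    """Black out every pixel outside the mask, keeping only the leaf."""
    return [
        [pixel if keep else (0, 0, 0) for pixel, keep in zip(row, mask_row)]
        for row, mask_row in zip(image_bgr, mask)
    ]


# A 2x2 toy image in BGR order: green leaf, white background,
# brown lesion, grey background.
toy_image = [
    [(40, 180, 60), (255, 255, 255)],
    [(30, 90, 150), (128, 128, 128)],
]
leaf_mask = segment_leaf(toy_image)
leaf_only = apply_mask(toy_image, leaf_mask)
```

Working in HSV keeps colour (hue) separate from brightness (value), which is why a simple hue window can isolate both green tissue and brown lesions from a neutral background.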
Feature Extraction
In this stage, image features are extracted using global feature descriptors for color, texture and shape. The
descriptors used are Color Histogram, Haralick Texture and Hu Moments, respectively. The extracted features are
then scaled to a range between 0 and 1 for ease of modeling.
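Scaling to the range 0 to 1 is commonly done with min-max scaling, where each feature column is shifted by its minimum and divided by its range. The paper does not state the formula it uses, so the sketch below is an assumption about the method, and the feature values are made up for illustration.

```python
def min_max_scale(rows):
    """Scale each feature column of `rows` to the range [0, 1].

    `rows` is a list of equal-length feature vectors, one per image.
    A constant column carries no information and is mapped to 0.0.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return scaled


# Three hypothetical feature vectors whose columns have very different ranges.
features = [
    [0.002, 120.0, 5.0],
    [0.004, 480.0, 5.0],
    [0.003, 300.0, 5.0],
]
scaled_features = min_max_scale(features)
```

Without this step, a descriptor with large raw values (such as histogram counts) would dominate distance-based models like K-Nearest Neighbor over descriptors with small values (such as Hu Moments).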
After these two stages are complete, we model the dataset with the different machine learning algorithms,
training and then testing each one so that the algorithms can be compared.
III. MODELING AND ANALYSIS
Image classification assigns an image to one of a set of predefined classes. In this project we perform image
classification with machine learning, as discussed in the previous section. The dataset used to train the models
consists of apple leaf images. It contains two folders, Diseased and Healthy: the Diseased folder holds images
of unhealthy leaves and the Healthy folder holds images of green, healthy leaves. We use 800 images per folder,
numbered from 1 to 800 for ease of training and prediction. In the first steps, we convert the image format to
obtain a well-defined image that is easier to process. We then apply colour-based image segmentation to separate
the leaf from the background, so that only the leaf region is retained. The next step is feature extraction
using the three feature descriptors for color, shape and texture. Finally, to complete the first part, we save
the feature vectors using HDF5. Hierarchical Data Format version 5 (HDF5) is a file format designed for storing
large, complex, heterogeneous data.
The next part is training and testing the models. Seven machine learning models are trained on the dataset. All
of them are imported from scikit-learn, a machine learning library for Python that provides all seven algorithms
needed for this project. The next step is cross-validation, a technique for evaluating models by separating the
dataset into two parts, one for training the models and the other for testing them.
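The description above covers a single train/test partition. In k-fold cross-validation that partition is repeated so that every image is used for testing exactly once. The paper does not state the number of folds or the split ratio, so the fold count and random seed in this standard-library sketch are illustrative assumptions.

```python
import random


def k_fold_indices(n_samples, n_folds, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every n_folds-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test_fold in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold


# 1600 images (800 healthy + 800 diseased) and 5 folds; the fold count
# and the seed are assumptions made for this example.
splits = list(k_fold_indices(n_samples=1600, n_folds=5))
first_train, first_test = splits[0]
```

Shuffling before splitting matters here because the images are stored class by class; an unshuffled split could place a single class in the test fold.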
Figure 1: Plant disease detection process flow
Next, the models are fitted on the training data. After training, we carry out the actual comparison to see
which algorithm predicts plant disease better. The system runs each machine learning algorithm in turn on the
test data, then computes the accuracy, confusion matrix and error rates of each algorithm.
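A minimal sketch of such a comparison loop is given below. It accepts any object with fit and predict methods, which is the interface scikit-learn estimators share, and times only the prediction step. The MajorityClassifier is a stand-in so that the example runs on its own; it is not one of the seven algorithms in the paper, and the 0/1 label encoding and toy data are assumptions.

```python
import time


class MajorityClassifier:
    """Stand-in model: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=list(y).count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def evaluate(models, X_train, y_train, X_test, y_test):
    """Fit each model, time its predictions, and return per-model results."""
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        start = time.perf_counter()
        predicted = model.predict(X_test)
        elapsed = time.perf_counter() - start
        correct = sum(p == t for p, t in zip(predicted, y_test))
        results[name] = {
            "accuracy": correct / len(y_test),
            "predict_seconds": elapsed,
        }
    return results


# Toy data: label 1 = diseased, 0 = healthy (an assumed encoding).
X_train, y_train = [[0.1], [0.2], [0.9]], [0, 0, 1]
X_test, y_test = [[0.15], [0.8]], [0, 1]
results = evaluate({"majority": MajorityClassifier()}, X_train, y_train, X_test, y_test)
```

Evaluating every model on the same held-out test data is what makes the resulting accuracy and timing figures directly comparable.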
IV. RESULTS AND DISCUSSION
This section presents the results of the comparison. They are reported in terms of accuracy, precision, F1
score, mean absolute error and related measures, together with confusion matrix diagrams and a comparison chart.
The main factor of comparison was the accuracy score of each algorithm after performing disease detection on the
same test data. The time each algorithm took to predict whether an image is healthy or not was also taken into
account, to give a clearer picture of which algorithm is best suited to the process.
Table 1. Comparison of machine learning algorithms for plant disease detection
SN.  Algorithm                     Accuracy   Precision,   Precision,    Mean absolute  Time taken
                                   score (%)  healthy (%)  diseased (%)  error          (s)
1    Random Forest                 98.12      99           98            0.01875        2.62
2    K-Nearest Neighbor            93.44      95           92            0.065625       0.84
3    Logistic Regression           94.06      98           91            0.059375       29.16
4    Linear Discriminant Analysis  92.50      95           90            0.075          1.65
5    Decision Tree                 91.56      93           90            0.084375       0.97
6    Naïve Bayes                   82.81      96           76            0.171875       0.67
7    Support Vector Machine        93.44      97           90            0.065625       1.91
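The metrics in Table 1 can all be derived from the predicted and true labels. The sketch below assumes labels are encoded as 0 and 1 (the paper does not state the encoding), in which case the mean absolute error is simply the misclassification rate, 1 minus accuracy. It then checks the reported figures against that relationship. The 320-image test set is an inference from the reported error values, each of which is a multiple of 1/320; the paper itself does not state the test-set size.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, per-class precision and mean absolute error for 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    n = len(y_true)
    return {
        "accuracy": (tp + tn) / n,
        "precision_class_1": tp / (tp + fp) if tp + fp else 0.0,
        "precision_class_0": tn / (tn + fn) if tn + fn else 0.0,
        # For 0/1 labels |t - p| is 1 exactly when the prediction is wrong,
        # so the mean absolute error equals 1 - accuracy.
        "mean_absolute_error": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


example = binary_metrics([1, 1, 0, 0], [1, 0, 0, 0])

# Accuracy (%) and mean absolute error as reported in Table 1.
table_1 = {
    "Random Forest": (98.12, 0.01875),
    "K-Nearest Neighbor": (93.44, 0.065625),
    "Logistic Regression": (94.06, 0.059375),
    "Linear Discriminant Analysis": (92.50, 0.075),
    "Decision Tree": (91.56, 0.084375),
    "Naïve Bayes": (82.81, 0.171875),
    "Support Vector Machine": (93.44, 0.065625),
}

# Reported MAE should match 1 - accuracy up to rounding of the percentage.
consistent = all(abs((1 - acc / 100) - mae) < 1e-4 for acc, mae in table_1.values())

# Assumption (not stated in the paper): 320 test images, i.e. 20% of 1600.
ASSUMED_TEST_SIZE = 320
misclassified = {
    name: round(mae * ASSUMED_TEST_SIZE) for name, (_acc, mae) in table_1.items()
}
```

Under that assumption, Random Forest misclassifies 6 of 320 test images and Naïve Bayes 55, which makes the gap between the best and worst models concrete.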
Figure 2: Accuracy comparison chart
V. CONCLUSION
Comparing the algorithms gave us valuable insight into which would be best suited to plant disease detection.
After training, evaluating and testing the models, the results suggest that Random Forest is the most accurate
algorithm for plant disease detection, with an accuracy of 98.12%, followed by Logistic Regression at 94.06%.
The least accurate algorithm is Naïve Bayes, at 82.81%. Random Forest gives the most accurate results for both
healthy and diseased images, but since we also want algorithms and methods that keep effort and time to a
minimum, the time taken for prediction matters as well. When prediction times were compared, Random Forest
(2.62 s) was not the fastest. Naïve Bayes took the least time (0.67 s), but because it is also the least
accurate it is not considered further. The next fastest is KNN (0.84 s), with an accuracy of 93.44%, which is a
good result.
Thus, KNN offers the best combination of prediction time and accuracy, while Random Forest remains the most
accurate algorithm for plant disease detection.
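The selection reasoning in this section can be reproduced from the Table 1 figures. In the sketch below the accuracy and time values are copied from the table, while the 90% cut-off for an acceptably accurate model is a hypothetical threshold introduced only to express the idea of excluding Naïve Bayes; the paper does not define one.

```python
# (algorithm, accuracy %, prediction time in seconds), copied from Table 1.
table_1 = [
    ("Random Forest", 98.12, 2.62),
    ("K-Nearest Neighbor", 93.44, 0.84),
    ("Logistic Regression", 94.06, 29.16),
    ("Linear Discriminant Analysis", 92.50, 1.65),
    ("Decision Tree", 91.56, 0.97),
    ("Naïve Bayes", 82.81, 0.67),
    ("Support Vector Machine", 93.44, 1.91),
]

# Hypothetical cut-off (not from the paper) for "accurate enough".
MIN_ACCURACY = 90.0

most_accurate = max(table_1, key=lambda row: row[1])
fastest_overall = min(table_1, key=lambda row: row[2])
acceptable = [row for row in table_1 if row[1] >= MIN_ACCURACY]
fastest_acceptable = min(acceptable, key=lambda row: row[2])
```

With these inputs the most accurate model is Random Forest, the fastest overall is Naïve Bayes, and the fastest among the acceptably accurate models is K-Nearest Neighbor, matching the conclusion drawn above.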

More Related Content

PPTX
Deterioration of crop varieties and methods to prevent them.
PPTX
Butterfly optimization algorithm
PDF
Bio alcohol
PDF
AI for Everyone (Chinese)
PPTX
Introduction to Artificial Intelligence and History of AI
PPTX
sorghum crop production technology AGRON 211
PDF
Short charts of Rabi Field crops
PPT
Classical Planning
Deterioration of crop varieties and methods to prevent them.
Butterfly optimization algorithm
Bio alcohol
AI for Everyone (Chinese)
Introduction to Artificial Intelligence and History of AI
sorghum crop production technology AGRON 211
Short charts of Rabi Field crops
Classical Planning

Similar to COMPARATIVE ANALYSIS OF DIFFERENT MACHINE LEARNING ALGORITHMS FOR PLANT DISEASE DETECTION (20)

PDF
IRJET- An Expert System for Plant Disease Diagnosis by using Neural Network
PDF
IRJET- An Expert System for Plant Disease Diagnosis by using Neural Network
PDF
Android application for detection of leaf disease (Using Image processing and...
PDF
A Review Paper on Automated Plant Leaf Disease Detection Techniques
PDF
EARLY BLIGHT AND LATE BLIGHT DISEASE DETECTION ON POTATO LEAVES USING CONVOLU...
PPTX
Robotic process automation (RPA) is the application of technology that allows...
PDF
IRJET- IoT based Preventive Crop Disease Model using IP and CNN
PDF
Disease Prediction Using Machine Learning
PDF
MLTDD : USE OF MACHINE LEARNING TECHNIQUES FOR DIAGNOSIS OF THYROID GLAND DIS...
PDF
IRJET - Neural Network based Leaf Disease Detection and Remedy Recommenda...
PDF
Plant Disease Prediction Using Image Processing
PDF
GENETIC-FUZZY PROCESS METRIC MEASUREMENT SYSTEM FOR AN OPERATING SYSTEM
PDF
Genetic fuzzy process metric measurement system for an operating system
PDF
GENETIC-FUZZY PROCESS METRIC MEASUREMENT SYSTEM FOR AN OPERATING SYSTEM
PDF
Research on Multidimensional Expert System Based on Facial Expression And Phy...
PDF
IRJET - Machine Learning for Diagnosis of Diabetes
PDF
IRJET- Hand Gesture Recognition and Voice Conversion for Deaf and Dumb
PDF
Health Care Application using Machine Learning and Deep Learning
PDF
IRJET-Android Based Plant Disease Identification System using Feature Extract...
PDF
Plant Disease Doctor App
IRJET- An Expert System for Plant Disease Diagnosis by using Neural Network
IRJET- An Expert System for Plant Disease Diagnosis by using Neural Network
Android application for detection of leaf disease (Using Image processing and...
A Review Paper on Automated Plant Leaf Disease Detection Techniques
EARLY BLIGHT AND LATE BLIGHT DISEASE DETECTION ON POTATO LEAVES USING CONVOLU...
Robotic process automation (RPA) is the application of technology that allows...
IRJET- IoT based Preventive Crop Disease Model using IP and CNN
Disease Prediction Using Machine Learning
MLTDD : USE OF MACHINE LEARNING TECHNIQUES FOR DIAGNOSIS OF THYROID GLAND DIS...
IRJET - Neural Network based Leaf Disease Detection and Remedy Recommenda...
Plant Disease Prediction Using Image Processing
GENETIC-FUZZY PROCESS METRIC MEASUREMENT SYSTEM FOR AN OPERATING SYSTEM
Genetic fuzzy process metric measurement system for an operating system
GENETIC-FUZZY PROCESS METRIC MEASUREMENT SYSTEM FOR AN OPERATING SYSTEM
Research on Multidimensional Expert System Based on Facial Expression And Phy...
IRJET - Machine Learning for Diagnosis of Diabetes
IRJET- Hand Gesture Recognition and Voice Conversion for Deaf and Dumb
Health Care Application using Machine Learning and Deep Learning
IRJET-Android Based Plant Disease Identification System using Feature Extract...
Plant Disease Doctor App
Ad

More from International Research Journal of Modernization in Engineering Technology and Science (20)

PDF
THE COGENT FACTORS THAT CAN AFFECT ADOPTED ENGLISH PRONUNCIATION PEDAGOGY: A ...
PDF
DELTA LAKE IN FINTECH: ENHANCING DATA LAKE RELIABILITY WITH ACID TRANSACTIONS
PDF
BIG DATA IN THE MODERN ENTERPRISE: STRATEGIES FOR EFFECTIVE DATA PROCESSING W...
PDF
HYBRID CLOUD ARCHITECTURES FOR FINANCIAL DATA LAKES: DESIGN PATTERNS AND USE ...
PDF
SEAMLESS DATA MIGRATION: BEST PRACTICES FOR TRANSITIONING TO CLOUD ENVIRONMEN...
PDF
ARCHITECTING CLOUD SOLUTIONS: LEVERAGING AWS FOR SCALABLE AND SECURE SYSTEMS
PDF
NEXT-GENERATION ETL IN FINTECH: LEVERAGING AI AND ML FOR INTELLIGENT DATA TRA...
PDF
REAL-TIME DATA REPLICATION IN FINTECH: TECHNOLOGIES AND BEST PRACTICES
PDF
EXPLORING ACUTE LYMPHOBLASTIC LEUKAEMIA DYNAMICS THROUGH MATHEMATICAL MODELIN...
PDF
AUTOMATIC IRRIGATION SYSTEM DESIGN AND IMPLEMENTATION BASED ON IOT FOR AGRICU...
PDF
TRAINING NEEDS ANALYSIS AMONG GRASSROOTS ENTREPRENEURS: BASIS FOR THE IMPLEME...
PDF
PDF
ROLE OF ELECTRONIC HUMAN RESOURCES (E-HR) IN ORGANIZATION
PDF
PARTIAL REPLACEMENT OF AGGREGATES BY BURNT BRICK BATS AND LATERITIC FINES IN ...
PDF
A REVIEW PAPER ON PROPERTIES OF CONCRETE WITH FRACTIONAL REPLACEMENT OF RECYC...
PDF
REVIEW ON PROCESS PARAMETER OPTIMIZATION FOR FORGING PROCESS
PDF
MODELLING AND VIBRATION ANALYSIS OF REINFORCED CONCRETE BRIDGE
THE COGENT FACTORS THAT CAN AFFECT ADOPTED ENGLISH PRONUNCIATION PEDAGOGY: A ...
DELTA LAKE IN FINTECH: ENHANCING DATA LAKE RELIABILITY WITH ACID TRANSACTIONS
BIG DATA IN THE MODERN ENTERPRISE: STRATEGIES FOR EFFECTIVE DATA PROCESSING W...
HYBRID CLOUD ARCHITECTURES FOR FINANCIAL DATA LAKES: DESIGN PATTERNS AND USE ...
SEAMLESS DATA MIGRATION: BEST PRACTICES FOR TRANSITIONING TO CLOUD ENVIRONMEN...
ARCHITECTING CLOUD SOLUTIONS: LEVERAGING AWS FOR SCALABLE AND SECURE SYSTEMS
NEXT-GENERATION ETL IN FINTECH: LEVERAGING AI AND ML FOR INTELLIGENT DATA TRA...
REAL-TIME DATA REPLICATION IN FINTECH: TECHNOLOGIES AND BEST PRACTICES
EXPLORING ACUTE LYMPHOBLASTIC LEUKAEMIA DYNAMICS THROUGH MATHEMATICAL MODELIN...
AUTOMATIC IRRIGATION SYSTEM DESIGN AND IMPLEMENTATION BASED ON IOT FOR AGRICU...
TRAINING NEEDS ANALYSIS AMONG GRASSROOTS ENTREPRENEURS: BASIS FOR THE IMPLEME...
ROLE OF ELECTRONIC HUMAN RESOURCES (E-HR) IN ORGANIZATION
PARTIAL REPLACEMENT OF AGGREGATES BY BURNT BRICK BATS AND LATERITIC FINES IN ...
A REVIEW PAPER ON PROPERTIES OF CONCRETE WITH FRACTIONAL REPLACEMENT OF RECYC...
REVIEW ON PROCESS PARAMETER OPTIMIZATION FOR FORGING PROCESS
MODELLING AND VIBRATION ANALYSIS OF REINFORCED CONCRETE BRIDGE
Ad

Recently uploaded (20)

PPTX
Foundation to blockchain - A guide to Blockchain Tech
PPTX
Construction Project Organization Group 2.pptx
PPTX
UNIT-1 - COAL BASED THERMAL POWER PLANTS
PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PPTX
additive manufacturing of ss316l using mig welding
PDF
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
PDF
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PDF
PPT on Performance Review to get promotions
PPTX
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
DOCX
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
PPTX
Sustainable Sites - Green Building Construction
PDF
Model Code of Practice - Construction Work - 21102022 .pdf
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PPT
Project quality management in manufacturing
PPTX
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PPTX
Geodesy 1.pptx...............................................
PPTX
UNIT 4 Total Quality Management .pptx
PPTX
Lesson 3_Tessellation.pptx finite Mathematics
Foundation to blockchain - A guide to Blockchain Tech
Construction Project Organization Group 2.pptx
UNIT-1 - COAL BASED THERMAL POWER PLANTS
CYBER-CRIMES AND SECURITY A guide to understanding
additive manufacturing of ss316l using mig welding
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
Embodied AI: Ushering in the Next Era of Intelligent Systems
PPT on Performance Review to get promotions
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
Sustainable Sites - Green Building Construction
Model Code of Practice - Construction Work - 21102022 .pdf
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
Project quality management in manufacturing
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
Geodesy 1.pptx...............................................
UNIT 4 Total Quality Management .pptx
Lesson 3_Tessellation.pptx finite Mathematics

COMPARATIVE ANALYSIS OF DIFFERENT MACHINE LEARNING ALGORITHMS FOR PLANT DISEASE DETECTION

  • 1. e-ISSN: 2582-5208 International Research Journal of Modernization in Engineering Technology and Science ( Peer-Reviewed, Open Access, Fully Refereed International Journal ) Volume:03/Issue:08/August-2021 Impact Factor- 5.354 www.irjmets.com www.irjmets.com @International Research Journal of Modernization in Engineering, Technology and Science [381] COMPARATIVE ANALYSIS OF DIFFERENT MACHINE LEARNING ALGORITHMS FOR PLANT DISEASE DETECTION Rasha Riyas*1, Jismy Joseph*2 *1Scholar, Department of MCA, SCMS School Of Technology And Management, Ernakulam, Kerala, India. *2Associate Professor, Department of MCA, SCMS School Of Technology And Management, Ernakulam, Kerala, India. ABSTRACT For the agriculture sector, detecting and identifying plant diseases at an early stage is extremely important and still very challenging. Machine learning is an application of AI that helps us achieve this purpose effectively. It uses a group of algorithms to analyze and interpret data, learn from it, and using it, smart decisions can be made. For accomplishing this project, a dataset that contains a set of healthy & diseased plant leaf images are used then using image processing we extract the features of the image. Then we model this dataset with different machine learning algorithms like Random Forest, Support Vector Machine, Naïve Bayes etc. The aim is to hold out a comparative study to spot which of those algorithm can predict diseases with the at most accuracy. We compare factors like precision, accuracy, error rates as well as prediction time of different machine learning algorithms. After all these comparison, valuable conclusions can be made for this project. Keywords: Machine Learning, Support Vector Machine, Naïve Bayes, Random Forest, AI-Artificial Intelligence, precision I. INTRODUCTION Machine Learning  has been an ever growing field that offers machines the power to perform and understand varied data without being strictly programmed. 
In the plant world, the accurate detection of crop diseases can ensure quality in products and as well increase production. When compared with humans, machines if rightly programmed can carry out the process of detecting plant diseases much effectively and faster as most of them could be very minute to see with bare eyes, but machines would be able to detect it. Well, we already know the ever growing possibilities of using machine learning and detecting plant diseases, but we don’t have a clear idea as to which algorithm can detect the diseases better. This kind of a study can help in making use of the best possible technologies to implement this system. So, this project focuses on using different machine learning algorithms to predict diseases and we will then compare the accuracies of each algorithm, to arrive at a conclusion as to which best serves our purpose. The simplest way to detect plant diseases would be using image processing. Image processing is a method that we use to perform some operations on an image in order to get a better version of the image that highlights its features and help in understanding the image better. There exists other methods for disease detection like visual analysis, optical sensor etc., while some might not give the accurate output and for some, the system is not easy to implement and costly. Thus, image processing is our go to option to build a simple, robust and accurate plant disease detection system. The Plant Disease Detection System implemented for this project basically uses different machine learning algorithms to predict if a plant image is diseased or healthy. Different algorithms used would include namely Random Forest, Support Vector Machine, Naïve Bayes, Decision Tree, K-Nearest Neighbor, Logistic Regression and Linear Discriminant Analysis. Comparison is performed to find which is most optimal for the system by considering terms of accuracy, precision and different other parameters. II. 
METHODOLOGY In this project, the plant disease detection process is carried out by using different machine learning algorithms, as mentioned in the previous section. The various stages involving this process would be data collection, data pre-processing, training the model, testing the model and then deployment of the final comparison result. The project is carried out on Anaconda Navigator IDE in Jupyter Notebook. We use Python 3 language to implement the code. The basic steps to implement the model is listed below.
  • 2. e-ISSN: 2582-5208 International Research Journal of Modernization in Engineering Technology and Science ( Peer-Reviewed, Open Access, Fully Refereed International Journal ) Volume:03/Issue:08/August-2021 Impact Factor- 5.354 www.irjmets.com www.irjmets.com @International Research Journal of Modernization in Engineering, Technology and Science [382] Image Pre-processing This first section is basically loading the images, converting it into a format that is more easily distinguishable in order to easily be able to extract its features. The main steps would include: a. Loading the dataset b. Convert image format from RGB to BGR c. Convert image format from BGR to HSV d. Image segmentation Feature Extraction In this section, the features of images are extracted usng global feature descriptors, namely, color, texture and shape. The feature descriptors used are Hu Moments, Haralick Texture and Color Histogram. This way features are extracted and we scale the features to a range between 0 and 1 for ease of modeling. After the completion of these two section we move on to modeling the dataset with different machine learning algorithms, training it and then testing it to carry out comparison between various algorithms. III. MODELING AND ANALYSIS Image classification refers to the classification of images into one among different types of predefined classes. In this project we are using image classification with machine learning as discussed in the previous section. The dataset used to train the models is of apple leaves. The data set contains of two folders which are namely diseased and healthy. The Diseased folder contains unhealthy images and the Healthy folder consists of green and healthy images. In total we are using 800 images per set to train and we have numbered the images from 1 to 800 for ease of training and prediction for the models. 
In the first couple of steps we are converting the format of the images to get the most defined image for ease of processing. Then we move on to image segmentation for extraction of colours in order to separate the picture of leaf from the background. Thus, the colour of the leaf is extracted from the image. The next step would be feature extraction from the image using three feature descriptors namely, color, shape and texture. Finally, to complete the first part, we save the feature vectors using HDF5. The Hierarchical Data Format version 5, is a file format that is used for heterogeneous data which is complex and larger in size. The next part is training and testing the model. The dataset is trained over seven machine learning models. We use Scikit learn to import all the necessary machine learning models. Scikit-learn is a machine learning library for python. It has various algorithms in it which includes all the seven algorithms needed for this project. The next process is cross-validation. It is a technique to evaluate models by separating the dataset into two, one would be for training and the other for testing the models. Figure 1: Plant disease detection process flow
  • 3. e-ISSN: 2582-5208 International Research Journal of Modernization in Engineering Technology and Science ( Peer-Reviewed, Open Access, Fully Refereed International Journal ) Volume:03/Issue:08/August-2021 Impact Factor- 5.354 www.irjmets.com www.irjmets.com @International Research Journal of Modernization in Engineering, Technology and Science [383] Next, we test the model with train data. Then, after training we are carrying out the actual comparison to see which algorithm predicts plant disease better. The system predicts the output of each machine learning algorithm one by one on the test data, thereafter predicting the accuracy, generating confusion matrix and error rates of each algorithm. IV. RESULTS AND DISCUSSION The results obtained during the comparison process are provided in this section. The results are in terms for accuracy, precision, f1score, mean absolute error etc., and as diagram for confusion matrix and comparison chart. The main factors of comparison was accuracy score of each algorithm after performing disease detection on the same test data. The time taken for prediction of whether an image is healthy or not for each algorithm was also taken into account for having a clear understanding of which algorithm would be more optimal for the process. Table 1. Comparison of machine learning algorithms for plant disease detection SN. Algorithm Accuracy Score Precision(for predicting healthy images) Precision(for predicting diseased images) Mean absolute error Time Taken(in secs) 1 Random Forest 98.12 99 98 0.01875 2.62 2 K-Nearest Neighbor 93.44 95 92 0.065625 0.84 3 Logistic Regression 94.06 98 91 0.059375 29.16 4 Linear Discriminant Analysis 92.50 95 90 0.075 1.65 5 Decision Tree 91.56 93 90 0.084375 0.97 6 Naïve Bayes 82.81 96 76 0.171875 0.67 7 Support Vector Machine 93.44 97 90 0.065625 1.91 Figure 2: Accuracy comparison chart
V. CONCLUSION
Comparing all the algorithms gave us valuable insights into which one would be optimal for plant disease detection. After training, evaluating and testing the models, the results suggest that Random Forest is the most accurate algorithm for plant disease detection, with an accuracy of about 98%, followed by Logistic Regression (94.06%). The least accurate algorithm is Naïve Bayes. Although Random Forest gives the most accurate results for both healthy and diseased images, prediction time is also important, since we always look for algorithms and methods that reduce effort and time to a minimum. The time taken was therefore also compared, and it was found that Random Forest, despite giving the best results, is not the fastest. Naïve Bayes took the least time, but because it also gives the least accurate results, it was not considered further. The next fastest algorithm is KNN (0.84 s), with an accuracy of about 93%, which is good. Thus, KNN offers the best balance of prediction time and accuracy, while Random Forest remains the most accurate algorithm for plant disease detection.

VI. REFERENCES
[1] G. Geetha, S. (2020). Plant Leaf Disease Classification And Detection System Using Machine Learning.
[2] T. Daniya, D. (2019). A Review on Machine Learning Techniques for Rice Plant Disease Detection in Agricultural Research.
[3] Paramasivam Alagumariappan, N. J. (2020). Intelligent Plant Disease Identification System Using Machine Learning.
[4] G. Prem Rishi Kranth, M. H. (2018). Plant Disease Prediction using Machine Learning Algorithms.
[5] Aliyu M. Abdu, M. M. (2020). Machine learning for plant disease detection: an investigative comparison between support vector machine and deep learning.
[6] Hidayat ur Rahman, N. J. (2017). A comparative analysis of machine learning approaches for plant disease identification.
[7] Dr. M. Safish Mary, R. C. (2015). A Comparative Study of Algorithms used for and Classification of Plant Diseases.
[8] Ashwini T Sapkal, U. V. (2018). Comparative study of Leaf Disease Diagnosis system using Texture features and Deep Learning Features.
[9] Muammer TÜRKOĞLU, D. H. (2019). Plant disease and pest detection using deep learning-based features.
[10] Himani Kakkar, D. L. (2016). A Review of Image Processing Methods for Fruit Disease Detection.
[11] Sanjeev S Sannakki, V. S. (2011). Leaf Disease Grading by Machine Vision and Fuzzy Logic.
[12] Guan Wang, Y. S. (2017). Automatic Image-Based Plant Disease Severity Estimation Using Deep Learning.
[13] Aravindhan Venkataramanan, D. K. (2019). Plant Disease Detection and Classification Using Deep Neural Networks.
[14] Dhiman Mondal, A. C. (2015). Detection and classification technique of Yellow Vein Mosaic Virus disease in okra leaf images using leaf vein extraction and Naive Bayesian classifier.
[15] Sharada P. Mohanty, D. P. (2016). Using Deep Learning for Image-Based Plant Disease Detection.
[16] Yan Guo, J. Z. (2020). Plant Disease Identification Based on Deep Learning Algorithm in Smart Farming.
[17] Godliver Owomugisha, F. M. (n.d.). Machine Learning for diagnosis of disease in plants using spectral data.
[18] Ms. Nilam Bhise, M. S. (2020). Plant Disease Detection using Machine Learning.
[19] Arivazhagan, S., R. Newlin Shebiah, S. Ananthi, S. Vishnu Varthini (2013). Detection of unhealthy region of plant leaves and classification of plant leaf diseases using texture features. Agric Eng Int: CIGR J., 15(1): 211-217.
[20] Ma, J. Q. (2009). Content-based image retrieval with HSV color space and texture features. Web Information Systems and Mining, 2009. WISM International Conference on. 61–63.