Received: 13 November 2023 Accepted: 18 April 2024 Published online: 8 June 2024
DOI: 10.1002/agj2.21595
ORIGINAL ARTICLE
Agronomic Application of Genetic Resources
Wheat crop genotype and age prediction using machine learning
with multispectral radiometer sensor data
Mutiullah Jamil1
Zoha Ahsan1
Muhammad Nauman Saeed1
Ali Raza2
Hazem Migdady3
Mohammad Sh. Daoud4
Maryam Altalhi5
Absalom E. Ezugwu6
Laith Abualigah7,8,9,10,11,12
1Institute of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
2Department of Software Engineering, University Of Lahore, Lahore 54000, Pakistan
3CSMIS Department, Oman College of Management and Technology, Barka, Oman
4College of Engineering, Al Ain University, Abu Dhabi, United Arab Emirates
5Department of Management Information Systems, College of Business Administration, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
6Unit for Data Science and Computing, North-West University, Potchefstroom, South Africa
7Hourani Center for Applied Scientific Research, Al-Ahliyya Amman University, Amman, Jordan
8MEU Research Unit, Middle East University, Amman, Jordan
9Computer Science Department, Al al-Bayt University, Mafraq 25113, Jordan
10Applied Science Research Center, Applied Science Private University, Amman, Jordan
11School of Engineering and Technology, Sunway University Malaysia, Petaling Jaya 27500, Malaysia
12Jadara Research Center, Jadara University, Irbid 21110, Jordan
Correspondence
Absalom E. Ezugwu, Unit for Data Science
and Computing, North-West University, 11
Hoffman Street, Potchefstroom 2520, South
Africa.
Email: Absalom.ezugwu@nwu.ac.za
Assigned to Associate Editor David E. Clay.
Abstract
Wheat (Triticum aestivum) yield predictions can be improved by using multispectral
remote sensing to identify different genotypes and crop growth stages. We propose an
innovative machine learning technique aimed at classifying diverse wheat crop geno-
types and providing accurate estimations of plant age. Multispectral reflectance data
was obtained from different sites where various wheat genotypes were cultivated.
This approach involved analyzing incoming radiation and canopy light reflectance
across five distinct spectral bands using a multispectral radiometer. The newly col-
lected remote sensing data was utilized as input for the machine learning algorithm.
The random forest model achieved an accuracy of 98.77% in wheat
crop genotype classification, and the effectiveness of the proposed approach was
confirmed through 10-fold cross-validation. Moreover, a multiple linear
regression model for predicting the age of wheat genotypes explained 91% of
the observed variation. These findings represent progress in wheat crop
genotype and age prediction, which may in turn support improved wheat yield.
Abbreviations: LR, linear regression; MLR, multiple linear regression; NDVI, normalized difference vegetation index; NIR, near infrared; RF, random
forest; SVM, support vector machine; SWNIR, short-wave near infrared.
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original
work is properly cited.
© 2024 The Authors. Agronomy Journal published by Wiley Periodicals LLC on behalf of American Society of Agronomy.
Agronomy Journal. 2024;116:1643–1654. wileyonlinelibrary.com/journal/agj2 1643
1644 JAMIL ET AL.
1 INTRODUCTION
Wheat (Triticum aestivum) grains are rich in essential nutri-
ents, establishing themselves as a valuable nutritional source
that enhances diets worldwide (Zahra et al., 2023). Wheat
consumption in Pakistan surpasses that of rice, and the per
capita consumption of wheat is 124 kg annually (Bakhsh
et al., 2003). To effectively address Pakistan’s escalating food
requirements, advanced agricultural techniques are needed
(Panhwar et al., 2021). Khan et al. (2015) showed that the
flowering stage is vulnerable to heat stress. Furthermore,
Nabwire et al. (2022) emphasize the pivotal role of a plant's
age in managing water stress and temperature and in sourcing
essential nutrients from various sources.
Plant morphology can be used to compare different species,
differentiate between various types of plants, or study how
plants respond to stimuli (Wyatt, 2016). Some of the most
important morphological traits include leaf shape, size, color,
texture, angle, and volume. Within the shoot system, leaves
adapt to their environment by altering their visual properties,
making them recognizable (Yang et al., 2015). Developing
alternative phenotypic classification approaches other than
physical measurements is important for accelerating breeding,
and the prediction of food resources is critical for improving
food security.
To comprehend a nation’s food resources, it is crucial to
conduct a comprehensive assessment of potential crop har-
vests (Akhter et al., 2023). In this ever-changing landscape,
precise and meticulous crop evaluations play a vital role
in generating valuable information that informs the strate-
gic management of plant cultivation, allocation of resources,
and food security. The intricate interplay between data-driven
analysis and farming methods encapsulates the essence of
this endeavor, illuminating the path toward sustainable and
resilient agricultural systems.
Consequently, this research aims to establish an innovative
machine learning-based framework for categorizing wheat
genotypes, accompanied by the development of a precise age
prediction model for each specific genotype. Optimal yield
can be achieved through the cultivation of wheat genotypes in
harmony with their respective conducive environmental con-
ditions. The efficacy and precision of our proposed machine
learning-based model for wheat genotype classification and
age predictions are evaluated through various parameters.
Core Ideas
∙ This research addresses a significant challenge in
wheat crop genotype and age prediction.
∙ We propose an innovative machine learning
methodology to classify different wheat crop genotypes.
∙ We collected different wheat seed genotype samples
using the multispectral radiometer.
Numerous scholars have contributed to the advancement
of wheat genotype classification through a diverse array of
approaches. For example, Naser et al. (2020) proposed a
model that utilizes the Normalized Difference Vegetation
Index (NDVI) to distinguish between wheat genotypes' productivity
in dry and wet environments. Their study, conducted
in Northeastern Colorado, encompassed various climatic
conditions. Employing NDVI data acquired from a proximal
sensor to gauge the greenness of wheat fields, they
also gathered data on grain yield for each wheat geno-
type. The findings demonstrated a robust correlation between
NDVI and grain yield, with higher NDVI readings associ-
ated with wheat genotypes exhibiting greater grain yields.
Notably, precise measurements of grain yield and effective
discrimination of superior wheat genotypes were achieved at
non-saturated NDVI values, particularly around the threshold
of 0.9. Additionally, they determined that the k-means clus-
tering algorithm could reliably categorize wheat genotypes
into three classes of grain yield productivity based on their
respective NDVI readings.
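NDVI is conventionally defined as (NIR − Red) / (NIR + Red), where NIR and Red are the reflectances in the near-infrared and red bands. The following minimal sketch illustrates the calculation; the function name `ndvi` and the reflectance values are hypothetical and are not taken from Naser et al. (2020) or from this study.

```python
import numpy as np


def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    # Return NaN where both bands are zero to avoid dividing by zero.
    return np.divide(nir - red, denom,
                     out=np.full_like(denom, np.nan),
                     where=denom != 0)


# Hypothetical reflectance values (percent), not measurements from the study.
example = ndvi([45.0, 30.0], [5.0, 10.0])
print(example)
```

Dense green canopies reflect strongly in the NIR and weakly in the red, so their NDVI approaches 1, which is why the index tends to saturate at high biomass.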
A remote sensing study was conducted by Han et al. (2022)
to investigate using a random forest (RF) model in moni-
toring wheat phenology. They discovered that the RF model
demonstrated high accuracy in predicting plant nitrogen accu-
mulation, nitrogen nutrition index, aboveground biomass, and
nitrogen concentration. The researchers collected multispec-
tral images and crop data at five growth stages.
The study of Raoufi et al. (2018) involved the emulation
of growth and harvest patterns of diverse wet rice genotypes
at varying seedling ages using the AquaCrop model. The
research employed version 4.0 of the AquaCrop model to
simulate rice growth. The experimentation spanned 2 years
and was carried out at the Haraz Extension and Technology
Development Center in Amol, Mazandaran Province, Iran.
The study focused on three rice genotypes—Tarom, Ghaem,
and Fajr—each exhibiting distinct growth period durations.
Raoufi et al. (2018) showed that the model could be used to
predict rice yields.
Zhang et al. (2021) used MODIS NDVI time-series satellite
data to distinguish winter wheat from other crops. The Hei-
longjiang region was chosen for winter wheat mapping over
four consecutive years (2014–2017). The model employed the
peak–slope difference index (PSDI) and the NDVI time-series variation
coefficient for wheat crop mapping, specifically utilizing
NDVI data from the MOD13Q1 dataset (Hubert-Moy et al.,
2019). Landsat-8 multispectral images were acquired from
the U.S. Geological Survey (USGS), and sample sites were
selected using data from the USGS website, Google Earth
FIGURE 1 Proposed innovative methodology workflow for the prediction of wheat crop genotype and age.
photos, and statistical information. The coefficient of vari-
ation (COV) of PSDI demonstrated high user and accuracy
rates, achieving 94.10% and 93.74%, respectively.
Das et al. (2021) proposed a methodology to assess water
conditions in wheat genotypes using thermal imaging from
unmanned aerial vehicles. This approach was valuable in
predicting yields in sodic soils. This technique effectively
classified agricultural water stress factors and provided
biomass and grain production forecasts based on crop water
stress indices. Applying classification and regression trees
yielded highly accurate predictions for grain yield, root
mean square error, and biomass. In the context of sodic soil
conditions, wheat genotypes, including Gregory, Bremer,
Mace, Lancer, and Mitch, demonstrated greater productivity
than Flanker, Gladius, Emu Rock, Scout, and Janz. This
research highlights genotype-specific productivity, offering
valuable insights for wheat cultivation.
Sandhu et al. (2021) introduced multi-trait machine learn-
ing and deep learning models to enhance wheat breeding
programs. They observed that the proposed models outper-
formed genomic best linear unbiased predictor (GBLUP). The
authors conducted their study on a dataset comprising wheat
genotypes phenotyped for grain yield and grain protein con-
tent. Furthermore, the genotypes were assessed for spectral
reflectance, which was used to train the machine learning and
deep learning models. The authors compared the performance
of four uni-trait (UT) and four multi-trait (MT) models. Their
findings indicated that the MT and deep learning models sur-
passed the UT models and the GBLUP methods. The RF
and multilayer perceptron models demonstrated the highest
performance among the models. The authors concluded that
the proposed models represent a promising tool for genomic
selection in wheat breeding programs, suggesting their poten-
tial in selecting wheat genotypes with superior grain yield and
grain protein content.
Fang et al. (2020) used Sentinel-2 imagery with winter
wheat. The research conducted in Henan Province, Cen-
tral China, involved acquiring Sentinel-2 images of winter
wheat at a specific phenological stage through Google Earth
Engine. Machine learning techniques, including RF, sup-
port vector machine (SVM), and classification and regression
tree, were employed to identify and map winter wheat
across a wide area. Five-fold cross-validation and grid
search approaches were utilized to optimize machine learning
hyperparameters. The SVM demonstrated superior perfor-
mance in classifying winter wheat, as indicated by com-
paring the three algorithms. It achieved an overall accuracy
(OA) of 0.94, user’s accuracy (UA) of 0.95, producer’s
accuracy (PA) of 0.95, and Kappa coefficient (Kappa) of
0.92. The results emphasized the SVM’s sensitivity to spe-
cific parameters (C and gamma), which led to the highest
classification accuracy when these hyperparameters were
optimized.
Due to the lack of research on genotype classification using
multispectral data in the literature, our study aims to address
this gap. The primary objective of our research is to design
a data acquisition system using multispectral MSR5 sensors.
Additionally, we have developed an automated machine
learning-based technique to detect wheat growth stages.
2 METHODS AND MATERIALS
The research study was conducted in the years 2020 and
2021 under the supervision of the IUB Agriculture Research
Center. During data collection, nine plots were harvested,
focusing on three types of genotypes and three different con-
ditions of water stress. These conditions included normal
watering, a 1-week delay in watering, and a 2-week delay in
watering. No fertilization or spraying treatments were applied.
In this study, the selection of wheat crop genotypes for
classification is based on diverse criteria, including genetic
variability, agronomic performance, and adaptability to spe-
cific environmental conditions. This comprehensive method-
ology allowed for a systematic and rigorous investigation
into the classification of wheat crop genotypes, providing
valuable insights into their genetic diversity and potential
agricultural applications.
Our proposed research methodology is outlined in Figure 1.
Multispectral radiometer (MSR5) sensor data are collected and
used to build machine learning models for genotype
classification and age prediction. The collected sensor data
are preprocessed and organized into five spectral
bands. The formatted dataset is then split into training and
testing portions: 70% of the dataset is used to train the
applied machine learning models, and the remaining 30% is
used to evaluate them. The trained machine learning model
is then used for genotype classification. Finally, a multiple
linear regression (MLR) model is fitted in SPSS software
to the classified data to predict the age of each
genotype.
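The split-train-evaluate part of this workflow can be sketched with scikit-learn as follows. This is an illustrative sketch rather than the authors' code: the data are synthetic stand-ins for the five-band MSR5 measurements, the class centers are invented, and the stratified split is an assumption (the text specifies only the 70/30 ratio). The random forest settings follow Table 6.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the MSR5 data: 540 scans x 5 bands, 3 genotypes.
# The class centers below are invented for illustration only.
n_per_class = 180
centers = np.array([[5, 8, 6, 40, 25],
                    [6, 9, 7, 45, 22],
                    [7, 10, 5, 50, 28]], dtype=float)
X = np.vstack([c + rng.normal(scale=1.0, size=(n_per_class, 5)) for c in centers])
y = np.repeat([0, 1, 2], n_per_class)  # 0 = Miraj, 1 = Punjnad, 2 = Aas

# 70% training / 30% testing; stratification is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=1)

model = RandomForestClassifier(n_estimators=100, criterion="entropy", random_state=1)
model.fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test samples: {len(y_test)}, accuracy: {test_accuracy:.3f}")
```

With 540 samples, a 30% hold-out corresponds to 162 test samples and 378 training samples.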
2.1 Multispectral radiometer sensors data
The study focused on three test genotypes: Miraj, Punjnad,
and Aas, each cultivated in pairs, with one plot under water
stress. Plots with dimensions of 3.66 by 3.66 m were
established in 2020 and 2021. Plots were planted at a rate
determined by each plot size, which measured 2.32 m² (length
and width) in 2020 and 2021. After 2 weeks, each genotype
was scanned 30 times with an MSR5 radiometer (CROPSCAN, Inc.).
Scanning yielded 90 samples per session at 15-day intervals
over 3 months, totaling 540 samples representing six
developmental stages, as shown in Figure 2.
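The sample counts above follow from simple arithmetic, sketched here as a quick check. The variable names are illustrative; the only inputs are the figures stated in the text (three genotypes, 30 scans each, six stages).

```python
genotypes = ["Miraj", "Punjnad", "Aas"]
scans_per_genotype = 30   # MSR5 scans per genotype at each sampling date
stages = 6                # sampling dates, 15 days apart

samples_per_date = len(genotypes) * scans_per_genotype
total_samples = samples_per_date * stages
samples_per_genotype = scans_per_genotype * stages

print(samples_per_date, total_samples, samples_per_genotype)
```

Each genotype therefore contributes 180 of the 540 samples, so the three classes are balanced.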
2.1.1 Data collection area
The data were collected at the Agricultural Research Center
of the Islamia University of Bahawalpur, in the city of
Bahawalpur, Punjab, Pakistan. The data collection locations
are shown in the satellite image in Figure 3. The study area
encompasses agro-climates typical of Punjab, where monthly
rainfall can be as low as 2 mm (0.1 in.). October records
the least rainfall, while July is the wettest
month, receiving 61 mm (2.4 in.) of rainfall. Bahawalpur,
known for its high temperatures, often faces water
scarcity.
FIGURE 2 The photographic representation of wheat crop of six
stages: (a) stage 1, (b) stage 2, (c) stage 3, (d) stage 4, (e) stage 5, and
(f) stage 6.
2.1.2 Data collection experiment design
Observations were made between 2 and 12 weeks. This choice
was based on the fact that temperatures below 13°C inhibit
flowering, while temperatures exceeding 14°C after flowering
and fruit set have negligible effects on plant growth
(Noh et al., 2013). The wheat plants were categorized into
plants cultivated under optimal growth conditions and plants
subjected to high-temperature stress. Plants for the stress
group were selected randomly. Each group of
FIGURE 3 Location of the study site in Google Earth view,
with the map of Pakistan and the region of interest (ROI) highlighted in red
in the upper right corner of the image.
TABLE 1 The soil characteristics analysis during data collection.
Soil characteristics                          0–15 cm   15–30 cm
Organic matter (%)                            0.79      0.55
pH value                                      8.4       8.6
Electrical conductivity (dS/m)                250       230
Phosphorus (ppm)                              7.1       5.1
Potassium (ppm)                               112       114
Saturation characterizing soil texture (%)    36        35
plants was cultivated in dedicated plots, maintaining a con-
sistent relative humidity of 70% throughout the entire growth
period. Both the standard and stressed groups adhered to dis-
tinct watering schedules, with irrigation administered every
15 days. The irrigation conditions included normal watering,
a 1-week delay in watering, and a 2-week delay in water-
ing. Wheat required a total of five irrigations. One irrigation
equals 3 ha in., so 15 ha in. was required. No fertilization
or spraying treatments were applied during data collection.
The seeding rate was between 1.2 and 2.0 million
seeds per ha. Soil physicochemical analysis was carried
out before sowing the crop. Soil samples were taken from
0.0 to 0.15 m and 0.15 to 0.30 m using a soil auger. Soil
characteristics analysis data (Shah et al., 2020) are given in
Table 1.
2.2 Multispectral radiometer bands
analysis
The multispectral radiometers are utilized to assess incom-
ing radiation and canopy light reflectance across five distinct
spectral bands (Qadri et al., 2019; Rehmani et al., 2015).
TABLE 2 The wavelength and spatial resolution for the collected CROPSCAN MSR5 data.
Spectral band     Wavelength (nm)   Spatial resolution (area covered by the sensor)
Band 1  Blue      450–520           1.524 m in radius
Band 2  Green     520–600           1.524 m in radius
Band 3  Red       630–690           1.524 m in radius
Band 4  SNIR      760–900           1.524 m in radius
Band 5  FNIR      1550–1750         1.524 m in radius
FIGURE 4 The feature space analysis of extracted multispectral
bands data; most expressive feature (MEF 1, 2, and 3).
The generated output dataset contained blue (450–520 nm),
green (520–600 nm), red (630–690 nm), near-infrared (760–
900 nm), and far-infrared wavelengths (1550–1750 nm).
Within each specific spectral band, the half-peak width
varies, ranging from approximately 5 to 15 nm. This inno-
vative approach, referred to as MSR5, encapsulates an
entire scene by utilizing five distinct numerical values,
effectively representing five energy bands, as described in
Table 2.
2.3 Feature space analysis
A feature space analysis was conducted to identify the important
multispectral bands. The analysis began with the calculation
of principal components. We selected the top five principal
components from the band data and illustrated them in Figure 4.
This analysis reveals that over 90% of the variance in the
multispectral band data is captured by the selected components.
The resulting feature space shows that the wheat genotypes
are largely linearly separable.
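A principal component analysis of this kind can be sketched with scikit-learn as shown below. The sketch is illustrative only: the data are synthetic, and standardizing the bands before the decomposition is an assumption, since the text does not state how the bands were scaled.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic, correlated five-band data standing in for the MSR5 measurements.
latent = rng.normal(size=(540, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.1 * rng.normal(size=(540, 5))

# Standardization is an assumption made for this sketch.
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=5)
scores = pca.fit_transform(X_scaled)

# Cumulative share of variance explained by the leading components.
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_for_90 = int(np.searchsorted(cumulative, 0.90) + 1)
print(cumulative.round(3), n_for_90)
```

Plotting the first two or three columns of `scores`, colored by genotype, gives a feature space view comparable in spirit to Figure 4.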
2.4 Applied machine learning methods
2.4.1 Random forest
Random forest is a commonly used technique for the clas-
sification of multispectral data and yields enhanced results
compared to other machine learning models (Raza et al.,
2023). In the RF model, a value of 100 was utilized for the
n_estimators parameter, which specifies the number of trees
in the RF model.
The RF prediction for the wheat crop genotype can be
represented as:
\[ RF(X) = \frac{1}{N} \sum_{i=1}^{N} f_i(X) \tag{1} \]
where 𝑁 is the number of decision trees in the forest,
𝑋 is the feature matrix with 𝑛 samples and 𝑚 features,
𝑌 is the target variable representing the wheat crop genotype,
𝑇𝑖 represents the 𝑖th decision tree in the forest, and
𝑓𝑖(𝑋) is the prediction of the 𝑖th decision tree.
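For classification, the averaging in Equation 1 can be read as averaging the class probabilities of the individual trees, which is how scikit-learn's `RandomForestClassifier` combines its trees. The sketch below checks this on synthetic data; the dataset is generated with `make_classification` and is unrelated to the study data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic three-class problem with five features (illustrative only).
X, y = make_classification(n_samples=300, n_features=5, n_informative=4,
                           n_redundant=0, n_classes=3,
                           n_clusters_per_class=1, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)

# f_i(X): class probabilities predicted by each of the N trees.
tree_probs = np.stack([tree.predict_proba(X) for tree in forest.estimators_])

# RF(X): the mean over trees; the predicted class has the largest mean probability.
mean_probs = tree_probs.mean(axis=0)
manual_labels = forest.classes_[mean_probs.argmax(axis=1)]

print(np.allclose(mean_probs, forest.predict_proba(X)))
```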
2.4.2 Support vector machine
A widely used supervised machine learning technique for
classification and regression tasks is known as a SVM (Raza
et al., 2022). SVMs have a good ability to differentiate
between multiple classes or make precise predictions for con-
tinuous values. The SVM model can be represented by the
following equation:
\[ f(x) = \operatorname{sign}\left( \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b \right) \tag{2} \]
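To make Equation 2 concrete, the sketch below rebuilds the decision function of a fitted scikit-learn `SVC` from its support vectors. It assumes a two-class problem and a linear kernel, K(x_i, x) = x_i · x, for simplicity; the study itself classifies three genotypes, which scikit-learn handles by combining several such binary classifiers. The data are synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic two-class data with five features (illustrative only).
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)

svm = SVC(kernel="linear", C=10).fit(X, y)

# dual_coef_ holds alpha_i * y_i for each support vector; intercept_ is b.
kernel = svm.support_vectors_ @ X.T          # linear kernel K(x_i, x)
scores = (svm.dual_coef_ @ kernel + svm.intercept_).ravel()

# The sign of the score selects the class.
manual_pred = svm.classes_[(scores > 0).astype(int)]

print(np.allclose(scores, svm.decision_function(X)))
```

Only the support vectors enter the sum, because all other training samples have a coefficient of zero.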
2.4.3 Logistic regression
Logistic regression (LR) (Raza et al., 2023) is a statisti-
cal method used to model the relationship between a binary
dependent variable and one or more independent variables.
The main objective of logistic regression is to estimate the
likelihood of a specific outcome based on distinct variables.
In contrast to linear regression, which employs a linear equa-
tion for modeling variable relationships, logistic regression
transforms independent variables into a probability range
from 0 to 1 using the logistic function, also known as the
sigmoid function. The logistic regression equation is given
by:
\[ P(Y = 1) = \frac{1}{1 + e^{-(b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n)}} \tag{3} \]
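Equation 3 can be evaluated directly, as in the following sketch. The intercept and coefficients are hypothetical values chosen for illustration, not fitted values from this study. Note that Equation 3 is the two-class form; a three-genotype problem requires a multiclass extension such as the one scikit-learn's `LogisticRegression` applies automatically.

```python
import numpy as np


def logistic_probability(x, intercept, coefs):
    """P(Y = 1) for feature rows x under the logistic model of Equation 3."""
    z = intercept + np.dot(x, coefs)     # b0 + b1*x1 + ... + bn*xn
    return 1.0 / (1.0 + np.exp(-z))      # sigmoid maps z into (0, 1)


# Hypothetical coefficients and inputs, for illustration only.
b0 = -1.0
b = np.array([0.5, -0.25])
x = np.array([[2.0, 0.0],
              [0.0, 4.0],
              [4.0, 4.0]])

p = logistic_probability(x, b0, b)
print(p)
```

A linear score of zero maps to a probability of 0.5, which is the usual decision threshold.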
2.4.4 Multiple linear regression
A statistical modeling method known as MLR (Sharma et al.,
2022) is employed to investigate the relationship between sev-
eral independent variables and a dependent variable. This
approach extends the principle of simple linear regression
to scenarios with multiple independent variables. The aim
of MLR is to determine the most accurate linear equa-
tion that estimates the value of the dependent variable based
on the values of the independent variables. The mathematical
equation for MLR can be written as:
\[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \varepsilon \tag{4} \]
Tables 3–5 report the regression coefficients for each
genotype. In the "Unstandardized coefficients" column,
"B" gives the regression weights. The "Constant" row is the
intercept, which is 53.347, 105.728, and 107.126 for the Miraj,
Punjnad, and Aas models, respectively. Each remaining "B"
value is the slope for one spectral band; a negative slope
means that the predicted age decreases as reflectance in that
band increases. Together, these coefficients define the linear
regression equation for each genotype. The significance values
are reported as 0.000 for all predictors except the blue
spectral band in the Punjnad (0.364) and Aas (0.389) models,
indicating that the remaining independent variables have a
significant influence on the dependent variable.
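As an illustration of how Equation 4 is applied, the sketch below evaluates the Miraj model using the unstandardized coefficients from Table 3. The reflectance values in `scan` are hypothetical, and the function and variable names are introduced here for illustration; the text does not state the unit of the predicted age, so none is assumed.

```python
# Unstandardized coefficients (B) for the Miraj genotype, from Table 3.
miraj = {
    "intercept": 53.347,
    "blue": -3.119,
    "green": -6.541,
    "red": 15.822,
    "nir": 1.500,
    "fir": -5.334,
}


def predict_age(coefs, reflectance):
    """Evaluate Y = b0 + sum(b_k * X_k) for one five-band scan."""
    return coefs["intercept"] + sum(
        coefs[band] * value for band, value in reflectance.items())


# Hypothetical band reflectances, not a measurement from the study.
scan = {"blue": 3.0, "green": 5.0, "red": 4.0, "nir": 40.0, "fir": 20.0}

age = predict_age(miraj, scan)
print(round(age, 3))
```

Because the red band has the largest positive weight and the green and far-infrared bands have negative weights, small changes in those bands dominate the predicted age.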
2.5 Hyperparameter tuning
The best-fit hyperparameters of the applied machine learn-
ing methods are determined, as illustrated in Table 6. In
this research study, we employed a grid search approach to
optimize the machine learning hyperparameters (Shekar &
Dagnew, 2019). The best-fit hyperparameters help us achieve
high-performance accuracy scores.
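A grid search of this kind can be sketched with scikit-learn's `GridSearchCV`. The candidate values in `param_grid`, the five-fold setting, and the synthetic data are assumptions made for illustration; the text reports only the selected settings (Table 6), not the grid that was searched.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic three-class data with five features (illustrative only).
X, y = make_classification(n_samples=300, n_features=5, n_informative=4,
                           n_redundant=0, n_classes=3,
                           n_clusters_per_class=1, random_state=0)

# Hypothetical candidate values; the searched grid is not given in the text.
param_grid = {"kernel": ["linear", "rbf"], "C": [0.1, 1, 10]}

search = GridSearchCV(SVC(random_state=3), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print(best_params, round(best_score, 3))
```

Every combination in the grid is scored by cross-validation, and the combination with the highest mean accuracy is retained.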
2.6 Analysis
We used the Python programming language to conduct all
research experiments (Hao & Ho, 2019). The Scikit-learn
library in Python, version 1.0.2, was utilized to evaluate
performance metrics for wheat crop genotype classification.
The performance metrics included accuracy, recall, preci-
sion, and F1 scores. We have employed several methods
to evaluate performance scores, including comparisons of
TABLE 3 The coefficient analysis of the Miraj genotype.
                               Unstandardized coefficients   Standardized coefficients
Model                          B          Standard error     Beta       Significance
Constant                       53.347     6.401                         0.000
Blue spectral band             −3.119     0.415              −0.614     0.000
Green spectral band            −6.541     1.231              −0.688     0.000
Red spectral band              15.822     0.581              2.576      0.000
Near-infrared spectral band    1.500      0.100              0.694      0.000
Far-infrared spectral band     −5.334     0.301              −1.019     0.000
TABLE 4 The coefficient analysis of the Punjnad genotype.
                               Unstandardized coefficients   Standardized coefficients
Model                          B          Standard error     Beta       Significance
Constant                       105.728    11.277                        0.000
Blue spectral band             0.692      0.759              0.170      0.364
Green spectral band            −15.830    2.414              −1.947     0.000
Red spectral band              14.848     0.910              2.618      0.000
Near-infrared spectral band    0.670      0.177              0.238      0.000
Far-infrared spectral band     −3.788     0.582              −0.745     0.000
TABLE 5 The coefficient analysis of the Aas genotype.
                               Unstandardized coefficients   Standardized coefficients
Model                          B          Standard error     Beta       Significance
Constant                       107.126    8.837                         0.000
Blue spectral band             −0.550     0.636              −0.117     0.389
Green spectral band            −15.028    1.703              −1.749     0.000
Red spectral band              16.724     0.636              2.820      0.000
Near-infrared spectral band    0.856      0.168              0.384      0.000
Far-infrared spectral band     −4.858     0.494              −0.899     0.000
TABLE 6 The hyperparameter settings for applied machine
learning models.
Method   Hyperparameter description
RF       n_estimators = 100, criterion = "entropy", random_state = 1
SVM      kernel = "linear", C = 10, random_state = 3
LR       random_state = 2, max_iter = 700
Abbreviations: LR, logistic regression; RF, random forest; SVM, support vector
machine.
machine learning model results. This also includes confu-
sion matrix comparisons, k-fold cross-validation, and feature
space comparisons.
The accuracy metric is calculated using the following
equation:
\[ \text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}} \tag{5} \]
The recall metric is calculated using the following equation:
\[ \text{Recall} = \frac{\text{True positives}}{\text{True positives} + \text{False negatives}} \tag{6} \]
The precision metric is calculated using the following
equation:
\[ \text{Precision} = \frac{\text{True positives}}{\text{True positives} + \text{False positives}} \tag{7} \]
TABLE 7 Performance analysis of applied machine learning
methods for unseen testing data.
Method   Accuracy   Target class   Precision   Recall   F1 score
RF       0.98       Miraj          0.96        1.00     0.98
                    Punjnad        1.00        0.97     0.98
                    Aas            1.00        1.00     1.00
                    Average        0.99        0.99     0.99
SVM      0.90       Miraj          0.98        0.98     0.98
                    Punjnad        0.89        0.89     0.89
                    Aas            0.86        0.86     0.86
                    Average        0.91        0.91     0.91
LR       0.84       Miraj          0.87        0.92     0.89
                    Punjnad        0.87        0.84     0.85
                    Aas            0.80        0.78     0.79
                    Average        0.85        0.85     0.85
Abbreviations: LR, logistic regression; RF, random forest; SVM, support vector
machine.
The F1 metric is calculated using the following equation:
\[ F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{8} \]
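Equations 5–8 can be computed per class from a confusion matrix, as in the following sketch. The label arrays are small hypothetical examples rather than results from this study; the manual values are compared against scikit-learn's `precision_recall_fscore_support`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Hypothetical true and predicted labels (0 = Miraj, 1 = Punjnad, 2 = Aas).
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 0, 2])

cm = confusion_matrix(y_true, y_pred)   # rows: true class, columns: predicted
tp = np.diag(cm)                        # true positives per class

accuracy = tp.sum() / cm.sum()          # Equation 5
recall = tp / cm.sum(axis=1)            # Equation 6: TP / (TP + FN)
precision = tp / cm.sum(axis=0)         # Equation 7: TP / (TP + FP)
f1 = 2 * precision * recall / (precision + recall)   # Equation 8

ref_precision, ref_recall, ref_f1, _ = precision_recall_fscore_support(y_true, y_pred)
print(accuracy, precision, recall, f1)
```

The "Average" rows in a per-class report such as Table 7 are typically the unweighted means of these per-class values.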
3 RESULTS
3.1 Performance analysis of machine
learning methods
This analysis compares the performance of the applied
machine learning models (Table 7) in terms of accuracy,
precision, recall, and F1 score. The LR model achieved a
moderate accuracy of 0.84, the SVM model reached 0.90, and
the RF model outperformed the other models with an accuracy
of 0.98 (Figure 5).
Furthermore, the histogram analysis depicted in Figure 5
showed that the RF and SVM methods had good precision,
recall, and F1 scores. The LR model yielded lower but
satisfactory results, indicating room for further
optimization.
The columns and rows in the confusion matrix (Figure 6)
are denoted by 0, 1, and 2, representing the
Miraj, Punjnad, and Aas genotypes, respectively. The diagonal
elements of each matrix show the correctly classified
samples for the RF, SVM, and logistic regression models.
The remaining entries show instances where a genotype was
misclassified. The RF model classified the data with 98.77%
accuracy, the SVM accuracy stands at 90.74%, and the accuracy
of the logistic regression stands at 84.57% during validation.
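The reported percentages are consistent with a test set of 162 samples (30% of 540). The short check below shows this; the counts of correct predictions (160, 147, and 137) are not stated in the text and are inferred here as the integers that reproduce the reported accuracies, so they should be read as an assumption.

```python
total_samples = 540
test_fraction = 0.30
test_size = round(total_samples * test_fraction)

# Assumed counts of correct predictions, inferred from the reported percentages.
assumed_correct = {"RF": 160, "SVM": 147, "LR": 137}

accuracy_percent = {
    name: round(100 * correct / test_size, 2)
    for name, correct in assumed_correct.items()
}
print(test_size, accuracy_percent)
```

Under this reading, the RF model misclassified 2 of the 162 test samples.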
TABLE 8 k-Fold-based performance validation of applied
machine learning methods.
Method   Folds   Accuracy   Standard deviation (+/−)
RF       10      0.99       0.0119
SVM      10      0.90       0.0292
LR       10      0.84       0.0532
Abbreviations: LR, logistic regression; RF, random forest; SVM, support vector
machine.
TABLE 9 Performance analysis of the applied multiple regression
model.
Genotype       R       R²      Adjusted R²   Standard error
Miraj (Y1)     0.972   0.946   0.943         6.12
Punjnad (Y2)   0.931   0.867   0.861         9.57
Aas (Y3)       0.963   0.926   0.923         7.12
3.2 k-Fold cross-validation
The performance of the applied machine learning models is
rigorously validated through 10-fold cross-validation. This
approach enables a comprehensive assessment of how well
the applied models handle unseen data. The outcomes of the
cross-validation analysis, summarizing the performance of
the models across different folds, are presented in Table 8.
These findings suggest that ML can be used to predict
genotype and plant age.
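A 10-fold cross-validation of this kind can be sketched with scikit-learn's `cross_val_score`, which returns one accuracy per fold; the mean and standard deviation of these values correspond to the two result columns of Table 8. The data below are synthetic, and the stratified, shuffled folds are an assumption, since the text does not describe how the folds were formed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic three-class data with five features (illustrative only).
X, y = make_classification(n_samples=540, n_features=5, n_informative=4,
                           n_redundant=0, n_classes=3,
                           n_clusters_per_class=1, random_state=0)

# Random forest settings follow Table 6; the fold construction is assumed.
model = RandomForestClassifier(n_estimators=100, criterion="entropy", random_state=1)
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)

fold_scores = cross_val_score(model, X, y, cv=folds, scoring="accuracy")
mean_accuracy = fold_scores.mean()
std_accuracy = fold_scores.std()
print(f"{mean_accuracy:.2f} (+/- {std_accuracy:.4f})")
```

A small standard deviation across folds indicates that the accuracy does not depend strongly on which samples are held out.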
3.3 Performance analysis of crop genotype
age prediction
The coefficients of the green and short-wave near infrared
(SWNIR) bands show an inverse relationship with the age of
the wheat crop genotype. In practical terms, as the plant's
age increases, there is a gradual reduction in the intensity
of the green spectral band, along with the SWNIR values.
Conversely, the red and NIR values are directly related to
the age of the wheat genotype, exhibiting an upward trend.
The multiple regression model summary in Table 9 reports
the correlation coefficient (R) and the R² statistic, which
indicates the "proportionate decrease in error." These values
assess the model's performance in predicting the age
of the Miraj, Punjnad, and Aas genotypes. A higher R² value
implies a better fit. The regression models explained at least
86% of the variation, using red, green, near-infrared, and
short-wave near-infrared as predictors. Averaged over the
three genotypes, the models explain about 91% of the
variation, with standard errors ranging from 6.12 to 9.57.
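The summary statistics in Table 9 can be reproduced for any fitted linear model using the standard definitions R² = 1 − SS_res / SS_tot and adjusted R² = 1 − (1 − R²)(n − 1) / (n − p − 1), where n is the number of samples and p the number of predictors. The sketch below uses synthetic data; the sample size of 180 per genotype (540 / 3), the coefficients, and the noise level are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic data: n scans of one genotype with p = 5 band predictors.
n, p = 180, 5
X = rng.uniform(0, 50, size=(n, p))
true_coefs = np.array([-3.0, -6.0, 15.0, 1.5, -5.0])   # hypothetical weights
y = 50.0 + X @ true_coefs + rng.normal(scale=20.0, size=n)

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)

ss_res = np.sum(residuals ** 2)            # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)       # total variation

r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
std_error = np.sqrt(ss_res / (n - p - 1))  # standard error of the estimate
print(round(r2, 3), round(adj_r2, 3), round(std_error, 2))
```

The adjusted R² penalizes the number of predictors, which is why it is slightly lower than R² in each row of Table 9.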
Figure 7 shows a scatter plot of the age predictions
for the wheat crops. Notably, this
FIGURE 5 The histogram-based performance comparison of machine learning methods. LR, logistic regression; RF, random forest; SVM,
support vector machine.
FIGURE 6 The confusion matrix-based performance validations of applied techniques: (a) random forest (RF), (b) support vector machine
(SVM), and (c) logistic regression (LR).
FIGURE 7 The scatter plot of age prediction of wheat genotypes.
TABLE 10 Comparison with state-of-the-art approaches.

Reference               Proposed technique                Performance accuracy
Rehmani et al. (2015)   Artificial neural network (ANN)   0.96
Qadri et al. (2016)     Artificial neural network (ANN)   0.96
Jamil et al. (2023)     Artificial neural network (ANN)   0.97
Jamil et al. (2023)     Random forest (RF)                0.91
This study              Random forest (RF)                0.99
visualization shows how closely the multiple linear regression
model predicts the ages of the wheat crop genotypes. Table 10
compares our results with state-of-the-art approaches.
3.4 Discussion
The inclusion of three wheat genotypes (Miraj, Punjnad, and
Aas) in our dataset supports the generalizability of our findings
across varieties, making our methodology applicable to a broader
range of agricultural settings. The high accuracy of 98.77%
achieved by the RF model in wheat crop genotype classification
underscores the effectiveness of the machine learning approach
and provides a reliable means of distinguishing between
genotypes. Comparative analyses with other machine learning
models, including support vector machine and logistic
regression, demonstrate the robustness of our methodology.
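Classification accuracy of this kind is read off a confusion matrix such as those in Figure 6, as the number of correct predictions divided by the total. The sketch below uses an invented three-class matrix; the cell counts are hypothetical and are not the values in Figure 6. The total of 162 test samples assumes the 30% test portion of the 540 samples described in Section 2.

```python
def accuracy_from_confusion(matrix):
    """Overall accuracy: correct predictions (diagonal) over all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix):
    """Recall per class: diagonal count over the row (true class) total."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


# Hypothetical matrix: rows are true genotypes, columns are predictions.
genotypes = ["Miraj", "Punjnad", "Aas"]
confusion = [
    [53, 1, 0],
    [0, 54, 0],
    [1, 0, 53],
]

overall = accuracy_from_confusion(confusion)   # 160 correct out of 162
recalls = dict(zip(genotypes, per_class_recall(confusion)))
```

With two misclassified samples out of 162, the overall accuracy rounds to 98.77%, which shows how few errors a figure of that size corresponds to on a test set of this scale.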
The use of k-fold cross-validation further strengthens the
credibility of the results. The validation process assesses the
generalizability of the models by evaluating their performance
across various subsets of the dataset. The consistently high
accuracy across folds substantiates the robustness and
reliability of our classification approach.
Our focus on age prediction, a critical aspect of precision
agriculture, adds a dimension of practicality to our research.
The MLR model developed for age prediction explained 91% of
the variability. Such accurate age predictions can significantly
contribute to timely and targeted agricultural interventions,
optimizing resource management and improving overall crop yield.
4 CONCLUSIONS
This study demonstrated the effectiveness of multispectral
radiometry and machine learning techniques for wheat crop
genotype classification and age prediction. The data in this
research were collected using a multispectral radiometer
encompassing five bands: blue, green, red, near infrared
(NIR), and SWNIR. Among the machine learning models (RF, SVM,
and LR), RF excelled in wheat genotype classification, achieving
an accuracy rate of 98.77%. The robustness of the classification
model was validated through k-fold cross-validation. Furthermore,
the model designed to predict an additional phenotypic trait,
crop age, performed well: MLR predicted plant age from spectral
features and explained over 90% of the observed variation.
Overall, this study establishes the potential of MSR5 spectral
bands for estimating the age of wheat crop genotypes. This study
can serve as a foundation for the development of a real-time
monitoring system for wheat crops in high-throughput plant
phenotyping facilities.
In the future, we will collect more samples and extend the set
of wheat genotypes. We will also develop an advanced neural
network approach for effective wheat genotype classification.
Additionally, we will utilize other sensors similar to the MSR5.
AUTHOR CONTRIBUTIONS
Mutiullah Jamil: Conceptualization. Zoha Ahsan:
Conceptualization. Muhammad Nauman Saeed: Conceptualization.
Ali Raza: Conceptualization. Hazem Migdady: Conceptualization.
Mohammad Sh. Daoud: Conceptualization. Maryam Altalhi:
Conceptualization. Absalom E. Ezugwu: Conceptualization;
writing—review and editing. Laith Abualigah: Conceptualization.
ACKNOWLEDGMENTS
The authors would like to acknowledge the Deanship of Graduate
Studies and Scientific Research, Taif University, for funding
this work.
CONFLICT OF INTEREST STATEMENT
The authors declare no conflicts of interest.
ORCID
Laith Abualigah https://orcid.org/0000-0002-2203-4549
REFERENCES
Akhter, M. J., Sonderskov, M., Loddo, D., Ulber, L., Hull, R., & Kudsk,
P. (2023). Opportunities and challenges for harvest weed seed control
in European cropping systems. European Journal of Agronomy, 142,
126639.
Bakhsh, A., Hussain, A., & Khan, A. S. (2003). Genetic studies of plant
height, yield and its components in bread wheat. Sarhad Journal of
Agriculture, 19(4), 529–534.
Das, S., Christopher, J., Apan, A., Choudhury, M. R., Chapman, S.,
Menzies, N. W., & Dang, Y. P. (2021). Evaluation of water status
of wheat genotypes to aid prediction of yield on sodic soils using
UAV-thermal imaging and machine learning. Agricultural and Forest
Meteorology, 307, 108477.
Fang, P., Zhang, X., Wei, P., Wang, Y., Zhang, H., Liu, F., & Zhao, J.
(2020). The classification performance and mechanism of machine
learning algorithms in winter wheat mapping using Sentinel-2 10 m
resolution imagery. Applied Sciences, 10(15), 5075.
Han, S., Zhao, Y., Cheng, J., Zhao, F., Yang, H., Feng, H., Li, Z., Ma, X.,
Zhao, C., & Yang, G. (2022). Monitoring key wheat growth variables
by integrating phenology and UAV multispectral imagery data into
random forest model. Remote Sensing, 14(15), 3723.
Hao, J., & Ho, T. K. (2019). Machine learning made easy: A review
of scikit-learn package in Python programming language. Journal of
Educational and Behavioral Statistics, 44(3), 348–361.
Hubert-Moy, L., Thibault, J., Fabre, E., Rozo, C., Arvor, D., Corpetti,
T., & Rapinel, S. (2019). Time-series spectral dataset for croplands in
France (2006–2017). Data in Brief, 27, 104810.
Jamil, M., Rehman, H., Saqlain Zaheer, M., Tariq, A., Iqbal, R., Hasnain,
M. U., Majeed, A., Munir, A., Sabagh, A. E., Habib ur Rahman, M.,
Raza, A., Ali, M. A., & Elshikh, M. S. (2023). The use of Multispectral
Radio-Meter (MSR5) data for wheat crop genotypes identification
using machine learning models. Scientific Reports, 13(1), 19867.
Jamil, M., ul Rehman, H., SaleemUllah, Ashraf, I., & Ubaid, S. (2023).
Smart techniques for LULC micro class classification using Landsat8
imagery. Computers, Materials & Continua, 74(3), 5545–5557.
https://doi.org/10.32604/cmc.2023.033449
Khan, S. U., Din, J. U., Qayyum, A., Jaan, N. E., & Jenks, M. A. (2015).
Heat tolerance indicators in Pakistani wheat (Triticum aestivum L.)
genotypes. Acta Botanica Croatica, 74(1), 109–121.
Nabwire, S., Wakholi, C., Faqeerzada, M. A., Arief, M. A. A., Kim, M.
S., Baek, I., & Cho, B.-K. (2022). Estimation of cold stress, plant
age, and number of leaves in watermelon plants using image analysis.
Frontiers in Plant Science, 13, 847225.
Naser, M. A., Khosla, R., Longchamps, L., & Dahal, S. (2020). Using
NDVI to differentiate wheat genotypes productivity under dryland
and irrigated conditions. Remote Sensing, 12(5), 824.
Noh, J., Kim, J. M., Sheikh, S., Lee, S. G., Lim, J. H., Seong, M. H.,
& Jung, G. T. (2013). Effect of heat treatment around the fruit set
region on growth and yield of watermelon [Citrullus lanatus (Thunb.)
Matsum. and Nakai]. Physiology and Molecular Biology of Plants, 19,
509–514.
Panhwar, N. A., Mierzwa-Hersztek, M., Baloch, G. M., Soomro, Z. A.,
Sial, M. A., Demiraj, E., Panhwar, S. A., Afzal, A., & Lahori, A. H.
(2021). Water stress affects the some morpho-physiological traits of
twenty wheat (Triticum aestivum L.) genotypes under field condition.
Sustainability, 13(24), 13736.
Qadri, S., Furqan Qadri, S., Husnain, M., Saad Missen, M. M., Khan, D.
M., Muzammil-Ul-Rehman, Razzaq, A., & Ullah, S. (2019). Machine
vision approach for classification of citrus leaves using fused features.
International Journal of Food Properties, 22(1), 2072–2089.
Qadri, S., Khan, D. M., Ahmad, F., Qadri, S. F., Babar, M. E., Shahid,
M., Ul-Rehman, M., Razzaq, A., Shah Muhammad, S., Fahad, M.,
Ahmad, S., Pervez, M. T., Naveed, N., Aslam, N., Jamil, M., Rehmani,
E. A., Ahmad, N., & Akhtar Khan, N. (2016). A comparative study
of land cover classification by using multispectral and texture data.
BioMed Research International, 2016, 8797438.
https://doi.org/10.1155/2016/8797438
Raoufi, R., Soufizadeh, S., Amiri Larijani, B., AghaAlikhani, M., &
Kambouzia, J. (2018). Simulation of growth and yield of various
irrigated rice (Oryza sativa L.) genotypes by AquaCrop under different
seedling ages. Natural Resource Modeling, 31(2), e12162.
Raza, A., Munir, K., Almutairi, M. S., & Sehar, R. (2023). Novel
class probability features for optimizing network attack detection with
machine learning. IEEE Access, 11, 98685–98694.
https://doi.org/10.1109/ACCESS.2023.3313596
Raza, A., Rustam, F., Mallampati, B., Gali, P., & Ashraf, I. (2023).
Preventing crimes through gunshots recognition using novel
feature engineering and meta-learning approach. IEEE Access, 11,
103115–103131. https://doi.org/10.1109/ACCESS.2023.3316695
Raza, A., Siddiqui, H. U. R., Munir, K., Almutairi, M., Rustam, F., &
Ashraf, I. (2022). Ensemble learning-based feature engineering to
analyze maternal health during pregnancy and health risk prediction.
PLOS One, 17(11), e0276525.
Rehmani, E., Naweed, M., Shahid, M., Qadri, S., & Gilani, Z. (2015).
A comparative study of crop classification by using radiometric
and photographic data. Sindh University Research Journal (Science
Series), 47(2), 335–340.
Sandhu, K., Patil, S. S., Pumphrey, M., & Carter, A. (2021). Multitrait
machine- and deep-learning models for genomic selection using
spectral information in a wheat breeding program. The Plant Genome,
14(3), e20119.
Shah, M. A. A., Mohsin, M., Chesneau, C., Zulfiqar, A., Jamal, F.,
Nadeem, K., & Sherwani, R. A. K. (2020). Analysis of factors
affecting yield of agricultural crops in Bahawalpur district: Analysis
of factors of major agricultural crops. Proceedings of the Pakistan
Academy of Sciences: A. Physical and Computational Sciences, 57(4),
99–112.
Sharma, B. P., Zhang, N., Lee, D., Heaton, E., Delucia, E. H., Sacks, E. J.,
Kantola, I. B., Boersma, N. N., Long, S. P., Voigt, T. B., & Khanna,
M. (2022). Responsiveness of miscanthus and switchgrass yields to
stand age and nitrogen fertilization: A meta-regression analysis. GCB
Bioenergy, 14(5), 539–557.
Shekar, B., & Dagnew, G. (2019). Grid search-based hyperparameter
tuning and classification of microarray cancer data. In 2019
second international conference on advanced computational and
communication paradigms (ICACCP) (pp. 1–8). IEEE.
Wyatt, J. (2016). Grain and plant morphology of cereals and how
characters can be used to identify varieties. Encyclopedia of Food Grains
(Second Edition), 1, 51–72.
Yang, J., Spicer, R. A., Spicer, T. E., Arens, N. C., Jacques, F. M., Su,
T., Kennedy, E. M., Herman, A. B., Steart, D. C., Srivastava, G.,
Mehrotra, R. C., Valdes, P. J., Mehrotra, N. C., Zhou, Z.-K., & Lai,
J.-S. (2015). Leaf form–climate relationships on the global stage: An
ensemble of characters. Global Ecology and Biogeography, 24(10),
1113–1125.
Zahra, N., Hafeez, M. B., Wahid, A., Al Masruri, M. H., Ullah, A.,
Siddique, K. H., & Farooq, M. (2023). Impact of climate change on
wheat grain composition and quality. Journal of the Science of Food
and Agriculture, 103(6), 2745–2751.
Zhang, X., Liu, K., Wang, S., Long, X., & Li, X. (2021). A rapid model
(COV_PSDI) for winter wheat mapping in fallow rotation area using
MODIS NDVI time-series satellite observations: The case of the
Heilonggang region. Remote Sensing, 13(23), 4870.
How to cite this article: Jamil, M., Ahsan, Z., Saeed,
M. N., Raza, A., Migdady, H., Daoud, M. S., Altalhi,
M., Ezugwu, A. E., & Abualigah, L. (2024). Wheat
crop genotype and age prediction using machine
learning with multispectral radiometer sensor data.
Agronomy Journal, 116, 1643–1654.
https://doi.org/10.1002/agj2.21595
Wheat crop genotype and age prediction using machine learning.pdf

  • 1. Received: 13 November 2023 Accepted: 18 April 2024 Published online: 8 June 2024 DOI: 10.1002/agj2.21595 O R I G I N A L A R T I C L E A g r o n o m i c A p p l i c a t i o n o f G e n e t i c R e s o u r c e s Wheat crop genotype and age prediction using machine learning with multispectral radiometer sensor data Mutiullah Jamil1 Zoha Ahsan1 Muhammad Nauman Saeed1 Ali Raza2 Hazem Migdady3 Mohammad Sh. Daoud4 Maryam Altalhi5 Absalom E. Ezugwu6 Laith Abualigah7,8,9,10,11,12 1Institute of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan 2Department of Software Engineering, University Of Lahore, Lahore 54000, Pakistan 3CSMIS Department, Oman College of Management and Technology, Barka, Oman 4College of Engineering, Al Ain University, Abu Dhabi, United Arab Emirates 5Department of Management Information Systems, College of Business Administration, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia 6Unit for Data Science and Computing, North-West University, Potchefstroom, South Africa 7Hourani Center for Applied Scientific Research, Al-Ahliyya Amman University, Amman, Jordan 8MEU Research Unit, Middle East University, Amman, Jordan 9Computer Science Department, Al al-Bayt University, Mafraq 25113, Jordan 10Applied Science Research Center, Applied Science Private University, Amman, Jordan 11School of Engineering and Technology, Sunway University Malaysia, Petaling Jaya 27500, Malaysia 12Jadara Research Center, Jadara University, Irbid 21110, Jordan Correspondence Absalom E. Ezugwu, Unit for Data Science and Computing, North-West University, 11 Hoffman Street, Potchefstroom 2520, South Africa. Email: Absalom.ezugwu@nwu.ac.za Assigned to Associate Editor David E. Clay. Abstract Wheat (Triticum aestivum) yield predictions can be improved by using multispectral remote sensing to identify different genotypes and crop growth stages. 
We propose an innovative machine learning technique aimed at classifying diverse wheat crop geno- types and providing accurate estimations of plant age. Multispectral reflectance data was obtained from different sites where various wheat genotypes were cultivated. This approach involved analyzing incoming radiation and canopy light reflectance across five distinct spectral bands using a multispectral radiometer. The newly col- lected remote sensing data was utilized as input for the machine learning algorithm. Impressively, the random forest model achieved an accuracy rate of 98.77% in wheat crop genotype classification. Furthermore, the proposed approach’s effectiveness was confirmed through a 10-fold cross-validation mechanism. Moreover, a multiple lin- ear regression model for predicting the age of wheat genotypes explained 91% of the observed variation. These findings signify significant progress in wheat crop genotype and age prediction, ultimately leading to enhanced wheat yield. Abbreviations: LR, linear regression; MLR, multiple linear regression; NDVI, normalized difference vegetation index; NIR, near infrared; RF, random forest; SVM, support vector machine; SWNIR, short-wave near infrared. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2024 The Authors. Agronomy Journal published by Wiley Periodicals LLC on behalf of American Society of Agronomy. Agronomy Journal. 2024;116:1643–1654. wileyonlinelibrary.com/journal/agj2 1643
  • 2. 1644 JAMIL ET AL. 1 INTRODUCTION Wheat (Triticum aestivum) grains are rich in essential nutri- ents, establishing themselves as a valuable nutritional source that enhances diets worldwide (Zahra et al., 2023). Wheat consumption in Pakistan surpasses that of rice, and the per capita consumption of wheat is 124 kg annually (Bakhsh et al., 2003). To effectively address Pakistan’s escalating food requirements, advanced agricultural techniques are needed (Panhwar et al., 2021). The research findings by Khan et al. (2015) shed light on the vulnerability of the flowering stage to heat stress. Furthermore, Nabwire et al. (2022) empha- sizes the pivotal role of a plant’s age in managing water stress and temperature and sourcing essential nutrients from various avenues. Plant morphology can be used to compare different species, differentiate between various types of plants, or study how plants respond to stimuli (Wyatt, 2016). Some of the most important morphological traits include leaf shape, size, color, texture, angle, and volume. Within the shoot system, leaves adapt to their environment by altering their visual properties, making them recognizable (Yang et al., 2015). Developing alternative phenotypic classification approaches other than physical measurements is important for accelerating breeding, and the prediction of food resources is critical for improving food security. To comprehend a nation’s food resources, it is crucial to conduct a comprehensive assessment of potential crop har- vests (Akhter et al., 2023). In this ever-changing landscape, precise and meticulous crop evaluations play a vital role in generating valuable information that informs the strate- gic management of plant cultivation, allocation of resources, and food security. The intricate interplay between data-driven analysis and farming methods encapsulates the essence of this endeavor, illuminating the path toward sustainable and resilient agricultural systems. 
Consequently, this research aims to establish an innovative machine learning-based framework for categorizing wheat genotypes, accompanied by the development of a precise age prediction model for each specific genotype. Optimal yield can be achieved through the cultivation of wheat genotypes in harmony with their respective conducive environmental con- ditions. The efficacy and precision of our proposed machine learning-based model for wheat genotype classification and age predictions are evaluated through various parameters. Numerous scholars have contributed to the advancement of wheat genotype classification through a diverse array of approaches. For example, Naser et al. (2020) proposed a model that utilizes the Normalized Difference Vegetation Index (NDVI) to distinguish between wheat genotypes’ pro- ductivity in dry and wet environments. Their study, conducted in Northeastern Colorado, encompassed various climatic conditions. Employing NDVI data acquired from a prox- Core Ideas ∙ This research addresses a significant challenge in wheat crop genotype and age prediction. ∙ We propose an innovative machine learning methodology to classify different wheat crop geno- types. ∙ We collected different wheat seed genotype sam- ples using the multispectral radiometer. imal sensor to gauge the greenness of wheat fields, they also gathered data on grain yield for each wheat geno- type. The findings demonstrated a robust correlation between NDVI and grain yield, with higher NDVI readings associ- ated with wheat genotypes exhibiting greater grain yields. Notably, precise measurements of grain yield and effective discrimination of superior wheat genotypes were achieved at non-saturated NDVI values, particularly around the threshold of 0.9. Additionally, they determined that the k-means clus- tering algorithm could reliably categorize wheat genotypes into three classes of grain yield productivity based on their respective NDVI readings. 
A remote sensing study was conducted by Han et al. (2022) to investigate using a random forest (RF) model in moni- toring wheat phenology. They discovered that the RF model demonstrated high accuracy in predicting plant nitrogen accu- mulation, nitrogen nutrition index, aboveground biomass, and nitrogen concentration. The researchers collected multispec- tral images and crop data at five growth stages. The study of Raoufi et al. (2018) involved the emulation of growth and harvest patterns of diverse wet rice genotypes at varying seedling ages using the AquaCrop model. The research employed version 4.0 of the AquaCrop model to simulate rice growth. The experimentation spanned 2 years and was carried out at the Haraz Extension and Technology Development Center in Amol, Mazandaran Province, Iran. The study focused on three rice genotypes—Tarom, Ghaem, and Fajr—each exhibiting distinct growth period durations. Raoufi et al. (2018) showed that the model could be used to predict rice yields. Zhang et al. (2021) used MODIS NDVI time-series satellite data to distinguish winter wheat from other crops. The Hei- longjiang region was chosen for winter wheat mapping over four consecutive years (2014–2017). The model employed the peak–slope difference index and the NDVI time-series varia- tion coefficient for wheat crop mapping, specifically utilizing NDVI data from the MOD13Q1 dataset (Hubert-Moy et al., 2019). Landsat-8 multispectral images were acquired from the U.S. Geological Survey (USGS), and sample sites were selected using data from the USGS website, Google Earth 14350645, 2024, 4, Downloaded from https://acsess.onlinelibrary.wiley.com/doi/10.1002/agj2.21595 by Khwaja Fareed University of Engineering & Information, Wiley Online Library on [08/12/2024]. 
See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
  • 3. JAMIL ET AL. 1645 F I G U R E 1 Proposed innovative methodology workflow for the prediction of wheat crop genotype and age. photos, and statistical information. The coefficient of vari- ation (COV) of PSDI demonstrated high user and accuracy rates, achieving 94.10% and 93.74%, respectively. Das et al. (2021) proposed a methodology to assess water conditions in wheat genotypes using thermal imaging from unmanned aerial vehicles. This approach was valuable in predicting yields in sodic soils. This technique effectively classified agricultural water stress factors and provided biomass and grain production forecasts based on crop water stress indices. Applying classification and regression trees yielded highly accurate predictions for grain yield, root mean square error, and biomass. In the context of sodic soil conditions, wheat genotypes, including Gregory, Bremer, Mace, Lancer, and Mitch, demonstrated greater productivity than Flanker, Gladius, Emu Rock, Scout, and Janz. This research highlights genotype-specific productivity, offering valuable insights for wheat cultivation. Sandhu et al. (2021) introduced multi-trait machine learn- ing and deep learning models to enhance wheat breeding programs. They observed that the proposed models outper- formed genomic best linear unbiased predictor (GBLUP). The authors conducted their study on a dataset comprising wheat genotypes phenotyped for grain yield and grain protein con- tent. Furthermore, the genotypes were assessed for spectral reflectance, which was used to train the machine learning and deep learning models. The authors compared the performance of four uni-trait (UT) and four multi-trait (MT) models. Their findings indicated that the MT and deep learning models sur- passed the UT models and the GBLUP methods. The RF and multilayer perceptron models demonstrated the highest performance among the models. 
The authors concluded that the proposed models represent a promising tool for genomic selection in wheat breeding programs, suggesting their poten- tial in selecting wheat genotypes with superior grain yield and grain protein content. Fang et al. (2020) used Sentinel-2 imagery with winter wheat. The research conducted in Henan Province, Cen- tral China, involved acquiring Sentinel-2 images of winter wheat at a specific phenological stage through Google Earth Engine. Machine learning techniques, including RF, sup- port vector machine (SVM), and classification and regression tree, were employed to identify and map winter wheat across a wide area. Five-fold cross-validation and grid search approaches were utilized to optimize machine learning hyperparameters. The SVM demonstrated superior perfor- mance in classifying winter wheat, as indicated by com- paring the three algorithms. It achieved an overall accuracy (OA) of 0.94, user’s accuracy (UA) of 0.95, producer’s accuracy (PA) of 0.95, and Kappa coefficient (Kappa) of 0.92. The results emphasized the SVM’s sensitivity to spe- cific parameters (C and gamma), which led to the highest classification accuracy when these hyperparameters were optimized. Due to the lack of research on genotype classification using multispectral data in the literature, our study aims to address this gap. The primary objective of our research is to design a data acquisition system using multispectral MSRF5 sen- sors. Additionally, we have developed an automated machine learning-based technique to detect wheat growth stages. 2 METHODS AND MATERIALS The research study was conducted in the years 2020 and 2021 under the supervision of the IUB Agriculture Research Center. During data collection, nine plots were harvested, focusing on three types of genotypes and three different con- ditions of water stress. These conditions included normal watering, a 1-week delay in watering, and a 2-week delay in watering. 
No fertilization or spraying treatments were applied. 14350645, 2024, 4, Downloaded from https://acsess.onlinelibrary.wiley.com/doi/10.1002/agj2.21595 by Khwaja Fareed University of Engineering & Information, Wiley Online Library on [08/12/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
In this study, the selection of wheat crop genotypes for classification is based on diverse criteria, including genetic variability, agronomic performance, and adaptability to specific environmental conditions. This methodology allowed for a systematic investigation into the classification of wheat crop genotypes, providing insights into their genetic diversity and potential agricultural applications.

Our proposed research methodology (Figure 1) is organized as a pipeline. Multispectral radiometer (MSR5) sensor data are collected and used to build the genotype classification and age prediction machine learning models. The collected sensor data are preprocessed and converted into five spectral bands. The formatted dataset is then split into training and testing portions: 70% of the dataset is used to train the applied machine learning models, and the remaining 30% is used to evaluate them. The trained machine learning model is then used for cultivar classification. Following this, a multiple linear regression (MLR) model is applied to the classified data in SPSS software to predict the age of each genotype.

2.1 Multispectral radiometer sensors data

The study focused on three test genotypes: Miraj, Punjnad, and Aas, each cultivated in pairs, with one plot under water stress. Plots with dimensions of 3.66 by 3.66 m were established in 2020 and 2021. Plots were planted at a rate determined by each plot size, which measured 2.32 m2 (length and width) in 2020 and 2021. After 2 weeks, each genotype was scanned 30 times with an MSR5 radiometer (CROPSCAN, Inc.). The process yielded 90 samples at each 15-day interval over 3 months, totaling 540 samples representing six developmental stages, as shown in Figure 2.
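The sample counts and the 70/30 split described above can be checked with a short calculation. This is a minimal sketch, not the authors' code: the helper name `split_indices` and the shuffling seed are assumptions introduced for illustration, and the source does not state whether the split was stratified by genotype.

```python
import random

# Counts taken from the text: 30 scans per genotype, 3 genotypes, 6 scan dates.
SCANS_PER_GENOTYPE = 30
GENOTYPES = 3
SCAN_DATES = 6

samples_per_date = SCANS_PER_GENOTYPE * GENOTYPES  # samples collected on one date
total_samples = samples_per_date * SCAN_DATES      # samples over all six stages


def split_indices(n_samples, train_percent=70, seed=0):
    """Shuffle sample indices and split them into train and test lists.

    The seed is a hypothetical choice; the paper does not report one.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Integer arithmetic avoids floating-point rounding in the split size.
    n_train = n_samples * train_percent // 100
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(total_samples)
print(samples_per_date, total_samples, len(train_idx), len(test_idx))
```

With 540 samples, a 70/30 split leaves 378 samples for training and 162 for testing.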
2.1.1 Data collection area

The data collection area is within the Agricultural Research Center at the Islamia University of Bahawalpur, in the city of Bahawalpur, Punjab, Pakistan. The data collection locations are shown in the satellite view in Figure 3. This study area encompasses various agro-climates typical of Punjab, where the annual rainfall can be as low as 2 mm (0.1 in.). October records the scantiest rainfall, while July is the wettest month, receiving 61 mm (2.4 in.) of rainfall. Bahawalpur is known for its high temperatures and often faces water scarcity.

FIGURE 2 The photographic representation of wheat crop of six stages: (a) stage 1, (b) stage 2, (c) stage 3, (d) stage 4, (e) stage 5, and (f) stage 6.

FIGURE 3 Location of the study site using Google Earth View with the map of Pakistan and highlighted in red color ROI at the upper top right corner of the image.

2.1.2 Data collection experiment design

Observations were made between 2 and 12 weeks. This choice was based on the fact that temperatures below 13°C inhibit flowering, while temperatures exceeding 14°C after flowering and fruit set have negligible effects on plant growth (Noh et al., 2013). The wheat plants were categorized into two groups: plants cultivated under optimal growth conditions and plants subjected to high-temperature stress. Plants for the stress group were selected randomly. Each group of plants was cultivated in dedicated plots, maintaining a consistent relative humidity of 70% throughout the entire growth period. The standard and stressed groups followed distinct watering schedules, with irrigation administered every 15 days. The irrigation conditions included normal watering, a 1-week delay in watering, and a 2-week delay in watering. Wheat required a total of five irrigations. One irrigation equals 3 ha in., so 15 ha in. was required. No fertilization or spraying treatments were applied during data collection. The seeding rate was between 1.2 and 2.0 million seeds per ha. Soil physicochemical analysis was carried out before sowing the crop. Soil samples were taken from 0.0 to 0.15 m and 0.15 to 0.30 m using a soil auger. Soil characteristics (Shah et al., 2020) analysis data are given in Table 1.

TABLE 1 The soil characteristics analysis during data collection.
Soil characteristics                          0–15 cm   15–30 cm
Organic matter (%)                            0.79      0.55
pH value                                      8.4       8.6
Electrical conductivity (dS/m)                250       230
Phosphorus (ppm)                              7.1       5.1
Potassium (ppm)                               112       114
Saturation characterizing soil texture (%)    36        35

2.2 Multispectral radiometer bands analysis

The multispectral radiometers are utilized to assess incoming radiation and canopy light reflectance across five distinct spectral bands (Qadri et al., 2019; Rehmani et al., 2015).

TABLE 2 The wavelength and spatial resolution for the collected crop scan MSR5 data.
Spectral band   Wavelength (nm)   Spatial resolution (area covered by the sensor)
Band 1 Blue     450–520           1.524 m in radius
Band 2 Green    520–600           1.524 m in radius
Band 3 Red      630–690           1.524 m in radius
Band 4 SNIR     760–900           1.524 m in radius
Band 5 FNIR     1550–1750         1.524 m in radius

FIGURE 4 The feature space analysis of extracted multispectral bands data; most expressive feature (MEF 1, 2, and 3).

The generated output dataset contained blue (450–520 nm), green (520–600 nm), red (630–690 nm), near-infrared (760–900 nm), and far-infrared (1550–1750 nm) wavelengths. Within each spectral band, the half-peak width varies, ranging from approximately 5 to 15 nm. The MSR5 thus summarizes an entire scene with five numerical values, one per energy band, as described in Table 2.

2.3 Feature space analysis

A feature space analysis was conducted to extract the important multispectral bands. The analysis began with the calculation of principal components. We selected the top five principal components from the band data and illustrated them in Figure 4. This analysis reveals that over 90% of the variance is captured in the multispectral bands data. The dataset's feature space exhibits linear separability that supports wheat genotype classification.
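The share of variance captured by a principal component, as used in the feature space analysis above, can be illustrated with a small calculation. The sketch below is an assumption-laden illustration rather than the authors' procedure: the source does not say which tool computed the principal components, so plain power iteration on the covariance matrix is swapped in here, and the readings are synthetic two-band values, not data from the study. All function names are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length feature rows."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_component(cov, iterations=200):
    """Largest eigenvalue and its eigenvector, found by power iteration."""
    d = len(cov)
    vector = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        product = [sum(cov[i][j] * vector[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    # Rayleigh quotient of the converged unit vector gives the eigenvalue.
    product = [sum(cov[i][j] * vector[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(vector[i] * product[i] for i in range(d))
    return eigenvalue, vector


def first_component_share(rows):
    """Fraction of the total variance explained by the first principal component."""
    cov = covariance_matrix(rows)
    eigenvalue, _ = first_component(cov)
    total_variance = sum(cov[i][i] for i in range(len(cov)))
    return eigenvalue / total_variance


# Synthetic two-band readings (not study data): band 2 is roughly twice band 1.
readings = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]]
share = first_component_share(readings)
print(f"first component explains {share:.1%} of the variance")
```

Because the two synthetic bands are almost perfectly correlated, nearly all of the variance falls on the first component; with five real bands the same ratio would be computed for each component in turn.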
2.4 Applied machine learning methods

2.4.1 Random forest

Random forest (RF) is a commonly used technique for the classification of multispectral data and yields enhanced results compared to other machine learning models (Raza et al., 2023). In the RF model, a value of 100 was used for the n_estimators parameter, which specifies the number of trees in the forest. The RF prediction for the wheat crop genotype can be represented as:

$$ RF(X) = \frac{1}{N} \sum_{i=1}^{N} f_i(X) \tag{1} $$

where $N$ is the number of decision trees in the forest, $X$ is the feature matrix with $n$ samples and $m$ features, $Y$ is the target variable representing the wheat crop genotype, $T_i$ represents the $i$th decision tree in the forest, and $f_i(X)$ is the prediction of the $i$th decision tree.

2.4.2 Support vector machine

The support vector machine (SVM) is a widely used supervised machine learning technique for classification and regression tasks (Raza et al., 2022). SVMs can differentiate between multiple classes or make precise predictions for continuous values. The SVM model can be represented by the following equation:

$$ f(x) = \operatorname{sign}\left( \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b \right) \tag{2} $$

where $x_i$ and $y_i$ are the training samples and their class labels, $\alpha_i$ are the learned weights of the training samples, $K$ is the kernel function, and $b$ is the bias term.

2.4.3 Logistic regression

Logistic regression (LR) (Raza et al., 2023) is a statistical method used to model the relationship between a binary dependent variable and one or more independent variables. The main objective of logistic regression is to estimate the likelihood of a specific outcome based on distinct variables. In contrast to linear regression, which employs a linear equation for modeling variable relationships, logistic regression maps a linear combination of the independent variables to a probability between 0 and 1 using the logistic function, also known as the sigmoid function.
The logistic regression equation is given by:

$$ P(Y = 1) = \frac{1}{1 + e^{-(b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n)}} \tag{3} $$

2.4.4 Multiple linear regression

MLR (Sharma et al., 2022) is a statistical modeling method used to investigate the relationship between several independent variables and a dependent variable. This approach extends the principle of simple linear regression to scenarios with multiple independent variables. The aim of MLR is to determine the linear equation that most accurately estimates the value of the dependent variable from the values of the independent variables. The mathematical equation for MLR can be written as:

$$ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \varepsilon \tag{4} $$

Tables 3–5 list the regression coefficients. In the "Unstandardized coefficients" column, "B" indicates weights. For the Miraj, Punjnad, and Aas tables, the "B" weights in the "Constant" row, which represents the intercept, are 53.347, 107.728, and 107.126, respectively. The remaining "B" weights are the slopes for the individual spectral bands and are used together with the intercept in the regression equation. A negative slope value implies a negative correlation. Significance is reported as 0.000 for every term in the three tables except the blue spectral band in the Punjnad (0.364) and Aas (0.389) tables, underscoring the influence of the remaining independent variables on the dependent variable.

2.5 Hyperparameter tuning

The best-fit hyperparameters of the applied machine learning methods are listed in Table 6. In this research study, we employed a grid search approach to optimize the machine learning hyperparameters (Shekar & Dagnew, 2019). The best-fit hyperparameters help us achieve high-performance accuracy scores.

2.6 Analysis

We used the Python programming language to conduct all research experiments (Hao & Ho, 2019). The Scikit-learn library in Python, version 1.0.2, was used to evaluate performance metrics for wheat crop genotype classification.
The performance metrics included accuracy, recall, precision, and F1 scores. We have employed several methods to evaluate performance scores, including comparisons of machine learning model results. This also includes confusion matrix comparisons, k-fold cross-validation, and feature space comparisons.

TABLE 3 The coefficient analysis of the Miraj genotype.
Model                          B (unstandardized)   Std. error   Beta (standardized)   Significance
Constant                       53.347               6.401                              0.000
Blue spectral band             −3.119               0.415        −0.614                0.000
Green spectral band            −6.541               1.231        −0.688                0.000
Red spectral band              15.822               0.581        2.576                 0.000
Near-infrared spectral band    1.500                0.100        0.694                 0.000
Far-infrared spectral band     −5.334               0.301        −1.019                0.000

TABLE 4 The coefficient analysis of the Punjnad genotype.
Model                          B (unstandardized)   Std. error   Beta (standardized)   Significance
Constant                       105.728              11.277                             0.000
Blue spectral band             0.692                0.759        0.170                 0.364
Green spectral band            −15.830              2.414        −1.947                0.000
Red spectral band              14.848               0.910        2.618                 0.000
Near-infrared spectral band    0.670                0.177        0.238                 0.000
Far-infrared spectral band     −3.788               0.582        −0.745                0.000

TABLE 5 The coefficient analysis of the Aas genotype.
Model                          B (unstandardized)   Std. error   Beta (standardized)   Significance
Constant                       107.126              8.837                              0.000
Blue spectral band             −0.550               0.636        −0.117                0.389
Green spectral band            −15.028              1.703        −1.749                0.000
Red spectral band              16.724               0.636        2.820                 0.000
Near-infrared spectral band    0.856                0.168        0.384                 0.000
Far-infrared spectral band     −4.858               0.494        −0.899                0.000

TABLE 6 The hyperparameters settings for applied machine learning models.
Method   Hyperparameter description
RF       n_estimators = 100, criterion = "entropy", random_state = 1
SVM      kernel = "linear", C = 10, random_state = 3
LR       random_state = 2, max_iter = 700
Abbreviations: LR, logistic regression; RF, random forest; SVM, support vector machine.
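Equation (4) can be evaluated directly with the unstandardized coefficients reported in Table 3. The sketch below does this for the Miraj genotype. It is only an illustration: the band readings passed in are made-up values, the names `MIRAJ_COEFFICIENTS` and `predict_age` are assumptions and not part of the authors' SPSS workflow, and the residual term is left out because it is unknown for a new observation.

```python
# Unstandardized "B" coefficients for the Miraj genotype, copied from Table 3.
MIRAJ_INTERCEPT = 53.347
MIRAJ_COEFFICIENTS = {
    "blue": -3.119,
    "green": -6.541,
    "red": 15.822,
    "near_infrared": 1.500,
    "far_infrared": -5.334,
}


def predict_age(band_values, intercept=MIRAJ_INTERCEPT, coefficients=MIRAJ_COEFFICIENTS):
    """Evaluate Y = b0 + b1*X1 + ... + bn*Xn for one set of band readings."""
    return intercept + sum(
        coefficients[band] * band_values[band] for band in coefficients
    )


# Hypothetical band readings, not measurements from the study.
example_reading = {
    "blue": 2.0,
    "green": 3.0,
    "red": 2.5,
    "near_infrared": 30.0,
    "far_infrared": 10.0,
}
print(round(predict_age(example_reading), 3))
```

The signs of the coefficients determine the direction of each effect: a higher red or near-infrared reading raises the predicted age, while a higher green or far-infrared reading lowers it.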
The accuracy metric is calculated using the following equation:

$$ \text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}} \tag{5} $$

The recall metric is calculated using the following equation:

$$ \text{Recall} = \frac{\text{True positives}}{\text{True positives} + \text{False negatives}} \tag{6} $$

The precision metric is calculated using the following equation:

$$ \text{Precision} = \frac{\text{True positives}}{\text{True positives} + \text{False positives}} \tag{7} $$
The F1 metric is calculated using the following equation:

$$ F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{8} $$

TABLE 7 Performance analysis of applied machine learning methods for unseen testing data.
Method   Accuracy   Target class   Precision   Recall   F1 score
RF       0.98       Miraj          0.96        1.00     0.98
                    Punjnad        1.00        0.97     0.98
                    Aas            1.00        1.00     1.00
                    Average        0.99        0.99     0.99
SVM      0.90       Miraj          0.98        0.98     0.98
                    Punjnad        0.89        0.89     0.89
                    Aas            0.86        0.86     0.86
                    Average        0.91        0.91     0.91
LR       0.84       Miraj          0.87        0.92     0.89
                    Punjnad        0.87        0.84     0.85
                    Aas            0.80        0.78     0.79
                    Average        0.85        0.85     0.85
Abbreviations: LR, logistic regression; RF, random forest; SVM, support vector machine.

3 RESULTS

3.1 Performance analysis of machine learning methods

This analysis compares the applied machine learning models on accuracy, precision, recall, and F1 score (Table 7). The LR model achieved only a moderate accuracy of 0.84, while the RF model outperformed the other models (Figure 5). The histogram analysis in Figure 5 also showed that the RF and SVM methods had good precision, recall, and F1 scores. The LR model yielded satisfactory results, indicating its potential for further optimization.

The columns and rows in the confusion matrix (Figure 6) are labeled 0, 1, and 2, representing the Miraj, Punjnad, and Aas genotypes, respectively. The diagonal elements of each matrix show the correctly classified samples for the RF, SVM, and logistic regression models. The remaining entries show the samples for which the genotype was mispredicted. During validation, the RF model classified the data with 98.77% accuracy, the SVM with 90.74%, and logistic regression with 84.57%.
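Equations (5)–(8) can be applied class by class to a confusion matrix such as those in Figure 6. The sketch below is illustrative only. The matrix is hypothetical: it was chosen so that 160 of 162 test samples are correct (162 being 30% of the 540 samples, which is an assumption about the test set size), and it is not the matrix reported by the authors. Treating rows as true classes and columns as predicted classes is also an assumed convention.

```python
LABELS = ["Miraj", "Punjnad", "Aas"]  # classes 0, 1, and 2

# Hypothetical confusion matrix: rows are true classes, columns are predictions.
confusion = [
    [54, 0, 0],
    [2, 52, 0],
    [0, 0, 54],
]


def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_metrics(matrix, k):
    """Precision, recall, and F1 for class k, treating k as the positive class."""
    true_pos = matrix[k][k]
    false_neg = sum(matrix[k]) - true_pos               # class k predicted as another class
    false_pos = sum(row[k] for row in matrix) - true_pos  # other classes predicted as k
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


print(f"accuracy = {accuracy(confusion):.4f}")
for k, label in enumerate(LABELS):
    p, r, f = per_class_metrics(confusion, k)
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this hypothetical matrix the two errors are Punjnad samples predicted as Miraj, so they lower the precision of Miraj and the recall of Punjnad while leaving Aas unaffected.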
3.2 k-Fold cross-validation

The performance of the applied machine learning models is validated through 10-fold cross-validation. This approach enables a comprehensive assessment of how well the applied models handle unseen data. The outcomes of the cross-validation analysis, summarizing the performance of the models across folds, are presented in Table 8. These findings suggest that ML can be used to predict genotype and plant age.

TABLE 8 k-Fold-based performance validation of applied machine learning methods.
Method   Folds   Accuracy   Standard deviation (+/−)
RF       10      0.99       0.0119
SVM      10      0.90       0.0292
LR       10      0.84       0.0532
Abbreviations: LR, logistic regression; RF, random forest; SVM, support vector machine.

3.3 Performance analysis of crop genotype age prediction

The coefficients of the green and short-wave near infrared (SWNIR; the 1550–1750 nm band, listed as far-infrared in Tables 3–5) bands show an inverse correlation with the age of the wheat crop genotype. In practical terms, as the plant's age increases, there is a gradual reduction in the intensity of the green spectral band, along with the SWNIR values. Conversely, the red and NIR values are directly related to the age of the wheat genotype, exhibiting an upward trend.

TABLE 9 Performance analysis of the applied multiple regression model.
Genotype       R       R2      Adjusted R2   Standard error
Miraj (Y1)     0.972   0.946   0.943         6.12
Punjnad (Y2)   0.931   0.867   0.861         9.57
Aas (Y3)       0.963   0.926   0.923         7.12

The multiple regression model summary in Table 9 reports the correlation coefficient (R) and the R2 statistic, indicating the "proportionate decrease in error." These values collectively assess the model's performance in predicting the age of the Miraj, Punjnad, and Aas genotypes. A higher R2 value implies a better fit. The regression models explained at least 86% of the variation, using red, green, near-infrared, and short-wave near-infrared as predictors.
Averaged over the three genotypes, the models explain over 90% of the variation, with standard errors ranging from 6.12 to 9.57 (Table 9). Figure 7 is a scatter plot of the age distribution of the wheat crops. This visualization illustrates how well the multiple linear model predicts the ages of the wheat crop genotypes. The state-of-the-art comparison results are described in Table 10.

FIGURE 5 The histogram-based performance comparison of machine learning methods. LR, logistic regression; RF, random forest; SVM, support vector machine.

FIGURE 6 The confusion matrix-based performance validations of applied techniques: (a) random forest (RF), (b) support vector machine (SVM), and (c) logistic regression (LR).

FIGURE 7 The scatter plot of age prediction of wheat genotypes.

TABLE 10 The state of the art approaches comparisons.
Reference               Proposed technique                Performance accuracy
Rehmani et al. (2015)   Artificial neural network (ANN)   0.96
Qadri et al. (2016)     Artificial neural network (ANN)   0.96
Jamil et al. (2023)     Artificial neural network (ANN)   0.97
Jamil et al. (2023)     Random forest (RF)                0.91
This study              Random forest (RF)                0.99

3.4 Discussion

The inclusion of diverse wheat genotypes (Miraj, Punjnad, and Aas) in our dataset supports the generalizability of our findings across varieties, making our methodology applicable to a broader range of agricultural settings. The high accuracy of 98.77% achieved by the RF model in wheat crop genotype classification underscores the effectiveness of the machine learning approach. This accuracy is particularly noteworthy as it provides a reliable means for distinguishing between genotypes. The successful application of diverse machine learning models, including support vector machine and logistic regression, in comparative analyses demonstrates the robustness of our methodology.

The use of k-fold cross-validation further strengthens the credibility of the results. The validation process checks the generalizability of the models by assessing their performance across various subsets of the dataset. The consistency of high accuracy values across folds substantiates the robustness and reliability of our classification approach.

Our focus on age prediction, a critical aspect of precision agriculture, adds a dimension of practicality to our research. The MLR model developed for age prediction explained 91% of the variability.
Such accurate age predictions can significantly contribute to timely and targeted agricultural interventions, optimizing resource management and improving overall crop yield.

4 CONCLUSIONS

This study demonstrated the effectiveness of multispectral radiometry and machine learning techniques for wheat crop genotype classification and age prediction. The data in this research were collected using a multispectral radiometer encompassing five bands: blue, green, red, near infrared (NIR), and SWNIR. Among the machine learning models (RF, SVM, and LR), RF excelled in wheat genotype classification, achieving an accuracy rate of 98.77%. The robustness of the classification model is validated through k-fold cross-validation. Furthermore, the model designed to predict additional phenotypic traits, including crop age, performed well: MLR predicted plant age from spectral features, achieving over 90% accuracy. Overall, this study establishes the potential of MSR5 spectral bands for estimating the age of wheat crop genotypes. This study can serve as a foundation for the development of a real-time monitoring system for wheat crops in high-throughput plant phenotyping facilities.

In the future, we will collect more dataset samples and expand the set of wheat genotypes. We will also develop an advanced neural network approach for effective wheat genotype classification. Additionally, we will utilize other sensors similar to MSR.

AUTHOR CONTRIBUTIONS
Mutiullah Jamil: Conceptualization. Zoha Ahsan: Conceptualization. Muhammad Nauman Saeed: Conceptualization. Ali Raza: Conceptualization. Hazem Migdady: Conceptualization. Mohammad Sh. Daoud: Conceptualization. Maryam Altalhi: Conceptualization. Absalom E. Ezugwu: Conceptualization; writing (review and editing). Laith Abualigah: Conceptualization.

ACKNOWLEDGMENTS
The authors would like to acknowledge Deanship of Graduate Studies and Scientific Research, Taif University for funding this work.

CONFLICT OF INTEREST STATEMENT
The authors declare no conflicts of interest.

ORCID
Laith Abualigah https://orcid.org/0000-0002-2203-4549

REFERENCES
Akhter, M. J., Sonderskov, M., Loddo, D., Ulber, L., Hull, R., & Kudsk, P. (2023). Opportunities and challenges for harvest weed seed control in European cropping systems. European Journal of Agronomy, 142, 126639.
Bakhsh, A., Hussain, A., & Khan, A. S. (2003). Genetic studies of plant height, yield and its components in bread wheat. Sarhad Journal of Agriculture, 19(4), 529–534.
Das, S., Christopher, J., Apan, A., Choudhury, M. R., Chapman, S., Menzies, N. W., & Dang, Y. P. (2021). Evaluation of water status of wheat genotypes to aid prediction of yield on sodic soils using UAV-thermal imaging and machine learning. Agricultural and Forest Meteorology, 307, 108477.
Fang, P., Zhang, X., Wei, P., Wang, Y., Zhang, H., Liu, F., & Zhao, J. (2020). The classification performance and mechanism of machine learning algorithms in winter wheat mapping using Sentinel-2 10 m resolution imagery. Applied Sciences, 10(15), 5075.
Han, S., Zhao, Y., Cheng, J., Zhao, F., Yang, H., Feng, H., Li, Z., Ma, X., Zhao, C., & Yang, G. (2022). Monitoring key wheat growth variables by integrating phenology and UAV multispectral imagery data into random forest model. Remote Sensing, 14(15), 3723.
Hao, J., & Ho, T. K. (2019). Machine learning made easy: A review of scikit-learn package in python programming language. Journal of Educational and Behavioral Statistics, 44(3), 348–361.
Hubert-Moy, L., Thibault, J., Fabre, E., Rozo, C., Arvor, D., Corpetti, T., & Rapinel, S. (2019). Time-series spectral dataset for croplands in France (2006–2017). Data in Brief, 27, 104810.
Jamil, M., Rehman, H., Saqlain Zaheer, M., Tariq, A., Iqbal, R., Hasnain, M. U., Majeed, A., Munir, A., Sabagh, A. E., Habib ur Rahman, M., Raza, A., Ali, M. A., & Elshikh, M. S. (2023). The use of Multispectral Radio-Meter (MSR5) data for wheat crop genotypes identification using machine learning models. Scientific Reports, 13(1), 19867.
Jamil, M., ul Rehman, H., SaleemUllah, Ashraf, I., & Ubaid, S. (2023). Smart techniques for LULC micro class classification using landsat8 imagery. Computers, Materials & Continua, 74(3), 5545–5557. https://doi.org/10.32604/cmc.2023.033449
Khan, S. U., Din, J. U., Qayyum, A., Jaan, N. E., & Jenks, M. A. (2015). Heat tolerance indicators in Pakistani wheat (Triticum aestivum L.) genotypes. Acta Botanica Croatica, 74(1), 109–121.
Nabwire, S., Wakholi, C., Faqeerzada, M. A., Arief, M. A. A., Kim, M. S., Baek, I., & Cho, B.-K. (2022). Estimation of cold stress, plant age, and number of leaves in watermelon plants using image analysis. Frontiers in Plant Science, 13, 847225.
Naser, M. A., Khosla, R., Longchamps, L., & Dahal, S. (2020). Using NDVI to differentiate wheat genotypes productivity under dryland and irrigated conditions. Remote Sensing, 12(5), 824.
Noh, J., Kim, J. M., Sheikh, S., Lee, S. G., Lim, J. H., Seong, M. H., & Jung, G. T. (2013). Effect of heat treatment around the fruit set region on growth and yield of watermelon [Citrullus lanatus (Thunb.) Matsum. and Nakai]. Physiology and Molecular Biology of Plants, 19, 509–514.
Panhwar, N. A., Mierzwa-Hersztek, M., Baloch, G. M., Soomro, Z. A., Sial, M. A., Demiraj, E., Panhwar, S. A., Afzal, A., & Lahori, A. H. (2021). Water stress affects the some morpho-physiological traits of twenty wheat (Triticum aestivum L.) genotypes under field condition. Sustainability, 13(24), 13736.
Qadri, S., Furqan Qadri, S., Husnain, M., Saad Missen, M. M., Khan, D. M., Muzammil-Ul-Rehman, Razzaq, A., & Ullah, S. (2019). Machine vision approach for classification of citrus leaves using fused features. International Journal of Food Properties, 22(1), 2072–2089.
Qadri, S., Khan, D. M., Ahmad, F., Qadri, S. F., Babar, M. E., Shahid, M., Ul-Rehman, M., Razzaq, A., Shah Muhammad, S., Fahad, M., Ahmad, S., Pervez, M. T., Naveed, N., Aslam, N., Jamil, M., Rehmani, E. A., Ahmad, N., & Akhtar Khan, N. (2016). A comparative study of land cover classification by using multispectral and texture data. BioMed Research International, 2016, 8797438. https://doi.org/10.1155/2016/8797438
Raoufi, R., Soufizadeh, S., Amiri Larijani, B., AghaAlikhani, M., & Kambouzia, J. (2018). Simulation of growth and yield of various irrigated rice (Oryza sativa L.) genotypes by AquaCrop under different seedling ages. Natural Resource Modeling, 31(2), e12162.
Raza, A., Munir, K., Almutairi, M. S., & Sehar, R. (2023). Novel class probability features for optimizing network attack detection with machine learning. IEEE Access, 11, 98685–98694. https://doi.org/10.1109/ACCESS.2023.3313596
Raza, A., Rustam, F., Mallampati, B., Gali, P., & Ashraf, I. (2023). Preventing crimes through gunshots recognition using novel feature engineering and meta-learning approach. IEEE Access, 11, 103115–103131. https://doi.org/10.1109/ACCESS.2023.3316695
Raza, A., Siddiqui, H. U. R., Munir, K., Almutairi, M., Rustam, F., & Ashraf, I. (2022). Ensemble learning-based feature engineering to analyze maternal health during pregnancy and health risk prediction. PLOS One, 17(11), e0276525.
Rehmani, E., Naweed, M., Shahid, M., Qadri, S., & Gilani, Z. (2015). A comparative study of crop classification by using radiometric and photographic data. Sindh University Research Journal (Science Series), 47(2), 335–340.
Sandhu, K., Patil, S. S., Pumphrey, M., & Carter, A. (2021). Multitrait machine-and deep-learning models for genomic selection using spectral information in a wheat breeding program. The Plant Genome, 14(3), e20119.
Shah, M. A. A., Mohsin, M., Chesneau, C., Zulfiqar, A., Jamal, F., Nadeem, K., & Sherwani, R. A. K. (2020). Analysis of factors affecting yield of agricultural crops in bahawalpur district: Analysis of factors of major agricultural crops. Proceedings of the Pakistan Academy of Sciences: A. Physical and Computational Sciences, 57(4), 99–112.
Sharma, B. P., Zhang, N., Lee, D., Heaton, E., Delucia, E. H., Sacks, E. J., Kantola, I. B., Boersma, N. N., Long, S. P., Voigt, T. B., & Khanna, M. (2022). Responsiveness of miscanthus and switchgrass yields to stand age and nitrogen fertilization: A meta-regression analysis. GCB Bioenergy, 14(5), 539–557.
Shekar, B., & Dagnew, G. (2019). Grid search-based hyperparameter tuning and classification of microarray cancer data. In 2019 second international conference on advanced computational and communication paradigms (ICACCP) (pp. 1–8). IEEE.
Wyatt, J. (2016). Grain and plant morphology of cereals and how characters can be used to identify varieties. Encyclopedia of Food Grains (Second Edition), 1, 51–72.
Yang, J., Spicer, R. A., Spicer, T. E., Arens, N. C., Jacques, F. M., Su, T., Kennedy, E. M., Herman, A. B., Steart, D. C., Srivastava, G., Mehrotra, R. C., Valdes, P. J., Mehrotra, N. C., Zhou, Z.-K., & Lai, J.-S. (2015). Leaf form–climate relationships on the global stage: An ensemble of characters. Global Ecology and Biogeography, 24(10), 1113–1125.
Zahra, N., Hafeez, M. B., Wahid, A., Al Masruri, M. H., Ullah, A., Siddique, K. H., & Farooq, M. (2023). Impact of climate change on wheat grain composition and quality. Journal of the Science of Food and Agriculture, 103(6), 2745–2751.
Zhang, X., Liu, K., Wang, S., Long, X., & Li, X. (2021). A rapid model (COV_PSDI) for winter wheat mapping in fallow rotation area using MODIS NDVI time-series satellite observations: The case of the Heilonggang region. Remote Sensing, 13(23), 4870.

How to cite this article: Jamil, M., Ahsan, Z., Saeed, M. N., Raza, A., Migdady, H., Daoud, M. S., Altalhi, M., Ezugwu, A. E., & Abualigah, L. (2024). Wheat crop genotype and age prediction using machine learning with multispectral radiometer sensor data. Agronomy Journal, 116, 1643–1654. https://doi.org/10.1002/agj2.21595