International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 09 Issue: 02 | Feb 2022 www.irjet.net p-ISSN: 2395-0072
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 820
A Hybrid Data Clustering Approach using K-Means and Simplex
Method-based Bacterial Colony Optimization
S. Suresh Babu1 and K. Jayasudha2
1,2 Department of Computer Science, Thiruvalluvar Government Arts College, Rasipuram, Tamilnadu, India
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract - Clustering is a common data mining and data
analysis tool. K-means is a popular clustering approach in
which the data is partitioned into K clusters. The k-means
method, on the other hand, is highly dependent on the initial
state and eventually converges to a local optimum solution.
A new hybrid algorithm is proposed in this paper using a k-
means algorithm combined with simplex method-based
bacterial colony optimization (SMBCO+KM) for finding
more efficient groups. The main aim of the hybrid
approach is to enhance the clustering quality by utilizing
the benefits of both algorithms. The suggested approach
outperforms other algorithms according to simulation
findings.
Key Words: Bacterial colony optimization, simplex
method, k-means, convergence rate, data clustering
1. INTRODUCTION
Data clustering is the task of gathering information into
clusters (classes) so that the data in each group has a high
similarity while being considerably different from data in
other clusters [1]. The three-decade-old k-means
technique is among the most widely used partitional
clustering algorithms in a range of areas. For continuous data, the k-means algorithm reaches a satisfactory solution only when the initial partition is close to the final solution. To put it another way, the outcome of k-means depends heavily on the initial state, and the algorithm tends to converge to a locally optimal solution. Many studies in clustering have been conducted to try to overcome this problem.
For example, Y.-T. Kao et al. (2008) presented a hybrid method that combines simplex search, K-means, and particle swarm optimization [2]. X. Geng et al. (2019) developed a hybrid method based on k-means and agglomerative nesting (AGNES) for topic detection [3]. M. A. El-Shorbagy et al. (2021) developed an algorithm that combines the exploration ability of the grasshopper optimization algorithm (GOA) with the exploitation ability of local search (LS), incorporating the advantages of both [4]. P. Padmavathi et al. presented a fuzzy clustering method based on social spider optimization (FSSO) (2018) [5] and a hybrid method for fuzzy clustering (2021) [6]. K. Vijayakumari et al. (2021) presented a hybrid method based on FBCO and the fuzzy c-means algorithm for fuzzy clustering [7].
BCO is a widely used and well-known algorithm that has been applied in several fields of real-time applications. Its most significant benefit is the ability of individuals to share information with one another via a communication process. Individual exchanges and group exchanges are the two kinds of communication mechanisms introduced in the BCO algorithm, and the communication process is utilized to improve the quality of the solutions obtained. When tackling a data clustering problem, however, standard BCO has a slow convergence rate and a long calculation time [8], because the BCO clustering technique relies on internal iterations to achieve a significant clustering result.
Moreover, traditional SI methods require a strong exploitation ability, and a lack of it can lead to premature convergence and increased calculation time [9]. Individual algorithms have their own advantages and disadvantages [10]. Hence, combining two methods provides an alternative way of overcoming the limitations of the individual algorithms [11]. Various SI techniques have been integrated with k-means to improve clustering quality and overcome the shortcomings of a single algorithm. Recently, Revathi et al. (2021) developed a hybrid method based on BCO and k-means to enhance performance [12]. However, conventional BCO still has drawbacks, including a slow convergence rate and a limited ability to refine solutions locally.
In this research, we propose a hybrid technique for handling clustering problems that combines SMBCO and k-means. SMBCO uses the simplex method to enhance the performance of BCO. The main aims of the proposed SMBCO+KM are to enhance the search ability for both local and global solutions, to improve the convergence rate, and to avoid the local optima problem.
The suggested hybrid algorithm's goals are to improve the
accuracy of the clustering problem while eliminating the
flaws of both algorithms. The BCO is utilized to search the
complete space for the global optimum in the suggested
hybrid algorithm. When the BCO method obtains a
solution close to the optimum solution, the process is
switched to the k-means algorithm to generate more
precise and similar groupings. The paper's contributions are as follows:
• The SMBCO+KM algorithm is proposed to solve the data clustering problem.
• The k-means algorithm is integrated with BCO, and BCO is enhanced by the simplex method, to produce more similar data partitions.
• The proposed hybrid SMBCO+KM utilizes the benefits of both algorithms to overcome their individual shortcomings.
• The strength of the SMBCO+KM is evaluated on six well-known UCI datasets.
• The objective function and the computation time are used to analyse the strength of SMBCO+KM.
• The performance of the SMBCO+KM method is compared with several benchmark algorithms.
2. DATA CLUSTERING
The data samples (patterns) $X = \{x_1, x_2, \ldots, x_N\}$ are to be grouped: the $N$ data patterns are partitioned into $K$ clusters $C = (C_1, C_2, \ldots, C_K)$. The data clustering problem must meet the following requirements:

$$
\begin{cases}
C_i \neq \emptyset, & i = 1, 2, \ldots, K; \\
C_i \cap C_j = \emptyset, & i \neq j,\; i, j = 1, 2, \ldots, K; \\
\bigcup_{i=1}^{K} C_i = X
\end{cases}
\qquad (1)
$$
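As an illustration, the following minimal sketch checks the requirements of Equation (1) for a candidate assignment, assuming the partition is stored as a label vector (a representation chosen here for illustration, not prescribed by the paper):

```python
import numpy as np

def is_valid_partition(labels, n_samples, K):
    """Check the constraints of Equation (1) for a label-based partition.

    Storing labels[i] in {0, ..., K-1} for every sample makes the clusters
    disjoint and makes their union cover X by construction, so the remaining
    conditions are that every sample is assigned and no cluster is empty.
    """
    labels = np.asarray(labels)
    if labels.shape[0] != n_samples or labels.min() < 0 or labels.max() >= K:
        return False
    counts = np.bincount(labels, minlength=K)
    return bool(np.all(counts > 0))        # C_i != empty for i = 1, ..., K

# Example: 6 samples, 3 clusters
print(is_valid_partition([0, 1, 2, 0, 1, 2], n_samples=6, K=3))  # True
print(is_valid_partition([0, 0, 0, 0, 1, 1], n_samples=6, K=3))  # False: cluster 2 is empty
```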
Hence, the clustering problem can be formulated as follows:

$$
C_i = \left\{\, x_j \in X \;:\; \left\| x_j - z_i \right\| \le \left\| x_j - z_p \right\|,\; p = 1, 2, \ldots, K,\; p \neq i \,\right\}, \quad i = 1, 2, \ldots, K
\qquad (2)
$$
where $\|\cdot\|$ represents the distance between two given data samples and $z_i$ denotes the center of cluster $C_i$. Hence, the major goal of clustering approaches is to reduce the sum of squared errors (SSE) [13], which can be defined as follows:
$$
SSE = \sum_{j=1}^{K} \sum_{x_i \in C_j} \left\| x_i - z_j \right\|^{2}
\qquad (3)
$$
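A minimal sketch of Equation (3), assuming the data are held in a NumPy array and the partition is given as a label vector together with the cluster centers (an illustrative representation, not the authors' code):

```python
import numpy as np

def sse(X, labels, centers):
    """Sum of squared errors of Equation (3): for every cluster C_j, add the
    squared Euclidean distance between each member x_i and its center z_j."""
    diffs = X - centers[labels]            # x_i - z_j, where j is the cluster of x_i
    return float(np.sum(diffs ** 2))

# Example with two tiny clusters
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
labels = np.array([0, 0, 1, 1])
centers = np.array([[0.0, 0.5], [5.5, 5.0]])
print(sse(X, labels, centers))             # 0.25 + 0.25 + 0.25 + 0.25 = 1.0
```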
3. K-MEANS ALGORITHM
The unsupervised k-means methodology is a distinguished method for handling the clustering problem. It is a partitional clustering procedure that is basic, straightforward, and computationally cheap [14]. The algorithm starts with cluster center values chosen at random. Each data point is then assigned to the closest cluster center, where the similarity of data points is measured using distance values. The Euclidean distance is an extensively used similarity measure. The distance function is defined as follows:
$$
D(x_p, z_j) = \sqrt{\sum_{i=1}^{d} \left( x_{pi} - z_{ji} \right)^{2}}
\qquad (4)
$$
Here, $x_p$ represents the $p$-th data sample, $z_j$ represents the $j$-th cluster center, and $d$ is the number of data features.
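The assignment rule of Equation (2), evaluated with the Euclidean distance of Equation (4), can be sketched in vectorized form as follows (a minimal illustration under the same array-based representation as in the earlier sketch):

```python
import numpy as np

def assign_to_nearest_center(X, centers):
    """Assign each sample x_p to the cluster whose center z_j minimizes the
    Euclidean distance D(x_p, z_j) of Equation (4), i.e. the rule of Equation (2)."""
    # Pairwise distances, shape (n_samples, K)
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

# Example: the two samples are assigned to the nearer of the two centers
X = np.array([[0.2, 0.1], [4.9, 5.2]])
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
print(assign_to_nearest_center(X, centers))   # [0 1]
```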
The cluster center is then updated using the mean value of the data samples belonging to the corresponding cluster:

$$
z_j = \frac{1}{n_j} \sum_{x_p \in C_j} x_p
\qquad (5)
$$
Here, $n_j$ denotes the number of data objects associated with cluster $j$, and $C_j$ denotes a subset of the partition $C$. The k-means clustering method terminates when one of the following conditions is met: the maximum number of iterations is reached, or no cluster membership changes. The k-means clustering method is depicted in Algorithm 1, and a minimal code sketch is given below.
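Putting Equations (4) and (5) and the two termination conditions together, Algorithm 1 can be sketched as follows. This is a minimal illustration rather than the authors' implementation; the optional `init` argument is added so the same routine can later be seeded by SMBCO.

```python
import numpy as np

def kmeans(X, K, init=None, max_iter=100, seed=0):
    """Basic k-means (Algorithm 1): random initial centers unless `init` is
    given, nearest-center assignment (Equation (4)), mean-based center update
    (Equation (5)); stops at max_iter or when no cluster membership changes."""
    rng = np.random.default_rng(seed)
    if init is None:
        centers = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    else:
        centers = np.array(init, dtype=float)
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = np.argmin(dists, axis=1)
        if np.array_equal(new_labels, labels):
            break                               # no membership change
        labels = new_labels
        for j in range(K):
            members = X[labels == j]
            if len(members) > 0:                # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers, labels
```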
4. SIMPLEX METHOD
The simplex approach was introduced by Spendley et al. (1962). It is defined by a set of points whose number is one greater than the dimension of the search space. The simplex approach provides a number of advantages, including a rapid search speed, a small amount of computation, and a strong local search ability [15, 16]. The detailed procedure of the SM is outlined below.
Step 1: Evaluate all of the solutions (bacteria) in the population. Select the global best $X_g$ and the second best $X_b$, and let $X_s$ be the solution to be updated, with $f(X_g)$, $f(X_b)$ and $f(X_s)$ the associated fitness values.

Step 2: Compute the middle point $X_c$ of $X_g$ and $X_b$ using the formula below:

$$
X_c = \frac{X_g + X_b}{2}
\qquad (6)
$$
Algorithm 1: K-means algorithm
Step 1: Choose k cluster centroid values at random.
Step 2: Calculate the distances (Equation (4)).
Step 3: Update the cluster centroid values (Equation (5)).
Step 4: Check the termination condition. If it is not met, go to Step 2; otherwise, terminate the process.
Step 3: Find the reflection point $X_r$ using the formula below; the reflection coefficient $\alpha$ is typically set to 1.

$$
X_r = X_c + \alpha \left( X_c - X_s \right)
\qquad (7)
$$

Step 4: Compare the fitness values $f(X_r)$ and $f(X_g)$. If $f(X_r) < f(X_g)$, the extension operation is executed using the following equation:

$$
X_e = X_c + \gamma \left( X_r - X_c \right)
\qquad (8)
$$

where $\gamma$ denotes the extension coefficient, which is usually set to 2. After that, compare the fitness of the extension point $X_e$ with that of the global best $X_g$. If $f(X_e) < f(X_g)$, then $X_s$ is replaced by $X_e$; else $X_r$ is used instead of $X_s$.
Step 5: Compare the fitness values of $X_s$ and $X_r$. If $f(X_r)$ is greater than $f(X_s)$, the compression operation is conducted using the following formula:

$$
X_t = X_c + \beta \left( X_s - X_c \right)
\qquad (9)
$$

where the compression coefficient $\beta$ is commonly set to 0.5. Then the compression point $X_t$ is compared with the point $X_s$. If $f(X_t) < f(X_s)$, then $X_s$ is replaced by $X_t$; else $X_r$ is used instead of $X_s$.
Step 6: The shrink operation is performed to obtain the shrink point $X_w$ when $f(X_g) < f(X_r) < f(X_s)$. It is defined as follows:

$$
X_w = X_c - \sigma \left( X_s - X_c \right)
\qquad (10)
$$

Here, $\sigma$ is the shrink coefficient. If $f(X_w) < f(X_s)$, then $X_s$ is replaced by $X_w$; else $X_r$ is used instead of $X_s$.
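The sequence of Equations (6)-(10) amounts to one simplex-style update of the selected solution $X_s$. The sketch below assumes a minimization objective and the coefficient values stated above ($\alpha = 1$, $\gamma = 2$, $\beta = 0.5$); the shrink coefficient value and the exact tie-breaking of the comparisons are assumptions, since the paper does not fix them.

```python
import numpy as np

def simplex_update(x_g, x_b, x_s, f, alpha=1.0, gamma=2.0, beta=0.5, sigma=0.5):
    """One simplex-method update of x_s (Equations (6)-(10)), for minimization.

    x_g, x_b : global best and second-best solutions (NumPy vectors)
    x_s      : the solution to be improved
    f        : objective (fitness) function
    sigma    : shrink coefficient (value assumed, not stated in the paper)
    """
    x_c = (x_g + x_b) / 2.0                    # Eq. (6): middle point
    x_r = x_c + alpha * (x_c - x_s)            # Eq. (7): reflection
    if f(x_r) < f(x_g):                        # reflection beats the global best
        x_e = x_c + gamma * (x_r - x_c)        # Eq. (8): extension
        return x_e if f(x_e) < f(x_g) else x_r
    if f(x_r) > f(x_s):                        # reflection worse than x_s
        x_t = x_c + beta * (x_s - x_c)         # Eq. (9): compression
        return x_t if f(x_t) < f(x_s) else x_r
    x_w = x_c - sigma * (x_s - x_c)            # Eq. (10): shrink, f(x_g) <= f(x_r) <= f(x_s)
    return x_w if f(x_w) < f(x_s) else x_r

# Example: improve the worst of three points on a simple quadratic objective
f = lambda x: float(np.sum(x ** 2))
x_g, x_b, x_s = np.array([0.1, 0.0]), np.array([0.5, 0.5]), np.array([2.0, 2.0])
print(simplex_update(x_g, x_b, x_s, f))
```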
5. BACTERIAL COLONY OPTIMIZATION
BCO is a recent optimization algorithm developed by Niu and Wang (2012) [17]. In comparison with other bacteria-inspired algorithms such as BFO [18] and BC [19], the BCO algorithm searches for nutrients by exchanging information between individuals, a process known as communication. Chemotaxis, communication, elimination, reproduction, and migration are the five primary phases of BCO. Algorithm 2 shows the BCO algorithm with the simplex method.
The chemotaxis process is carried out in two different ways: running and tumbling. The goal of the running process is to improve the efficiency of convergence, while the purpose of the tumbling procedure is to avoid problems with local optima. The communication phase is the most important task in BCO. Two types of processes are used in communication: dynamic neighbor-oriented (randomly oriented study) and group-oriented exchange. The communication procedure is adopted to improve the search capability while also lowering computational cost and preventing premature convergence. The tumbling process can be expressed as follows:
$$
Position_i(T) = Position_i(T-1) + C(i)\left[\, f_i \left( G_{best} - Position_i(T-1) \right) + (1 - f_i)\left( P_{best_i} - Position_i(T-1) \right) + turb_i \,\right]
\qquad (11)
$$

The running process is performed as follows:

$$
Position_i(T) = Position_i(T-1) + C(i)\left[\, f_i \left( G_{best} - Position_i(T-1) \right) + (1 - f_i)\left( P_{best_i} - Position_i(T-1) \right) \right]
\qquad (12)
$$

$$
C(i) = C_{min} + \frac{Iter_{max} - Iter_j}{Iter_{max}} \left( C_{max} - C_{min} \right)
\qquad (13)
$$

where $turb_i$ is the turbulent direction variance, $C(i)$ is the chemotaxis step size, $f_i \in (0,1)$ is a random value, $G_{best}$ is the global best value, $P_{best_i}$ is the personal (local) best, $Iter_{max}$ is the maximum number of iterations, and $Iter_j$ is the current iteration.
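A minimal sketch of the tumbling and running updates of Equations (11)-(13) for a single bacterium. The position and the two best vectors are NumPy arrays; drawing $turb_i$ from a standard normal distribution is an assumption, since the paper only names a turbulence term.

```python
import numpy as np

def chemotaxis_step_size(iter_j, iter_max, c_min=0.01, c_max=0.2):
    """Linearly decreasing chemotaxis step size C(i), Equation (13)."""
    return c_min + ((iter_max - iter_j) / iter_max) * (c_max - c_min)

def tumble(position, g_best, p_best, c_i, rng):
    """Tumbling, Equation (11): move toward a random blend of the global and
    personal bests, plus a turbulence term to escape local optima."""
    f_i = rng.random()                           # f_i in (0, 1)
    turb = rng.standard_normal(position.shape)   # turbulent direction (assumed distribution)
    return position + c_i * (f_i * (g_best - position)
                             + (1.0 - f_i) * (p_best - position)
                             + turb)

def run(position, g_best, p_best, c_i, rng):
    """Running, Equation (12): the same move without the turbulence term."""
    f_i = rng.random()
    return position + c_i * (f_i * (g_best - position)
                             + (1.0 - f_i) * (p_best - position))

# Example of one tumbling move at iteration 10 of 100
rng = np.random.default_rng(0)
pos = np.zeros(4)
print(tumble(pos, g_best=np.ones(4), p_best=np.full(4, 0.5),
             c_i=chemotaxis_step_size(10, 100), rng=rng))
```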
6. PROPOSED SMBCO+KM
The proposed SMBCO+KM combines three algorithms: k-means, BCO, and the simplex method. The k-means method is a well-known, fast clustering algorithm.
Algorithm 2: BCO for clustering
Step 1: Initialize each bacterial colony.
Step 2: Perform the chemotaxis and communication process.
Step 3: Calculate each colony's cluster center.
Step 4: Calculate the distance between the data samples and the cluster centers.
Step 5: Perform the reproduction and elimination process.
Step 6: Perform the migration process.
Step 7: Update the bacterial colony using the simplex method.
Step 8: If the termination condition is not met, go to Step 2.
However, k-means can produce low accuracy and fall into local optima. On the other hand, BCO is a well-known swarm intelligence global optimization algorithm that produces high accuracy, but its convergence rate is low and it is weak at refining solutions locally. The simplex method is a well-known local search method that is used to enhance the performance of various optimization algorithms. Hence, the proposed method uses the merits of these three algorithms to obtain a more efficient solution and to overcome their shortcomings. In the proposed SMBCO+KM, the search ability of BCO is first enhanced by the simplex method. Then, the result of SMBCO is used as the initial condition of the k-means algorithm. The step-by-step procedure of the hybrid SMBCO+KM is given in Algorithm 3.
7. EXPERIMENTAL RESULTS AND ANALYSIS
The experiments are conducted using MATLAB. The strength of the proposed SMBCO+KM is evaluated on six different prominent UCI datasets, and its performance is assessed using the objective function. The proposed SMBCO+KM is compared with several benchmark algorithms, namely k-means [20], PSO [21], BFO [22], BCO [23], and SMBCO.
7.1 Datasets collections
The developed and compared methods are applied to six
different datasets to obtain experimental results. The
datasets were retrieved from the UCI machine learning
database, the details of which are shown in Table 1 and
discussed as follows,
• From 214 data samples, the Glass dataset is divided into six groups, each with nine attributes.
• The 303 data samples in the Heart dataset are divided into two unique classes based on six attributes.
• Fisher's Iris is a collection of 150 samples divided into three groups with four attributes each.
• There are 871 data samples in Vowel, which are divided into six types based on three features.
• There are 178 data samples in the Wine dataset, which are divided into three classes with thirteen attributes each.
• Wisconsin breast cancer (WBC) has 683 samples that are divided into two classes based on nine features.
7.2 Parameter settings
The best parameter settings can produce more efficient outcomes for the given problems. The following parameter settings are considered in this research. Choosing a high value for the chemotaxis step leads to high computation time; hence, this work selects the number of chemotaxis steps as $N_C = 100$. The swim step is set to $N_s = 4$, and the reproduction value is $N_{re} = 4$. The lowest step length value $C_{min}$ is 0.01.
Algorithm 3: Proposed hybrid SMBCO+KM
Begin hybrid method
Step 1: Initialize the population and the required parameters.
Step 2: Begin BCO
Step 2.1: Initialize each bacterial colony.
Step 2.2: Perform the chemotaxis and communication process.
Step 2.3: Calculate each colony's cluster center.
Step 2.4: Calculate the distance between the data samples and the cluster centers.
Step 2.5: Perform the reproduction and elimination process.
Step 2.6: Perform the migration process.
Step 2.7: Update the bacterial colony using the simplex method.
Step 2.8: If the termination condition is not met, go to Step 2.1.
End BCO
Step 3: Begin K-means
Step 3.1: Assign the best solution of SMBCO as the initial cluster centers.
Step 3.2: Calculate the distances (Equation (4)).
Step 3.3: Update the cluster centroid values (Equation (5)).
Step 3.4: If the termination condition is not met, go to Step 3.2; otherwise stop.
End K-means
Step 4: If the overall termination condition is not met, go to Step 2; otherwise go to Step 5.
Step 5: Store the final cluster centers as the best solution.
Step 6: End
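The control flow of Algorithm 3 can be outlined as below. `smbco_search` is a hypothetical placeholder for the simplex-enhanced BCO of Steps 2.1-2.8, and `kmeans` is the routine sketched in Section 3; this is a sketch of how the two stages hand over the cluster centers, not the authors' implementation.

```python
import numpy as np

def smbco_km(X, K, smbco_search, kmeans, max_rounds=1):
    """Hybrid SMBCO+KM outline (Algorithm 3): the simplex-enhanced BCO supplies
    candidate cluster centers, which then seed k-means for local refinement;
    the best cluster-center vector found over the rounds is stored."""
    best_centers, best_sse = None, np.inf
    for _ in range(max_rounds):
        centers = smbco_search(X, K)                    # global search (Steps 2.1-2.8)
        centers, labels = kmeans(X, K, init=centers)    # local refinement (Steps 3.1-3.4)
        current_sse = float(np.sum((X - centers[labels]) ** 2))
        if current_sse < best_sse:                      # keep the best solution so far
            best_centers, best_sse = centers, current_sse
    return best_centers, best_sse
```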
Table 1: Description of datasets
Datasets Instances Features Clusters
Glass 214 9 6
Heart 303 76 2
Iris 150 4 3
Vowel 871 3 6
WBC 683 9 2
Wine 178 13 3
The highest step length $C_{max}$ is 0.2. The probability value in the elimination and dispersal step is 0.25.
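For reference, the settings above can be collected into a single configuration; the dictionary keys are illustrative names, while the values are those reported in this section.

```python
# Illustrative parameter names; the values are the ones stated in Section 7.2.
BCO_PARAMS = {
    "n_chemotaxis": 100,    # N_C: number of chemotaxis steps
    "n_swim": 4,            # N_s: swim steps
    "n_reproduction": 4,    # N_re: reproduction value
    "c_min": 0.01,          # lowest chemotaxis step length
    "c_max": 0.2,           # highest chemotaxis step length
    "p_eliminate": 0.25,    # elimination-and-dispersal probability
}
```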
7.3 Performance indicators
The performance of the developed algorithms is investigated with the help of a performance indicator, namely the objective function. The goal of the objective function is to minimize the distance between the data objects and the corresponding cluster centers. The objective function values are reported in three categories: best, mean, and worst. The lowest value is regarded as the best.
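Assuming the Best, Mean, Worst, and SD columns of Table 2 are computed over repeated independent runs of each algorithm (the number of runs is not stated in the paper), the summary statistics can be obtained as follows:

```python
import numpy as np

def summarize_objective(values):
    """Best (lowest), mean, worst (highest) and standard deviation of the
    objective values collected over repeated runs, as reported in Table 2."""
    v = np.asarray(values, dtype=float)
    return {"best": v.min(), "mean": v.mean(),
            "worst": v.max(), "sd": v.std(ddof=1)}   # sample SD (convention assumed)

# Example with hypothetical objective values from five runs
print(summarize_objective([85.8, 80.9, 73.8, 79.2, 84.1]))
```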
7.4 Discussions
Table 2 shows the performance comparisons of the developed data clustering algorithms. To analyze their performance, this research work considers six benchmark datasets: Glass, Heart, Iris, WBC, Wine, and Vowel. The quality of the clustering algorithms is analyzed using the objective function, the standard deviation, and the computational time.

Table 2 reports the performance of the developed algorithms based on the objective function, where a lower objective value is considered the better performance. For example, on the Iris dataset the proposed SMBCO+KM algorithm produces the lowest objective values (a worst value of 85.80) and takes the lowest computation time to convergence (3.8712 s) compared with the other algorithms.
Table 2: Comparative analysis results of the objective values and computation time
Datasets Techniques Best Mean Worst SD Time (s)
Glass
K-Means 244.85 253.53 260.55 4.32 0.0396
PSO 227.31 234.35 240.63 3.96 15.084
BFO 219.31 225.31 231.48 3.65 12.169
BCO 208.34 214.64 219.78 3.14 10.823
SMBCO 197.04 204.13 207.17 2.81 9.858
SMBCO+KM 187.98 188.71 190.10 1.17 8.926
Heart
K-Means 4530.98 5089.41 5406.96 230.50 0.0802
PSO 4485.06 4757.32 5049.54 172.89 19.437
BFO 4321.44 4607.83 4772.54 118.68 18.028
BCO 4213.12 4393.84 4497.06 72.24 16.792
SMBCO 4130.36 4245.05 4347.92 61.27 14.819
SMBCO+KM 4097.41 4158.93 4197.50 26.59 12.951
Iris
K-Means 145.58 197.83 227.86 21.55 0.0628
PSO 91.37 100.11 109.77 5.61 5.6598
BFO 88.45 96.61 104.73 4.63 5.1481
BCO 80.83 88.76 95.66 4.09 4.5180
SMBCO 76.78 84.30 90.71 3.75 4.1381
SMBCO+KM 73.77 80.93 85.80 3.29 3.8712
WBC
K-Means 2516.24 2926.13 3279.84 212.01 0.0926
PSO 2331.89 2679.86 2984.67 191.81 16.184
BFO 2232.43 2620.73 2884.57 184.39 15.118
BCO 2139.69 2342.74 2499.02 93.69 13.902
SMBCO 2055.70 2119.54 2193.51 42.66 11.362
SMBCO+KM 1970.26 2036.99 2095.03 36.15 10.752
Wine
K-Means 18791.69 19824.04 20695.52 529.96 0.08726
PSO 17989.09 18520.85 18999.75 313.10 14.3623
BFO 16848.34 17610.17 17890.36 254.06 11.7212
BCO 16536.52 16957.24 17181.56 176.14 7.2810
SMBCO 16383.72 16561.36 16891.29 154.77 6.6482
SMBCO+KM 16268.65 16411.67 16573.67 92.22 5.9261
Vowel
K-Means 134227.58 148569.91 161084.74 7610.08 0.0820
PSO 129623.26 140510.24 147086.88 4582.54 17.871
BFO 126995.69 133261.81 139473.93 3450.50 16.108
BCO 121231.89 125020.95 129747.96 2718.64 13.581
SMBCO 115778.54 119875.35 121849.42 1555.52 12.984
SMBCO+KM 114714.65 116077.57 117810.54 924.57 11.821
According to Table 2, the proposed SMBCO+KM achieves higher performance than the other compared algorithms. Figure 1 compares the computational time of SMBCO and the proposed SMBCO+KM, and Figure 2 shows the convergence rate of the SMBCO and SMBCO+KM algorithms.
8. CONCLUSIONS
The well-known clustering methods k-means and BCO each have their benefits and drawbacks. For the data clustering problem, this article developed a new hybrid method based on k-means and simplex-method-based BCO. The performance analysis was carried out on six datasets against five well-known techniques. When solving clustering problems, the proposed hybrid technique obtains the best solution by determining the best cluster center vector for each bacterial individual. Compared with the existing methods, the experimental findings showed that the proposed SMBCO+KM method offers the best solution.
REFERENCES
[1] S. S. Babu and K. Jayasudha, "A survey of nature-
inspired algorithm for partitional data clustering,"
in Journal of Physics: Conference Series, 2020, vol.
1706, no. 1: IOP Publishing, p. 012163.
[2] Y.-T. Kao, E. Zahara, and I.-W. Kao, "A hybridized
approach to data clustering," Expert Systems with
Applications, vol. 34, no. 3, pp. 1754-1762, 2008.
[3] X. Geng, Y. Zhang, Y. Jiao, and Y. Mei, "A novel
hybrid clustering algorithm for topic detection on
chinese microblogging," IEEE Transactions on
Computational Social Systems, vol. 6, no. 2, pp.
289-300, 2019.
[4] M. A. El-Shorbagy and A. Ayoub, "Integrating
grasshopper optimization algorithm with local
search for solving data clustering problems,"
International Journal of Computational
Intelligence Systems, vol. 14, no. 1, pp. 783-793,
2021.
Figure 1: Performance comparisons based on computational time
Figure 2: Convergence rate of SMBCO and SMBCO+KM
[5] P. Padmavathi, V. Eswaramurthy, and J. Revathi,
"Fuzzy social spider optimization algorithm for
fuzzy clustering analysis," in 2018 International
Conference on Current Trends towards
Converging Technologies (ICCTCT), 2018: IEEE,
pp. 1-6.
[6] P. Padmavathi, V. Eswaramurthy, and J. Revathi,
"Hybridization of Fuzzy C-Means and Fuzzy Social
Spider Optimization for Clustering," in Advances
in Electrical and Computer Technologies:
Springer, 2021, pp. 179-187.
[7] K. Vijayakumari and V. Baby Deepa, "Fuzzy C-
Means Hybrid with Fuzzy Bacterial Colony
Optimization," in Advances in Electrical and
Computer Technologies: Springer, 2021, pp. 75-
87.
[8] K. Tamilarisi, M. Gogulkumar, and K. Velusamy,
"Data clustering using bacterial colony
optimization with particle swarm optimization,"
in 2021 Fourth International Conference on
Electrical, Computer and Communication
Technologies (ICECCT), 2021: IEEE, pp. 1-5.
[9] K. Tamilarasi, M. Gogulkumar, and K. Velusamy,
"Enhancing the performance of social spider
optimization with neighbourhood attraction
algorithm," in Journal of Physics: Conference
Series, 2021, vol. 1767, no. 1: IOP Publishing, p.
012017.
[10] T. Niknam and B. Amiri, "An efficient hybrid
approach based on PSO, ACO and k-means for
cluster analysis," Applied soft computing, vol. 10,
no. 1, pp. 183-197, 2010.
[11] J. Ji, H. Xiao, and C. Yang, "HFADE-FMD: a hybrid
approach of fireworks algorithm and differential
evolution strategies for functional module
detection in protein-protein interaction
networks," Applied Intelligence, pp. 1-15, 2020.
[12] J. Revathi, V. Eswaramurthy, and P. Padmavathi,
"Hybrid data clustering approaches using
bacterial colony optimization and k-means," in
IOP Conference Series: Materials Science and
Engineering, 2021, vol. 1070, no. 1: IOP
Publishing, p. 012064.
[13] A. Abraham, S. Das, and S. Roy, "Swarm
intelligence algorithms for data clustering," in Soft
computing for knowledge discovery and data
mining: Springer, 2008, pp. 279-313.
[14] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D.
Piatko, R. Silverman, and A. Y. Wu, "An efficient k-
means clustering algorithm: Analysis and
implementation," IEEE transactions on pattern
analysis and machine intelligence, vol. 24, no. 7,
pp. 881-892, 2002.
[15] J. A. Nelder and R. Mead, "A simplex method for
function minimization," The computer journal,
vol. 7, no. 4, pp. 308-313, 1965.
[16] Y. Zhou, Y. Zhou, Q. Luo, and M. Abdel-Basset, "A
simplex method-based social spider optimization
algorithm for clustering analysis," Engineering
Applications of Artificial Intelligence, vol. 64, pp.
67-82, 2017.
[17] B. Niu and H. Wang, "Bacterial colony
optimization," Discrete Dynamics in Nature and
Society, vol. 2012, 2012.
[18] K. M. Passino, "Biomimicry of bacterial foraging
for distributed optimization and control," IEEE
control systems magazine, vol. 22, no. 3, pp. 52-
67, 2002.
[19] S. D. Muller, J. Marchetto, S. Airaghi, and P.
Koumoutsakos, "Optimization based on bacterial
chemotaxis," IEEE Transactions on Evolutionary
Computation, vol. 6, no. 1, pp. 16-29, 2002, doi:
10.1109/4235.985689.
[20] A. Likas, N. Vlassis, and J. J. Verbeek, "The global k-
means clustering algorithm," Pattern recognition,
vol. 36, no. 2, pp. 451-461, 2003.
[21] I. De Falco, A. Della Cioppa, and E. Tarantino,
"Facing classification problems with particle
swarm optimization," Applied Soft Computing,
vol. 7, no. 3, pp. 652-658, 2007.
[22] M. Wan, L. Li, J. Xiao, C. Wang, and Y. Yang, "Data
clustering using bacterial foraging optimization,"
Journal of Intelligent Information Systems, vol. 38,
no. 2, pp. 321-341, 2012.
[23] J. Revathi, V. Eswaramurthy, and P. Padmavathi,
"Bacterial colony optimization for data
clustering," in 2019 IEEE International
Conference on Electrical, Computer and
Communication Technologies (ICECCT), 2019:
IEEE, pp. 1-4.