IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
Volume: 03 Issue: 10 | Oct-2014, Available @ http://www.ijret.org
PARALLEL kNN ON GPU ARCHITECTURE USING OpenCL
V. B. Nikam1, B. B. Meshram2
1Associate Professor, Department of Computer Engineering and Information Technology, Veermata Jijabai Technological Institute, Matunga, Mumbai, Maharashtra, India
2Professor, Department of Computer Engineering and Information Technology, Veermata Jijabai Technological Institute, Matunga, Mumbai, Maharashtra, India
Abstract
In data mining applications, one of the most useful algorithms for classification is the kNN algorithm. The kNN search is widely used in many research and industrial domains, such as 3-dimensional object rendering, content-based image retrieval, statistics, and biology (gene classification). In spite of some improvements in recent decades, the computation time required by the kNN search remains the bottleneck for kNN classification, especially in high dimensional spaces. This bottleneck has created the need for a parallel kNN on commodity hardware. The GPU, programmed through OpenCL, is a low cost, high performance solution for parallelising the kNN classifier. Accordingly, we have designed and implemented a parallel kNN model to address the performance bottleneck of the kNN algorithm.
In this paper, we propose a parallel kNN algorithm on the GPU using the OpenCL framework. In our approach, the distance computations for the data points are distributed among all GPU cores, with multiple threads invoked per GPU core. We have implemented and tested our parallel kNN on UCI datasets. The experimental results show that the parallel kNN achieves a clear speedup over the serial implementation.
Keywords: kNN, GPU, CPU, Parallel Computing, Data Mining, Clustering Algorithm.
--------------------------------------------------------------------***----------------------------------------------------------------------
1. INTRODUCTION
In data mining, classification is the problem of identifying the category/sub-population to which a new observation belongs, based on a training dataset containing observations/instances whose category membership is known in advance. A data mining algorithm that performs classification is called a classifier. Classification belongs to the supervised learning category. In supervised learning, the input set of observations has an effect on the output set of observations; the model includes the mediating variables between the input and output variables. kNN uses continuous data for building the classification model. Owing to its capability of handling large datasets for solving big problems, kNN has been widely accepted in research and industry applications. Despite this wide acceptance, the growth of dataset sizes has made computation time the performance bottleneck of the algorithm. The kNN classifier is an instance based learning (IBL) method which follows the case based reasoning (CBR) approach to solve new problems; this learning approach is known as lazy learning. kNN is among the most widely used classifiers in the data mining domain and has been applied to a wide range and volume of datasets. A new instance is classified as the most frequent class of its 'k' nearest neighbours.
OpenCL is a generic many-core programming framework used for parallelizing applications on the GPU's parallel architecture, and the GPU/OpenCL combination is a preferred platform for parallelizing kNN. kNN is embarrassingly parallel in nature and hence a good candidate for parallelism. In this paper, we explore the design and experimental results of our parallel kNN on the GPU and OpenCL architecture. We have proposed, implemented and tested an OpenCL based parallel kNN algorithm on the GPU platform. We observed that our parallel kNN on the OpenCL/GPU platform performs well compared with the sequential kNN. In addition to the performance, the accuracy of our parallel kNN algorithm reaches a satisfactory level.
This paper is organised as follows: Section 2 presents the literature survey on the kNN classifier, related work on parallel kNN algorithms, and OpenCL and GPU technology. Section 3 describes our methodology for the parallel kNN algorithm on the OpenCL framework and discusses the result analysis, where we observe the results for performance scaleup. Section 4 concludes the work and the paper.
2. LITERATURE SURVEY
The basic process of the sequential kNN algorithm is as follows. First, the data pre-processing phase initializes the labelled n-dimensional training dataset as well as the test dataset to be classified. Second, one test point is selected from the test dataset and the distances between it and each point in the training dataset are calculated. The next phase sorts the computed distances and finds the 'k' smallest results. The fourth step determines the class label of the test point by the election (majority vote) among these 'k' points. Another point is then selected from the test dataset and the process is repeated until the whole test dataset is classified [5].
The pseudo-code for the sequential kNN algorithm is as follows:

Algorithm: kNN
Input: Dataset D = {(x1,c1), . . . , (xN,cN)},
       query instance t = (t1, . . . , tn),
       k - number of neighbours
Output: class 'c' to be assigned to the new instance 't'
Begin
    For each labelled instance (xi, ci), i = 1, . . . , N
        calculate d(xi, t)
    Order d(xi, t) from lowest to highest, (i = 1, . . . , N)
    Select the 'k' nearest instances to t
    Assign to t the most frequent class among its 'k' nearest instances
End
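To make the above steps concrete, a minimal C sketch of the sequential procedure is given below. The flat row-major data layout, the function names, and the assumption that class labels run from 0 to nClasses-1 are our own illustrative choices, not part of the original formulation.

#include <stdlib.h>

/* Sequential kNN sketch. train is an N x nAttr row-major matrix,
   labels holds class ids 0..nClasses-1, query holds nAttr values. */
typedef struct { float dist; int cls; } Pair;

static int cmp_pair(const void *a, const void *b)
{
    float da = ((const Pair *)a)->dist, db = ((const Pair *)b)->dist;
    return (da > db) - (da < db);
}

int knn_classify(const float *train, const int *labels, int N, int nAttr,
                 const float *query, int k, int nClasses)
{
    Pair *p = malloc((size_t)N * sizeof(Pair));
    int *votes = calloc((size_t)nClasses, sizeof(int));
    for (int i = 0; i < N; i++) {              /* distance to every instance */
        float d = 0.0f;
        for (int j = 0; j < nAttr; j++) {
            float diff = train[i * nAttr + j] - query[j];
            d += diff * diff;                  /* squared distance preserves the
                                                  neighbour order, sqrt skipped */
        }
        p[i].dist = d;
        p[i].cls = labels[i];
    }
    qsort(p, (size_t)N, sizeof(Pair), cmp_pair);   /* lowest to highest */
    for (int i = 0; i < k && i < N; i++)       /* vote among the k nearest */
        votes[p[i].cls]++;
    int best = 0;
    for (int c = 1; c < nClasses; c++)
        if (votes[c] > votes[best]) best = c;
    free(p);
    free(votes);
    return best;
}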
If we have 4 test cases and 12 training points, then the traditional algorithm visits each training point 4 times and performs a total of 48 serial operations. Thus, for 'm' test cases and 'n' training points, the time complexity is O(mn). This becomes prohibitive when the test cases and training points grow very large in volume, which creates the need to parallelize the operations of the kNN algorithm. Taking up the compute intensive operational part of kNN for parallel, simultaneous execution can lead to a considerable reduction in compute time. The optimal value for 'k' is usually identified in the range of 3 to 10. In general, a larger 'k' value is more precise as it reduces the overall noise.
kNN has been used in statistical methodologies and pattern recognition since their beginnings. The 'k' nearest neighbours are measured by a distance function; here we have used the Euclidean distance. The Euclidean distance between two points 'a' and 'b' in an n-dimensional attribute space is given in equation (1).

d(a,b) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2} .........................................(1)
In addition to the Euclidean distance, the Manhattan and Minkowski distances are also used for measuring distances. These distance measures apply only to continuous data variables; in the case of categorical variables the Hamming distance must be used. If both continuous and categorical variables exist in the dataset, the numerical variables are standardized to the range 0 to 1 so that the entire dataset holds the same type of data values. This is also called normalization of the dataset.
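As a small illustration of this normalization step, the following C sketch rescales one numeric column of a row-major table to the range [0,1]; the layout and the function name are illustrative assumptions.

/* Min-max normalization sketch: rescale one numeric column of a
   row-major table to [0,1]. Layout and names are illustrative. */
void normalize_column(float *data, int nRows, int nCols, int col)
{
    float lo = data[col], hi = data[col];
    for (int i = 1; i < nRows; i++) {          /* find column min and max */
        float v = data[i * nCols + col];
        if (v < lo) lo = v;
        if (v > hi) hi = v;
    }
    float range = hi - lo;
    if (range == 0.0f) range = 1.0f;           /* constant column guard */
    for (int i = 0; i < nRows; i++)            /* rescale into [0,1] */
        data[i * nCols + col] = (data[i * nCols + col] - lo) / range;
}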
2.1 OpenCL Framework
Due to its tremendous computing power, General Purpose computing on GPU (GPGPU) has been widely adopted by developers and researchers [7]. The Khronos group has proposed the OpenCL (Open Computing Language) framework for programming many-core architectures [8]. OpenCL is an open industry standard for general-purpose parallel programming. OpenCL device 'kernels' contain the computation to be carried out in parallel. To achieve parallel computation, the components of the OpenCL data parallel execution model, such as work-items and work-groups, are spawned over multiple data items, as shown in Fig 1 [9].
Fig 1: Data parallel execution using OpenCL
The data parallelisation happens through work-items that execute within work-groups, as shown in Fig 1. For parallel computing and performance, the OpenCL memory hierarchy provides four types of device memories, viz. private, global, constant and local memories. Each of them must be used efficiently to achieve high performance [7][8].
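To make the framework concrete, the following minimal host-side sketch shows the setup sequence (platform, device, context, queue, program, kernel) on which such data parallel computation is built. Error handling is elided, and the kernel source string and kernel name are assumed to be supplied by the caller.

#include <CL/cl.h>

/* Minimal OpenCL host setup sketch (error checks elided for brevity).
   kernel_src holds the OpenCL C source text; name is the kernel
   function to extract, e.g. "diff". */
cl_kernel build_kernel(const char *kernel_src, const char *name,
                       cl_context *ctx_out, cl_command_queue *queue_out)
{
    cl_platform_id platform;
    cl_device_id device;
    cl_int err;
    clGetPlatformIDs(1, &platform, NULL);                 /* first platform */
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
    *ctx_out = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    *queue_out = clCreateCommandQueue(*ctx_out, device, 0, &err);
    cl_program prog = clCreateProgramWithSource(*ctx_out, 1, &kernel_src,
                                                NULL, &err);
    clBuildProgram(prog, 1, &device, NULL, NULL, NULL);   /* compile for device */
    return clCreateKernel(prog, name, &err);
}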
2.2 Related Work on Parallel kNN
kNN is a widely used classification algorithm and very parallel-friendly because of its number of independent operations. All objects must be scanned for any new entry; kNN is known as lazy learning because the dataset needs to be scanned every time. When the training and test datasets are large, the sequential speed is quite slow, which makes kNN a strong candidate for parallel processing. S. Liang, C. Wang, Y. Liu and L. Jian (2009) focused on parallelizing the sorting and distance calculation processes due to their high compute intensiveness. Distances between the current object and all the dataset objects are calculated in parallel; they made two kernels for this, where the sorting kernel finds the rank of a particular object in the dataset [1]. Q. Kuang and L. Zhao (2009) proposed a parallel scheme for processing the testing and training datasets by dividing the computation of the result dataset. During distance calculation, the result dataset is divided into tiles, and each tile of width T is processed by an independent block in parallel; each block contains TxT threads, each of which calculates a single distance [2]. Garcia V., Debreuve E., Nielsen F. and Barlaud M. (2010) suggested CUDA based and CUBLAS based approaches for the parallelism of kNN. Except for sorting, the approach is almost similar to that of S. Liang et al. [1]; the CUBLAS implementation proved to give a high speedup [3]. G. Nolan proposed an improved sort on CUDA for the kNN process, stating that an iterative version of bitonic sort was used for parallelisation [4]. A. Arefin, C. Riveros, R. Berretta and P. Moscato (2011) proposed a fast and scalable kNN algorithm on the GPU for very large datasets. In their strategy, they divided the resultant distance matrix into chunks of a size that can fit into the GPU's onboard memory; this approach is independent of the size of the distance matrix and the overall data size [5]. A. Arefin et al. (2012) proposed a graph based agglomerative clustering method based on the kNN graph and Boruvka's algorithm on the GPU [6].
3. PROPOSED METHODOLOGY
Our proposed parallel kNN methodology is built on the OpenCL framework. The model performs best on a high-end GPU platform, but it also runs on multi-core commodity architectures. In our parallel kNN model, we compute the distance between each training case and all test cases concurrently, so all distances are computed in parallel in a single step. We extended this concept to calculate the distances between all training cases and all testing cases in parallel using many cores. These computations are carried out on the many-core GPU platform through kernels we developed in OpenCL.
Finding the distance 'd' to the nearest neighbours is the crucial, compute intensive task in the kNN classifier. We have ported the distance computation to the GPU to perform it in parallel, which has resulted in a considerable improvement in kNN performance. Two major kernels parallelise the computation over multiple GPU threads: 1. the difference computation, which finds the distance contribution of the individual attribute points, and 2. the summation, which adds up the partial computed distances to obtain the total distance between the points, used to find the nearest neighbouring points among those available.
Fig 2: Proposed Parallel kNN flowchart
The compute intensive step in kNN is the distance computation, which is done on the GPU, whereas the other steps are performed on the CPU.
The pseudo code for our parallel kNN algorithm is as follows:

Algorithm: parkNN
Input: Dataset D = {(a1,c1), . . . , (an,cn)},
       query instance t = (t1, . . . , tm),
       k - number of neighbours
Output: class 'c' to be assigned to the new instance 't'
Begin
On CPU:
    1. Load the dataset into global memory
    2. Decide 'k' for the cluster definition/generation
    3. Transfer the dataset to GPU memory
On GPU:
    4. For all labelled instances (ai, ci), compute d(ai, t)
On CPU:
    5. Sort d(ai, t) from lowest to highest, (i = 1, . . . , n)
    6. Select the 'k' nearest instances to t
    7. Assign to t the most frequent class among its 'k' nearest instances
End
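The snippet below sketches how this CPU/GPU split might look on the host side. The handle and parameter names are our own illustrative assumptions; buffer and kernel creation follow the setup pattern of Section 2.1, and error handling plus the final sort and vote are elided.

#include <CL/cl.h>

/* Host-side sketch of the parkNN CPU/GPU split (steps 1-7 above).
   Buffers and the kernel are assumed to be created beforehand. */
void parknn_host(cl_command_queue queue, cl_kernel diff_kernel,
                 cl_mem data_buf, cl_mem dist_buf,
                 const float *train, size_t train_bytes,
                 float *dist, size_t dist_bytes, size_t n_workers)
{
    /* Steps 1-3 (CPU): move the bit-represented dataset to GPU memory */
    clEnqueueWriteBuffer(queue, data_buf, CL_TRUE, 0, train_bytes,
                         train, 0, NULL, NULL);
    /* Step 4 (GPU): compute all attribute distances in parallel */
    clEnqueueNDRangeKernel(queue, diff_kernel, 1, NULL, &n_workers,
                           NULL, 0, NULL, NULL);
    /* Steps 5-7 (CPU): read the distances back; sorting and the
       majority vote then run on the host as in sequential kNN */
    clEnqueueReadBuffer(queue, dist_buf, CL_TRUE, 0, dist_bytes,
                        dist, 0, NULL, NULL);
}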
Distributing the difference computations over the many GPU cores speeds up the computation, since the individual differences are computed in parallel. The following OpenCL kernels demonstrate this parallel computation on the GPU cores.
Kernel for Parallel difference computation

__kernel void diff(__global const float *query,
                   __global const float *data,
                   __global const int *nVal,
                   __global float *diff)
{
    /* one work-item per (attribute, query, data point) triple */
    int id = get_global_id(0);
    /* index into the flattened query array */
    int indexCent = id % (nVal[0] * nVal[1]);
    /* attribute index within a data point */
    int p = id % nVal[0];
    /* base offset of the data point handled by this work-item */
    int q = ((id / (nVal[0] * nVal[1])) % nVal[2]) * nVal[0];
    /* squared per-attribute difference */
    diff[id] = (query[indexCent] - data[p + q]) *
               (query[indexCent] - data[p + q]);
}
Here nVal contains the following parameters:
nVal[0] -> #Attributes, nVal[1] -> #Query, nVal[2] -> #Data Points
# Workers = (#Attributes * #Query * #Data Points)
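As a hedged sketch of how this kernel might be dispatched, the snippet below sets the kernel arguments and enqueues one work-item per (attribute, query, data point) triple, matching the # Workers formula above; the queue, kernel and buffer handles are assumed to have been created as in the setup sketch of Section 2.1.

#include <CL/cl.h>

/* Sketch: launch the 'diff' kernel with the global size given by the
   # Workers formula above. Handles are assumed to exist already. */
void launch_diff(cl_command_queue queue, cl_kernel diff_kernel,
                 cl_mem query_buf, cl_mem data_buf, cl_mem nval_buf,
                 cl_mem diff_buf, int nAttr, int nQuery, int nData)
{
    size_t global = (size_t)nAttr * nQuery * nData;   /* # Workers */
    clSetKernelArg(diff_kernel, 0, sizeof(cl_mem), &query_buf);
    clSetKernelArg(diff_kernel, 1, sizeof(cl_mem), &data_buf);
    clSetKernelArg(diff_kernel, 2, sizeof(cl_mem), &nval_buf);
    clSetKernelArg(diff_kernel, 3, sizeof(cl_mem), &diff_buf);
    clEnqueueNDRangeKernel(queue, diff_kernel, 1, NULL, &global,
                           NULL, 0, NULL, NULL);      /* 1-D NDRange */
}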
Kernel for Parallel Sum Computations

__kernel void Sum(__global const float *data,
                  __global float *sum,
                  __global const int *nVal)
{
    /* one work-item per output element of this reduction level */
    int id = get_global_id(0);
    /* map the output index to its first input element */
    int index = id + ((id / nVal[2]) * nVal[1]);
    if ((id % nVal[2]) == (nVal[0] - nVal[2]))
        /* odd-length level: the unpaired element is copied through */
        sum[id] = data[index];
    else
        /* pairwise partial sum of two elements nVal[2] apart */
        sum[id] = data[index] + data[index + nVal[2]];
}
Here nVal contains the following parameters:
nVal[0] -> Data Point Length (DPL), nVal[1] -> DPL / 2, nVal[2] -> Offset,
where Offset = DPL/2 + DPL%2
# Workers = Offset * # Data Points
The parallel difference computation between two points, A and B, is illustrated in Fig 3 below. The OpenCL kernels are written for parallel execution on the GPU device and realise the computation methodology shown in Fig 3.
Fig 3: Distance calculation between two points in Parallel
The parallel sum is performed by the 'Sum' kernel described above. Its multi-step execution over partial data generates partial results at each level, and these are used as the inputs to the next level of parallel computation.
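One plausible host loop for driving this multi-step summation is sketched below: each pass applies the Sum kernel with the nVal parameters given above, then treats the partial sums as the input of the next level until each point is reduced to a single distance. The double-buffer swap and the handle names are illustrative assumptions.

#include <CL/cl.h>

/* Sketch of the host loop driving the multi-step parallel sum: each
   level shrinks the data point length (DPL) until one value per
   point remains. in_buf and out_buf are swapped between levels. */
void reduce_distances(cl_command_queue queue, cl_kernel sum_kernel,
                      cl_mem nval_buf, cl_mem in_buf, cl_mem out_buf,
                      int dpl, int nPoints)
{
    while (dpl > 1) {
        int offset = dpl / 2 + dpl % 2;            /* Offset = DPL/2 + DPL%2 */
        int nVal[3] = { dpl, dpl / 2, offset };
        size_t global = (size_t)offset * nPoints;  /* # Workers per level */
        clEnqueueWriteBuffer(queue, nval_buf, CL_TRUE, 0, sizeof(nVal),
                             nVal, 0, NULL, NULL);
        clSetKernelArg(sum_kernel, 0, sizeof(cl_mem), &in_buf);
        clSetKernelArg(sum_kernel, 1, sizeof(cl_mem), &out_buf);
        clSetKernelArg(sum_kernel, 2, sizeof(cl_mem), &nval_buf);
        clEnqueueNDRangeKernel(queue, sum_kernel, 1, NULL, &global,
                               NULL, 0, NULL, NULL);
        cl_mem tmp = in_buf; in_buf = out_buf; out_buf = tmp;  /* next level */
        dpl = offset;
    }
}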
Illustrative Example: We illustrate the working of our proposed parallel kNN algorithm with the UCI Balloons dataset, described in Table 1. The attribute values of the dataset are transformed into a bit-mapped representation. The sample dataset is split into a training dataset and a test dataset in the ratio of 70:30; for the UCI Balloons dataset, 12 records are taken as the training dataset and 4 instances as the test dataset. The training dataset is used to build the model, whereas the test dataset is used to test the built model by classifying the query instances. The test dataset is shown in Table 1 and the training dataset in Table 2, as below. The Balloons dataset has 4 attributes and 16 records, and is classified into two classes.
Table 1: Balloons Dataset (TEST)
Sr. No. Color Size Act Age Class
1 1 2 1 1 1
2 1 2 1 2 2
3 2 1 1 1 1
4 2 1 1 2 2
Table 2: Balloons Dataset (TRAINING): Bitmap Representation
Sr. No. Color Size Act Age Class
1 1 1 1 1 1
2 1 1 1 2 2
3 1 1 2 1 2
4 1 1 2 2 2
5 1 2 2 1 2
6 1 2 2 2 2
7 2 1 2 1 2
8 2 1 2 2 2
9 2 2 1 1 1
10 2 2 1 2 2
11 2 2 2 1 2
12 2 2 2 2 2
Our proposed parallel kNN algorithm takes the same inputs and produces the same output as the sequential algorithm, but the kNN operations are performed in parallel for performance. The following steps illustrate the parallel operations of the kNN algorithm.
CPU:
1. Load the dataset into global memory
2. Decide 'k' for the cluster definition/generation
3. Transfer the dataset to GPU memory
GPU:
4. For all labelled instances (ai, ci), compute d(ai, t)
CPU:
5. Apply the input query to the built model for class prediction:
   a. Sort d(ai, t) from lowest to highest, (i = 1, . . . , n)
   b. Select the 'k' nearest instances to t
   c. Assign to t the most frequent class among its 'k' nearest instances
The algorithmic steps are illustrated in a bit more detail below.
Step 1: Load Data. Load the data from CPU memory to GPU global memory. The data is transformed from the actual dataset to a bit-represented dataset on the CPU; the bit-represented dataset is then transferred to GPU memory.
Step 2: Define 'k'; let us say k = 3.
Step 3: Transfer the dataset to GPU memory. (For the distance computations, we used the squared Euclidean distance.)
Step 4: Make the data vector 'D' and the query vector 'Q'.
Vector D (Training Data) contents:
Vector Q (Query Vector) contents:
The Euclidean distance is then computed in parallel using the OpenCL kernel diff() described earlier. At this point the parallel model is ready and can be used to predict the class of the input query. The model is tested in Step 5, which we explain through the following four test cases.

Step 5: Find the class for each test case
Test Case 1
The distance list is <1, 2, 2, 3, 1, 2, 3, 4, 1, 2, 2, 3>, and the corresponding class vector is <1, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2>.
Pairing the <distance, class> vectors we get
<(1,1), (2,2), (2,2), (3,2), (1,2), (2,2), (3,2), (4,2), (1,1), (2,2), (2,2), (3,2)>.
After sorting the pairing vector in ascending order of distance, we get
<(1,1), (1,2), (1,1), (2,2), (2,2), (2,2), (2,2), (2,2), (3,2), (3,2), (3,2), (4,2)>.
Selecting the first 'k' samples, for k = 3, gives training samples with corresponding classes <1, 2, 1>.
Thus, using the majority, we classify this test case in class "1".
Test Case 2
For the distance list <2, 1, 3, 2, 2, 1, 4, 3, 2, 1, 3, 2>, after sorting and selecting the first 'k' samples, we get training samples <2, 6, 10> with corresponding classes <2, 2, 2>.
Thus, using the majority, we classify this test case in class "2".
Test Case 3
The distance list is <1, 2, 2, 3, 3, 4, 1, 2, 1, 2, 2, 3>. After sorting and selecting the first 'k' samples, we get training samples <1, 7, 9> with corresponding classes <1, 2, 1>.
Thus, using the majority, we classify this test case in class "1".
Test Case 4
The distance list is <2, 1, 3, 2, 4, 3, 2, 1, 2, 1, 3, 2>. After sorting and selecting the first 'k' samples, we get training samples <2, 8, 10> with corresponding classes <2, 2, 2>.
Thus, using the majority, we classify this test case in class "2".
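The election step used in each test case above is a simple majority vote over the 'k' nearest labels. A small illustrative helper (the names are ours) is:

/* Majority vote over the labels of the k nearest neighbours,
   as used in the election step of the test cases above. */
int majority_class(const int *classes, int k)
{
    int best = classes[0], best_count = 0;
    for (int i = 0; i < k; i++) {
        int count = 0;
        for (int j = 0; j < k; j++)
            if (classes[j] == classes[i]) count++;
        if (count > best_count) { best_count = count; best = classes[i]; }
    }
    return best;   /* e.g. {1, 2, 1} -> class 1, as in Test Case 1 */
}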
We have tested our proposed parallel kNN algorithm on several UCI datasets, as listed in Table 3.
Table 3: UCI Classification Datasets

Sr. No.  Data Set Name  Attributes  Training Records  Testing Records
1.       Balloons       4           12                4
2.       Space Shuttle  6           12                4
3.       Seeds          7           180               30
4.       Met Data       6           432               60
5.       ILPD           8           499               80
6.       CMC Data       7           1400              74
The experimental results show that the model created to predict the target query variable works well and produces roughly 85 to 95% prediction accuracy. As represented in Table 4, the accuracy of the prediction depends on the size of the training dataset and the value of 'k'. It is also observed that the larger the volume of the training dataset, the more mature the model is. A larger 'k' also tends to refine the output for the input query, but not always; the best value of 'k' for a given dataset has to be identified through multiple observations of the outputs. We have tested our parallel kNN model on the above UCI datasets, categorised for classification only, and evaluated it for classification accuracy and performance, as shown in Table 4 and Table 5 respectively.
Table 4: Parallel kNN Algorithm (Classification Accuracy)

Data Set Name   'k' for kNN   Parallel Accuracy
Balloons        3             100 %
Space Shuttle   5             75 %
Seeds           7             86 %
Met Data        9             95 %
ILPD            7             93 %
CMC Data        11            96 %
Table 5: Parallel kNN Performance on UCI Datasets

Sr. No.  Data Set Name  'k' Value  Training Records  Testing Records  CPU Time   GPU Time
1        Balloons       3          12                4                0.102 sec  9 ms
2        Space Shuttle  5          12                4                0.157 sec  13 ms
3        Seeds          7          180               30               0.477 sec  21 ms
4        Met Data       9          432               60               0.621 sec  56 ms
5        ILPD           7          499               80               1.253 sec  76 ms
6        CMC Data       11         1400              74               26.72 sec  1384 ms
4. CONCLUSION
kNN, the instance-based classifier, operates on the premise that unknown instances can be classified by relating the unknown to the known using a distance function: the smaller the distance, the greater the proximity, and hence the likelier the similarity of class. In order to minimize the kNN compute time, we have designed an OpenCL based parallel algorithm for many-core, highly parallel architectures. The results for our OpenCL based parkNN show that the performance scales up sub-linearly, improving on the performance of sequential kNN. The classification accuracy is not simply proportional to the value of 'k'; the better 'k' has to be identified through multiple observations. As regards performance, we observed a satisfactory scaleup for our parallel kNN, parkNN.
REFERENCES
[1] S. Liang, C. Wang, Y. Liu, and L. Jian, "CUKNN: A parallel implementation of K-nearest neighbor on CUDA-enabled GPU," IEEE Youth Conference on Information, Computing and Telecommunication (YC-ICT '09), pp. 415-418, 20-21 Sept. 2009.
[2] Q. Kuang and L. Zhao, "A Practical GPU based KNN algorithm," Proceedings of the Second Symposium on International Computer Science and Computational Technology (ISCSCT '09).
[3] V. Garcia, E. Debreuve, F. Nielsen, and M. Barlaud, "K-nearest neighbor search: Fast GPU-based implementations and application to high-dimensional feature matching," 17th IEEE International Conference on Image Processing (ICIP), pp. 3757-3760, 26-29 Sept. 2010.
[4] G. Nolan, "Improving the k-Nearest Neighbour Algorithm with CUDA."
[5] A. S. Arefin, C. Riveros, R. Berretta, and P. Moscato, "GPU-FS-kNN: A fast and scalable kNN computation technique using GPU," Faculty Postgrad Research Poster, The University of Newcastle, Australia, Sept. 2011.
[6] A. S. Arefin, C. Riveros, R. Berretta, and P. Moscato, "kNN-MST-Agglomerative: A fast and scalable graph-based data clustering approach on GPU," The 7th International Conference on Computer Science & Education (ICCSE 2012), Melbourne, Australia, July 14-17, 2012.
[7] A. Franco-Arcega, J. Suarez-Cansino, and L. G. Flores-Flores, "A parallel algorithm to induce decision trees for large datasets," XXIV International Symposium on Information, Communication and Automation Technologies (ICAT), pp. 1-6, Oct. 30-Nov. 1, 2013.
[8] J. Zhu, G. Chen, and B. Wu, "GPGPU Memory Estimation and Optimization Targeting OpenCL Architecture," IEEE International Conference on Cluster Computing (CLUSTER), pp. 449-458, Sept. 2012.
[9] Khronos OpenCL Working Group, "The OpenCL Specification, Version 2.0," edited by A. Munshi, Nov. 2013. https://www.khronos.org/registry/cl/specs/opencl-2.0.pdf
BIOGRAPHIES
Valmik B Nikam received the Bachelor of Engineering (Computer Science and Engineering) from Government College of Engineering, Aurangabad, and the Master of Engineering (Computer Engineering) from VJTI, Matunga, Mumbai, Maharashtra state, and is pursuing a PhD in the Computer Department of VJTI. He was a faculty member at Dr. Babasaheb Ambedkar Technological University, Lonere. He has 12 years of academic experience, 5 years of administrative experience as a Head of Department, and one year of industry experience. He has attended many short-term training programs and has been an invited expert lecturer at workshops. Presently he is an Associate Professor in the Department of Computer Engineering & Information Technology of VJTI, Matunga, Mumbai. His research interests include scalability of data mining algorithms, data warehousing, big data, parallel computing, GPU computing, and cloud computing. He is a member of the CSI, ACM and IEEE research organizations, and a life member of ISTE. He was felicitated with the IBM-DRONA award in 2011.
B. B. Meshram is a Professor and the Head of the Department of Computer Engineering and Information Technology, Veermata Jijabai Technological Institute, Matunga, Mumbai. He holds a Ph.D. in Computer Engineering and has been in academics and research for 20 years. His current research includes database technologies, data mining, security, forensic analysis, video processing, and distributed computing. He has authored over 203 research publications, of which over 38 are national publications, 91 are at international conferences, and more than 71 are in international journals; he has also filed eight patents. He has given numerous invited talks at conferences, workshops and training programs, and has served as chair/co-chair for many conferences and workshops in the area of computer science and engineering. The industry-demanded M.Tech program on Network Infrastructure Management Systems and the international conference "Interface" are his brainchildren to interface industry, academia and researchers. Beyond research, he also runs the Jeman Educational Society to uplift needy and deprived students, as a responsibility towards society and hence the Nation.
More Related Content

PDF
Kernel based similarity estimation and real time tracking of moving
PDF
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
PDF
Robust foreground modelling to segment and detect multiple moving objects in ...
PDF
Data clustering using kernel based
PDF
Analysis and implementation of modified k medoids
PDF
International Journal of Engineering Research and Development
PDF
A NOBEL HYBRID APPROACH FOR EDGE DETECTION
PPTX
Incremental collaborative filtering via evolutionary co clustering
Kernel based similarity estimation and real time tracking of moving
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
Robust foreground modelling to segment and detect multiple moving objects in ...
Data clustering using kernel based
Analysis and implementation of modified k medoids
International Journal of Engineering Research and Development
A NOBEL HYBRID APPROACH FOR EDGE DETECTION
Incremental collaborative filtering via evolutionary co clustering

What's hot (18)

PPT
A scalable collaborative filtering framework based on co clustering
PDF
Clustering using kernel entropy principal component analysis and variable ker...
PDF
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
PDF
Volume 2-issue-6-2143-2147
PDF
Welcome to International Journal of Engineering Research and Development (IJERD)
PDF
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
PDF
Super Resolution with OCR Optimization
PDF
Parallel KNN for Big Data using Adaptive Indexing
PDF
IRJET- Finding Dominant Color in the Artistic Painting using Data Mining ...
PDF
LOG MESSAGE ANOMALY DETECTION WITH OVERSAMPLING
PDF
50120140501016
PDF
Design a New Image Encryption using Fuzzy Integral Permutation with Coupled C...
PDF
An Efficient Frame Embedding Using Haar Wavelet Coefficients And Orthogonal C...
PPTX
Co-clustering of multi-view datasets: a parallelizable approach
PDF
AN ENHANCED CHAOTIC IMAGE ENCRYPTION
PDF
A fuzzy clustering algorithm for high dimensional streaming data
PDF
T24144148
PDF
Performance Optimization of Clustering On GPU
A scalable collaborative filtering framework based on co clustering
Clustering using kernel entropy principal component analysis and variable ker...
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
Volume 2-issue-6-2143-2147
Welcome to International Journal of Engineering Research and Development (IJERD)
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Super Resolution with OCR Optimization
Parallel KNN for Big Data using Adaptive Indexing
IRJET- Finding Dominant Color in the Artistic Painting using Data Mining ...
LOG MESSAGE ANOMALY DETECTION WITH OVERSAMPLING
50120140501016
Design a New Image Encryption using Fuzzy Integral Permutation with Coupled C...
An Efficient Frame Embedding Using Haar Wavelet Coefficients And Orthogonal C...
Co-clustering of multi-view datasets: a parallelizable approach
AN ENHANCED CHAOTIC IMAGE ENCRYPTION
A fuzzy clustering algorithm for high dimensional streaming data
T24144148
Performance Optimization of Clustering On GPU
Ad

Viewers also liked (20)

PDF
Mobile robot path planning using ant colony optimization
PDF
Limnological studies related to physico chemical
PDF
Design and implementation of microstrip antenna for
PDF
Appraisal of multilevel car parking facility at kg road cbd area, bengaluru city
PDF
The growth of hsr networks around the world
PDF
Smart phone as a cloud server machine computing
PDF
Investigation on effective thermal conductivity of foams using transient plan...
PDF
Design of 3 d rc frame on sloping ground
PDF
A study on influences of lighting on resource usage in
PDF
Study on transmission energy losses and finding the hazards using what if ana...
PDF
Uw as ns design challenges in transport layer
PDF
Analysis of circular disc antenna using defective
PDF
Comparative assessment of noise levels in various laboratories and constructi...
PDF
A novel predicate for active region merging in automatic image segmentation
PDF
Assessment of groundwater quality for irrigation use
PDF
Concrete composition analysis cast as a plate solar
PDF
Pounding problems in urban areas
PDF
Directly coupled microstrip array antennas for wideband application
PDF
Performance improvement of bottleneck link in red vegas over heterogeneous ne...
PDF
Improved authentication using arduino based voice
Mobile robot path planning using ant colony optimization
Limnological studies related to physico chemical
Design and implementation of microstrip antenna for
Appraisal of multilevel car parking facility at kg road cbd area, bengaluru city
The growth of hsr networks around the world
Smart phone as a cloud server machine computing
Investigation on effective thermal conductivity of foams using transient plan...
Design of 3 d rc frame on sloping ground
A study on influences of lighting on resource usage in
Study on transmission energy losses and finding the hazards using what if ana...
Uw as ns design challenges in transport layer
Analysis of circular disc antenna using defective
Comparative assessment of noise levels in various laboratories and constructi...
A novel predicate for active region merging in automatic image segmentation
Assessment of groundwater quality for irrigation use
Concrete composition analysis cast as a plate solar
Pounding problems in urban areas
Directly coupled microstrip array antennas for wideband application
Performance improvement of bottleneck link in red vegas over heterogeneous ne...
Improved authentication using arduino based voice
Ad

Similar to Parallel k nn on gpu architecture using opencl (20)

PDF
Big data classification based on improved parallel k-nearest neighbor
PPTX
KNN Classificationwithexplanation and examples.pptx
PDF
Natural Language Processing of applications.pdf
PPTX
KNN Classifier
PPTX
K-Nearest Neighbor Classifier
PDF
tghteh ddh4eth rtnrtrgthgh12500123196.pdf
PDF
Implementation of K-Nearest Neighbor Algorithm
PPTX
22PCOAM16 Unit 3 Session 24 K means Algorithms.pptx
PPTX
Dynamically updated parallel k-NN
PPTX
artificial intelligence.pptx
PPTX
Instance based learning
PPTX
K Nearest Neighbor V1.0 Supervised Machine Learning Algorithm
PPTX
K- Nearest Neighbor Approach
PDF
Enhancing Classification Accuracy of K-Nearest Neighbors Algorithm using Gain...
PDF
Unit 1 KNN.pptx one of the important chapters
PPTX
K-Nearest Neighbor(KNN)
PPTX
KNN Algorithm using C++
PPTX
K Nearest Neighbor Algorithm
PDF
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures
PPTX
KNN.pptx
Big data classification based on improved parallel k-nearest neighbor
KNN Classificationwithexplanation and examples.pptx
Natural Language Processing of applications.pdf
KNN Classifier
K-Nearest Neighbor Classifier
tghteh ddh4eth rtnrtrgthgh12500123196.pdf
Implementation of K-Nearest Neighbor Algorithm
22PCOAM16 Unit 3 Session 24 K means Algorithms.pptx
Dynamically updated parallel k-NN
artificial intelligence.pptx
Instance based learning
K Nearest Neighbor V1.0 Supervised Machine Learning Algorithm
K- Nearest Neighbor Approach
Enhancing Classification Accuracy of K-Nearest Neighbors Algorithm using Gain...
Unit 1 KNN.pptx one of the important chapters
K-Nearest Neighbor(KNN)
KNN Algorithm using C++
K Nearest Neighbor Algorithm
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures
KNN.pptx

More from eSAT Publishing House (20)

PDF
Likely impacts of hudhud on the environment of visakhapatnam
PDF
Impact of flood disaster in a drought prone area – case study of alampur vill...
PDF
Hudhud cyclone – a severe disaster in visakhapatnam
PDF
Groundwater investigation using geophysical methods a case study of pydibhim...
PDF
Flood related disasters concerned to urban flooding in bangalore, india
PDF
Enhancing post disaster recovery by optimal infrastructure capacity building
PDF
Effect of lintel and lintel band on the global performance of reinforced conc...
PDF
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...
PDF
Wind damage to buildings, infrastrucuture and landscape elements along the be...
PDF
Shear strength of rc deep beam panels – a review
PDF
Role of voluntary teams of professional engineers in dissater management – ex...
PDF
Risk analysis and environmental hazard management
PDF
Review study on performance of seismically tested repaired shear walls
PDF
Monitoring and assessment of air quality with reference to dust particles (pm...
PDF
Low cost wireless sensor networks and smartphone applications for disaster ma...
PDF
Coastal zones – seismic vulnerability an analysis from east coast of india
PDF
Can fracture mechanics predict damage due disaster of structures
PDF
Assessment of seismic susceptibility of rc buildings
PDF
A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...
PDF
Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...
Likely impacts of hudhud on the environment of visakhapatnam
Impact of flood disaster in a drought prone area – case study of alampur vill...
Hudhud cyclone – a severe disaster in visakhapatnam
Groundwater investigation using geophysical methods a case study of pydibhim...
Flood related disasters concerned to urban flooding in bangalore, india
Enhancing post disaster recovery by optimal infrastructure capacity building
Effect of lintel and lintel band on the global performance of reinforced conc...
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...
Wind damage to buildings, infrastrucuture and landscape elements along the be...
Shear strength of rc deep beam panels – a review
Role of voluntary teams of professional engineers in dissater management – ex...
Risk analysis and environmental hazard management
Review study on performance of seismically tested repaired shear walls
Monitoring and assessment of air quality with reference to dust particles (pm...
Low cost wireless sensor networks and smartphone applications for disaster ma...
Coastal zones – seismic vulnerability an analysis from east coast of india
Can fracture mechanics predict damage due disaster of structures
Assessment of seismic susceptibility of rc buildings
A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...
Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...

Recently uploaded (20)

PPTX
Sustainable Sites - Green Building Construction
PPTX
additive manufacturing of ss316l using mig welding
PDF
Digital Logic Computer Design lecture notes
PPTX
Internet of Things (IOT) - A guide to understanding
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PDF
composite construction of structures.pdf
PDF
R24 SURVEYING LAB MANUAL for civil enggi
PPTX
Construction Project Organization Group 2.pptx
PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
PDF
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
PDF
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PDF
PPT on Performance Review to get promotions
PDF
Operating System & Kernel Study Guide-1 - converted.pdf
PDF
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
PPTX
Foundation to blockchain - A guide to Blockchain Tech
PDF
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
PPTX
UNIT 4 Total Quality Management .pptx
PPTX
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
Sustainable Sites - Green Building Construction
additive manufacturing of ss316l using mig welding
Digital Logic Computer Design lecture notes
Internet of Things (IOT) - A guide to understanding
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
composite construction of structures.pdf
R24 SURVEYING LAB MANUAL for civil enggi
Construction Project Organization Group 2.pptx
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
Embodied AI: Ushering in the Next Era of Intelligent Systems
PPT on Performance Review to get promotions
Operating System & Kernel Study Guide-1 - converted.pdf
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
Foundation to blockchain - A guide to Blockchain Tech
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
UNIT 4 Total Quality Management .pptx
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx

Parallel k nn on gpu architecture using opencl

  • 1. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 03 Issue: 10 | Oct-2014, Available @ http://www.ijret.org 367 PARALLEL kNN ON GPU ARCHITECTURE USING OpenCL V.B.Nikam1 , B.B.Meshram2 1 Associate Professor, Department of Computer Engineering and Information Technology, Jijabai Technological Institute, Matunga, Mumbai, Maharashtra, India 2 Professor, Department of Computer Engineering and Information Technology, Veermata Jijabai Technological Institute, Matunga, Mumbai, Maharashtra, India Abstract In data mining applications, one of the useful algorithms for classification is the kNN algorithm. The kNN search has a wide usage in many research and industrial domains like 3-dimensional object rendering, content-based image retrieval, statistics, biology (gene classification), etc. In spite of some improvements in the last decades, the computation time required by the kNN search remains the bottleneck for kNN classification, especially in high dimensional spaces. This bottleneck has created the necessity of the parallel kNN on commodity hardware. GPU and OpenCL architecture are the low cost high performance solutions for parallelising the kNN classifier. In regard to this, we have designed, implemented our proposed parallel kNN model to improve upon performance bottleneck issue of kNN algorithm. In this paper, we have proposed parallel kNN algorithm on GPU and OpenCL framework. In our approach, we distributed the distance computations of the data points among all GPU cores. Multiple threads invoked for each GPU core. We have implemented and tested our parallel kNN implementation on UCI datasets. The experimental results show that the speedup of the KNN algorithm is improved over the serial performance. Keywords: kNN, GPU, CPU, Parallel Computing, Data Mining, Clustering Algorithm. --------------------------------------------------------------------***---------------------------------------------------------------------- 1. INTRODUCTION In data mining, classification is the problem of identifying set of categories/sub-populations from new observations; based on a training dataset containing observations/instances whose category membership known in advance. The data mining algorithm performs classification is called classifier. Classifier is a supervised learning category. In supervised learning, input set of observations has an effect on output set of observations; the model includes the mediating variables between the input and output variables. kNN uses a continuous data for building classification model. Being the capability of handling large datasets for solving the big problems, kNN has been widely accepted in research and industry applications. Though, kNN has been accepted widely; due to the large datasets, performance became the performance bottleneck for the algorithm. kNN classifier is an instance based learning (IBL) which follows case based reasoning (CBR) approach to solve new problems. This learning approach is a lazy learning. kNN classifier is the most widely used classifier in the data mining domain, and has been accepted for wide range and volume of datasets. A new instance is classified as the most frequent class of its ‘K’ nearest neighbours. OpenCL is a generic many core computing programming framework used for parallelizing applications on the GPU’s parallel architecture. 
OpenCL and GPU is most preferred for parallelizing kNN. It is observed that, kNN is embarrassingly parallel in nature and a good candidate for parallelism. In this paper, we have explored the design and experimental results of our parallel kNN on GPU and OpenCL architecture. We have proposed, implemented and tested OpenCL based parallel kNN algorithm on GPU platform. We observed that, our proposed parallel kNN on OpenCL & GPU platform performs well over the sequential performance of kNN. In addition to the performance, the accuracy of our proposed kNN algorithm is achieved to the level of satisfaction. This paper is organised as below: Section2 discusses the literature survey on kNN classifier, related work in parallel kNN algorithm, OpenCL and GPU technology. Section3 describes our methodology on OpenCL framework for parallel kNN algorithm. Section4 discusses on result analysis. We have observed the results for scaleup on performance. Section5 is the conclusion of our work and the paper. 2. LITERATURE SURVEY The basic process of sequential kNN algorithm is as follows: First, the data pre-processing phase is to initialize the labelled dimensional training dataset as well as the test dataset to be classified. Second, select one test point in the test dataset and calculate the distances between it and each point in the train dataset. The next phase is to sort the results of distances computed, and find out ‘k’ smallest results. The fourth step is to determine the class label of the test point by the election result of ‘k’ points. Thus, selecting another point in the test dataset and repeat the process to classify the test dataset[5].
  • 2. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 03 Issue: 10 | Oct-2014, Available @ http://www.ijret.org 368 The pseudo-code for the sequential kNN algorithm is as follows, Algorithm: kNN Input: Dataset, D = {(x1,c1), . . . , (xN,cN)}, Input query t = (x1, . . . ,xn), k- number of neighbour. Output: class ‘c’ to be identified for new instance of dataset ‘t’ Begin For each labelled instance (xi, ci) calculate d(xi, x) Order d(xi,x) from lowest to highest, (i = 1, . . . ,N) Select ‘K’ nearest instances to x: Assign to x the most frequent class in End If we have 4 test cases and 12 training points, then in traditional algorithm we visit each training point 4 times and will be perform total 48 serial operations. Thus, for ‘m’ test cases, and ‘n’ training points there would be O(mn) time complexity. The time complexity is very high if test cases and training points become very large in volume. This has created the need to parallelize operations for kNN algorithm. The compute intensive operational part of kNN algorithm taken up for the parallel and simultaneous operations, can lead to reasonable reduction in compute time. The optimal value for ‘k’ identified in the range of 3 to 10. In general, large ‘k’ value is more precise as it reduces the overall noise. kNN has been used in statistical methodologies and pattern recognition since beginning. ‘k’ nearest neighbours measured by a distance function. Here we have used Euclidian distance. The Euclidian distance measured between two points, ‘a’ and ‘b’, is shown below in equation (1). d(a,b) = .........................................(1) In addition to Euclidian distance, there are Manhattan distance, Minkowski distance are also used for measuring distances. These distance measures are only used for continuous data variables, however, in case of categorical variables the Hamming distance must be used. If there is continuous and categorical variables exist in the dataset, standardization of the numerical variables between 0 and 1 is done to achieve same type data values for the entire dataset. This is also called as normalization of dataset. 2.1 OpenCL Framework Due to tremendous computing power, General Purpose computing on GPU (GPGPU) has been widely adopted by the developers and researchers [7]. Khronos group has proposed OpenCL (Open Computing Language) framework for programming many core architecture [8]. OpenCL is an open industry standard for general-purpose parallel programming. OpenCL device ‘kernels’ contains computation to be carried out in parallel. In order to achieve parallel computation, many work-items, work groups, etc the components of OpenCL data parallel execution model, as shown in the 0, spawns on multiple data items [9]. Fig 1: Data parallel execution using OpenCL The data parallelisation task happens through the work items that executes in work-groups as shown in 0 For parallel computing and performance, OpenCL memory hierarchy has four types of device memories viz private, global, constant and local memories. Efficient use of each of them need to be done for high performance achievement [7][8]. 2.2 Related Work on Parallel kNN kNN is widely used classification algorithm and very parallel friendly because of number of independent operations. It is required to scan all the objects for any new entry. 
It is known for lazy learning as all the time dataset needs to be scanned. When training and test dataset size is large, the speed will be quite slow which makes it the best candidate to be processed in parallel. S. Liang , C. Wang, Y. Liu and L. Jian (2009) have focused on parallelizing sorting and distance calculation process due to high compute intensiveness. Distances between the current object and all the data set objects are calculated in parallel. They made two kernels for the same. Sorting Kernel will find the rank of particular object in data set.[1]. Q. Kuang and L. Zhao (2009) proposed parallel scheme for processing testing data set and training data set by dividing computation of result data set. During distance calculation, result data-set is divided into tiles and each tiles of width T is processed by an independent block in parallel. Each block contains TxT thread each of which calculates the single distance[2]. Garcia V. Debreuve E. Nielsen F. and Barlaud M. (2010) suggested CUDA based and CUBLAS based approaches for the parallelism of kNN. The approach except sorting is almost similar to S. Liang et al[1] approach. CUBLAS implementation proved high speed up [3] for performance. G. Nolan proposed improved bubble sort on CUDA for kNN process. they stated iterative version of bitonic sort used for parallelisation[4]. A. Arefin, C. Riveros, R. Berretta and P. Moscato (2011) proposed fast and scalable kNN algorithm on GPU for very large data-sets. In their strategy, they divided reluctant distance matrix into the chunks of size, which can fit into the GPU’s onboard memory. This
  • 3. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 03 Issue: 10 | Oct-2014, Available @ http://www.ijret.org 369 approach is independent of the size of the distance matrix and overall data size[5]. A. Arefin, et. al. (2012) proposed graph based agglomerative clustering method based on kNN graph and Boruvka’s algorithm on GPU[6] . 3. PROPOSED METHOLOLOGY In our proposed parallel kNN algorithm, we have proposed the methodology on OpenCL framework. Our model will performs best on high-end GPU platform, also, the model responds to the multi core commodity architecture. In our parallel kNN model, we compute distance between each training case and all test cases concurrently. Hence, computes all distances in parallel in a step. We extended this concept to calculate distance between all training cases and all testing cases in parallel using many cores. These computations we have taken up on many core GPU platform, and developed kernels in OpenCL to compute the task in parallel. Finding distance ‘d’ for finding nearest neighbour is the crucial and compute intensive task in kNN classifier. We have ported the ‘distance’ computation on GPU to perform the parallel operation, which has resulted to considerable improvement in kNN performance. The major two kernels work for parallelising the computation on multiple threads on GPU, 1.Distance computation to find the distance between the individual attributes points and 2.Sum up the partial computed distance to find the total sum to get the distance between the points for finding the nearest neighbouring points among the available. Fig 2: Proposed Parallel kNN flowchart The compute intensive step in kNN is distance computations, which is done on GPU where as others are performed on CPU. The pseudo code for our parallel kNN algorithm is as follows, Algorithm: parkNN Input: Dataset, D = {(a1,c1), . . . , (xn,cn)}, Input query, t = (a1, . . . ,xn), k- number of neighbour. Output: Class ‘c’ to be identified for new instance of dataset‘t’ Begin On CPU: 1. Load data set to global memory 2. Decide ‘k’ for the cluster definition/generation 3. Transfer dataset to GPU memory On GPU: 4. For all labelled instance (ai, ci), compute d(ai, a) On CPU: 5. Sort d(ai,a) from lowest to highest, ( i = 1, . . . ,n) 6. Select ‘k’ nearest instances to x: 7. Assign to x the most frequent class in End Distributing difference computations on many core GPU performs computations faster due to parallelising the difference computations. 
The following OpenCL kernels demonstrate parallel computation on GPU cores Kernel for Parallel difference computation __kernel void diff(__global const float * query, __global const float *data, __global const int *nVal, __global float * diff) { int id = get_global_id(0); int indexCent = id % ( nVal[0] * nVal[1] ); int p = id % nVal[0]; int q = (( id / (nVal[0] * nVal[1]) ) % nVal[2] ) * nVal[0]; diff[ id ] = (query[indexCent] - data[ p + q ]) * (query[indexCent] - data[ p + q ]); } nVal contains following parameters, 0 -> #Attributes, 1 -> #Query , 2 -> #Data Points, # Workers = (#Attributes * #Query * #Data Points) Kernel for Parallel Sum Computations __kernel void Sum(__global const float * data, __global float *sum, __global const int *nVal) { int id = get_global_id(0); int index = id + (( id / nVal[2] ) * nVal[1] ); if( (id % nVal[2]) == (nVal[0] - nVal[2])) sum[id] = data[index]; else sum[id] = data[index] + data[(index + nVal[2])]; } nVal contains following parameters, 0 -> Data Point Length (DPL), 1 -> DPL / 2, 2 -> Offset , Offset = DPL/2 + DPL%2, # workers = offset * # Data Points The parallel computation of the difference computations in two points, A & B, is illustrated in 0, as shown below. The OpenCL kernels written for parallel performance on GPU
  • 4. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 03 Issue: 10 | Oct-2014, Available @ http://www.ijret.org 370 device, and are achieving computation methodology as shown in 0. Fig 3: Distance calculation between two points in Parallel Parallel sum kernel as described in kernel ‘sum’. Multi-step execution (partial data) for parallel sum calculation generated partial results on each level which further used as inputs to next level of parallel computations. Illustrative Example: We illustrate working of our proposed parallel kNN algorithm with the UCI Balloon dataset, as described in Table1. The attribute values of dataset are transformed into bit-mapped represented datasets. The sample dataset split into training dataset and test dataset with the ratio of 70:30 respectively. For UCI balloon dataset 12 records are taken as training dataset, and 4 instances are taken as test datasets. The training dataset is used to build model, however test dataset used to test the built model for records to classify the query instance. The test data set is shown in Table1, and the training dataset is shown in Table2, as below. The balloon dataset has 4 attributes, 16 records, and is classified in two classes. Table: 1 Balloons Dataset. (TEST) Sr. No. Color Size Act Age Class 1 1 2 1 1 1 2 1 2 1 2 2 3 2 1 1 1 1 4 2 1 1 2 2 Table: 2 Balloons Dataset (TRAINING): Bitmap representation Sr. No. Color Size Act Age Class 1 1 1 1 1 1 2 1 1 1 2 2 3 1 1 2 1 2 4 1 1 2 2 2 5 1 2 2 1 2 6 1 2 2 2 2 7 2 1 2 1 2 8 2 1 2 2 2 9 2 2 1 1 1 10 2 2 1 2 2 11 2 2 2 1 2 12 2 2 2 2 2 Our proposed parallel kNN algorithm takes into account the same inputs and produces the same output as sequential algorithm, but the kNN operations performed are parallel for performance achievement. The following steps illustrate the parallel operations of kNN algorithm. CPU: 1. Load data set to global memory 2. Decide ‘k’ for the cluster definition/generation 3. Transfer dataset to GPU memory GPU: 4. For all labelled instance (ai,ci), compute d(ai,a) CPU: 5. Apply input query on build model for class prediction, a. Sort d(ai,a) from lowest to highest,(i=1,...,n ) b. Select ‘k’ nearest instances to x: c. Assign to ‘x’ the most frequent class in The algorithmic steps are illustrated a bit detail as below Step1: Load Data Load data from CPU memory to GPU global memory. The data is transformed from actual dataset, to bit represented dataset at CPU. The ‘bit’ represented dataset is then transferred GPU memory. Step2 : Define ‘k’, Let say k=3, Step 3: Transform dataset to GPU memory. // for distance computations, we used squared Euclidean distance method. Step 4: Make Data vector ‘D’ and Query vector ‘Q’ Vector D (Training Data) contents: Vector Q (Query Vector) contents: Computing an Euclidian distance in parallel using openCL kernel __diff( )
  • 5. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 03 Issue: 10 | Oct-2014, Available @ http://www.ijret.org 371 Here, the parallel model is ready which can be used to predict the input query class. The model is tested in step5. This we have explained in the following four test cases. Step 1: Find Class for each Test case Test Case 1 Distance list is <1, 2, 2, 3, 1, 2, 3, 4, 1, 2, 2, 3>, and their corresponding Class vector is <1, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2 > Pairing the <Distance vector, Class vector> we get, <(1,1), (2,2), (2,2), (3,2), (1,2), (2,2), (3,2), (4,2), (1,1),(2,2),(2,2),(3,2) > After sorting paring vector in ascending order distance wise, we get <(1,1), (1,2), (1,1), (2,2), (2,2), (2,2), (2,2), (2,2), (3,2), (3,2), (3,2), (4,2)>; Selecting first ‘k’ samples we get training samples with corresponding classes as <1, 2, 1>, for k=3. Thus using majority,we can classify this test case in class 1”. Test Case 2 For distance list <2, 1, 3, 2, 2, 1, 4, 3, 2, 1, 3, 2>, After sorting and selecting first ‘k’ samples, we get training samples as <2, 5, 10> with corresponding classes as <2,2,2>. Thus using majority, we can classify this test case in class “2”. Test Case 3 Distance list is <1, 2, 2, 3, 3, 4, 1, 2, 1, 2, 2, 3> After sorting and selecting first k samples we get training samples as <1, 7, 9> with corresponding classes as <1, 2, 1> Thus using majority we can classify this test case in class“1”. Test Case 4 Distance list is <2, 1, 3, 2, 4, 3, 2, 1, 2, 1, 3, 2> After sorting and selecting first k samples we get training samples as <2, 8, 10> with corresponding classes as <2,2,2> Thus using majority we can classify this test case in class“2”. We have tested our proposed parallel kNN algorithm on different UCI datasets, as listed in Table-3. Table 3: UCI Classification Dataset Sr. No. Data Set Name Attributes Training Records Testing Records 1. Balloons 4 12 4 2. Space Shuttle 6 12 4 3. Seeds 7 180 30 4. Met Data 6 432 60 5. ILPD 8 499 80 6. CMC Data 7 1400 74 The experimental results shown that, the model created to predict for target query variable, is working fine and produces almost 85 to 95% accuracy for prediction. As represented in Table4, the accuracy of the prediction depends on the size of the training dataset and the value of the ‘k’. It is also observed that, large the volume of the training dataset, the more mature the model is. Also, large size of ‘k’ refines clustering output for the input query, but it does not work always. The best values of the ‘k’ for the given dataset has to identify by multiple observation and outputs. We have tested our parallel kNN model on following UCI datasets, categorised for classification only. We have proved our model for performance and classification accuracy, as shown in Table4 and Table5 respectively. Table 4: Parallel kNN Algorithm (Classification Accuracy) Data Set Name ‘k’ for kNN Parallel Accuracy Balloons 3 100 % Space Shuttle 5 75 % Seeds 7 86 % Met Data 9 95 % ILPD 7 93 % CMC Data 11 96 % Table:5 Parallel kNN Performance on UCI Dataset Sr .N o DataSe t Name ‘k’ Value Traini ng Record s Testing Rector s CPU Time GPU Time 1 Balloon s 3 12 4 0.102sec 9 ms 2 Space Shuttle 5 12 4 0.157sec 13 ms 3 Seeds 7 180 30 0.477sec 21 ms 4 Met Data 9 432 60 0.621sec 56 ms 5 ILPD 7 499 80 1.253sec 76 ms 6 CMC Data 11 1400 74 26.72sec 1384m s 4. 
4. CONCLUSION
kNN, the instance-based classifier, operates on the premise that the classification of unknown instances can be done by relating the unknown to the known using a distance
function. The closer the distance, the stronger the indication that the instances share the same class. In order to minimize the kNN compute time, we have designed an OpenCL-based parallel algorithm for many-core, highly parallel architectures. The results for our OpenCL-based parkNN show that performance scales up sub-linearly over the serial kNN. Classification accuracy is not proportional to the value of 'k'; a better 'k' has to be identified through multiple observations. With respect to performance, our parallel kNN (parkNN) shows a satisfactory scale-up.

BIOGRAPHIES
Valmik B Nikam holds a Bachelor of Engineering (Computer Science and Engineering) from Government College of Engineering, Aurangabad, and a Master of Engineering (Computer Engineering) from VJTI, Matunga, Mumbai, Maharashtra, and is pursuing a PhD in the Computer Department of VJTI. He was a faculty member at Dr. Babasaheb Ambedkar Technological University, Lonere. He has 12 years of academic experience, 5 years of administrative experience as a Head of Department, and one year of industry experience. He has attended many short-term training programs and has been an invited expert lecturer at workshops.
Presently he is an Associate Professor in the Department of Computer Engineering & Information Technology of VJTI, Matunga, Mumbai. His research interests include scalability of data mining algorithms, data warehousing, big data, parallel computing, GPU computing, and cloud computing. He is a member of the CSI, ACM, and IEEE research organizations, and a life member of ISTE. He was felicitated with the IBM-DRONA award in 2011.

B. B. Meshram is a Professor and Head of the Department of Computer Engineering and Information Technology, Veermata Jijabai Technological Institute, Matunga, Mumbai. He holds a Ph.D. in Computer Engineering and has been in academics and research for 20 years. His current research includes database technologies, data mining, security, forensic analysis, video processing, and distributed computing. He has authored over 203 research publications, of which over 38 are at national conferences, 91 at international conferences, and more than 71 in international journals; he has also filed eight patents. He has given numerous invited talks at various conferences, workshops, and training programs, and has served as chair/co-chair for many conferences and workshops in the area of computer science and engineering. The industry-demanded M.Tech program on Network Infrastructure Management System and the international conference "Interface" are his brainchildren, interfacing industry, academia, and researchers. Beyond research, he also runs the Jeman Educational Society to uplift needy and deprived students, as a responsibility towards society and hence the Nation.