International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 03 Issue: 02 | Feb-2016 www.irjet.net p-ISSN: 2395-0072
© 2016, IRJET | Impact Factor value: 4.45 | ISO 9001:2008 Certified Journal | Page 1336
Empirical Analysis of Radix Sort using Curve Fitting Technique in
Personal Computer
Arijit Chakraborty1, Sanchari Banerjee2, Avik Mitra3, Dipankar Das4
1,3,4Assistant Professor, Department of BCA(H), The Heritage Academy
Chowbaga, Anandapur, East Kolkata Township, Kolkata-700107, India
2Student, Department of Information Technology, Heritage Institute of Technology
Chowbaga, Anandapur, East Kolkata Township, Kolkata-700107, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - This article empirically analyzes radix sort, a
non-comparative integer sorting algorithm, on a domestic
computing machine (a laptop) by fitting known curve fitting
models to its measured run time, using time performance as
the metric. We tried eleven well-known models to observe the
behavioral pattern of radix sort and concluded that the power
model is the candidate model for the best fit.
Keywords: Curve fitting, Empirical Analysis, Power Model,
Radix Sort, Performance Analysis
1. INTRODUCTION
Sorting is the art of arranging items, and almost all computing
machines can sort data items. Many sorting algorithms are
readily available and many are yet to be explored. It is a
well-known fact that no key-comparison-based sorting algorithm
can sort N keys in fewer than O(N log N) operations in the
worst case, and some require O(N^2) operations in the worst
case. We picked radix sort among the many because of its
attractive feature that N keys can be sorted in O(N) operations
when the number of digits per key is bounded. The history of
radix sort dates back as far as 1887, to the work of Herman
Hollerith on tabulating machines [16]. This paper aims at
finding the most suitable curve that can be fitted to run-time
data generated on computing machines used in common
households on a day-to-day basis. The curve fitting technique
gives us a platform to analyze and visualize experimental data,
which may give further insight into the behavioral pattern of
radix sort. There is, and always will be, scope for data
refinement, as we have not considered the many effects of
hardware architecture that play a pivotal role in generating
such data; we wanted to keep the study simple. In this paper we
found that, out of the models tried, the power model is the
most suitable model to fit the time data on a personal computer.
2. RELATED WORK
An intermediate step of radix sort [1] uses the value of a digit
at a given position to determine the position of the number in
an intermediate array; this array, in its final iteration,
becomes the sorted array. The scanning of the digits can proceed
either from left to right or from right to left. Left-to-right
scanning is termed Most Significant Digit (MSD) radix sort,
whereas right-to-left scanning is termed Least Significant Digit
(LSD) radix sort. LSD radix sorts [2] use queues to store the
numbers, where the position in the queue is based on the digit
presently being scanned. MSD radix sorts use bins or buckets to
store the numbers, where the bucket in which a number is stored
is determined by the digit presently being scanned; for
example, 412 and 032 in a list are stored in bucket numbers 4
and 0 respectively. In the next iteration, sub-buckets are
allocated for each bucket, and the allocation of a number to a
sub-bucket is determined by the digit in the next scanned
position. The procedure continues recursively until each
sub-bucket, taken in order, contains a single number; the
numbers are therefore sorted when they are extracted in order
of the sub-buckets of the buckets. Since the scanning is a
sequential process, it slows down the actual running time,
although the run-time order remains O(kn), where k is the number
of digits in a number and n is the size of the data to be
sorted. Moreover, the allocation of sub-buckets in each
recursive stage of MSD radix sort can result in exponential
growth of the allocated space. To reduce the actual running
time spent in scanning digits and allocating space, [3] proposed
a parallel radix sort where buckets are allocated at each
processor of a multi-core processing system; the numbers are
then moved between the buckets in subsequent iterations. Though
the parallel version of radix sort reduces the run-time, it
suffers from a time lag in the exchange of numbers between the
buckets. Moreover, if the bucket contents vary in size, further
CPU cycles are wasted in dealing with the asymmetric
inter-processor transfer of numbers, called load imbalance. To
reduce the load imbalance, load balanced parallel radix sort [4]
splits the buckets across multiple processors so that the count
of numbers in each processor remains equal, thus improving the
run-time significantly. Partitioned parallel radix sort [2]
reduces the communication overhead by parallelizing the MSD
radix sort. Non-recursive MSD radix sort [5] reduces the space
overhead of radix sort by using two sets of identical buckets
for each digit, sorting the numbers in each bucket using
Quicksort [6], and then
transferring the contents of each bucket to the other in
alternating fashion. Since the space overhead is reduced, the
communication overhead in transferring the numbers between the
buckets is also reduced, thus reducing the run-time. The
dependency of the run-time of LSD radix sort in a vector
multi-processor environment is analyzed in [7], where the
empirical formulation also revealed a dependency on the number
of processors actually used in the sorting process. Quicksort
is again employed to sort the numbers in each bucket of MSD
radix sort in [8], called Matesort, thus eliminating the need
to allocate sub-buckets. To reduce the space overhead of LSD
radix sort on GPUs, [9] proposed maintaining a global count of
the numbers for a set of processors, so that the count can be
used to determine the position of a number in the final sorted
array, resulting in a 20% speed-up of the algorithm.
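As an illustration of the LSD scheme described in this section, a minimal Python sketch is given below. It is not the C programme used in the experiments; the function name lsd_radix_sort and the base parameter are assumptions introduced only for this example, and the sketch handles non-negative integers only.

```python
def lsd_radix_sort(values, base=10):
    """Sort non-negative integers by scanning digits from right to left."""
    if any(v < 0 for v in values):
        raise ValueError("this sketch handles non-negative integers only")
    items = list(values)
    if not items:
        return items
    largest = max(items)
    place = 1  # weight of the digit currently being scanned (1, 10, 100, ...)
    while largest // place > 0:
        # One queue per possible digit value (hypothetical layout).
        queues = [[] for _ in range(base)]
        for v in items:
            queues[(v // place) % base].append(v)
        # Reading the queues back in digit order keeps each pass stable,
        # which is what makes the final pass produce a sorted list.
        items = [v for queue in queues for v in queue]
        place *= base
    return items


print(lsd_radix_sort([412, 32, 7, 170, 45, 0, 412]))
# prints [0, 7, 32, 45, 170, 412, 412]
```

Each pass touches every number once, so k passes over n numbers give the O(kn) order mentioned above.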
All the implementations discussed have used specialized
computing environments like CRAY [10] to report the run-time.
However, with the growth of computing needs, sorting is not
expected to be the only computation performed, and any
application employing sorting has to be executed alongside
other algorithms. Therefore, it is reasonable to use a common
computing environment like the personal computing system for
analyzing the run-time of radix sort. In the following
sections, we analyze the actual run-time of the radix sort
algorithm on a personal computing system.
3. OBJECTIVES OF THE STUDY
To identify the best curve that can be fitted to the
experimental data points (run time versus data size) obtained
by running radix sort in the worst case on a personal computer,
and to propose a mathematical model of the best fitted curve.
4. RESEARCH METHODOLOGY
The steps of the research methodology are given below:
Step 1: The radix sort algorithm is implemented as a C
programme with the data size varying from 10000 to 27000 in
intervals of 500. For each data size, the programme is run 100
times and the average run-time is taken.
Step 2: We have used the curve fitting technique to find the
best curve that can be fitted to the data points, i.e. run time
versus data size. In the present study we have opted for R
square, Adjusted R square and Root Mean Square Error (RMSE) as
the ‘Goodness of fit’ statistics [11][12][15]. The model which
has the highest R square value, the highest Adjusted R square
value and the lowest RMSE is selected as the candidate model
for the best curve for the dataset [11][12][15].
Step 3: The normality tests of the residuals of the candidate
model are carried out in this step. We have considered both
graphical methods (histogram analysis of the residuals and Q-Q
plot analysis of the residuals) [11][12] and a quantitative
method (the Shapiro-Wilk test statistic of the residuals)
[13][14] for this purpose. We should observe a symmetric
bell-shaped curve around the histogram, a linear pattern of the
points on the Q-Q plot, and a significance of the Shapiro-Wilk
statistic higher than .05 to meet the assumption of normality
of the error distribution.
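The three statistics named in Step 2 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name goodness_of_fit and its parameters are hypothetical, and RMSE is taken here as the square root of the residual sum of squares divided by the residual degrees of freedom, which is the definition used in [15]; the exact convention of the statistics package is not stated in the text.

```python
import math


def goodness_of_fit(observed, predicted, num_predictors=1):
    """Return (R square, Adjusted R square, RMSE) for a fitted model."""
    n = len(observed)
    if n != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    dof = n - num_predictors - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("not enough data points for this many predictors")
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R square is undefined")
    r_square = 1.0 - ss_res / ss_tot
    adjusted = 1.0 - (1.0 - r_square) * (n - 1) / dof
    rmse = math.sqrt(ss_res / dof)  # assumed convention, see lead-in
    return r_square, adjusted, rmse


r2, adj, rmse = goodness_of_fit([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.0])
print(round(r2, 4), round(adj, 4), round(rmse, 4))
```

With 35 data points and one predictor, the adjustment formula maps an R square of .9797 to an Adjusted R square of about .9791, which agrees with the Linear row of Table 2.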
Software used: We have used the GCC compiler of Dev-C++ 4.0
under Windows XP to generate the experimental data. SPSS has
been used for data analysis.
Hardware used: Intel Core 2 Duo CPU T6570 with a frequency of
2.1 GHz and 3 GB RAM (having a frequency of 1.19 GHz).
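The measurement loop of Step 1 can be sketched as below. This is a hedged Python illustration, not the C programme of the study: the names data_sizes, average_run_time_ms and sweep are hypothetical, the input generation (uniform random integers) is an assumption because the text does not state how the inputs were produced, and the built-in sorted is used only as a stand-in for the sorting routine under test.

```python
import random
import time


def data_sizes(start=10000, stop=27000, step=500):
    """Data sizes from Step 1: 10000 to 27000 in intervals of 500."""
    return list(range(start, stop + 1, step))


def average_run_time_ms(sort_fn, size, repeats=100, seed=0):
    """Average wall-clock time in milliseconds over `repeats` runs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(repeats):
        # Assumed input: fresh random non-negative integers for every run.
        data = [rng.randrange(1_000_000) for _ in range(size)]
        begin = time.perf_counter()
        sort_fn(data)
        total += time.perf_counter() - begin
    return 1000.0 * total / repeats


def sweep(sort_fn, start=10000, stop=27000, step=500, repeats=100):
    """Return (data size, average run time in ms) pairs."""
    return [(n, average_run_time_ms(sort_fn, n, repeats))
            for n in data_sizes(start, stop, step)]


# Small demonstration run; the timings depend on the machine.
print(sweep(sorted, start=1000, stop=2000, step=500, repeats=3))
```

Only the call to the sorting routine is timed, so the cost of generating the input does not enter the average.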
5. DATA ANALYSIS & FINDINGS
The Sample dataset is given in the following table:
Table -1: Data Table
Sl No. Data Size Run Time (milliseconds)
1 10000 1292.49
2 10500 1319.71
3 11000 1463.44
4 11500 1654.06
5 12000 1772.01
6 12500 1921.54
7 13000 2103.61
8 13500 2326.9
9 14000 2529.23
10 14500 2691.85
11 15000 2859.07
12 15500 3116.67
13 16000 3339.06
14 16500 3326.1
15 17000 3601.71
16 17500 3805.93
17 18000 4131.39
18 18500 4367.04
19 19000 3846.41
20 19500 3988.74
21 20000 4198.6
22 20500 4581.72
23 21000 4517.81
24 21500 5006.43
25 22000 5272.06
26 22500 5667.37
27 23000 6187.86
28 23500 6220
29 24000 6636.72
30 24500 6721.21
31 25000 6877.31
32 25500 7288.3
33 26000 7447.19
34 26500 7436.41
35 27000 7836.4
Identification of the best curve that can be fitted to the data
points:
The ‘Goodness of fit’ statistics of the Run time versus Data
size is given below:
Table -2: Goodness of Fit Statistics Table
Model Name R Square Adjusted R Square RMSE
Linear .9797 .9791 288.6211
Logarithmic .9412 .9394 491.338
Inverse .8733 .8695 721.1321
Quadratic .9886 .9878 220.0855
Cubic .9886 .9878 220.0855
Compound .9661 .9651 .0999
Power .9897 .9894 .055
S .981 .9804 .0747
Growth .9661 .9651 .0999
Exponential .9661 .9651 .0999
Logistic .8051 .7992 .729
Findings: From the above table we found that out of the eleven
(11) models tried, ten (10) models have a very high R square
and a very high Adjusted R square. Out of these ten (10)
models, five (5) models have a low RMSE. We observe that the
Power model has the highest R square value (.9897), the highest
Adjusted R square value (.9894) and the lowest RMSE value
(.055). Therefore, we have selected the Power model as the
candidate model for the best curve for this dataset.
The eleven (11) tried models are depicted in the following
figure:
Chart -1: Chart of Eleven Models
The normality test of the residuals of the candidate model is
given below:
(a) Histogram of the residuals –
Chart -2: Histogram of the residuals
Observations: From the above figure we have observed a
symmetric bell shaped curve around the histogram which is
approximately evenly distributed around zero.
(b) Q-Q plot of the residuals –
Chart -3: Q-Q plot of the residuals
Observations: From the above figure we have observed that
the points on the Q-Q plot are approximately linear.
(c) Shapiro – Wilk (SW) test statistics of the residuals –
Table -3: SW Test Statistics Table
Model Shapiro-Wilk Statistic df Sig.
Power .971 35 .463
Observations: From the above table we have observed that
the significance of SW statistics is .463 (higher than .05).
Findings of the normality test of the residuals of the
candidate model: From the above observations i.e. (a)
Histogram of the residuals, (b) Q-Q plot of the residuals and
(c) Shapiro – Wilk (SW) test statistics of the residuals, we
have found that the residuals are approximately normally
distributed.
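The coordinates behind a normal Q-Q plot of residuals can be sketched in Python as follows. This is an illustration only: the function name qq_points is hypothetical, and the Blom plotting positions (i - 3/8)/(n + 1/4) are an assumption, since the text does not state which positions were used for Chart 3.

```python
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Return (theoretical normal quantile, standardized residual) pairs."""
    n = len(residuals)
    if n < 2:
        raise ValueError("at least two residuals are needed")
    mu = mean(residuals)
    sigma = stdev(residuals)
    if sigma == 0:
        raise ValueError("residuals are constant; cannot standardize")
    standard_normal = NormalDist()
    points = []
    for i, r in enumerate(sorted(residuals), start=1):
        p = (i - 0.375) / (n + 0.25)  # Blom plotting position (assumed)
        points.append((standard_normal.inv_cdf(p), (r - mu) / sigma))
    return points


for theoretical, observed in qq_points([-1.0, 0.0, 1.0]):
    print(round(theoretical, 3), round(observed, 3))
```

If the residuals are approximately normal, the pairs fall close to the line where the two coordinates are equal, which is the linear pattern looked for in observation (b).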
The proposed mathematical model is given below:
Y = 7.274680973837771e-005 * X**1.813662220055146
Here,
Y = Run Time (milliseconds)
X = Data Size
The plot of the above model is given below:
Chart -4: Final Concluded Model (Power Model)
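The power model can be cross-checked against Table 1 with a short Python sketch. It assumes that the model was obtained by ordinary least squares on the log-transformed data, ln Y = ln a + b ln X; this is an assumption about the fitting procedure, which the text does not describe. The names fit_power_model and predict are hypothetical. Under that assumption the residuals live on a logarithmic scale, which would explain why the RMSE reported for the Power model (.055) is so much smaller than that of the Linear model (288.6211).

```python
import math

# Run times in milliseconds, copied from Table 1 (data sizes 10000..27000).
RUN_TIMES = [
    1292.49, 1319.71, 1463.44, 1654.06, 1772.01,
    1921.54, 2103.61, 2326.9, 2529.23, 2691.85,
    2859.07, 3116.67, 3339.06, 3326.1, 3601.71,
    3805.93, 4131.39, 4367.04, 3846.41, 3988.74,
    4198.6, 4581.72, 4517.81, 5006.43, 5272.06,
    5667.37, 6187.86, 6220, 6636.72, 6721.21,
    6877.31, 7288.3, 7447.19, 7436.41, 7836.4,
]
POINTS = list(zip(range(10000, 27001, 500), RUN_TIMES))


def fit_power_model(points):
    """Fit Y = a * X**b by least squares on (ln X, ln Y)."""
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def predict(a, b, x):
    """Evaluate the power model at data size x."""
    return a * x ** b


a, b = fit_power_model(POINTS)
print("fitted coefficient:", a, "fitted exponent:", b)
# Coefficients as reported in the proposed mathematical model.
print(predict(7.274680973837771e-05, 1.813662220055146, 10000))
```

If the assumption holds, the fitted exponent should come out close to the reported value of about 1.81; a noticeable difference would suggest that a different fitting procedure was used.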
6. CONCLUSION
We have analyzed the run-time behavior of radix sort in its
average case and found that the power model (~ C*x**1.81)
fits the data. Our analysis has considered an everyday
computing scenario where executing the sorting algorithm is not
the only task performed by the computing device; that is, we
have not accounted separately for factors such as cache misses
and OS context switches. In future, we would like to explore
the effects of OS context switches on the performance of radix
sort and shall try to propose an empirical model that accounts
for such events.
REFERENCES
[1] Radix sort, National Institute of Standards and
Technology, [online],
https://xlinux.nist.gov/dads//HTML/radixsort.html
[2] S.J. Lee, M. Jeon, D. Kim, “Partitioned Parallel Radix Sort”,
vol 190, pp 160-171, LNCS High Performance
Computing, April, 2001.
[3] A. Maus, “A Full Parallel Radix Sorting Algorithm for
Multicore Processors”, NIK 2011.
[4] A. Sohn, Y. Kodama, “Load Balanced Parallel Radix Sort”,
pp 305-312, Proceedings of the 12th International
Conference of Supercomputing, 1998.
[5] A.A. Aydin, G. Alaghband, “Sequential and Hybrid
Approach for non-recursive Most Significant Digit Radix
Sort”, International Conference on Applied Computing
2013.
[6] C.A.R.Hoare, “Quicksort”, vol. 5, issue 1, pp 10-16, The
Computer Journal, 1962.
[7] M. Zagha, G.E. Blelloch, “Radix Sort for Vector
Multiprocessors”, pp 712-721, Proceedings of the
ACM/IEEE conference of Supercomputing, 1991.
[8] N.A. Darwish, “Formulation and Analysis of in-place MSD
Radix sort Algorithms”, vol. 31, no. 6, pp 467-481,
December 2005.
[9] L. Ha, J. Kruger, C.T. Silva, “Fast Four-way Parallel Radix
Sorting on GPUs”, vol. 28, no. 8, pp 2368-2378,
December 2009.
[10] CRAY-1, COMPUTER SYSTEM, HARDWARE
REFERENCE MANUAL, 2240004, [online],
http://ed-thelen.org/comp-hist/CRAY-1-HardRefMan/CRAY-1-HRM.html
[11] D. Das, A. Chakraborty, A. Mitra, “Sample Based
Curve Fitting Computation on the Performance of
Quicksort in Personal Computer”, vol. 5, issue 2, pp 885-891,
February 2014.
[12] A. Chakraborty, A. Mitra, D. Das, “Empirical Analysis
of Merge sort in Personal Computer by Curve Fitting
Technique”, vol IV, issue IV, pp 1-6, April 2015.
[13] D. Das, P. Chakraborti, “Performance Measurement
and Management Model of Data Generation and Writing
Time in Personal Computer”, vol. 5, issue 6, pp 1218-1226,
June 2014.
[14] Testing for Normality using SPSS Statistics, Laerd
statistics, (n.d.). Retrieved May 17, 2014, from
https://statistics.laerd.com/spss-tutorials/testing-for-normality-using-spss-statistics.php
[15] Evaluating Goodness of Fit – MATLAB & Simulink –
MathWorks India, MathWorks, [online],
http://www.mathworks.in/help/curvefit/evaluating-goodness-of-fit.html
[16] The Art of Computer Programming by D. Knuth

More Related Content

PDF
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
PPTX
Qiu bosc2010
PDF
A Survey on Improve Efficiency And Scability vertical mining using Agriculter...
PDF
Paper id 25201498
PPTX
Hadoop interview questions
PPT
5.1 mining data streams
PDF
CLIM Program: Remote Sensing Workshop, High Performance Computing and Spatial...
PDF
PERFORMANCE EVALUATION OF SQL AND NOSQL DATABASE MANAGEMENT SYSTEMS IN A CLUSTER
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
Qiu bosc2010
A Survey on Improve Efficiency And Scability vertical mining using Agriculter...
Paper id 25201498
Hadoop interview questions
5.1 mining data streams
CLIM Program: Remote Sensing Workshop, High Performance Computing and Spatial...
PERFORMANCE EVALUATION OF SQL AND NOSQL DATABASE MANAGEMENT SYSTEMS IN A CLUSTER

What's hot (20)

PDF
A time energy performance analysis of map reduce on heterogeneous systems wit...
PPTX
Data Streaming in Big Data Analysis
PDF
Data clustering using map reduce
PDF
Workflow Scheduling Techniques and Algorithms in IaaS Cloud: A Survey
PDF
Scalable Distributed Real-Time Clustering for Big Data Streams
PDF
Scalable and Adaptive Graph Querying with MapReduce
PDF
SASUM: A Sharing-based Approach to Fast Approximate Subgraph Matching for Lar...
PDF
C0312023
PDF
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...
PDF
Graph based Approach and Clustering of Patterns (GACP) for Sequential Pattern...
PDF
Dynamic approach to k means clustering algorithm-2
PDF
IRJET - Evaluating and Comparing the Two Variation with Current Scheduling Al...
PDF
Enhancing Performance and Fault Tolerance of Hadoop Cluster
PDF
Implementation of p pic algorithm in map reduce to handle big data
PDF
Introduction to Data streaming - 05/12/2014
PDF
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
PDF
Hot-Spot analysis Using Apache Spark framework
PDF
CMPE275-Project1Report
A time energy performance analysis of map reduce on heterogeneous systems wit...
Data Streaming in Big Data Analysis
Data clustering using map reduce
Workflow Scheduling Techniques and Algorithms in IaaS Cloud: A Survey
Scalable Distributed Real-Time Clustering for Big Data Streams
Scalable and Adaptive Graph Querying with MapReduce
SASUM: A Sharing-based Approach to Fast Approximate Subgraph Matching for Lar...
C0312023
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...
Graph based Approach and Clustering of Patterns (GACP) for Sequential Pattern...
Dynamic approach to k means clustering algorithm-2
IRJET - Evaluating and Comparing the Two Variation with Current Scheduling Al...
Enhancing Performance and Fault Tolerance of Hadoop Cluster
Implementation of p pic algorithm in map reduce to handle big data
Introduction to Data streaming - 05/12/2014
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
Hot-Spot analysis Using Apache Spark framework
CMPE275-Project1Report
Ad

Similar to Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Computer (20)

PDF
Modified Pure Radix Sort for Large Heterogeneous Data Set
PDF
Cr25555560
PPTX
Radix sorting
PPTX
Analysis of algorithms
PDF
Design and analysis of ra sort
PPTX
Data structure using c module 3
PPTX
Radix sort presentation
PDF
K-Sort: A New Sorting Algorithm that Beats Heap Sort for n 70 Lakhs!
PPT
RadixSort.ppt
PDF
Data Structure Radix Sort
PDF
29 19 sep17 17may 6637 10140-1-ed(edit)
PDF
29 19 sep17 17may 6637 10140-1-ed(edit)
PPT
lecture 10
PPT
Counting Sort Lowerbound
PPTX
SORTTING IN LINEAR TIME - Radix Sort
PDF
Sorting and Searching Techniques
PPT
3.5 merge sort
PPTX
Advance Algorithm_unit_2_czcbcnhgjy.pptx
PDF
Radix Sorting With No Extra Space
Modified Pure Radix Sort for Large Heterogeneous Data Set
Cr25555560
Radix sorting
Analysis of algorithms
Design and analysis of ra sort
Data structure using c module 3
Radix sort presentation
K-Sort: A New Sorting Algorithm that Beats Heap Sort for n 70 Lakhs!
RadixSort.ppt
Data Structure Radix Sort
29 19 sep17 17may 6637 10140-1-ed(edit)
29 19 sep17 17may 6637 10140-1-ed(edit)
lecture 10
Counting Sort Lowerbound
SORTTING IN LINEAR TIME - Radix Sort
Sorting and Searching Techniques
3.5 merge sort
Advance Algorithm_unit_2_czcbcnhgjy.pptx
Radix Sorting With No Extra Space
Ad

More from IRJET Journal (20)

PDF
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
PDF
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
PDF
Kiona – A Smart Society Automation Project
PDF
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
PDF
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
PDF
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
PDF
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
PDF
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
PDF
BRAIN TUMOUR DETECTION AND CLASSIFICATION
PDF
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
PDF
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
PDF
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
PDF
Breast Cancer Detection using Computer Vision
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Kiona – A Smart Society Automation Project
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
BRAIN TUMOUR DETECTION AND CLASSIFICATION
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Breast Cancer Detection using Computer Vision
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...

Recently uploaded (20)

PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PDF
composite construction of structures.pdf
PDF
Well-logging-methods_new................
PPTX
UNIT-1 - COAL BASED THERMAL POWER PLANTS
DOCX
573137875-Attendance-Management-System-original
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PPTX
UNIT 4 Total Quality Management .pptx
PPTX
bas. eng. economics group 4 presentation 1.pptx
PDF
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
PDF
Operating System & Kernel Study Guide-1 - converted.pdf
PPT
Mechanical Engineering MATERIALS Selection
PPTX
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
PDF
Model Code of Practice - Construction Work - 21102022 .pdf
PDF
Automation-in-Manufacturing-Chapter-Introduction.pdf
PPTX
CH1 Production IntroductoryConcepts.pptx
PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PDF
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
PPTX
Sustainable Sites - Green Building Construction
PPTX
OOP with Java - Java Introduction (Basics)
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
composite construction of structures.pdf
Well-logging-methods_new................
UNIT-1 - COAL BASED THERMAL POWER PLANTS
573137875-Attendance-Management-System-original
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
UNIT 4 Total Quality Management .pptx
bas. eng. economics group 4 presentation 1.pptx
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
Operating System & Kernel Study Guide-1 - converted.pdf
Mechanical Engineering MATERIALS Selection
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
Model Code of Practice - Construction Work - 21102022 .pdf
Automation-in-Manufacturing-Chapter-Introduction.pdf
CH1 Production IntroductoryConcepts.pptx
CYBER-CRIMES AND SECURITY A guide to understanding
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
Sustainable Sites - Green Building Construction
OOP with Java - Java Introduction (Basics)

Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Computer

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056 Volume: 03 Issue: 02 | Feb-2016 www.irjet.net p-ISSN: 2395-0072 © 2016, IRJET | Impact Factor value: 4.45 | ISO 9001:2008 Certified Journal | Page 1336 Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Computer Arijit Chakraborty1, Sanchari Banerjee2, Avik Mitra3, Dipankar Das4 1,3,4Assistant Professor, Department of BCA(H), The Heritage Academy Chowbaga, Anandapur, East Kolkata Township, Kolkata-700107, India 2Student, Department of Information Technology, Heritage Institute of Technology Chowbaga, Anandapur, East Kolkata Township, Kolkata-700107, India ---------------------------------------------------------------------***--------------------------------------------------------------------- Abstract - The proposed research article aims at analyzing empirically a non comparative integer sorting algorithmsuch as radix sort using known curve fitting technique(s) in a domestic computing machine (laptop)throughvariousknown curve fitting models using time performance as a metric. We have used eleven best known models to observethebehavioral pattern of radix sort on the fly and concluded that power model is the candidate model for best fit. Keywords: Curve fitting, Empirical Analysis, Power Model, Radix Sort, Performance Analysis 1. INTRODUCTION Sorting is an art of arranging items and almost all computing machines can sort data items, many areavailablereadilyand many are yet to be explored, it is a well known fact that no key comparison based sorting algorithms can sort N no. of keys lesser than O(NlogN) operations with some require O(N2) operations in worst case. We did picked radix sort amongst many due to a beautiful feature being N of keys can be sorted in O(N) operations. History of Radix sort dates back as far as 1887 credit goes to the work of Herman Hollerith [16] in tabulating machines. 
This paper aims at finding the most suitable curve that can be fitted on time generated data in computing machines used in common households on day to day basis, curve fitting techniquegives us a platform to analyze and visualize experimental data which may give further insight on the behavioral pattern of radix sort. There is and always will be a scope of data refinement as we have not considered many effects of hardware architecture that plays a pivotal roleingenerating such data, we just want it to keep it simple. In this paper we found that out of many models power model is an ideal model to fit the time data in personal computer(s). 2. Related Work Intermediate step of Radix sort [1] uses the valueofa digitat a given position to determine the position of the number in intermediate array; this array, in its final iteration, becomes the sorted array. The scanning of the digits can either start from left to right or right to left. Left to right scanning during radix sort is termed as Most Significant Digit (MSD) radix sort, whereas, right to left scanning during radix sort is termed as Least Significant Digit (LSD) radix sort. LSD radix sorts [2] use queue to store the numbers where the position in the queue is based on the present digit being scanned. MSD radix sorts uses bins or buckets to store the numbers where the bucket in which the number is to be stored is determined by the digit presently being scanned; for example, 412 and 032 in list should be stored in bucket numbers 4 and 0 respectively; in the next iteration, sub- buckets are allocated for each bucket and the allocation of the number in the sub-bucket is determined by the digit in the next scanned position; the procedure recursively continues to get a set of sub-bucketsinordereachcontaining a number, and hence the numbers gets sorted when numbers are extracted in order of the sub-buckets if the buckets. 
Since the scanning is a sequential process, it slows down the actual running time, although the run-time order remains O(kn), where k is the number of digits in a number and n is the size of data to be sorted. Moreover, the allocation of sub-buckets in each recursive stages of MSD radix sorts can result exponential growth of the allocated space. To reduce the actual running-time in radix sorts in scanning of digits and allocation of space [3] proposed parallel radix sort where buckets are allocated at each processor of a multi-core processing system, the numbers are then moved between the buckets in subsequent iterations. Though the parallel version of the radix sort reduces the run-time, it suffers from time lag in exchange of the numbers between the buckets. Moreover, if the buckets contents vary, there will be further waste of CPU cycles to deal with asymmetric inter-processor transfer of numbers, called load imbalance. To reduce the load imbalance, load balanced parallel radix sort [4] proposed to splitthebuckets into multiple processors so that the count of the numbers in each processor remains equal to each other thus improving the run-time significantly. Partitioned parallel radix sort [2] is proposed where the communication overhead is reduced by parallelizing the MSD radixsort.Non-recursiveMSDradix sort [5] reduces the space overhead in radix sort by using two sets of identical buckets for each digit, and sorting the numbers in each bucket using Quicksort [6], and then
transferring the contents of each bucket to the other in alternating fashion. Since the space overhead is reduced, the communication overhead in transferring the numbers between the buckets is also reduced, thus reducing the run-time. The dependency of the run-time of LSD radix sort in a vector multiprocessor environment is analyzed in [7], where the empirical formulation also revealed a dependency on the number of processors actually used in the sorting process. Quicksort is again employed to sort the numbers in each bucket of MSD radix sort in [8], in an algorithm called Matesort, thus eliminating the need to allocate sub-buckets. To reduce the space overhead of LSD radix sort on GPUs, [9] proposed to maintain a global count of the numbers for a set of processors, so that the count can be used to determine the position of a number in the final sorted array, resulting in a 20% speed-up of the algorithm.
All of the implementations discussed above have used specialized computing environments such as CRAY [10] to report the run-time. However, with the growth of computing needs, sorting is not expected to be the only computation performed, and any application employing sorting has to be executed alongside other algorithms. Therefore, it is intuitive to use a common computing environment such as a personal computing system for analyzing the run-time of radix sort. In the following sections, we analyze the actual run-time of the radix sort algorithm on a personal computing system.
3. OBJECTIVES OF THE STUDY
To identify the best curve that can be fitted to the experimental data points (Run time versus Data size) obtained by running Radix sort in the worst case on a personal computer, and to propose a mathematical model of the best fitted curve.
4. RESEARCH METHODOLOGY
The steps of the research methodology are given below:
Step 1: The Radix sort algorithm is implemented as a C programme with the data size varying from 10000 to 27000 at an interval of 500. For each data size, the programme is run 100 times and the average run-time is taken.
Step 2: We have used the curve fitting technique to find the best curve that can be fitted to the data points, i.e. Run time versus Data size. In the present study we have opted for R square, Adjusted R square and Root Mean Square Error (RMSE) as the 'Goodness of fit' statistics [11][12][15]. The model which has the highest R square value, the highest Adjusted R square value and the lowest RMSE has been selected as the candidate model for the best curve for the dataset [11][12][15].
Step 3: The normality tests of the residuals of the candidate model are carried out in this step. We have considered both graphical methods (Histogram analysis of the residuals and Q-Q plot analysis of the residuals) [11][12] and a quantitative method (Shapiro-Wilk test statistics of the residuals) [13][14] for this purpose. To meet the assumption of normality of the error distribution, we should observe a symmetric bell shaped curve around the histogram, a linear pattern of the points on the Q-Q plot, and a significance of the Shapiro-Wilk statistic higher than .05.
Software used: We have used the GCC compiler of Dev-C++ 4.0 under Windows XP to generate the experimental data. SPSS has been used for data analysis.
Hardware used: Intel Core 2 Duo CPU T6570 with a frequency of 2.1 GHz and 3 GB RAM (having a frequency of 1.19 GHz).
5. DATA ANALYSIS & FINDINGS
The sample dataset is given in the following table:
Table -1: Data Table
Sl No.   Data Size   Run Time (milliseconds)
1        10000       1292.49
2        10500       1319.71
3        11000       1463.44
4        11500       1654.06
5        12000       1772.01
6        12500       1921.54
7        13000       2103.61
8        13500       2326.9
9        14000       2529.23
10       14500       2691.85
11       15000       2859.07
12       15500       3116.67
13       16000       3339.06
14       16500       3326.1
15       17000       3601.71
16       17500       3805.93
17       18000       4131.39
18       18500       4367.04
19       19000       3846.41
20       19500       3988.74
21       20000       4198.6
22       20500       4581.72
23       21000       4517.81
24       21500       5006.43
25       22000       5272.06
26       22500       5667.37
27       23000       6187.86
28       23500       6220
29       24000       6636.72
30       24500       6721.21
31       25000       6877.31
32       25500       7288.3
33       26000       7447.19
34       26500       7436.41
35       27000       7836.4
Identification of the best curve that can be fitted to the data points:
The 'Goodness of fit' statistics of the Run time versus Data size are given below:
Table -2: Goodness of Fit Statistics Table
Model Name    R Square   Adjusted R Square   RMSE
Linear        .9797      .9791               288.6211
Logarithmic   .9412      .9394               491.338
Inverse       .8733      .8695               721.1321
Quadratic     .9886      .9878               220.0855
Cubic         .9886      .9878               220.0855
Compound      .9661      .9651               .0999
Power         .9897      .9894               .055
S             .981       .9804               .0747
Growth        .9661      .9651               .0999
Exponential   .9661      .9651               .0999
Logistic      .8051      .7992               .729
Findings: From the above table we found that, out of the eleven (11) models tried, ten (10) have a very high R square and a very high Adjusted R square. Out of these ten (10) models, five (5) have a low RMSE. We observe that the Power model has the highest R square value (.9897), the highest Adjusted R square value (.9894) and the lowest RMSE value (.055). Therefore, we have selected the Power model as the candidate model for the best curve for this dataset.
The eleven (11) tried models are depicted in the following figure:
Chart -1: Chart of Eleven Models
The normality test of the residuals of the candidate model is given below:
(a) Histogram of the residuals
Chart -2: Histogram of the residuals
Observations: From the above figure we have observed a symmetric bell shaped curve around the histogram, which is approximately evenly distributed around zero.
(b) Q-Q plot of the residuals
Chart -3: Q-Q plot of the residuals
Observations: From the above figure we have observed that the points on the Q-Q plot are approximately linear.
(c) Shapiro-Wilk (SW) test statistics of the residuals
Table -3: SW Test Statistics Table
Model   Shapiro-Wilk Statistic   df   Sig.
Power   .971                     35   .463
Observations: From Table -3 we have observed that the significance of the SW statistic is .463 (higher than .05).
Findings of the normality test of the residuals of the candidate model: From the above observations, i.e. (a) Histogram of the residuals, (b) Q-Q plot of the residuals and (c) Shapiro-Wilk (SW) test statistics of the residuals, we have found that the residuals are approximately normally distributed.
The proposed mathematical model is given below:
Y = 7.274680973837771e-005 * X**1.813662220055146
Here, Y = Run Time (milliseconds) and X = Data Size.
The plot of the above model is given below:
Chart -4: Final Concluded Model (Power Model)
6. CONCLUSION
We have analyzed the run-time behavior of Radix sort in its average case and found that the power model (~ C*x**1.81) fits the data. Our analysis has considered an everyday computing scenario in which executing the sorting algorithm is not the only task performed by the computing device; that is, we have not separately accounted for factors such as cache misses and OS context switches. In future, we would like to explore the effects of OS context switches on the performance of Radix sort and shall try to propose an empirical model that accounts for such events.
REFERENCES
[1] radix sort, National Institute of Standards and Technology, [online], https://xlinux.nist.gov/dads//HTML/radixsort.html
[2] S.J. Lee, M. Jeon, D. Kim, "Partitioned Parallel Radix Sort", vol 190, pp 160-171, LNCS High Performance Computing, April, 2001.
[3] A. Maus, "A Full Parallel Radix Sorting Algorithm for Multicore Processors", NIK 2011.
[4] A. Sohn, Y. Kodama, "Load Balanced Parallel Radix Sort", pp 3015-312, Proceedings of the 12th International Conference of Supercomputing, 1998.
[5] A.A. Aydin, G. Alaghband, "Sequential and Hybrid Approach for non-recursive Most Significant Digit Radix Sort", International Conference on Applied Computing 2013.
[6] C.A.R. Hoare, "Quicksort", vol. 5, issue 1, pp 10-16, The Computer Journal, 1962.
[7] M. Zagha, G.E. Blelloch, "Radix Sort for Vector Multiprocessors", pp 712-721, Proceedings of the ACM/IEEE conference of Supercomputing, 1991.
[8] N.A. Darwish, "Formulation and Analysis of in-place MSD Radix sort Algorithms", vol. 31, no. 6, pp 467-481, December 2005.
[9] L. Ha, J. Kruger, C.T. Silva, "Fast Four-way Parallel Radix Sorting on GPUs", vol. 28, no. 8, pp 2368-2378, December 2009.
[10] CRAY-1, COMPUTER SYSTEM, HARDWARE REFERENCE MANUAL, 2240004, [online], http://ed-thelen.org/comp-hist/CRAY-1-HardRefMan/CRAY-1-HRM.html
[11] D. Das, A. Chakraborty, A. Mitra, "Sample Based Curve Fitting Computation on the Performance of Quicksort in Personal Computer", vol. 5, issue 2, pp 885-891, February 2014.
[12] A. Chakraborty, A. Mitra, D. Das, "Empirical Analysis of Merge sort in Personal Computer by Curve Fitting Technique", vol IV, issue IV, pp 1-6, April 2015.
[13] D. Das, P. Chakraborti, "Performance Measurement and Management Model of Data Generation and Writing Time in Personal Computer", vol. 5, issue 6, pp 1218-1226, June 2014.
[14] Testing for Normality using SPSS Statistics, Laerd statistics, (n.d.). Retrieved May 17, 2014, from https://statistics.laerd.com/spss-tutorials/testing-for-normality-using-spss-statistics.php
[15] Evaluating Goodness of Fit – MATLAB & Simulink – MathWorks India, MathWorks, [online], http://www.mathworks.in/help/curvefit/evaluating-goodness-of-fit.html
[16] The Art of Computer Programming by D. Knuth