Sorting
• As much as 25% of computing time is spent on sorting.
Sorting aids searching and matching entries in a list.
• Sorting Definitions:
– Given a list of records (R1, R2, ..., Rn)
– Each record Ri has a key Ki.
– An ordering relationship (<) between two key
values, either x = y, x < y, or x > y. Ordering
relationships are transitive: x < y, y < z, then x < z.
– Find a permutation (π) of the keys such that Kπ(i)
≤ Kπ(i+1), for 1 ≤ i < n.
– The desired ordering is: (Rπ(1), Rπ(2), ..., Rπ(n))
Sorting
• Stability: Since a list could have several records with
the same key, the permutation is not unique. A
permutation π is stable if:
1. sorted: Kπ(i) ≤ Kπ(i+1), for 1 ≤ i < n.
2. stable: if i < j and Ki = Kj in the input list, then Ri
precedes Rj in the sorted list.
• An internal sort is one in which the list is small enough
to sort entirely in main memory.
• An external sort is one in which the list is too big to fit
in main memory.
• Complexity of the general comparison-based sorting
problem: Θ(n log n). Under some special conditions, it is
possible to sort in linear time.
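A quick illustration of stability (Python's built-in sort is documented to be stable; the record names here are made up):

```python
# Records share keys 1 and 2; a stable sort must preserve the input
# order among records with equal keys.
records = [("apple", 2), ("pear", 1), ("plum", 2), ("fig", 1)]
by_key = sorted(records, key=lambda r: r[1])
# ("pear", 1) still precedes ("fig", 1), and ("apple", 2) still
# precedes ("plum", 2).
```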
Applications of Sorting
• One reason why sorting is so important is that once a set
of items is sorted, many other problems become easy.
• Searching: Binary search lets you test whether an item is
in a dictionary in O(log n) time. Speeding up searching
is perhaps the most important application of sorting.
• Closest pair: Given n numbers, find the pair which are
closest to each other. Once the numbers are sorted, the
closest pair will be next to each other in sorted order, so
an O(n) linear scan completes the job.
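The sort-then-scan idea can be sketched as follows (a minimal Python version; the function name is illustrative):

```python
def closest_pair(nums):
    # O(n log n) sort, then an O(n) scan of adjacent elements:
    # in sorted order, the closest pair must be neighbors.
    s = sorted(nums)
    i = min(range(len(s) - 1), key=lambda j: s[j + 1] - s[j])
    return s[i], s[i + 1]
```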
Applications of Sorting
• Element uniqueness: Given a set of n items, are they all
unique or are there any duplicates? Sort them and do a
linear scan to check all adjacent pairs. This is a special
case of closest pair above.
• Frequency distribution – Given a set of n items, which
element occurs the largest number of times? Sort them
and do a linear scan to measure the length of all adjacent
runs.
• Median and Selection: What is the kth largest item in
the set? Once the keys are placed in sorted order in an
array, the kth largest can be found in constant time by
simply looking in the kth position of the array.
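Minimal sketches of the three reductions above (Python; the function names are illustrative):

```python
def has_duplicates(items):
    # Element uniqueness: sort, then check adjacent pairs.
    s = sorted(items)
    return any(s[i] == s[i + 1] for i in range(len(s) - 1))

def mode(items):
    # Frequency distribution: the longest run of equal adjacent
    # elements in sorted order is the most frequent value.
    s = sorted(items)
    best_val, best_len, run = s[0], 1, 1
    for i in range(1, len(s)):
        run = run + 1 if s[i] == s[i - 1] else 1
        if run > best_len:
            best_val, best_len = s[i], run
    return best_val

def kth_largest(items, k):
    # Selection: constant-time lookup once sorted (1-indexed k).
    return sorted(items)[-k]
```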
Applications: Convex Hulls
• Given n points in two dimensions,
find the smallest area polygon
which contains them all.
• The convex hull is like a rubber
band stretched over the points.
• Convex hulls are the most important building block for
more sophisticated geometric algorithms.
• Once you have the points sorted by x-coordinate, they
can be inserted from left to right into the hull, since the
rightmost point is always on the boundary. Without
sorting the points, we would have to check whether the
point is inside or outside the current hull. Adding a new
rightmost point might cause others to be deleted.
Applications: Huffman Codes
• If you are trying to minimize the amount of space a text
file is taking up, it is silly to assign each letter the same
length (i.e., one byte) code.
• Example: e is more common than q, a is more common
than z.
• If we were storing English text, we would want a and e to
have shorter codes than q and z.
• To design the best possible code, the first and most
important step is to sort the characters in order of
frequency of use.
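A sketch of the merging step behind Huffman's construction, using a min-priority queue in place of repeated re-sorting (Python's heapq; the function and variable names are illustrative, and the details go beyond what the slide states):

```python
import heapq

def huffman_code_lengths(freqs):
    # Repeatedly merge the two least-frequent subtrees; a symbol's
    # code length equals its final depth. `freqs` maps symbol -> count.
    # The counter `tie` breaks frequency ties without comparing lists.
    heap = []
    for tie, (sym, f) in enumerate(freqs.items()):
        heapq.heappush(heap, (f, tie, [sym]))
    depth = {sym: 0 for sym in freqs}
    tie = len(freqs)
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:
            depth[sym] += 1      # merged symbols sink one level deeper
        heapq.heappush(heap, (f1 + f2, tie, s1 + s2))
        tie += 1
    return depth
```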
Sorting Methods Based on D&C
• Big Question: How to divide input file?
• Divide based on number of elements (and not their
values):
– Divide into files of size 1 and n-1
• Insertion sort
–Sort A[1], ..., A[n-1]
–Insert A[n] into proper place.
– Divide into files of size n/2 and n/2
• Mergesort
–Sort A[1], ..., A[n/2]
–Sort A[n/2+1], ..., A[n]
–Merge together.
– For these methods, divide is trivial, merge is
non-trivial.
Sorting Methods Based on D&C
• Divide file based on some values:
– Divide based on the minimum (or maximum)
• Selection sort, Bubble sort, Heapsort
–Find the minimum of the file
–Move it to position 1
–Sort A[2], ..., A[n].
– Divide based on some value (Radix sort, Quicksort)
• Quicksort
–Partition the file into 3 subfiles consisting of:
elements < A[1], = A[1], and > A[1]
–Sort the first and last subfiles
–Form total file by concatenating the 3 subfiles.
– For these methods, divide is non-trivial, merge is
trivial.
Selection Sort
3 6 2 7 4 8 1 5
1 6 2 7 4 8 3 5
1 2 6 7 4 8 3 5
1 2 3 7 4 8 6 5
1 2 3 4 7 8 6 5
1 2 3 4 5 8 6 7
1 2 3 4 5 6 8 7
1 2 3 4 5 6 7 8
≈ n exchanges
≈ n²/2 comparisons
1. for i := 1 to n-1 do
2. begin
3. min := i;
4. for j := i + 1 to n do
5. if a[j] < a[min] then min := j;
6. swap(a[min], a[i]);
7. end;
• Selection sort is linear for files with large
records and small keys.
Insertion Sort
3 6 2 7 4 8 1 5
2 3 6 7 4 8 1 5
2 3 4 6 7 8 1 5
1 2 3 4 6 7 8 5
1 2 3 4 5 6 7 8
≈ n²/4 exchanges
≈ n²/4 comparisons
1. for i := 2 to n do
2. begin
3. v := a[i]; j := i;
4. while (j > 1) and (a[j-1] > v) do
5. begin a[j] := a[j-1]; j := j-1 end;
6. a[j] := v;
7. end;
• linear for "almost sorted" files
• Binary insertion sort: Reduces
comparisons but not moves.
• List insertion sort: Use linked list,
no moves, but must use sequential
search.
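A binary insertion sort sketch (Python's bisect does the binary search; note the shifts, i.e. the moves, remain quadratic):

```python
from bisect import bisect_right

def binary_insertion_sort(a):
    # Binary search cuts comparisons to O(n log n), but shifting
    # elements to make room still costs O(n^2) moves in the worst case.
    for i in range(1, len(a)):
        v = a[i]
        pos = bisect_right(a, v, 0, i)   # insertion point in a[0..i-1]
        a[pos + 1:i + 1] = a[pos:i]      # shift the tail right by one
        a[pos] = v
    return a
```

bisect_right keeps equal keys in input order, so the sort stays stable.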
Bubble Sort
3 6 2 7 4 8 1 5
3 2 6 4 7 1 5 8
2 3 4 6 1 5 7 8
2 3 4 1 5 6 7 8
2 3 1 4 5 6 7 8
2 1 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1. for i := n downto 1 do
2. for j := 2 to i do
3. if a[j-1] > a[j]
then swap(a[j], a[j-1]);
• ≈ n²/4 exchanges
• ≈ n²/2 comparisons
• Bubble sort can be improved by
adding a flag to check if the list
has already been sorted.
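Bubble sort with the early-exit flag, sketched in Python:

```python
def bubble_sort(a):
    # If a full pass makes no swap, the list is already sorted
    # and we can stop early.
    for i in range(len(a) - 1, 0, -1):
        swapped = False
        for j in range(i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:
            break
    return a
```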
Shell Sort
h := 1;
repeat h := 3*h+1 until h > n;
repeat
  h := h div 3;
  for i := h+1 to n do
  begin
    v := a[i]; j := i;
    while (j > h) and (a[j-h] > v) do
    begin a[j] := a[j-h]; j := j-h end;
    a[j] := v;
  end;
until h = 1;
• Shellsort is a simple extension of
insertion sort that gains speed by
allowing exchanges of elements
that are far apart.
• Idea: rearrange the list so it is
h-sorted, for a decreasing sequence
of values of h that ends in 1.
• Shellsort never does more than
n^1.5 comparisons (for the increments
h = 1, 4, 13, 40, ...).
• The analysis of this algorithm is
hard. Two conjectures of the
complexity are n(log n)² and n^1.25.
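The Pascal fragment above, rendered as a runnable sketch in Python with the same 1, 4, 13, 40, ... increments:

```python
def shell_sort(a):
    h = 1
    while h < len(a):                # grow h = 3h+1 past n
        h = 3 * h + 1
    while h > 1:
        h //= 3
        for i in range(h, len(a)):   # gapped insertion sort
            v, j = a[i], i
            while j >= h and a[j - h] > v:
                a[j] = a[j - h]
                j -= h
            a[j] = v
    return a
```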
Example
I P D G L Q A J C M B E O F N H K (h = 13)
I H D G L Q A J C M B E O F N P K (h = 4)
C F A E I H B G K M D J L Q N P O (h = 1)
A B C D E F G H I J K L M N O P Q
Distribution counting
• Sort a file of n records whose keys are distinct integers
between 1 and n. Can be done by
for i := 1 to n do t[a[i]] := i.
• Sort a file of n records whose keys are integers between
0 and m-1.
1. for j := 0 to m-1 do count[j] := 0;
2. for i := 1 to n do count[a[i]] := count[a[i]] + 1;
3. for j := 1 to m -1 do count[j] := count[j-1] + count[j];
4. for i := n downto 1 do
begin t[count[a[i]]] := a[i];
count[a[i]] := count[a[i]] - 1
end;
5. for i := 1 to n do a[i] := t[i];
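The same distribution-counting routine in Python (0-indexed, so the count is decremented before placing; the backward pass keeps the sort stable):

```python
def counting_sort(a, m):
    # Keys are integers in 0..m-1. Prefix sums turn counts into
    # final positions; scanning the input backward preserves the
    # relative order of equal keys.
    count = [0] * m
    for x in a:
        count[x] += 1
    for j in range(1, m):
        count[j] += count[j - 1]
    t = [None] * len(a)
    for x in reversed(a):
        count[x] -= 1
        t[count[x]] = x
    return t
```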
Example (1)
Example (2)
Example (3)
Example (4)
Radix Sort
• (Straight) Radix-Sort: sorting d digit numbers for a
fixed constant d.
• While proceeding from the LSB towards the MSB, sort
digit-wise with a linear-time stable sort.
• Radix-Sort is a stable sort.
• The running time of Radix-Sort is d times the running
time of the algorithm for digit-wise sorting.
Can use counting sort to do this.
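A least-significant-digit radix sort sketch in Python; here each stable digit-wise pass uses bucket lists rather than counting sort, which serves the same purpose:

```python
def radix_sort(a, d, base=10):
    # d passes, least significant digit first; appending to buckets
    # in input order makes each pass stable.
    for p in range(d):
        buckets = [[] for _ in range(base)]
        for x in a:
            buckets[(x // base ** p) % base].append(x)
        a = [x for b in buckets for x in b]
    return a
```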
Example
Bucket-Sort
• Bucket-Sort: sorting numbers in the interval U = [0, 1).
• For sorting n numbers,
1. partition U into n non-overlapping intervals, called
buckets,
2. put the input numbers into their buckets,
3. sort each bucket using a simple algorithm, e.g.,
Insertion-Sort,
4. concatenate the sorted lists
• What is the worst case running time of Bucket-Sort?
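The four steps above, sketched in Python (sorted() stands in for the per-bucket insertion sort):

```python
def bucket_sort(a):
    # n buckets over [0, 1): bucket i covers [i/n, (i+1)/n).
    n = len(a)
    buckets = [[] for _ in range(n)]
    for x in a:
        buckets[int(n * x)].append(x)
    # Sort each bucket, then concatenate the sorted lists.
    return [x for b in buckets for x in sorted(b)]
```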
Analysis
• O(n) expected running time
• Let T(n) be the expected running time. Assume the
numbers appear under the uniform distribution.
• For each i, 1 ≤ i ≤ n, let ai = # of elements in the i-th
bucket. Since Insertion-Sort has a quadratic running time,
E[T(n)] = Θ(n) + Σi E[O(ai²)]; under the uniform
assumption E[ai²] = 2 − 1/n = O(1), so the expected total
is O(n).
Analysis Continued
• Bucket-Sort: expected linear-time, worst-case quadratic
time.
Quicksort
• Quicksort is a simple divide-and-conquer sorting
algorithm that practically outperforms Heapsort.
• In order to sort A[p..r] do the following:
– Divide: rearrange the elements and generate two
subarrays A[p..q] and A[q+1..r] so that every element
in A[p..q] is at most every element in A[q+1..r];
– Conquer: recursively sort the two subarrays;
– Combine: nothing special is necessary.
• In order to partition, choose u = A[p] as a pivot, and
move everything < u to the left and everything > u to the
right.
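A sketch of this scheme in Python, using the three-subfile partition described on the D&C slide (< pivot, = pivot, > pivot); real implementations partition in place rather than building new lists:

```python
def quicksort(a, p=0, r=None):
    # Divide around u = a[p], conquer recursively; combining is free.
    if r is None:
        r = len(a) - 1
    if p >= r:
        return a
    u = a[p]
    lt = [x for x in a[p:r + 1] if x < u]
    eq = [x for x in a[p:r + 1] if x == u]
    gt = [x for x in a[p:r + 1] if x > u]
    a[p:r + 1] = lt + eq + gt
    quicksort(a, p, p + len(lt) - 1)
    quicksort(a, p + len(lt) + len(eq), r)
    return a
```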
Quicksort
• Although mergesort is O(n log n), it is quite inconvenient
for implementation with arrays, since we need space to
merge.
• In practice, the fastest sorting algorithm is Quicksort,
which uses partitioning as its main idea.
Partition Example (Pivot=17)
Partition Example (Pivot=5)
3 6 2 7 4 8 1 5
3 1 2 7 4 8 6 5
3 1 2 4 7 8 6 5
3 1 2 4 5 8 6 7 (pivot 5 in final position)
3 1 2 4 5 6 7 8 (right subfile sorted)
1 2 3 4 5 6 7 8 (left subfile sorted)
• The efficiency of quicksort can be measured by the
number of comparisons.
Analysis
• Worst-case: If A[1..n] is already sorted, then Partition
splits A[1..n] into A[1] and A[2..n] without changing the
order. If that happens, the running time C(n) satisfies:
C(n) = C(1) + C(n−1) + Θ(n) = Θ(n²)
• Best case: Partition keeps splitting the subarrays into
halves. If that happens, the running time C(n) satisfies:
C(n) ≈ 2 C(n/2) + Θ(n) = Θ(n log n)
Analysis
• Average case (for random permutation of n elements):
• C(n) ≈ 1.38 n log n which is about 38% higher than the
best case.
Comments
• Sorting smaller subfiles first keeps the stack size
at most O(log n). Not stacking right subfiles of size < 2
in the recursive algorithm saves a factor of 4.
• Use a different pivot selection, e.g. choose the pivot to be
the median of the first, last, and middle elements.
• Randomized-Quicksort: turn bad instances into good
instances by picking the pivot randomly.
Priority Queue
• Priority queue: an appropriate data structure that allows
inserting a new element and finding/deleting the
smallest (largest) element quickly.
• Typical operations on priority queues:
1. Create a priority queue from n given items;
2. Insert a new item;
3. Delete the largest item;
4. Replace the largest item with a new item v (unless v
is larger);
5. Change the priority of an item;
6. Delete an arbitrary specified item;
7. Join two priority queues into a larger one.
Implementation
• As a linked list or an array:
– insert: O(1)
– deleteMax: O(n)
• As a sorted array:
– insert: O(n)
– deleteMax: O(1)
• As binary search trees (e.g. AVL trees)
– insert: O(log n)
– deleteMax: O(log n)
• Can we do better? Is a binary search tree overkill?
• Solution: an interesting class of binary trees called heaps
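For comparison, Python's heapq module implements exactly this heap-based priority queue in a plain list (a min-heap with O(log n) insert and delete-min; negate keys for a max-priority queue):

```python
import heapq

pq = []
for x in [5, 1, 4, 2]:
    heapq.heappush(pq, x)        # insert: O(log n)
smallest = heapq.heappop(pq)     # delete the minimum: O(log n)
```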
Heap
• Heap: A (max) heap is a complete binary tree with the
property that the value at each node is at least as large as
the values at its children (if they exist).
• A complete binary tree can be stored in an array:
– root -- position 1
– level 1 -- positions 2, 3
– level 2 -- positions 4, 5, 6, 7
– • • •
• For a node i, the parent is ⌊i/2⌋, the left child is 2i, and the
right child is 2i+1.
Example
• The following heap corresponds to the array
A[1..10]: 16, 14, 10, 8, 7, 9, 3, 2, 4, 1
Heapify
• Heapify at node i: look at A[i], A[2i], and A[2i+1],
the values at the children of i. If the heap property does
not hold at i, exchange A[i] with the larger of A[2i]
and A[2i+1], and recurse on the child involved in the
exchange.
• The number of exchanges is at most the height of the
node, i.e., O(log n).
Pseudocode
1. Heapify(A,i)
2. left = 2i
3. right = 2i +1
4. if (left ≤ n) and (A[left] > A[i])
5. then max = left
6. else max = i
7. if (right ≤ n) and (A[right] > A[max])
8. then max = right
9. if (max ≠ i)
10. then swap(A[i], A[max])
11. Heapify(A, max)
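An iterative rendering of Heapify in Python (0-indexed, so the children of i are 2i+1 and 2i+2, unlike the 1-indexed pseudocode above):

```python
def heapify(a, i, n):
    # Sift-down for a max-heap on a[0..n-1]: at most one exchange
    # per level, i.e. O(log n) exchanges in total.
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < n and a[left] > a[largest]:
            largest = left
        if right < n and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest
```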
Analysis
• Heapify on a subtree containing n nodes takes
T(n) ≤ T(2n/3) + O(1)
• The 2/3 comes from merging heaps whose levels differ
by one. The last row could be exactly half filled.
• Besides, the asymptotic answer won't change as long as
the fraction is less than one.
• By the Master Theorem, let a = 1, b = 3/2, f(n) = O(1).
Note that Θ(n^(log_{3/2} 1)) = Θ(1), since log_{3/2} 1 = 0.
Thus, T(n) = Θ(log n).
Example of Operations
Heap Construction
• Bottom-up Construction: Creating a heap from n given
items can be done in O(n) time by:
for i := n div 2 downto 1 do heapify(i);
• Why correct? Why linear time?
• cf. Top down construction of a heap takes O(n log n)
time.
Σ_{i=1}^{k−1} i·2^{k−i−1} = n · Σ_{i=1}^{k−1} i·2^{−i−1} < n   (where n = 2^k)
Example
Example
Partial Order
• The ancestor relation in a heap defines a partial order on
its elements:
– Reflexive: x is an ancestor of itself.
– Anti-symmetric: if x is an ancestor of y and y is an
ancestor of x, then x = y.
– Transitive: if x is an ancestor of y and y is an ancestor
of z, x is an ancestor of z.
• Partial orders can be used to model hierarchies with
incomplete information or equal-valued elements.
• The partial order defined by the heap structure is weaker
than that of the total order, which explains
– Why it is easier to build.
– Why it is less useful than sorting (but still very
important).
Heapsort
1. procedure heapsort;
2. var i, m : integer;
3. begin
4. m := n;
5. for i := m div 2 downto 1 do heapify(i);
6. repeat swap(a[1], a[m]);
7. m := m - 1;
8. heapify(1) { restricted to a[1..m] }
9. until m ≤ 1;
10. end;
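A self-contained Python rendering of the same two phases, bottom-up construction then repeated swap-and-heapify (0-indexed):

```python
def heapsort(a):
    def sift_down(i, n):
        # Restore the max-heap property on a[0..n-1] below node i.
        while 2 * i + 1 < n:
            c = 2 * i + 1
            if c + 1 < n and a[c + 1] > a[c]:
                c += 1               # pick the larger child
            if a[i] >= a[c]:
                break
            a[i], a[c] = a[c], a[i]
            i = c
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):   # bottom-up O(n) construction
        sift_down(i, n)
    for m in range(n - 1, 0, -1):         # sorting phase
        a[0], a[m] = a[m], a[0]           # move the max to the end
        sift_down(0, m)
    return a
```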
Comments
• Heap sort uses ≤ 2n log n (worst and average)
comparisons to sort n elements.
• Heap sort requires only a fixed amount of additional
storage.
• Slightly slower than merge sort that uses O(n) additional
space.
• Slightly faster than merge sort that uses O(1) additional
space.
• In greedy algorithms, we always pick the next thing
which locally maximizes our score. By placing all the
things in a priority queue and pulling them off in order,
we can improve performance over linear search or
sorting, particularly if the weights change.
Example
Example
Example
Example
Example
Example
Example
Summary
M(n): # of data movements
C(n): # of key comparisons
Characteristic Diagrams
Each diagram plots key value against index, shown before,
during, and after execution.
Insertion Sorting a Random Permutation
Selection Sorting a Random Permutation
Shell Sorting a Random Permutation
Merge Sorting a Random Permutation
Stages of Straight Radix Sort
Quicksort (recursive implementation, M=12)
Heapsorting a Random Permutation: Construction
Heapsorting (Sorting Phase)
Bubble Sorting a Random Permutation