Lecture Notes on
Design and Analysis of Algorithms (18CS42)
Module-3: Greedy Method
Contents
1. Introduction to Greedy method
1.1. General method,
1.2. Coin Change Problem
1.3. Knapsack Problem
1.4. Job sequencing with deadlines
2. Minimum cost spanning trees:
2.1. Prim’s Algorithm,
2.2. Kruskal’s Algorithm
3. Single source shortest paths
3.1. Dijkstra's Algorithm
4. Optimal Tree problem:
4.1. Huffman Trees and Codes
5. Transform and Conquer Approach:
5.1. Heaps
5.2. Heap Sort
1. Introduction to Greedy method
1.1 General method
The greedy method is a straightforward design technique applicable to a variety of
applications.
The greedy approach suggests constructing a solution through a sequence of steps, each
expanding a partially constructed solution obtained so far, until a complete solution to the
problem is reached. On each step the choice made must be:
 feasible, i.e., it has to satisfy the problem’s constraints
 locally optimal, i.e., it has to be the best local choice among all feasible choices
available on that step
 irrevocable, i.e., once made, it cannot be changed on subsequent steps of the algorithm
As a rule, greedy algorithms are both intuitively appealing and simple. Given an optimization
problem, it is usually easy to figure out how to proceed in a greedy manner, possibly after
considering a few small instances of the problem. What is usually more difficult is to prove
that a greedy algorithm yields an optimal solution (when it does).
1.2. Coin Change Problem
Problem Statement: Given coins of several denominations, find a way to give a customer
a required amount using the fewest number of coins.
Example: If the denominations are 1, 5, 10, 25 and 100 and the change required is 30, candidate solutions are:
Amount: 30
Solutions: 3 x 10 (3 coins); 6 x 5 (6 coins); 1 x 25 + 5 x 1 (6 coins); 1 x 25 + 1 x 5 (2 coins)
The last solution is the optimal one, as it gives the change with only 2 coins.
The greedy solution to the coin change problem is very intuitive and is called the cashier's
algorithm. Its basic principle is: at every step, take the largest coin that fits into the
amount remaining to be changed. For denomination systems such as the one above, this
yields an optimal solution.
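A minimal Python sketch of this cashier's rule (the function name and interface are illustrative, not part of the original notes):

```python
def cashiers_change(amount, denominations):
    """Return a dict {coin: count} using the largest-coin-first rule."""
    result = {}
    for coin in sorted(denominations, reverse=True):
        count, amount = divmod(amount, coin)  # take as many of this coin as fit
        if count:
            result[coin] = count
    return result

print(cashiers_change(30, [1, 5, 10, 25, 100]))  # {25: 1, 5: 1} -> 2 coins
```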
1.3. Knapsack Problem (Fractional knapsack problem)
Consider the following instance of the knapsack problem:
n = 3, m = 20, (p1, p2, p3) = (25, 24, 15), (w1, w2, w3) = (18, 15, 10)
There are several greedy methods to obtain feasible solutions. Three are discussed here.
a) At each step fill the knapsack with the object of largest profit - If the object under
consideration does not fit, then a fraction of it is included to fill the knapsack. This method
does not guarantee an optimal solution. As per this method, the solution to the above problem is as
follows:
Select Item-1 with profit p1 = 25; here w1 = 18, so x1 = 1. Remaining capacity = 20 - 18 = 2
Select Item-2 with profit p2 = 24; here w2 = 15, so x2 = 2/15. Remaining capacity = 0
Total profit earned = 25 + 24(2/15) = 28.2.
Therefore the solution by this method is (x1, x2, x3) = (1, 2/15, 0) with profit = 28.2
b) At each step include the object with the smallest weight
Select Item-3 with profit p3 = 15; here w3 = 10, so x3 = 1. Remaining capacity = 20 - 10 = 10
Select Item-2 with profit p2 = 24; here w2 = 15, so x2 = 10/15 = 2/3. Remaining capacity = 0
Total profit earned = 15 + 24(2/3) = 31.
The solution by this method is (x1, x2, x3) = (0, 2/3, 1) with profit = 31
Note: An optimal solution is not guaranteed by methods (a) and (b)
c) At each step include the object with the maximum profit/weight ratio
Select Item-2 with profit p2 = 24; here w2 = 15, so x2 = 1. Remaining capacity = 20 - 15 = 5
Select Item-3 with profit p3 = 15; here w3 = 10, so x3 = 5/10 = 1/2. Remaining capacity = 0
Total profit earned = 24 + 15(1/2) = 31.5
Therefore, the optimal solution is (x1, x2, x3) = (0, 1, 1/2) with profit = 31.5
For the fractional knapsack problem, this greedy approach always results in an optimal solution.
Algorithm: The algorithm given below assumes that the objects are sorted in non-increasing
order of profit/weight ratio
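A hedged Python sketch of strategy (c); unlike the pseudocode just described, it sorts the items by profit/weight ratio itself rather than assuming them pre-sorted:

```python
def fractional_knapsack(profits, weights, capacity):
    """Greedy fractional knapsack: take items in non-increasing
    profit/weight order, using a fraction of the last item if needed."""
    # Sort item indices by profit/weight ratio, largest first.
    order = sorted(range(len(profits)),
                   key=lambda i: profits[i] / weights[i], reverse=True)
    x = [0.0] * len(profits)             # solution vector: fraction of each item taken
    total = 0.0
    for i in order:
        if capacity == 0:
            break
        take = min(weights[i], capacity)  # whole item if it fits, else a fraction
        x[i] = take / weights[i]
        total += profits[i] * x[i]
        capacity -= take
    return x, total

# The instance from the text: n = 3, m = 20.
print(fractional_knapsack([25, 24, 15], [18, 15, 10], 20))  # ([0.0, 1.0, 0.5], 31.5)
```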
Analysis:
Disregarding the time to initially sort the objects, each of the above strategies uses O(n) time.
0/1 Knapsack problem
Note: The greedy approach to solving the 0/1 knapsack problem does not necessarily yield an optimal
solution
1.4. Job sequencing with deadlines
The greedy strategy to solve the job sequencing problem is: "At each step, select the job that
satisfies the constraints and gives the maximum profit, i.e., consider the jobs in non-increasing
order of the pi's."
By following this procedure, we get the 3rd solution in Example 4.3. It can be proved that
this greedy strategy always results in an optimal solution.
High level description of job sequencing algorithm
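A Python sketch of this greedy strategy, allocating each job to the latest free slot on or before its deadline (all names are illustrative; this is a sketch, not Algorithm 4.6 verbatim):

```python
def job_sequencing(profits, deadlines):
    """Greedy job sequencing with deadlines for unit-time jobs.
    Returns (schedule, total_profit); schedule[t] is the job run in slot t+1."""
    n = len(profits)
    jobs = sorted(range(n), key=lambda j: profits[j], reverse=True)  # by profit, desc
    slots = [None] * max(deadlines)          # one slot per time unit
    total = 0
    for j in jobs:
        # Put job j in the latest free slot not after its deadline.
        for t in range(deadlines[j] - 1, -1, -1):
            if slots[t] is None:
                slots[t] = j
                total += profits[j]
                break
    return slots, total

# The 7-job example below: jobs are 0-indexed, so slot contents 5, 6, 3, 2
# correspond to J6, J7, J4, J3, with total profit 74.
print(job_sequencing([3, 5, 20, 18, 1, 6, 30], [1, 3, 4, 3, 2, 1, 2]))
```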
Algorithm/Program 4.6: Greedy algorithm for sequencing unit time jobs with deadlines and
profits
Analysis:
Fast Job Scheduling Algorithm
Algorithm: Fast Job Scheduling is shown below
Analysis
Algorithm: Fast Job Scheduling
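One way to obtain the speedup, sketched below under the assumption that the fast variant tracks free slots with a disjoint-set (union-find) structure: the latest free slot on or before a deadline is then found in nearly constant amortized time instead of by a linear scan.

```python
def fast_job_sequencing(profits, deadlines):
    """Job sequencing with deadlines using union-find over time slots."""
    n, dmax = len(profits), max(deadlines)
    parent = list(range(dmax + 1))        # parent[t]: latest free slot <= t

    def find(t):                          # path-compressing find
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    total, schedule = 0, {}
    for j in sorted(range(n), key=lambda j: profits[j], reverse=True):
        slot = find(deadlines[j])
        if slot > 0:                      # slot 0 acts as a sentinel: no room left
            schedule[slot] = j
            parent[slot] = slot - 1       # union: next query skips this slot
            total += profits[j]
    return schedule, total

# The 5-job example below: jobs 0-indexed; yields total profit 40.
print(fast_job_sequencing([20, 15, 10, 5, 1], [2, 2, 1, 3, 3]))
```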
Problem: Find the solution generated by the job sequencing with deadlines algorithm for 7 jobs
with profits 3, 5, 20, 18, 1, 6, 30 and deadlines 1, 3, 4, 3, 2, 1, 2, respectively.
Solution: The given jobs, profits and deadlines are:

Job       J1  J2  J3  J4  J5  J6  J7
Profit     3   5  20  18   1   6  30
Deadline   1   3   4   3   2   1   2

Sort the jobs in decreasing order of profit:

Job       J7  J3  J4  J6  J2  J1  J5
Profit    30  20  18   6   5   3   1
Deadline   2   4   3   1   3   1   2
The maximum deadline is 4; therefore create 4 slots. Considering the jobs in decreasing order of
profit, allocate each job to the latest free slot on or before its deadline:
Select Job 7 (deadline 2) – allocate to slot 2
Select Job 3 (deadline 4) – allocate to slot 4
Select Job 4 (deadline 3) – allocate to slot 3
Select Job 6 (deadline 1) – allocate to slot 1
Jobs 2, 1 and 5 cannot be scheduled, as every slot on or before their deadlines is filled.

Slot  1   2   3   4
Job   J6  J7  J4  J3

Total profit earned = 30 + 20 + 18 + 6 = 74
Problem: What is the solution generated by job sequencing when n = 5, (p1, p2, p3, p4, p5)
= (20, 15, 10, 5, 1) and (d1, d2, d3, d4, d5) = (2, 2, 1, 3, 3)?
Solution
The jobs are already sorted in decreasing order of profit.
The maximum deadline is 3; therefore create 3 slots. Allocate each job to the latest free slot
on or before its deadline:
Select Job 1 (deadline 2) – allocate to slot 2
Select Job 2 (deadline 2) – slot 2 is already filled, so allocate to slot 1
Select Job 3 (deadline 1) – slot 1 is already filled; it cannot be allocated
Select Job 4 (deadline 3) – allocate to slot 3
Select Job 5 (deadline 3) – slots 1-3 are all filled; it cannot be allocated

Slot  1   2   3
Job   J2  J1  J4

Total profit earned = 20 + 15 + 5 = 40
2. Minimum cost spanning trees
Definition: A spanning tree of a connected graph is its connected acyclic subgraph (i.e., a tree)
that contains all the vertices of the graph. A minimum spanning tree of a weighted connected
graph is its spanning tree of the smallest weight, where the weight of a tree is defined as the
sum of the weights on all its edges. The minimum spanning tree problem is the problem of
finding a minimum spanning tree for a given weighted connected graph.
2.1. Prim’s Algorithm
Prim's algorithm constructs a minimum spanning tree through a sequence of expanding sub-
trees. The initial subtree in such a sequence consists of a single vertex selected arbitrarily from
the set V of the graph's vertices. On each iteration it expands the current tree in the greedy
manner by simply attaching to it the nearest vertex not in that tree. The algorithm stops after
all the graph's vertices have been included in the tree being constructed. Since the algorithm
expands a tree by exactly one vertex on each of its iterations, the total number of such iterations
is n - 1, where n is the number of vertices in the graph. The tree generated by the algorithm is
obtained as the set of edges used for the tree expansions.
Correctness: Prim’s algorithm always yields a minimum spanning tree.
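A Python sketch of Prim's algorithm; this "lazy" heap-based variant discards stale fringe entries instead of decreasing keys, and the adjacency-list input format is an assumption for illustration:

```python
import heapq

def prim_mst(graph, start):
    """Prim's algorithm on an adjacency-list graph {u: [(v, weight), ...]}.
    Returns the list of tree edges. O(|E| log |V|) with a binary heap."""
    in_tree = {start}
    edges = []                                  # edges of the growing tree
    fringe = [(w, start, v) for v, w in graph[start]]
    heapq.heapify(fringe)
    while fringe and len(in_tree) < len(graph):
        w, u, v = heapq.heappop(fringe)         # lightest edge leaving the tree
        if v in in_tree:
            continue                            # stale entry: v already attached
        in_tree.add(v)
        edges.append((u, v, w))
        for x, wx in graph[v]:
            if x not in in_tree:
                heapq.heappush(fringe, (wx, v, x))
    return edges

# A small undirected example (each edge listed in both directions).
g = {'a': [('b', 3), ('c', 1)], 'b': [('a', 3), ('c', 2)], 'c': [('a', 1), ('b', 2)]}
print(prim_mst(g, 'a'))  # [('a', 'c', 1), ('c', 'b', 2)]
```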
Example: An example of Prim's algorithm is shown below. The parenthesized labels of a vertex
in the middle column indicate the nearest tree vertex and edge weight; selected vertices and
edges are shown in bold. (The trace table's columns are: Tree vertices, Remaining vertices,
Illustration.)
Analysis of Efficiency
The efficiency of Prim’s algorithm depends on the data structures chosen for the graph itself
and for the priority queue of the set V − VT whose vertex priorities are the distances to the
nearest tree vertices.
1. If a graph is represented by its weight matrix and the priority queue is implemented as
an unordered array, the algorithm’s running time will be in Θ(|V|²). Indeed, on each
of the |V| − 1 iterations, the array implementing the priority queue is traversed to find
and delete the minimum and then to update, if necessary, the priorities of the remaining
vertices.
We can implement the priority queue as a min-heap. (A min-heap is a complete binary tree in
which every element is less than or equal to its children.) Deletion of the smallest element from
and insertion of a new element into a min-heap of size n are O(log n) operations.
2. If a graph is represented by its adjacency lists and the priority queue is implemented
as a min-heap, the running time of the algorithm is in O(|E| log |V |).
This is because the algorithm performs |V| − 1 deletions of the smallest element and makes |E|
verifications and, possibly, changes of an element’s priority in a min-heap of size not exceeding
|V|. Each of these operations, as noted earlier, is an O(log |V|) operation. Hence, the running
time of this implementation of Prim’s algorithm is in
(|V| − 1 + |E|) O(log |V|) = O(|E| log |V|), because, in a connected graph, |V| − 1 ≤ |E|.
2.2. Kruskal’s Algorithm
Background: Kruskal's algorithm is another greedy algorithm for the minimum spanning tree
problem that also always yields an optimal solution. It is named Kruskal's algorithm, after
Joseph Kruskal. Kruskal's algorithm looks at a minimum spanning tree of a weighted
connected graph G = (V, E) as an acyclic subgraph with |V| - 1 edges for which the sum of
the edge weights is the smallest. Consequently, the algorithm constructs a minimum spanning
tree as an expanding sequence of subgraphs, which are always acyclic but are not necessarily
connected at the intermediate stages of the algorithm.
Working: The algorithm begins by sorting the graph's edges in non-decreasing order of their
weights. Then, starting with the empty subgraph, it scans this sorted list, adding the next edge
on the list to the current subgraph if such an inclusion does not create a cycle and simply
skipping the edge otherwise.
Note that ET, the set of edges composing a minimum spanning tree of graph G, is always a
tree in Prim's algorithm but generally just an acyclic subgraph in Kruskal's algorithm.
Kruskal’s algorithm is, however, not simpler than Prim's, because it has to check whether the
addition of the next edge to the edges already selected would create a cycle.
We can consider the algorithm's operations as a progression through a series of forests
containing all the vertices of a given graph and some of its edges. The initial forest consists of
|V| trivial trees, each comprising a single vertex of the graph. The final forest consists of a
single tree, which is a minimum spanning tree of the graph. On each iteration, the algorithm
takes the next edge (u, v) from the sorted list of the graph's edges, finds the trees containing the
vertices u and v, and, if these trees are not the same, unites them in a larger tree by adding the
edge (u, v).
Analysis of Efficiency
The crucial check of whether two vertices belong to the same tree can be performed using
union-find algorithms.
The efficiency of Kruskal’s algorithm is dominated by the time needed for sorting the edge
weights of the given graph. Hence, with an efficient sorting algorithm, the time efficiency of
Kruskal's algorithm will be in O(|E| log |E|).
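A Python sketch of Kruskal's algorithm with a simple union-find structure (the edge-list input format is assumed for illustration):

```python
def kruskal_mst(n, edges):
    """Kruskal's algorithm. `edges` is a list of (weight, u, v) with vertices
    0..n-1. Union-find detects whether an edge would create a cycle."""
    parent = list(range(n))

    def find(x):                       # find the root of x's tree, compressing paths
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):      # scan edges in non-decreasing weight order
        ru, rv = find(u), find(v)
        if ru != rv:                   # u and v lie in different trees: no cycle
            parent[ru] = rv            # unite the two trees
            mst.append((u, v, w))
    return mst

print(kruskal_mst(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]))
# [(0, 1, 1), (1, 2, 2), (2, 3, 4)]
```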
Illustration: An example of Kruskal’s algorithm is shown below. The selected edges are shown
in bold.
3. Single source shortest paths
Single-source shortest-paths problem is defined as follows. For a given vertex called the
source in a weighted connected graph, the problem is to find shortest paths to all its other
vertices. The single-source shortest-paths problem asks for a family of paths, each leading from
the source to a different vertex in the graph, though some paths may, of course, have edges in
common.
3.1. Dijkstra's Algorithm
Dijkstra's Algorithm is the best-known algorithm for the single-source shortest-paths problem.
This algorithm is applicable to undirected and directed graphs with nonnegative weights only.
Working - Dijkstra's algorithm finds the shortest paths to a graph's vertices in order of their
distance from a given source.
 First, it finds the shortest path from the source to a vertex nearest to it, then to a second
nearest, and so on.
 In general, before its ith iteration commences, the algorithm
has already identified the shortest paths to i − 1 other vertices
nearest to the source. These vertices, the source, and the
edges of the shortest paths leading to them from the source
form a subtree Ti of the given graph, as shown in the figure.
 Since all the edge weights are nonnegative, the next vertex
nearest to the source can be found among the vertices adjacent to the vertices of Ti. The
set of vertices adjacent to the vertices in Ti can be referred to as "fringe vertices"; they
are the candidates from which Dijkstra's algorithm selects the next vertex nearest to the
source.
 To identify the ith nearest vertex, the algorithm computes, for every fringe vertex u, the
sum of the distance to the nearest tree vertex v (given by the weight of the edge (v, u))
and the length dv of the shortest path from the source to v (previously determined by
the algorithm), and then selects the vertex with the smallest such sum. The fact that it
suffices to compare the lengths of such special paths is the central insight of Dijkstra's
algorithm.
 To facilitate the algorithm's operations, we label each vertex with two labels.
o The numeric label d indicates the length of the shortest path from the source to this
vertex found by the algorithm so far; when a vertex is added to the tree, d indicates
the length of the shortest path from the source to that vertex.
o The other label indicates the name of the next-to-last vertex on such a path, i.e.,
the parent of the vertex in the tree being constructed. (It can be left unspecified for
the source and for vertices that are adjacent to none of the current tree vertices.)
With such labeling, finding the next nearest vertex u* becomes a simple task of finding
a fringe vertex with the smallest d value. Ties can be broken arbitrarily.
 After we have identified a vertex u* to be added to the tree, we need to perform two
operations:
o Move u* from the fringe to the set of tree vertices.
o For each remaining fringe vertex u that is connected to u* by an edge of weight
w(u*, u) such that du* + w(u*, u) < du, update the labels of u by u* and
du* + w(u*, u), respectively.
Illustration: An example of Dijkstra's algorithm is
shown below. The next closest vertex is shown in
bold (see the figure below).
The shortest paths (identified by following nonnumeric labels backward from a destination
vertex in the left column to the source) and their lengths (given by numeric labels of the tree
vertices) are as follows:
The pseudocode of Dijkstra’s algorithm is given below. Note that in the following pseudocode,
VT contains a given source vertex and the fringe contains the vertices adjacent to it after
iteration 0 is completed.
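A hedged Python sketch of the algorithm with a min-heap; stale heap entries are skipped rather than updated in place:

```python
import heapq

def dijkstra(graph, source):
    """Dijkstra's algorithm on {u: [(v, weight), ...]} with nonnegative weights.
    Returns (d, parent): shortest distances and next-to-last vertices."""
    d = {source: 0}
    parent = {source: None}
    fringe = [(0, source)]
    visited = set()
    while fringe:
        du, u = heapq.heappop(fringe)
        if u in visited:
            continue                       # stale entry: u is already a tree vertex
        visited.add(u)
        for v, w in graph[u]:
            if v not in visited and du + w < d.get(v, float('inf')):
                d[v] = du + w              # relax edge (u, v)
                parent[v] = u
                heapq.heappush(fringe, (d[v], v))
    return d, parent

g = {'a': [('b', 3), ('d', 7)], 'b': [('a', 3), ('c', 4), ('d', 2)],
     'c': [('b', 4), ('d', 5)], 'd': [('a', 7), ('b', 2), ('c', 5)]}
print(dijkstra(g, 'a')[0])  # {'a': 0, 'b': 3, 'd': 5, 'c': 7}
```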
Analysis:
The time efficiency of Dijkstra’s algorithm depends on the data structures used for
implementing the priority queue and for representing an input graph itself.
Efficiency is Θ(|V|²) for graphs represented by their weight matrix and the priority queue
implemented as an unordered array.
For graphs represented by their adjacency lists and the priority queue implemented as a
min-heap, it is in O(|E| log |V|).
Applications
 Transportation planning and packet routing in communication networks, including the
Internet
 Finding shortest paths in social networks, speech recognition, document formatting,
robotics, compilers, and airline crew scheduling.
4. Optimal Tree problem
Background:
Suppose we have to encode a text that comprises characters from some n-character alphabet
by assigning to each of the text's characters some sequence of bits called the codeword. There
are two types of encoding: Fixed-length encoding, Variable-length encoding
Fixed-length encoding: This method assigns to each character a bit string of the same length
m (m ≥ log2 n). This is exactly what the standard ASCII code does.
One way of getting a coding scheme that yields a shorter bit string on the average is based on
the old idea of assigning shorter code-words to more frequent characters and longer code-words
to less frequent characters.
Variable-length encoding: This method, which assigns codewords of different lengths to
different characters, introduces a problem that fixed-length encoding does not have: how can
we tell how many bits of an encoded text represent the first (or, more generally, the ith)
character? To avoid this complication, we can limit ourselves to prefix-free (or simply prefix) codes.
In a prefix code, no codeword is a prefix of a codeword of another character. Hence, with such
an encoding, we can simply scan a bit string until we get the first group of bits that is a
codeword for some character, replace these bits by this character, and repeat this operation
until the bit string's end is reached.
If we want to create a binary prefix code for some alphabet, it is natural to associate the
alphabet's characters with leaves of a binary tree in which all the left edges are labelled by 0
and all the right edges are labelled by 1 (or vice versa). The codeword of a character can then
be obtained by recording the labels on the simple path from the root to the character's leaf.
Since there is no simple path to a leaf that continues to another leaf, no codeword can be a
prefix of another codeword; hence, any such tree yields a prefix code.
Among the many trees that can be constructed in this manner for a given alphabet with known
frequencies of the character occurrences, construction of such a tree that would assign shorter
bit strings to high-frequency characters and longer ones to low-frequency characters can be
done by the following greedy algorithm, invented by David Huffman.
4.1 Huffman Trees and Codes
Huffman's Algorithm
Step 1: Initialize n one-node trees and label them with the characters of the alphabet. Record
the frequency of each character in its tree's root to indicate the tree's weight. (More generally,
the weight of a tree will be equal to the sum of the frequencies in the tree's leaves.)
Step 2: Repeat the following operation until a single tree is obtained. Find two trees with the
smallest weights. Make them the left and right subtrees of a new tree and record the sum of their
weights in the root of the new tree as its weight.
A tree constructed by the above algorithm is called a Huffman tree. It defines, in the manner
described above, a Huffman code.
Example: Consider the five-symbol alphabet {A, B, C, D, _} with the following occurrence
frequencies in a text made up of these symbols:

Symbol      A     B     C     D     _
Frequency   0.35  0.1   0.2   0.2   0.15

The Huffman tree construction for the above problem is shown below. The resulting codewords
are as follows:

Symbol      A     B     C     D     _
Codeword    11    100   00    01    101
Hence, DAD is encoded as 011101, and 10011011011101 is decoded as BAD_AD.
With the occurrence frequencies given and the codeword lengths obtained, the average
number of bits per symbol in this code is
2(0.35) + 3(0.1) + 2(0.2) + 2(0.2) + 3(0.15) = 2.25.
Had we used a fixed-length encoding for the same alphabet, we would have to use at least 3
bits per symbol. Thus, for this example, Huffman’s code achieves a compression ratio
(a standard measure of a compression algorithm’s effectiveness) of (3 − 2.25)/3 · 100% = 25%.
In other words, Huffman’s encoding of the above text will use 25% less memory than its
fixed-length encoding.
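A Python sketch of Huffman's algorithm using a min-heap of (weight, tree) pairs; on the example alphabet it reproduces the codewords above, though the exact 0/1 labels depend on tie-breaking conventions:

```python
import heapq

def huffman_codes(frequencies):
    """Build a Huffman tree greedily and return each symbol's codeword."""
    # A running counter breaks ties between equal weights, so trees themselves
    # are never compared by heapq.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # two trees of smallest weights
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, count, (left, right)))
        count += 1
    codes = {}
    def walk(node, code):
        if isinstance(node, tuple):          # internal node: (left, right)
            walk(node[0], code + '0')
            walk(node[1], code + '1')
        else:
            codes[node] = code or '0'        # single-symbol alphabet edge case
    walk(heap[0][2], '')
    return codes

print(huffman_codes({'A': 0.35, 'B': 0.1, 'C': 0.2, 'D': 0.2, '_': 0.15}))
# {'C': '00', 'D': '01', 'B': '100', '_': '101', 'A': '11'}
```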
5. Transform and Conquer Approach
We call this general technique transform-and-conquer because these methods work as two-
stage procedures. First, in the transformation stage, the problem’s instance is modified to be,
for one reason or another, more amenable to solution. Then, in the second or conquering stage,
it is solved.
There are three major variations of this idea that differ by what we transform a given instance
to (Figure 6.1):
 Transformation to a simpler or more convenient instance of the same problem—we call
it instance simplification.
 Transformation to a different representation of the same instance—we call it
representation change.
 Transformation to an instance of a different problem for which an algorithm is already
available—we call it problem reduction.
5.1. Heaps
A heap is a partially ordered data structure that is especially suitable for implementing priority
queues. A priority queue is a multiset of items with an orderable characteristic called an item’s
priority, with the following operations:
 finding an item with the highest (i.e., largest) priority
 deleting an item with the highest priority
 adding a new item to the multiset
Notion of the Heap
Definition: A heap can be defined as a binary tree with keys assigned to its nodes, one key per
node, provided the following two conditions are met:
1. The shape property—the binary tree is essentially complete (or simply complete),
i.e., all its levels are full except possibly the last level, where only some rightmost leaves
may be missing.
2. The parental dominance or heap property—the key in each node is greater than or
equal to the keys in its children.
Illustration: The illustration of the definition of a heap is shown below: only the leftmost tree
is a heap. The second one is not a heap, because the tree’s shape property is violated (the left
child of a node cannot be missing while its right child is present). The third one is not a heap,
because parental dominance fails for the node with key 5.
Properties of Heap
1. There exists exactly one essentially complete binary tree with n nodes. Its height is
equal to ⌊log2 n⌋.
2. The root of a heap always contains its largest element.
3. A node of a heap considered with all its descendants is also a heap.
4. A heap can be implemented as an array by recording its elements in the top down, left-
to-right fashion. It is convenient to store the heap’s elements in positions 1 through n
of such an array, leaving H[0] either unused or putting there a sentinel whose value is
greater than every element in the heap. In such a representation,
a. the parental node keys will be in the first ⌊n/2⌋ positions of the array, while the
leaf keys will occupy the last ⌈n/2⌉ positions;
b. the children of a key in the array’s parental position i (1 ≤ i ≤ ⌊n/2⌋) will be in
positions 2i and 2i + 1, and, correspondingly, the parent of a key in position i
(2 ≤ i ≤ n) will be in position ⌊i/2⌋.
Heap and its array representation
Thus, we could also define a heap as an array H[1..n] in which every element in position i in
the first half of the array is greater than or equal to the elements in positions 2i and 2i + 1, i.e.,
H[i] ≥ max{H[2i], H[2i + 1]} for i = 1, ..., ⌊n/2⌋.
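A small illustrative Python predicate checking this array-based condition:

```python
def is_heap(H):
    """Check parental dominance for a 1-based heap array H (H[0] unused)."""
    n = len(H) - 1
    for i in range(1, n // 2 + 1):          # only parental positions need checking
        children = [H[2*i]] + ([H[2*i + 1]] if 2*i + 1 <= n else [])
        if H[i] < max(children):
            return False
    return True

print(is_heap([None, 10, 8, 7, 5, 2, 1, 6]))  # True
print(is_heap([None, 1, 8, 7]))               # False: 1 < max(8, 7)
```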
Construction of a Heap - There are two principal alternatives for constructing a heap:
1) Bottom-up heap construction 2) Top-down heap construction
Bottom-up heap construction:
The bottom-up heap construction algorithm is illustrated below. It initializes the essentially
complete binary tree with n nodes by placing keys in the order given and then “heapifies” the
tree as follows.
 Starting with the last parental node, the algorithm checks whether the parental
dominance holds for the key in this node. If it does not, the algorithm exchanges the
node’s key K with the larger key of its children and checks whether the parental
dominance holds for K in its new position. This process continues until the parental
dominance for K is satisfied. (Eventually, it has to because it holds automatically for
any key in a leaf.)
 After completing the “heapification” of the subtree rooted at the current parental node,
the algorithm proceeds to do the same for the node’s immediate predecessor.
 The algorithm stops after this is done for the root of the tree.
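A Python rendering of this bottom-up construction (1-based array, H[0] unused); a sketch following the description above, not the textbook pseudocode verbatim:

```python
def heap_bottom_up(H):
    """Bottom-up construction of a max-heap in the 1-based array H, in place."""
    n = len(H) - 1
    for i in range(n // 2, 0, -1):      # from the last parental node to the root
        k, v = i, H[i]
        heap = False
        while not heap and 2 * k <= n:
            j = 2 * k                   # left child
            if j < n and H[j + 1] > H[j]:
                j += 1                  # pick the larger of the two children
            if v >= H[j]:
                heap = True             # parental dominance holds for v
            else:
                H[k] = H[j]             # move the larger child up
                k = j
        H[k] = v
    return H

# The list from the illustration below.
print(heap_bottom_up([None, 2, 9, 7, 6, 5, 8]))  # [None, 9, 6, 8, 2, 5, 7]
```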
Illustration
Bottom-up construction of a heap for the list 2, 9, 7, 6, 5, 8. The double-headed arrows show
key comparisons verifying the parental dominance.
Analysis of efficiency - bottom up heap construction algorithm:
Assume, for simplicity, that n = 2^k − 1, so that the heap’s tree is full, i.e., the largest possible
number of nodes occurs on each level. Let h be the height of the tree.
According to the first property of heaps in the list at the beginning of the section, h = ⌊log2 n⌋,
or just ⌈log2(n + 1)⌉ − 1 = k − 1 for the specific values of n we are considering.
Each key on level i of the tree will travel to the leaf level h in the worst case of the heap
construction algorithm. Since moving to the next level down requires two comparisons—one
to find the larger child and the other to determine whether the exchange is required—the total
number of key comparisons involving a key on level i will be 2(h − i).
Therefore, the total number of key comparisons in the worst case will be
C_worst(n) = Σ_{i=0..h−1} 2(h − i) · 2^i = 2(n − log2(n + 1)),
where the validity of the last equality can be proved either by using the closed-form formula
for the sum or by mathematical induction on h.
Thus, with this bottom-up algorithm, a heap of size n can be constructed with fewer than 2n
comparisons.
Top-down heap construction algorithm:
It constructs a heap by successive insertions of a new key into a previously constructed heap.
1. First, attach a new node with key K in it after the last leaf of the existing heap.
2. Then shift K up to its appropriate place in the new heap as follows.
a. Compare K with its parent’s key: if the latter is greater than or equal to K, stop (the
structure is a heap); otherwise, swap these two keys and compare K with its new
parent.
b. This swapping continues until K is not greater than its last parent or it reaches the root.
Obviously, this insertion operation cannot require more key comparisons than the heap’s
height. Since the height of a heap with n nodes is about log2 n, the time efficiency of insertion
is in O(log n).
Illustration of inserting a new key: Inserting a new key (10) into the heap constructed earlier
is shown below. The new key is sifted up via swaps with its parent until it is not larger than
its parent (or reaches the root).
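A sketch of this sift-up insertion in Python, using the same 1-based array representation:

```python
def heap_insert(H, key):
    """Insert `key` into the 1-based max-heap H by sifting it up."""
    H.append(key)                       # attach after the last leaf
    i = len(H) - 1
    while i > 1 and H[i // 2] < H[i]:   # parent smaller: swap and continue up
        H[i], H[i // 2] = H[i // 2], H[i]
        i //= 2
    return H

H = [None, 9, 6, 8, 2, 5, 7]            # the heap built above
print(heap_insert(H, 10))               # [None, 10, 6, 9, 2, 5, 7, 8]
```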
Delete an item from a heap: Deleting the root’s key from a heap can be done with the
following algorithm:
Maximum Key Deletion from a heap
1. Exchange the root’s key with the last key K of the heap.
2. Decrease the heap’s size by 1.
3. “Heapify” the smaller tree by sifting K down the tree exactly in the same way as in
the bottom-up heap construction algorithm. That is, verify the parental dominance
for K: if it holds, we are done; if not, swap K with the larger of its children and repeat
this operation until the parental dominance condition holds for K in its new position.
Illustration
The efficiency of deletion is determined by the number of key comparisons needed to
“heapify” the tree after the swap has been made and the size of the tree is decreased by 1.
Since this cannot require more key comparisons than twice the heap’s height, the time
efficiency of deletion is in O(log n) as well.
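The three deletion steps in Python, again as a sketch over the 1-based representation:

```python
def heap_delete_max(H):
    """Delete and return the root key of the 1-based max-heap H."""
    H[1], H[-1] = H[-1], H[1]           # step 1: swap root with the last key
    maximum = H.pop()                   # step 2: shrink the heap by one
    n, k = len(H) - 1, 1
    while 2 * k <= n:                   # step 3: sift the new root key down
        j = 2 * k
        if j < n and H[j + 1] > H[j]:
            j += 1                      # larger of the two children
        if H[k] >= H[j]:
            break                       # parental dominance restored
        H[k], H[j] = H[j], H[k]
        k = j
    return maximum

H = [None, 9, 8, 6, 2, 5, 1]
print(heap_delete_max(H), H)  # 9 [None, 8, 5, 6, 2, 1]
```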
5.2. Heap Sort
Heapsort is an interesting sorting algorithm discovered by J. W. J. Williams. It is a two-stage
algorithm that works as follows.
Stage 1 (heap construction): Construct a heap for a given array.
Stage 2 (maximum deletions): Apply the root-deletion operation n−1 times to the
remaining heap.
As a result, the array elements are eliminated in decreasing order. But since under the array
implementation of heaps an element being deleted is placed last, the resulting array will be
exactly the original array sorted in increasing order.
A trace of heapsort on a specific input is shown below:
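A compact Python heapsort combining the two stages; sift_down is the same procedure used in bottom-up construction and root deletion:

```python
def heapsort(a):
    """Heapsort: build a max-heap bottom-up, then delete the maximum n-1 times."""
    H = [None] + list(a)                # 1-based heap array
    n = len(a)

    def sift_down(k, size):             # restore parental dominance below k
        while 2 * k <= size:
            j = 2 * k
            if j < size and H[j + 1] > H[j]:
                j += 1
            if H[k] >= H[j]:
                break
            H[k], H[j] = H[j], H[k]
            k = j

    for i in range(n // 2, 0, -1):      # stage 1: heap construction, O(n)
        sift_down(i, n)
    for last in range(n, 1, -1):        # stage 2: n-1 root deletions
        H[1], H[last] = H[last], H[1]   # deleted maximum goes to the end
        sift_down(1, last - 1)
    return H[1:]

print(heapsort([2, 9, 7, 6, 5, 8]))  # [2, 5, 6, 7, 8, 9]
```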
Analysis of efficiency: Since we already know that the heap construction stage of the algorithm
is in O(n), we have to investigate just the time efficiency of the second stage. For the number
of key comparisons, C(n), needed for eliminating the root keys from the heaps of diminishing
sizes from n to 2, we get the following inequality:
C(n) ≤ 2⌊log2(n − 1)⌋ + 2⌊log2(n − 2)⌋ + ... + 2⌊log2 1⌋ ≤ 2(n − 1) log2(n − 1) ≤ 2n log2 n.
This means that C(n) ∈ O(n log n) for the second stage of heapsort. For both stages, we get
O(n) + O(n log n) = O(n log n). A more detailed analysis shows that the time efficiency of
heapsort is, in fact, in Θ(n log n) in both the worst and average cases. Thus, heapsort’s time
efficiency falls in the same class as that of mergesort.
Heapsort is in-place, i.e., it does not require any extra storage. Timing experiments on random
files show that heapsort runs more slowly than quicksort but can be competitive with mergesort.
*****