Parallel Algorithms
An overview
• A parallel merging algorithm
• Accelerated Cascading and Parallel List
Ranking
Parallel merging through
partitioning
The partitioning strategy consists of:
• Breaking up the given problem into many
independent subproblems of equal size
• Solving the subproblems in parallel
This is similar to the divide-and-conquer
strategy in sequential computing.
Partitioning and Merging
Given a set S with a relation ≤, S is linearly ordered if for every pair a, b ∈ S:
• either a ≤ b or b ≤ a.
The merging problem is the following:
Partitioning and Merging
Input: Two sorted arrays A = (a1, a2,..., am) and
B = (b1, b2,..., bn) whose elements are drawn
from a linearly ordered set.
Output: A merged sorted sequence
C = (c1, c2,..., cm+n).
Merging
For example, if A = (2,8,11,13,17,20) and B =
(3,6,10,15,16,73), the merged sequence
C = (2,3,6,8,10,11,13,15,16,17,20,73).
Merging
A sequential algorithm
• Simultaneously move two pointers along the
two arrays
• Write the items in sorted order in another
array
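As a concrete reference, here is a minimal Python sketch of this sequential two-pointer merge (the function name merge and the use of Python lists are illustrative choices, not part of the original slides):

def merge(A, B):
    # Merge two sorted lists A and B into one sorted list in O(m + n) time.
    C = []
    i = j = 0
    # Advance whichever pointer currently points at the smaller item.
    while i < len(A) and j < len(B):
        if A[i] <= B[j]:
            C.append(A[i]); i += 1
        else:
            C.append(B[j]); j += 1
    # One input is exhausted; copy the remainder of the other.
    C.extend(A[i:])
    C.extend(B[j:])
    return C

For the example above, merge([2, 8, 11, 13, 17, 20], [3, 6, 10, 15, 16, 73]) returns the merged sequence C.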
Partitioning and Merging
• The complexity of the sequential algorithm is
O(m + n).
• We will use the partitioning strategy for
solving this problem in parallel.
Partitioning and Merging
Definitions:
rank(ai : A) is the number of elements in A less than or equal to ai, where ai ∈ A.
rank(bi : A) is the number of elements in A less than or equal to bi, where bi ∈ B.
Merging
For example, consider the arrays:
A = (2,8,11,13,17,20)
B = (3,6,10,15,16,73)
rank(11 : A) = 3 and rank(11 : B) = 3.
Merging
• The position of an element ai ∈ A in the sorted array C is:
rank(ai : A) + rank(ai : B).
For example, the position of 11 in the sorted array C is:
rank(11 : A) + rank(11 : B) = 3 + 3 = 6.
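This rank arithmetic is easy to check with Python's standard bisect module (used here purely for illustration; for distinct elements, bisect_right counts exactly the elements less than or equal to x):

import bisect

A = [2, 8, 11, 13, 17, 20]
B = [3, 6, 10, 15, 16, 73]

def rank(x, X):
    # Number of elements of the sorted list X that are <= x.
    return bisect.bisect_right(X, x)

pos = rank(11, A) + rank(11, B)   # 3 + 3 = 6, the position of 11 in C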
Parallel Merging
• The idea is to decompose the overall merging
problem into many smaller merging
problems.
• When the problem size is sufficiently small,
we will use the sequential algorithm.
Merging
• The main task is to generate smaller merging
problems such that:
• Each sequence in such a smaller problem has
O(log m) or O(log n) elements.
• Then we can use the sequential algorithm since
the time complexity will be O(log m + log n).
Parallel Merging
Step 1. Divide the array B into blocks such that each block has log m elements. Hence there are m/log m blocks.
For each block, the last element is b(i log m), 1 ≤ i ≤ m/log m.
Parallel Merging
Step 2. We allocate one processor for each last element in B.
• For a last element b(i log m), this processor does a binary search in the array A to determine two elements ak, ak+1 such that ak ≤ b(i log m) ≤ ak+1.
• All the m/log m binary searches are done in parallel and take O(log m) time each.
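The sketch below simulates Steps 1 and 2 sequentially; on the PRAM each loop iteration would be executed by its own processor. The function name rank_block_boundaries and the exact block-size formula are illustrative assumptions (A is assumed non-empty):

import bisect, math

def rank_block_boundaries(A, B):
    # Step 1: block size ~ log m, giving ~m/log m block boundaries in B.
    blk = max(1, int(math.log2(max(2, len(A)))))
    boundaries = []
    # Step 2: each iteration = one processor doing an O(log m) binary search.
    for end in range(blk - 1, len(B), blk):
        boundaries.append((end, bisect.bisect_right(A, B[end])))
    return boundaries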
Parallel Merging
• After the binary searches are over, the array
A is divided into m/log m blocks.
• There is a one-to-one correspondence between the blocks in A and B. We call such a pair of blocks matching blocks.
Parallel Merging
• Each block in A is determined in the following
way.
• Consider the two elements b(i log m) and b((i + 1) log m). These are the boundary elements of the (i + 1)-th block of B.
• The two elements of A that determine rank(b(i log m) : A) and rank(b((i + 1) log m) : A) define the matching block in A.
Parallel Merging
• These two matching blocks determine a smaller
merging problem.
• Every element inside a matching block has to be
ranked inside the other matching block.
• Hence, the problem of merging a pair of matching
blocks is an independent subproblem which does
not affect any other block.
Parallel Merging
• If the size of each block in A is O(log m), we can
directly run the sequential algorithm on every pair of
matching blocks from A and B.
• Some blocks in A may be larger than O(log m) and
hence we have to do some more work to break
them into smaller blocks.
Parallel Merging
If a block Ai of A is larger than O(log m) and the matching block of Ai is Bj, we do the following:
• We divide Ai into blocks of size O(log m).
• Then we apply the same algorithm to rank the boundary elements of each block of Ai in Bj.
• Now each block in A is of size O(log m).
• This takes O(log log m) time.
Parallel Merging
Step 3.
• We now take every pair of matching blocks from A
and B and run the sequential merging algorithm.
• One processor is allocated for every matching pair
and this processor merges the pair in O(log m)
time.
We have to analyse the time and processor
complexities of each of the steps to get the overall
complexities.
Parallel Merging
Complexity of Step 1
• The task in Step 1 is to partition B into blocks of size log m.
• We allocate m/log m processors.
• Since B is an array, processor Pi, 1 ≤ i ≤ m/log m, can find the element b(i log m) in O(1) time.
Parallel Merging
Complexity of Step 2
• In Step 2, m/log m processors do binary search in array A in O(log n) time each.
• Hence the time complexity is O(log n), and the work done is
(m log n)/log m ≤ (m log(m + n))/log m ≤ (m + n)
for n, m ≥ 4. Hence the total work is O(m + n).
Parallel Merging
Complexity of Step 3
• In Step 3, we use m/log m processors.
• Each processor merges a pair Ai, Bi in O(log m) time. Hence the total work done is O(m).
Theorem
Let A and B be two sorted sequences each of
length n. A and B can be merged in O(log n) time
using O(n) operations in the CREW PRAM.
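Putting the three steps together, a sequential simulation of the whole algorithm might look as follows. This is a sketch, reusing the merge function from the earlier sketch; the recursive splitting of oversized A-blocks is omitted, since it only matters for the parallel running time, not for the output:

import bisect, math

def parallel_merge_sim(A, B):
    # Each (A-segment, B-block) pair below is an independent subproblem
    # that one PRAM processor would merge on its own; here we just loop.
    blk = max(1, int(math.log2(max(2, len(A)))))     # block size ~ log m
    C, a_lo, b_lo = [], 0, 0
    for b_hi in range(blk, len(B) + blk, blk):
        b_hi = min(b_hi, len(B))
        a_hi = bisect.bisect_right(A, B[b_hi - 1])   # Step 2: rank in A
        C.extend(merge(A[a_lo:a_hi], B[b_lo:b_hi]))  # Step 3: local merge
        a_lo, b_lo = a_hi, b_hi
    C.extend(A[a_lo:])      # tail of A beyond the last ranked boundary
    return C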
Accelerated Cascading and Parallel List
Ranking
• We will first discuss a technique called
accelerated cascading for designing very fast
parallel algorithms.
• We will then study a very important technique
for ranking the elements of a list in parallel.
Fast computation of maximum
Input: An array A holding p elements from a linearly ordered
universe S. We assume that all the elements in A are
distinct.
Output: The maximum element from the array A.
We use a boolean array M such that M(k)=1 if and only if
A(k) is the maximum element in A.
Initialization: We allocate p processors to set each entry in
M to 1.
Fast computation of maximum
Step 1: Assign p processors to each element in A, p² processors overall.
• Consider the p processors allocated to A(j). We name these processors P1, P2, ..., Pi, ..., Pp.
• Pi compares A(j) with A(i):
  If A(i) > A(j) then M(j) := 0
  else do nothing.
Fast computation of maximum
Step 2: At the end of Step 1, M(k), 1 ≤ k ≤ p, will be 1 if and only if A(k) is the maximum element.
• We allocate p processors, one for each entry in M.
• If the entry is 0, the processor does nothing.
• If the entry is 1, it outputs the index k of the maximum element.
Fast computation of maximum
Complexity: The processor requirement is p² and the time complexity is O(1).
• We need a concurrent write facility and hence the Common CRCW PRAM model.
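A sequential simulation of this O(1)-time algorithm is sketched below (the function name is an illustrative choice; on the Common CRCW PRAM all p² comparisons happen in one step, and the concurrent writes into M all write the same value 0, as the model requires):

def crcw_max(A):
    p = len(A)
    M = [1] * p                  # initialization: M(k) = 1 for every k
    for j in range(p):           # Step 1: p processors per element A(j)
        for i in range(p):
            if A[i] > A[j]:
                M[j] = 0         # A(j) loses at least one comparison
    for k in range(p):           # Step 2: the unique surviving index
        if M[k] == 1:
            return A[k]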
Optimal computation of
maximum
• This is the same binary tree algorithm that we used for adding n numbers.
Optimal computation of
maximum
• This algorithm takes O(n) processors and
O(log n) time.
• We can reduce the processor complexity to
O(n / log n). Hence the algorithm does optimal
O(n) work.
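A sketch of this work-optimal scheme, simulated sequentially (the function name is illustrative and A is assumed non-empty): each of the ~n/log n processors first scans its own block of ~log n elements, and the block maxima are then combined by a binary reduction tree.

import math

def optimal_max(A):
    n = len(A)
    g = max(1, int(math.log2(max(2, n))))        # block size ~ log n
    # Each processor finds the maximum of its own block in O(log n) time.
    maxima = [max(A[i:i + g]) for i in range(0, n, g)]
    # Binary-tree reduction over ~n/log n candidates: O(log n) rounds.
    while len(maxima) > 1:
        maxima = [max(maxima[i:i + 2]) for i in range(0, len(maxima), 2)]
    return maxima[0]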
An O(log log n) time algorithm
• Instead of a binary tree, we use a more complex tree. Assume that n = 2^(2^k).
• The root of the tree has 2^(2^(k−1)) children.
• Each node at the i-th level has 2^(2^(k−i−1)) children, for 0 ≤ i ≤ k − 1.
• Each node at level k − 1 therefore has two children, and the leaves are at level k.
An O(log log n) time algorithm
Some Properties
• The depth of the tree is k = log log n, since n = 2^(2^k).
• The number of nodes at the i-th level is 2^(2^k − 2^(k−i)), for 0 ≤ i ≤ k.
An O(log log n) time algorithm
The Algorithm
• The algorithm proceeds level by level,
starting from the leaves.
• At every level, we compute the maximum of
all the children of an internal node by the O(1)
time algorithm.
• The time complexity is O(log log n) since the
depth of the tree is O(log log n).
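An equivalent way to simulate the doubly logarithmic tree is to recurse on groups of about √p elements and combine the group maxima with the O(1)-time algorithm; the recursion depth then satisfies T(p) = T(√p) + O(1) = O(log log p). This grouping view is an assumption of the sketch below, which reuses crcw_max from above:

import math

def dlog_max(A):
    if len(A) <= 2:
        return max(A)
    g = max(2, math.isqrt(len(A)))     # group size ~ sqrt(|A|)
    # Each group is one subtree; all groups would recurse in parallel.
    group_maxima = [dlog_max(A[i:i + g]) for i in range(0, len(A), g)]
    # Combine ~sqrt(|A|) candidates with the O(1) CRCW algorithm.
    return crcw_max(group_maxima)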
An O(log log n) time algorithm
Total Work:
• Recall that the O(1) time algorithm needs O(p²) work for p elements.
• Each node at the i-th level has 2^(2^(k−i−1)) children.
• So the total work for each node at the i-th level is O((2^(2^(k−i−1)))²) = O(2^(2^(k−i))).
An O(log log n) time algorithm
Total Work:
• There are 2^(2^k − 2^(k−i)) nodes at the i-th level. Hence the total work for the i-th level is
O(2^(2^k − 2^(k−i)) · 2^(2^(k−i))) = O(2^(2^k)) = O(n).
• For O(log log n) levels, the total work is O(n log log n). This is suboptimal.
Accelerated cascading
• The first algorithm, which is based on a binary tree, is optimal but slow.
• The second algorithm is suboptimal, but very
fast.
• We combine these two algorithms through
the accelerated cascading strategy.
Accelerated cascading
• We start with the optimal algorithm until the
size of the problem is reduced to a certain
value.
• Then we use the suboptimal but very fast
algorithm.
Accelerated cascading
Phase 1.
• We apply the binary tree algorithm, starting from the leaves, for log log log n levels.
• The number of candidates reduces to n / 2^(log log log n) = n / log log n.
• The total work done so far is O(n) and the total time is O(log log log n).
Accelerated cascading
Phase 2.
• In this phase, we use the fast algorithm on the remaining n / log log n candidates.
• The total work is O((n / log log n) · log log n) = O(n).
• The total time is O(log log n).
• Theorem: The maximum of n elements can be computed in O(log log n) time and O(n) work on the Common CRCW PRAM.
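The two phases combine into the following sketch (sequentially simulated; the exact level count is an illustrative choice consistent with the analysis above, and A is assumed non-empty). It reuses dlog_max from the previous sketch:

import math

def cascaded_max(A):
    n = len(A)
    if n <= 4:
        return max(A)
    # Phase 1: pairwise (binary tree) reduction for ~log log log n levels,
    # leaving about n / log log n candidates after O(n) total work.
    levels = max(1, math.ceil(math.log2(math.log2(math.log2(n)))))
    for _ in range(levels):
        A = [max(A[i:i + 2]) for i in range(0, len(A), 2)]
    # Phase 2: the fast doubly-logarithmic algorithm on the survivors.
    return dlog_max(A)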
Two parallel list ranking algorithms
• An O(log n) time and O(n log n) work list
ranking algorithm.
• An O(log n loglog n) time and O(n) work list
ranking algorithm.
List ranking
Input: A linked list L of n elements.
L is given in an array S such that the entry S(i)
contains the index of the node which is the
successor of the node i in L.
Output: The distance of each node i from the
end of the list.
List ranking
List ranking can be solved in O(n) time
sequentially for a list of length n.
•Hence, a work-optimal parallel algorithm
should do only O(n) work.
A simple list ranking algorithm
Output: For each 1 ≤ i ≤ n, the distance R(i) of node i from the end of the list.
begin
  for 1 ≤ i ≤ n do in parallel
    if S(i) ≠ 0 then R(i) := 1
    else R(i) := 0
  endfor
  for 1 ≤ i ≤ n do in parallel
    while S(i) ≠ 0 and S(S(i)) ≠ 0 do
      Set R(i) := R(i) + R(S(i))
      Set S(i) := S(S(i))
    endwhile
  endfor
end
A simple list ranking algorithm
• At the start of an iteration of the while loop,
R(i) counts the nodes in a sublist starting at i
(a subset of nodes which are adjacent in the
list).
A simple list ranking algorithm
• After the iteration, R(i) counts the nodes in a
sublist of double the size.
• When the while loop terminates, R(i) counts all the nodes from i to the end of the list.
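For reference, here is a synchronous sequential simulation of this pointer-jumping algorithm (a standard Wyllie-style variant). The 1-based indexing, the convention S[i] == 0 for the tail, and the unused S[0] slot are assumptions matching the pseudocode above:

def list_rank(S):
    n = len(S) - 1                    # nodes are 1..n; S[0] is unused
    S = list(S)                       # copy, since successors are overwritten
    R = [0] * (n + 1)
    for i in range(1, n + 1):
        R[i] = 1 if S[i] != 0 else 0
    for _ in range(max(1, n.bit_length())):   # ~log2 n synchronous rounds
        newR, newS = list(R), list(S)
        for i in range(1, n + 1):             # one PRAM step, all i at once
            if S[i] != 0:
                newR[i] = R[i] + R[S[i]]      # splice in successor's count
                newS[i] = S[S[i]]             # pointer jumping
        R, S = newR, newS
    return R[1:]

For the 3-node list 1 -> 2 -> 3, list_rank([0, 2, 3, 0]) returns [2, 1, 0].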
Complexity and model
• The algorithm terminates after O(log n)
iterations of the while loop.
• The work complexity is O(n log n) since we
allocate one processor for each node.
• We need the CREW PRAM model since
several nodes may try to read the same
successor (S) values.
Complexity and model
Exercise:
Modify the algorithm to run on the EREW
PRAM with the same time and processor
complexities.
The strategy for an optimal
algorithm
• Our aim is to modify the simple algorithm so
that it does optimal O(n) work.
• The best algorithm would be the one which
does O(n) work and takes O(log n) time.
• There is an algorithm meeting these criteria; however, the algorithm and its analysis are very involved.
The strategy for an optimal
algorithm
• We will study an algorithm which does O(n)
work and takes O(log n loglog n) time.
• However, we will later use the optimal algorithm for designing other algorithms.
The strategy for an optimal
algorithm
1. Shrink the initial list L by removing some of
the nodes.
The modified list should have O(n / log n)
nodes.
2. Apply the pointer jumping technique (the
suboptimal algorithm) on the list with O(n /
log n) nodes.
The strategy for an optimal
algorithm
3. Restore the original list and rank all the
nodes removed in Step 1.
The important step is Step 1. We need to choose a subset of nodes for removal.
Independent sets
Definition
A set I of nodes is independent if whenever i ∈ I, S(i) ∉ I.
(In the original slide, a figure shows the blue nodes forming an independent set in a list.)
Independent sets
• The main task is to pick an independent set
correctly.
• We pick an independent set by first coloring
the nodes of the list by two colors.
2-coloring the nodes of a list
Definition: A k-coloring of a graph G is a mapping c : V → {0, 1, ..., k − 1} such that c(i) ≠ c(j) if (i, j) ∈ E.
• It is very easy to design an O(n) time
sequential algorithm for 2-coloring the nodes of
a linked list.
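A minimal sketch of that sequential pass (assuming, beyond the slides, that the head of the list is known; S[i] is the successor of node i and 0 marks the end):

def two_color(S, head):
    color, c, i = {}, 0, head
    while i != 0:
        color[i] = c          # adjacent nodes receive alternating colors
        c = 1 - c
        i = S[i]
    return color

For the list 1 -> 2 -> 3 used in the earlier list-ranking example, two_color([0, 2, 3, 0], head=1) returns {1: 0, 2: 1, 3: 0}.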
2-coloring the nodes of a list
• We will assume the following result:
Theorem: A linked list with n nodes can be 2-
colored in O(log n) time and O(n) work.
Independent sets
• When we 2-color the nodes of a list,
alternate nodes get the same color.
• Hence, we can remove the nodes of the
same color to reduce the size of the
original list from n to n/2.
Independent sets
• However, we need a list of size n/log n to run our pointer jumping algorithm for list ranking.
• If we repeat the halving process log log n times, we will reduce the size of the list to n / 2^(log log n), i.e., to n / log n.
Preserving the information
• When we reduce the size of the list to n/log n, we have lost a lot of information, because the removed nodes are no longer present in the list.
• Hence, we have to put back the removed nodes in their original positions to correctly compute the ranks of all the nodes in the list.
Preserving the information
• Note that we have removed the nodes in
O(log log n) iterations.
• So, we have to replace the nodes also in
O(log log n) iterations.