Algorithms
1. Introduction
An algorithm is defined as a sequence of computational steps required to
accomplish a specific task. An algorithm works on a given input and terminates
in a well-defined state. The basic properties of an algorithm are: input,
output, definiteness, effectiveness, and finiteness. The purpose of developing an
algorithm is to solve a general, well-specified problem.
A further concern while designing an algorithm is the kind of computer on
which the algorithm will be executed. The two forms of computer architecture
are the sequential computer and the parallel computer. Accordingly, depending
upon the architecture of the computer, we have sequential as well as parallel
algorithms.
Algorithms executed on sequential computers simply perform a sequence of
steps to solve a given problem. Such algorithms are known as sequential
algorithms.
However, a problem can also be solved by dividing it into sub-problems that
are executed in parallel; the solutions of these sub-problems are then combined
to obtain the final solution. In such situations, more than one processor is
required, and the processors communicate with each other to produce the final
output. This environment is the parallel computer, and the special kind of
algorithms designed for such computers are called parallel algorithms. Parallel
algorithms depend on the kind of parallel computer they are designed for.
Hence, for a given problem, different parallel algorithms may need to be
designed for different parallel architectures.
A parallel computer is a set of processors that are able to work cooperatively to
solve a computational problem. This definition is broad enough to include parallel
supercomputers that have hundreds or thousands of processors, networks of
workstations, multi-processor workstations, and embedded systems. Parallel
computers can be represented with the help of various models, such as the
random access machine (RAM), the parallel random access machine (PRAM), and
interconnection networks. While designing a parallel algorithm, the
computational power of the various models can be analyzed and compared, and
parallelism can be introduced for a given problem on a specific model once the
characteristics of that model are understood. Analyzing a parallel algorithm on
different models helps determine the best model for a problem, based on the
resulting time and space complexities.
2. Analysis of Parallel Algorithms
A generic algorithm is mainly analyzed on the basis of two parameters: the
time complexity (execution time) and the space complexity (the amount of space
required). Usually, time complexity is given much more importance than space
complexity. The subsequent section highlights the criteria for analyzing the
complexity of parallel algorithms. The fundamental parameters required for the
analysis of parallel algorithms are as follows:
• Time Complexity
• The Total Number of Processors Required
• The Cost Involved.
2.1 Time Complexity
Most people who implement algorithms want to know how much of a particular
resource (such as time or storage) a given algorithm requires. Parallel
architectures are designed to improve the computational power available to
algorithms, so the major concern in evaluating a parallel algorithm is
determining the amount of time it needs to execute. Usually, the time
complexity is calculated on the basis of the total number of steps executed
to accomplish the desired output.
The resources consumed by a parallel algorithm are both the processor cycles
on each processor and the communication overhead between the processors.
First, in the computation step, each local processor performs an arithmetic
or logic operation. Thereafter, the processors communicate with each other to
exchange messages and/or data. Hence, the time complexity can be calculated
on the basis of the computational cost and the communication cost involved.
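The split into a computation term and a communication term can be made concrete with a toy cost model. This is an illustrative sketch, not a model from the text: the function name `parallel_time` and the parameters `t_calc` and `t_msg` are hypothetical, and it assumes the work divides evenly and results are combined along a binary reduction tree.

```python
import math

def parallel_time(n, p, t_calc=1.0, t_msg=10.0):
    """Estimated time to process n items on p processors (toy model).

    Assumes n/p computation steps per processor (cost t_calc each) and
    log2(p) communication rounds to combine partial results (cost t_msg
    each). With p = 1 there is no communication at all.
    """
    computation = (n / p) * t_calc
    communication = math.log2(p) * t_msg if p > 1 else 0.0
    return computation + communication
```

Under this model, `parallel_time(1000, 4)` is 250 computation steps plus 2 message rounds, showing how communication cost can erode the gain from adding processors.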
The time complexity of an algorithm varies depending upon the instance of the
input for a given problem. For example, the already sorted list (10, 17, 19, 21,
22, 33) will consume less time than the reverse-ordered list (33, 22, 21,
19, 17, 10). The time complexity of an algorithm is categorized into
three forms:
i) Best Case Complexity;
ii) Average Case Complexity; and
iii) Worst Case Complexity.
The best case complexity is the least amount of time required by the algorithm
for a given input. The average case complexity is the average running time
required by the algorithm for a given input. Similarly, the worst case complexity
can be defined as the maximum amount of time required by the algorithm for a
given input.
The main factors involved in analyzing the time complexity are:
• the algorithm itself;
• the parallel computer model;
• the specific set of inputs.
Mostly, the time complexity of the algorithm is a function of the size of the input.
2.2 Number of Processors
One other factor that assists in the analysis of parallel algorithms is the total
number of processors required to deliver a solution to a given problem. Thus, for a
given input of size n, the number of processors required by the parallel
algorithm is a function of n, usually denoted by P(n).
2.3 Overall Cost
Finally, the total cost of the algorithm is the product of the time complexity of
the parallel algorithm and the total number of processors required for computation:
Cost = Time Complexity * Total Number of Processors
Equivalently, the cost is the total number of steps executed collectively by all
the processors, i.e., the summation of the steps. Another term related to the
analysis of parallel algorithms is the efficiency of the algorithm. It is defined
as the ratio of the worst-case running time of the best sequential algorithm to
the cost of the parallel algorithm:

Efficiency = Worst-case running time of best Sequential Algorithm / Cost of Parallel Algorithm

Efficiency is at most 1: a value of 1 means the parallel algorithm is
cost-optimal, while an efficiency well below 1 means the parallel algorithm
performs much more total work than the best sequential algorithm.
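The cost and efficiency formulas can be exercised with a small numerical sketch. The figures below are hypothetical (a sequential sort taking n·log2(n) steps versus a parallel sorting network of depth log2(n)² on n processors); the function names `cost` and `efficiency` are illustrative, not standard API.

```python
import math

def cost(parallel_time, processors):
    # Cost = time complexity of the parallel algorithm * number of processors
    return parallel_time * processors

def efficiency(seq_worst_time, parallel_time, processors):
    # Efficiency = worst-case sequential running time / cost of the parallel algorithm
    return seq_worst_time / cost(parallel_time, processors)

n = 1024
seq = n * math.log2(n)        # 10240 sequential steps
par = math.log2(n) ** 2       # 100 parallel steps on n processors
e = efficiency(seq, par, n)   # about 0.1: much faster, but cost-inefficient
```

The example shows that a parallel algorithm can be far faster in wall-clock terms while still having low efficiency, because its cost counts the idle capacity of all n processors.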
3. Merge Sort Algorithm
First, divide the given sequence of n numbers into two parts, each consisting of
n/2 numbers. Thereafter, recursively split each part in two until every number
stands as an independent sequence. These one-element sequences are then
recursively merged until a single sorted sequence of n numbers is
achieved.
To perform the above-mentioned task, two kinds of circuits are used: one for
sorting and another for merging the sorted lists of numbers.
Odd-Even Merging Circuit
Let us first illustrate the concept of merging two sorted sequences using an
odd-even merging circuit. The merging circuit works as follows:
1) Let there be two sorted sequences A = (a1, a2, a3, a4, …, am) and B = (b1, b2,
b3, b4, …, bm) which are required to be merged.
2) With the help of an (m/2, m/2) merging circuit, merge the odd-indexed numbers
of the two sequences, i.e. (a1, a3, a5, …, am-1) and (b1, b3, b5, …, bm-1),
resulting in the sorted sequence (c1, c2, c3, …, cm).
3) Thereafter, with the help of an (m/2, m/2) merging circuit, merge the
even-indexed numbers of the two sequences, i.e. (a2, a4, a6, …, am) and (b2, b4,
b6, …, bm), resulting in the sorted sequence (d1, d2, d3, …, dm).
4) The final output sequence O = (o1, o2, o3, …, o2m) is obtained in the following
manner:
o1 = c1 and o2m = dm. In general, o2i = min(ci+1, di) and
o2i+1 = max(ci+1, di) for i = 1, 2, 3, …, m-1.
Now, let us take an example of merging two sorted sequences of length 4,
i.e., A = (a1, a2, a3, a4) and B = (b1, b2, b3, b4), with A = (4, 6, 9, 10) and
B = (2, 7, 8, 12). The circuit merging the two given sequences is
illustrated in Figure 7.
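The four merging steps above can be sketched as a recursive function. This is a minimal sketch of odd-even merging, assuming both inputs are sorted and have equal power-of-two length; the name `odd_even_merge` is our own. Note that list indices here are 0-based, so the "odd-indexed" elements a1, a3, … of the text correspond to slice `[0::2]`.

```python
def odd_even_merge(a, b):
    """Merge two sorted lists of equal power-of-two length."""
    if len(a) == 1:
        # A (1, 1) merging circuit is a single comparator.
        return [min(a[0], b[0]), max(a[0], b[0])]
    c = odd_even_merge(a[0::2], b[0::2])  # step 2: merge odd-indexed elements
    d = odd_even_merge(a[1::2], b[1::2])  # step 3: merge even-indexed elements
    out = [c[0]]                          # step 4: o1 = c1
    for i in range(len(c) - 1):           # comparators between c(i+1) and d(i)
        out += [min(c[i + 1], d[i]), max(c[i + 1], d[i])]
    return out + [d[-1]]                  # o2m = dm
```

Running it on the example from the text, `odd_even_merge([4, 6, 9, 10], [2, 7, 8, 12])` yields the merged sequence (2, 4, 6, 7, 8, 9, 10, 12).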
Sorting Circuit along with Odd-Even Merging Circuit
As we already know, the merge sort algorithm requires two circuits, i.e. one for
merging and another for sorting the sequences. Therefore, the sorting circuit has
been derived from the above-discussed merging circuit. The basic steps followed
by the circuit are highlighted below:
i) The given input sequence of length n is divided into two sub-sequences of length
n/2 each.
ii) The two sub-sequences are recursively sorted.
iii) The two sorted sub-sequences are merged using an (n/2, n/2) merging circuit
to finally obtain the sorted sequence of length n.
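Steps i)-iii) translate directly into a recursive sort built on the merging step. This is a minimal sketch assuming the input length is a power of two; the function names are our own, and the merge routine is repeated here so the sketch is self-contained.

```python
def odd_even_merge(a, b):
    # Merge two sorted, equal power-of-two length lists (the merging circuit).
    if len(a) == 1:
        return [min(a[0], b[0]), max(a[0], b[0])]
    c = odd_even_merge(a[0::2], b[0::2])   # merge odd-indexed elements
    d = odd_even_merge(a[1::2], b[1::2])   # merge even-indexed elements
    out = [c[0]]
    for i in range(len(c) - 1):
        out += [min(c[i + 1], d[i]), max(c[i + 1], d[i])]
    return out + [d[-1]]

def odd_even_merge_sort(seq):
    """Steps i)-iii): split in half, sort recursively, merge."""
    if len(seq) <= 1:
        return list(seq)
    mid = len(seq) // 2
    return odd_even_merge(odd_even_merge_sort(seq[:mid]),
                          odd_even_merge_sort(seq[mid:]))
```

On the example sequence used below, `odd_even_merge_sort([4, 2, 10, 12, 8, 7, 6, 9])` produces (2, 4, 6, 7, 8, 9, 10, 12).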
Now, let us take an example of sorting n numbers, say 4, 2, 10, 12, 8, 7, 6, 9.
The sorting + merging circuit for this sequence is illustrated in Figure 8.
Analysis of Merge Sort
i) The width of the sorting + merging circuit, i.e. the maximum number of
devices required in any one stage, is O(n/2). In the figure above, the maximum
number of devices in a stage is 4, which is 8/2, or n/2.
ii) The circuit contains two sorting circuits for sorting sequences of length n/2,
followed by one merging circuit for merging the two sorted sub-sequences (see
stage 4 in the figure above). Let the functions Ts and Tm denote the time
complexity (depth) of sorting and merging, respectively.
Ts can be calculated as follows:
Ts(n) = Ts(n/2) + Tm(n/2)
Ts(n) = Ts(n/2) + log(n/2)
Therefore, Ts(n) is equal to O(log² n).
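The recurrence can be checked numerically with a short sketch. The function names are ours, and we assume the (m, m) merging circuit has depth log2(2m), i.e. one comparator stage at m = 1 plus one combining stage per halving; this differs from the text's log(n/2) only by a constant, so the O(log² n) conclusion is unchanged.

```python
def merge_depth(m):
    # Depth Tm of an (m, m) odd-even merging circuit: a single comparator
    # for m == 1, plus one combining stage per recursion level = log2(2m).
    return 1 if m == 1 else merge_depth(m // 2) + 1

def sort_depth(n):
    # Ts(n) = Ts(n/2) + Tm(n/2), with Ts(1) = 0.
    return 0 if n == 1 else sort_depth(n // 2) + merge_depth(n // 2)
```

Evaluating `sort_depth` at n = 2, 4, 8, 16 gives depths 1, 3, 6, 10, i.e. log2(n)·(log2(n)+1)/2, which grows as O(log² n).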
4. Image Processing
An image is a two-dimensional distribution of small image points called
pixels. It can be considered a function of two real variables, for example f(x,y),
with f the amplitude (e.g. brightness) of the image at position (x,y). Image
processing is the process of enhancing an image and extracting meaningful
information from it.
Image processing with parallel computing is an alternative way to solve image
processing problems that require long processing times or handle large
amounts of information in "acceptable time" (according to each criterion).
The main idea of parallel image processing is to divide the problem into simple
tasks and solve them concurrently, so that, in the best case, the total time is
divided among the tasks. Parallel image processing cannot be applied to all
problems; in other words, not every problem can be coded in a parallel form. A
parallel program must have certain features for correct and efficient operation;
otherwise its runtime or behavior may not achieve the expected performance.
These features include the following:
• Granularity: the number of basic units of work; it is classified as:
  - Coarse-grained: few, computationally intensive tasks.
  - Fine-grained: a large number of small, less computationally intensive parts.
• Synchronization: prevents the overlap of two or more processes.
• Latency: the transit time of information from request to receipt.
• Scalability: the ability of an algorithm to maintain its efficiency when the
number of processors and the size of the problem increase in the same
proportion.
5. Proposed Algorithms for Parallel Image Processing Tasks
We will go through the following sequential image processing algorithms and
develop their parallel versions.
Load Distribution between Processors:
Suppose we take the following image (1) as an example. The main step of
these algorithms is to determine the number of tiles to be generated; the number
of tiles corresponds to the number of threads. If only one thread exists, the
computation is simply sequential. If there are two or more threads, the
image is divided into distinct areas, as shown in Figures 2, 3, and 4. Each thread
is responsible for processing the pixels included in its tile and for executing its
tasks, while maintaining synchronization among all the processors; otherwise a
deadlock between processors may occur.
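The tile-per-thread split can be sketched with a small helper. This is a hypothetical function (the name `tile_rows` and its row-band tiling scheme are our illustration, not from the text); it divides the image's rows into as many contiguous bands as there are threads, degenerating to one band, i.e. sequential processing, for a single thread.

```python
def tile_rows(height, n_threads):
    """Split `height` image rows into `n_threads` contiguous (start, stop) tiles.

    Rows are distributed as evenly as possible: the first `height % n_threads`
    tiles get one extra row. Each thread then processes rows [start, stop).
    """
    base, extra = divmod(height, n_threads)
    tiles, start = [], 0
    for t in range(n_threads):
        stop = start + base + (1 if t < extra else 0)
        tiles.append((start, stop))
        start = stop
    return tiles
```

For example, `tile_rows(480, 4)` assigns rows 0-119, 120-239, 240-359, and 360-479 to four threads, and `tile_rows(480, 1)` leaves the whole image to a single sequential pass.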
Parallel Segmentation by Region Growing Technique and
Calculation of different Features of Segmented Regions:
Region growing is one of the best-known techniques of segmentation. Its main
drawback is that it is time-consuming; the algorithm below is designed to reduce
its running time. The steps involved are:
1. Take the input image on the client processor.
2. Select the seed pixels (based on some user criterion such as pixels in a certain
gray-level range, pixels evenly spaced on a grid, etc.) for the different regions
on the client processor.
3. Count the number of seed points and activate the same number of kernels.
4. Send a copy of the image and the respective seed pixels to each kernel processor.
5. On each kernel processor, grow the regions from the seed pixels to adjacent
pixels depending on pixel intensity, gray-level texture, color, etc.; the image
information is important here because region growing is based on the
membership criteria.
6. Calculate the different features of each ROI (region of interest) on its
individual kernel.
7. Send the segmented regions and the ROI information to the client processor and
deactivate the worker kernels.
8. Reconstruct the image on the client processor and display all the information.
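The core of step 5, growing one region from a seed by a membership criterion, can be sketched sequentially. This is an illustrative sketch, not the text's implementation: the name `grow_region` is ours, the membership criterion is simplified to an intensity tolerance around the seed pixel, and in the parallel scheme above each seed (with a copy of the image) would run on its own kernel processor.

```python
from collections import deque

def grow_region(image, seed, tol):
    """Grow one region from seed = (row, col) over a 2-D intensity grid.

    A 4-neighbour joins the region if its intensity differs from the seed
    pixel's intensity by at most `tol` (the membership criterion).
    Returns the set of (row, col) pixels in the region.
    """
    rows, cols = len(image), len(image[0])
    base = image[seed[0]][seed[1]]
    region, frontier = {seed}, deque([seed])
    while frontier:                       # breadth-first growth from the seed
        r, c = frontier.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(image[nr][nc] - base) <= tol):
                region.add((nr, nc))
                frontier.append((nr, nc))
    return region
```

In the parallel algorithm, one such call per seed runs concurrently on each kernel, after which the client reassembles the regions (steps 6-8).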
6. Fault Tolerance
To achieve the needed reliability and availability, we need fault-tolerant
computers. They have the ability to tolerate faults by detecting failures and
isolating defective modules, so that the rest of the system can operate correctly.
Reliability techniques have also become of increasing interest for general-
purpose computer systems.
Parallel computing can also be applied to the design of fault-tolerant
computer systems, particularly via lockstep systems performing the same
operation in parallel. This provides redundancy in case one component should
fail, and also allows automatic error detection and error correction if the results
differ. These methods can be used to help prevent single-event upsets caused by
transient errors. Although additional measures may be required in embedded or
specialized systems, this method can provide a cost-effective approach to
achieving n-modular redundancy in commercial off-the-shelf systems.
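The error detection and correction offered by lockstep replication reduces to a majority vote over the replicas' results. The sketch below is a minimal illustration of that voting step (the function name `vote` is ours, not a standard API); with three replicas it implements triple-modular redundancy.

```python
from collections import Counter

def vote(results):
    """Majority vote over the outputs of replicated (lockstep) components.

    Returns (agreed_value, fault_detected): a transient fault in a minority
    of replicas is masked (error correction) and flagged (error detection).
    Raises if no strict majority exists, i.e. the fault cannot be masked.
    """
    value, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        raise RuntimeError("no majority: the fault cannot be masked")
    return value, count < len(results)
```

With triple-modular redundancy, `vote([42, 42, 7])` masks the single faulty replica and reports that a disagreement was detected, while `vote([42, 42, 42])` reports a clean run.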
7. Process of System Recovery from Faults
Faults in different parts of a parallel system have different importance.
Consider a faulty processor, line, or switch: the most serious is a fault on a
processor. In this case the processes allocated on that processor have to be
moved to another processor, recovered, and initialized once more. We can
usually assume that the processor's memory content is lost after a fault
appears, or is no longer accessible. All communication lines going through this
processor must be removed and redirected.
From the moment a fault appears until the end of recovery, every process of the
parallel system receives a new attribute (fig. 1). When a processor
element PE fails:
• every process allocated on the processor element PE, main and copy alike, is
called a locked process;
• every process, other than a locked process, that communicates with a locked
process is called a fault-influenced process;
• every process, other than a locked process, that does not communicate with a
locked process is called a free process.
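The three-way classification above can be sketched as a small function. This is an illustrative sketch using the text's terminology; the name `classify_processes` and the data representation (a process-to-PE map and a set of communicating pairs) are our own assumptions.

```python
def classify_processes(processes, comm, failed_pe):
    """Label every process after processor element `failed_pe` fails.

    processes: dict mapping process name -> processor element it runs on.
    comm:      set of frozenset({p, q}) pairs that communicate.
    Returns a dict mapping each process to 'locked', 'fault-influenced',
    or 'free', per the definitions in the text.
    """
    locked = {p for p, pe in processes.items() if pe == failed_pe}
    labels = {}
    for p in processes:
        if p in locked:
            labels[p] = "locked"
        elif any(frozenset((p, q)) in comm for q in locked):
            labels[p] = "fault-influenced"
        else:
            labels[p] = "free"
    return labels
```

For instance, if process A runs on the failed PE and only B talks to A, then A is locked, B is fault-influenced, and every other process is free.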
The process of system recovery itself is known, but the question remains how,
and by whom, recovery of the processor kernels is controlled. Control can be
either centralized or decentralized. In the case of decentralized control, it is
necessary to build on the fact that all kernels hold the same data, from which
they determine the final processors. Every kernel determines the final
processors for those locked processes whose copies are allocated on its own
processor. If a copy of a process is located on more than one processor, the
corresponding processor transmits a system-recovery message to the other
processors where the other copies are located. The content of the message is
the final processor for that copy of the process and a time mark of the
beginning of recovery. After receiving all system-recovery messages, a kernel
compares the received time mark with its own recovery time; afterwards, the
kernel performs no further reallocation of the relevant processes. If the time
marks are equal, another criterion can decide, for example the identification
number of a processor. There remains the question of how many copies of a
process are enough for sufficient resistance against faults. In the case of
active and passive processes, it depends on the required security: one passive
copy of a process is sufficient if we assume that a fault does not appear on
two processors occupied by the same process at the same time, or during
recovery of the system.

More Related Content

PPTX
Parallel sorting algorithm
PDF
Parallel quicksort cz. 1
DOCX
Parallel searching
PPS
PRAM algorithms from deepika
PPTX
PPT
Parallel algorithms
PPTX
Parallel algorithm in linear algebra
PPT
Parallel algorithms
Parallel sorting algorithm
Parallel quicksort cz. 1
Parallel searching
PRAM algorithms from deepika
Parallel algorithms
Parallel algorithm in linear algebra
Parallel algorithms

What's hot (19)

PPTX
Parallel algorithms
PDF
Parallel Algorithms
PPTX
Matrix multiplication
PDF
Parallel sorting Algorithms
PDF
Elementary Parallel Algorithms
PPTX
Optimal Chain Matrix Multiplication Big Data Perspective
PDF
Parallel Algorithms
PDF
Parallel Algorithms
PDF
Parallel Algorithms
PPTX
parallel Merging
PDF
Cost optimal algorithm
PDF
Chapter 4: Parallel Programming Languages
PPT
Parallel algorithms
PPTX
Performance analysis(Time & Space Complexity)
PDF
Lecture 3 insertion sort and complexity analysis
PDF
Parallel Algorithms
PDF
Introduction to Algorithms Complexity Analysis
PPT
Algorithm And analysis Lecture 03& 04-time complexity.
PPT
Matrix transposition
Parallel algorithms
Parallel Algorithms
Matrix multiplication
Parallel sorting Algorithms
Elementary Parallel Algorithms
Optimal Chain Matrix Multiplication Big Data Perspective
Parallel Algorithms
Parallel Algorithms
Parallel Algorithms
parallel Merging
Cost optimal algorithm
Chapter 4: Parallel Programming Languages
Parallel algorithms
Performance analysis(Time & Space Complexity)
Lecture 3 insertion sort and complexity analysis
Parallel Algorithms
Introduction to Algorithms Complexity Analysis
Algorithm And analysis Lecture 03& 04-time complexity.
Matrix transposition
Ad

Viewers also liked (20)

PDF
All Pair Shortest Path Algorithm – Parallel Implementation and Analysis
PDF
Parallel Algorithms K – means Clustering
PPTX
Presentación1 dhticflore
PPTX
Landbrugssektoren – kvinders rolle i udviklingslande
DOC
6.3.1
PPTX
Administracion estrategica
PDF
Cannabis Juice Benefits
PPTX
PPTX
Farmad niet opgeloste problemen
PPTX
Knitting boar - Toronto and Boston HUGs - Nov 2012
PDF
SAC como Estrategia Competititva
PDF
IA Een innovatieve toekomst in de bouw. Sessie 1 deel 1: Jan Desmyter Nieuwe ...
DOCX
latoya documents
DOCX
Makalah 1
PDF
20121021 bspapproach tiskin
PPTX
Parallel Hashing Algorithms
PDF
Parallel algorithms for solving linear systems with block-fivediagonal matric...
PPTX
Balanced Scorecard Templates - Version 3
PDF
Implementation of dijsktra’s algorithm in parallel
PPTX
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
All Pair Shortest Path Algorithm – Parallel Implementation and Analysis
Parallel Algorithms K – means Clustering
Presentación1 dhticflore
Landbrugssektoren – kvinders rolle i udviklingslande
6.3.1
Administracion estrategica
Cannabis Juice Benefits
Farmad niet opgeloste problemen
Knitting boar - Toronto and Boston HUGs - Nov 2012
SAC como Estrategia Competititva
IA Een innovatieve toekomst in de bouw. Sessie 1 deel 1: Jan Desmyter Nieuwe ...
latoya documents
Makalah 1
20121021 bspapproach tiskin
Parallel Hashing Algorithms
Parallel algorithms for solving linear systems with block-fivediagonal matric...
Balanced Scorecard Templates - Version 3
Implementation of dijsktra’s algorithm in parallel
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
Ad

Similar to Parallel Algorithms: Sort & Merge, Image Processing, Fault Tolerance (20)

PPTX
unit 2 hpc.pptx
PPTX
Unit 1.pptx
PPTX
design analysis of algorithmaa unit 1.pptx
PPTX
Module-1.pptxbdjdhcdbejdjhdbchchchchchjcjcjc
PPT
467719275-Lecture-01 designe and alg.ppt
PPT
Lecture 01-2.ppt
PDF
Data Structure & Algorithms - Introduction
PDF
Algorithm Analysis.pdf
PPTX
Unit 1, ADA.pptx
PPT
algorithm_lec_1eregdsgdfgdgdfgdfgdfg.ppt
PPT
Unit II_Searching and Sorting Algorithms.ppt
PPT
Algorithm
PPT
Algorithm
PPTX
Data Structure Algorithm -Algorithm Complexity
PDF
Daa notes 1
PPT
Design and analysis of algorithm in Computer Science
PPSX
Ds03 part i algorithms by jyoti lakhani
PPT
introegthnhhdfhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhppt
PDF
Advanced Datastructures and algorithms CP4151unit1b.pdf
PPTX
Analysis of algorithms
unit 2 hpc.pptx
Unit 1.pptx
design analysis of algorithmaa unit 1.pptx
Module-1.pptxbdjdhcdbejdjhdbchchchchchjcjcjc
467719275-Lecture-01 designe and alg.ppt
Lecture 01-2.ppt
Data Structure & Algorithms - Introduction
Algorithm Analysis.pdf
Unit 1, ADA.pptx
algorithm_lec_1eregdsgdfgdgdfgdfgdfg.ppt
Unit II_Searching and Sorting Algorithms.ppt
Algorithm
Algorithm
Data Structure Algorithm -Algorithm Complexity
Daa notes 1
Design and analysis of algorithm in Computer Science
Ds03 part i algorithms by jyoti lakhani
introegthnhhdfhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhppt
Advanced Datastructures and algorithms CP4151unit1b.pdf
Analysis of algorithms

More from University of Technology - Iraq (19)

PPTX
Security problems.pptx
PPTX
Information System (IS) life cycle.pptx
PPTX
Performance Evaluation for Software Defined Networking (SDN) Based on Adaptiv...
PPT
Ch12 microprocessor interrupts
PPTX
Performance and traffic management for WSNs
PPTX
IP address and Domain name
PDF
Computer science mcq alle books4u
PPTX
Switches and LEDs interface to the 8051 microcontroller
PPTX
Wavelet Transform and DSP Applications
PPTX
SCHEDULING DAGs WITHOUT CONSIDERING COMMUNICATION
PPTX
DSB-SC Demodulation using matlab
PDF
Main memory-2 (ch8,os)
PDF
8086 microprocessor lab manual
PPTX
Security problems.pptx
Information System (IS) life cycle.pptx
Performance Evaluation for Software Defined Networking (SDN) Based on Adaptiv...
Ch12 microprocessor interrupts
Performance and traffic management for WSNs
IP address and Domain name
Computer science mcq alle books4u
Switches and LEDs interface to the 8051 microcontroller
Wavelet Transform and DSP Applications
SCHEDULING DAGs WITHOUT CONSIDERING COMMUNICATION
DSB-SC Demodulation using matlab
Main memory-2 (ch8,os)
8086 microprocessor lab manual

Recently uploaded (20)

PPTX
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
PPTX
Microbial diseases, their pathogenesis and prophylaxis
PPTX
BOWEL ELIMINATION FACTORS AFFECTING AND TYPES
PDF
FourierSeries-QuestionsWithAnswers(Part-A).pdf
PPTX
human mycosis Human fungal infections are called human mycosis..pptx
PDF
TR - Agricultural Crops Production NC III.pdf
PDF
Business Ethics Teaching Materials for college
PDF
Microbial disease of the cardiovascular and lymphatic systems
PPTX
master seminar digital applications in india
PPTX
The Healthy Child – Unit II | Child Health Nursing I | B.Sc Nursing 5th Semester
PDF
RMMM.pdf make it easy to upload and study
PDF
Supply Chain Operations Speaking Notes -ICLT Program
PDF
Mark Klimek Lecture Notes_240423 revision books _173037.pdf
PDF
Origin of periodic table-Mendeleev’s Periodic-Modern Periodic table
PDF
Basic Mud Logging Guide for educational purpose
PDF
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
PPTX
Week 4 Term 3 Study Techniques revisited.pptx
PDF
2.FourierTransform-ShortQuestionswithAnswers.pdf
PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
PPTX
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
Microbial diseases, their pathogenesis and prophylaxis
BOWEL ELIMINATION FACTORS AFFECTING AND TYPES
FourierSeries-QuestionsWithAnswers(Part-A).pdf
human mycosis Human fungal infections are called human mycosis..pptx
TR - Agricultural Crops Production NC III.pdf
Business Ethics Teaching Materials for college
Microbial disease of the cardiovascular and lymphatic systems
master seminar digital applications in india
The Healthy Child – Unit II | Child Health Nursing I | B.Sc Nursing 5th Semester
RMMM.pdf make it easy to upload and study
Supply Chain Operations Speaking Notes -ICLT Program
Mark Klimek Lecture Notes_240423 revision books _173037.pdf
Origin of periodic table-Mendeleev’s Periodic-Modern Periodic table
Basic Mud Logging Guide for educational purpose
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
Week 4 Term 3 Study Techniques revisited.pptx
2.FourierTransform-ShortQuestionswithAnswers.pdf
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx

Parallel Algorithms: Sort & Merge, Image Processing, Fault Tolerance

  • 1. Page 01 Algorithms 1. Introduction An algorithm is defined as a sequence of computational steps required to accomplish a specific task. The algorithm works for a given input and will terminate in a well-defined state. The basic conditions of an algorithm are: input, output, definiteness, effectiveness and finiteness. The purpose of the developing an algorithm is to solve a general, well specified problem. A concern while designing an algorithm also pertains to the kind of computer on which the algorithm would be executed. The two forms of architectures of computers are: sequential computer and parallel computer. Therefore, depending upon the architecture of the computers, we have sequential as well as parallel algorithms. The algorithms which are executed on the sequential computers simply perform according to sequence of steps for solving a given problem. Such algorithms are known as sequential algorithms. However, a problem can be solved after dividing it into sub-problems and those in turn are executed in parallel. Later on, the results of the solutions of these sub problems can be combined together and the final solution can be achieved. In such situations, the number of processors required would be more than one and they would be communicating with each other for producing the final output. This environment operates on the parallel computer and the special kind of algorithms called parallel algorithms are designed for these computers. The parallel algorithms depend on the kind of parallel computer they are designed for. Hence, for a given problem, there would be a need to design the different kinds of parallel algorithms depending upon the kind of parallel architecture. A parallel computer is a set of processors that are able to work cooperatively to solve a computational problem. 
This definition is broad enough to include parallel supercomputers that have hundreds or thousands of processors, networks of workstations, multi-processor workstations, and embedded systems. The parallel computers can be represented with the help of various kinds of models such as random access machine (RAM), parallel random access machine (PRAM), Interconnection Networks etc. While designing a parallel algorithm, the
  • 2. Page 02 computational power of various models can be analyzed and compared, parallelism can be involved for a given problem on a specific model after understanding the characteristics of a model. The analysis of parallel algorithm on different models assist in determining the best model for a problem after receiving the results in terms of the time and space complexity. 2. Analysis of Parallel Algorithms A generic algorithm is mainly analyzed on the basis of the following parameters: the time complexity (execution time) and the space complexity (amount of space required). Usually we give much more importance to time complexity in comparison with space complexity. The subsequent section highlights the criteria of analyzing the complexity of parallel algorithms. The fundamental parameters required for the analysis of parallel algorithms are as follow: • Time Complexity • The Total Number of Processors Required • The Cost Involved. 2.1 Time Complexity As it happens, most people who implement algorithms want to know how much of a particular resource (such as time or storage) is required for a given algorithm. The parallel architectures have been designed for improving the computation power of the various algorithms. Thus, the major concern of evaluating an algorithm is the determination of the amount of time required to execute. Usually, the time complexity is calculated on the basis of the total number of steps executed to accomplish the desired output. The resource consumption in parallel algorithms is both processor cycles on each processor and also the communication overhead between the processors. Thus, first in the computation step, the local processor performs an arithmetic and logic operation. Thereafter, the various processors communicate with each other for exchanging messages and/or data. Hence, the time complexity can be calculated on the basis of computational cost and communication cost involved. 
The time complexity of an algorithm varies depending upon the instance of the input for a given problem. For example, the already sorted list (10, 17, 19, 21, 22,
  • 3. Page 03 and 33) will consume less amount of time than the reverse order of list (33, 22, 21, 19, 17, and 10). The time complexity of an algorithm has been categorized into three forms: i) Best Case Complexity; ii) Average Case Complexity; and iii) Worst Case Complexity. The best case complexity is the least amount of time required by the algorithm for a given input. The average case complexity is the average running time required by the algorithm for a given input. Similarly, the worst case complexity can be defined as the maximum amount of time required by the algorithm for a given input. The main factors involved for analyzing the time complexity depends upon the:  Algorithm  Parallel computer model  Specific set of inputs. Mostly the size of the input is a function of time complexity of the algorithm. 2.2 Number of Processors One of the other factors that assist in analysis of parallel algorithms is the total number of processors required to deliver a solution to a given problem. Thus, for a given input of size say n, the number of processors required by the parallel algorithm is a function of n, usually denoted by TP (n). 2.3 Overall Cost Finally, the total cost of the algorithm is a product of time complexity of the parallel algorithm and the total number of processors required for computation. Cost = Time Complexity * Total Number of Processors The other form of defining the cost is that it specifies the total number of steps executed collectively by the n number of processors, i.e., summation of steps. Another term related with the analysis of the parallel algorithms is efficiency of the algorithm. It is defined as the ratio of the worst case running time of the best
  • 4. Page 04 sequential algorithm and the cost of the parallel algorithm. The efficiency would be mostly less than or equal to 1. In a situation, if efficiency is greater than 1 then it means that the sequential algorithm is faster than the parallel algorithm. 𝐸𝑓𝑓𝑖𝑐𝑖𝑒𝑛𝑐𝑦 = 𝑊𝑜𝑟𝑠𝑡 𝑐𝑎𝑠𝑒 𝑟𝑢𝑛𝑛𝑖𝑛𝑔 𝑡𝑖𝑚𝑒 𝑜𝑓 𝑆𝑒𝑞𝑢𝑒𝑛𝑡𝑖𝑎𝑙 𝐴𝑙𝑔𝑜𝑟𝑖𝑡ℎ𝑚 𝐶𝑜𝑠𝑡 𝑜𝑓 𝑃𝑎𝑟𝑎𝑙𝑙𝑒𝑙 𝐴𝑙𝑔𝑜𝑟𝑖𝑡ℎ𝑚 3. Merge Sort Algorithm First, divide the given sequence of n numbers into two parts, each consisting of n/2 numbers. Thereafter, recursively split the sequence into two parts until each number acts as an independent sequence. Consequently, the independent numbers are first sorted and recursively merged until a sorted sequence of n numbers is not achieved. In order to perform the above-mentioned task, there will be two kinds of circuits which would be used in the following manner: the first one for sorting and another one for merging the sorted list of numbers. Odd-Even Merging Circuit Let us firstly illustrate the concept of merging two sorted sequences using an odd-even merging circuit. The working of a merging circuit is as follows: 1) Let there be two sorted sequences A=(a1, a2, a3, a4……… am) and B=(b1, b2, b3, b4……… bm) which are required to be merged. 2) With the help of a merging circuit (m/2,m/2), merge the odd indexed numbers of the two sub sequences i.e. (a1, a3, a5…… am-1) and (b1, b3, b5……… bm-1) and thus resulting in sorted sequence (c1, c2, c3……… cm). 3) Thereafter, with the help of a merging circuit (m/2,m/2), merge the even indexed numbers of the two sub sequences i.e. (a2, a4, a6……… am) and (b2, b4, b6……… bm) and thus resulting in sorted sequence (d1, d2, d3……… dm). 4) The final output sequence O=(o1, o2, o3……… o2m ) is achieved in the following manner:
o1 = c1 and o2m = dm. In general, the formula is as given below:

o2i = min(ci+1, di) and o2i+1 = max(ci+1, di) for i = 1, 2, 3, ..., m-1.

Now, let us take an example of merging two sorted sequences of length 4, A = (a1, a2, a3, a4) and B = (b1, b2, b3, b4). Suppose the numbers of the sequences are A = (4, 6, 9, 10) and B = (2, 7, 8, 12). The circuit for merging the two given sequences is illustrated in Figure 7.

Sorting Circuit along with Odd-Even Merging Circuit

As we already know, the merge sort algorithm requires two circuits: one for merging and another for sorting the sequences. The sorting circuit is derived from the merging circuit discussed above. The basic steps followed by the circuit are:

i) The given input sequence of length n is divided into two sub-sequences of length n/2 each.
ii) The two sub-sequences are recursively sorted.
iii) The two sorted sub-sequences are merged using a merging circuit (n/2, n/2) in order to finally obtain the sorted sequence of length n.
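The merging rule and the sorting steps above can be sketched in Python. This is a sequential simulation of the circuits (the function names are ours, not from the text; a real circuit evaluates the min/max comparators of each stage in parallel), assuming the sequence lengths are powers of two:

```python
def odd_even_merge(a, b):
    """Merge two sorted sequences of equal length m (a power of two):
    recursively merge the odd-indexed and even-indexed elements into
    c and d, then combine with comparators: o1 = c1, o2m = dm,
    o_2i = min(c_i+1, d_i), o_2i+1 = max(c_i+1, d_i)."""
    m = len(a)
    if m == 1:
        return [min(a[0], b[0]), max(a[0], b[0])]
    c = odd_even_merge(a[0::2], b[0::2])  # odd-indexed elements (1-based)
    d = odd_even_merge(a[1::2], b[1::2])  # even-indexed elements (1-based)
    out = [c[0]]
    for i in range(m - 1):
        out += [min(c[i + 1], d[i]), max(c[i + 1], d[i])]
    out.append(d[m - 1])
    return out

def circuit_sort(seq):
    """Sorting circuit: split in half, sort each half recursively,
    then merge the two sorted halves with the odd-even circuit."""
    if len(seq) <= 1:
        return list(seq)
    mid = len(seq) // 2
    return odd_even_merge(circuit_sort(seq[:mid]), circuit_sort(seq[mid:]))

print(odd_even_merge([4, 6, 9, 10], [2, 7, 8, 12]))  # inputs of Figure 7
print(circuit_sort([4, 2, 10, 12, 8, 7, 6, 9]))      # inputs of Figure 8
```

Both calls reproduce the sorted sequence (2, 4, 6, 7, 8, 9, 10, 12) produced by the circuits of Figures 7 and 8.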
Now, let us take an example of sorting n numbers, say 4, 2, 10, 12, 8, 7, 6, 9. The combined sorting and merging circuit for this sequence is illustrated in Figure 8.

Analysis of Merge Sort

i) The width of the sorting + merging circuit equals the maximum number of devices required in a stage, which is O(n/2). In the above figure, the maximum number of devices in any stage is 4, i.e., 8/2 or n/2.
ii) The circuit contains two sorting circuits for sorting sequences of length n/2, followed by one merging circuit for merging the two sorted sub-sequences (see
the 4th stage in the above figure). Let the functions Ts and Tm denote the time complexity of sorting and merging, respectively, in terms of circuit depth. Ts can be calculated as follows:

Ts(n) = Ts(n/2) + Tm(n/2)
Ts(n) = Ts(n/2) + log(n/2)

Therefore, Ts(n) is O(log^2 n).

4. Image Processing

An image is a two-dimensional distribution of small image points called pixels. It can be considered a function of two real variables, for example f(x, y), with f as the amplitude (e.g., brightness) of the image at position (x, y). Image processing is the process of enhancing an image and extracting meaningful information from it.

Image processing with parallel computing is an alternative way to solve image processing problems that require long processing times or the handling of large amounts of information in "acceptable time" (according to each criterion). The main idea of parallel image processing is to divide the problem into simple tasks and solve them concurrently, so that the total time can be divided between the total tasks (in the best case). Parallel image processing cannot be applied to all problems; in other words, not every problem can be coded in a parallel form.

A parallel program must have certain features for correct and efficient operation; otherwise, its runtime behavior may not reach the expected performance. These features include the following:

- Granularity: defined by the number of basic units, and classified as:
  - Coarse-grained: a few tasks of more intense computing.
  - Fine-grained: a large number of small parts with less intense computing.
- Synchronization: prevents the overlap of two or more processes.
- Latency: the transition time of information from request to receipt.
  • 8. Page 08  Scalability: It’s defined as the ability of an algorithm to maintain its efficiency by increasing the number of processors and the size of the problem in the same proportion. 5. Proposed Algorithms for Parallel Image Processing Tasks We will go through the following sequential image processing algorithms and develop its parallel versions. Load Distribution between Processors: Suppose we are taking the following image (1) as an example. The main step of these algorithms is to determine the number of tiles to be generated. The number of tiles corresponds to the amount of threads. If only a thread exists, the computation is just sequential computation. Otherwise if there are two or more threads then the image is divided into distinct areas, as shown in Figure 2, 3, 4. Each thread is responsible for processing the pixels included in its tile and to execute different tasks but considering maintaining synchronization between all the processor otherwise there will be the situation of deadlock between processors. Parallel Segmentation by Region Growing Technique and Calculation of different Features of Segmented Regions: Region growing is one of the very famous techniques of segmentation. Main drawback of the region growing is that it is time consuming. This algorithm is designed to reduce timing of this algorithm. These are the steps involved in it. 1. Take Input image on client processor. 2. Select the Seed pixels (It is based on some user criterion like pixels in a certain gray level range, pixels evenly spaced on a grid, etc.) for different regions on client processor. 3. Count number of seed points and activate same number of kernels. 4. Send copy of image and respective seed pixels on each kernel processor. 5. Regions are then grown from these seed pixels to adjacent depending on pixel intensity, gray level texture, or color, etc. on each kernel processor for this
image information is also important, because region growing is based on the membership criteria.
6. Calculate the different features of the different ROIs (regions of interest) on the individual kernels.
7. Send the segmented regions and the ROI information to the client processor and deactivate the worker kernels.
8. Reconstruct the image on the client processor and display all the information.

6. Fault Tolerance

To achieve the needed reliability and availability, we need fault-tolerant computers. They have the ability to tolerate faults by detecting failures and isolating defective modules, so that the rest of the system can operate correctly.
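A minimal sketch of this detect-and-isolate idea, assuming a majority vote over the results that redundant modules produce for the same operation (the function and module names are hypothetical):

```python
from collections import Counter

def vote_and_isolate(results):
    """Given the result produced by each redundant module for the same
    operation, return the majority value together with the modules to
    isolate because their result disagrees with it."""
    majority, _ = Counter(results.values()).most_common(1)[0]
    faulty = sorted(name for name, value in results.items() if value != majority)
    return majority, faulty

# Three modules compute the same operation; one suffers a transient error.
value, isolated = vote_and_isolate({"m1": 4, "m2": 4, "m3": 5})
```

Here the system keeps the majority answer 4 and marks module "m3" for isolation, while "m1" and "m2" continue to operate.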
Reliability techniques have also become of increasing interest for general-purpose computer systems. Parallel computing can be applied to the design of fault-tolerant computer systems, particularly via lockstep systems performing the same operation in parallel. This provides redundancy in case one component fails, and it also allows automatic error detection and error correction if the results differ. These methods can help prevent single-event upsets caused by transient errors. Although additional measures may be required in embedded or specialized systems, this method can provide a cost-effective approach to achieving n-modular redundancy in commercial off-the-shelf systems.

7. Process of System Recovery from Fault Tolerance

Faults in different parts of a parallel system have different importance. Consider a fault in a processor, a line, or a switch. The most important is a fault in a processor. In this case the processes allocated on that processor have to be moved to another processor, recovered and initialized once more. We can usually assume that the processor's memory content is lost, or inaccessible, after the fault appears. It is also necessary to remove and redirect all communication lines going through this processor. Every process of the parallel system, from the moment the fault appears until the end of the recovery, gets a new attribute (fig. 1). When a processor element PE fails, then:

- Every process allocated on the processor element PE, both main and copy, is called a locked process.
- Every other process communicating with a locked process is called a fault influenced process.
- Every other process not communicating with a locked process is called a free process.
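The classification above can be sketched as a toy model (the names are our own; `allocation` maps each process to its processor element, and `channels` lists communicating pairs of processes):

```python
def classify_processes(failed_pe, allocation, channels):
    """Label every process after processor element `failed_pe` fails:
    locked (allocated on the failed PE), fault influenced (communicates
    with a locked process), or free (neither)."""
    locked = {p for p, pe in allocation.items() if pe == failed_pe}
    status = {}
    for p in allocation:
        if p in locked:
            status[p] = "locked"
        elif any(p in pair and locked & set(pair) for pair in channels):
            status[p] = "fault influenced"
        else:
            status[p] = "free"
    return status

allocation = {"p1": "PE1", "p2": "PE2", "p3": "PE2"}
channels = [("p1", "p2")]  # p1 and p2 exchange messages
print(classify_processes("PE1", allocation, channels))
```

With PE1 failed, p1 becomes locked, p2 (which communicates with p1) becomes fault influenced, and p3 remains free.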
The process of system recovery is known, but the question is how, and by whom, recovery of the processor kernel is controlled. Control can be either centralized or decentralized. In the case of decentralized control, we rely on the fact that all kernels hold the same data, according to which they determine the final processors. Every kernel determines the final processors for those locked processes whose copies are allocated on its processor. If a copy of a process is located on more than one processor, then the corresponding processor transmits a message about system recovery to the other processors where the other copies are located. The message contains the final processor for the exact copy of the process and a time mark of the beginning of the recovery. After receiving all messages about system recovery, the kernel compares this time mark with its own time of recovery; a kernel that started its recovery later does not perform any reallocation of the relevant processes. In case of equal time marks, another criterion can decide, for example the identification number of a processor.

There remains the question of how many copies of a process are enough for sufficient resistance against faults. In the case of active and passive processes, it depends on the requested security. One passive copy of the process is sufficient if we assume that a fault does not appear on two processors occupied by the same process at the same time, or during recovery of the system.
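The tie-breaking rule described above can be sketched as follows (the message field names are our own invention, not from the text):

```python
def choose_recovery_message(messages):
    """Among competing system-recovery messages for the same locked
    process, the earliest time mark wins; equal time marks are broken
    by the lower processor identification number."""
    return min(messages, key=lambda m: (m["time_mark"], m["processor_id"]))

messages = [
    {"processor_id": 7, "time_mark": 12, "final_processor": "PE3"},
    {"processor_id": 2, "time_mark": 12, "final_processor": "PE5"},
]
winner = choose_recovery_message(messages)  # equal time marks: processor 2 wins
```

All kernels hold the same data, so each one evaluates this rule independently and arrives at the same final processor without central coordination.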