SlideShare a Scribd company logo
Parellel Computing system for image intelligent processing
M.SC.IT-PART-II, Sathaye College Page 1
Parallel Computing System For Image Intelligent Processing
 ABSTRACT:
In this paper, a parallel computing system for image intelligent processing is
described. This parallel computing system includes two main parts:
A parallel computer and a set of software tools.
The parallel computer is consinicted by a host processor, a SIMD coprocessor, a
stream memory and a controller of this memory. Furthermore, the parallel computer
can be extended by different kinds of organization with the host processors and the
SIMD coprocessors. Programmers write image intelligent processing programs in a C-
like language and a set of software tools maps these programs to code that runs on
the parallel computer. These two parts work together to provide a high performance
for image intelligent processing.
 INTRODUCTION
In this paper, we describe a parallel computing system for image intelligent
processing applications development and show how high performance image
intelligent processing applications are realized on this system. This system includes a
parallel computer and a set of software tools.
 How parallel computing system for image intelligent processing takes place?
Image intelligent processing applications are progmmmed using a C-like language.
This C-like language includes traditional serial statements and data parallel
statements.
The serial statements part as stream programs.
The data parallel statements part as kernel programs.
The stream programs run on the host processor.
The kernel programs run on the SIMD coprocessor.
Image data and kernel programs are transferred from main memory to stream memory.
Finally, image data arrive at coprocessor’s local register file and the kernel programs
arrive at coprocessor’s program memory.
The whole transformation is controlled by the stream memory controller.
Parellel Computing system for image intelligent processing
M.SC.IT-PART-II, Sathaye College Page 2
 SMID :
Single instruction multiple data (SIMD), as the name suggests, A type
of parallel computingarchitecture that is classified under Flynn's taxonomy. It describes computers
with multiple processing elements that perform the same operation on multiple data points
simultaneously.
Typically this consists of many simple processors, each with a local memoryin which it keeps the
data which it will work on. Each processor simultaneously performs the same instruction on its
local data progressing through the instructions in lock-step, with the instructions issued by the
controller processor. The processors can communicate with each other in order to perform shifts
and other array operations
The number of parallel data elements processed at the same time is dependent on the size of the
elements and the capacity of the data processing resources available. Usually the capacity of the
data processing resources is fixed for a given architecture
Thus, such machines exploit data level parallelism. SIMD is particularly applicable to common
tasks like adjusting the contrast in a digital image or adjusting the volume of digital audio. Most
modern CPU designs include SIMD instructions in order to improve the performance
of multimedia use.
Parellel Computing system for image intelligent processing
M.SC.IT-PART-II, Sathaye College Page 3
 APPLICATIONS
We want to implement image intelligent processing on this system. Tasks in this kind of
image processing can be divided to three levels. They are :
Low level tasks,
Middle level tasks and
High level tasks.
Low level tasks in image processing are also called preprocessing which is used to
organize data for later image processing. Preprocessing mainly includes image enhancement
and image recovery. In the opinion of computing, preprocessing can be viewed as three
kinds of processes that have single pixel depended processing (such as threshold,
quantification and coding etc.) part pixels processing (such as filtering, convolution etc.)
area pixels processing and whole pixels processing (digital transformation such as
FFT).
All of the above processing has common characteristics as followings: The first is space
inflexibility that is to say the same instruction can be executed on the whole of the image
pixels. We call this characteristic data independence. The second is locality i.e., for each
pixel its processing result only depends on its neighboring pixels. The third is the fixed data
structure i.e., the image processing result is still an image array. From all the above we
conclude that the image preprocessing has obvious data parallel.
The aim of middle level processing is to find out theoutstanding features and the
proper primitive elements from the images. The middle level processing includes image
partition and description of image partition. The middle level processing corresponds to a
transformation from image to feature. So, middle level processing is a mixture of data
parallel and task parallel.
The aim of high level processing is to recognize the content of image. This kind
of processing can be divided into two periods that are symbol description and scene
understanding. From these feature, we can conclude that the high level processing has
obvious task parallel.
When a parallel computer system can fit these typical features in image processing,
the performance of this system needs to be greatly enhanced.
Parellel Computing system for image intelligent processing
M.SC.IT-PART-II, Sathaye College Page 4
 PARALLEL COMPUTER
The parallel computer includes a host processor, a SIMD coprocessor, a
stream memory and a controller of this stream memory as Fig. 1 illustrated.
Fig 1: parallel computer organization
The parallel computer organization consist of three parts:
The host processor is the main controller of the whole system.
The stream memory is the memory between main memory .
The coprocessor’s local register file. Its function is to enhance memory system’s speed
to match the coprocessor’s requirement.
 Parallel computer organization:
The host processor is the main controller of the whole system. We call the host
processor and the stream memory controller a serial processor. The stream memory
controller manages the data transfer between the stream memory and the memory on serial
processor. The memory management’s function is to enhance memory system’s data
transfer speed to match the coprocessor’s computing requirement.
In Fig. 1, the host processor, the stream memory controller, the stream memory and the
SIMD coprocessor are always on a same chip and always the stream memory is constructed
by register files.
Parellel Computing system for image intelligent processing
M.SC.IT-PART-II, Sathaye College Page 5
 Parallel Computer Architecture:
Fig. 2: Parallel computer architecture
The parallel computer is a new generation image processing computer. As Fig. 2
illustrated this computer includes a key component which is called SIMD coprocessor.
Always we call the SIMD coprocessor as PE (Processing Element) array.
The host of the parallel computer can be a PC or a workstation.
The FE array connects with host The host processor manages the whole computer.
The host transfers kernel source programs and data to the PE array’s program
memory and stream memory.
Then the kemel programs are started on the PE array.
Finally, the computing results are fetched from stream memory to the host for
further analysis. T
he host also does initialization, diagnosis and testing of the PE array. PE array is
suitable to do matrix computing, so we can view the PE array as an accelerator which
matches matrix computing.
The PE array is constructed by many processing elements in a regular array. The
PE array includes NxN PEs. Each PE includes ALU, local register file, buffer and router
which does data communication. All of the PEs are connected by a conventional ring grirL
The number of the PEs is decided by the computing requirement and the VLSI realization
level.Because the PE anay is regular and each FE is simple, the PE array chip can be
reabzed in small size. Furthermore, with many PE array chips we can get a parallel
computer.
Parellel Computing system for image intelligent processing
M.SC.IT-PART-II, Sathaye College Page 6
The parallel computer provides a three levels memory hierarchy. This three levels
memory is the:
local register files of PE’s,
the stream memory and
The main memory.
This hierarchy is used to exploit the parallelism and locality of image applications.
The architecture of the parallel computer meets the computation and bandwidth
demands of image applications by directly processing the naturally occurring data
streams within image applications. The PE array is designed to be a coprocessor that
operates on image data streams. Many PEs can provide high performance for real-time
image processing.
The stream memory effectively isolates the ALUs from the main memory, making the
PE array a load/store architecture for data streams. All data stream operations transfer
data streams to or from the stream memory. For instance, the PE array transfers data
streams directly out of and into the stream memory, isolating PE array transfers from main
memory accesses and computation. This simplifies the design of the processor. Meanwhile
the stream memory and the PE array can works in pipeline. So data streams transfer speed
is promoted by this stream memory.
 Parallel computer extension:
The above parallel computer is a basic image processing computer. The host
processor is a SISD computer. The coprocessor is a SIMD on a serial computer which can be
an 8 PEs chip or a 16 PEs chip. We can construct 8 PE chips or 16 PE chip on a board. In this
kind of organization we can construct a 64 PEs or 256 PEs in one parallel computer.
Furthermore, the 256 PEs on a board (for example a PCI board) can also be constructed in
the number of 8, 16 or more to mount on a computer main board. In this kind of
extension for the parallel computer we can get a parallel computer which can match the
Image processmg application's performance requirem ents.
Meanwhile, in the application section we know that image processing application has
data parallelism and task parallelism characteristics. In order to construct a parallel computing
system to match the need of different parallelism for image processing, we can use
the basic image processing computers to construct an intelligent parallel computer for
image intelligent processing. For example, use cable to connect several basic parallel
computers together we can construct a real task parallel computer which can match three
kinds of parallelism for image intelligent processing.
Parellel Computing system for image intelligent processing
M.SC.IT-PART-II, Sathaye College Page 7
In a word, the extension of the parallel computer can be realized in four ways:
1. Extend the number of pes on a chip.
2. Extend the number of PE chips on a board.
3 . Extend the number of boards on a computer main board.
4. Extend the number of computer main boards.
 SOFTWARE
The basic image processing computer decomposes an application into a series of kernel
programs that operate on image data. We call this kind of image data streams. Streams
are sets of sequential data records that lend themselves to high-performance computation
through their regular and predictable structure. In simple words stream is a data
collection that could be operated in parallel.
 Stream programs are written in the SISD statement ofa C-like language.
 The kernel programs are written in the SIMD statement of the C-like language.

The use of these two parts in the C-like language for image processing allows the
programmer to use the familiar high-level C language's structure. The two-level
programming system reflects the division of programs on the basic image processing
computer:
The kernel programs runing on the SIMD coprocessor, while the stream program runing
on the serial processor corresponds to a high-level description of the image processmg.
Fig. 3: Software partitions
Parellel Computing system for image intelligent processing
M.SC.IT-PART-II, Sathaye College Page 8
There are several program parts work together to execute an image application, as
illustrated in Fig. 3.
Stream programs run on the serial processor, call the kernel programs and initiate the
transfer of data streams. Stream programs instruct the SIMD coprocessor to run kernel
programs and then send out the resulting data.
 Stream Program And Stream Scheduler:
Stream programs are written in the C-like language with calls into kernel programs.
Kernel program calls are treated as functions that take input and output streams as
arguments. The stream scheduler allocates space in serial processor's memory and in
stream memory used to store streams that are passed between kernel programs. The
scheduler determines when to place streams in the stream memory and when to
move them to and from serial processor's memory to take advantage of producer-
consumer locality between different kernel programs. Efficient stream programs
minimize traffic between the stream memory and the serial processor's memory. When the
image data is larger than the stream memory, the scheduler breaks the large stream into
smaller batches. One batch is loaded into stream memory at a time. The kernel program
executes on this batch before computation begins on the next rows. In this way, raw image
data arrives in the stream memory and passes between kernel programs through
temporary storage in the stream memory. Expensive memory traffic is encountered only when
final results are written out.
To ensure high resource utilization, the stream scheduler performs software
pipelining at the stream level, where kernel programs are executed concurrently with memory
operations.
 Kernel Program:
Kernel programs are user specified programs which are executed over the set of input
streams to produce elements onto the set of output streams. Kernel programs operate
implicitly over the entire set of input streams. The kernel will execute once for each
element in the input stream. The run-time will guarantee that all input streams are the
same length. If a kernel is called with input streams of differing lengths, the kernel will
immediately fault.,producing no output. Kernel programs are declared similar to standard C-
functions. An example kernel is shown below:
void kernel foo (float a <>, out float b <>, float p)
{
b = a+p;
}
Parellel Computing system for image intelligent processing
M.SC.IT-PART-II, Sathaye College Page 9
The parameter 'a' is the input stream,
'b' is the output stream,
'p' is a constant parameter.
 The body of the kernel foo will be executed for each input element of 'a' in
parallelism, producing a new stream 'b', which contains a set floats which are p larger
than the corresponding 'a' elements.
 The keyword kernel is a function descriptor used to descript that this is a kernel
function, not a normal C function.
 Kernel functions are called in the same method as normal C function however
the stream arguments should be declared streams.
Programmers can distribute several kernels on different PE array chip, so we
can get several kernels in parallel. This kind of parallelism can match task parallelism of
image intelligent processing.
 Parallel Pes Communication Function:
The permute operation is used to explicitly interchanges values between PEs.
The permute operation looks like permute (PE-permutation, expression).The permute
operation takes a constant describing the permutation and the value to be
permuted as arguments. The permutation is an integer in which the nth nibble specifies the
index of the PE from which the nth PE gets the value of the permuted variable. For
instance, "permute (Ox65432107,x)", rotates the value ofx in each of the PEs to the left by
specifying that PE 7 gets the value from PE 6 and so on.
 System Working Process:
Two different types of programs are compiled and loaded on the basic image
processing parallel system. Both stream programs and the stream scheduler are compiled
for the serial processor with the C-like language. On boot, the serial processor reads a
small boot program from boot RAlv1and executes the full stream programs. Also, the
compiler compiles kernel programs for the SIMD coprocessor. The stream scheduler
loads compiled kernel programs into the SIMD coprocessor's microcode store.
For the extension parallel computer, the programmer needs to divide the application
into task parallelism in advance. The application programmer divides image processing
application in task parallelism and data parallelism in the view of image processing's three
Parellel Computing system for image intelligent processing
M.SC.IT-PART-II, Sathaye College Page 10
level processmg. For the task parallelism programs, communication can be realized in the
function of MPI tools.
 CONCLUSION
The research on parallel computing system for the image intelligent processing is
popular. There are some research parts for parallel computing system for image intelligent
processing being discussed in this article. The system has two functions.
1. the system can be used as an evaluation platform for parallel system of image
processmg.
2. the system can be used as a development and debugging platform for
Image processing applications.
Through several kinds of image applications' analysis and evaluation we can get some
suggestions to improve the parallel computer's architecture which could help us
get new parallel system for image intelligent processing with higher speed and higher
performance.
 FUTURE WORKS
In Fig. 1, we get a serial processor that is a SISD processor. The coprocessor is a
SIMD processor. When we deploy the SISD processor and the SIMDcoprocessor on a
board or a chip, we get a single computer system. When several this kind of boards
or chips organize together, then setting another host processor as a management node we
can get a total MIJvlD processor which supports data parallelism, task parallelism and
pipeline parallelism. Then the image processing will be greatly promoted. Also we need
new software technology to support these kinds of parallelism.
There are many multi core processors. The parallel computer is a kind of multi
core processor. Many researches about multi core processor's software components are
doing now, such as runtime system, programming language and compiler. These are still a
challenge in the parallel computing area.
Parellel Computing system for image intelligent processing
M.SC.IT-PART-II, Sathaye College Page 11

More Related Content

PPTX
Unit 5-lecture-2
PPTX
Computer processing
PPT
multiprocessors and multicomputers
PPTX
Memory organization (Computer architecture)
PPTX
PDF
Computer Memory
PPTX
Computer Architecture
DOC
Unit 5-lecture-2
Computer processing
multiprocessors and multicomputers
Memory organization (Computer architecture)
Computer Memory
Computer Architecture

What's hot (20)

PPTX
Computer System Architecture
PPTX
Memory Organization
PPT
Coa module1
PDF
Unit 4-input-output organization
PPTX
Memory organization
PPT
POLITEKNIK MALAYSIA
PDF
Basic+machine+organization
DOC
POLITEKNIK MALAYSIA
PDF
Computer organisation -morris mano
PPTX
Chap2 comp architecture
PPT
Unit 5-lecture 5
PPTX
Computer architecture and organization
PPT
PPTX
memory Interleaving and low order interleaving and high interleaving
PPTX
ADVANCED COMPUTER ARCHITECTURE AND PARALLEL PROCESSING
PDF
Computer Oraganisation and Architecture
PDF
Computer Organization Lecture Notes
PPT
Ntroduction to computer architecture and organization
DOCX
Bindura university of science education
PPTX
Computer System Architecture
Memory Organization
Coa module1
Unit 4-input-output organization
Memory organization
POLITEKNIK MALAYSIA
Basic+machine+organization
POLITEKNIK MALAYSIA
Computer organisation -morris mano
Chap2 comp architecture
Unit 5-lecture 5
Computer architecture and organization
memory Interleaving and low order interleaving and high interleaving
ADVANCED COMPUTER ARCHITECTURE AND PARALLEL PROCESSING
Computer Oraganisation and Architecture
Computer Organization Lecture Notes
Ntroduction to computer architecture and organization
Bindura university of science education
Ad

Similar to parallel processing (20)

PPT
Multilevel arch & str org.& mips, 8086, memory
PDF
Computer science class 11th, kseeb notes
PDF
Cliff sugerman
PPTX
1.1 Introduction.pptx about the design thinking of the engineering students
PPT
Introduction to the operating and its types
PPTX
IntroductiontoInformationsecurity11.pptx
PPT
Module-1.ppt............notes of coa 1st module
PDF
COMPUTER COMPONENTS
PPT
Par com
PPTX
System components
PPTX
Computer.pptx
DOC
How a cpu works1
DOC
How a cpu works1
PPTX
Unit 5 Memory management in OS Unit 5 Memory management in OS
PPTX
Learn about computer hardware and software
PDF
System Software ( Os )
PDF
Unit 1 - Basic Computer Engineering - www.rgpvnotes.in (1).pdf
DOCX
Pankaj kumar
PPTX
Digital Computer
Multilevel arch & str org.& mips, 8086, memory
Computer science class 11th, kseeb notes
Cliff sugerman
1.1 Introduction.pptx about the design thinking of the engineering students
Introduction to the operating and its types
IntroductiontoInformationsecurity11.pptx
Module-1.ppt............notes of coa 1st module
COMPUTER COMPONENTS
Par com
System components
Computer.pptx
How a cpu works1
How a cpu works1
Unit 5 Memory management in OS Unit 5 Memory management in OS
Learn about computer hardware and software
System Software ( Os )
Unit 1 - Basic Computer Engineering - www.rgpvnotes.in (1).pdf
Pankaj kumar
Digital Computer
Ad

Recently uploaded (20)

PPTX
Institutional Correction lecture only . . .
PDF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PDF
Computing-Curriculum for Schools in Ghana
PDF
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
PDF
01-Introduction-to-Information-Management.pdf
PDF
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
PPTX
human mycosis Human fungal infections are called human mycosis..pptx
PDF
Complications of Minimal Access Surgery at WLH
PDF
O7-L3 Supply Chain Operations - ICLT Program
PDF
VCE English Exam - Section C Student Revision Booklet
PDF
O5-L3 Freight Transport Ops (International) V1.pdf
PPTX
Introduction-to-Literarature-and-Literary-Studies-week-Prelim-coverage.pptx
PDF
A GUIDE TO GENETICS FOR UNDERGRADUATE MEDICAL STUDENTS
PDF
GENETICS IN BIOLOGY IN SECONDARY LEVEL FORM 3
PPTX
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
PDF
A systematic review of self-coping strategies used by university students to ...
PPTX
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
PPTX
Lesson notes of climatology university.
PDF
Abdominal Access Techniques with Prof. Dr. R K Mishra
PPTX
202450812 BayCHI UCSC-SV 20250812 v17.pptx
Institutional Correction lecture only . . .
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
Computing-Curriculum for Schools in Ghana
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
01-Introduction-to-Information-Management.pdf
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
human mycosis Human fungal infections are called human mycosis..pptx
Complications of Minimal Access Surgery at WLH
O7-L3 Supply Chain Operations - ICLT Program
VCE English Exam - Section C Student Revision Booklet
O5-L3 Freight Transport Ops (International) V1.pdf
Introduction-to-Literarature-and-Literary-Studies-week-Prelim-coverage.pptx
A GUIDE TO GENETICS FOR UNDERGRADUATE MEDICAL STUDENTS
GENETICS IN BIOLOGY IN SECONDARY LEVEL FORM 3
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
A systematic review of self-coping strategies used by university students to ...
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
Lesson notes of climatology university.
Abdominal Access Techniques with Prof. Dr. R K Mishra
202450812 BayCHI UCSC-SV 20250812 v17.pptx

parallel processing

  • 1. Parellel Computing system for image intelligent processing M.SC.IT-PART-II, Sathaye College Page 1 Parallel Computing System For Image Intelligent Processing  ABSTRACT: In this paper, a parallel computing system for image intelligent processing is described. This parallel computing system includes two main parts: A parallel computer and a set of software tools. The parallel computer is consinicted by a host processor, a SIMD coprocessor, a stream memory and a controller of this memory. Furthermore, the parallel computer can be extended by different kinds of organization with the host processors and the SIMD coprocessors. Programmers write image intelligent processing programs in a C- like language and a set of software tools maps these programs to code that runs on the parallel computer. These two parts work together to provide a high performance for image intelligent processing.  INTRODUCTION In this paper, we describe a parallel computing system for image intelligent processing applications development and show how high performance image intelligent processing applications are realized on this system. This system includes a parallel computer and a set of software tools.  How parallel computing system for image intelligent processing takes place? Image intelligent processing applications are progmmmed using a C-like language. This C-like language includes traditional serial statements and data parallel statements. The serial statements part as stream programs. The data parallel statements part as kernel programs. The stream programs run on the host processor. The kernel programs run on the SIMD coprocessor. Image data and kernel programs are transferred from main memory to stream memory. Finally, image data arrive at coprocessor’s local register file and the kernel programs arrive at coprocessor’s program memory. The whole transformation is controlled by the stream memory controller.
  • 2. Parellel Computing system for image intelligent processing M.SC.IT-PART-II, Sathaye College Page 2  SMID : Single instruction multiple data (SIMD), as the name suggests, A type of parallel computingarchitecture that is classified under Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously. Typically this consists of many simple processors, each with a local memoryin which it keeps the data which it will work on. Each processor simultaneously performs the same instruction on its local data progressing through the instructions in lock-step, with the instructions issued by the controller processor. The processors can communicate with each other in order to perform shifts and other array operations The number of parallel data elements processed at the same time is dependent on the size of the elements and the capacity of the data processing resources available. Usually the capacity of the data processing resources is fixed for a given architecture Thus, such machines exploit data level parallelism. SIMD is particularly applicable to common tasks like adjusting the contrast in a digital image or adjusting the volume of digital audio. Most modern CPU designs include SIMD instructions in order to improve the performance of multimedia use.
  • 3. Parellel Computing system for image intelligent processing M.SC.IT-PART-II, Sathaye College Page 3  APPLICATIONS We want to implement image intelligent processing on this system. Tasks in this kind of image processing can be divided to three levels. They are : Low level tasks, Middle level tasks and High level tasks. Low level tasks in image processing are also called preprocessing which is used to organize data for later image processing. Preprocessing mainly includes image enhancement and image recovery. In the opinion of computing, preprocessing can be viewed as three kinds of processes that have single pixel depended processing (such as threshold, quantification and coding etc.) part pixels processing (such as filtering, convolution etc.) area pixels processing and whole pixels processing (digital transformation such as FFT). All of the above processing has common characteristics as followings: The first is space inflexibility that is to say the same instruction can be executed on the whole of the image pixels. We call this characteristic data independence. The second is locality i.e., for each pixel its processing result only depends on its neighboring pixels. The third is the fixed data structure i.e., the image processing result is still an image array. From all the above we conclude that the image preprocessing has obvious data parallel. The aim of middle level processing is to find out theoutstanding features and the proper primitive elements from the images. The middle level processing includes image partition and description of image partition. The middle level processing corresponds to a transformation from image to feature. So, middle level processing is a mixture of data parallel and task parallel. The aim of high level processing is to recognize the content of image. This kind of processing can be divided into two periods that are symbol description and scene understanding. From these feature, we can conclude that the high level processing has obvious task parallel. When a parallel computer system can fit these typical features in image processing, the performance of this system needs to be greatly enhanced.
  • 4. Parellel Computing system for image intelligent processing M.SC.IT-PART-II, Sathaye College Page 4  PARALLEL COMPUTER The parallel computer includes a host processor, a SIMD coprocessor, a stream memory and a controller of this stream memory as Fig. 1 illustrated. Fig 1: parallel computer organization The parallel computer organization consist of three parts: The host processor is the main controller of the whole system. The stream memory is the memory between main memory . The coprocessor’s local register file. Its function is to enhance memory system’s speed to match the coprocessor’s requirement.  Parallel computer organization: The host processor is the main controller of the whole system. We call the host processor and the stream memory controller a serial processor. The stream memory controller manages the data transfer between the stream memory and the memory on serial processor. The memory management’s function is to enhance memory system’s data transfer speed to match the coprocessor’s computing requirement. In Fig. 1, the host processor, the stream memory controller, the stream memory and the SIMD coprocessor are always on a same chip and always the stream memory is constructed by register files.
  • 5. Parellel Computing system for image intelligent processing M.SC.IT-PART-II, Sathaye College Page 5  Parallel Computer Architecture: Fig. 2: Parallel computer architecture The parallel computer is a new generation image processing computer. As Fig. 2 illustrated this computer includes a key component which is called SIMD coprocessor. Always we call the SIMD coprocessor as PE (Processing Element) array. The host of the parallel computer can be a PC or a workstation. The FE array connects with host The host processor manages the whole computer. The host transfers kernel source programs and data to the PE array’s program memory and stream memory. Then the kemel programs are started on the PE array. Finally, the computing results are fetched from stream memory to the host for further analysis. T he host also does initialization, diagnosis and testing of the PE array. PE array is suitable to do matrix computing, so we can view the PE array as an accelerator which matches matrix computing. The PE array is constructed by many processing elements in a regular array. The PE array includes NxN PEs. Each PE includes ALU, local register file, buffer and router which does data communication. All of the PEs are connected by a conventional ring grirL The number of the PEs is decided by the computing requirement and the VLSI realization level.Because the PE anay is regular and each FE is simple, the PE array chip can be reabzed in small size. Furthermore, with many PE array chips we can get a parallel computer.
  • 6. Parellel Computing system for image intelligent processing M.SC.IT-PART-II, Sathaye College Page 6 The parallel computer provides a three levels memory hierarchy. This three levels memory is the: local register files of PE’s, the stream memory and The main memory. This hierarchy is used to exploit the parallelism and locality of image applications. The architecture of the parallel computer meets the computation and bandwidth demands of image applications by directly processing the naturally occurring data streams within image applications. The PE array is designed to be a coprocessor that operates on image data streams. Many PEs can provide high performance for real-time image processing. The stream memory effectively isolates the ALUs from the main memory, making the PE array a load/store architecture for data streams. All data stream operations transfer data streams to or from the stream memory. For instance, the PE array transfers data streams directly out of and into the stream memory, isolating PE array transfers from main memory accesses and computation. This simplifies the design of the processor. Meanwhile the stream memory and the PE array can works in pipeline. So data streams transfer speed is promoted by this stream memory.  Parallel computer extension: The above parallel computer is a basic image processing computer. The host processor is a SISD computer. The coprocessor is a SIMD on a serial computer which can be an 8 PEs chip or a 16 PEs chip. We can construct 8 PE chips or 16 PE chip on a board. In this kind of organization we can construct a 64 PEs or 256 PEs in one parallel computer. Furthermore, the 256 PEs on a board (for example a PCI board) can also be constructed in the number of 8, 16 or more to mount on a computer main board. In this kind of extension for the parallel computer we can get a parallel computer which can match the Image processmg application's performance requirem ents. Meanwhile, in the application section we know that image processing application has data parallelism and task parallelism characteristics. In order to construct a parallel computing system to match the need of different parallelism for image processing, we can use the basic image processing computers to construct an intelligent parallel computer for image intelligent processing. For example, use cable to connect several basic parallel computers together we can construct a real task parallel computer which can match three kinds of parallelism for image intelligent processing.
  • 7. Parellel Computing system for image intelligent processing M.SC.IT-PART-II, Sathaye College Page 7 In a word, the extension of the parallel computer can be realized in four ways: 1. Extend the number of pes on a chip. 2. Extend the number of PE chips on a board. 3 . Extend the number of boards on a computer main board. 4. Extend the number of computer main boards.  SOFTWARE The basic image processing computer decomposes an application into a series of kernel programs that operate on image data. We call this kind of image data streams. Streams are sets of sequential data records that lend themselves to high-performance computation through their regular and predictable structure. In simple words stream is a data collection that could be operated in parallel.  Stream programs are written in the SISD statement ofa C-like language.  The kernel programs are written in the SIMD statement of the C-like language.  The use of these two parts in the C-like language for image processing allows the programmer to use the familiar high-level C language's structure. The two-level programming system reflects the division of programs on the basic image processing computer: The kernel programs runing on the SIMD coprocessor, while the stream program runing on the serial processor corresponds to a high-level description of the image processmg. Fig. 3: Software partitions
  • 8. Parellel Computing system for image intelligent processing M.SC.IT-PART-II, Sathaye College Page 8 There are several program parts work together to execute an image application, as illustrated in Fig. 3. Stream programs run on the serial processor, call the kernel programs and initiate the transfer of data streams. Stream programs instruct the SIMD coprocessor to run kernel programs and then send out the resulting data.  Stream Program And Stream Scheduler: Stream programs are written in the C-like language with calls into kernel programs. Kernel program calls are treated as functions that take input and output streams as arguments. The stream scheduler allocates space in serial processor's memory and in stream memory used to store streams that are passed between kernel programs. The scheduler determines when to place streams in the stream memory and when to move them to and from serial processor's memory to take advantage of producer- consumer locality between different kernel programs. Efficient stream programs minimize traffic between the stream memory and the serial processor's memory. When the image data is larger than the stream memory, the scheduler breaks the large stream into smaller batches. One batch is loaded into stream memory at a time. The kernel program executes on this batch before computation begins on the next rows. In this way, raw image data arrives in the stream memory and passes between kernel programs through temporary storage in the stream memory. Expensive memory traffic is encountered only when final results are written out. To ensure high resource utilization, the stream scheduler performs software pipelining at the stream level, where kernel programs are executed concurrently with memory operations.  Kernel Program: Kernel programs are user specified programs which are executed over the set of input streams to produce elements onto the set of output streams. Kernel programs operate implicitly over the entire set of input streams. The kernel will execute once for each element in the input stream. The run-time will guarantee that all input streams are the same length. If a kernel is called with input streams of differing lengths, the kernel will immediately fault.,producing no output. Kernel programs are declared similar to standard C- functions. An example kernel is shown below: void kernel foo (float a <>, out float b <>, float p) { b = a+p; }
  • 9. Parellel Computing system for image intelligent processing M.SC.IT-PART-II, Sathaye College Page 9 The parameter 'a' is the input stream, 'b' is the output stream, 'p' is a constant parameter.  The body of the kernel foo will be executed for each input element of 'a' in parallelism, producing a new stream 'b', which contains a set floats which are p larger than the corresponding 'a' elements.  The keyword kernel is a function descriptor used to descript that this is a kernel function, not a normal C function.  Kernel functions are called in the same method as normal C function however the stream arguments should be declared streams. Programmers can distribute several kernels on different PE array chip, so we can get several kernels in parallel. This kind of parallelism can match task parallelism of image intelligent processing.  Parallel Pes Communication Function: The permute operation is used to explicitly interchanges values between PEs. The permute operation looks like permute (PE-permutation, expression).The permute operation takes a constant describing the permutation and the value to be permuted as arguments. The permutation is an integer in which the nth nibble specifies the index of the PE from which the nth PE gets the value of the permuted variable. For instance, "permute (Ox65432107,x)", rotates the value ofx in each of the PEs to the left by specifying that PE 7 gets the value from PE 6 and so on.  System Working Process: Two different types of programs are compiled and loaded on the basic image processing parallel system. Both stream programs and the stream scheduler are compiled for the serial processor with the C-like language. On boot, the serial processor reads a small boot program from boot RAlv1and executes the full stream programs. Also, the compiler compiles kernel programs for the SIMD coprocessor. The stream scheduler loads compiled kernel programs into the SIMD coprocessor's microcode store. For the extension parallel computer, the programmer needs to divide the application into task parallelism in advance. The application programmer divides image processing application in task parallelism and data parallelism in the view of image processing's three
  • 10. Parellel Computing system for image intelligent processing M.SC.IT-PART-II, Sathaye College Page 10 level processmg. For the task parallelism programs, communication can be realized in the function of MPI tools.  CONCLUSION The research on parallel computing system for the image intelligent processing is popular. There are some research parts for parallel computing system for image intelligent processing being discussed in this article. The system has two functions. 1. the system can be used as an evaluation platform for parallel system of image processmg. 2. the system can be used as a development and debugging platform for Image processing applications. Through several kinds of image applications' analysis and evaluation we can get some suggestions to improve the parallel computer's architecture which could help us get new parallel system for image intelligent processing with higher speed and higher performance.  FUTURE WORKS In Fig. 1, we get a serial processor that is a SISD processor. The coprocessor is a SIMD processor. When we deploy the SISD processor and the SIMDcoprocessor on a board or a chip, we get a single computer system. When several this kind of boards or chips organize together, then setting another host processor as a management node we can get a total MIJvlD processor which supports data parallelism, task parallelism and pipeline parallelism. Then the image processing will be greatly promoted. Also we need new software technology to support these kinds of parallelism. There are many multi core processors. The parallel computer is a kind of multi core processor. Many researches about multi core processor's software components are doing now, such as runtime system, programming language and compiler. These are still a challenge in the parallel computing area.
  • 11. Parellel Computing system for image intelligent processing M.SC.IT-PART-II, Sathaye College Page 11