Abstract
This poster presents a design for parallel processing of synthetic
aperture radar (SAR) data on multi-core general-purpose graphics
processing units (GP-GPUs). In this design, a two-dimensional image is
reconstructed from the received echo pulses and their corresponding
response values. The performance of the proposed parallel
implementation is then tabulated for several NVIDIA GPUs, namely GT,
Tesla and Fermi. Experiments on images ranging from 1,024 x 1,024
pixels to 4,096 x 4,096 pixels led us to conclude that the NVIDIA
Fermi S2050 delivers the highest throughput.
Introduction
Generating images from raw synthetic aperture radar (SAR) video data
requires complex computations; the requirements are tabulated in the
data sample for RADARSAT-1 [1].
In a SAR system, the fundamental principle is to match the phase of
the received (Rx) signal against the phase of the transmitted (Tx)
signal at a stable transmission frequency. The Fast Fourier Transform
(FFT) is used to compute the phase of the received and transmitted
signals.
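As a minimal sketch of this phase-matching step, assuming it amounts to cross-correlating the received signal with the transmitted replica in the frequency domain, the Python below delays a hypothetical chirp by five samples and locates the correlation peak. The names `fft` and `matched_filter`, the chirp rate and the delay are illustrative assumptions, not values from the poster.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT without scaling; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + tw, even[k] - tw
    return out

def matched_filter(rx, tx):
    """Correlate rx with tx via FFT: r(l) = sum_n rx(n + l) * conj(tx(n))."""
    size = 1
    while size < len(rx) + len(tx):      # pad so the circular result is linear
        size *= 2
    RX = fft(list(rx) + [0j] * (size - len(rx)))
    TX = fft(list(tx) + [0j] * (size - len(tx)))
    product = [a * b.conjugate() for a, b in zip(RX, TX)]
    return [v / size for v in fft(product, inverse=True)]

# Hypothetical transmitted chirp and an echo delayed by 5 samples.
tx = [cmath.exp(1j * cmath.pi * 0.05 * n * n) for n in range(16)]
rx = [0j] * 5 + tx + [0j] * 11
r = matched_filter(rx, tx)
peak = max(range(len(r)), key=lambda l: abs(r[l]))
print(peak)  # 5
```

The peak index recovers the delay because the phases of the echo and the replica line up only at that lag.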
The modified algorithm presented in [2] is implemented by dividing
the impulse response of the Rx/Tx signal into several blocks of equal
length and then computing an FFT on each block. This helps to process
the range and azimuth dimensions in parallel.
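The block division can be sketched as follows. This is only an illustration under the assumption that the signal is zero-padded to a whole number of blocks; the helper names `split_into_blocks` and `reassemble` and the sample values are hypothetical.

```python
def split_into_blocks(signal, block_len):
    """Zero-pad to a multiple of block_len and cut into equal-length blocks."""
    pad = (-len(signal)) % block_len
    padded = list(signal) + [0] * pad
    return [padded[i:i + block_len] for i in range(0, len(padded), block_len)]

def reassemble(blocks):
    """Concatenate the blocks back into one sequence (padding included)."""
    return [value for block in blocks for value in block]

signal = list(range(1, 11))            # 10 hypothetical samples
blocks = split_into_blocks(signal, 4)
print(blocks)  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 0, 0]]
```

Because each block is independent of the others, every block can be handed to a different processing element.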
Since a GP-GPU has large computational capability and processes data
in parallel, the modified SAR image generation is implemented on the
GPU. The proposed parallel algorithm achieves faster execution for
both the range and azimuth dimensions.
Poster panels: Algorithm and Method; Data Sample; Implementation; Typical Parallel Algorithm; Proposed Parallel Algorithm; Process distribution to generate SAR; Range Image Generation Timing (1024x1024); Azimuth Image Generation Timing (1024x1024); Final SAR Image Generation (1024x1024); Result - SAR Image Generation
References
[1] I. G. Cumming and F. H. Wong, Digital Processing of Synthetic Aperture Radar Data,
Artech House, Boston, London, Jan. 2005.
[2] C. Bhattacharya, “Parallel processing of satellite-borne SAR data for accurate and
efficient image formation,” EUSAR 2010, pp. 1046-49, 7-10 June 2010, Aachen,
Germany.
[3] NVIDIA Corporation, Compute Unified Device Architecture (CUDA),
http://developer.nvidia.com/object/cuda.html.
[4] A. K. Agarwal et al., “Accelerated SAR image generation on GPGPU platform,”
APSAR, 2011.
Multi-core GPU – Fast parallel SAR image generation
Mahesh Khadtare1, Pragati Dharmale2, Prakalp Somwanshi3 & C Bhattacharya4
1 – I2IT, Pune, IN; 2 – SNHU, NH, US; 3 – CRL, Pune, IN; 4 – DIAT, Pune, IN
Parameter                                   | Typical Value
Number of range samples                     | 6840
Number of azimuth samples                   | 4096
Block size (image size)                     | 8192 x 8192
Data size (actual size)                     | 11.27 M samples/sec
Total computational requirement (range)     | 1.83 GFLOPS
Total computational requirement (azimuth)   | 3.28 GFLOPS
Total load computing requirement            | 5.11 GFLOPS
NOTE: All calculations in O(GFLOPS)
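A short arithmetic check of the tabulated figures, written as a sketch: it only re-adds the range and azimuth requirements and counts the samples in one scene. The variable names are illustrative; all numbers are taken from the table.

```python
range_samples = 6840
azimuth_samples = 4096
range_gflops = 1.83
azimuth_gflops = 3.28

# Samples in one raw scene (range samples per line times azimuth lines).
samples_per_scene = range_samples * azimuth_samples
# Total load is the sum of the range and azimuth requirements.
total_gflops = round(range_gflops + azimuth_gflops, 2)

print(samples_per_scene)  # 28016640
print(total_gflops)       # 5.11
```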
[Figure: scene geometry; Range (R): 6840 pixels, 50 km; Azimuth (s): 4096 lines, 3.6 km]
RADARSAT-1 data from C-band sensor
• 50 km Range
• 3.6 km Azimuth
• 7.2 m Ground Range Resolution
• 5.26 m Azimuth Resolution
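[Figure: Typical parallel algorithm; Image (1), Image (2), ..., Image (N) are assigned to GPU-PE(1), GPU-PE(2), ..., GPU-PE(N), which process Image (1), Image (2), ..., Image (N) respectively]
[Figure: Proposed parallel algorithm; Image 1 is processed jointly by G-PE 1, G-PE 2, ..., G-PE N to give P-Image 1, and likewise Image N is processed by G-PE 1, G-PE 2, ..., G-PE N to give P-Image N]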
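The difference between the two distribution schemes can be sketched as a pair of scheduling functions. This is one reading of the diagrams, assuming a round-robin assignment; the function names and the `blocks_per_image` parameter are hypothetical.

```python
def typical_schedule(num_images, num_pes):
    """Typical scheme: each whole image is handled by a single PE."""
    return {img: [img % num_pes] for img in range(num_images)}

def proposed_schedule(num_images, num_pes, blocks_per_image):
    """Proposed scheme: the blocks of every image are spread over all PEs."""
    schedule = {}
    for img in range(num_images):
        # (block index, PE index) pairs, assigned round-robin.
        schedule[img] = [(blk, blk % num_pes) for blk in range(blocks_per_image)]
    return schedule

print(typical_schedule(3, 3))      # {0: [0], 1: [1], 2: [2]}
print(proposed_schedule(1, 3, 6))  # {0: [(0, 0), (1, 1), (2, 2), (3, 0), (4, 1), (5, 2)]}
```

In the second scheme every PE contributes to every image, so a single image finishes sooner instead of waiting on one PE.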
Exploit linearity of the correlation process:
\[ u(n) = \sum_{i=0}^{P-1} u_i(n), \qquad iK \le n \le (i+1)K - 1 \]
\[ y(n) = \sum_{j=0}^{Q-1} y_j(n), \qquad jK \le n \le (j+1)K - 1 \]
\[ r(l) = \sum_{j=0}^{Q-1} \sum_{i=0}^{P-1} \sum_{n=0}^{2K-2-l} y_j(n+l)\, u_i(n), \qquad -K \le l \le K - 1 \]
Perform block correlation in the transform domain:
\[ r_{ij}(l) = \mathrm{IDFT}\left[ U_i^{*}(k)\, Y_j(k) \right], \qquad 0 \le k \le 2K - 1 \]
Arrange the correlation vector as the output of a matrix transform:
X = A*T
Mathematical Model
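The transform-domain block correlation can be checked numerically with a small sketch. It assumes that each length-K block is zero-padded to 2K points and that the conjugate is applied to the transmit-block spectrum; both are assumptions of this example, as are the names `block_correlation` and `direct_correlation` and the sample values.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT without scaling; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + tw, even[k] - tw
    return out

def block_correlation(u_blk, y_blk):
    """Correlate two length-K blocks through 2K-point transforms."""
    K = len(u_blk)
    U = fft(list(u_blk) + [0j] * K)
    Y = fft(list(y_blk) + [0j] * K)
    product = [a.conjugate() * b for a, b in zip(U, Y)]
    return [v / (2 * K) for v in fft(product, inverse=True)]

def direct_correlation(u_blk, y_blk, lag):
    """Time-domain reference: sum_n y(n + lag) * conj(u(n))."""
    K = len(u_blk)
    return sum(y_blk[n + lag] * u_blk[n].conjugate()
               for n in range(K) if 0 <= n + lag < K)

# Hypothetical blocks with K = 4.
u_blk = [1 + 1j, 2 - 1j, 0.5j, -1 + 0j]
y_blk = [0.5 + 0j, 1 + 1j, 2 - 1j, 0.5j]
r = block_correlation(u_blk, y_blk)
# Negative lags wrap around to the upper half of the 2K-point result.
max_err = max(abs(r[lag % 8] - direct_correlation(u_blk, y_blk, lag))
              for lag in range(-3, 4))
print(max_err < 1e-9)  # True
```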
[Figure: SAR processing chain, from the input data stream (sensor signal data over time) to the output data stream (reconstructed picture)]
Phase-I, Range Processing: Range FFT -> Multiplication of Reference Function -> Range IFFT
Corner-turn
Phase-II, Azimuth Processing: Azimuth FFT -> Range Migration -> Multiplication of Reference Function -> Azimuth IFFT
Rate of execution time*: Range Compress Processing 34 %, Corner-turn 6 %, Azimuth Compress Processing 60 %
*1 thread, conventional corner turn
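A toy version of this two-phase chain is sketched below, under stated simplifications: range migration is omitted, the correlation is circular, and the same hypothetical 4-sample code `ref` serves as both the range and the azimuth reference. The names `compress`, `corner_turn` and `form_image` are illustrative only.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT without scaling; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + tw, even[k] - tw
    return out

def compress(rows, reference):
    """FFT each row, multiply by the conjugate reference spectrum, then IFFT.
    Rows and reference must have the same power-of-two length."""
    ref_spec = [v.conjugate() for v in fft(reference)]
    result = []
    for row in rows:
        product = [s * h for s, h in zip(fft(row), ref_spec)]
        result.append([v / len(row) for v in fft(product, inverse=True)])
    return result

def corner_turn(matrix):
    """Transpose so that the other dimension becomes contiguous rows."""
    return [list(col) for col in zip(*matrix)]

def form_image(raw, range_ref, azimuth_ref):
    range_done = compress(raw, range_ref)          # Phase I: range processing
    turned = corner_turn(range_done)               # corner turn
    azimuth_done = compress(turned, azimuth_ref)   # Phase II (no range migration)
    return corner_turn(azimuth_done)

# Hypothetical point target at azimuth line 1, range bin 2.
ref = [1, 1, 1, -1]
raw = [[ref[(a - 1) % 4] * ref[(r - 2) % 4] for r in range(4)] for a in range(4)]
image = form_image(raw, ref, ref)
peak = max(((a, r) for a in range(4) for r in range(4)),
           key=lambda p: abs(image[p[0]][p[1]]))
print(peak)  # (1, 2)
```

Each row passed to `compress` is independent of the others, which is what makes both phases candidates for parallel execution.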
Range Image Generation Timing (1024x1024)
CPU / GPU                 | Core Details | Time (sec) | Speedup
AMD Athlon X2 Dual Core   | 1            | 45.337     | 1x
Fx1800                    | 64           | 19.50      | 2.32x
GeForce GT 525M           | 96           | 4.150      | 10.92x
Tesla GeForce GTX 275     | 192          | 1.421      | 31.90x
Fermi S2050 (Multi GPU)   | 4x448        | 0.321      | 141.21x
[Figure: bar chart "Range Image Processing"; vertical axis 0 to 50; bars for AMD Athlon X2 Dual Core, Fx1800, GeForce GT 525M, Fermi S2050]
[Figure: bar chart "Azimuth Image Processing"; vertical axis 0 to 60; bars for AMD Athlon X2 Dual Core, Fx1800, GeForce GT 525M, Fermi S2050]
Azimuth Image Generation Timing (1024x1024)
CPU / GPU                 | Core Details | Time (sec) | Speedup
AMD Athlon X2 Dual Core   | 1            | 50.065     | 1x
Fx1800                    | 64           | 36.60      | 1.368x
GeForce GT 525M           | 96           | 8.501      | 5.884x
Tesla GeForce GTX 275     | 192          | 3.725      | 13.44x
Fermi S2050 (Multi GPU)   | 4x448        | 0.981      | 51.01x
Final SAR Image Generation (1024x1024)
CPU / GPU                 | Core Details | Time (sec) | Speedup
AMD Athlon X2 Dual Core   | 1            | 95.00      | 1x
Fx1800                    | 64           | 36.60      | 2.6x
GeForce GT 525M           | 96           | 9.23       | 10.29x
Tesla GeForce GTX 275     | 192          | 4.4563     | 21.31x
Fermi S2050 (Multi GPU)   | 4x448        | 1.280      | 74.21x
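The speedup column appears to be the single-core CPU time divided by each device's time. A sketch under that assumption recomputes it from the final SAR image generation timings above; small differences in the last digit against the tabulated values are possible because of rounding. The names `times_s` and `speedups` are illustrative.

```python
# Final SAR image generation times in seconds, copied from the table.
times_s = {
    "AMD Athlon X2 Dual Core": 95.00,
    "Fx1800": 36.60,
    "GeForce GT 525M": 9.23,
    "Tesla GeForce GTX 275": 4.4563,
    "Fermi S2050 (Multi GPU)": 1.280,
}

baseline = times_s["AMD Athlon X2 Dual Core"]
# Speedup is assumed to be baseline time over device time.
speedups = {name: baseline / t for name, t in times_s.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
```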
[Figure: bar chart "SAR Image Processing"; vertical axis 0 to 100; bars for AMD Athlon X2 Dual Core, Fx1800, GeForce GT 525M, Fermi S2050]
[Figure: flow of the proposed algorithm]
Transmit Signal (M blocks) -> Formation of M*N blocks -> FFT of M*N block vector of transmit signal
Received Signal (N blocks) -> Formation of M*N blocks -> FFT of M*N block vector of received signal
Both branches -> IFFT with normalization of correlated M*N blocks of data -> Rearrange the data -> Processed O/P image
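The flow above can be sketched end to end as follows. This is an interpretation, not the authors' implementation: it assumes that "rearranging" means adding each block-pair result at the global lag offset by (j - i)K, it uses Python threads only to illustrate that block pairs are independent, and the names `correlate_pair`, `blockwise_correlation` and `direct` are hypothetical.

```python
import cmath
from concurrent.futures import ThreadPoolExecutor

def fft(x, inverse=False):
    """Recursive radix-2 FFT without scaling; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + tw, even[k] - tw
    return out

def correlate_pair(pair):
    """FFT both blocks (zero-padded to 2K), multiply, IFFT with normalization."""
    u_blk, y_blk = pair
    K = len(u_blk)
    U, Y = fft(u_blk + [0j] * K), fft(y_blk + [0j] * K)
    product = [a.conjugate() * b for a, b in zip(U, Y)]
    return [v / (2 * K) for v in fft(product, inverse=True)]

def blockwise_correlation(tx, rx, K):
    """Correlate rx with tx block by block; signal lengths must be multiples of K."""
    tx_blocks = [tx[i:i + K] for i in range(0, len(tx), K)]
    rx_blocks = [rx[j:j + K] for j in range(0, len(rx), K)]
    index = [(i, j) for i in range(len(tx_blocks)) for j in range(len(rx_blocks))]
    with ThreadPoolExecutor() as pool:   # each block pair is independent work
        results = list(pool.map(correlate_pair,
                                [(tx_blocks[i], rx_blocks[j]) for i, j in index]))
    out = {}
    for (i, j), r in zip(index, results):
        for lag in range(-(K - 1), K):   # rearrange: local lag -> global lag
            g = lag + (j - i) * K
            out[g] = out.get(g, 0j) + r[lag % (2 * K)]
    return out

def direct(tx, rx, lag):
    """Time-domain reference: sum_n rx(n + lag) * conj(tx(n))."""
    return sum(rx[n + lag] * tx[n].conjugate()
               for n in range(len(tx)) if 0 <= n + lag < len(rx))

# Hypothetical 8-sample chirp and a 16-sample echo delayed by 3 samples.
tx = [cmath.exp(1j * cmath.pi * 0.1 * n * n) for n in range(8)]
rx = [0j] * 3 + tx + [0j] * 5
out = blockwise_correlation(tx, rx, 4)
max_err = max(abs(out[g] - direct(tx, rx, g)) for g in out)
print(max(out, key=lambda g: abs(out[g])))  # 3
```

On a GPU the same structure would map each block pair to its own group of threads; the thread pool here stands in for that distribution.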
Contact name
Mahesh Khadtare: maheshkha@gmail.com
Poster P5142
Category: Video & Image Processing - Vi02
