Aggregation of Parallel Computing and Hardware/Software Co-Design Techniques for High-Performance Remote Sensing Applications
Presenter: Dr. Alejandro Castillo Atoche
IGARSS'11, 2011/07/25
School of Engineering, Autonomous University of Yucatan, Merida, Mexico.
Outline
- Introduction
- Previous Work
- HW/SW Co-design Methodology
- Case Study: DEDR-related RSF/RASF Algorithms
- Systolic Architectures (SAs) as Co-processors
- Integration in a Co-design Scheme
- New Design Perspective: Super-Systolic Arrays and VLSI Architectures
- Hardware Implementation Results
- Performance Analysis
- Conclusions
Introduction: Radar Imagery Facts
Advanced high-resolution remote sensing (RS) operations are computationally complex, and recently developed RS image reconstruction/enhancement techniques are far too slow for (near) real-time implementation. In previous works, the algorithms were implemented as conventional simulations on personal computers (typically in MATLAB), on digital signal processing (DSP) platforms, or on clusters of PCs.
Introduction: HW/SW Co-design Facts
Why hardware/software (HW/SW) co-design? HW/SW co-design is a hybrid method aimed at increasing the flexibility of implementations and improving the overall design process.
Why systolic arrays? They are extremely fast, and the architecture scales easily.
Why parallel techniques? They optimize the loops that generally take most of the execution time in RS algorithms.
Motivation
Novel RS imaging applications now require a (near) real-time response in areas such as target detection for military purposes, tracking wildfires, and monitoring oil spills. In previous works, virtual remote sensing laboratories were developed; we now intend to design efficient HW architectures that pursue real-time operation.
Contributions
First, the application of parallel computing techniques based on loop optimization transformations yields efficient super-systolic array (SSA)-based co-processor units for the selected reconstructive SP subtasks. Second, the addressed HW/SW co-design methodology targets an efficient HW implementation of the enhancement/reconstruction regularization methods using the proposed SSA-based co-processor architectures.
HW/SW Co-design: Methodology
The proposed co-design methodology encompasses the following general stages:
(i) algorithmic implementation of the DEDR RSF/RASF (reference simulation on MATLAB and C++ platforms);
(ii) computational task partitioning;
(iii) aggregation of parallel computing techniques;
(iv) architecture design of the addressed reconstructive SP computational tasks onto HW blocks (SSAs).
HW/SW Co-design: Methodology [figure slide]

Algorithmic Reference Implementation [figure slides]

Partitioning Phase [figure slide]
Aggregation of Parallel Computing Techniques
We consider several parallel optimization techniques used in high-performance computing (HPC) in order to exploit the maximum possible parallelism in the design (a sketch of two of them follows this list):
- Loop unrolling
- Nested loop optimization
- Loop interchange
- Tiling
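For concreteness, here is a minimal C++ sketch (our illustration, not taken from the slides) of two of these transformations, loop unrolling and tiling, applied to the matrix-vector product that serves as the case study below; the function names mv_unrolled and mv_tiled are ours:

    #include <algorithm>
    #include <vector>

    void mv_unrolled(const std::vector<std::vector<double>>& a,
                     const std::vector<double>& v, std::vector<double>& u,
                     std::size_t m, std::size_t n) {
        for (std::size_t i = 0; i < m; i++) {
            double acc = 0.0;
            std::size_t j = 0;
            for (; j + 4 <= n; j += 4)       // loop unrolling: 4 MACs per iteration
                acc += a[i][j]*v[j]   + a[i][j+1]*v[j+1]
                     + a[i][j+2]*v[j+2] + a[i][j+3]*v[j+3];
            for (; j < n; j++)               // remainder loop for n not divisible by 4
                acc += a[i][j]*v[j];
            u[i] = acc;
        }
    }

    void mv_tiled(const std::vector<std::vector<double>>& a,
                  const std::vector<double>& v, std::vector<double>& u,
                  std::size_t m, std::size_t n, std::size_t tile) {
        for (std::size_t i = 0; i < m; i++) u[i] = 0.0;
        for (std::size_t jj = 0; jj < n; jj += tile)   // tiling: reuse v[jj..jj+tile)
            for (std::size_t i = 0; i < m; i++)
                for (std::size_t j = jj; j < std::min(jj + tile, n); j++)
                    u[i] += a[i][j] * v[j];
    }

Unrolling reduces loop overhead and exposes independent multiply-accumulates; tiling bounds the working set of v so a strip of it can stay in fast local memory while every row consumes it.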
Case Study: Matrix-Vector Multiplication
The matrix-vector multiplication operation is described by the following sum (reconstructed here from the surrounding definitions):

    u[i] = sum_{j=0}^{n-1} a[i][j] * v[j],   0 <= i < m

where
a is the input matrix of dimensions m×n,
v is the input vector of dimensions n×1,
u is the result vector of dimensions m×1,
i is the row index with range 0 to m-1.
The matrix-vector multiplication is usually implemented in a sequential programming language such as C++ as:

    for (i = 0; i < m; i++) {
        u[i] = 0;
        for (j = 0; j < n; j++) {
            u[i] = u[i] + a[i][j] * v[j];
        }
    }

To find out whether we can speed up this algorithm, we first need to rewrite it in such a way that we can see all of its data dependencies. For this purpose, we use single-assignment notation.

Inputs:
    a[i,j] = A[i,j] : 0 <= i < m, 0 <= j < n
    v[j]   = V[j]   : 0 <= j < n
Outputs:
    U[i] = u[i]     : 0 <= i < m
Case Study: Matrix-Vector Multiplication -> Index Matching
First, we assign each operation in the matrix-vector multiplication algorithm a location in a space called the index space (a two-dimensional (i, j) grid). We also rewrite the algorithm so that we can assign a coordinate in this index space to each operation. This operation is called index matching.

    for (i = 0; i < m; i++) {
        u[i][0] = 0;
        for (j = 0; j < n; j++) {
    S(i,j): u[i][0] = u[i][0] + a[i][j] * v[0][j];
        }
    }

NOTE: The algorithm has not been changed in any way; the added coordinate [0] has no effect with respect to the previous form of the algorithm.

Inputs:
    a[i,j]  = A[i,j] : 0 <= i < m, 0 <= j < n
    v[0][j] = V[j]   : 0 <= j < n
Outputs:
    U[i] = u[i][0]   : 0 <= i < m
Case Study: Matrix-Vector Multiplication -> Single Assignment Stage
Now that each operation is assigned to a single point in the index space, we can rewrite the algorithm so that each variable is assigned only once per coordinate of the index space:

    for (i = 0; i < m; i++) {
        u[i][0] = 0;
        for (j = 0; j < n; j++) {
            u[i][j+1] = u[i][j] + a[i][j] * v[0][j];
        }
    }

In this version, one variable assignment is performed at each point (PE) of the index space. Note that the input vector must be visible to all the PEs for the algorithm to operate correctly.

Inputs:
    a[i,j]  = A[i,j] : 0 <= i < m, 0 <= j < n
    v[0][j] = V[j]   : 0 <= j < n
Outputs:
    U[i] = u[i][n]   : 0 <= i < m
Case Study: Matrix-Vector Multiplication -> Broadcast Removal
A broadcast signal implies large routing resources and big drivers, which can translate into many buffers being inserted in the final circuit. To avoid this, we remove the broadcast variable by passing it through each of the PEs:

    for (i = 0; i < m; i++) {
        u[i][0] = 0;
        for (j = 0; j < n; j++) {
            u[i][j+1] = u[i][j] + a[i][j] * v[i][j];
            v[i+1][j] = v[i][j];
        }
    }

This form of the algorithm not only complies with the single-assignment requirement but also has locality: each point depends only on data from its neighbors. The resulting graph is called a dependency graph (DG).
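The broadcast-removed form can be executed directly; here is a small self-contained C++ check (our addition, with arbitrary test data) that it reproduces the plain sequential product:

    #include <cassert>
    #include <vector>

    int main() {
        const int m = 5, n = 5;
        std::vector<std::vector<double>> a(m, std::vector<double>(n));
        std::vector<std::vector<double>> v(m + 1, std::vector<double>(n));
        std::vector<std::vector<double>> u(m, std::vector<double>(n + 1));
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++) a[i][j] = i + 0.5 * j;   // arbitrary test data
        for (int j = 0; j < n; j++) v[0][j] = j + 1.0;           // V enters at row 0

        // Broadcast-removed single-assignment form from the slide.
        for (int i = 0; i < m; i++) {
            u[i][0] = 0;
            for (int j = 0; j < n; j++) {
                u[i][j+1] = u[i][j] + a[i][j] * v[i][j];  // local multiply-accumulate
                v[i+1][j] = v[i][j];                      // pass v down, no broadcast
            }
        }

        // Reference: plain sequential matrix-vector product.
        for (int i = 0; i < m; i++) {
            double ref = 0;
            for (int j = 0; j < n; j++) ref += a[i][j] * v[0][j];
            assert(u[i][n] == ref);   // same operations in the same order
        }
        return 0;
    }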
Case Study: Matrix-Vector Multiplication -> Scheduling
[Animated figure: the 5×5 index space, with matrix entries a[i][j] fed to the array, the vector values v[0][j] entering from the top, and outputs U[0]..U[4] emerging.]
Now let us see how the algorithm works in time.
In this processor array, it takes only 9 time cycles (m + n - 1 for m = n = 5) to run the entire matrix-vector multiplication, and in any given cycle at most 5 processors are active.
If we are only ever using a maximum of 5 processors, why should we build an array of 25?
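As a quick cross-check (our addition, not part of the original slides), the cycle count and peak processor usage can be reproduced in a few lines of C++, assuming the linear schedule t = i + j implied by the dependency graph:

    #include <algorithm>
    #include <iostream>
    #include <vector>

    int main() {
        const int m = 5, n = 5;
        std::vector<int> active(m + n - 1, 0);      // one counter per time step t
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                active[i + j]++;                    // operation (i,j) fires at t = i + j
        std::cout << "cycles: " << active.size() << "\n";                      // prints 9
        std::cout << "peak concurrent ops: "
                  << *std::max_element(active.begin(), active.end()) << "\n";  // prints 5
    }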
Case Study: Matrix-Vector Multiplication -> Allocation
[Figure: the index space projected along direction [1 0] onto a linear array of five processing elements P0..P4, with the rows of a[i][j] skewed in time and the outputs U[0]..U[4] emerging from the array.]
The circuit can operate with only 5 processors.
Case Study: Matrix-Vector Multiplication -> Space-Time Mapping
[Table slide.] In this table we can see which processor is in use at each instant t.
Now, if we plot the information from the table on a [t, p] axis, we can see that the polytope defined by this selection table is bounded by the inequalities p >= 0, p >= t - (n - 1), p <= t, and p <= m - 1, i.e.
    lower bound of p: p >= max(0, t - (n - 1))
    upper bound of p: p <= min(m - 1, t)
for all t,
where p is the position of the processing element in the transformed algorithm, and t is the time at which the processor at a given coordinate is activated.

If we analyze the transformations applied to our index space and describe the schedule for the new loop nest, we can rewrite the algorithm with the substitutions i = p, j = t - p. The original nest

    for (i = 0; i < m; i++) {
        u[i][0] = 0;
        for (j = 0; j < n; j++) {
            u[i][j+1] = u[i][j] + a[i][j] * v[i][j];
            v[i+1][j] = v[i][j];
        }
    }

becomes the space-time form

    for (t = 0; t < (m + n) - 1; t++) {
        forALL (p = max(0, t - (n - 1)); p <= min(m - 1, t); p++) {
            u[p][t-p+1] = u[p][t-p] + a[p][t-p] * v[p][t-p];
            v[p+1][t-p] = v[p][t-p];
        }
    }

where the inner forALL iterations are independent and can execute in parallel on the PEs.
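As a sanity check (ours, not the authors'), the following compilable C++ fragment runs both loop nests on the same data and asserts that they produce identical outputs, confirming that the substitution i = p, j = t - p preserves the dependencies:

    #include <algorithm>
    #include <cassert>
    #include <vector>

    int main() {
        const int m = 5, n = 5;
        std::vector<std::vector<double>> a(m, std::vector<double>(n));
        std::vector<std::vector<double>> v1(m + 1, std::vector<double>(n)), v2 = v1;
        std::vector<std::vector<double>> u1(m, std::vector<double>(n + 1)), u2 = u1;
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++) a[i][j] = i * n + j + 1;   // arbitrary test data
        for (int j = 0; j < n; j++) v1[0][j] = v2[0][j] = j + 1;

        // Original (i, j) loop nest.
        for (int i = 0; i < m; i++) {
            u1[i][0] = 0;
            for (int j = 0; j < n; j++) {
                u1[i][j+1] = u1[i][j] + a[i][j] * v1[i][j];
                v1[i+1][j] = v1[i][j];
            }
        }
        // Space-time (t, p) loop nest with i = p, j = t - p.
        for (int t = 0; t < (m + n) - 1; t++)
            for (int p = std::max(0, t - (n - 1)); p <= std::min(m - 1, t); p++) {
                if (t - p == 0) u2[p][0] = 0;      // row p starts when j = 0
                u2[p][t-p+1] = u2[p][t-p] + a[p][t-p] * v2[p][t-p];
                v2[p+1][t-p] = v2[p][t-p];
            }

        for (int i = 0; i < m; i++) assert(u1[i][n] == u2[i][n]);  // identical results
        return 0;
    }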
Case Study: Matrix-Vector Multiplication -> Tiling + Strip Mining
Suppose we build an array of 4 processors and want to multiply a 10×10 matrix by a 10×1 vector. How can we solve a problem like this when we only have 4 processors? (A sketch follows below.)
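A plausible answer (our sketch, assuming strip mining of the processor loop; the helper name mv_strip_mined is ours) is to fold the 10 logical rows onto the 4 physical PEs in strips: each strip reuses P0..P3, and ceil(10/4) = 3 passes cover the whole matrix:

    #include <algorithm>
    #include <vector>

    // Hypothetical helper (ours): m logical rows folded onto P physical PEs.
    void mv_strip_mined(const std::vector<std::vector<double>>& a,
                        const std::vector<double>& v, std::vector<double>& u,
                        int m, int n, int P) {
        for (int i0 = 0; i0 < m; i0 += P) {             // one strip of (at most) P rows
            int rows = std::min(P, m - i0);
            for (int t = 0; t < rows + n - 1; t++)      // space-time schedule per strip
                for (int p = std::max(0, t - (n - 1));
                     p <= std::min(rows - 1, t); p++) { // p = physical PE index
                    int i = i0 + p, j = t - p;
                    if (j == 0) u[i] = 0;               // PE p takes over row i
                    u[i] += a[i][j] * v[j];
                }
        }
    }

For m = n = 10 and P = 4 this makes three passes over the array (strips of 4, 4, and 2 rows), trading time for a fixed-size processor array.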
Integration in a HW/SW Co-design Scheme [figure slide]

New Perspective: Super-Systolic Arrays
A super-systolic array (SSA) is a network of systolic cells in which each cell is itself conceptualized as another systolic array, in a bit-level fashion.
The bit-level super-systolic architecture is a high-speed, highly pipelined structure that can be implemented as a coprocessor unit or even as a stand-alone VLSI ASIC.
FPGA-based Super-Systolic Architecture [figure slide]
Bit-level SSA Design on a High-Speed VLSI Architecture [figure slide]
Bit-level SSA Design on a High-Speed VLSI Architecture
The chip was designed with a standard-cell library in a 0.6 µm CMOS process. The resulting integrated-circuit core measures 7.4 mm × 3.5 mm. The total gate count is about 32K, using approximately 185K transistors. The 72-pin chip will be packaged in an 80-lead CQFP package.
Performance Analysis: VLSI [results slide]
Performance Analysis: FPGA [results slide]
Conclusions
The principal result of this study is the aggregation of parallel computing with regularized RS techniques into super-systolic array (SSA) architectures, integrated via the HW/SW co-design paradigm on FPGA or VLSI platforms for the real-time implementation of RS algorithms. The authors consider that the bit-level implementation of specialized SSAs of processors, in combination with VLSI/FPGA platforms, represents an emerging research field for real-time RS data processing in newer geospatial applications.
Recent Selected Journal Papers
- A. Castillo Atoche, D. Torres, Y. V. Shkvarko, "Towards Real Time Implementation of Reconstructive Signal Processing Algorithms Using Systolic Arrays Coprocessors," Journal of Systems Architecture (Elsevier), vol. 56, no. 8, pp. 327-339, Aug. 2010, ISSN 1383-7621, doi:10.1016/j.sysarc.2010.05.004.
- A. Castillo Atoche, D. Torres, Y. V. Shkvarko, "Descriptive Regularization-Based Hardware/Software Co-Design for Real-Time Enhanced Imaging in Uncertain Remote Sensing Environment," EURASIP Journal on Advances in Signal Processing (Hindawi), vol. 2010, 31 pages, 2010, ISSN 1687-6172, e-ISSN 1687-6180, doi:10.1155/ASP.
- Y. V. Shkvarko, A. Castillo Atoche, D. Torres, "Near Real Time Enhancement of Geospatial Imagery via Systolic Implementation of Neural Network-Adapted Convex Regularization Techniques," Pattern Recognition Letters (Elsevier), 2011, in press.
Thanks for your attention.
Dr. Alejandro Castillo Atoche
Email: acastill@uady.mx
School of Engineering, Autonomous University of Yucatan, Merida, Mexico.