Address generation unit for multimedia applications
on application-specific instruction set processors
Marc Moreno-Berengue, Guillermo Talavera Velilla, Aitor Rodriguez-Alsina,
Jordi Carrabina
                  Universitat Autònoma de Barcelona (Spain)




                            IECON 2010
                  7–10 November – Phoenix, AZ, USA
Motivation

➢   Design a custom Address Generation Unit (AGU)
       ➢   Connected to an ASIP data-path

➢   Benefits of custom AGU design
       ➢   Previous software optimizations
       ➢   Multimedia applications

Structure
➢   Introduction
➢   Design
➢   Work Flow
➢   Results
➢   Conclusions


Introduction
Features of multimedia applications
➢   Multimedia applications
        ➢   Complex index manipulation
        ➢   Large number of data accesses
➢   Require
        ➢   High performance
        ➢   Low energy consumption

    It is crucial to reduce these data accesses and the related address
    computations in an effective way
SW optimizations
The Data Transfer and Storage Exploration (DTSE)* methodology
aims to:
 ➢   Reduce data transfers between memories and processor
 ➢   Improve the energy efficiency
 ➢   Reduce the execution time

     These SW transformations create high overhead in address
     generation and control flow

                      *Methodology developed at the IMEC research center
SW optimizations

Original code:

    ...
    for (x=1; x<=N-2; ++x)
     for (y=1; y<=N-2; ++y)
      for (k=-1; k<=1; ++k){
        A[x][y] += B[x+k][y]*C[abs(k)];
        A[x][y] /= tot;
      }
    ...

Code after SW optimizations:

    ...
    for (y=0; y<=M+2; ++y){
     for (x=0; x<=N+2; ++x) {
        if (x>=0&&x<N &&y>=1&&y<=M-2)
          D[x%3] = B[(y*N+x)%8704+(y*N+x)/8704*16384+7680];
        if (x-1>=1&&x-1<=N-2&&y>=1&&y<=M-2) {
           for (k=-1; k<=1; ++k)
             acc += D[(x-1+k)%3]*C[abs(k)];
        }
        acc /= tot;}
    }
    ...
SW optimizations
➢   The same code pair is shown again, with the annotation: Need to be optimized
Address Generation Unit
 The Address Generation Unit (AGU) is a co-processor that uses
 the address equation (AE) to generate the address sequence (AS).

                             &X[AE] = AS

 Example:
 B[(y*N+x)%8704+(y*N+x)/8704*16384+7680]
 AE = (y*N+x) % 8704 + (y*N+x) / 8704 * 16384 + 7680
 AS = 7680, 7681, 7682, 7683, ...
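
The relation between the AE and the AS in the example can be sketched in C. This is only an illustration, not the authors' hardware: `ae_direct` evaluates the AE as written on the slide, while `ae_state`, `ae_reset`, `ae_next` and `ae_forms_agree` are hypothetical names for an incremental generator that produces the same sequence with one addition and one comparison per address. It assumes the linearised index y*N+x starts at 0 and grows by 1 per address, which is what the sequence 7680, 7681, 7682, ... suggests.

```c
#include <assert.h>

/* Constants taken from the example AE on the slide. */
enum { AE_WRAP = 8704, AE_STRIDE = 16384, AE_BASE = 7680 };

/* Direct evaluation of the AE for loop indices (x, y) and row width n. */
long ae_direct(long x, long y, long n)
{
    long i = y * n + x;                      /* linearised index */
    return i % AE_WRAP + i / AE_WRAP * AE_STRIDE + AE_BASE;
}

/* Hypothetical incremental generator state (not from the slides). */
typedef struct {
    long offset;  /* equals i % AE_WRAP */
    long block;   /* equals i / AE_WRAP * AE_STRIDE + AE_BASE */
} ae_state;

void ae_reset(ae_state *s)
{
    s->offset = 0;
    s->block = AE_BASE;
}

/* Return the next address of the AS without using % or /. */
long ae_next(ae_state *s)
{
    long addr = s->block + s->offset;
    if (++s->offset == AE_WRAP) {            /* wrap into the next block */
        s->offset = 0;
        s->block += AE_STRIDE;
    }
    return addr;
}

/* Return 1 if both forms agree for the first `count` linear indices. */
int ae_forms_agree(long count, long n)
{
    ae_state s;
    ae_reset(&s);
    for (long i = 0; i < count; ++i)
        if (ae_next(&s) != ae_direct(i % n, i / n, n))
            return 0;
    return 1;
}
```

Calling `ae_next` four times after `ae_reset` yields the first four values of the AS listed on the slide; the modulo and division of the AE only take effect once the linear index reaches 8704.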
Design
Application-specific instruction set processor
Application-specific instruction set processor (ASIP)
     ➢   Its instruction set can be extended
     ➢   Fast interface to read/write data from/to specific hardware
              ➢   1 instruction
              ➢   1 cycle

AGU design

➢   An AGU attached to the ASIP data-path saves execution time
        ●   1 instruction
        ●   1 cycle

AGU skeleton
The AGU has one control unit, one process unit and one FIFO

[Figure: AGU block diagram with labels: Custom Instruction interface; CI unit (Change AE values, Read AS values); CO unit (AS generation)]

  ➢   CI (custom instruction) unit
      •   AE configuration & read FIFO

  ➢   CO (co-processor) unit
      •   Calculates the AE to generate the AS and stores all values in the FIFO
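
The split between the CI unit, the CO unit and the FIFO can be modelled in software as a rough sketch. Everything here is an assumption made for illustration: the names `agu_model`, `agu_config`, `agu_co_step` and `agu_read`, the FIFO depth of 8, and the choice of a simple AE of the form i % modulo. In the real design the CO unit runs in parallel with the processor; this model simply refills the FIFO when a read finds it empty.

```c
#include <assert.h>

#define AGU_FIFO_DEPTH 8   /* assumed depth; the slides do not state one */

typedef struct {
    int fifo[AGU_FIFO_DEPTH];
    int head;      /* position of the next value to read */
    int count;     /* number of addresses currently stored */
    int next_i;    /* loop counter the CO unit will evaluate next */
    int modulo;    /* AE parameter set through the CI unit */
} agu_model;

/* CI unit, "Change AE values": configure AE = i % modulo from i = start. */
void agu_config(agu_model *a, int start, int modulo)
{
    a->head = 0;
    a->count = 0;
    a->next_i = start;
    a->modulo = modulo;
}

/* CO unit, "AS generation": compute addresses ahead until the FIFO is full. */
void agu_co_step(agu_model *a)
{
    while (a->count < AGU_FIFO_DEPTH) {
        int tail = (a->head + a->count) % AGU_FIFO_DEPTH;
        a->fifo[tail] = a->next_i % a->modulo;
        a->next_i++;
        a->count++;
    }
}

/* CI unit, "Read AS values": pop one precomputed address from the FIFO. */
int agu_read(agu_model *a)
{
    if (a->count == 0)
        agu_co_step(a);    /* software stand-in for the parallel CO unit */
    int addr = a->fifo[a->head];
    a->head = (a->head + 1) % AGU_FIFO_DEPTH;
    a->count--;
    return addr;
}
```

After `agu_config(&a, 7, 7)`, successive calls to `agu_read(&a)` return 0, 1, 2, ... 6, 0, ..., which is the index sequence i % 7 for i = 7, 8, 9, ...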
AGU Creator

Web-based application

Work Flow

Init.c

    int A[70],B[70],C=0;
    ...
    for (i=7; i<70; i++)
    {
    B[i]=A[i-7]+B[i-7];
    A[i]=i;
    C+=B[i];
    }
    ...

SW Opt. (DTSE) → Opt.c

    int A[7],B[7],C=0;
    ...
    for (i=7; i<70; i++)
    {
    B[i%7]=A[(i-7)%7]+B[(i-7)%7];
    A[i%7]=i;
    C+=B[i%7];
    }
    ...

AGUs → CI_code.c

    int A[7],B[7],C=0,ix,x;
    initAGU(); initAGU2();
    ...
    for (i=7; i<70; i++)
    {
    x=readAGU();
    ix=readAGU2();
    B[x]=A[ix]+B[ix];
    A[x]=i;
    C+=B[x];
    }
    ...
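
As a sanity check on the three versions of the work flow, the sketch below runs each one on the host and compares the accumulated value of C. It rests on assumptions the slides leave open: the arrays are zero-initialised (the slide elides the setup with "..."), and `soft_agu`, `soft_agu_init` and `soft_agu_read` are hypothetical software stand-ins for `initAGU`/`readAGU` and `initAGU2`/`readAGU2`, not the actual custom-instruction interface.

```c
#include <assert.h>

/* Init.c: full-size arrays, plain indices. */
int run_init(void)
{
    int A[70] = {0}, B[70] = {0}, C = 0;
    for (int i = 7; i < 70; i++) {
        B[i] = A[i - 7] + B[i - 7];
        A[i] = i;
        C += B[i];
    }
    return C;
}

/* Opt.c: arrays shrunk to 7 entries, indices wrapped with % 7. */
int run_opt(void)
{
    int A[7] = {0}, B[7] = {0}, C = 0;
    for (int i = 7; i < 70; i++) {
        B[i % 7] = A[(i - 7) % 7] + B[(i - 7) % 7];
        A[i % 7] = i;
        C += B[i % 7];
    }
    return C;
}

/* Hypothetical software stand-in for one AGU producing i % modulo. */
typedef struct { int value, modulo; } soft_agu;

void soft_agu_init(soft_agu *g, int first_index, int modulo)
{
    g->value = first_index % modulo;
    g->modulo = modulo;
}

int soft_agu_read(soft_agu *g)
{
    int addr = g->value;
    if (++g->value == g->modulo)   /* wrap without a modulo operation */
        g->value = 0;
    return addr;
}

/* CI_code.c: indices come from the (modelled) AGUs instead of % 7. */
int run_ci(void)
{
    int A[7] = {0}, B[7] = {0}, C = 0, ix, x;
    soft_agu agu, agu2;
    soft_agu_init(&agu, 7, 7);       /* sequence for i % 7 */
    soft_agu_init(&agu2, 7 - 7, 7);  /* sequence for (i - 7) % 7 */
    for (int i = 7; i < 70; i++) {
        x = soft_agu_read(&agu);
        ix = soft_agu_read(&agu2);
        B[x] = A[ix] + B[ix];
        A[x] = i;
        C += B[x];
    }
    return C;
}
```

Under these assumptions `run_init()`, `run_opt()` and `run_ci()` return the same value, since each iteration only needs the A and B entries written seven iterations earlier.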
Results
Test environment
➢   NIOS II soft-core processor (Altera)
    ●   32-bit RISC processor
    ●   Harvard memory architecture
    ●   Data and instruction caches
    ●   256 custom instructions (fast data-path interface)

➢   Cyclone II EP2C35 Altera FPGA

Test Applications

➢   Cavity Detector
    Medical imaging application to detect cavities in tomography scans

➢   Quad-tree Structured Difference Pulse Code Modulation (QSDPCM)
    An inter-frame compression technique for video imaging

Speedup
[Figure: two bar charts, "Speedup ( Cavity )" and "Speedup ( QSDPCM )"; bar labels: Init, DTSE, AGU inclusion, HW AGU inclusion]

      Speedup (Cavity): 1.26               Speedup (QSDPCM): 1.19

Energy improvements
[Figure: two bar charts, "Energy ( Cavity )" and "Energy ( QSDPCM )"; bar labels: Init, DTSE, AGU inclusion, HW AGU inclusion]

      Energy reduction (Cavity): 27%       Energy reduction (QSDPCM): 21%

Area penalties

                     Cavity (LEs)   QSDPCM (LEs)
NIOS-F                  2644            2644
NIOS-F + AGU            3596            3592

  Including the AGU in the NIOS II architecture uses
     2.9% of the total FPGA resources (33216 LEs)
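
The 2.9% figure can be checked from the table with a few lines of C. The function name `agu_area_percent` is only illustrative; the inputs are the logic-element counts from the table and the device total quoted on the slide.

```c
#include <assert.h>

/* Extra logic elements used by the AGU, as a percentage of the device total. */
double agu_area_percent(int les_with_agu, int les_without_agu, int les_device)
{
    return 100.0 * (les_with_agu - les_without_agu) / les_device;
}
```

For Cavity, `agu_area_percent(3596, 2644, 33216)` corresponds to 952 extra LEs, about 2.87% of the device; for QSDPCM, 948 extra LEs give about 2.85%. Both round to the 2.9% stated.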
Conclusions
➢   Extending an ASIP with AGUs is an efficient way to meet the
    performance/energy requirements of multimedia applications
    after some SW optimizations

➢   The innovation of connecting the AGU to the processor data-path
    and running it in parallel with the main processor allows a wide
    range of values to be calculated before the processor needs them

➢   Using an AGU skeleton and a wizard decreases the design and
    implementation time

Future Work
➢   Improve the AGU wizard in order to:

    ●   Automatically detect AEs and show relevant information
        about each AE for a given C file
    ●   Generate the appropriate AGU for a specific set of AEs
    ●   Generate AGUs for more than one ASIP

➢   Extend the set of applications used in this work

Thank you!!


Questions?
