A POWER EFFICIENT ARCHITECTURE
FOR 2-D DISCRETE WAVELET
TRANSFORM

                 Rahul Jain, CoWare India
            Preeti Ranjan Panda, IIT-Delhi
Agenda
    Memory Power Optimization
    Existing Z-Scan based Schemes
    Low-Power Z-Scan (Proposed Architecture)
    Results
    Conclusion




10 August 2006, 10th IEEE VLSI Design and Test Symposium, 2006
Memory Power Optimization
   Importance of Optimizing Memory System Energy
        Many emerging applications, such as JPEG2000, are data
        intensive
        The memory system can contribute up to 90% of the energy consumed
   Concurrently Optimizing Memory Architecture and
   Accesses
        Algorithm Level
             Reduce memory requirement
             Improve regularity of accesses
        Build optimized memory architecture
             Memory Partitioning
             Custom Circuits



Z-Scan based Schemes [Chiu-SIPS’03]

   Suspending a DWT line computation
      Store 4 intermediate values
   Z-Scan
      Column processing starts early
      On-chip buffer required = 4*M, where M = image tile height
   Optimal Z-Scan
      EBCOT code-block size (CW*CH) considered
      On-chip buffer required = 4*M + 4*2*CW
      Usually CW = CH = 64 (values used in the experiments)
   [Figure: Optimal Z-Scan pattern, with dimensions labeled 2*CW and 2*CH]
Low-Power Z-Scan (1)
    Generalize the Z-Scan
    Compute r elements in a row before moving to the next row
    For Z-Scan, r = 2
    For Optimal Z-Scan, r = 2*CW
    On-chip buffer required = 4*M + 4*r
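
The generalized scan can be sketched as follows. This is a minimal illustration only: it assumes the tile is cut into vertical stripes r columns wide and that each stripe is traversed over the full tile height M, which the slides do not spell out. The names `generalized_z_scan` and `on_chip_buffer_words` are hypothetical.

```python
def generalized_z_scan(width, height, r):
    """Yield (row, col), visiting r elements of a row before moving to the next row.

    Assumption (not stated in the slides): the tile is cut into vertical
    stripes r columns wide, each traversed top to bottom over the full height.
    """
    if r <= 0 or width % r != 0:
        raise ValueError("width must be a positive multiple of r")
    for stripe_start in range(0, width, r):
        for row in range(height):
            for col in range(stripe_start, stripe_start + r):
                yield row, col


def on_chip_buffer_words(M, r):
    """Buffer requirement quoted on the slide: 4*M + 4*r."""
    return 4 * M + 4 * r


if __name__ == "__main__":
    # r = 2 corresponds to the original Z-Scan on a tiny 4x2 tile.
    print(list(generalized_z_scan(width=4, height=2, r=2)))
    # r = 2*CW with CW = 64 corresponds to the Optimal Z-Scan buffer size.
    print(on_chip_buffer_words(M=512, r=2 * 64))
```

With r = 2*CW the second function reproduces the Optimal Z-Scan requirement 4*M + 4*2*CW.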
    [Figure: generalized Z-Scan pattern, with dimensions labeled r, r and 2*CH]




Low-Power Z-Scan (2)
    r is an integer submultiple of 2*CW (2*CW divided by an integer)
         This takes the code-block size into account
    2 separate buffers are used
         Row Buffer (RB) = 4*M
         Column Buffer (CB) = 4*r
    How to decide the value of r?
         Size of CB ∝ r
         RB sleep time ∝ r
    [Figure: timeline labeled 'RB in Low Power Mode' and 'RB access', above 'CB: r locations']
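
As a small illustration of the constraint on r, the candidate values can be enumerated as the divisors of 2*CW. This is a sketch; the function name `candidate_r` and the lower bound of 2 (the original Z-Scan value) are assumptions made for the example.

```python
def candidate_r(cw, minimum=2):
    """Return every r >= minimum that divides 2*cw exactly."""
    full = 2 * cw
    return [r for r in range(minimum, full + 1) if full % r == 0]


if __name__ == "__main__":
    # CW = 64 is the code-block width used in the experiments.
    print(candidate_r(64))
```

For CW = 64 the candidates are the powers of two from 2 to 128.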



Memory Power Analysis (1)
    Assume that each element is computed in unit time,
    so energy and power can be used interchangeably
    For a memory of size 2^n, let
       Pa(2^n) : memory access power
       Ps(2^n) : sleep mode / data retention mode power
       Pw(2^n) : wakeup power for each state transition from
       sleep mode to active mode
    Let Ps(2^n) = s * Pa(2^n) and Pw(2^n) = w * Pa(2^n)
       s = 0.1, w = 0.33 (assumed for the experiments)
    Buffer accesses
         Read at resumption
         Write at suspension

Memory Power Analysis (2)
    Row Buffer Power
         2 accesses per r elements
         RB in sleep mode for r-2 element computations
         Wake up RB once per row (once per r elements)
         Power per r-element computation:
         Prow_buffer (r, M) = 2 * Pa(M) + (r-2) * Ps(M) + Pw(M)
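
The row-buffer expression can be written as a short function. This is a sketch under stated assumptions: the access-power model `pa` is supplied by the caller as a function of memory size (the slides take it from an empirical SRAM model that is not reproduced here), and the constant model in the example is a placeholder, not measured data.

```python
S = 0.1   # Ps = s * Pa, value assumed in the slides
W = 0.33  # Pw = w * Pa, value assumed in the slides


def row_buffer_power(r, M, pa, s=S, w=W):
    """Prow_buffer(r, M) = 2*Pa(M) + (r-2)*Ps(M) + Pw(M)."""
    access = pa(M)
    sleep = s * access
    wakeup = w * access
    return 2 * access + (r - 2) * sleep + wakeup


if __name__ == "__main__":
    unit_pa = lambda size: 1.0  # placeholder model: one unit per access
    print(row_buffer_power(r=32, M=512, pa=unit_pa))
```

With a unit access power the result is 2 + 0.1*(r-2) + 0.33, which makes the contribution of each term easy to check by hand.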


    [Figure: timeline labeled 'Row Computation Suspends', 'RB in Low Power Mode', 'Wakeup' and 'Row Computation Resumes']


Memory Power Analysis (3)
    Column Buffer Power
         1 access per element
         Power consumption per element computation:
         Pcol_buffer (r) = Pa(r)
    [Figure: timeline labeled 'Col Computation Resumes' and 'Col Computation Suspends']
    Power per 2-D DWT Element Computation:
          Prow_buffer (r, M)/r + Pcol_buffer (r)
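
The trade-off in r can be explored by evaluating the per-element expression over a set of candidate values. The sketch below assumes a caller-supplied access-power model `pa`; the square-root model used in the example is purely hypothetical and is only there to make the access power grow with memory size. The names `element_power` and `best_r` are illustrative.

```python
import math


def element_power(r, M, pa, s=0.1, w=0.33):
    """Prow_buffer(r, M)/r + Pcol_buffer(r), with Ps = s*Pa and Pw = w*Pa."""
    row_buffer = 2 * pa(M) + (r - 2) * s * pa(M) + w * pa(M)
    col_buffer = pa(r)
    return row_buffer / r + col_buffer


def best_r(M, pa, candidates):
    """Return the candidate r with the lowest per-element power."""
    return min(candidates, key=lambda r: element_power(r, M, pa))


if __name__ == "__main__":
    toy_pa = lambda size: math.sqrt(size)  # hypothetical model, not from the slides
    candidates = [2, 4, 8, 16, 32, 64, 128]
    for r in candidates:
        print(r, round(element_power(r, 512, toy_pa), 3))
    print("best r:", best_r(512, toy_pa, candidates))
```

A larger r amortizes the row-buffer accesses over more elements but enlarges the column buffer, so the minimum depends on how access power scales with size.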


Variation of Power with r
    [Plot: energy (J), from 0.00E+00 to 6.00E-10, versus value of r (2, 4, 8, 16, 32, 64, 128); one curve each for M = 32, 64, 128, 256 and 512; points labeled r=16 and r=32]
Power Implications of Banking (1)
    Banked Buffer
         Increases the average idleness of each buffer bank
         Lower access power
         Predictable state changes, no timing overheads
    Let there be b RB banks and c CB banks
    Average RB power per element:
    Prow = [Power of bank in use * M/b + Sleep Power * (M - M/b)] / M
          = [{Prow_buffer (r, M/b) / r} * M/b + Ps (M/b) * (M - M/b)] / M
    Each bank is woken up once per M*r elements
         Additional Row Buffer Wakeups per Element = b/(M*r)


Power Implications of Banking (2)
    Average column-buffer power per element:
    Pcol = [{Pcol_buffer (r/c)} * r/c + Ps (r/c) * (r - r/c)] / r
    Number of Column Buffer Wakeups per Element = c/r
    Additional Wakeup Power:
    Pwakeups = [Pw(M/b) * b/(M*r)] + [Pw(r/c) * c/r]
    MUX power (Pmux) is also considered
    Total Power per Element:
    Prow + Pcol + Pwakeups + Pmux
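
The banked expressions can be combined into one function for experimentation. This is a sketch, not the authors' tool: `pa` is a caller-supplied access-power model (the constant model in the example is a placeholder), `p_mux` is passed in as a plain number because the slides do not give a MUX model, and the function name is hypothetical.

```python
def banked_power_per_element(r, M, b, c, pa, p_mux=0.0, s=0.1, w=0.33):
    """Prow + Pcol + Pwakeups + Pmux for b row-buffer banks and c column-buffer banks."""
    rb_bank = M / b  # size of one row-buffer bank
    cb_bank = r / c  # size of one column-buffer bank

    def ps(size):
        return s * pa(size)

    def pw(size):
        return w * pa(size)

    # Row buffer: the active bank follows Prow_buffer, the rest of the buffer sleeps.
    row_buffer = 2 * pa(rb_bank) + (r - 2) * ps(rb_bank) + pw(rb_bank)
    p_row = ((row_buffer / r) * rb_bank + ps(rb_bank) * (M - rb_bank)) / M

    # Column buffer: one access per element in the active bank, the rest sleeps.
    p_col = (pa(cb_bank) * cb_bank + ps(cb_bank) * (r - cb_bank)) / r

    # Extra wakeups: b/(M*r) per element for the RB, c/r per element for the CB.
    p_wakeups = pw(rb_bank) * b / (M * r) + pw(cb_bank) * c / r

    return p_row + p_col + p_wakeups + p_mux


if __name__ == "__main__":
    unit_pa = lambda size: 1.0  # placeholder model, not measured data
    print(banked_power_per_element(r=64, M=512, b=8, c=4, pa=unit_pa))
```

Sweeping r, b and c over their candidate values with a realistic `pa` is how a minimum-power configuration would be located under this model.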




r vs Power (Banked Case, M=512)




    [Plot: power versus r for the banked case; minimum power with r=64, c=4, b=8]

Energy Consumption Comparison
          Z-Scan       Optimal Z-Scan   Low-Power Z-Scan
   M      (10^-11 J)   (10^-11 J)       (10^-11 J)           r    c    b    % imp
   32     23.4         29.1             8.08                 32   4    4    72.2
   64     25.5         29.3             8.13                 64   4    4    72.3
   128    29.9         29.7             8.18                 64   4    8    72.5
   256    38.5         30.6             8.29                 64   4    8    72.9
   512    55.8         32.3             8.49                 64   4    8    73.7
   1024   90.3         35.8             8.89                 64   4    8    75.2

% imp: energy reduction of Low-Power Z-Scan relative to Optimal Z-Scan

Up to 90% and 75% improvement over Z-Scan and Optimal
Z-Scan respectively
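
The quoted percentages follow from the tabulated energies, as the short check below shows. The tuple layout and the helper name `improvement` are chosen for this example only; the numbers are copied from the table.

```python
# (M, Z-Scan, Optimal Z-Scan, Low-Power Z-Scan), energies in units of 10^-11 J
ROWS = [
    (32, 23.4, 29.1, 8.08),
    (64, 25.5, 29.3, 8.13),
    (128, 29.9, 29.7, 8.18),
    (256, 38.5, 30.6, 8.29),
    (512, 55.8, 32.3, 8.49),
    (1024, 90.3, 35.8, 8.89),
]


def improvement(baseline, proposed):
    """Percentage energy reduction of `proposed` relative to `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


if __name__ == "__main__":
    for m, z_scan, optimal, low_power in ROWS:
        print(m,
              round(improvement(z_scan, low_power), 1),
              round(improvement(optimal, low_power), 1))
```

The second printed column matches the "% imp" column, and the M = 1024 row gives the "up to 90%" and "75%" figures.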

Energy Modelling
  Sequential Access Memory [Moon-CICC’02]
    Configured as a circular buffer
    Address sequencing logic and decoders are replaced with a
    row sequencer for low power and high speed
    Banked implementation used for large memories
  Energy Modelling [Coumeri-TVLSI’00]
    Empirical equations for modelling the energy of on-chip
    SRAM
    Model parameters are size, bit width and access mode
    Individual equations for the different memory components
    To model SAM, the row decoder, column decoder and buffers
    are not considered

Conclusion
    A methodology to arrive at a low-power
    DWT architecture is proposed
    Memory architecture and access pattern
    are co-optimized
    Up to 90% energy saving achieved
    The derived architecture depends on the
    target memory technology
         This would lead to different architectures for ASIC
         and FPGA implementations


References:
   [Chiu-SIPS’03]: Mu-Yu Chiu et al. (2003). Optimal data
   transfer and buffering schemes for JPEG2000 encode.
   IEEE Workshop on SIPS, Aug. 2003, pp. 177–182
   [Moon-CICC’02]: Joong-Seok Moon et al. (2002). Low-power
   sequential access memory design. Custom
   Integrated Circuits Conference, 2002, pp. 111–114
   [Coumeri-TVLSI’00]: S. L. Coumeri et al. (2000).
   Memory modelling for System Synthesis. IEEE Trans.
   VLSI Systems, June 2000, pp. 327–334




Thank You


                 Questions!




Backup Slides
Discrete Wavelet Transform
        2D wavelet transform:
             1st: 1D wavelet transform applied to all rows
             2nd: 1D wavelet transform applied to all columns
        Each row/column can be computed independently
        Store 4 values at line computation suspension
        [Figure: lifting data-flow graph from inputs X(i), i = 0..8, through Y(2i+1), Y(2i) and Z(2i+1) to Z(2i); colored arrows show multiplication by constants a, b, c, d defined in the JPEG2000 standard]
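
The row-then-column structure with four lifting steps can be sketched as below. This is an illustration under assumptions, not the authors' datapath: the constants are the commonly quoted lifting coefficients of the CDF 9/7 filter (the slides only name them a, b, c, d), the final scaling step is omitted, lines must have even length, and edges are handled by mirroring the nearest neighbour.

```python
# Commonly quoted CDF 9/7 lifting constants (assumption: the slides only name a, b, c, d).
A, B, C, D = -1.586134342, -0.052980118, 0.882911075, 0.443506852


def lift_1d(x, a=A, b=B, c=C, d=D):
    """Apply four lifting steps to an even-length line; return (low, high), unscaled."""
    if len(x) < 2 or len(x) % 2 != 0:
        raise ValueError("expected an even-length line")
    s = [float(v) for v in x[0::2]]  # even samples, become the low-pass output
    h = [float(v) for v in x[1::2]]  # odd samples, become the high-pass output
    n = len(s)
    for k_odd, k_even in ((a, b), (c, d)):
        # Odd step: each odd sample uses its two even neighbours (mirrored at the right edge).
        h = [h[i] + k_odd * (s[i] + s[min(i + 1, n - 1)]) for i in range(n)]
        # Even step: each even sample uses its two odd neighbours (mirrored at the left edge).
        s = [s[i] + k_even * (h[max(i - 1, 0)] + h[i]) for i in range(n)]
    return s, h


def transform_lines(lines):
    """Transform every line independently; output is low half followed by high half."""
    out = []
    for line in lines:
        low, high = lift_1d(line)
        out.append(low + high)
    return out


def dwt_2d(tile):
    """One 2-D level: 1-D transform on all rows, then on all columns."""
    rows_done = transform_lines(tile)
    columns = [list(col) for col in zip(*rows_done)]
    cols_done = transform_lines(columns)
    return [list(row) for row in zip(*cols_done)]


if __name__ == "__main__":
    flat = [[1.0] * 4 for _ in range(4)]
    for row in dwt_2d(flat):
        print([round(v, 4) for v in row])
```

A streaming implementation would not hold whole lines as this sketch does; it would keep only the few intermediate values each line needs when its computation is suspended.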
Buffer Structure
    The buffers are always full
    They are accessed like a circular FIFO
    A general memory row decoder is not required
         use a counter
         use a shift register initially loaded with a single 1
    Every write signal
         Increments the counter
         Shifts the register
    Store all 4 intermediate values in one column
         No need for a column decoder
    This is similar to Sequential Access Memory
    (SAM) [Moon-CICC’02]
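
A behavioural model of such a buffer is sketched below. It is a software illustration only, assuming a read at resumption followed by a write at suspension on the same entry; the class name `SequentialBuffer` and its methods are hypothetical, and both the counter and the one-hot shift register are kept so their agreement can be checked.

```python
class SequentialBuffer:
    """Circular buffer selected by a one-hot shift register instead of a row decoder."""

    def __init__(self, depth):
        if depth < 1:
            raise ValueError("depth must be at least 1")
        # Always full: every entry holds the 4 intermediate values of one line.
        self.entries = [(0.0, 0.0, 0.0, 0.0) for _ in range(depth)]
        self.select = [1] + [0] * (depth - 1)  # shift register loaded with a single 1
        self.count = 0                         # equivalent counter

    def read(self):
        """Return the entry currently selected (used when a line resumes)."""
        return self.entries[self.select.index(1)]

    def write(self, values):
        """Store 4 values in the selected entry, then advance (used when a line suspends)."""
        if len(values) != 4:
            raise ValueError("expected exactly 4 intermediate values")
        self.entries[self.select.index(1)] = tuple(values)
        self.select = [self.select[-1]] + self.select[:-1]  # shift the register
        self.count = (self.count + 1) % len(self.entries)   # increment the counter


if __name__ == "__main__":
    buf = SequentialBuffer(depth=3)
    for base in (1, 5, 9):
        buf.write((base, base + 1, base + 2, base + 3))
    print(buf.read(), buf.count, buf.select)
```

Because the selection only ever moves to the next entry, no address needs to be decoded, which is the property the slide relies on.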

