Generalized Division-Free Architecture and Compact Memory Structure for Resampling in Particle Filters
Syed Asad Alam and Oscar Gustafsson
{syed.asad.alam, oscar.gustafsson}@liu.se
Department of Electrical Engineering, Linköping University, Sweden
Aims and Objectives
• The most basic form of resampling in a particle filter has a high hardware cost
• Requires a normalized and ordered data set for implementation → multinomial resampling
• Alternative algorithms are used to avoid multinomial resampling → stratified, systematic resampling
• Aim
– Architecture for multinomial resampling free
from the need of ordering and normalization
– Memory optimization for the weights and
random function
Particle Filters
• Model-based filtering
– State-transition and observation models may be non-linear and the noise non-Gaussian
• Purpose → Estimation of a state from a set of
observations corrupted by noise
• Applications → Target tracking, computer vision,
channel estimation . . .
• Steps → Time-update, weight computation and
resampling
Figure 1: Overall structure of the particle filter — input observations yn drive the time update (xn), weight computation (wn) and resampling (˜xn, ˜wn) stages, which produce the output estimate.
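The three steps above can be sketched as one iteration of a bootstrap particle filter. This is an illustrative Python sketch only: the state-transition function f, observation function h, the noise levels, and the function name are hypothetical stand-ins, and the resampling shown is plain multinomial.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(x, w, y, f, h, process_noise_std, meas_noise_std):
    """One iteration of a bootstrap particle filter (illustrative sketch).

    x : (M,) particle states, w : (M,) weights, y : scalar observation.
    f, h : hypothetical state-transition and observation functions.
    """
    M = len(x)
    # (1) Time update: propagate every particle through the state model.
    x = f(x) + rng.normal(0.0, process_noise_std, M)
    # (2) Weight computation: scale weights by the observation likelihood.
    w = w * np.exp(-0.5 * ((y - h(x)) / meas_noise_std) ** 2)
    # (3) Resampling: replicate high-weight, discard low-weight particles;
    #     the particle count M stays the same and weights reset to 1/M.
    idx = rng.choice(M, size=M, p=w / w.sum())
    x, w = x[idx], np.full(M, 1.0 / M)
    estimate = x.mean()
    return x, w, estimate
```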
Figure 2: Basic architecture of the particle filter: (i) time update (particle memory, time update and sample generation units); (ii) weight memory and random number generation (normalization and cumulative sum unit, random number generation unit); (iii) resampling (comparator, replicated/discarded-particle memory and replication factors), all coordinated by a control unit.
Resampling
Resampling prevents degeneration of the particle set and improves estimation by discarding particles with low weights and replicating particles with high weights. The total number of particles, M, remains the same.
Figure 3: Uniformly distributed samples on [0, 1) for M = 10 under systematic, stratified and multinomial resampling.
Standard resampling algorithms in the literature:
• Multinomial → M independent uniform random numbers on U[0, 1)
• Stratified → Partition U[0, 1) into M regions; one sample from each interval with a random offset
• Systematic → Similar to stratified resampling, but the offset is fixed across all intervals
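The three sample-generation schemes can be sketched as follows (a minimal Python illustration; the function name and interface are hypothetical). Note how stratified and systematic points come out inherently ordered, while multinomial draws do not:

```python
import numpy as np

def resampling_points(M, scheme, rng=None):
    """Generate the M sample points in [0, 1) used by each resampling scheme."""
    if rng is None:
        rng = np.random.default_rng()
    if scheme == "multinomial":
        # M independent U[0, 1) draws -- unordered; a sequential comparison
        # pass needs them sorted (or generated in order).
        return rng.uniform(0.0, 1.0, M)
    if scheme == "stratified":
        # Partition [0, 1) into M strata; one random offset per stratum,
        # so the points come out inherently ordered.
        return (np.arange(M) + rng.uniform(0.0, 1.0, M)) / M
    if scheme == "systematic":
        # Same partition, but a single shared random offset.
        return (np.arange(M) + rng.uniform()) / M
    raise ValueError(f"unknown scheme: {scheme}")
```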
Proposed Idea
Background
• The complexity of multinomial resampling can be reduced from M² to approximately 2M by generating ordered random numbers → high hardware cost
• Accumulation and normalization provide an intrinsic ordering, reducing the hardware cost
• The comparison needed to replicate and discard particles can be formulated as
\[
\frac{W_K}{W_M} \lessgtr \frac{U_K}{U_M} \qquad (1)
\]
where $W_K = \sum_{j=0}^{K} w_j$ and $U_K = \sum_{j=0}^{K} u_j$ are the running sums, and $W_M$ and $U_M$ are the total cumulative sums.
Division-Free Architecture
Reformulation of (1) gives:
\[
\underbrace{W_K \times U_M}_{W'_K} \;\lessgtr\; \underbrace{U_K \times W_M}_{U'_K} \qquad (2)
\]
• No normalization required
• Equally efficient for non-powers-of-two M
• Independent of generating ordered random numbers
• Can be used for stratified and systematic resampling
with appropriate random number generators
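A software sketch of the resulting comparison loop (illustrative only, not the hardware datapath; the function name is hypothetical): each output slot advances a weight pointer until the cross-multiplied test (2) holds, so neither normalization nor division is ever needed.

```python
def division_free_resample(w, u):
    """Division-free resampling via the cross-multiplied comparison (2).

    w : unnormalized particle weights; u : positive random increments whose
    running sum plays the role of the inherently ordered sample points.
    Returns, for each of the M output slots, the index of the kept particle.
    """
    M = len(w)
    WM, UM = sum(w), sum(u)           # total cumulative sums
    WK, UK = w[0], 0.0                # running sums
    k, out = 0, []
    for i in range(M):
        UK += u[i]                    # next inherently ordered sample point
        # Advance the pointer while WK/WM < UK/UM, evaluated without
        # division as W'_K = WK*UM < U'_K = UK*WM.
        while WK * UM < UK * WM and k < M - 1:
            k += 1
            WK += w[k]
        out.append(k)                 # particle k is replicated
    return out
```

With equal increments u, the loop reduces to systematic resampling; with random positive increments it behaves like multinomial resampling on ordered samples, matching the bullet above.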
Figure 4: Memory and data generation for resampling with stored cumulative sum: accumulators ahead of the weight and random-value memories store the running sums, requiring word lengths of Bw + log2(M) and Br + log2(M) bits, from which W'K and U'K are formed using the totals WM and UM under control-unit sequencing.
Memory Optimization
• Storing the cumulative sums increases the word-length requirement of the two memories
• The word lengths can be reduced from Bw + log2(M) and Br + log2(M) to Bw and Br, respectively, by on-the-fly accumulation
• Accumulators are placed after each memory
• Extra hardware cost: a multiplexer and associated control logic
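The effect of moving the accumulator behind the memory can be sketched in a few lines (illustrative Python with hypothetical names, not the hardware): both variants deliver identical running sums at the read port, but the online variant only ever stores the raw Bw-bit weights.

```python
from itertools import accumulate

def stored_sum_reads(w):
    """Wide memory: each word already holds the cumulative sum,
    i.e. Bw + log2(M) bits per word in hardware."""
    mem = list(accumulate(w))         # cumulative sums written to memory
    return [mem[k] for k in range(len(w))]

def online_sum_reads(w):
    """Narrow memory: raw Bw-bit weights; an accumulator after the
    read port rebuilds the running sum on the fly."""
    acc, out = 0, []
    for wk in w:
        acc += wk                     # on-the-fly accumulation
        out.append(acc)
    return out
```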
Figure 5: Memory and data generation for resampling with on-the-fly cumulative sum: the memories hold the raw weights and random values (Bw and Br bits), and accumulators after each memory rebuild the running sums used to form W'K and U'K.
Results
Complexity – Standard Cells
Table 1: Complexity, in terms of area (mm²), of architectures based on stored and online sum.

Particle count  Bit growth  Stored sum  Online sum  Savings (%)
            10           4       0.022       0.014        36.36
            20           5       0.035       0.019        45.71
           100           7       0.112       0.088        21.43
           128           7       0.114       0.088        22.81
           200           8       0.220       0.153        30.45
           256           8       0.220       0.154        30.00
           512           9       0.441       0.291        34.01
          1000          10       0.833       0.550        33.97
          1024          10       0.857       0.555        35.24
          2000          11       1.703       1.103        35.23
          2048          11       1.731       1.105        36.16
Complexity – FPGA
Figure 6: Number of look-up tables (LUTs) used by the architectures based on stored and online sum, for particle counts M from 512 to 20k.
Figure 7: Number of 36 kb BRAMs used by the architectures based on stored and online sum, for particle counts M from 512 to 20k.
Summary
• Proposed a generalized division-free architecture for the resampling stage
• Works for non-power-of-two numbers of particles and requires neither normalization nor generation of ordered random numbers
• Achieved by using a pair of multipliers and accumulators
• The memory optimization reduces area and memory usage by up to 45% and 50%, respectively
• Achieved by on-the-fly accumulation of the particle weights and random numbers
• Each memory holds the original particle weights and random numbers, reducing the word length required for each memory