This document presents a VLSI architecture for particle filtering that enables real-time state estimation. The design exploits the data-level parallelism of the particle filtering algorithm by distributing particles across many processing elements that operate in parallel. A key design decision is to allocate the hardware resources for computationally intensive but parallelizable steps globally, so that they are shared across processing elements and the area required by each element is reduced. The document outlines the particle filtering algorithm and an example radio-frequency localization application. It then describes the proposed architecture, which is built from processing clusters that contain resampling modules and arrays of processing elements; sharing hardware resources in this way allows more particles to be processed in parallel.
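The data flow summarized above can be illustrated in software. The following is a minimal sketch, not the architecture described in the document: it assumes a one-dimensional random-walk state model, a Gaussian measurement likelihood, equal-sized particle slices per processing element, and systematic resampling shared by one cluster. All function names and parameters (`propagate_and_weight`, `systematic_resample`, `cluster_step`, `process_std`, `meas_std`) are hypothetical and chosen only for illustration.

```python
import math
import random


def propagate_and_weight(particles, measurement, process_std, meas_std, rng):
    """Work done by one processing element on its own slice of particles."""
    # Assumed model: random-walk state transition, Gaussian measurement noise.
    moved = [x + rng.gauss(0.0, process_std) for x in particles]
    weights = [math.exp(-0.5 * ((measurement - x) / meas_std) ** 2) for x in moved]
    return moved, weights


def systematic_resample(particles, weights, rng):
    """Resampling step shared by all processing elements of one cluster."""
    n = len(particles)
    total = sum(weights)
    if total == 0.0:  # all weights underflowed: keep the particles unchanged
        return list(particles)
    step = total / n
    u = rng.random() * step  # single random offset, then evenly spaced pointers
    out, cum, i = [], weights[0], 0
    for _ in range(n):
        while u > cum and i < n - 1:
            i += 1
            cum += weights[i]
        out.append(particles[i])
        u += step
    return out


def cluster_step(pe_slices, measurement, rng, process_std=0.1, meas_std=0.5):
    """One filter iteration: per-element update, then one shared resampling."""
    particles, weights = [], []
    for pe_particles in pe_slices:  # in hardware these slices run concurrently
        p, w = propagate_and_weight(
            pe_particles, measurement, process_std, meas_std, rng
        )
        particles += p
        weights += w
    resampled = systematic_resample(particles, weights, rng)
    size = len(pe_slices[0])  # assumes every element holds the same count
    return [resampled[k:k + size] for k in range(0, len(resampled), size)]


# Usage: 4 hypothetical processing elements with 64 particles each,
# repeatedly observing a constant measurement of 2.0.
rng = random.Random(0)
slices = [[rng.uniform(-5.0, 5.0) for _ in range(64)] for _ in range(4)]
for _ in range(20):
    slices = cluster_step(slices, measurement=2.0, rng=rng)
estimate = sum(sum(s) for s in slices) / (4 * 64)
```

The loop over `pe_slices` stands in for elements that would run concurrently in hardware, while the single call to `systematic_resample` plays the role of a resource shared by the whole cluster rather than duplicated per element. After the usage loop, `estimate` should settle close to the repeated measurement value.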