CCLSTM: a fully convolutional, recurrent neural architecture designed to predict future occupancy and motion flow fields

Optimized for Neural Processing Units (NPUs) with convolutional acceleration, CCLSTM delivers state-of-the-art (SOTA) performance in the 2024 Waymo Occupancy Flow Forecasting Challenge, while maintaining real-time efficiency and full end-to-end trainability from camera input to future motion prediction.

CCLSTM is the result of a patented innovation by Péter Lengyel, Research Engineer at aiMotive, offering a novel approach that combines convolutional and recurrent modeling to enhance motion forecasting accuracy and efficiency.

Why CCLSTM?

Predicting the future motion of dynamic agents is a cornerstone capability in autonomous driving. CCLSTM approaches this task using Occupancy Flow Fields—a rich, scalable representation that captures motion, spatial extent, and multi-modal futures in a unified framework. Unlike traditional detection-and-tracking pipelines or transformer-based approaches, CCLSTM is:

  • FULLY CONVOLUTIONAL – Built entirely from convolution operations, making it ideal for deployment on modern NPUs (e.g., aiWare).

  • RECURRENT AND AUTOREGRESSIVE – Recursively encodes history with theoretically unlimited lookback and forecasts arbitrary horizons autoregressively.

  • END-TO-END TRAINABLE – Integrates seamlessly with bird’s-eye view (BEV) encoders, requiring no intermediate heuristics or separate modules.

  • EXPLAINABLE AND CONTROLLABLE – Preserves semantic richness and enables dynamic behavior control, such as planning with different driving styles.
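
To make the occupancy flow field representation mentioned above more concrete, here is a minimal NumPy sketch. It stores occupancy as an (H, W) grid and flow as an (H, W, 2) grid of per-cell displacements, then uses the flow to move occupancy forward by one step. The function name warp_occupancy, the backward-flow convention (each cell stores where its occupant came from), and the nearest-neighbour lookup are assumptions made for illustration; they are not taken from the CCLSTM implementation.

```python
import numpy as np

def warp_occupancy(prev_occ, flow):
    """Backward-warp prev_occ (H, W) using flow (H, W, 2) holding (dx, dy) in cells.

    Assumed convention: the occupant of cell (x, y) at the current step was at
    (x + dx, y + dy) one step earlier. Sources outside the grid read as empty.
    """
    h, w = prev_occ.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest-neighbour source cell for every target cell.
    src_x = np.rint(xs + flow[..., 0]).astype(int)
    src_y = np.rint(ys + flow[..., 1]).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    warped = np.zeros_like(prev_occ)
    warped[inside] = prev_occ[src_y[inside], src_x[inside]]
    return warped

# Toy scene: a single occupied cell at (x=1, y=1).
prev_occ = np.zeros((4, 4))
prev_occ[1, 1] = 1.0

# Uniform flow: every cell's content came from one column to the left,
# i.e. the whole scene moves one cell to the right.
flow = np.zeros((4, 4, 2))
flow[..., 0] = -1.0

next_occ = warp_occupancy(prev_occ, flow)
```

In this toy scene the occupied cell ends up one column to the right, at (x=2, y=1). A real model predicts soft occupancy probabilities and sub-cell flow, where bilinear sampling would replace the nearest-neighbour lookup used here.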

An overview of CCLSTM. Rasterized input grids are concatenated along the channel dimension and encoded by a CNN. The accumulator CLSTM aggregates the encoded features over time, and its hidden and cell states initialize the forecasting CLSTM. The forecasting CLSTM is then called autoregressively to predict encoded future states. The future hidden states are finally passed to a CNN decoder, which produces the occupancy and flow grids.
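
The data flow in this overview can be sketched in a few lines of NumPy. This is an illustrative toy under several assumptions, not the published model. It assumes CLSTM denotes a standard convolutional LSTM cell. The encoder and decoder are single 3x3 convolutions, the forecaster is fed its own previous hidden state at each step, and the weights are random and untrained. The layer sizes and the names conv3x3, ConvLSTMCell and cclstm_sketch are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv3x3(x, w):
    """Same-padded 3x3 convolution. x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            # Contract input channels for this kernel tap.
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx],
                             xp[:, dy:dy + h, dx:dx + wd])
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ConvLSTMCell:
    """Standard convolutional LSTM cell (an assumed reading of 'CLSTM')."""

    def __init__(self, in_ch, hid_ch):
        # One convolution produces all four gates at once.
        self.w = rng.normal(0.0, 0.1, (4 * hid_ch, in_ch + hid_ch, 3, 3))

    def __call__(self, x, h, c):
        gates = conv3x3(np.concatenate([x, h], axis=0), self.w)
        i, f, o, g = np.split(gates, 4, axis=0)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return h, c

def cclstm_sketch(history, horizon, hid_ch=8):
    """history: list over past steps; each step is a list of (C_k, H, W) grids."""
    in_ch = sum(g.shape[0] for g in history[0])
    _, H, W = history[0][0].shape
    w_enc = rng.normal(0.0, 0.1, (hid_ch, in_ch, 3, 3))  # toy CNN encoder
    w_dec = rng.normal(0.0, 0.1, (3, hid_ch, 3, 3))      # toy CNN decoder
    accumulator = ConvLSTMCell(hid_ch, hid_ch)
    forecaster = ConvLSTMCell(hid_ch, hid_ch)

    # 1) Encode each past step and aggregate it with the accumulator.
    h = np.zeros((hid_ch, H, W))
    c = np.zeros((hid_ch, H, W))
    for grids in history:
        x = np.concatenate(grids, axis=0)          # channel-wise concatenation
        feat = np.maximum(conv3x3(x, w_enc), 0.0)  # conv + ReLU
        h, c = accumulator(feat, h, c)

    # 2) Initialize the forecaster from the accumulator state and roll out.
    occupancy, flow = [], []
    x = h  # assumption: the forecaster consumes its previous hidden state
    for _ in range(horizon):
        h, c = forecaster(x, h, c)
        x = h
        out = conv3x3(h, w_dec)            # 1 occupancy logit + 2 flow channels
        occupancy.append(sigmoid(out[0]))
        flow.append(out[1:3])
    return np.stack(occupancy), np.stack(flow)

# Five past steps, each with two raster layers (1 and 2 channels) on a 16x16 grid.
history = [[rng.random((1, 16, 16)), rng.random((2, 16, 16))] for _ in range(5)]
occ, flow = cclstm_sketch(history, horizon=4)
print(occ.shape, flow.shape)  # (4, 16, 16) (4, 2, 16, 16)
```

Every step in the sketch is a convolution or an element-wise operation, which is the property the article highlights for NPU deployment. The history loop can take any number of past frames and the rollout loop any horizon, without changing the weights.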
