BEVHeight++: Toward Robust Visual Centric 3D Object Detection

L Yang, T Tang, J Li, K Yuan, K Wu… - … on Pattern Analysis …, 2025 - ieeexplore.ieee.org
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025
While most recent autonomous driving systems focus on developing perception methods for ego-vehicle sensors, an alternative approach tends to be overlooked: leveraging intelligent roadside cameras to extend perception beyond the visual range. We discover that state-of-the-art vision-centric detection methods perform poorly on roadside cameras. This is because these methods mainly focus on recovering the depth relative to the camera center, where the depth difference between the car and the ground quickly shrinks as the distance increases. In this paper, we propose a simple yet effective approach, dubbed BEVHeight++, to address this issue. In essence, we regress the height to the ground to achieve a distance-agnostic formulation that eases the optimization of camera-only perception methods. By incorporating both height and depth encoding techniques, we achieve a more accurate and robust projection from 2D to BEV space. On popular 3D detection benchmarks for roadside cameras, our method surpasses all previous vision-centric methods by a significant margin. In the ego-vehicle scenario, BEVHeight++ surpasses depth-only methods with gains of +2.8% NDS and +1.7% mAP on the nuScenes test set, and even higher gains of +9.3% NDS and +8.8% mAP on the nuScenes-C benchmark with object-level distortion. Consistent and substantial performance improvements are also achieved on the KITTI, KITTI-360, and Waymo datasets.
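The geometric claim in the abstract, that the depth difference between a car and the ground shrinks with distance while the height to the ground does not, can be illustrated with a small sketch. This is not the paper's code. It assumes a flat ground plane, treats "depth" as the straight-line range to the camera center, and uses hypothetical values for the camera mounting height (6 m) and the car roof height (1.5 m); the function names are made up for illustration.

```python
import math


def depth_gap(distance_m, camera_height_m=6.0, object_height_m=1.5):
    """Range difference (in meters) between a ground point and a car-roof
    point located at the same horizontal distance from the camera.

    Assumptions (hypothetical, for illustration only): flat ground, and
    "depth" measured as Euclidean range to the camera center.
    """
    ground_range = math.hypot(distance_m, camera_height_m)
    roof_range = math.hypot(distance_m, camera_height_m - object_height_m)
    return ground_range - roof_range


def height_gap(object_height_m=1.5):
    """Height-to-ground difference between roof and ground: independent
    of how far the car is from the camera."""
    return object_height_m


if __name__ == "__main__":
    for d in (10, 50, 100, 200):
        print(f"{d:>4} m   depth gap = {depth_gap(d):.3f} m   "
              f"height gap = {height_gap():.3f} m")
```

Under these assumptions the depth gap decays roughly in inverse proportion to distance, so a depth regressor has to separate car from ground using an ever smaller signal, whereas the height target stays constant.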