ViKIENet: Towards Efficient 3D Object Detection with Virtual Key Instance Enhanced Network

Z Yu, B Qiu, AWH Khong - Proceedings of the Computer Vision and Pattern Recognition …, 2025 - openaccess.thecvf.com
Abstract
The sparsity of point clouds and the inadequacy of semantic information pose challenges to current LiDAR-only 3D object detection methods. Recent methods alleviate these challenges by converting RGB images into virtual points via depth completion and fusing them with LiDAR points. Although these methods have shown outstanding results, they often introduce significant computational overhead due to the high density of virtual points, as well as noise due to inaccurate depth completion. In addition, they do not thoroughly leverage semantic information from images. In this work, we propose the virtual key instance enhanced network (ViKIENet), a highly efficient and effective multi-modal feature fusion framework that fuses the features of virtual key instances (VKIs) and LiDAR points through multiple stages. Our contributions include three main components: semantic key instance selection (SKIS), virtual-instance-focused fusion (VIFF), and virtual-instance-to-real attention (VIRA). We also propose an extended version, ViKIENet-R, with VIFF-R, which includes rotationally equivariant features. Experimental results show that ViKIENet and ViKIENet-R achieve significant improvements in detection performance on the KITTI, JRDB, and nuScenes datasets compared with existing work. On the KITTI dataset, ViKIENet and ViKIENet-R operate at 22.7 and 15.0 FPS, respectively. As of the CVPR submission (Nov. 15th, 2024), ViKIENet ranks first on the car detection and orientation estimation leaderboard, while ViKIENet-R ranks second (compared with officially published papers) on the 3D car detection leaderboard.
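To make the "virtual points" idea concrete, the following is a minimal sketch of back-projecting a completed depth map into 3D points with the standard pinhole camera model. It is not the paper's implementation: the function name, the nested-list depth format, and the optional boolean `mask` (a hypothetical stand-in for restricting virtual points to selected instance pixels) are all assumptions made for illustration. It does show why dense depth maps are costly: every pixel with valid depth becomes one virtual point unless some selection step filters it out.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


def backproject_to_virtual_points(
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    mask: Optional[List[List[bool]]] = None,
) -> List[Point]:
    """Back-project pixels with valid depth into camera-frame 3D points.

    depth[v][u] is the (completed) depth in metres at pixel column u, row v.
    fx, fy, cx, cy are pinhole intrinsics. mask is a hypothetical per-pixel
    selection (e.g. pixels of detected instances); None keeps every pixel.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:
                continue  # no valid depth at this pixel
            if mask is not None and not mask[v][u]:
                continue  # pixel not selected, so no virtual point is created
            x = (u - cx) * z / fx  # pinhole model: u = fx * x / z + cx
            y = (v - cy) * z / fy  # pinhole model: v = fy * y / z + cy
            points.append((x, y, z))
    return points


# Toy 2x2 depth map with two valid pixels (illustrative values only).
depth_map = [[0.0, 2.0], [4.0, 0.0]]
all_points = backproject_to_virtual_points(depth_map, 2.0, 2.0, 0.0, 0.0)
selected = backproject_to_virtual_points(
    depth_map, 2.0, 2.0, 0.0, 0.0, mask=[[False, True], [False, False]]
)
```

Without a mask, the number of virtual points grows with image resolution; with a mask, only the selected pixels contribute, which is the general motivation for restricting fusion to a subset of instances.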