With the success of Transformers in natural language processing, object detection with Transformers (DETR) has attracted widespread attention. In previous Transformer-based 2D detectors, the object queries are a set of learned embeddings. However, it is difficult to apply these detectors to the 3D domain because the learned object queries lack explicit physical meaning and position priors. In this paper, we introduce the concept of anchors and propose a novel query design based on anchor points. In our query design, we use foreground points as anchor points and encode these anchor points as object queries. Consequently, each object query has an explicit physical meaning and focuses only on its nearby object. Additionally, we propose an instance-aware sampling strategy to select a small set of representative foreground points from the scene point cloud. Extensive experiments on several large-scale 3D object detection datasets demonstrate that the proposed AnchorPoint detector achieves promising accuracy and efficiency. In particular, AnchorPoint achieves an average precision (AP) of 83.21 at 61 frames per second (FPS) on the moderate level of the KITTI-DET Car subset. Moreover, we model each object as its corresponding anchor point and extend the AnchorPoint model to 3D multi-object tracking by adding an extra tracking head. We show that our method achieves performance comparable to existing state-of-the-art methods on the KITTI-MOT dataset.
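The abstract describes encoding 3D anchor points as object queries. A minimal sketch of one way such an encoding could look, using sinusoidal positional encoding of each coordinate (the paper's actual encoder, embedding dimension, and coordinate split are not specified here, so these are assumptions):

```python
import numpy as np

def encode_anchor_points(points, d_coord=64):
    """Map N anchor points (x, y, z) to N query embeddings of size 3*d_coord.

    Sketch: each coordinate is passed through a sinusoidal positional
    encoding of dimension d_coord, and the three encodings are
    concatenated. A real detector might instead use a learned MLP.
    """
    points = np.asarray(points, dtype=np.float64)                    # (N, 3)
    # One frequency per pair of (sin, cos) channels, DETR-style.
    freqs = 1.0 / (10000.0 ** (np.arange(0, d_coord, 2) / d_coord))  # (d_coord/2,)
    # angles[i, j, k] = coordinate j of point i, scaled by frequency k
    angles = points[:, :, None] * freqs[None, None, :]               # (N, 3, d_coord/2)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)  # (N, 3, d_coord)
    return enc.reshape(points.shape[0], -1)                          # (N, 3*d_coord)

# Each query embedding is tied to an explicit 3D location, so it carries
# the position prior that purely learned query embeddings lack.
queries = encode_anchor_points([[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]])
```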
AnchorPoint: Query Design for Transformer-Based 3D Object Detection and Tracking
IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 10, pp. 10988-11000
2023-10-01
Journal article
Electronic resource
English