ActFormer: Scalable Collaborative Perception via Active Queries

May 1, 2024

Suozhi Huang*

Juexiao Zhang

Yiming Li

Chen Feng


Abstract

We present ActFormer, a Transformer that learns bird’s-eye-view (BEV) representations by using predefined BEV queries to interact with multi-robot, multi-camera inputs. Each BEV query actively selects relevant cameras for information aggregation based on pose information, instead of interacting with all cameras indiscriminately. Experiments on the V2X-Sim dataset show that ActFormer improves detection performance from 29.89% to 45.15% with about 50% fewer queries, demonstrating its effectiveness in multi-agent collaborative 3D object detection.
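The idea of a BEV query choosing cameras from pose information can be illustrated with a small geometric sketch. This is not the paper's implementation: it assumes a simple pinhole model and treats a camera as "relevant" when the query's 3D reference point projects inside its image. The names `Camera`, `project`, and `select_cameras` are hypothetical and introduced only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Camera:
    """Hypothetical pinhole camera: world-to-camera pose plus intrinsics."""
    rotation: Sequence[Sequence[float]]  # 3x3 world-to-camera rotation
    translation: Vec3                    # world-to-camera translation
    fx: float
    fy: float
    cx: float
    cy: float
    width: int
    height: int


def project(cam: Camera, point: Vec3) -> Optional[Tuple[float, float]]:
    """Project a world point to pixel coordinates, or None if behind the camera."""
    # Transform into the camera frame: p_cam = R @ p + t
    x, y, z = (
        sum(row[i] * point[i] for i in range(3)) + t
        for row, t in zip(cam.rotation, cam.translation)
    )
    if z <= 1e-6:  # point is behind (or on) the image plane
        return None
    return (cam.fx * x / z + cam.cx, cam.fy * y / z + cam.cy)


def select_cameras(cams: Sequence[Camera], query_point: Vec3) -> List[int]:
    """Return indices of cameras whose image contains the query's projection.

    Assumption: visibility of the reference point stands in for 'relevance'.
    """
    selected = []
    for idx, cam in enumerate(cams):
        pixel = project(cam, query_point)
        if pixel is None:
            continue
        u, v = pixel
        if 0 <= u < cam.width and 0 <= v < cam.height:
            selected.append(idx)
    return selected


# Two cameras at the origin: one looking along +z, one looking along -z.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
flipped = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]
front = Camera(identity, (0.0, 0.0, 0.0), 500.0, 500.0, 320.0, 240.0, 640, 480)
back = Camera(flipped, (0.0, 0.0, 0.0), 500.0, 500.0, 320.0, 240.0, 640, 480)
cameras = [front, back]

print(select_cameras(cameras, (0.0, 0.0, 10.0)))  # point in front of `front`
print(select_cameras(cameras, (0.0, 0.0, -5.0)))  # point in front of `back`
```

In this sketch each query attends only to the cameras returned by `select_cameras`, so a query never spends computation on views that cannot observe its location. A learned model could replace the hard visibility test with a softer, pose-conditioned selection.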

Type

Conference paper

Publication

In ICRA 2024

Last updated on May 1, 2024

Computer Vision, Robotics
