Asynchrony-Robust Collaborative Perception via Bird's Eye View Flow

Abstract

Collaborative perception can substantially boost each agent's perception ability by facilitating communication among multiple agents. However, temporal asynchrony among agents is inevitable in the real world due to communication delays, interruptions, and clock misalignments. This issue causes information mismatch during multi-agent fusion, seriously shaking the foundation of collaboration. To address this issue, we propose CoBEVFlow, an asynchrony-robust collaborative perception system based on bird's eye view (BEV) flow. The key intuition of CoBEVFlow is to compensate motions to align asynchronous collaboration messages sent by multiple agents. To model the motion in a scene, we propose BEV flow, which is a collection of the motion vector corresponding to each spatial location. Based on BEV flow, asynchronous perceptual features can be reassigned to appropriate positions, mitigating the impact of asynchrony. CoBEVFlow has two advantages: (i) CoBEVFlow can handle asynchronous collaboration messages sent at irregular, continuous time stamps without discretization; and (ii) with BEV flow, CoBEVFlow only transports the original perceptual features, instead of generating new perceptual features, avoiding additional noises. To validate CoBEVFlow's efficacy, we create IRregular V2V(IRV2V), the first synthetic collaborative perception dataset with various temporal asynchronies that simulate different real-world scenarios. Extensive experiments conducted on both IRV2V and the real-world dataset DAIR-V2X show that CoBEVFlow consistently outperforms other baselines and is robust in extremely asynchronous settings. The code is available at https://github.com/MediaBrain-SJTU/CoBEVFlow.

Architecture

The problem of asynchrony results in the misplacements of moving objects in the collaboration messages. That is, the collaboration messages from multiple agents would record various positions for the same moving object. The proposed CoBEVFlow addresses this issue with two key ideas: i) we use a BEV flow map to capture the motion in a scene, enabling motion-guided reassigning asynchronous perceptual features to appropriate positions; and ii) we generate the region of interest(ROI) to make sure that the reassignment only happens to the areas that potentially contain objects. By following these two ideas, we eliminate direct modification of the features and keep the background feature unaltered, effectively avoiding unnecessary noise in the learned features. Figure 1 is the overview of the CoBEVFlow. More tech details can be found in our paper.

Figure 1. Message packing process prepares ROI and sparse features as the message for efficient communication and BEV flow map generation. Message fusion process generates and applies BEV flow map for compensation, and fuses the features at the current timestamp from all agents.

Experiment Results

Figure 2. Comparison of the performance of CoBEVFlow and other baseline methods under the expectation of time interval from 0 to 500ms. CoBEVFlow outperforms all the baseline methods and shows great robustness under any level of asynchrony on both two datasets.

Figure 3. Trade-off between detection performance (AP@0.50/0.70) and communication bandwidth under asynchrony (expected 300ms latency) on IRV2V dataset. CoBEVFlow outperforms even with a much smaller communication volume.

Figure 4. Visualization of detection results for V2X-ViT, SyncNet, and CoBEVFlow with the expectation of time intervals are 100, 300, and 500ms on IRV2V dataset. CoBEVFlow qualitatively outperforms the others under different asynchrony. Red and green boxes denote detection results and ground-truth respectively.

Figure 5. Visualization of detection results for Where2comm, V2X-ViT, SyncNet, and CoBEVFlow with the expectation of time intervals are 100, 300, and 500ms on DAIR-V2X dataset. CoBEVFlow qualitatively outperforms the others under different asynchrony. Red and green boxes denote detection results and ground-truth respectively.

BibTeX

@inproceedings{wei2023asynchronyrobust, title={Asynchrony-Robust Collaborative Perception via Bird's Eye View Flow}, author={Sizhe Wei and Yuxi Wei and Yue Hu and Yifan Lu and Yiqi Zhong and Siheng Chen and Ya Zhang}, booktitle = {Advances in Neural Information Processing Systems}, year={2023} }