Asynchrony-Robust Collaborative Perception via Bird's Eye View Flow

Sizhe Wei 1†, Yuxi Wei 1†, Yue Hu 1, Yifan Lu 1, Yiqi Zhong 2, Siheng Chen 1,3*, Ya Zhang 1,3*
1 CMIC, Shanghai Jiao Tong University
2 University of Southern California
3 Shanghai AI Lab

† Equal Contribution   * Corresponding Author

NeurIPS 2023

Asynchronous Co-Perception 😩

With our CoBEVFlow 🤩

Red and green boxes denote detection results and ground truth, respectively.

Abstract

Collaborative perception can substantially boost each agent's perception ability by facilitating communication among multiple agents. However, temporal asynchrony among agents is inevitable in the real world due to communication delays, interruptions, and clock misalignment. This issue causes information mismatch during multi-agent fusion, seriously undermining the foundation of collaboration. To address this issue, we propose CoBEVFlow, an asynchrony-robust collaborative perception system based on bird's eye view (BEV) flow. The key intuition of CoBEVFlow is to compensate for motion in order to align the asynchronous collaboration messages sent by multiple agents. To model the motion in a scene, we propose BEV flow, a collection of motion vectors, one for each spatial location. Based on BEV flow, asynchronous perceptual features can be reassigned to appropriate positions, mitigating the impact of asynchrony. CoBEVFlow has two advantages: (i) it can handle asynchronous collaboration messages sent at irregular, continuous timestamps without discretization; and (ii) with BEV flow, it only transports the original perceptual features, instead of generating new ones, avoiding additional noise. To validate CoBEVFlow's efficacy, we create IRregular V2V (IRV2V), the first synthetic collaborative perception dataset with various temporal asynchronies that simulate different real-world scenarios. Extensive experiments on both IRV2V and the real-world dataset DAIR-V2X show that CoBEVFlow consistently outperforms other baselines and remains robust in extremely asynchronous settings. The code is available at https://github.com/MediaBrain-SJTU/CoBEVFlow.

Architecture

Asynchrony results in the misplacement of moving objects in the collaboration messages; that is, the collaboration messages from multiple agents would record different positions for the same moving object. The proposed CoBEVFlow addresses this issue with two key ideas: i) we use a BEV flow map to capture the motion in a scene, enabling motion-guided reassignment of asynchronous perceptual features to appropriate positions; and ii) we generate regions of interest (ROIs) to ensure that the reassignment only happens in areas that potentially contain objects. By following these two ideas, we avoid directly modifying the features and keep the background features unaltered, effectively avoiding unnecessary noise in the learned features. Figure 1 shows an overview of CoBEVFlow; a minimal sketch of the flow-guided reassignment is given below. More technical details can be found in our paper.
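
To make the reassignment concrete, here is a minimal NumPy sketch of flow-guided feature compensation. The function name warp_bev_features, the grid-unit flow convention, and the nearest-cell rounding are illustrative assumptions for this sketch, not the exact implementation in the repository.

    import numpy as np

    def warp_bev_features(feat, flow, roi_mask):
        """Reassign BEV features along a BEV flow map (illustrative sketch).

        feat:     (C, H, W) asynchronous BEV feature map from a collaborator.
        flow:     (2, H, W) per-cell motion vectors (dx, dy) in grid units,
                  estimated from the ROI sequence up to the current timestamp.
        roi_mask: (H, W) boolean mask, True where an ROI may contain an object.

        Returns a compensated feature map in which ROI features are moved to
        their estimated current positions; background cells are left untouched.
        """
        C, H, W = feat.shape
        out = feat.copy()  # background features pass through unaltered
        ys, xs = np.nonzero(roi_mask)
        # Clear the source ROI cells so stale object features do not remain.
        out[:, ys, xs] = 0.0
        # Move each ROI cell's feature vector to its flow-predicted target cell.
        tx = np.clip(np.round(xs + flow[0, ys, xs]).astype(int), 0, W - 1)
        ty = np.clip(np.round(ys + flow[1, ys, xs]).astype(int), 0, H - 1)
        out[:, ty, tx] = feat[:, ys, xs]
        return out

Note that only cells inside the ROI mask are moved, so the original features are transported rather than regenerated, matching the design choice described above.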


CoBEVFlow Framework

Figure 1. The message packing process prepares ROIs and sparse features as the message, enabling efficient communication and BEV flow map generation. The message fusion process generates and applies the BEV flow map for compensation, and fuses the features from all agents at the current timestamp.
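
As a rough illustration of the message packing step in Figure 1, the sketch below pools the features inside each ROI into a compact message. The CollabMessage fields, the pack_message helper, and the mean-pooling are hypothetical simplifications; the actual system transmits sparse ROI features rather than pooled vectors.

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class CollabMessage:
        """One agent's collaboration message (illustrative fields)."""
        timestamp: float      # continuous, possibly irregular send time
        rois: np.ndarray      # (N, 4) BEV boxes (x, y, w, h) around likely objects
        features: np.ndarray  # (N, C) feature vectors pooled inside each ROI

    def pack_message(bev_feat, rois, timestamp):
        """Keep only the features inside ROIs to reduce communication cost.

        Assumes all boxes lie within the (C, H, W) BEV grid.
        """
        pooled = []
        for x, y, w, h in rois.astype(int):
            crop = bev_feat[:, y:y + h, x:x + w]  # (C, h, w) ROI crop
            pooled.append(crop.reshape(crop.shape[0], -1).mean(axis=1))
        return CollabMessage(timestamp, rois, np.stack(pooled))

On the receiving side, each collaborator's ROI sequence drives the BEV flow estimation, and the compensated features are fused with the ego features at the current timestamp.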

Experiment Results

BibTeX


    @inproceedings{wei2023asynchronyrobust,
      title={Asynchrony-Robust Collaborative Perception via Bird's Eye View Flow},
      author={Sizhe Wei and Yuxi Wei and Yue Hu and Yifan Lu and Yiqi Zhong and Siheng Chen and Ya Zhang},
      booktitle={Advances in Neural Information Processing Systems},
      year={2023}
    }