SocialCVAE: Predicting Pedestrian Trajectory via Interaction Conditioned Latents

 

The 38th Annual AAAI Conference on Artificial Intelligence (AAAI 2024), February 20-27, 2024, Vancouver, Canada.

Xiang, Haoteng Yin, He Wang, Xiaogang Jin

 

The framework of SocialCVAE. (a) The coarse motion prediction model learns the temporal motion tendencies and predicts a preferred new velocity for each pedestrian. (b) The energy-based interaction model constructs a local interaction energy map to anticipate the cost of pedestrian interactions with heterogeneous neighbors, including pedestrians, static environmental obstacles found in the scene segmentation (e.g., buildings), and dynamic environmental obstacles (e.g., vehicles). (c) The multimodal prediction model predicts future trajectories using a CVAE model conditioning on the past trajectories and the interaction energy map.
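To make the idea of a local interaction energy map in (b) concrete, the following is a minimal sketch under stated assumptions. It is not the paper's formulation. The Gaussian repulsion kernel, the function name `local_energy_map`, and the parameters `size`, `cell`, and `sigma` are all hypothetical choices made for illustration, and only pedestrian neighbors are considered (no static or dynamic obstacles).

```python
import math

def local_energy_map(center, neighbors, size=5, cell=0.5, sigma=1.0):
    """Toy repulsion-energy grid centred on one pedestrian.

    center    -- (x, y) position of the pedestrian of interest
    neighbors -- list of (x, y) positions of nearby pedestrians
    size      -- number of grid cells per side (assumed odd)
    cell      -- side length of one grid cell, in the same units as positions
    sigma     -- width of the assumed Gaussian repulsion kernel
    """
    half = size // 2
    grid = []
    for row in range(size):
        y = center[1] + (row - half) * cell
        line = []
        for col in range(size):
            x = center[0] + (col - half) * cell
            # Assumption: each neighbor contributes a Gaussian bump of
            # repulsion energy that is highest at the neighbor's position.
            energy = sum(
                math.exp(-((x - nx) ** 2 + (y - ny) ** 2) / (2 * sigma ** 2))
                for nx, ny in neighbors
            )
            line.append(energy)
        grid.append(line)
    return grid

# One neighbor located one unit to the right of the pedestrian.
energy = local_energy_map((0.0, 0.0), [(1.0, 0.0)])
```

In this toy setup the cell that coincides with the neighbor carries the highest energy, and the energy decays with distance. A map of this kind could then serve as the conditioning input that (c) describes.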

Abstract

Pedestrian trajectory prediction is a key technology in many applications, providing insight into human behavior and anticipating future human motion. Most existing empirical models are formulated explicitly from observed human behaviors, using explicable mathematical terms that are deterministic in nature, while recent work has focused on hybrid models that add learning-based techniques for greater expressiveness while maintaining explainability. However, the deterministic nature of the steering behaviors learned from the empirical models limits these models’ practical performance. To address this issue, this work proposes the social conditional variational autoencoder (SocialCVAE) for predicting pedestrian trajectories, which employs a CVAE to explore behavioral uncertainty in human motion decisions. SocialCVAE learns socially reasonable motion randomness by using a socially explainable interaction energy map as the CVAE’s condition; the map illustrates the future occupancy of each pedestrian’s local neighborhood area. The energy map is generated by an energy-based interaction model, which anticipates the energy cost (i.e., repulsion intensity) of pedestrians’ interactions with neighbors. Experimental results on two public benchmarks comprising 25 scenes demonstrate that SocialCVAE significantly improves prediction accuracy over state-of-the-art methods, with up to 16.85% improvement in Average Displacement Error (ADE) and 69.18% improvement in Final Displacement Error (FDE). The code will be released upon acceptance.
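For readers unfamiliar with the two metrics, the sketch below computes them for a single predicted trajectory using their standard definitions: ADE is the mean Euclidean distance between predicted and ground-truth positions over all predicted time steps, and FDE is the distance at the final step. The function name `displacement_errors` and the sample coordinates are illustrative assumptions, not taken from the paper's code or data.

```python
import math

def displacement_errors(pred, truth):
    """Return (ADE, FDE) for one predicted trajectory.

    pred, truth -- equal-length, non-empty lists of (x, y) positions,
                   one entry per predicted time step.
    """
    if not pred or len(pred) != len(truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    # Per-step Euclidean distance between prediction and ground truth.
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]
    ade = sum(dists) / len(dists)  # average over all steps
    fde = dists[-1]                # error at the final step only
    return ade, fde

# Hypothetical three-step example: the prediction drifts upward.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pred = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(pred, truth)
```

Here the per-step errors are 0, 1, and 2, so the ADE is 1.0 and the FDE is 2.0. Lower values are better for both metrics.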

Download

PDF (5.88 MB)
Supplemental materials (1.77 MB)
Source code and data on GitHub