We address interpretable representation learning for motion forecasting in self-driving cars. Rather than treating transformers as black boxes, we develop methods to interpret and modify learned representations. We introduce self-supervised pre-training with interpretable objectives. Moreover, we probe latent spaces of forecasting models and reveal interpretable features, allowing us to make targeted interventions. Finally, we uncover retrocausal mechanisms, which enable goal-based instructions.