Trajectory Optimisation in Learned Multimodal Dynamical Systems

This work presents a two-stage method to perform trajectory optimisation in multimodal dynamical systems with unknown nonlinear stochastic transition dynamics. The method finds trajectories that remain in a preferred dynamics mode where possible and in regions of the transition dynamics model that have been observed and can be predicted confidently.

The first stage uses a mixture of Gaussian process experts (mogpe) method, written in GPflow/TensorFlow, to learn a predictive dynamics model from historical data. Importantly, this model learns a gating function that indicates the probability of being in a particular dynamics mode at a given state location. In the second stage, this gating function acts as a coordinate map for a latent Riemannian manifold on which geodesics are solutions to our trajectory optimisation problem. Geodesics on this manifold satisfy a continuous-time second-order ODE. A set of collocation constraints is derived that ensures trajectories are solutions to this ODE, implicitly solving the trajectory optimisation problem. The trajectory optimisation is implemented in JAX.
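
The second stage can be illustrated with a minimal JAX sketch. It is not the implementation from this work, and several pieces are assumptions made only for illustration: `gating` is a hypothetical smooth function standing in for the learned gating function, the metric is assumed to be the pull-back metric G(x) = I + ∇h(x)∇h(x)ᵀ of the graph of h, and Hermite-Simpson is used as one common collocation scheme. The sketch shows how the geodesic ODE can be written with automatic differentiation and how collocation defects, which a constrained optimiser would drive to zero, can be evaluated on a candidate trajectory.

```python
import jax
import jax.numpy as jnp


def gating(x):
    # Hypothetical stand-in for the learned gating function h(x):
    # a smooth bump centred at the origin (assumption, not the mogpe model).
    return jnp.exp(-0.5 * jnp.sum(x ** 2))


def metric(x):
    # Assumed pull-back metric of the embedding x -> (x, h(x)):
    # G(x) = I + grad h(x) grad h(x)^T, symmetric positive definite.
    g = jax.grad(gating)(x)
    return jnp.eye(x.shape[0]) + jnp.outer(g, g)


def geodesic_acceleration(x, v):
    # Geodesic ODE: x''^k = -Gamma^k_ij(x) v^i v^j, with Christoffel symbols
    # built from the metric derivatives dG[a, b, c] = dG_ab / dx_c.
    G = metric(x)
    dG = jax.jacfwd(metric)(x)
    term1 = jnp.einsum("lji,i,j->l", dG, v, v)  # d_i G_lj v^i v^j
    term2 = jnp.einsum("ijl,i,j->l", dG, v, v)  # d_l G_ij v^i v^j
    return -jnp.linalg.solve(G, term1 - 0.5 * term2)


def dynamics(z):
    # First-order form of the second-order ODE, with state z = (x, v).
    d = z.shape[0] // 2
    x, v = z[:d], z[d:]
    return jnp.concatenate([v, geodesic_acceleration(x, v)])


def collocation_defects(zs, dt):
    # Hermite-Simpson defects between consecutive knot points.
    # A trajectory with all defects equal to zero satisfies the geodesic ODE
    # at the collocation points.
    f = jax.vmap(dynamics)(zs)
    z0, z1, f0, f1 = zs[:-1], zs[1:], f[:-1], f[1:]
    z_mid = 0.5 * (z0 + z1) + dt / 8.0 * (f0 - f1)
    f_mid = jax.vmap(dynamics)(z_mid)
    return z1 - z0 - dt / 6.0 * (f0 + 4.0 * f_mid + f1)


# Candidate trajectory: a straight line at constant velocity (illustrative values).
num_knots, horizon = 11, 1.0
dt = horizon / (num_knots - 1)
xs = jnp.linspace(jnp.array([-1.0, -1.0]), jnp.array([1.0, 1.0]), num_knots)
vs = jnp.tile(jnp.array([2.0, 2.0]) / horizon, (num_knots, 1))
zs = jnp.concatenate([xs, vs], axis=1)

defects = collocation_defects(zs, dt)
print("max |defect| for the straight-line guess:", jnp.max(jnp.abs(defects)))
```

In a full trajectory optimiser, the knot states `zs` would be decision variables, the defects would enter as equality constraints alongside fixed start and end states, and a nonlinear programming solver would search for a trajectory that satisfies them.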

Aidan Scannell
Postdoctoral Researcher

My research interests include model-based reinforcement learning, probabilistic machine learning (Gaussian processes, Bayesian neural networks, approximate Bayesian inference, etc.), learning-based control, and optimal control.