Discrete Codebook World Models for Continuous Control

Aalto University, Finnish Center for Artificial Intelligence (FCAI), University of Edinburgh, Vrije Universiteit Amsterdam

Abstract

In reinforcement learning (RL), world models serve as internal simulators, enabling agents to predict environment dynamics and future outcomes in order to make informed decisions. While previous approaches leveraging discrete latent spaces, such as DreamerV3, have demonstrated strong performance in discrete action settings and visual control tasks, their comparative performance in state-based continuous control remains underexplored. In contrast, methods with continuous latent spaces, such as TD-MPC2, have shown notable success on state-based continuous control benchmarks. In this paper, we demonstrate that modelling with discrete latent states has benefits over continuous latent states and that discrete codebook encodings are more effective representations for continuous control than alternatives such as one-hot and label-based encodings. Based on these insights, we introduce DCWM: Discrete Codebook World Model, a self-supervised world model with a discrete and stochastic latent space, where latent states are codes from a codebook. We combine DCWM with decision-time planning to obtain our model-based RL algorithm, named DC-MPC: Discrete Codebook Model Predictive Control, which performs competitively against recent state-of-the-art algorithms, including TD-MPC2 and DreamerV3, on continuous control benchmarks. See our project website www.aidanscannell.com/dcmpc.

DCWM

DCWM is a world model with a discrete latent space where each latent state is a discrete code from a codebook. Observations are first mapped through the encoder and then quantized into one of the discrete codes. We model probabilistic latent transition dynamics as a classifier such that it captures a potentially multimodal distribution over the next state given the previous state and action. During training, multi-step predictions are made using straight-through (ST) Gumbel-softmax sampling such that gradients backpropagate through time to the encoder. Given this discrete formulation, we train the latent space using a classification objective, i.e. cross-entropy loss. Making the latent representation stochastic and discrete with a codebook contributes to the very high sample efficiency of DC-MPC.
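The sketch below illustrates this pipeline in PyTorch: encode, quantize against a codebook, and roll out latent dynamics as a classifier over codes, with ST Gumbel-softmax samples keeping the rollout differentiable. It is a minimal illustration, not the authors' implementation; the VQ-style nearest-neighbour quantizer, module names, and dimensions (obs_dim, code_dim, num_codes, etc.) are assumptions for readability and may differ from the paper's actual quantization scheme.

```python
# Illustrative sketch of a discrete codebook world model (hypothetical names/sizes).
import torch
import torch.nn as nn
import torch.nn.functional as F


class DCWMSketch(nn.Module):
    def __init__(self, obs_dim=24, act_dim=6, code_dim=32, num_codes=512):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ELU(), nn.Linear(256, code_dim)
        )
        self.codebook = nn.Embedding(num_codes, code_dim)  # discrete latent codes
        # Transition dynamics as a classifier over next-state codebook indices.
        self.dynamics = nn.Sequential(
            nn.Linear(code_dim + act_dim, 256), nn.ELU(), nn.Linear(256, num_codes)
        )

    def quantize(self, e):
        # Nearest-codebook-entry lookup with a straight-through gradient.
        dists = torch.cdist(e, self.codebook.weight)  # (B, num_codes)
        idx = dists.argmin(dim=-1)
        z = self.codebook(idx)
        return e + (z - e).detach(), idx  # straight-through estimator

    def rollout(self, obs, actions, tau=1.0):
        # Imagined multi-step latent rollout; gradients backpropagate through
        # time to the encoder via straight-through Gumbel-softmax samples.
        z, _ = self.quantize(self.encoder(obs))
        logits_seq = []
        for a in actions.unbind(dim=1):  # actions: (B, T, act_dim)
            logits = self.dynamics(torch.cat([z, a], dim=-1))  # dist. over next codes
            logits_seq.append(logits)
            one_hot = F.gumbel_softmax(logits, tau=tau, hard=True)
            z = one_hot @ self.codebook.weight  # sampled next latent code
        return torch.stack(logits_seq, dim=1)  # (B, T, num_codes)
```

In this sketch, the classification objective would compare the predicted logits at each step against the codebook indices obtained by encoding and quantizing the actually observed next states, via a cross-entropy loss.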

BibTeX

Please consider citing our paper:
@inproceedings{scannell2025discrete,
  title     = {Discrete Codebook World Models for Continuous Control},
  author    = {Aidan Scannell and Mohammadreza Nakhaeinezhadfard and Kalle Kujanp{\"a}{\"a} and Yi Zhao and Kevin Sebastian Luck and Arno Solin and Joni Pajarinen},
  booktitle = {The Thirteenth International Conference on Learning Representations},
  year      = {2025},
  url       = {https://openreview.net/forum?id=lfRYzd8ady}
}