Winning the 1X World Model Challenge

26 Nov, 2025

Aidan Scannell


Architecture: Spatio-Temporal Transformer world model

Abstract

World models are a promising approach for training reinforcement learning agents safely and sample-efficiently, and they may also form the basis for many other forms of game interaction. We present two pieces of work in this space. First, Amos Storkey will present DIAMOND (DIffusion As a Model Of eNvironment Dreams), a reinforcement learning agent trained in a diffusion world model. We analyze the key design choices required to make diffusion suitable for world modeling. We show that improved visual detail can lead to improved agent performance on the Atari 100k benchmark, and that DIAMOND’s diffusion world model can stand alone as an interactive neural game engine when trained on static Counter-Strike: Global Offensive gameplay. Second, Aidan Scannell will introduce the more advanced designs that enabled us to win the 1X World Model Challenge. In particular, we explain how we adapted the video generation foundation model Wan-2.2 TI2V-5B for video-state-conditioned future frame prediction in the 1X sampling track, and how we trained a Spatio-Temporal Transformer from scratch for the 1X compression track, achieving 1st place in both.

Event

Huawei AI Application Workshop

Location

Dublin, Ireland

Last updated on 27 Nov, 2025

Authors

Aidan Scannell (he/him)

Research Associate
