Generative World Modelling for Humanoids: 1X World Model Challenge
19 Oct, 2025

Aidan Scannell

Abstract
In this talk, I’ll present our methods, which won both tracks of the 1X World Model Challenge.
The 1X World Model Challenge introduces an open-source benchmark of real-world humanoid interaction, with two complementary tracks: sampling, focused on forecasting future image frames, and compression, focused on predicting future discrete latent codes.
For the sampling track, we adapt the video generation foundation model Wan-2.2 TI2V-5B to video-state-conditioned future frame prediction. We condition the video generation on robot states using adaLN-Zero, and further post-train the model using LoRA. For the compression track, we train a Spatio-Temporal Transformer model from scratch.
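As a rough illustration of the adaLN-Zero conditioning mentioned above, here is a minimal pure-Python sketch for a single token. It is not the code used in the challenge entry: the function names (`layer_norm`, `linear`, `adaln_zero_block`), the toy dimensions, and the stand-in `sublayer` are all assumptions made for illustration. The idea is that a linear map of the robot state produces a shift, scale, and gate for the normalised features, and that this map is initialised to zero so the conditioned block starts out as the identity.

```python
import math

def layer_norm(x, eps=1e-6):
    """Normalise a feature vector to zero mean and unit variance (no learned affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def linear(weight, bias, vec):
    """Plain matrix-vector product plus bias; weight is a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]

def adaln_zero_block(x, state, weight, bias, sublayer):
    """Residual block modulated by a conditioning vector (hypothetical helper).

    The conditioning vector `state` is mapped to shift, scale and gate,
    each of the same width as the token features `x`.
    """
    d = len(x)
    params = linear(weight, bias, state)
    shift, scale, gate = params[:d], params[d:2 * d], params[2 * d:]
    # Modulate the normalised features with the state-dependent scale and shift.
    h = [n * (1.0 + s) + sh for n, s, sh in zip(layer_norm(x), scale, shift)]
    out = sublayer(h)
    # The gate controls how much of the sublayer output enters the residual stream.
    return [xi + g * o for xi, g, o in zip(x, gate, out)]

# Toy sizes, chosen only for this example.
d, state_dim = 4, 3
# "Zero" initialisation: the modulation layer starts with all-zero parameters.
weight = [[0.0] * state_dim for _ in range(3 * d)]
bias = [0.0] * (3 * d)

x = [0.5, -1.0, 2.0, 0.25]       # features of one token
state = [0.1, -0.2, 0.3]         # stand-in robot state embedding
y = adaln_zero_block(x, state, weight, bias, lambda h: [2.0 * v for v in h])
```

With the zero-initialised modulation layer the gate is zero, so `y` equals `x` and the pretrained model's behaviour is unchanged at the start of fine-tuning; the state only begins to influence the output as the modulation weights are trained.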
Our models achieve 23.0 dB PSNR in the sampling task and a Top-500 CE of 6.6386 in the compression task, securing 1st place in both challenges.
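For readers unfamiliar with the sampling metric, the sketch below computes PSNR from the mean squared error between a predicted and a ground-truth frame. It assumes 8-bit pixel values (peak of 255) and frames flattened to lists; the challenge's exact evaluation protocol (pixel range, resolution, averaging over frames) is not stated here, so treat `psnr` and its defaults as illustrative assumptions.

```python
import math

def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_val ** 2 / mse)

# Tiny flattened "frames", made up for the example.
target = [10.0, 200.0, 35.0, 120.0]
pred = [12.0, 190.0, 40.0, 118.0]
score = psnr(pred, target)
```

Under these assumptions, 23.0 dB corresponds to a root-mean-square pixel error of roughly 18 on a 0 to 255 scale, and higher values indicate predictions closer to the ground truth.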
Event
Location
Hawaii Conference Center
1801 Kalakaua Avenue, Honolulu, Hawaii HI 96815