Winning the 1X World Model Challenge

26 Nov, 2025
Aidan Scannell
Architecture: Spatio-Temporal Transformer world model
Abstract
In this talk, I’ll share how our team won both the Outstanding Champion and Innovation awards in the ICCV 2025 phase of the 1X World Model Challenge. The competition introduces an open-source benchmark for real-world humanoid interaction with two complementary tracks: sampling, which focuses on forecasting future image frames, and compression, which focuses on predicting future discrete latent codes. I’ll discuss how we adapted the video generation foundation model Wan-2.2 TI2V-5B for future-frame prediction conditioned on video and state in the sampling track, and how we trained a Spatio-Temporal Transformer from scratch for the compression track. Both approaches placed 1st in their respective tracks.
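For readers unfamiliar with the term, a spatio-temporal transformer typically factorizes attention over a video of latent tokens into a spatial step (tokens attend within their own frame) and a causal temporal step (each spatial position attends to the same position in current and earlier frames). The sketch below illustrates that general factorization in NumPy under simplifying assumptions: a single head, no learned projections, no normalization or MLP layers. The names `attention` and `st_block` are hypothetical, and this is not the team's actual model.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries arrive as -inf and become 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v, mask=None):
    # Scaled dot-product attention over the last two axes: (..., n, d).
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    if mask is not None:
        # mask[i, j] is True where query i may attend to key j.
        scores = np.where(mask, scores, -np.inf)
    return softmax(scores) @ v


def st_block(x):
    """Hypothetical factorized spatio-temporal block (simplified sketch).

    x has shape (T, S, d): T frames, S spatial tokens per frame, d channels.
    """
    T, S, d = x.shape
    # Spatial step: tokens attend only within their own frame (batched over T).
    x = x + attention(x, x, x)
    # Temporal step: move time to the sequence axis, shape (S, T, d).
    xt = np.swapaxes(x, 0, 1)
    # Lower-triangular mask so frame t sees only frames 0..t (causal).
    causal = np.tril(np.ones((T, T), dtype=bool))
    xt = xt + attention(xt, xt, xt, mask=causal)
    return np.swapaxes(xt, 0, 1)


rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 6, 8))  # 4 frames, 6 tokens per frame, 8 channels
out = st_block(tokens)
```

Because the temporal mask is causal, perturbing the last frame leaves the outputs for all earlier frames unchanged, which is what lets such a model predict future latent codes autoregressively. Factorizing the attention also reduces its cost from roughly (T·S)² pairwise interactions for full attention to about T·S² + S·T².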
Event
Huawei AI Application Workshop
Location
Dublin, Ireland