Preprint

Beyond Mamba SSMs: Parallel Kalman Filters as Scalable Primitives for Language Modelling

We show that Kalman filters can be reparameterized for efficient parallel training and introduce GAUSS, a more expressive yet equally scalable state-space layer that outperforms …

Vaisakh Shaj

Forgetting is Everywhere

We present a unified, algorithm-agnostic theory that defines forgetting as self-inconsistency in a learner’s predictive distribution—quantifying loss of predictive information—and …

Ben Sanati