Preprint

Beyond Mamba SSMs: Parallel Kalman Filters as Scalable Primitives for Language Modelling

We show that Kalman filters can be reparameterized for efficient parallel training and introduce GAUSS, a more expressive yet equally scalable state-space layer that outperforms …

Vaisakh Shaj

Forgetting is Everywhere

We present a unified, algorithm-agnostic theory that defines forgetting as self-inconsistency in a learner’s predictive distribution—quantifying loss of predictive information—and …

Ben Sanati