Beyond Mamba SSMs: Parallel Kalman Filters as Scalable Primitives for Language Modelling
We show that Kalman filters can be reparameterized for efficient parallel training and introduce GAUSS, a more expressive yet equally scalable state-space layer that outperforms …
Vaisakh Shaj
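The "efficient parallel training" claim rests on the fact that a linear state recurrence can be evaluated with an associative scan in logarithmic depth instead of a sequential loop. Below is a minimal JAX sketch of that scan primitive, assuming diagonal transitions; it illustrates the general trick, not the paper's GAUSS layer or its specific Kalman reparameterization, and all names here are placeholders.

```python
# Minimal sketch: parallelizing h_t = a_t * h_{t-1} + b_t with an associative scan.
# Assumes diagonal (elementwise) transitions and h_0 = 0; this is an illustration of
# the parallel-scan primitive, not the GAUSS layer itself.
import jax
import jax.numpy as jnp

def combine(elem_i, elem_j):
    # Compose two affine maps h -> a*h + b: (a_j, b_j) after (a_i, b_i).
    a_i, b_i = elem_i
    a_j, b_j = elem_j
    return a_j * a_i, a_j * b_i + b_j

def parallel_linear_recurrence(a, b):
    # a, b: arrays of shape (T, d). Returns all states h_1..h_T computed in parallel.
    _, h = jax.lax.associative_scan(combine, (a, b))
    return h

# Toy usage: T=8 time steps, state dimension d=4.
a = jax.random.uniform(jax.random.PRNGKey(0), (8, 4), minval=0.9, maxval=1.0)
b = jax.random.normal(jax.random.PRNGKey(1), (8, 4))
h = parallel_linear_recurrence(a, b)
print(h.shape)  # (8, 4)
```

Because the composition of affine maps is associative, the scan reproduces the sequential recursion exactly while exposing O(log T) parallel depth, which is what makes such layers trainable at language-model scale.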