Sessa: Selective State Space Attention

Researchers introduce Sessa, a state-space architecture that selectively attends to context by combining recurrent processing with input-dependent gating. The work addresses fundamental tradeoffs in Transformers (diluted token influence at scale) and Mamba-style models (exponential decay over long sequences), positioning selective state-space models as a middle path for sequence modeling.
Modelwire context
Explainer
The core claim is architectural: Sessa uses input-dependent gating to control how much past context survives into the current state, which is a different lever than simply pruning attention heads or compressing token sequences. The framing as a 'middle path' is doing real work here, because it implies neither pure recurrence nor full attention is the right abstraction for long-context tasks.
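To make the gating lever concrete, here is a minimal sketch of an input-dependent gated recurrence: a sigmoid gate computed from the current token decides, per step, how much of the previous state is retained. All weight names and the exact update rule are illustrative assumptions, not taken from the Sessa paper.

```python
import numpy as np

def selective_step(h, x, W_g, W_in):
    """One recurrent step with an input-dependent gate.

    The gate g lies in (0, 1) and is computed from the current
    input x, so the model decides per token how much of the past
    state h survives. Purely illustrative; not Sessa's actual update.
    """
    g = 1.0 / (1.0 + np.exp(-(W_g @ x)))        # sigmoid gate from input
    return g * h + (1.0 - g) * np.tanh(W_in @ x)  # blend old state and new input

rng = np.random.default_rng(0)
d = 4
h = np.zeros(d)
W_g = rng.standard_normal((d, d))
W_in = rng.standard_normal((d, d))
for x in rng.standard_normal((10, d)):  # run over a short toy sequence
    h = selective_step(h, x, W_g, W_in)
```

Because the gate depends on `x`, context can be held indefinitely when the gate saturates near 1, or flushed when it drops near 0, which is the contrast with Mamba-style fixed exponential decay that the summary highlights.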
This fits into a cluster of recent work on the site all attacking the same cost problem from different angles. AdaSplash-2 (covered April 16) approaches it by making sparse attention faster through histogram-based normalization, staying inside the Transformer paradigm. K-Token Merging, also from April 16, compresses sequences before they ever reach the attention mechanism. Sessa takes a third route: replace the attention mechanism itself with a recurrent structure that still responds to input content. These are not competing papers so much as a map of the design space, and readers tracking long-context efficiency should treat them as a set.
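The sequence-compression route can be sketched too. The toy function below greedily averages adjacent token vectors whose cosine similarity exceeds a threshold, shortening the sequence before it reaches attention; this is only a schematic stand-in for the general idea, not the actual K-Token Merging algorithm.

```python
import numpy as np

def merge_adjacent_tokens(tokens, sim_threshold=0.9):
    """Greedily merge neighboring token vectors that are nearly
    parallel, so attention later runs over a shorter sequence.
    Illustrative sketch only; K-Token Merging itself may differ.
    """
    merged = [tokens[0]]
    for t in tokens[1:]:
        prev = merged[-1]
        cos = prev @ t / (np.linalg.norm(prev) * np.linalg.norm(t) + 1e-9)
        if cos > sim_threshold:
            merged[-1] = (prev + t) / 2.0  # collapse redundant neighbors
        else:
            merged.append(t)
    return np.stack(merged)
```

The point of the sketch is the placement of the savings: merging cuts cost before attention, AdaSplash-2-style sparsity cuts it inside attention, and Sessa removes the quadratic attention step altogether.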
The meaningful test is whether Sessa's selective gating holds up on benchmarks that stress retrieval over very long documents, such as SCROLLS or a future RULER variant, rather than the shorter sequences where recurrent models traditionally look competitive. If published evaluations stay below the 32k-token range, the 'long-sequence' claim deserves scrutiny.
Coverage we drew on
- AdaSplash-2: Faster Differentiable Sparse Attention · arXiv cs.CL
This analysis is generated by Modelwire's editorial layer from our archive and the summary above. It is not a substitute for the original reporting.