Lngram: N-gram Conditional Memory in Latent Space

Researchers introduce Lngram, a memory architecture that decouples retrieval from transformer computation by learning discrete symbols in latent space rather than relying on tokenizer IDs. The approach addresses a fundamental tension in sequence modeling: balancing compositional reasoning with efficient knowledge lookup. By performing N-gram operations over learned symbols instead of text tokens, Lngram gains modality independence and shows consistent perplexity improvements in long-context settings. The technique also enables post-hoc injection of domain knowledge into existing pretrained models, suggesting a practical pathway for augmenting deployed systems without full retraining.
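To make the idea concrete, here is a minimal sketch of N-gram lookup over learned symbols rather than tokenizer IDs. It is an illustration under stated assumptions, not the paper's implementation: the summary does not say how Lngram learns its symbols or stores its memory, so the nearest-neighbor codebook, the dictionary-backed table, and every name below (`quantize`, `NgramMemory`, `read`, `write`) are hypothetical stand-ins.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def nearest_symbol(latent: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to `latent` (squared L2)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(latent, codebook[i]))


def quantize(latents: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map each latent vector to a discrete symbol ID (assumed quantizer)."""
    return [nearest_symbol(h, codebook) for h in latents]


class NgramMemory:
    """Hypothetical table keyed by N-grams of symbol IDs, not text tokens."""

    def __init__(self, n: int, dim: int) -> None:
        self.n = n
        self.dim = dim
        self.table: Dict[Tuple[int, ...], List[float]] = {}

    def _key(self, symbols: Sequence[int], position: int) -> Tuple[int, ...]:
        # The N-gram ending at `position`, inclusive.
        return tuple(symbols[position - self.n + 1 : position + 1])

    def write(self, symbols: Sequence[int], position: int, value: Vector) -> None:
        if position < self.n - 1:
            raise ValueError("not enough context for a full N-gram")
        self.table[self._key(symbols, position)] = list(value)

    def read(self, symbols: Sequence[int], position: int) -> List[float]:
        # Unknown N-grams and too-short contexts return a zero vector.
        if position < self.n - 1:
            return [0.0] * self.dim
        return self.table.get(self._key(symbols, position), [0.0] * self.dim)


# Toy data: the latents could come from any encoder (text, image, audio).
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
latents = [(0.1, -0.1), (0.9, 1.2), (0.1, 0.8), (0.95, 0.9)]
symbols = quantize(latents, codebook)

memory = NgramMemory(n=2, dim=2)
memory.write(symbols, 1, [0.5, -0.5])  # store a value under the bigram at position 1
hit = memory.read(symbols, 1)          # same bigram, so the stored value comes back
miss = memory.read(symbols, 2)         # unseen bigram, so a zero vector comes back
```

Because the keys are symbol IDs derived from latent vectors, nothing in the lookup depends on a text tokenizer, which is the property the summary calls modality independence. Adding a table entry after the fact leaves the codebook and everything else untouched, a loose analogue of the post-hoc injection described above.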
Modelwire context
Explainer: The detail worth sitting with is modality independence: because Lngram operates on learned discrete symbols rather than text tokens, the same memory mechanism could in principle attach to image or audio encoders, not just language models. That's a quieter claim than the perplexity numbers, but potentially the more durable one.
We have no prior coverage in our archive to anchor this to. It belongs to a cluster of research exploring alternatives to pure in-context memory, sitting alongside work on retrieval-augmented generation and external memory stores. The specific contribution here is moving the retrieval boundary earlier in the pipeline, into latent space, rather than bolting a vector database onto a finished model. The post-hoc injection angle is what makes it practically interesting: it suggests a path for updating deployed models on domain-specific knowledge without the cost of continued pretraining.
The real test is whether the perplexity gains hold when Lngram is grafted onto a publicly auditable base model at scale, say 7B parameters or larger, with results reported on a held-out benchmark like SCROLLS or ZeroSCROLLS. If a third-party replication surfaces within six months and confirms the long-context improvements, the post-hoc injection claim becomes worth taking seriously.
This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.
Mentions: Lngram · Transformer · Engram
Modelwire Editorial
Modelwire summarizes, we don’t republish. The full content lives on arxiv.org.