Dense Contexts Are Hard Contexts: Lexical Density Limits Effective Context in LLMs

Researchers have identified lexical density, the rate at which context introduces novel information, as a critical but overlooked constraint on LLM long-context performance. Testing models from 9B to 685B parameters on controlled benchmarks, the team found that information-dense contexts cause sharp performance collapse even when token length and needle position remain constant. Models that achieved near-perfect retrieval in sparse contexts dropped below 60% accuracy on denser variants. The finding reframes the long-context problem as more than a matter of input length and position: how tightly information is packed limits the effective context window, regardless of architectural claims.
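To make "rate of novel information" concrete, here is a minimal sketch of one rough proxy: the share of distinct words among all words in a passage (a type-token ratio). This is an illustrative assumption, not the metric the paper uses, and the function name `novelty_rate` is hypothetical.

```python
import re


def novelty_rate(text: str) -> float:
    """Share of distinct word tokens in the text (type-token ratio).

    Assumption: a crude stand-in for lexical density, NOT the paper's
    definition. Tokens are lowercase runs of word characters.
    """
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Repetitive text introduces few new words per token.
sparse = "the cat sat on the mat " * 4
# Every word here is new, so the rate is at its maximum.
dense = "quarterly revenue rose while litigation reserves fell sharply"

print(novelty_rate(sparse) < novelty_rate(dense))
```

A ratio like this depends on passage length, so it is only meaningful when comparing texts of similar size, which matches the article's point about holding token length constant.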

Modelwire context

Explainer

The finding isn't just that dense contexts are harder; it's that current benchmarks systematically miss this failure mode by testing retrieval on sparse, low-novelty text. A model can pass published long-context evals while being practically useless on the dense technical or legal documents that represent real enterprise workloads.

This lands directly on top of our coverage of MiniMax M3 (early June), where the headline claim was a million-token context window as a competitive differentiator. That framing now looks incomplete: raw token capacity and lexical density are orthogonal constraints, and a million-token window filled with dense content may perform worse than a shorter window on sparse text. The density finding also connects to the sycophancy decomposition paper from June 4, which showed that scaling alone doesn't resolve failure modes when the underlying stress condition changes. Both papers push against the same assumption: that larger or longer automatically means more capable.

Watch whether MiniMax or any other long-context model vendor publishes benchmark results that explicitly control for lexical density in the next few months. If none do, that absence is itself informative about how seriously the field is taking this constraint.
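As a rough illustration of what "explicitly controlling for lexical density" could look like, the sketch below builds retrieval contexts that share the same length and needle position and differ only in how many distinct filler words they contain. Everything here is a hypothetical setup: the function `build_context`, the synthetic `w0`, `w1`, ... filler vocabulary, and the use of words as a stand-in for tokens are assumptions, not the benchmark from the paper.

```python
import random


def build_context(n_words, needle, needle_index, distinct_fillers, seed=0):
    """Build a word list of fixed length with the needle at a fixed index.

    Hypothetical harness: filler is drawn from `distinct_fillers` synthetic
    words, so a larger value means more novel words per unit of length.
    """
    if not 0 <= needle_index < n_words:
        raise ValueError("needle_index must fall inside the context")
    if distinct_fillers < 1:
        raise ValueError("need at least one distinct filler word")
    vocab = [f"w{i}" for i in range(distinct_fillers)]
    # Cycle through the vocabulary so every filler word is used when
    # there is room, then shuffle deterministically.
    words = [vocab[i % distinct_fillers] for i in range(n_words)]
    random.Random(seed).shuffle(words)
    words[needle_index] = needle
    return words


# Same length, same needle position, different density.
sparse_ctx = build_context(100, "NEEDLE", 50, distinct_fillers=5)
dense_ctx = build_context(100, "NEEDLE", 50, distinct_fillers=90)

print(len(sparse_ctx), len(set(sparse_ctx)))
print(len(dense_ctx), len(set(dense_ctx)))
```

Holding length and position fixed in this way means any accuracy gap between the two variants can be attributed to density rather than to the factors existing long-context evals already vary.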

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: LLMs · Open-weight models

Read the full story at arXiv cs.CL (arxiv.org)

Modelwire Editorial

This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.
