Continuous Diffusion Scales Competitively with Discrete Diffusion for Language

Researchers have closed a long-standing gap in diffusion-based language modeling by demonstrating that continuous diffusion can match discrete approaches at scale. RePlaid, an updated continuous diffusion model, achieves competitive perplexity on OpenWebText with a compute overhead of only 20x over autoregressive baselines, challenging the field's assumption that discrete methods are inherently superior. The finding validates an architectural path previously dismissed as unscalable and may open new directions for non-autoregressive language model development.
Modelwire context
Explainer
The headline result buries an important qualifier: a 20x compute overhead over autoregressive baselines is described as competitive, but that framing depends heavily on what you're optimizing for. Researchers are essentially arguing the gap is now small enough to justify continued investment, not that continuous diffusion is ready for deployment at scale.
This story sits in a cluster of architecture-level research that has been building quietly alongside the benchmark and agent work dominating recent coverage. The proxy metrics paper from the same day ('Forecasting Downstream Performance of LLMs With Proxy Metrics') is the closest adjacent thread: both papers are fundamentally about making non-standard training approaches legible and comparable to mainstream baselines, which is a precondition for the field taking either direction seriously. Outside of that connection, this is largely disconnected from the agentic and safety-focused stories that have dominated recent Modelwire coverage.
Watch whether RePlaid's authors or any discrete diffusion group (MDLM in particular) publish a direct comparison on a held-out benchmark such as LAMBADA or the Pile within the next six months. If continuous diffusion holds parity there, the architectural debate shifts from theoretical to practical.
This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.
Mentions: RePlaid · Plaid · Duo · MDLM · OpenWebText
Modelwire Editorial
This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.
Modelwire summarizes, we don’t republish. The full content lives on arxiv.org.