Impact of large language models on peer review opinions from a fine-grained perspective: Evidence from top conference proceedings in AI

Researchers analyzed how LLMs have shifted peer review practices at top AI conferences, examining changes in review language, evaluation priorities, and recommendation patterns since these models emerged. The study quantifies whether LLMs are reshaping academic gatekeeping beyond surface-level writing style.
Modelwire context
The study's value isn't in confirming that LLMs affect writing style (that's been assumed for two years) but in testing whether they're changing what reviewers actually reward: the evaluative criteria, the recommendation thresholds, the things that get papers accepted or rejected at the venues that set the field's agenda.
This connects directly to Modelwire's recent coverage of LLM judge reliability. The 'Diagnosing LLM Judge Reliability' piece from April 16 found that even when aggregate consistency looks high, a substantial fraction of individual judgments are logically inconsistent. If reviewers are now using LLMs to help form or articulate opinions, those same failure modes propagate into peer review. The 'Context Over Content' story from the same day adds another layer: LLM judges demonstrably shift their verdicts based on stakes framing, which is exactly the kind of bias that would be hard to detect in aggregate conference-level data.
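To make that failure mode concrete, here is a minimal toy sketch (our own illustration, not code or data from the paper or the cited Modelwire pieces) of how a pairwise LLM judge can score respectably on aggregate accuracy while a large share of its individual verdicts are self-contradictory, simply because it flips when the same pair is presented in the opposite order. All parameters and names are assumptions chosen for illustration.

```python
"""Toy illustration: aggregate accuracy can coexist with per-pair logical
inconsistency in an LLM-as-judge setup. Purely synthetic; not derived from
the study or the Modelwire reliability coverage."""
import random

random.seed(0)

N_PAIRS = 2000
P_CORRECT = 0.90      # assumed chance the judge recognizes the truly better item
P_FIRST_BIAS = 0.10   # assumed chance the judge just picks whatever is listed first

def judge(first, second, true_winner):
    """Return the item the toy judge prefers for this presentation order."""
    if random.random() < P_FIRST_BIAS:
        return first                                    # position bias
    if random.random() < P_CORRECT:
        return true_winner                              # usually correct
    return second if true_winner == first else first    # residual error

correct = 0
inconsistent = 0
for i in range(N_PAIRS):
    a, b = f"paper_{i}_A", f"paper_{i}_B"
    true_winner = a                       # fix A as the genuinely better paper
    v1 = judge(a, b, true_winner)         # pair shown as (A, B)
    v2 = judge(b, a, true_winner)         # same pair shown as (B, A)
    correct += (v1 == true_winner) + (v2 == true_winner)
    inconsistent += (v1 != v2)            # same pair, contradictory verdicts

print(f"aggregate accuracy:       {correct / (2 * N_PAIRS):.1%}")
print(f"self-contradictory pairs: {inconsistent / N_PAIRS:.1%}")
```

With these assumed parameters the judge lands around 85% aggregate accuracy while roughly a quarter of pairs get contradictory verdicts depending on presentation order, which is the shape of the gap the reliability coverage describes.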
Watch whether the authors release venue-level breakdowns that identify which conferences show the strongest signal. If the effect concentrates in venues that adopted LLM-assisted review tooling earliest, that's evidence of a direct mechanism rather than a diffuse cultural shift.
Coverage we drew on
This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.
Mentions: Large Language Models · arXiv
Modelwire summarizes — we don’t republish. The full article lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.