Large Language Models Are Overconfident in Their Own Responses

A new study reveals that instruction-tuned conversational LLMs suffer from systematic overconfidence, driven by both post-training procedures and chat templates that introduce an 'ownership bias' where models trust their own outputs more than identical user-provided text. Testing across six open-weight models and multiple benchmarks exposes a calibration gap that grows beyond base model miscalibration, suggesting deployment risks for applications relying on model confidence scores for uncertainty quantification or safety filtering.

Modelwire context

Explainer

The key finding isn't just that LLMs are overconfident in general, which has been documented before, but that the chat template itself is a structural contributor: the formatting wrapper that marks text as model-generated appears to inflate confidence scores independently of the underlying content, meaning the same factual claim gets trusted differently depending on who the model thinks produced it.

This connects directly to two threads running through recent coverage. The reranker self-assessment paper from June 2nd ('Can LLM Rerankers Predict Their Own Ranking Performance?') reached a cautiously optimistic conclusion that self-consistency signals could substitute for external evaluation, but the ownership bias finding here complicates that picture: if models systematically favor their own outputs, self-consistency checks may be measuring confidence inflation rather than genuine reliability. More urgently, the eating disorder safety paper from June 1st and the harm amplification work from the same day both assume that model confidence or refusal behavior reflects calibrated judgment. If that calibration is structurally distorted by post-training and chat formatting, the safety filtering those systems depend on is operating on a flawed signal.

Watch whether any of the six tested open-weight model families release updated chat templates or post-training recipes that explicitly target calibration in the next two release cycles. If ownership bias persists across template revisions, it points to a post-training data problem rather than a formatting artifact, which is a harder fix.

Coverage we drew on

Can LLM Rerankers Predict Their Own Ranking Performance? · arXiv cs.CL

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting. How we write it.

MentionsLarge Language Models · instruction-tuned LLMs · chat templates · ownership bias

Read full story at arXiv cs.CL →(arxiv.org)

Modelwire Editorial

This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.

Our mission How we write

Modelwire summarizes, we don’t republish. The full content lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.