Video · Models & Releases · Research · Latent Space · 6d ago

Top Black Holes Physicist: GPT5 can do Vibe Physics, here's what I found

A theoretical physicist who won the 2024 New Horizons in Physics Prize reports that GPT-5 reproduced one of his most complex papers in 30 minutes, a task that originally required months of research. This anecdote signals a qualitative shift in frontier model capabilities: while routine tasks show modest gains, researchers operating at the bleeding edge are discovering that capability ceilings have fundamentally expanded. The claim carries weight given the source's credibility in physics, suggesting LLMs are now competitive with domain experts on highly specialized theoretical work.

Modelwire context

Skeptical read

The story's credibility rests entirely on the source's prestige, but prestige is not methodology. We don't know which paper was reproduced, whether GPT-5's output was actually correct or merely convincing to a domain expert under time pressure, or whether the physicist tested for hallucination systematically.

This sits in direct tension with the AutoMat benchmark covered May 1st, which found that LLM-based agents fail specifically at reproducing underspecified scientific procedures and validating whether computed results actually support original claims. That work used controlled evaluation; this story uses none. The ARC-AGI-3 analysis from The Decoder on May 2nd adds further friction: frontier models still exhibit systematic reasoning failures on tasks humans solve intuitively, which makes a clean 30-minute theoretical physics reproduction harder to accept at face value without seeing the output.

If Lupsasca or Latent Space publishes the actual GPT-5 output alongside the original paper for independent review within the next 60 days, that would substantially strengthen the claim. Without that artifact, this remains an impressive anecdote rather than evidence of a capability ceiling shift.

Coverage we drew on

Can Coding Agents Reproduce Findings in Computational Materials Science? · arXiv cs.CL

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: GPT-5 · Alex Lupsasca · OpenAI · Latent Space

Read full story at Latent Space (youtube.com)
