
Can LLMs Act as Historians? Evaluating Historical Research Capabilities of LLMs via the Chinese Imperial Examination
Researchers have constructed ProHist-Bench, a rigorous evaluation framework that tests whether LLMs can perform genuine historical scholarship rather than surface-level fact retrieval. Grounded in the Chinese Imperial Examination system and spanning 1,300 years of East Asian history, the benchmark comprises 400 expert-vetted questions designed to probe evidentiary reasoning and interpretive depth. The work exposes a critical gap in existing LLM evaluation: most benchmarks measure breadth of knowledge, not the inferential and contextual reasoning that professional historians demand. The finding matters because it clarifies what current models cannot yet do, which shapes expectations for AI in knowledge work and informs future training priorities.