Research · arXiv cs.CL · May 20

Quantifying the cross-linguistic effects of syncretism on agreement attraction

Researchers used LLM-derived metrics (surprisal and attention entropy) to explain why morphological syncretism amplifies agreement attraction errors in some languages but not others. The work validates that large language models capture real psycholinguistic phenomena across typologically diverse languages, offering a scalable method for testing linguistic hypotheses without behavioral experiments. The result positions LLMs as tools for cross-linguistic cognitive science, with implications for understanding how model internals map to human language processing and for building more robust multilingual systems.
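The summary names the two metrics but does not say how the paper computes them. As a rough sketch under the standard information-theoretic definitions (an assumption, not the authors' stated method), surprisal is the negative log probability a model assigns to a token, and attention entropy is the Shannon entropy of one attention distribution. The function names and all numbers below are hypothetical and chosen only for illustration.

```python
import math


def surprisal(prob: float) -> float:
    """Surprisal, in bits, of a token the model assigned probability `prob`."""
    return -math.log2(prob)


def attention_entropy(weights: list[float]) -> float:
    """Shannon entropy, in bits, of one attention distribution.

    Assumes `weights` are non-negative and sum to 1. Zero weights
    contribute nothing to the sum and are skipped.
    """
    return -sum(w * math.log2(w) for w in weights if w > 0)


# Hypothetical values, not taken from the paper.
p_verb = 0.25                        # model probability of the observed verb form
focused = [0.97, 0.01, 0.01, 0.01]   # attention concentrated on one token
diffuse = [0.25, 0.25, 0.25, 0.25]   # attention spread evenly over four tokens

print(surprisal(p_verb))             # 2.0
print(attention_entropy(diffuse))    # 2.0
print(attention_entropy(focused) < attention_entropy(diffuse))  # True
```

Under these definitions, higher surprisal at the verb means the model found that form less expected, and higher attention entropy means attention is spread across more candidate tokens rather than fixed on one.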

Modelwire context

Explainer

The paper's real contribution isn't that LLMs capture psycholinguistic effects (that's known), but that morphological syncretism acts as a measurable amplifier of those effects. The work quantifies a specific linguistic property that predicts when agreement errors will spike, offering a testable hypothesis about language structure itself, not just model behavior.

This sits alongside the May 20 work on token-level credit assignment (DelTA). Both papers treat LLM internals as interpretable signals rather than black boxes. Where DelTA reveals that reward signals get dominated by high-frequency tokens, this work shows that morphological properties (syncretism) systematically distort attention patterns across languages. Together they suggest a pattern: LLMs don't just approximate human cognition, they expose structural properties of language that shape both human and model processing in measurable, comparable ways. This is distinct from the grammar adaptation work (also May 20), which uses LLMs as tools for engineering tasks rather than as windows into linguistic structure.

If follow-up work validates these syncretism predictions on a held-out language family (e.g., Finno-Ugric or Austronesian, neither of which is represented among the original five languages), that would indicate the effect generalizes beyond the tested typological sample. If instead the pattern breaks down on new language families, it would suggest the findings are specific to Indo-European and Turkic morphosyntax rather than universal principles of language processing.

Coverage we drew on

DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards · arXiv cs.CL

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: Large Language Models · English · German · Russian · Turkish · Armenian

Modelwire Editorial

This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.
