Do LLMs Capture Embodied Cognition and Cultural Variation? Cross-Linguistic Evidence from Demonstratives

Researchers have identified a critical gap in how LLMs process spatial language and cultural reasoning. Comparing model behavior on demonstratives (words like 'this' and 'that') against the behavior of English and Chinese speakers, they found that five leading models fail to distinguish proximal from distal spatial relationships. The models also show no cultural variation in perspective-taking and default to English-centric patterns instead. The work shows how text-only training leaves models blind to embodied cognition and cultural conventions. It suggests that current pretraining approaches cannot capture the grounded spatial reasoning humans acquire through physical experience. The finding matters for anyone building multilingual systems or relying on LLMs for cross-cultural reasoning.

Modelwire context

Explainer

The study's sharpest contribution isn't just that models fail at spatial language; it's that they fail uniformly across architectures and language families. That uniformity suggests the failure is a structural consequence of text-only pretraining rather than a fixable quirk in any single model's training mix.

This connects directly to the linguistic bias work we covered the same week ('An Investigation of Linguistic Biases in LLM-Based Recommendations'), which showed that models produce measurably different outputs depending on dialect even when the underlying data is identical. Together, the two papers sketch a consistent picture: LLMs don't just perform worse on non-English inputs; they impose English-centric cognitive frames on tasks where cultural or embodied conventions should vary. The demonstratives finding goes a layer deeper, though, because it implicates not only the distribution of training data but also the absence of the grounded sensorimotor signal that spatial language requires.

Watch whether any of the five tested models (GPT, Claude, Gemini, Llama, Qwen) release multimodal or embodied training updates in the next 12 months and whether follow-up benchmarks on spatial demonstratives show measurable divergence from the English-centric baseline reported here.

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: LLMs · GPT · Claude · Gemini · Llama · Qwen

Read the full story at arXiv cs.CL (arxiv.org).

Modelwire Editorial

This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.


Modelwire summarizes; we don’t republish. The full content lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.