Research Tools & Code · arXiv cs.CL · May 21

Chinese sensorimotor and embodiment norms for 3,000 lexicalized concepts

Researchers have released a large-scale dataset of sensorimotor and embodiment ratings for 3,000 Mandarin Chinese concepts, addressing a gap in non-Indo-European language resources for embodied AI research. The dataset, collected from 378 native speakers and annotated along 11 sensorimotor dimensions, supports empirical study of how conceptual knowledge is grounded in bodily experience and whether AI systems can acquire such grounding without direct sensorimotor interaction. It also gives researchers a resource for training and evaluating multilingual models of embodied semantics, which matters more as embodied cognition gains weight in the design of human-aligned AI architectures.

Modelwire context

Explainer

The dataset's 11-dimensional sensorimotor schema (visual, tactile, auditory, proprioceptive, etc.) is the core contribution; most prior embodiment work relied on two or three dimensions or on English-only corpora. The paper tests whether this grounding transfers across languages or whether embodied semantics are culturally contingent.

This connects directly to the moral semantics translation work from May 21, which showed that semantic meaning survives machine translation across languages. Here, the question inverts: do sensorimotor associations (how a concept feels, sounds, looks) also survive translation, or does embodied cognition require language-specific annotation? The Mandarin dataset lets researchers test whether embodied AI trained on English sensorimotor data generalizes to Chinese, or whether each language needs its own grounding layer. If embodiment is universal, this becomes a reusable resource; if it's language-specific, it signals that multilingual embodied AI requires parallel annotation efforts, not just translation pipelines.
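One way to picture the comparison described above is a per-dimension correlation between ratings for translation-equivalent concepts. The sketch below is illustrative only: the three dimension names, the translation pairs, the 0-5 scale, and every rating value are invented placeholders, not data from the paper, and the dataset's actual file format is not described in this summary.

```python
from math import sqrt

# Hypothetical subset of dimensions; the dataset itself uses 11.
DIMENSIONS = ["visual", "auditory", "tactile"]

# Invented ratings on an assumed 0-5 scale (placeholders, not real norms).
english = {
    "thunder": {"visual": 1.5, "auditory": 4.8, "tactile": 0.6},
    "silk":    {"visual": 3.2, "auditory": 0.3, "tactile": 4.4},
    "sunset":  {"visual": 4.9, "auditory": 0.2, "tactile": 0.1},
    "stone":   {"visual": 3.8, "auditory": 0.9, "tactile": 3.9},
}
mandarin = {
    "雷":   {"visual": 2.0, "auditory": 4.7, "tactile": 0.4},
    "丝绸": {"visual": 3.4, "auditory": 0.3, "tactile": 4.5},
    "日落": {"visual": 4.8, "auditory": 0.3, "tactile": 0.2},
    "石头": {"visual": 3.6, "auditory": 0.9, "tactile": 4.1},
}

# Assumed translation pairs (English concept, Mandarin concept).
PAIRS = [("thunder", "雷"), ("silk", "丝绸"), ("sunset", "日落"), ("stone", "石头")]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / sqrt(var_x * var_y)


def per_dimension_agreement(source, target, pairs, dimensions):
    """Correlate source and target ratings across pairs, one dimension at a time."""
    agreement = {}
    for dim in dimensions:
        xs = [source[src][dim] for src, _ in pairs]
        ys = [target[tgt][dim] for _, tgt in pairs]
        agreement[dim] = pearson(xs, ys)
    return agreement


agreement = per_dimension_agreement(english, mandarin, PAIRS, DIMENSIONS)
for dim, r in agreement.items():
    print(f"{dim}: r = {r:.2f}")
```

Under this framing, high correlations on every dimension would point toward shared grounding, while a dimension that correlates poorly would flag where language-specific annotation might be needed.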

If researchers publish cross-lingual transfer experiments within six months showing that English-trained embodiment models exceed 70% accuracy on Mandarin concepts without retraining, that would support the view that embodied semantics are largely language-agnostic. If accuracy falls below 50%, it would suggest that conceptual grounding is culturally embedded and that multilingual embodied AI will require per-language annotation at scale.
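The 70% and 50% thresholds above can be read as a simple decision rule, sketched below. The summary does not say how accuracy would be measured, so this sketch assumes one possible definition: the share of concepts whose highest-rated dimension matches between model predictions and human ratings. All function names and rating values are hypothetical.

```python
def dominant_dimension(ratings):
    """Return the dimension with the highest rating for one concept."""
    return max(ratings, key=ratings.get)


def transfer_accuracy(predicted, gold):
    """Share of shared concepts whose dominant dimension matches (assumed metric)."""
    shared = predicted.keys() & gold.keys()
    if not shared:
        raise ValueError("no concepts in common")
    hits = sum(
        dominant_dimension(predicted[c]) == dominant_dimension(gold[c])
        for c in shared
    )
    return hits / len(shared)


def interpret(accuracy, high=0.70, low=0.50):
    """Map an accuracy score onto the article's two thresholds."""
    if accuracy > high:
        return "largely language-agnostic"
    if accuracy < low:
        return "language-specific"
    return "inconclusive"  # the article does not address the 50-70% band


# Invented outputs of an English-trained model on Mandarin concepts.
predicted = {
    "雷":   {"visual": 1.2, "auditory": 4.6, "tactile": 0.5},
    "丝绸": {"visual": 3.1, "auditory": 0.4, "tactile": 4.2},
    "日落": {"visual": 4.9, "auditory": 0.2, "tactile": 0.1},
    "石头": {"visual": 3.9, "auditory": 0.8, "tactile": 3.5},
}
# Invented human ratings for the same concepts.
gold = {
    "雷":   {"visual": 2.0, "auditory": 4.7, "tactile": 0.4},
    "丝绸": {"visual": 3.4, "auditory": 0.3, "tactile": 4.5},
    "日落": {"visual": 4.8, "auditory": 0.3, "tactile": 0.2},
    "石头": {"visual": 3.6, "auditory": 0.9, "tactile": 4.1},
}

accuracy = transfer_accuracy(predicted, gold)
print(f"{accuracy:.2f} -> {interpret(accuracy)}")
```

Results between the two thresholds fall into neither of the article's scenarios, which is why the sketch returns an explicit "inconclusive" label for that band.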

Coverage we drew on

Moral Semantics Survive Machine Translation: Cross-Lingual Evidence from Moral Foundations Corpora · arXiv cs.CL

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: Mandarin Chinese · embodied artificial intelligence

Read full story at arXiv cs.CL → (arxiv.org)
