SafetyALFRED: Evaluating Safety-Conscious Planning of Multimodal Large Language Models

Researchers benchmarked eleven multimodal LLMs from the Qwen, Gemma, and Gemini families on embodied safety planning in kitchen environments, finding that models recognize hazards well in direct question-answering but fail to mitigate those same risks when acting as autonomous agents.
Modelwire context
Explainer
The critical finding isn't that models fail at safety — it's the specific shape of the failure: models can correctly identify a dangerous situation when asked about it directly, yet still take unsafe actions when operating autonomously. That dissociation suggests the problem isn't knowledge, it's the translation of knowledge into constrained sequential planning.
This connects directly to the 'Safe Continual Reinforcement Learning in Non-stationary Environments' paper from the same day, which targets physical control systems where transient safety violations during learning are unacceptable. SafetyALFRED essentially documents, at the LLM layer, the exact failure mode that work is trying to solve at the RL layer: a system that understands constraints but doesn't reliably enforce them during execution. The 'Generalization in LLM Problem Solving' paper from April 16 adds relevant texture here — models showed strong pattern recognition but broke down at longer planning horizons, which mirrors what SafetyALFRED finds when agents must chain multiple actions safely rather than answer a single question.
Watch whether any of the three model families tested — Qwen, Gemma, or Gemini — incorporate SafetyALFRED as an evaluation target in a future release checkpoint. If one does and reports improved agent scores without regression on the Q&A component, that would be meaningful evidence the gap is closable through fine-tuning rather than architectural change.
Coverage we drew on
This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.
Mentions: Qwen · Gemma · Gemini · SafetyALFRED · ALFRED
Modelwire summarizes — we don’t republish. The full article lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.