
Exploring Adversarial Robustness and Safety Alignment in Multilingual Multi-Modal Large Language Models
Researchers have uncovered a critical vulnerability in multilingual multimodal LLMs: adversarial images crafted to fool models in one language transfer effectively across other languages, exposing a systemic gap in cross-lingual safety. This finding challenges the assumption that safety alignment generalizes uniformly across languages and suggests that current instruction-tuning approaches leave models exposed to coordinated attacks that exploit language boundaries. For practitioners deploying MLLMs globally, the work signals that robustness testing must span linguistic diversity, not just English benchmarks.62

























