
LITMUS: Benchmarking Behavioral Jailbreaks of LLM Agents in Real OS Environments
Researchers have introduced LITMUS, a benchmark that exposes a critical vulnerability class in deployed LLM agents: behavioral jailbreaks that trigger irreversible OS-level operations rather than just unsafe text outputs. The work bridges a gap in existing safety evaluation by combining semantic and physical-layer verification with stateful OS rollback, enabling reproducible testing of 819 high-risk scenarios. This matters because autonomous agents increasingly operate with real system permissions, making traditional content-safety benchmarks insufficient. The dual-layer approach signals a maturation in how the field measures agent safety beyond language harms, directly informing deployment guardrails for production systems.68



























