SEP-Attack: A Simple and Effective Paradigm for Transfer-Based Textual Adversarial Attack
Researchers have developed SEP-Attack, a method for testing the adversarial robustness of language models that weights an ensemble of surrogate models using Determinantal Point Processes (DPPs), with the aim of better estimating which surrogates transfer attacks most effectively. This addresses a gap in transfer-based attack research, where prior work either treated all submodels equally or relied on unreliable importance scoring. The technique matters because understanding how adversarial examples transfer across models is essential for building defenses and for evaluating the real-world vulnerability of deployed systems that attackers cannot probe directly.
Modelwire context
Explainer
The key insight is that SEP-Attack treats surrogate model selection as a diversity problem, not just an importance problem. By using Determinantal Point Processes to weight ensemble members, it avoids redundancy in the models used to craft attacks, which prior work missed by treating all submodels as equally informative.
This connects directly to the SELECT-LLM work from earlier this week, which also tackled model selection under constraints by measuring information gain from model similarities. Both papers recognize that when you have multiple candidate models, naive averaging wastes signal. SEP-Attack applies that insight to the adversarial robustness domain: instead of asking which model is most important for benchmarking (SELECT-LLM's question), it asks which subset of models will generate the most diverse transferable attacks. The underlying principle is identical: use model relationships to reduce redundancy and improve decision-making.
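To make the diversity idea concrete, here is a minimal sketch of how a DPP-style kernel can favor a non-redundant subset of surrogate models. It is not SEP-Attack's actual weighting scheme, which the summary above does not specify. It assumes each surrogate is described by a hypothetical feature vector (`features`) and a hypothetical quality score (`quality`), builds the standard kernel L_ij = q_i * sim(i, j) * q_j, and greedily picks the subset with the largest determinant, a common approximation to DPP MAP inference. All names and numbers are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0  # numerically singular: the subset is redundant
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def build_kernel(features, quality):
    """L_ij = q_i * cosine(f_i, f_j) * q_j (quality times similarity)."""
    n = len(features)
    return [
        [quality[i] * cosine(features[i], features[j]) * quality[j]
         for j in range(n)]
        for i in range(n)
    ]


def greedy_dpp_select(kernel, k):
    """Greedily add the item that maximizes det of the selected submatrix."""
    selected = []
    remaining = list(range(len(kernel)))
    for _ in range(k):
        def score(i):
            idx = selected + [i]
            return determinant([[kernel[a][b] for b in idx] for a in idx])
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical surrogates: model 1 is a near-duplicate of model 0.
features = [
    [1.0, 0.0, 0.0],
    [0.98, 0.2, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.1, 1.0],
]
quality = [1.0, 0.95, 0.8, 0.7]  # assumed per-model importance scores

kernel = build_kernel(features, quality)
selected = greedy_dpp_select(kernel, 3)
print(selected)  # [0, 2, 3]
```

Model 1 has the second-highest quality score but is skipped, because it adds almost nothing that model 0 does not already cover. Ranking by importance alone would have kept both.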
If SEP-Attack's attack success rates hold when tested against models that were explicitly trained for adversarial robustness (for example, adversarially fine-tuned classifiers), that would suggest the method captures genuine transferability rather than exploiting artifacts in standard model weights. If success rates drop significantly, the technique may be overfitting to undefended models.
Coverage we drew on
- Large Language Model Selection with Limited Annotations · arXiv cs.CL
This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.
Mentions: SEP-Attack · Determinantal Point Process
Modelwire Editorial
Modelwire summarizes, we don’t republish. The full content lives on arxiv.org.