
Protection Is (Nearly) All You Need: Structural Protection Dominates Scoring in Globally Capped KV Eviction
Researchers have identified a critical vulnerability in KV cache eviction policies used across major language models: all seven tested strategies (LRU, H2O, SnapKV, StreamingLLM, Ada-KV, QUEST, Random) fail catastrophically at prompt boundaries without explicit structural protection. By reserving just 10% of cache capacity at these boundaries, quality recovers from near-total collapse to 69-90% of full-cache performance on long-context benchmarks. Analysis of attention patterns reveals that position-0 tokens concentrate roughly 75% of prefix attention mass, yet standard scoring mechanisms still discard structurally critical boundary tokens. This finding reshapes how production systems should architect KV management for efficient long-context inference.
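To make the idea of reserving part of a global budget more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions and not the authors' implementation. The function name `select_kept_tokens`, its parameters, and the way boundary tokens are supplied (as an explicit list of token indices) are all hypothetical. The scores can come from any eviction policy.

```python
def select_kept_tokens(scores, boundaries, budget, protect_frac=0.10):
    """Return the sorted token indices kept under a global cache budget.

    scores       -- per-token importance scores from any eviction policy
    boundaries   -- token indices treated as structural boundaries
                    (how these are identified is an assumption of this sketch)
    budget       -- total number of tokens the cache may hold
    protect_frac -- share of the budget reserved for boundary tokens
    """
    n = len(scores)
    if budget >= n:
        return list(range(n))  # nothing needs to be evicted

    # Number of slots reserved for boundary tokens. int() rounds down,
    # so a very small budget may reserve zero slots.
    reserved = int(budget * protect_frac)

    # Protect valid boundary indices first, in positional order.
    valid = sorted({i for i in boundaries if 0 <= i < n})
    protected = valid[:reserved]
    protected_set = set(protected)

    # Fill the remaining slots with the highest-scoring unprotected tokens.
    rest = [i for i in range(n) if i not in protected_set]
    rest.sort(key=lambda i: scores[i], reverse=True)
    kept = protected + rest[: budget - len(protected)]
    return sorted(kept)


# Toy example: the boundary tokens (indices 0 and 9) have low scores.
scores = [0.0, 0.9, 0.8, 0.1, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
unprotected = select_kept_tokens(scores, [0, 9], budget=4, protect_frac=0.0)
protected = select_kept_tokens(scores, [0, 9], budget=4, protect_frac=0.5)
```

In this toy example, pure score-based selection drops both boundary tokens, while the protected variant keeps them and fills the remaining slots by score. The 50% reservation is used only so that the small example reserves whole slots; the article describes a 10% reservation.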























