
How Much Cache Does Reasoning Need? Depth-Cache Tradeoffs in KV-Compressed Transformers
Researchers establish theoretical bounds on how much key-value cache compression Transformers can tolerate during multi-step reasoning before performance collapses. The work formalizes a depth-cache tradeoff, suggesting aggressive KV compression requires deeper models to maintain reasoning capability.