
KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference
KV-Fold introduces a training-free method to extend LLM context windows by treating the key-value cache as a functional accumulator across sequence chunks. Rather than retraining or modifying model weights, the technique reuses internal attention state across segments, enabling longer inference without architectural changes. This addresses a persistent bottleneck in production LLM deployment: the computational and memory cost of processing very long documents. For practitioners, the approach offers immediate applicability to existing models, potentially unlocking longer-context capabilities without the expense of fine-tuning or model replacement.
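
To make the "cache as accumulator" framing concrete, here is a minimal sketch of a fold over sequence chunks, written as a toy single-head attention in plain Python. It is not the KV-Fold algorithm itself, whose details this summary does not give. It only illustrates the general idea of carrying key-value state from one chunk to the next. The names `attend`, `fold_step`, `run_chunked`, and `toy_tokens` are hypothetical, and the toy keeps the entire cache, so it says nothing about how KV-Fold bounds memory.

```python
import math
from functools import reduce


def attend(q, keys, values):
    """Softmax attention of one query vector over the cached keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def fold_step(state, chunk):
    """One fold step: extend the carried cache with a chunk, emit its outputs.

    state is (keys, values, outputs); chunk is a list of (q, k, v) triples.
    Assumption: the whole cache is carried forward unchanged (toy setting).
    """
    keys, values, outputs = (list(part) for part in state)
    for q, k, v in chunk:
        keys.append(k)
        values.append(v)
        # Causal attention: the token sees all earlier chunks plus itself.
        outputs.append(attend(q, keys, values))
    return keys, values, outputs


def run_chunked(tokens, chunk_size):
    """Fold the cache across fixed-size chunks and return per-token outputs."""
    chunks = [tokens[i:i + chunk_size]
              for i in range(0, len(tokens), chunk_size)]
    _, _, outputs = reduce(fold_step, chunks, ([], [], []))
    return outputs


def toy_tokens(n, dim=4):
    """Deterministic stand-in (q, k, v) vectors; purely illustrative data."""
    return [tuple([math.sin(3 * i + j + shift) for j in range(dim)]
                  for shift in (0.0, 1.0, 2.0))
            for i in range(n)]
```

In this toy setting, `run_chunked(toy_tokens(7), chunk_size=2)` produces the same per-token outputs as processing all seven tokens as a single chunk, because each token still attends over every earlier key and value. A real model would fold a per-layer, per-head cache in the same pattern.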























