
Self-Policy Distillation via Capability-Selective Subspace Projection
Self-Policy Distillation (SPD) addresses a fundamental bottleneck in LLM self-improvement: existing bootstrapping methods either demand expensive external signals (execution feedback, reward models) that are unavailable for frontier systems, or train indiscriminately on raw outputs, conflating task-relevant skills with stylistic noise and model artifacts. SPD proposes capability-selective filtering that isolates the specific competency being refined, enabling generalizable self-distillation without external oracles. This matters because it could unlock cheaper, more targeted model refinement at scale, particularly for capabilities where ground truth is expensive or unavailable.