
Skill Availability and Presentation Granularity in Large-Language-Model Agents: A Controlled SkillsBench Study
A controlled empirical study quantifies how skill-document granularity affects LLM agent task completion, finding that structured procedural knowledge boosts GPT-5.5 performance by 27-36 percentage points and DeepSeek V4-Flash performance by 18-26 percentage points relative to no-skill baselines. The work isolates an inference-time lever for agent reliability, suggesting that the presentation format of knowledge, not just its availability, shapes downstream success. For teams deploying reasoning-enabled models in production, this signals that skill engineering deserves parity with prompt engineering as a tuning surface.





















