
Hierarchical Behaviour Spaces
Hierarchical Behaviour Spaces reframes how reinforcement learning agents compose learned skills by treating reward functions as basis vectors for a continuous behaviour manifold rather than discrete options. This shift from predefined hierarchies to learned linear combinations expands policy expressiveness and scales to billion-step environments. Testing on NetHack reveals an unexpected finding: hierarchy's gains stem from exploration diversity, not temporal abstraction, challenging foundational assumptions in hierarchical RL and suggesting the field may have overweighted reasoning depth relative to search breadth.
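
To make the "reward functions as basis vectors" idea concrete, here is a minimal sketch of a linear combination of reward signals. It is not the authors' implementation: the function `combine_rewards`, the reward names (`gold`, `depth`, `explore`), and all numeric values are hypothetical, chosen only to contrast a discrete option (a one-hot weight vector) with a point in a continuous behaviour space (an arbitrary weight vector).

```python
from typing import Sequence


def combine_rewards(basis_rewards: Sequence[float], weights: Sequence[float]) -> float:
    """Return a scalar reward as a linear combination of basis reward signals."""
    if len(basis_rewards) != len(weights):
        raise ValueError("one weight per basis reward is required")
    return sum(w * r for w, r in zip(weights, basis_rewards))


# Hypothetical basis rewards observed at one environment step
# (names and values are illustrative assumptions, not taken from the source).
step_rewards = {"gold": 1.0, "depth": 0.0, "explore": 0.5}
basis = list(step_rewards.values())

# A discrete option corresponds to a one-hot weight vector:
# the agent optimises exactly one predefined reward.
only_gold = combine_rewards(basis, [1.0, 0.0, 0.0])

# A continuous behaviour space allows any mixture of the basis rewards.
mixed = combine_rewards(basis, [0.5, 0.0, 2.0])

print(only_gold, mixed)
```

In this framing, the one-hot case recovers a fixed set of options, while letting the weights vary continuously gives the broader family of behaviours the summary describes. How the weights are selected (for example, by a higher-level policy) is left open here, since the summary does not specify it.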




























