Planning in entropy-regularized Markov decision processes and games

Researchers introduce SmoothCruiser, a planning algorithm that solves entropy-regularized MDPs and two-player games with polynomial sample complexity O(1/epsilon^4), addressing a gap where non-regularized settings lack worst-case guarantees.
MentionsSmoothCruiser
Read full story at arXiv cs.LG →(arxiv.org)
Modelwire summarizes — we don’t republish. The full article lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.