On the Convergence of the Policy Iteration for Infinite-Horizon Nonlinear Optimal Control Problems
Abstract
Policy iteration (PI) is a widely used algorithm for synthesizing optimal feedback control policies across many engineering and scientific applications. When PI is deployed on infinite-horizon, nonlinear, autonomous optimal-control problems, however, a number of significant theoretical challenges emerge, particularly when the computational state space is restricted to a bounded domain. In this paper, we investigate these challenges and show that the viability of PI in this setting hinges on the existence, uniqueness, and regularity of solutions to the Generalized Hamilton-Jacobi-Bellman (GHJB) equation solved at each iteration. To ensure a well-posed iterative scheme, the GHJB solution must possess sufficient smoothness, and the domain on which the GHJB equation is solved must remain forward-invariant under the closed-loop dynamics induced by the current policy. Although these aspects are fundamental to the method's convergence, previous studies have largely overlooked them. This paper closes that gap by introducing a constructive procedure that guarantees forward invariance of the computational domain throughout the entire PI sequence and by establishing sufficient conditions under which a suitably regular GHJB solution exists at every iteration. Numerical results are presented for a grid-based implementation of PI to support the theoretical findings.
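The two-step structure underlying PI (evaluate the current policy, then improve it greedily) can be illustrated on a finite-state analogue. The sketch below is a minimal discrete Markov decision process version of the loop, not the paper's continuous-state GHJB scheme; the transition matrix `P`, reward matrix `R`, and discount `gamma` are invented toy data for illustration only.

```python
import numpy as np

# Hypothetical 3-state, 2-action MDP (toy data, not from the paper).
# P[a, s, s'] is the transition probability; R[a, s] is the stage reward.
P = np.array([
    [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 0.9]],
    [[0.1, 0.9, 0.0], [0.0, 0.1, 0.9], [0.9, 0.0, 0.1]],
])
R = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 2.0]])
gamma = 0.95

policy = np.zeros(3, dtype=int)  # initial policy: action 0 in every state
for _ in range(100):
    # Policy evaluation: solve the linear system V = R_pi + gamma * P_pi V.
    # (In the continuous setting of the paper, this step is replaced by
    # solving the GHJB equation for the current policy.)
    P_pi = P[policy, np.arange(3)]
    R_pi = R[policy, np.arange(3)]
    V = np.linalg.solve(np.eye(3) - gamma * P_pi, R_pi)
    # Policy improvement: act greedily with respect to the evaluated value.
    Q = R + gamma * P @ V
    new_policy = Q.argmax(axis=0)
    if np.array_equal(new_policy, policy):
        break  # fixed point reached: policy is optimal for this MDP
    policy = new_policy
```

In the finite case, policy evaluation reduces to a linear solve and convergence is guaranteed in finitely many steps; the paper's contribution concerns the continuous-state counterpart, where each evaluation step requires a sufficiently regular GHJB solution on a forward-invariant domain.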