Breaking the Dimensional Barrier: A Pontryagin-Guided Direct Policy Optimization for Continuous-Time Multi-Asset Portfolios
Abstract
Solving large-scale, continuous-time portfolio optimization problems involving numerous assets and state-dependent dynamics has long been hindered by the curse of dimensionality. Traditional dynamic programming and PDE-based methods, while rigorous, typically become computationally intractable beyond a small number of state variables (often limited to $\sim$3--6 in prior numerical studies). To overcome this barrier, we introduce the \emph{Pontryagin-Guided Direct Policy Optimization} (PG-DPO) framework. PG-DPO leverages Pontryagin's Maximum Principle (PMP) to guide neural-network policies directly via backpropagation-through-time (BPTT), naturally incorporating exogenous state processes without requiring dense state grids. Crucially, our computationally efficient ``Two-Stage'' variant exploits the rapidly stabilizing costate estimates produced by BPTT, converting them into near-optimal closed-form Pontryagin controls after only a short warm-up, which significantly reduces training overhead. The result is a substantial gain in scalability: numerical experiments demonstrate that PG-DPO handles problem dimensions previously considered far out of reach, optimizing portfolios with up to 50 assets and 10 state variables. The framework delivers near-optimal policies, offering a practical and powerful alternative for high-dimensional continuous-time portfolio choice.
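To make the two-stage procedure concrete, the following minimal PyTorch sketch applies the idea to a toy single-asset Merton problem with CRRA utility. The network architecture, parameter values, and step counts are illustrative placeholders, not the configuration used in our experiments; the quantity $(\mu-r)/(\gamma\sigma^2)$ printed at the end is the classical Merton ratio, included only as a sanity check.
\begin{verbatim}
# Minimal PG-DPO sketch: toy single-asset Merton problem (illustrative only).
import torch
import torch.nn as nn

torch.manual_seed(0)
mu, r, sigma, gamma = 0.08, 0.02, 0.20, 3.0  # drift, rate, vol, CRRA coeff.
T, N, n_paths = 1.0, 50, 4096                # horizon, time steps, paths
dt = T / N

# Policy network: maps (t, W) to the risky-asset fraction pi.
policy = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def terminal_wealth(w0):
    """Euler-Maruyama simulation of the controlled wealth SDE."""
    W = w0 * torch.ones(n_paths, 1)
    for n in range(N):
        t = torch.full_like(W, n * dt)
        pi = policy(torch.cat([t, W], dim=1))
        dB = torch.randn_like(W) * dt ** 0.5
        W = W * (1 + (r + pi * (mu - r)) * dt + pi * sigma * dB)
        W = W.clamp(min=1e-6)  # keep wealth positive for CRRA utility
    return W

# Stage 1 (short warm-up): direct policy optimization by BPTT
# on the Monte Carlo estimate of expected CRRA utility.
for step in range(200):
    WT = terminal_wealth(torch.tensor(1.0))
    loss = -(WT.pow(1 - gamma) / (1 - gamma)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 2: read off the costate lambda = dJ/dW0 and its wealth sensitivity
# by differentiating through the simulation, then plug both into the PMP
# first-order condition  pi* = -(mu - r) lambda / (sigma^2 W dlambda/dW).
w0 = torch.tensor(1.0, requires_grad=True)
J = (terminal_wealth(w0).pow(1 - gamma) / (1 - gamma)).mean()
lam = torch.autograd.grad(J, w0, create_graph=True)[0]  # costate estimate
lam_w = torch.autograd.grad(lam, w0)[0]                 # dlambda/dW0
pi_star = -(mu - r) * lam / (sigma ** 2 * w0 * lam_w)
print(f"Pontryagin control: {pi_star.item():.3f}  "
      f"Merton benchmark: {(mu - r) / (gamma * sigma ** 2):.3f}")
\end{verbatim}
In this toy setting the Stage-2 control recovered from the BPTT costates should approach the Merton ratio, mirroring how the full framework converts stabilized costate estimates into closed-form Pontryagin controls in higher dimensions.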