Optimal Policy Learning for Multi-Action Treatment with Risk Preference using Stata
Abstract
This paper presents the community-distributed Stata command "opl_ma_fb" and its companion command "opl_ma_vf", which implement the first-best Optimal Policy Learning (OPL) algorithm for estimating the best treatment assignment given an observed outcome, a multi-action (or multi-arm) treatment, and a set of observed covariates (features). The implementation allows for different risk preferences in decision-making (risk-neutral, linear risk-averse, and quadratic risk-averse) and provides a graphical representation of the optimal policy, along with an estimate of the maximal welfare (i.e., the value function evaluated at the optimal policy) obtained using regression adjustment (RA), inverse-probability weighting (IPW), and doubly robust (DR) formulas.