This paper presents the minimax actor-critic algorithm, the minimax counterpart of the actor-critic algorithm for probabilistic dynamic programming. Convergence of the policies generated by the algorithm to an optimal policy is established. The algorithm is applied to an example involving a UAV navigating hostile territory. Error bounds are also obtained for approximations used in solving large-scale minimax DP problems, specifically the case of state aggregation. Copyright © 2003 by the American Institute of Aeronautics and Astronautics, Inc. All rights reserved.
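The abstract does not detail the algorithm itself, but the minimax dynamic programming setting it builds on can be illustrated with a minimal value-iteration sketch: a controller minimizes cost against an adversary who maximizes it. The toy problem, function names, and parameters below are illustrative assumptions, not the paper's actual UAV example or its actor-critic method.

```python
def minimax_value_iteration(states, u_actions, v_actions, cost, step,
                            gamma=0.9, tol=1e-8, max_iter=1000):
    """Compute J(s) = min_u max_v [cost(s,u,v) + gamma * J(step(s,u,v))]."""
    J = {s: 0.0 for s in states}
    for _ in range(max_iter):
        J_new = {}
        for s in states:
            # Controller picks u to minimize the worst case over adversary v.
            J_new[s] = min(
                max(cost(s, u, v) + gamma * J[step(s, u, v)]
                    for v in v_actions)
                for u in u_actions)
        converged = max(abs(J_new[s] - J[s]) for s in states) < tol
        J = J_new
        if converged:
            break
    return J

# Toy 1-D pursuit: controller drives the state toward goal 0,
# adversary pushes it back right (hypothetical example).
states = list(range(5))
u_actions = [-2, -1, 0]   # controller moves
v_actions = [0, 1]        # adversary perturbations
cost = lambda s, u, v: float(s != 0)              # pay 1 until goal reached
step = lambda s, u, v: min(max(s + u + v, 0), 4)  # clamp to state space

J = minimax_value_iteration(states, u_actions, v_actions, cost, step)
print(J[0])  # goal state has zero cost-to-go
```

The actor-critic approach the paper develops replaces this exact sweep over all states with sampled, incremental updates of a parameterized policy (actor) and value estimate (critic), which is what makes large-scale problems tractable.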
AIAA Guidance, Navigation, and Control Conference and Exhibit