Minimax Reinforcement Learning Conference Paper

abstract

  • In this paper, the minimax actor-critic algorithm is presented. This is the minimax equivalent of the actor-critic algorithm in the case of probabilistic dynamic programming. The convergence of the policies generated by the algorithm, to an optimal policy, is established. The algorithm is applied to an example involving a UAV navigating hostile territory. Further, error bounds are obtained for approximations involved in solving large scale minimax DP problems, specifically the case of state aggregation. © 2003 by the American Institute of Aeronautics and Astronautics, Inc. All rights reserved.

name of conference

  • AIAA Guidance, Navigation, and Control Conference and Exhibit

published proceedings

  • AIAA Guidance, Navigation, and Control Conference and Exhibit

author list (cited authors)

  • Chakravorty, S., & Hyland, D.

citation count

  • 3

complete list of authors

  • Chakravorty, Suman; Hyland, David

publication date

  • August 2003