Empirical policy iteration for approximate dynamic programming (Conference Paper)

abstract

  • © 2014 IEEE. We propose a simulation-based algorithm, the Empirical Policy Iteration (EPI) algorithm, for finding the optimal policy function of an MDP with an infinite-horizon discounted cost criterion when the transition kernels are unknown. Unlike simulation-based algorithms using stochastic approximation techniques, which give only asymptotic convergence results, we give provable, non-asymptotic performance guarantees in terms of sample complexity results: given ε > 0 and δ > 0, we specify the minimum number of simulation samples n(ε, δ) needed in each iteration and the minimum number of iterations k(ε, δ) that are sufficient for the EPI to yield, with probability at least 1 − δ, an approximate value function that is at least ε-close to the optimal value function.
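
To make the idea in the abstract concrete, the following is a minimal sketch of a simulation-based policy iteration loop. It is not the authors' algorithm as specified in the paper. It assumes a finite state and action space, a known one-step cost, and a hypothetical `simulate(s, a, rng)` function that returns one sampled next state. The names `empirical_policy_iteration`, `eval_sweeps`, and the toy two-state MDP are all illustrative assumptions. The sample count `n` and iteration count `k` stand in for the bounds n(ε, δ) and k(ε, δ) mentioned in the abstract, whose formulas are not reproduced here.

```python
import random


def empirical_policy_iteration(states, actions, cost, simulate, gamma, n, k,
                               eval_sweeps=50, seed=0):
    """Sketch only: expectations over next states are replaced by
    averages over n simulated samples (hypothetical interface)."""
    rng = random.Random(seed)
    policy = {s: actions[0] for s in states}
    value = {s: 0.0 for s in states}

    def sampled_q(s, a, v):
        # Sample-average estimate of cost(s, a) + gamma * E[v(next state)].
        total = sum(v[simulate(s, a, rng)] for _ in range(n))
        return cost(s, a) + gamma * total / n

    for _ in range(k):
        # Empirical policy evaluation: repeatedly apply the sampled
        # Bellman operator of the current (fixed) policy.
        for _ in range(eval_sweeps):
            value = {s: sampled_q(s, policy[s], value) for s in states}
        # Empirical policy improvement: act greedily (minimum cost)
        # with respect to the sampled one-step lookahead.
        policy = {s: min(actions, key=lambda a: sampled_q(s, a, value))
                  for s in states}
    return policy, value


# Toy MDP (an assumption for illustration): state 0 costs 1, state 1 costs 0;
# "stay" keeps the state and "move" switches it. The simulator is kept
# deterministic so the outcome is easy to check by hand; a real simulator
# would draw random next states.
def toy_cost(s, a):
    return 1.0 if s == 0 else 0.0


def toy_simulate(s, a, rng):
    return s if a == "stay" else 1 - s


policy, value = empirical_policy_iteration(
    states=[0, 1], actions=["stay", "move"], cost=toy_cost,
    simulate=toy_simulate, gamma=0.9, n=5, k=3)
```

In this toy case the loop settles on moving out of the costly state and staying in the free one. With a genuinely random simulator, the returned policy and value are only approximately optimal, and the quality of the approximation depends on `n` and `k`.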

name of conference

  • 2014 IEEE 53rd Annual Conference on Decision and Control (CDC)

published proceedings

  • 53rd IEEE Conference on Decision and Control

author list (cited authors)

  • Haskell, W. B., Jain, R., & Kalathil, D.

complete list of authors

  • Haskell, William B; Jain, Rahul; Kalathil, Dileep

publication date

  • January 1, 2014 11:11 AM

publisher