Empirical Policy Iteration for Approximate Dynamic Programming (Conference Paper)

abstract

  • © 2014 IEEE. We propose a simulation-based algorithm, Empirical Policy Iteration (EPI), for finding the optimal policy of an MDP with the infinite-horizon discounted cost criterion when the transition kernels are unknown. Unlike simulation-based algorithms that rely on stochastic approximation techniques, which give only asymptotic convergence results, we give provable, non-asymptotic performance guarantees in the form of sample complexity results: given ε > 0 and δ > 0, we specify the minimum number of simulation samples n(ε, δ) needed in each iteration and the minimum number of iterations k(ε, δ) that are sufficient for EPI to yield, with probability at least 1 − δ, an approximate value function that is within ε of the optimal value function.
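
  A minimal sketch of the idea behind EPI, assuming access to a generative simulator: both the policy evaluation and the policy improvement steps are carried out with Monte Carlo averages over simulator calls, so the unknown transition kernel is never used explicitly. This is an illustration rather than the paper's exact procedure; the simulator(s, a, rng) -> (cost, next_state) interface, the rollout truncation horizon, and the finite state and action sets are assumptions made for the example.

    import numpy as np

    def rollout_cost(simulator, policy, s, gamma, horizon, rng):
        """One truncated Monte Carlo estimate of the discounted cost from s."""
        total, discount = 0.0, 1.0
        for _ in range(horizon):
            cost, s = simulator(s, policy[s], rng)
            total += discount * cost
            discount *= gamma
        return total

    def empirical_policy_iteration(simulator, states, actions, gamma,
                                   n_samples, n_iters, horizon, seed=0):
        """Policy iteration driven entirely by simulation samples."""
        rng = np.random.default_rng(seed)
        policy = {s: actions[0] for s in states}
        for _ in range(n_iters):  # plays the role of k(eps, delta)
            # Empirical evaluation: average truncated rollouts per state.
            v_hat = {s: np.mean([rollout_cost(simulator, policy, s,
                                              gamma, horizon, rng)
                                 for _ in range(n_samples)])
                     for s in states}
            # Empirical improvement: greedy one-step lookahead, again
            # estimated from simulator draws for each (state, action) pair.
            for s in states:
                q_hat = {}
                for a in actions:
                    draws = [simulator(s, a, rng) for _ in range(n_samples)]
                    q_hat[a] = np.mean([c + gamma * v_hat[s2]
                                        for c, s2 in draws])
                policy[s] = min(q_hat, key=q_hat.get)  # cost: take argmin
        return policy

  In the discounted-cost setting of the abstract the greedy step is an argmin, and n_samples and n_iters correspond to the quantities n(ε, δ) and k(ε, δ) bounded in the paper.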

author list (cited authors)

  • Haskell, W. B., Jain, R., & Kalathil, D.

citation count

  • 0

publication date

  • December 2014

publisher