Empirical Policy Iteration for Approximate Dynamic Programming (Conference Paper)

abstract

  • © 2014 IEEE. We propose a simulation-based algorithm, Empirical Policy Iteration (EPI), for finding the optimal policy of an MDP with the infinite-horizon discounted cost criterion when the transition kernels are unknown. Unlike simulation-based algorithms that use stochastic approximation techniques, which give only asymptotic convergence results, we give provable, non-asymptotic performance guarantees in terms of sample complexity: given ε > 0 and δ > 0, we specify the minimum number of simulation samples n(ε, δ) needed in each iteration and the minimum number of iterations k(ε, δ) that are sufficient for EPI to yield, with probability at least 1 − δ, an approximate value function that is ε-close to the optimal value function.
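
  As a rough sketch of the scheme the abstract outlines (Monte Carlo evaluation of the current policy from simulated transitions, followed by greedy improvement), the Python below assumes a finite MDP with a black-box simulator; the names sample_next_state, cost, and horizon are illustrative choices, not the paper's notation, and the truncated rollout is a generic stand-in for the authors' exact construction. Here n plays the role of n(ε, δ) and k the role of k(ε, δ).

      import numpy as np

      def empirical_policy_iteration(sample_next_state, cost, n_states, n_actions,
                                     gamma, n, k, horizon=200):
          """Sketch of simulation-based policy iteration for a finite MDP
          whose transition kernel is accessible only through a simulator.

          sample_next_state(s, a) draws a successor state from the unknown
          kernel; cost(s, a) is the one-step cost. n rollouts per state-action
          pair give an empirical Q-estimate; k greedy improvements follow.
          """
          policy = np.zeros(n_states, dtype=int)       # arbitrary initial policy
          for _ in range(k):                           # k(eps, delta) iterations
              q = np.zeros((n_states, n_actions))
              for s in range(n_states):
                  for a in range(n_actions):
                      total = 0.0
                      for _ in range(n):               # n(eps, delta) rollouts
                          s_cur, a_cur, ret, disc = s, a, 0.0, 1.0
                          for _ in range(horizon):     # truncation bias ~ gamma**horizon
                              ret += disc * cost(s_cur, a_cur)
                              disc *= gamma
                              s_cur = sample_next_state(s_cur, a_cur)
                              a_cur = policy[s_cur]    # after the first step, follow policy
                          total += ret
                      q[s, a] = total / n              # empirical Q^pi(s, a)
              policy = q.argmin(axis=1)                # greedy step (costs are minimized)
          return policy

      # Illustrative toy MDP: 3 states, 2 actions, random transition kernel.
      rng = np.random.default_rng(0)
      P = rng.dirichlet(np.ones(3), size=(3, 2))       # P[s, a] is a distribution over states
      pi = empirical_policy_iteration(
          sample_next_state=lambda s, a: int(rng.choice(3, p=P[s, a])),
          cost=lambda s, a: float(s != 0),             # cheapest to sit in state 0
          n_states=3, n_actions=2, gamma=0.9, n=100, k=5, horizon=50)

  For intuition on why a finite n can suffice, note that each truncated return lies in [0, C_max/(1 − γ)], so a generic Hoeffding bound already gives n = O((C_max² / ((1 − γ)² ε²)) · log(1/δ)) per state-action pair; the paper's exact n(ε, δ) and k(ε, δ) expressions are not reproduced here.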

name of conference

  • 53rd IEEE Conference on Decision and Control

published proceedings

  • 2014 IEEE 53rd Annual Conference on Decision and Control (CDC)

author list (cited authors)

  • Haskell, W. B., Jain, R., & Kalathil, D.

citation count

  • 0

complete list of authors

  • Haskell, William B.; Jain, Rahul; Kalathil, Dileep

publication date

  • January 2014