Empirical Policy Iteration for Approximate Dynamic Programming
Conference Paper
Overview
abstract
© 2014 IEEE. We propose a simulation-based algorithm, the Empirical Policy Iteration (EPI) algorithm, for finding the optimal policy function of an MDP with infinite-horizon discounted cost criteria when the transition kernels are unknown. Unlike simulation-based algorithms using stochastic approximation techniques, which give only asymptotic convergence results, we give provable, non-asymptotic performance guarantees in terms of sample complexity results: given ε > 0 and δ > 0, we specify the minimum number of simulation samples n(ε, δ) needed in each iteration and the minimum number of iterations k(ε, δ) that are sufficient for EPI to yield, with probability at least 1 − δ, an approximate value function that is at least ε-close to the optimal value function.
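To make the idea concrete, the following is a minimal sketch of simulation-based policy iteration for a small finite MDP, written under several assumptions rather than taken from the paper. The function name, the `simulate(s, a, rng)` interface, the `eval_sweeps` parameter, and the two-state example are all hypothetical. The sample count `n` and iteration count `k` are supplied by the caller; the paper's sample-complexity bounds are not reproduced here.

```python
import random


def empirical_policy_iteration(states, actions, cost, simulate, gamma,
                               n, k, eval_sweeps=50, seed=0):
    """Illustrative sketch only; not the authors' exact procedure.

    simulate(s, a, rng) draws one next state, so the transition kernel
    is never read directly. n is the number of samples per estimate and
    k is the number of outer iterations (both chosen by the caller).
    """
    rng = random.Random(seed)

    def q_hat(s, a, v):
        # Empirical one-step lookahead: average v over n simulated next states.
        total = sum(v[simulate(s, a, rng)] for _ in range(n))
        return cost(s, a) + gamma * total / n

    policy = {s: actions[0] for s in states}
    v = {s: 0.0 for s in states}
    for _ in range(k):
        # Approximate policy evaluation: repeatedly apply the empirical
        # Bellman operator of the current policy, starting from zero.
        v = {s: 0.0 for s in states}
        for _ in range(eval_sweeps):
            v = {s: q_hat(s, policy[s], v) for s in states}
        # Policy improvement: pick the minimum-cost action under the estimates.
        policy = {s: min(actions, key=lambda a: q_hat(s, a, v)) for s in states}
    return policy, v


def noisy_step(s, a, rng):
    # Hypothetical two-state chain: the intended move succeeds with prob. 0.9.
    intended = 1 - s if a == "move" else s
    return intended if rng.random() < 0.9 else 1 - intended


policy, v = empirical_policy_iteration(
    states=[0, 1], actions=["stay", "move"],
    cost=lambda s, a: float(s == 0),  # state 0 is costly, state 1 is free
    simulate=noisy_step, gamma=0.9, n=200, k=5)
print(policy, v)
```

The returned value estimates are those of the policy evaluated in the final iteration, and they are noisy because every Bellman update is an average over simulated samples. Increasing `n` reduces that noise, while increasing `k` allows more improvement steps.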