Empirical Value Iteration for Approximate Dynamic Programming

We propose a simulation-based algorithm, Empirical Value Iteration (EVI), for finding the optimal value function of a Markov decision process (MDP) under the infinite-horizon discounted cost criterion when the transition probability kernels are unknown. Unlike simulation-based algorithms built on stochastic approximation techniques, which give only asymptotic convergence results, we give provable, non-asymptotic performance guarantees in the form of sample complexity results: given ε > 0 and δ > 0, we specify the minimum number of simulation samples n(ε, δ) needed in each iteration and the minimum number of iterations t(ε, δ) that are sufficient for EVI to yield, with probability at least 1 − δ, an approximate value function that is within ε of the optimal value function. © 2014 American Automatic Control Council.
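The idea behind EVI can be illustrated with a small sketch: in each iteration, the expectation inside the Bellman operator is replaced by an average over n simulated next states, so the transition kernel is only ever accessed through a simulator. The following is a minimal sketch under stated assumptions, not the paper's implementation: it assumes finite state and action spaces, a hypothetical simulation model psi(x, a, xi) that maps uniform noise xi in [0, 1) to a next state by inverse-CDF sampling, and one fresh batch of n noise samples per iteration shared by all state-action pairs. The function names and the values of n and the iteration count are illustrative choices, not the bounds n(ε, δ) and t(ε, δ) derived in the paper.

```python
import numpy as np


def make_psi_batch(P):
    """Hypothetical simulation model: maps uniform noise xi in [0, 1) to next
    states via the inverse CDF of the transition kernel P[a, x, :].
    EVI below only calls the returned function; it never reads P directly."""
    num_states = P.shape[2]
    cdf = np.cumsum(P, axis=2)  # shape (A, S, S)

    def psi_batch(xi):
        # For every (a, x) and every noise sample, count the CDF entries
        # that are <= xi; that count is the index of the sampled next state.
        idx = (cdf[:, :, :, None] <= xi[None, None, None, :]).sum(axis=2)
        return np.minimum(idx, num_states - 1)  # guard against round-off

    return psi_batch


def empirical_value_iteration(c, psi_batch, alpha, n, iters, rng):
    """Sketch of EVI for cost minimisation: the expectation in the Bellman
    operator is replaced by an average over n simulated next states."""
    v = np.zeros(c.shape[1])
    for _ in range(iters):
        xi = rng.random(n)                           # fresh noise each iteration
        next_states = psi_batch(xi)                  # shape (A, S, n)
        q = c + alpha * v[next_states].mean(axis=2)  # empirical Q-values, (A, S)
        v = q.min(axis=0)                            # minimise cost over actions
    return v


def value_iteration(c, P, alpha, iters):
    """Exact value iteration with the true kernel, used only for comparison."""
    v = np.zeros(c.shape[1])
    for _ in range(iters):
        v = (c + alpha * P @ v).min(axis=0)
    return v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, S, alpha = 2, 5, 0.9
    c = rng.random((A, S))                 # costs c[a, x] in [0, 1)
    P = rng.random((A, S, S))
    P /= P.sum(axis=2, keepdims=True)      # each row is a probability vector
    v_star = value_iteration(c, P, alpha, iters=300)
    v_hat = empirical_value_iteration(c, make_psi_batch(P), alpha,
                                      n=2000, iters=100, rng=rng)
    print("max |v_hat - v*| =", np.max(np.abs(v_hat - v_star)))
```

Sharing one noise batch across all state-action pairs keeps each iteration to n simulator draws; with a deterministic kernel the empirical update coincides with exact value iteration, and with a stochastic kernel the error shrinks as n grows, which is the trade-off the sample complexity bounds quantify.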
