Empirical policy iteration for approximate dynamic programming
© 2014 IEEE. We propose a simulation-based algorithm, Empirical Policy Iteration (EPI), for finding the optimal policy of an MDP under the infinite-horizon discounted-cost criterion when the transition kernels are unknown. Unlike simulation-based algorithms built on stochastic approximation, which give only asymptotic convergence results, we give provable, non-asymptotic performance guarantees in the form of sample complexity results: given ε > 0 and δ > 0, we specify the minimum number of simulation samples n(ε, δ) needed in each iteration and the minimum number of iterations k(ε, δ) that are sufficient for EPI to yield, with probability at least 1 − δ, an approximate value function that is within ε of the optimal value function.
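The general idea of replacing exact expectations with sample averages inside policy iteration can be sketched as follows. This is an illustrative sketch only, not the authors' algorithm: the function name, the fixed number of evaluation sweeps (eval_sweeps), the simulator interface (sample_next), and the two-state toy MDP are all assumptions made for the example, and n and k are simply passed in rather than derived from ε and δ.

```python
import random


def empirical_policy_iteration(states, actions, cost, sample_next,
                               gamma, n, k, eval_sweeps=50, seed=0):
    """Hypothetical sketch: policy iteration with sample-average backups.

    sample_next(s, a, rng) is an assumed simulator interface that returns
    one next state; the transition kernel itself is never read directly.
    """
    rng = random.Random(seed)
    policy = {s: actions[0] for s in states}
    v = {s: 0.0 for s in states}

    def empirical_q(s, a, values):
        # Average the value of n simulated next states in place of the
        # exact expectation under the unknown transition kernel.
        draws = [sample_next(s, a, rng) for _ in range(n)]
        return cost(s, a) + gamma * sum(values[t] for t in draws) / n

    for _ in range(k):
        # Approximate policy evaluation: repeated empirical backups
        # under the current policy.
        for _ in range(eval_sweeps):
            v = {s: empirical_q(s, policy[s], v) for s in states}
        # Policy improvement: greedy (cost-minimizing) with respect to
        # the empirical Q-values.
        policy = {s: min(actions, key=lambda a: empirical_q(s, a, v))
                  for s in states}
    return policy, v


# Toy two-state MDP (assumed for illustration): being in state 0 costs 1,
# being in state 1 costs 0; "stay" keeps the state, "move" switches it.
def toy_cost(s, a):
    return 1.0 if s == 0 else 0.0


def toy_sample_next(s, a, rng):
    # Deterministic here so the run is reproducible; a real simulator
    # would draw the next state at random using rng.
    return s if a == "stay" else 1 - s


policy, v = empirical_policy_iteration(
    states=[0, 1], actions=["stay", "move"], cost=toy_cost,
    sample_next=toy_sample_next, gamma=0.9, n=10, k=3)
print(policy, v)
```

In this toy run the simulator is deterministic, so the sample averages are exact; with a stochastic simulator the quality of the result would depend on n and k, which is the dependence the abstract's sample complexity bounds quantify.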
name of conference
2014 IEEE 53rd Annual Conference on Decision and Control (CDC)
53rd IEEE Conference on Decision and Control
author list (cited authors)
Haskell, W. B., Jain, R., & Kalathil, D.
complete list of authors
Haskell, William B; Jain, Rahul; Kalathil, Dileep