Empirical Dynamic Programming

abstract

We propose empirical dynamic programming algorithms for Markov decision processes. In these algorithms, the exact expectation in the Bellman operator of classical value iteration is replaced by an empirical estimate, which gives empirical value iteration (EVI). Policy evaluation and policy improvement in classical policy iteration are likewise replaced by simulation, which gives empirical policy iteration (EPI). These empirical dynamic programming algorithms thus involve iteration of a random operator, the empirical Bellman operator. We introduce notions of probabilistic fixed points for such random monotone operators and develop a stochastic dominance framework for their convergence analysis. We then use this framework to give sample complexity bounds for both EVI and EPI. We also provide variations and extensions, including asynchronous empirical dynamic programming and the minimax empirical dynamic program, and show how the approach can be used to solve the dynamic newsvendor problem. Preliminary experimental results suggest a faster rate of convergence than stochastic approximation algorithms.
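The core idea of EVI described in the abstract, replacing the exact expectation in the Bellman operator with a sample average, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two-state, two-action MDP, the cost-minimization convention, the discount factor, the sample size `n`, and the function names `exact_bellman`, `empirical_bellman`, and `iterate`.

```python
import random


def exact_bellman(v, P, c, gamma):
    """Classical Bellman operator: exact expectation over next states."""
    return [
        min(
            c[s][a] + gamma * sum(p * v[j] for j, p in enumerate(P[s][a]))
            for a in range(len(P[s]))
        )
        for s in range(len(P))
    ]


def empirical_bellman(v, P, c, gamma, n, rng):
    """Empirical Bellman operator: the expectation is replaced by the
    average of v over n simulated next states, drawn fresh on every call."""
    states = list(range(len(P)))
    new_v = []
    for s in states:
        q_values = []
        for a in range(len(P[s])):
            samples = rng.choices(states, weights=P[s][a], k=n)
            q_values.append(c[s][a] + gamma * sum(v[j] for j in samples) / n)
        new_v.append(min(q_values))
    return new_v


def iterate(operator, v0, k):
    """Apply an operator k times starting from v0."""
    v = list(v0)
    for _ in range(k):
        v = operator(v)
    return v


# Hypothetical toy MDP: P[s][a][j] is the probability of moving from
# state s to state j under action a, and c[s][a] is the one-step cost.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
]
c = [[1.0, 2.0], [0.5, 3.0]]
gamma = 0.9
rng = random.Random(0)

v_exact = iterate(lambda v: exact_bellman(v, P, c, gamma), [0.0, 0.0], 100)
v_emp = iterate(
    lambda v: empirical_bellman(v, P, c, gamma, 1000, rng), [0.0, 0.0], 100
)
gap = max(abs(x - y) for x, y in zip(v_exact, v_emp))
print("exact value iteration:    ", v_exact)
print("empirical value iteration:", v_emp)
print("sup-norm gap:             ", gap)
```

Because fresh samples are drawn at every iteration, the empirical iterates do not settle on a single point; they keep fluctuating in a neighborhood of the exact fixed point whose size shrinks as `n` grows. This is the behavior that motivates a probabilistic notion of fixed point for the random operator.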

authors

Kalathil, Dileep

published proceedings

Mathematics of Operations Research

altmetric score

1.25

author list (cited authors)

Haskell, W. B., Jain, R., & Kalathil, D.

citation count

35

complete list of authors

Haskell, William B||Jain, Rahul||Kalathil, Dileep

publication date

January 2016

publisher

Institute for Operations Research and the Management Sciences (INFORMS)

published in

Mathematics of Operations Research

keywords

Dynamic Programming
Empirical Methods
Probabilistic Fixed Points
Random Operators
Simulation

Digital Object Identifier (DOI)

10.1287/moor.2015.0733

start page

402

end page

429

volume

41

issue

2

URL

http://dx.doi.org/10.1287/moor.2015.0733
