Optimal adaptive controllers for unknown Markov chains
Additional Document Info
We consider the problem of adoptively controlling an unknown Markov chain. No prior information regarding the values of the transition probabilities is provided us (except for a list of forbidden, zero-probability transitions, which is usually obtained as a byproduct of the modeling process itself). The goal is to design an adaptive controller to adequately control the unknown system when its performance is measured by the average cost incurred over a long operating time period. Our main result is the exhibition of a family of adaptive controllers which, when applied to the unknown system, will result in a performance precisely equal to the optimal performance attainable if the system, i. e., the transition probabilities, were known. Hence, the adaptive controllers proposed here are truly optimal, even when operating on an unknown system. The results presented here extend similar results in  where we assume to be initially provided with a finite set of possible models, one of which is guaranteed to be the true one. This paper directly addresses those practical situations where a finite set of possible models with such a guarantee is hard to come by. Copyright 1982 by The Institute of Electrical and Electronics Engineers, Inc.