STATISTICAL ASSOCIATION LEARNING OF THE MARKOV DECISION-PROCESS Academic Article

abstract

  • A neural computing approach to the Markov decision problem is presented. The method uses historical records of rewards as inputs and average long-run incomes per epoch as targets to train a backpropagation network to associate the two quantities, without a priori knowledge of the state transition probabilities. Estimating the long-run income for a new reward matrix is treated as a statistical-association learning problem. After training, the functional relationship between income (output) and reward (input) learned by the network can be used to compute an unknown expected income. The study examines how well the network generalizes to new inputs, considering the effects of different topological designs and of training-sample characteristics on neural computing accuracy. The best design, which is determined directly from the underlying mathematical model of the Markov decision problem, is shown to perform successfully in a computer simulation experiment. © 1993 Taylor & Francis Group, LLC.
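
The association task described in the abstract can be illustrated with a minimal sketch. It is not the authors' network or experiment. It assumes a hypothetical two-state chain with a fixed transition matrix (`P_hidden`) that is used only to generate synthetic "historical" targets, random reward matrices as inputs, and a single-layer linear network trained with the delta rule (the simplest case of backpropagation). The linear design is chosen because, for fixed transition probabilities, the long-run income g = Σ_i π_i Σ_j p_ij r_ij is linear in the rewards. All names and parameter values are illustrative.

```python
import random

def stationary_distribution(P, iters=500):
    """Power iteration for the stationary distribution of an ergodic chain."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def long_run_income(P, R):
    """Average income per epoch: g = sum_i pi_i * sum_j P[i][j] * R[i][j]."""
    pi = stationary_distribution(P)
    n = len(P)
    return sum(pi[i] * P[i][j] * R[i][j] for i in range(n) for j in range(n))

def predict(w, b, x):
    """Output of a single linear unit."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_linear_net(samples, n_inputs, lr=0.05, epochs=200, seed=0):
    """Delta-rule training: gradient descent on squared error, one sample at a time."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            err = predict(w, b, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def run_demo(seed=1):
    rng = random.Random(seed)
    # Assumption: this matrix is hidden from the learner; it only produces targets.
    P_hidden = [[0.7, 0.3], [0.4, 0.6]]

    def sample():
        # Synthetic reward matrix, flattened row by row into the network input.
        R = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(2)]
        x = [r for row in R for r in row]
        return x, long_run_income(P_hidden, R)

    train = [sample() for _ in range(100)]
    test = [sample() for _ in range(20)]  # unseen reward matrices
    w, b = train_linear_net(train, n_inputs=4)
    max_err = max(abs(predict(w, b, x) - y) for x, y in test)
    return w, b, max_err

if __name__ == "__main__":
    w, b, max_err = run_demo()
    print("learned weights:", w)
    print("max absolute error on unseen reward matrices:", max_err)
```

In this sketch the learned weights approximate the products π_i p_ij, so the network recovers the income function without ever being given the transition probabilities. A multilayer network could be substituted for `train_linear_net` to compare topologies, which is the kind of comparison the abstract mentions.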

published proceedings

  • IIE TRANSACTIONS

author list (cited authors)

  • SASTRI, T., & MALAVE, C. O.

citation count

  • 0

complete list of authors

  • SASTRI, T; MALAVE, CO

publication date

  • January 1993