Reward maximization in a non-stationary mobile robot environment
Conference Paper
Abstract
In this paper, we present an approach to reward maximization in a non-stationary mobile robot environment. The approach works within the constraints of limited local sensing and limited a priori knowledge of the environment. It is based on the use of augmented Markov models (AMMs), which are essentially Markov chains with additional statistics associated with states and state transitions. We have developed an algorithm that constructs AMMs on-line and in real time with little computational and space overhead, making it practical to maintain multiple models of the interaction dynamics between a robot and its environment during the execution of a task. For the purposes of reward maximization in a non-stationary environment, these models monitor events at increasing intervals of time and provide statistics used to discard redundant or outdated information while reducing the probability of conforming to noise. This approach has been successfully implemented on a real mobile robot performing a mine collection task. In the context of this task, we first present experimental results validating our reward maximization criterion in a stationary environment. We then incorporate our algorithm for redundant/outdated information reduction using multiple models and apply the approach to a non-stationary environment with an abrupt change.
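To make the core idea concrete, the following is a minimal sketch of an AMM-style model: a Markov chain whose states and transitions carry extra statistics (visit and transition counts) updated incrementally from an observation stream. The class name, method names, and internal representation are illustrative assumptions, not the paper's actual implementation; they show only why per-observation updates are cheap enough to maintain several models concurrently while a robot runs.

```python
from collections import defaultdict

class AugmentedMarkovModel:
    """Illustrative sketch (not the authors' implementation) of a Markov
    chain augmented with simple statistics on states and transitions."""

    def __init__(self):
        self.visits = defaultdict(int)       # state -> visit count
        self.transitions = defaultdict(int)  # (state, next_state) -> count
        self.prev = None                     # last observed state

    def observe(self, state):
        # O(1) work per observation: this constant overhead is what makes
        # maintaining multiple concurrent models practical in real time.
        self.visits[state] += 1
        if self.prev is not None:
            self.transitions[(self.prev, state)] += 1
        self.prev = state

    def transition_prob(self, s, s_next):
        # Empirical transition probability estimated from the counts.
        total = sum(c for (a, _), c in self.transitions.items() if a == s)
        if total == 0:
            return 0.0
        return self.transitions[(s, s_next)] / total

# Example: feed a short observation sequence and query the model.
model = AugmentedMarkovModel()
for state in ["A", "B", "A", "B", "A"]:
    model.observe(state)
print(model.transition_prob("A", "B"))  # 1.0 for this alternating sequence
```

In a multiple-model setting, several such instances could be fed the same event stream but sampled at different time intervals, with their statistics compared to decide which model's history is redundant or outdated.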