Approximate Dynamic Programming Applied to UAV Perimeter Patrol
Academic Article

Overview

abstract

Applying dynamic programming to determine optimal policies for large-scale controlled Markov chains runs into the curse of dimensionality. In this chapter, we consider a base perimeter patrol stochastic control problem. Determining the optimal control policy requires solving a Markov decision problem whose large size renders exact dynamic programming methods intractable. We therefore propose a state-aggregation-based approximate linear programming method to construct provably good suboptimal policies. The state space is partitioned, and the optimal cost-to-go, or value function, is approximated by a constant over each partition. By minimizing a non-negative cost function defined on the partitions, one can construct an approximate value function that is also an upper bound on the optimal value function of the original Markov chain. As a general result, we show that this approximate value function is independent of the non-negative cost function (or state-dependent weights, as they are referred to in the literature) and, moreover, that it is the least upper bound one can obtain given the partitions. Furthermore, we show that the restricted system of linear inequalities also embeds a family of Markov chains of lower dimension, one of which can be used to construct a tight lower bound on the optimal value function. In general, constructing the lower bound requires solving a combinatorial problem. However, the perimeter patrol problem exhibits a special structure that enables tractable linear programming formulations for both the upper and lower bounds. We demonstrate this and provide numerical results that corroborate the efficacy of the proposed methodology.
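
The upper-bound construction described in the abstract can be illustrated on a toy problem. The sketch below is not the chapter's perimeter patrol model. It assumes a discounted, reward-maximizing Markov decision problem with four states, two actions, and two partitions, and every number in it (the discount factor `GAMMA`, the transition table `P`, the rewards `R`, the partition map `PART`) is made up for illustration. Instead of calling a linear programming solver, it iterates the aggregated Bellman-type inequality to its fixed point. Under the stated assumptions that fixed point is the componentwise-smallest vector satisfying the restricted inequalities, so it is what minimizing any strictly positive weighted cost over them would return.

```python
# Toy discounted-reward MDP; all numbers are illustrative assumptions.
GAMMA = 0.9
N_STATES, N_ACTIONS, N_PARTS = 4, 2, 2

# P[u][x][y]: probability of moving from state x to state y under action u.
P = [
    [[0.7, 0.3, 0.0, 0.0],
     [0.0, 0.6, 0.4, 0.0],
     [0.0, 0.0, 0.5, 0.5],
     [0.5, 0.0, 0.0, 0.5]],
    [[0.2, 0.0, 0.8, 0.0],
     [0.3, 0.3, 0.0, 0.4],
     [0.1, 0.0, 0.9, 0.0],
     [0.0, 0.6, 0.0, 0.4]],
]
# R[u][x]: one-step reward for taking action u in state x.
R = [[1.0, 0.0, 2.0, 0.5],
     [0.5, 1.5, 0.0, 1.0]]
# PART[x]: index of the partition that contains state x.
PART = [0, 0, 1, 1]


def backup(values, x, u):
    """One-step lookahead value of action u in state x."""
    expected = sum(P[u][x][y] * values[y] for y in range(N_STATES))
    return R[u][x] + GAMMA * expected


def optimal_value(tol=1e-10):
    """Exact value iteration on the full state space (feasible only for toys)."""
    v = [0.0] * N_STATES
    while True:
        new = [max(backup(v, x, u) for u in range(N_ACTIONS))
               for x in range(N_STATES)]
        if max(abs(a - b) for a, b in zip(new, v)) < tol:
            return new
        v = new


def aggregated_upper_bound(tol=1e-10):
    """Least w with w[PART[x]] >= backup(lifted w, x, u) for all x and u."""
    w = [0.0] * N_PARTS
    while True:
        # Lift the per-partition constants to a piecewise-constant function.
        lifted = [w[PART[x]] for x in range(N_STATES)]
        # Each partition takes the worst case over its states and all actions.
        new = [max(backup(lifted, x, u)
                   for x in range(N_STATES) if PART[x] == i
                   for u in range(N_ACTIONS))
               for i in range(N_PARTS)]
        if max(abs(a - b) for a, b in zip(new, w)) < tol:
            return new
        w = new


v_star = optimal_value()
w = aggregated_upper_bound()
for x in range(N_STATES):
    print(f"state {x}: optimal value {v_star[x]:.4f}, upper bound {w[PART[x]]:.4f}")
```

The aggregated iteration works with one unknown per partition rather than one per state, which is the source of the dimensionality reduction. The lower-bound construction mentioned in the abstract is not sketched here.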