Park, Myoungkuk (2014-05). Performance Guarantee of a Sub-Optimal Policy for a Discrete Markov Decision Process and Its Application to a Robotic Surveillance Problem. Doctoral Dissertation. Thesis uri icon

abstract

  • This dissertation deals with the development and analysis of sub-optimal decision algorithms for a collection of robots that assist a remotely located operator in perimeter surveillance. The operator is tasked with the classification of incursions across the perimeter. Whenever there is an incursion into the perimeter, a nearby Unattended Ground Sensor (UGS) signals an alert. A robot services the alert by visiting the alert location, collecting evidence in the form of video imagery, and transmitting it to the operator. The accuracy of operator's classification depends on the volume and freshness of information gathered and provided by the robots at locations where incursions occur. There are two competing needs for a robot: it needs to spend adequate time at an alert location to collect evidence for aiding the operator in accurate classification but it also needs to service other alerts as soon as possible, so that the evidence collected is relevant. The control problem is to determine the optimal amount of time a robot must spend servicing an alert. The incursions are stochastic and their statistics are assumed to be known. The control problem may be posed as a Markov Decision Problem (MDP). Dynamic Programming(DP) provides the optimal policy to the MDP. However, because of the "curse of dimensionality" of DP, finding the optimal policy is not practical in many applications. For a perimeter surveillance problem with two robots and five UGS locations, the number of states is of the order of billions. Approximate Dynamic Programming (ADP) via Linear Programming (LP) provides a way to approximate the value function and derive sub-optimal strategies. Using state partitioning and ADP, this dissertation provides different LP formulations for upper and lower bounds to the value function of the MDP, and shows the relationship between LPs and MDP. The novel features of this dissertation are (1) the derivation of a tractable lower bound via LP and state partitioning, (2) the construction of a sub-optimal policy whose performance exceeds the lower bound, and (3) the derivation of an upper bound using a non-linear programming formuation. The upper and lower bounds provides approximation ratio to the value function. Finally, illustrative perimeter surveillance examples corroborate the results derived in this dissertation.
  • This dissertation deals with the development and analysis of sub-optimal decision
    algorithms for a collection of robots that assist a remotely located operator in
    perimeter surveillance. The operator is tasked with the classification of incursions
    across the perimeter. Whenever there is an incursion into the perimeter, a nearby
    Unattended Ground Sensor (UGS) signals an alert. A robot services the alert by
    visiting the alert location, collecting evidence in the form of video imagery, and
    transmitting it to the operator.

    The accuracy of operator's classification depends on the volume and freshness of
    information gathered and provided by the robots at locations where incursions occur.
    There are two competing needs for a robot: it needs to spend adequate time at an
    alert location to collect evidence for aiding the operator in accurate classification but
    it also needs to service other alerts as soon as possible, so that the evidence collected
    is relevant. The control problem is to determine the optimal amount of time a robot
    must spend servicing an alert. The incursions are stochastic and their statistics are
    assumed to be known.

    The control problem may be posed as a Markov Decision Problem (MDP). Dynamic
    Programming(DP) provides the optimal policy to the MDP. However, because
    of the "curse of dimensionality" of DP, finding the optimal policy is not practical in
    many applications. For a perimeter surveillance problem with two robots and five
    UGS locations, the number of states is of the order of billions. Approximate Dynamic
    Programming (ADP) via Linear Programming (LP) provides a way to approximate
    the value function and derive sub-optimal strategies. Using state partitioning
    and ADP, this dissertation provides different LP formulations for upper and lower
    bounds to the value function of the MDP, and shows the relationship between LPs
    and MDP. The novel features of this dissertation are (1) the derivation of a tractable
    lower bound via LP and state partitioning, (2) the construction of a sub-optimal policy
    whose performance exceeds the lower bound, and (3) the derivation of an upper
    bound using a non-linear programming formuation. The upper and lower bounds
    provides approximation ratio to the value function. Finally, illustrative perimeter
    surveillance examples corroborate the results derived in this dissertation.

publication date

  • May 2014