Anytime algorithms are a class of algorithm which are interruptible and whose solution quality improves with time, tending towards an optimal solution. In other words, there is a non-decreasing relationship between time invested in computation and solution quality. Algorithms of this nature are clearly relevant to the problem of robotic motion planning. When the state space is large or high-dimensional as is the case for many real applications, an optimal trajectory may take prohibitively long to compute. Using an anytime approach allows for a sub-optimal yet feasible solution to be returned in a reasonable amount of time, after which further time can be spent on improvement. When the nature of this improvement is defined by something like path length or mechanical work, the trade off between time and solution quality must be engineered for a specific context. However, if solution quality is determined by the length of time required to perform a given motion, then there is a well defined relationship between time committed to computation and time spent on navigation. When the objective is to have the robot arrive at its goal in the shortest amount of time possible, there will be a threshold after which time invested in computation is not sufficiently rewarded in terms of path improvement. This optimal computation duration varies greatly for any given environment and start/goal configuration. Additionally, the planner need not decide on a computation duration upfront; the state of the planner and quality of the solution can be observed throughout the process yielding sequential and episodic data. These two facts suggest that deciding when to end a computation phase and begin a navigation phase can be posed as a reinforcement learning problem. In this work, we present a motion planner that can be trained to minimize the overall time spent on both computation and navigation. To this end, we utilize anytime motion planning techniques as well as reinforcement learning algorithms. The performance of this planner will be evaluated for a variety of simulated environments.
Anytime algorithms are a class of algorithm which are interruptible and whose solution quality improves with time, tending towards an optimal solution. In other words, there is a non-decreasing relationship between time invested in computation and solution quality. Algorithms of this nature are clearly relevant to the problem of robotic motion planning. When the state space is large or high-dimensional as is the case for many real applications, an optimal trajectory may take prohibitively long to compute. Using an anytime approach allows for a sub-optimal yet feasible solution to be returned in a reasonable amount of time, after which further time can be spent on improvement. When the nature of this improvement is defined by something like path length or mechanical work, the trade off between time and solution quality must be engineered for a specific context. However, if solution quality is determined by the length of time required to perform a given motion, then there is a well defined relationship between time committed to computation and time spent on navigation. When the objective is to have the robot arrive at its goal in the shortest amount of time possible, there will be a threshold after which time invested in computation is not sufficiently rewarded in terms of path improvement. This optimal computation duration varies greatly for any given environment and start/goal configuration. Additionally, the planner need not decide on a computation duration upfront; the state of the planner and quality of the solution can be observed throughout the process yielding sequential and episodic data. These two facts suggest that deciding when to end a computation phase and begin a navigation phase can be posed as a reinforcement learning problem. In this work, we present a motion planner that can be trained to minimize the overall time spent on both computation and navigation. To this end, we utilize anytime motion planning techniques as well as reinforcement learning algorithms. The performance of this planner will be evaluated for a variety of simulated environments.