Reinforcement Learning of a Morphing Airfoil-Policy and Discrete Learning Analysis
Academic Article
Overview
Abstract
Casting the problem of morphing a micro air vehicle as a reinforcement-learning problem to achieve desired tasks or performance is a candidate approach for handling many of the unique challenges associated with such small aircraft. This paper presents an early stage in learning how and when to morph a micro air vehicle: an episodic unsupervised learning algorithm based on the Q-learning method learns the shape and shape-change policy of a single morphing airfoil. Reinforcement is provided by reward functions based on airfoil properties, such as lift coefficient, that represent desired performance for specified flight conditions. The reinforcement learning, as applied to morphing, is integrated with a computational model of the airfoil. The methodology is demonstrated with numerical examples of a NACA-type airfoil that autonomously morphs in two degrees of freedom, thickness and camber, to a shape that meets specified goal requirements. Because thickness and camber are continuous quantities, this paper addresses the convergence of the learning algorithm under several discretizations. Convergence is also analyzed for three candidate policies: 1) a fully random exploration policy, 2) a policy annealing from random exploration to exploitation, and 3) the annealing policy combined with an annealing discount factor. The results presented in this paper show the inherent differences in the learned action-value function when the state-space discretization, policy, and learning parameters differ. A policy annealing from fully explorative to almost fully exploitative yielded the highest rate of convergence among the policies considered. Also, the coarsest discretization of the state space resulted in convergence of the action-value function in as few as 200 episodes. Copyright 2010 by Amanda Lampton, Adam Niksch, and John Valasek.
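The abstract gives no implementation details, but the setup it describes, tabular Q-learning over a discretized thickness/camber state space with a policy annealed from exploration to exploitation, can be sketched in a few lines of Python. Everything below is an illustrative assumption rather than the authors' method: the grid resolution and ranges, the one-cell action set, the learning parameters, and the toy camber-driven lift estimate standing in for the paper's computational airfoil model.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 5 x 5 discretization of the two morphing degrees of freedom;
# the paper compares several resolutions, and these ranges are assumptions.
THICKNESS = np.linspace(0.06, 0.18, 5)   # thickness-to-chord ratio
CAMBER = np.linspace(0.00, 0.08, 5)      # camber-to-chord ratio
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]  # move one grid cell

Q = np.zeros((len(THICKNESS), len(CAMBER), len(ACTIONS)))

def reward(ic, cl_goal=0.8):
    # Stand-in reward: negative error between a crude camber-driven lift
    # estimate and a goal lift coefficient. The paper instead couples the
    # learner to a computational model of the airfoil.
    cl = 4.0 * np.pi * CAMBER[ic]
    return -abs(cl - cl_goal)

def step(it, ic, a):
    dt, dc = ACTIONS[a]
    return (int(np.clip(it + dt, 0, len(THICKNESS) - 1)),
            int(np.clip(ic + dc, 0, len(CAMBER) - 1)))

alpha, gamma = 0.1, 0.9        # policy 3 in the paper also anneals gamma
episodes, horizon = 200, 50    # "as few as 200 episodes" for a coarse grid
for ep in range(episodes):
    eps = max(0.05, 1.0 - ep / episodes)  # anneal exploration -> exploitation
    it, ic = int(rng.integers(len(THICKNESS))), int(rng.integers(len(CAMBER)))
    for _ in range(horizon):
        a = (int(rng.integers(len(ACTIONS))) if rng.random() < eps
             else int(np.argmax(Q[it, ic])))
        it2, ic2 = step(it, ic, a)
        # One-step Q-learning update toward the greedy successor value.
        Q[it, ic, a] += alpha * (reward(ic2) + gamma * Q[it2, ic2].max()
                                 - Q[it, ic, a])
        it, ic = it2, ic2

best = np.unravel_index(np.argmax(Q.max(axis=2)), Q.shape[:2])
print("learned thickness, camber:", THICKNESS[best[0]], CAMBER[best[1]])

Under this annealing schedule the learner is fully explorative in early episodes and almost fully exploitative by the end, which is the policy the abstract reports as converging fastest; the fully random and annealing-discount variants differ only in how eps and gamma are scheduled.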