Reinforcement Learning of a Morphing Airfoil-Policy and Discrete Learning Analysis
An episodic unsupervised learning algorithm using the Q-Learning method is developed to learn the optimal shape and shape change policy of a morphing airfoil. Optimality is addressed by reward functions based on airfoil properties such as lift coefficient, drag coefficient, and moment coefficient about the leading edge, which represent optimal shapes for specified flight conditions. The reinforcement learning, as applied to morphing, is integrated with a computational model of an airfoil. The methodology is demonstrated with numerical examples of a NACA type airfoil that autonomously morphs in two degrees of freedom, thickness and camber, to a shape that corresponds to specified goal requirements. Because the thickness and camber of the airfoil are continuous, this paper addresses the convergence of the learning algorithm for several action step sizes. Convergence is also analyzed with three candidate policies: 1) a fully random exploration policy, 2) a policy annealing from random exploration to exploitation, and 3) an annealing discount factor in addition to the annealing policy. The results presented in this paper show the inherent differences in the learned action-value function when the state space discretization, policy, and learning parameters differ. A policy annealing from fully explorative to almost fully exploitative yielded the highest rate of convergence among the candidate policies. Also, the coarsest discretization of the state space resulted in convergence of the action-value function in as few as 200 episodes. Copyright © 2008 by Amanda Lampton, Adam Niksch, and John Valasek.
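As a rough illustration of the approach summarized above, the following minimal sketch runs tabular Q-Learning over a discretized thickness/camber grid with a policy that anneals from random exploration toward exploitation. It is not the authors' implementation: the grid size, the single goal cell, the 0/1 reward, the linear annealing schedule, and the names `train`, `n_bins`, `alpha`, and `gamma` are all assumptions made for illustration. In the paper the reward is based on aerodynamic coefficients from a computational airfoil model, which is replaced here by a hypothetical goal cell.

```python
import random

# Four discrete morphing actions: +/- one thickness bin, +/- one camber bin.
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def train(n_bins=5, episodes=200, steps=50, alpha=0.1, gamma=0.9, seed=0):
    """Tabular Q-Learning on a toy (thickness, camber) grid.

    All parameters and the reward are illustrative assumptions,
    not values taken from the paper.
    """
    rng = random.Random(seed)
    goal = (n_bins - 1, n_bins - 1)  # hypothetical goal shape
    q = {((t, c), a): 0.0
         for t in range(n_bins)
         for c in range(n_bins)
         for a in range(len(ACTIONS))}

    for ep in range(episodes):
        # Assumed schedule: anneal linearly from fully random (1.0)
        # to almost fully exploitative (0.05).
        eps = max(0.05, 1.0 - ep / episodes)
        s = (rng.randrange(n_bins), rng.randrange(n_bins))
        for _ in range(steps):
            if s == goal:
                break
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[(s, i)])  # exploit
            dt, dc = ACTIONS[a]
            # Clamp to the grid so the shape stays within its bounds.
            s2 = (min(max(s[0] + dt, 0), n_bins - 1),
                  min(max(s[1] + dc, 0), n_bins - 1))
            r = 1.0 if s2 == goal else 0.0  # placeholder reward
            best_next = 0.0 if s2 == goal else max(
                q[(s2, i)] for i in range(len(ACTIONS)))
            # Standard Q-Learning update of the action-value function.
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
    return q
```

Calling `train()` returns the learned action-value table as a dictionary keyed by `((thickness_bin, camber_bin), action_index)`. Varying `n_bins` in such a sketch corresponds loosely to changing the state space discretization, and changing the `eps` schedule corresponds to swapping among exploration policies.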
name of conference
AIAA Guidance, Navigation and Control Conference and Exhibit