Nelson, Eric James (2016-11). Learning to Control Linear Time-Invariant Systems with Discrete Time Reinforcement Learning. Master's Thesis.

abstract

  • Reinforcement learning (RL) is a powerful method for learning policies in a model-free way in environments with delayed feedback. Another powerful method for obtaining control policies is the Linear Quadratic Regulator (LQR) framework, which uses knowledge of a linear model to derive the optimal policy. However, the continuously evolving dynamics of linear systems pose a challenge for RL techniques, whose underlying theory is discrete in nature; a reinforcement learner must therefore discretize the dynamics by sampling at fixed time intervals. The sample time used by the reinforcement learner directly affects the quality of the learned control policy. This work characterizes the quality of learned policies as a function of the reinforcement learner's sample time, comparing them against the optimal control derived from the LQR framework (a minimal sketch of this comparison follows the abstract). It is shown that as the sample time decreases to zero, the learned policies converge to the optimal policies, while any non-zero sample time introduces error into the learned policies. This error grows exponentially as a function of the sample time used to train the reinforcement learner.
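
The sketch below illustrates the setup the abstract describes, not the thesis's own experiments: a continuous-time linear system is discretized by zero-order hold at several sample times, and the resulting discrete-time LQR gain (the best a sampled-data learner could recover) is compared against the continuous-time optimal gain. The double-integrator plant, the cost matrices, and the dt-scaling of the costs are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_are, solve_discrete_are

# Illustrative plant (not from the thesis): a double integrator, x_dot = A x + B u.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)  # state cost
R = np.eye(1)  # input cost

# Continuous-time LQR: solve the algebraic Riccati equation for the optimal gain K*.
P = solve_continuous_are(A, B, Q, R)
K_star = np.linalg.solve(R, B.T @ P)

def zoh_discretize(A, B, dt):
    """Exact zero-order-hold discretization via the matrix exponential."""
    n, m = A.shape[0], B.shape[1]
    M = np.zeros((n + m, n + m))
    M[:n, :n] = A
    M[:n, n:] = B
    Md = expm(M * dt)
    return Md[:n, :n], Md[:n, n:]  # Ad, Bd

# Discretize at several sample times and compare each discrete-time optimal gain
# to K*: the gap should shrink as dt -> 0, mirroring the convergence result.
for dt in [1.0, 0.1, 0.01, 0.001]:
    Ad, Bd = zoh_discretize(A, B, dt)
    # Scale the costs by dt so the discrete sum approximates the integral cost.
    Pd = solve_discrete_are(Ad, Bd, Q * dt, R * dt)
    Kd = np.linalg.solve(R * dt + Bd.T @ Pd @ Bd, Bd.T @ Pd @ Ad)
    print(f"dt={dt:6.3f}  gain error ||Kd - K*|| = {np.linalg.norm(Kd - K_star):.4e}")
```

Running this prints a gain error that shrinks with the sample time, which is the qualitative behavior the abstract states: the zero-sample-time limit recovers the continuous LQR policy, and any finite sample time leaves a residual error.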

publication date

  • December 2016