Online Residential Demand Response via Contextual Multi-Armed Bandits (Conference Paper)

abstract

  • Residential loads have great potential to enhance the efficiency and reliability of electricity systems via demand response (DR) programs. One major challenge in residential DR is handling unknown and uncertain customer behaviors. Previous works use learning techniques to predict customer DR behaviors, but the influence of time-varying environmental factors is generally neglected, which may lead to inaccurate prediction and inefficient load adjustment. In this paper, we consider the residential DR problem where the load service entity (LSE) aims to select an optimal subset of customers to maximize the expected load reduction subject to a financial budget. To learn the uncertain customer behaviors under environmental influences, we formulate residential DR as a contextual multi-armed bandit (MAB) problem, and an online learning and selection (OLS) algorithm based on Thompson sampling is proposed to solve it. The proposed algorithm takes contextual information into account and is applicable to complicated DR settings. Numerical simulations are performed to demonstrate the learning effectiveness of the proposed algorithm.
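  The abstract formulates customer selection as a contextual MAB solved with Thompson sampling. As a rough illustration of that idea only (not the paper's OLS algorithm), the sketch below runs linear-Gaussian Thompson sampling per customer and greedily fills a financial budget with the customers whose sampled load reductions per unit cost are largest. All parameter values, priors, the cost model, and the simulated environment are assumptions made for this example.

```python
# Illustrative sketch: contextual Thompson sampling for budgeted customer selection.
# Each customer (arm) has an unknown linear response to the context; the LSE samples
# coefficients from each posterior, scores customers, and spends the budget greedily.
import numpy as np

rng = np.random.default_rng(0)

n_customers = 20    # candidate DR customers (arms) -- assumed
d = 4               # context dimension (e.g., temperature, hour, price, day type) -- assumed
budget = 5.0        # financial budget per DR event -- assumed units
costs = rng.uniform(0.5, 1.5, n_customers)            # incentive cost per customer -- assumed
theta_true = rng.normal(0.0, 1.0, (n_customers, d))   # unknown true response coefficients
sigma2 = 0.25                                         # observation noise variance -- assumed

# Bayesian linear model per customer with prior N(0, sigma2 * I):
# posterior mean = A^{-1} b, posterior covariance = sigma2 * A^{-1}.
A = np.stack([np.eye(d) for _ in range(n_customers)])
b = np.zeros((n_customers, d))

def select_subset(sampled_reduction):
    """Greedy knapsack: pick customers by sampled reduction per unit cost until the budget is spent."""
    order = np.argsort(-sampled_reduction / costs)
    chosen, spent = [], 0.0
    for i in order:
        if spent + costs[i] <= budget and sampled_reduction[i] > 0:
            chosen.append(i)
            spent += costs[i]
    return chosen

for t in range(200):                          # DR events
    x = rng.normal(0.0, 1.0, d)               # observed context for this event
    sampled = np.empty(n_customers)
    for i in range(n_customers):
        mu = np.linalg.solve(A[i], b[i])                       # posterior mean
        cov = sigma2 * np.linalg.inv(A[i])                     # posterior covariance
        theta_s = rng.multivariate_normal(mu, cov)             # Thompson sample
        sampled[i] = theta_s @ x                               # sampled expected load reduction
    for i in select_subset(sampled):
        reduction = theta_true[i] @ x + rng.normal(0.0, np.sqrt(sigma2))  # noisy observation
        A[i] += np.outer(x, x)                                 # posterior update
        b[i] += reduction * x

err = np.mean([np.linalg.norm(np.linalg.solve(A[i], b[i]) - theta_true[i]) for i in range(n_customers)])
print("average posterior-mean error:", err)
```

  The greedy reduction-per-cost rule is one simple way to handle the budget constraint in this sketch; the paper's OLS algorithm may solve the selection step differently.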

published proceedings

  • 59th IEEE Conference on Decision and Control (CDC)

author list (cited authors)

  • Chen, X., Nie, Y., & Li, N.

complete list of authors

  • Chen, Xin; Nie, Y.; Li, N.

publication date

  • 2020