Missing data estimation for 1-6 h gaps in energy use and weather data using different statistical methods


  • Analysing hourly energy use to determine retrofit savings or diagnose system problems frequently requires rehabilitation of short periods of missing data. This paper evaluates four methods for rehabilitating short periods of missing data. Single variable regression, polynomial models, Lagrange interpolation, and linear interpolation models are developed, demonstrated, and used to fill 1-6 h gaps in weather data, heating data and cooling data for commercial buildings. The methodology for comparing the performance of the four different methods for filling data gaps uses 11 1-year data sets to develop different models and fill over 500 000 'pseudo-gaps' 1-6 h in length for each model. These pseudo-gaps are created within each data set by assuming data are missing; the gaps are then filled and the 'filled' values compared with the measured values. Comparisons are made using four statistical parameters: mean bias error (MBE), root mean square error, sum of the absolute errors, and coefficient of variation of the sum of the absolute errors. Comparison based on frequency within specified error limits is also used. A linear interpolation model or a polynomial model with hour-of-day as the independent variable both fill 1-6 missing hours of cooling data, heating data or weather data, with accuracy clearly superior to the single variable linear regression model and to the Lagrange model. The linear interpolation model is the simplest and most convenient method, and generally showed superior performance to the polynomial model when evaluated using root mean square error, sum of the absolute errors, or frequency of filling within set error limits as criteria. The eighth-order polynomial model using time as the independent variable is a relatively simple, yet powerful approach that provided somewhat superior performance for filling heating data and cooling data if MBE is the criterion, as is often the case when evaluating retrofit savings. Likewise, a tenth-order polynomial model provided the best performance when filling dew-point temperature data when MBE is the criterion. It is possible that the results would differ somewhat for other data sets, but the strength of the linear and polynomial models relative to the other models evaluated seems quite robust. Copyright 2006 John Wiley & Sons, Ltd.
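
The pseudo-gap procedure described in the abstract can be illustrated with a minimal sketch for the linear interpolation case. This is not the authors' implementation: the function names `fill_linear` and `evaluate_pseudo_gaps`, the sample `hourly` values, and the sign convention for MBE (filled minus measured) are all assumptions made for illustration. The coefficient of variation of the sum of the absolute errors is omitted because the abstract does not define it.

```python
import math


def fill_linear(series, start, length):
    """Fill `length` hours beginning at index `start` by linear interpolation
    between the measured neighbours at start-1 and start+length."""
    left, right = series[start - 1], series[start + length]
    step = (right - left) / (length + 1)
    return [left + step * (k + 1) for k in range(length)]


def evaluate_pseudo_gaps(series, gap_length):
    """Treat every interior window of `gap_length` hours as missing, fill it,
    and compare the filled values with the measured ones."""
    errors = []
    # Each gap needs one measured value before it and one after it.
    for start in range(1, len(series) - gap_length):
        filled = fill_linear(series, start, gap_length)
        measured = series[start:start + gap_length]
        # Assumed sign convention: error = filled - measured.
        errors.extend(f - m for f, m in zip(filled, measured))
    n = len(errors)
    return {
        "n": n,
        "mbe": sum(errors) / n,                            # mean bias error
        "rmse": math.sqrt(sum(e * e for e in errors) / n),  # root mean square error
        "sae": sum(abs(e) for e in errors),                 # sum of absolute errors
    }


# Hypothetical hourly readings, for illustration only.
hourly = [20.0, 21.0, 23.0, 26.0, 28.0, 29.0, 28.5, 27.0, 25.0, 22.0]
stats = evaluate_pseudo_gaps(hourly, gap_length=2)
print(stats)
```

Running the same evaluation for gap lengths of 1 to 6 h, and for each competing model, would give the kind of side-by-side comparison the abstract reports.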

published proceedings


author list (cited authors)

  • Claridge, D. E., & Chen, H.

citation count

  • 10

complete list of authors

  • Claridge, David E; Chen, Hui

publication date

  • January 2006