A multiple linear regression model with multiplicative log-normal error term for atmospheric concentration data.
Academic Article
Overview
abstract
The homoscedasticity assumption (the variance of the error term is the same across all observations) is a key assumption in the ordinary least squares (OLS) solution of a linear regression model. The validity of this assumption is examined for a multiple linear regression model used to determine the source contributions to the observed black carbon concentrations at 12 background monitoring sites across China using a hybrid modeling approach. Residual analysis from the traditional OLS method, which assumes that the error term is additive and normally distributed with a mean of zero, shows pronounced heteroscedasticity based on the Breusch-Pagan test in 11 of the 12 datasets. Noting that the atmospheric black carbon data are log-normally distributed, we make a new assumption that the error terms are multiplicative and log-normally distributed. When the coefficients of the multiple linear regression model are determined using maximum likelihood estimation (MLE), the distribution of the residuals in 8 of the 12 datasets is in good accordance with the revised assumption. Furthermore, the MLE computation under this new assumption can be proved mathematically identical to minimizing a log-scale objective function, which considerably reduces the complexity of the MLE calculation. The new method is further demonstrated to have clear advantages in numerical simulation experiments of a 5-variable multiple linear regression model using synthesized data with prescribed coefficients and log-normally distributed multiplicative errors. Under all 9 simulation scenarios, the new method yields the most accurate estimates of the regression coefficients and has significantly higher coverage probability (on average, 95% for all five coefficients) than the OLS (79%) and weighted least squares (WLS, 72%) methods.
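The equivalence between the MLE and a log-scale objective can be illustrated with a small sketch. It assumes the model y = (b0 + b1*x1 + ... + bk*xk) * e with ln(e) normally distributed with mean zero and variance s^2; the abstract does not state the exact parameterization, so this is an assumption. Under it, the log-likelihood depends on the coefficients only through the sum of squared differences between ln(y) and ln(b0 + b1*x1 + ... + bk*xk), so maximizing the likelihood over the coefficients amounts to minimizing that sum. The function names, the damped Gauss-Newton solver, and the synthetic data below are illustrative choices, not the authors' implementation.

```python
import numpy as np


def log_scale_objective(beta, X, y):
    """Sum of squared log-scale residuals; infinite if any fitted value is non-positive."""
    mu = X @ beta
    if np.any(mu <= 0):
        return np.inf
    return float(np.sum((np.log(y) - np.log(mu)) ** 2))


def fit_log_scale(X, y, max_iter=100, tol=1e-12):
    """Damped Gauss-Newton minimization of the log-scale objective.

    Assumes column 0 of X is an intercept column of ones and all y > 0.
    """
    log_y = np.log(y)
    beta = np.zeros(X.shape[1])
    beta[0] = np.exp(log_y.mean())  # start at the geometric mean of y, so fitted values are positive
    current = log_scale_objective(beta, X, y)
    for _ in range(max_iter):
        mu = X @ beta
        resid = log_y - np.log(mu)
        # Linearizing ln(X @ beta) around the current beta gives a least-squares step.
        step, *_ = np.linalg.lstsq(X / mu[:, None], resid, rcond=None)
        scale = 1.0
        while scale > 1e-8:  # halve the step until the objective decreases
            trial = beta + scale * step
            value = log_scale_objective(trial, X, y)
            if value < current:
                break
            scale /= 2.0
        else:
            break  # no improving step found: treat the current beta as the minimizer
        improvement = current - value
        beta, current = trial, value
        if improvement < tol * max(current, 1.0):
            break
    return beta


# Synthetic data with prescribed (hypothetical) coefficients and multiplicative log-normal errors.
rng = np.random.default_rng(0)
n = 20000
X = np.column_stack([np.ones(n), rng.uniform(0.1, 3.0, size=(n, 2))])
beta_true = np.array([1.0, 2.0, 0.5])
y = (X @ beta_true) * rng.lognormal(mean=0.0, sigma=0.2, size=n)

beta_hat = fit_log_scale(X, y)                      # log-scale (MLE) estimate
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)    # ordinary least squares, for comparison

print("true coefficients:", beta_true)
print("log-scale fit:    ", beta_hat)
print("OLS fit:          ", beta_ols)
```

The sketch covers only point estimates; the coverage probabilities quoted in the abstract would additionally require interval estimates for each coefficient, which are not reproduced here.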