Lin, Shuqiong (2018-05). A New Multilevel Cart Algorithm and Its Application in Propensity Score Analysis. Doctoral Dissertation.

abstract

  • The logistic regression model is the most commonly used method for modeling binary data. Unbiased estimation with logistic regression depends heavily on strong model assumptions, which are often violated in practice. The classification and regression tree (CART) algorithm has gained popularity as an alternative to logistic regression because CART does not require model assumptions and can model complex relationships automatically. However, only a limited number of studies have developed multilevel CART (M-CART) algorithms for modeling multilevel data with binary outcomes. Therefore, in the first study, a new M-CART algorithm was proposed for modeling multilevel data with binary outcomes; it combines multilevel logistic regression (M-logit) and single-level CART (S-CART) using an expectation-maximization algorithm. The proposed algorithm allows the inclusion of covariates at all levels, depends on no model assumptions, and captures interactions and nonlinearity automatically. The performance of the proposed M-CART was compared with M-logit, S-CART, and single-level logistic regression (S-logit) in terms of prediction accuracy. Results from the simulation study showed that M-CART led to higher classification accuracy, sensitivity, specificity, and Klecka's tau values than the other three methods.
    In the second study, the proposed M-CART algorithm was applied in propensity score analysis (PSA) for multi-site non-randomized controlled trials (non-RCTs). PSA is the most popular statistical technique for estimating the causal effect of a treatment by eliminating systematic differences in pre-treatment covariates between individuals who receive the treatment and individuals who do not. M-logit and S-CART have been applied to estimate propensity scores, but no study has explored the performance of M-CART for this purpose. Thus, in the second study, the performance of the proposed M-CART was compared with M-logit, S-CART, and S-logit in terms of covariate balance and treatment effect estimation. Results indicated that M-CART was more stable than M-logit, S-CART, and S-logit in achieving pre-treatment covariate balance and yielded reasonable covariate balance under all conditions. Results further showed that, regardless of the propensity score (PS) conditioning approach, M-CART yielded the smallest relative bias in treatment effect estimates among the four methods across all simulated conditions.
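The four prediction-accuracy measures named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the dissertation: the function name `classification_metrics`, the 0/1 coding of the outcome, the default equal priors, and the toy labels are all assumptions made for illustration. Klecka's tau is computed in its usual form, (n_correct - sum_i p_i * n_i) / (N - sum_i p_i * n_i), where p_i is the prior probability of group i and n_i is the observed size of group i.

```python
from typing import Dict, Sequence, Tuple


def classification_metrics(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    priors: Tuple[float, float] = (0.5, 0.5),  # assumed equal priors for groups 0 and 1
) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and Klecka's tau for a binary (0/1) outcome."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Cells of the 2x2 confusion table.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    n0, n1 = tn + fp, tp + fn  # observed group sizes
    if n0 == 0 or n1 == 0:
        raise ValueError("both outcome groups must be present in y_true")

    n = n0 + n1
    correct = tp + tn
    # Number of correct classifications expected by chance under the priors.
    expected = priors[0] * n0 + priors[1] * n1

    return {
        "accuracy": correct / n,
        "sensitivity": tp / n1,
        "specificity": tn / n0,
        "klecka_tau": (correct - expected) / (n - expected),
    }


# Toy labels (hypothetical, not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With equal priors, a tau of 0 corresponds to chance-level classification and a tau of 1 to perfect classification, so the measure reads as the proportional reduction in classification errors relative to chance.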

publication date

  • May 2018