Application of Machine Learning Techniques to High-Dimensional Clinical Data to Forecast Postoperative Complications.
OBJECTIVE: To compare the performance of risk prediction models for forecasting postoperative sepsis and acute kidney injury. DESIGN: Retrospective single-center cohort study of adult surgical patients admitted between 2000 and 2010. PATIENTS: 50,318 adult patients undergoing major surgery. MEASUREMENTS: We evaluated the performance of logistic regression, generalized additive models, naive Bayes, and support vector machines for forecasting postoperative sepsis and acute kidney injury. We assessed the impact of feature reduction techniques on predictive performance. Model performance was determined using the area under the receiver operating characteristic curve, accuracy, and positive predictive value. Results were reported from a 70/30 cross-validation procedure in which the data were randomly split into 70% for model training and 30% for validation. MAIN RESULTS: The areas under the receiver operating characteristic curve for the different models ranged between 0.797 and 0.858 for acute kidney injury and between 0.757 and 0.909 for severe sepsis. Logistic regression, generalized additive models, and support vector machines performed better than the naive Bayes model. Generalized additive models additionally accounted for the non-linearity of continuous clinical variables, as depicted in their risk pattern plots. Reducing the input feature space with LASSO had minimal effect on prediction performance, while feature extraction using principal component analysis improved the performance of the models. CONCLUSIONS: Generalized additive models and support vector machines performed well as risk prediction models for postoperative sepsis and acute kidney injury. Feature extraction using principal component analysis improved the predictive performance of all models.
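The evaluation procedure named in MEASUREMENTS (a random 70/30 split, then area under the receiver operating characteristic curve, accuracy, and positive predictive value) can be illustrated with a minimal sketch. This is not the study's code: the function names, the 0.5 classification threshold, and the toy labels and scores are all assumptions made for illustration, and only the Python standard library is used.

```python
import random


def split_70_30(rows, seed=0):
    """Randomly split rows into 70% for training and 30% for validation."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def accuracy_and_ppv(labels, scores, threshold=0.5):
    """Accuracy and positive predictive value at a fixed threshold.
    The 0.5 default is an illustrative assumption, not the study's choice."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    correct = sum(1 for y, p in zip(labels, preds) if y == p)
    accuracy = correct / len(labels)
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return accuracy, ppv


# Toy validation-set labels (1 = complication) and predicted risks.
# These values are made up and do not come from the study.
labels = [0, 0, 1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7, 0.3, 0.05]

toy_auc = auc(labels, scores)
toy_accuracy, toy_ppv = accuracy_and_ppv(labels, scores)
train_ids, valid_ids = split_70_30(range(100))
```

In practice the scores would come from a model fitted on the training portion and applied to the validation portion; the sketch only shows how the three reported metrics are derived from held-out predictions.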