Afrin, Kahkashan (2020-08). Survival Analysis for Big Data. Doctoral Dissertation.


  • Survival analysis has emerged as a promising tool in biostatistics for life expectancy prognosis and personalized healthcare. However, its accuracy and potential are limited by modern big data challenges to which traditional survival analysis models have not yet properly adapted. These are the challenges of volume, variety, velocity, and veracity. In this dissertation, we are concerned with the challenges of data imbalance (veracity) and multi-view data (variety and volume). To achieve the overarching goal of improving prognosis accuracy, this dissertation aims to propose methodological improvements and leverage statistical advancements for solving the big data challenges in survival analysis and addressing the limiting assumptions of the most commonly used proportional hazards models. First, we address the data imbalance issue by proposing a balanced random survival forest (BRSF) model that integrates a synthetic minority over-sampling technique with random survival forests for improved mortality prediction. Second, for the multi-view survival learning challenge, we propose an integrated non-parametric survival (iNPS) learning method that captures the joint and individual structures in different data types and models their non-linearity and interactions by using a non-parametric survival learning method. Theoretical results and extensive empirical comparisons using complex cancer and cardiovascular data sets suggest major improvements in survival prognosis accuracy due to the methods presented in this dissertation. Finally, we extend non-parametric survival learning to multiple recurring events for continuous prediction of epileptic seizures and to provide probabilistic estimates of seizure onset over a broad prediction horizon. These estimates are essential for developing individualized quantitative risk measures and management plans for epilepsy patients, with potential application in a wearable seizure alert system. We believe that the methodological advancements and their clinical applications presented in this study may provide a foundation for further knowledge discovery and subsequent improvement in survival analysis, a healthcare domain of immense importance.

publication date

  • August 2020