Ariane Cwiling
Statistical and Machine Learning methods for Survival Data: Prediction, Performance Assessment and Interpretability
In the context of right-censored data, we study the problem of predicting the time to event, restricted to a fixed time horizon, based on a set of covariates. For instance, predicting the time to disease onset or to relapse for a patient based on their attributes is of great interest in medical applications. Under a quadratic loss, this problem is equivalent to estimating the conditional Restricted Mean Survival Time (RMST), a widely used and easily interpretable quantity. In this work, we build a comprehensive analysis framework for the prediction of the time to event from right-censored data. It includes a new prediction method combining pseudo-observations and the super learner, as well as new criteria to assess the performance and to enhance the interpretability of such RMST estimators. Specifically, we present a criterion that estimates the mean squared error of an RMST estimator. We also introduce a model-agnostic conformal algorithm adapted to right-censored data, which computes prediction intervals and evaluates local variable importance. Finally, we develop a model-agnostic statistical test to assess global variable importance. These tools are built on the Inverse Probability of Censoring Weighting (IPCW) methodology and are valid under classic assumptions. For the new prediction method, the theoretical results of the standard super learner are extended to right-censored data using a new definition of pseudo-observations, the so-called split pseudo-observations. The method is flexible and easy to use, and its predictions can be efficiently analyzed and interpreted in real-world situations by means of our new IPCW criteria.
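To make the IPCW idea concrete, the following is a minimal sketch of one common way to estimate the mean squared error of an RMST predictor from right-censored data. It is an illustration under stated assumptions, not necessarily the exact criterion developed in this work: the function names censoring_survival and ipcw_mse are hypothetical, censoring is assumed independent of the event time and of the covariates, the censoring distribution is estimated by Kaplan-Meier, and ties between event and censoring times are not treated specially.

```python
import numpy as np


def censoring_survival(times, events):
    """Kaplan-Meier estimate of the censoring survival function G(t) = P(C > t).

    Returns a function that evaluates the left limit G(t-).
    Assumption: ties between event and censoring times are ignored.
    """
    times = np.asarray(times, dtype=float)
    censored = 1 - np.asarray(events, dtype=int)  # censoring plays the role of the event
    uniq = np.unique(times[censored == 1])        # distinct censoring times, sorted
    at_risk = np.array([(times >= u).sum() for u in uniq], dtype=float)
    n_cens = np.array([((times == u) & (censored == 1)).sum() for u in uniq], dtype=float)
    # Prepend 1.0 so that index k gives the product over the first k censoring times.
    surv = np.concatenate(([1.0], np.cumprod(1.0 - n_cens / at_risk)))

    def g_left(t):
        # Count censoring times strictly smaller than t, which yields G(t-).
        idx = np.searchsorted(uniq, np.asarray(t, dtype=float), side="left")
        return surv[idx]

    return g_left


def ipcw_mse(times, events, predictions, tau):
    """IPCW estimate of E[(min(T, tau) - prediction)^2] (hypothetical helper).

    times: observed times min(T, C); events: 1 if the event was observed, 0 if censored;
    predictions: predicted restricted times; tau: time horizon.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    predictions = np.asarray(predictions, dtype=float)

    restricted = np.minimum(times, tau)
    # min(T, tau) is known when the event is observed or the subject is followed up to tau.
    observed = (events == 1) | (times >= tau)
    weights = observed / censoring_survival(times, events)(restricted)
    return float(np.mean(weights * (restricted - predictions) ** 2))
```

Without censoring, all weights equal one and the estimate reduces to the ordinary mean squared error computed on the restricted times min(T, tau). With censoring, subjects whose restricted time is unknown receive weight zero, and the remaining subjects are up-weighted to compensate.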