Breast cancer is one of the most common cancers in women worldwide and causes an enormous number of deaths annually. However, early diagnosis can improve survival outcomes, enabling simpler and more cost-effective treatments. The recent increase in data availability provides unprecedented opportunities to apply data-driven and machine learning methods to identify early-detection prognostic factors capable of predicting patients' expected survival and potential sensitivity to treatment, with the final aim of enhancing clinical outcomes. This tutorial presents a protocol for applying machine learning models in survival analysis to both clinical and transcriptomic data. We show that integrating clinical and mRNA expression data is essential to explain the multiple biological processes driving cancer progression. Our results reveal that machine-learning-based models such as random survival forests, gradient-boosted survival models, and survival support vector machines can outperform traditional statistical methods, i.e., the Cox proportional hazards model. The highest C-index among the machine learning models was recorded for the survival support vector machine, with a value of 0.688, whereas the C-index recorded for the Cox model was 0.677. SHapley Additive exPlanations (SHAP) values were also used to quantify feature importance in the models and the impact of each feature on the prediction outcomes.
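The models above are compared by their C-index (concordance index). As a rough illustration of what that number measures, the following is a minimal pure-Python sketch of Harrell's C-index for right-censored data. It is not the chapter's own code: the function name `concordance_index`, the toy follow-up times, event flags, and risk scores are all hypothetical, and pairs with tied observed times are simply skipped for brevity.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C-index for right-censored survival data.

    times       : observed follow-up times
    events      : True if the event was observed, False if censored
    risk_scores : model output, higher meaning shorter expected survival
    Assumption: pairs with identical observed times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        # Order the pair so that 'first' has the shorter observed time.
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            # Shorter time is censored: we cannot tell who failed first.
            continue
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0   # higher risk failed earlier: concordant
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, not taken from the study.
toy_times = [5, 8, 12, 20, 25]
toy_events = [True, False, True, True, False]
toy_risk = [0.9, 0.4, 0.2, 0.3, 0.1]

toy_c = concordance_index(toy_times, toy_events, toy_risk)
print(f"C-index on toy data: {toy_c:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk, which is the scale on which the reported 0.688 and 0.677 should be read.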
|Title of host publication|Computational Biology and Machine Learning for Metabolic Engineering and Synthetic Biology|
|Number of pages|69|
|Publication status|Published - 11 May 2022|
|Name|Methods in Molecular Biology|
Bibliographical note
© 2023. The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature.
AO and CA acknowledge the support of Earlier.org through their Research Grant “Application of computational models of breast cancer for early-detection personalised tests.” CA acknowledges the support of EPSRC and The Alan Turing Institute through their Turing Network Development Award, and the Children’s Liver Disease Foundation through their Research Grant.