Integrated multi-omics analysis of ovarian cancer using Deep Learning

  • Muta Tah Hira

Student thesis: Doctoral Thesis


Ovarian Cancer (OC) is currently the deadliest gynaecological cancer among women of advanced
age in developed countries. The diagnosis of OC presents significant challenges due to its
nonspecific symptoms and the lack of reliable early detection tests or tools. Advancing
early detection methods is crucial for improving ovarian cancer outcomes. Cancer is a
complex disease that deregulates cellular functions at various molecular levels (e.g., DNA,
RNA, and proteins). Integrated analysis of multi-omics data from these levels is necessary to
understand the aberrant cellular functions responsible for cancer and to develop early
detection tools and tests. Deep learning (DL) approaches have become helpful in integrated
multi-omics cancer data analysis in recent years. However, high-dimensional multi-omics data
are generally imbalanced, with too many molecular features and relatively few patient samples.
This imbalance makes DL-based integrated multi-omics analysis difficult. DL-based
dimensionality reduction techniques, including the variational autoencoder (VAE), are a potential
solution for balancing high-dimensional multi-omics data. However, the VAE-based integrated
multi-omics analyses that exist are limited to pan-cancer studies. Also, existing integrated
multi-omics analyses have yet to perform a comprehensive mono- and integrated multi-omics
(di- and tri-omics) analysis of OC data. This research aims to perform a comprehensive and
integrated multi-omics analysis of OC data and use the analysis for two subdomains (i.e.,
diagnosis and prognosis) of OC. In this regard, the research first identifies the key features of
an ML/DL-based cancer data analysis. Then, it systematically reviews DL-based OC data
analyses from the perspectives of the key features identified.
After that, we focus on reducing the dimensionality of the genomics, epigenomics, and
transcriptomics data of OC using a VAE and then integrating them in different combinations
(e.g., mono-omics, di-omics, and tri-omics) to analyse them for two subdomains (i.e. diagnosis
and prognosis) of OC. To do that, we propose to use an improved version of VAE named
the maximum mean discrepancy VAE (MMD-VAE). We analysed the mono-omics, integrated
di-omics, and tri-omics data of ovarian cancer through cancer sample identification,
molecular subtype clustering and classification, and survival analysis. The Cancer Genome
Atlas (TCGA) OC datasets were used for the analysis, particularly samples with mRNA,
CNV/CNA (Copy Number Variation/Alteration), RNAseq, and DNA methylation features.
We then integrated them to form di- and tri-omics data. The results show that MMD-VAE
and VAE-based compressed features can accurately cluster and classify the transcriptional
subtypes of the TCGA datasets. For example, a compressed feature-based SVM classifier
can classify transcriptional subtypes with an accuracy of 93.2-95.5% and 87.1-95.7%.
Furthermore, the results of the survival analysis show that the compressed representations
of omics data produced by VAE and MMD-VAE can be used in the prognosis of cancer. On
the basis of the results, we can conclude that (i) VAE and MMD-VAE outperform existing
dimensionality reduction techniques, (ii) integrated multi-omics analyses perform better than or
similarly to their mono-omics counterparts, and (iii) MMD-VAE performs better
than VAE on most omics datasets.
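
The maximum mean discrepancy term that distinguishes an MMD-VAE from a standard VAE can be illustrated with a short sketch. In the usual formulation, the KL-divergence regulariser of the VAE is replaced by an MMD penalty that compares latent codes produced by the encoder with samples drawn from the prior. The abstract does not state which kernel, bandwidth, or estimator the thesis uses, so the Gaussian kernel, the biased estimator, and all function names below are assumptions made purely for illustration, and randomly generated vectors stand in for encoder outputs.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over every pair (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(latent, prior, bandwidth=1.0):
    """Biased estimate of the squared MMD between two samples of vectors."""
    return (mean_kernel(latent, latent, bandwidth)
            + mean_kernel(prior, prior, bandwidth)
            - 2.0 * mean_kernel(latent, prior, bandwidth))


def sample_gaussian(n, dim, mean, rng):
    """Draw n vectors of length dim from N(mean, 1) in each coordinate."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)                          # fixed seed for repeatability
prior = sample_gaussian(200, 2, 0.0, rng)       # samples from the N(0, I) prior
matched = sample_gaussian(200, 2, 0.0, rng)     # stand-in codes that match the prior
shifted = sample_gaussian(200, 2, 3.0, rng)     # stand-in codes far from the prior

mmd_matched = mmd_squared(matched, prior)       # small penalty
mmd_shifted = mmd_squared(shifted, prior)       # large penalty
print(f"matched: {mmd_matched:.4f}  shifted: {mmd_shifted:.4f}")
```

In a training loop, a penalty of this kind would be added to the reconstruction loss, so that latent codes drifting away from the prior (as in the shifted sample) are penalised more heavily than codes that match it.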
Date of Award: 26 Apr 2024
Original language: English
Awarding Institution
  • Teesside University
Supervisor: Mosharraf Sarker (Supervisor), Claudio Angione (Supervisor) & James Scrivens (Supervisor)
