A Multitask Multiple Kernel Learning Algorithm for Survival Analysis
with Application to Cancer Biology
Onur Dereli 1 Ceyda Oğuz 2 Mehmet Gönen 2 3 4
Abstract
Predictive performance of machine learning algorithms on related problems can be improved
using multitask learning approaches. Rather than
performing survival analysis on each data set to
predict survival times of cancer patients, we developed a novel multitask approach based on multiple kernel learning (MKL). Our multitask MKL
algorithm both works on multiple cancer data sets
and integrates cancer-related pathways/gene sets
into survival analysis. We tested our algorithm,
named Path2MSurv, on the Cancer Genome Atlas data sets, analyzing gene expression profiles of 7,655 patients from 20 cancer types together with cancer-specific pathway/gene set collections. Path2MSurv obtained better or comparable predictive performance when benchmarked against random survival forest, survival support vector machine, and the single-task variant of our algorithm. Path2MSurv can identify key pathways/gene sets in predicting survival times of patients from different cancer types.
1. Introduction
Understanding the formation and progression mechanisms of diseases plays a vital role in treating them. To this end, genomic characterizations have been used to answer various research problems. Survival analysis
is one of these research problems that aims to predict survival times of patients. There are several machine learning
algorithms developed to predict survival times using genomic characterizations and clinical information of patients
(Cox, 1972; Cox & Oakes, 1984; Bakker et al., 2004; Shivaswamy et al., 2007; Evers & Messow, 2008; Ishwaran et al.,
2008; Khan & Zubek, 2008; Van Belle et al., 2011a;b; Mogensen & Gerds, 2013; Kiaee et al., 2016; Wang et al., 2016;
Yousefi et al., 2017). These existing algorithms consider
the censored observations, but most of them cannot handle high-dimensional feature representations (e.g., genomic
characterizations) effectively due to the limited number of
training samples. These standard algorithms were recently
shown to be more suitable for low-dimensional feature representations (i.e., clinical variables) (Yuan et al., 2014).
Pathways/gene sets are simply the sets of genes with roles
in the same or similar biological mechanisms. Relating
pathways/gene sets to clinical phenotypes helps us better
understand the underlying mechanisms of diseases. That is
why several machine learning algorithms were proposed to
identify pathways/gene sets associated with disease-related
phenotypes such as overall survival time after diagnosis.
These algorithms either (i) identify survival-related molecular mechanisms using feature selection and learn a survival
analysis model on selected features only or (ii) train a survival analysis model on each pathway/gene set separately
and pick survival-related ones by comparing their predictive
performances (Pang et al., 2012; Zhang et al., 2017; Pang
et al., 2010; 2011). However, both approaches have drawbacks. The first approach might pick biologically unrelated genes due to the highly correlated structure of genomic characterizations. The second approach might pick related or similar pathways/gene sets because each pathway/gene set is analyzed separately. To eliminate these problems, pathway/gene set collections should be integrated into the model during the training step, so that the learning algorithm can pick informative pathways/gene sets in a more robust manner.
1 Graduate School of Sciences and Engineering, Koç University, İstanbul 34450, Turkey; 2 Department of Industrial Engineering, College of Engineering, Koç University, İstanbul 34450, Turkey; 3 School of Medicine, Koç University, İstanbul 34450, Turkey; 4 Department of Biomedical Engineering, School of Medicine, Oregon Health & Science University, Portland, OR 97239, USA. Correspondence to: Mehmet Gönen <mehmetgonen@ku.edu.tr>.
Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).
Using high-dimensional genomic characterizations in machine learning algorithms is a challenging task due to their
highly correlated structures. Kernel-based machine learning
algorithms were used to address this problem in survival
analysis (Shivaswamy et al., 2007; Evers & Messow, 2008;
Khan & Zubek, 2008; Van Belle et al., 2011a;b). Kernel
methods were also shown to be very successful in other
cancer-related problems such as drug sensitivity prediction (Costello et al., 2014) and gene essentiality prediction
(Gönen et al., 2017). The success of kernel methods lies mainly in the fact that the number of model parameters optimized is proportional to the number of samples, not the number of features (Schölkopf & Smola, 2002).
The kernel function that defines a similarity measure between pairs of samples is the most important component of kernel methods. No single kernel function is the best for all problems. That is why we can use a weighted combination of several kernel functions instead of a single kernel, which is known as multiple kernel learning (MKL) (Gönen & Alpaydın, 2011). Following this idea, an MKL-based survival analysis algorithm can pick informative pathways/gene sets by assigning zero weights to uninformative ones during inference. This approach defines a kernel function on each pathway/gene set using the genomic features of the genes included. The MKL part then learns an optimized kernel function, which is used to predict survival times (Dereli et al., in press).
Multitask learning aims to model related problems jointly by exploiting commonalities between them (Caruana, 1997). This idea has also been applied in cancer studies
to improve predictive performance (Costello et al., 2014;
Gönen et al., 2017). Recently, studies modeling multiple
cancer types simultaneously (i.e., pan-cancer studies) to
capture common underlying biological mechanisms have attracted great attention (The Cancer Genome Atlas Research
Network et al., 2013; Yang et al., 2014; Anaya et al., 2016).
However, to the best of our knowledge, there is a limited
number of multitask learning methods for survival analysis
(Li et al., 2016; Wang et al., 2017).
In this study, we combined survival analysis, MKL (for pathway selection), and multitask learning (for modeling multiple cohorts) in a unified formulation for the first time. Our algorithm can identify survival-related biological pathways/gene sets using high-dimensional genomic characterizations of patients from multiple cohorts.
2. Related Work
Random forest (RF) is a supervised machine learning algorithm originally developed for regression and classification
(Breiman, 2001). It first creates multiple decision trees using randomly selected features from the input features or
randomly selected samples from the training data. It then
combines these decision trees to obtain more robust predictions. RF was also extended towards survival analysis and
successfully used in many studies (Ishwaran et al., 2008).
Support vector machine (SVM) is another supervised machine learning algorithm originally developed for binary
classification (Cortes & Vapnik, 1995). SVM was also extended towards censored regression problems, i.e., survival
analysis (Shivaswamy et al., 2007; Khan & Zubek, 2008).
Survival SVM can be formulated as follows:
\[
\begin{aligned}
\text{min.} \quad & \frac{1}{2}\mathbf{w}^\top \mathbf{w} + C \sum_{i=1}^{N} \left(\xi_i^+ + (1 - \delta_i)\,\xi_i^-\right) \\
\text{w.r.t.} \quad & \mathbf{w} \in \mathbb{R}^D,\ \boldsymbol{\xi}^+ \in \mathbb{R}^N,\ \boldsymbol{\xi}^- \in \mathbb{R}^N,\ b \in \mathbb{R} \\
\text{s.t.} \quad & \epsilon + \xi_i^+ \geq y_i - \mathbf{w}^\top \mathbf{x}_i - b \quad \forall i \\
& \epsilon + \xi_i^- \geq \mathbf{w}^\top \mathbf{x}_i + b - y_i \quad \forall i \\
& \xi_i^+ \geq 0 \quad \forall i \\
& \xi_i^- \geq 0 \quad \forall i,
\end{aligned}
\tag{1}
\]
where the training data set is \(\{(\mathbf{x}_i, \delta_i, y_i)\}_{i=1}^{N}\), N is the number of samples, \(\mathbf{x}_i\) is the feature vector of sample i, \(\delta_i \in \{0, 1\}\) is the binary indicator variable that shows whether the observed survival time of sample i is censored (i.e., \(\delta_i = 1\)) or not (i.e., \(\delta_i = 0\)), and \(y_i \in \mathbb{R}\) is the observed survival time of sample i (i.e., time to last follow-up if censored or time to death if uncensored). Here, \(\mathbf{w}\) is the set of weights assigned to features, C is the non-negative regularization parameter, \(\boldsymbol{\xi}^+\) and \(\boldsymbol{\xi}^-\) are the sets of slack variables, D is the number of input features, \(\epsilon\) is the non-negative tube width parameter, and b is the bias parameter.
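The objective in (1) applies an \(\epsilon\)-insensitive loss in which the over-prediction slack of censored patients is dropped, since a censored time is only a lower bound on the true survival time. A minimal sketch of this loss (function and variable names are ours, for illustration only):

```python
import numpy as np

def censored_eps_insensitive_loss(y, y_pred, delta, eps=0.0):
    """Slack penalty from the survival SVM objective in (1).

    y: observed survival times, y_pred: predictions,
    delta: 1 if censored, 0 if the death time was observed.
    Under-predictions are always penalized; over-predictions are only
    penalized for uncensored patients.
    """
    xi_plus = np.maximum(0.0, y - y_pred - eps)    # under-prediction slack
    xi_minus = np.maximum(0.0, y_pred - y - eps)   # over-prediction slack
    return float(np.sum(xi_plus + (1 - delta) * xi_minus))

# Hypothetical toy data: the second patient is censored, so predicting
# beyond the censoring time incurs no penalty.
y = np.array([100.0, 200.0])
delta = np.array([0, 1])
print(censored_eps_insensitive_loss(y, np.array([150.0, 300.0]), delta))  # 50.0
```

Only the first (uncensored) patient contributes: the 50-day over-prediction is penalized, while the 100-day over-prediction for the censored patient is free.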
The primal optimization problem in (1) has (D + 2N + 1)
decision variables, which makes the model computationally
very costly. To integrate kernel functions into this formulation via standard kernel trick, the corresponding Lagrangian
function is written as
\[
\mathcal{L} = \frac{1}{2}\mathbf{w}^\top \mathbf{w} + C \sum_{i=1}^{N} \left(\xi_i^+ + (1 - \delta_i)\,\xi_i^-\right) - \sum_{i=1}^{N} \alpha_i^+ \left(\epsilon + \xi_i^+ - y_i + \mathbf{w}^\top \mathbf{x}_i + b\right) - \sum_{i=1}^{N} \alpha_i^- \left(\epsilon + \xi_i^- - \mathbf{w}^\top \mathbf{x}_i - b + y_i\right) - \sum_{i=1}^{N} \beta_i^+ \xi_i^+ - \sum_{i=1}^{N} \beta_i^- \xi_i^-.
\]
The derivatives of the Lagrangian function with respect to the decision variables of the primal problem are found as
\[
\begin{aligned}
\frac{\partial \mathcal{L}}{\partial \mathbf{w}} = 0 &\Rightarrow \mathbf{w} = \sum_{i=1}^{N} (\alpha_i^+ - \alpha_i^-)\,\mathbf{x}_i \\
\frac{\partial \mathcal{L}}{\partial b} = 0 &\Rightarrow \sum_{i=1}^{N} (\alpha_i^+ - \alpha_i^-) = 0 \\
\frac{\partial \mathcal{L}}{\partial \xi_i^+} = 0 &\Rightarrow C = \alpha_i^+ + \beta_i^+ \quad \forall i \\
\frac{\partial \mathcal{L}}{\partial \xi_i^-} = 0 &\Rightarrow C(1 - \delta_i) = \alpha_i^- + \beta_i^- \quad \forall i.
\end{aligned}
\]
Using the Lagrangian function and these derivatives, the corresponding dual optimization problem is written as
\[
\begin{aligned}
\text{min.} \quad & -\sum_{i=1}^{N} y_i (\alpha_i^+ - \alpha_i^-) + \epsilon \sum_{i=1}^{N} (\alpha_i^+ + \alpha_i^-) + \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\alpha_i^+ - \alpha_i^-)(\alpha_j^+ - \alpha_j^-)\, \mathbf{x}_i^\top \mathbf{x}_j \\
\text{w.r.t.} \quad & \boldsymbol{\alpha}^+ \in \mathbb{R}^N,\ \boldsymbol{\alpha}^- \in \mathbb{R}^N \\
\text{s.t.} \quad & \sum_{i=1}^{N} (\alpha_i^+ - \alpha_i^-) = 0 \\
& C \geq \alpha_i^+ \geq 0 \quad \forall i \\
& C(1 - \delta_i) \geq \alpha_i^- \geq 0 \quad \forall i.
\end{aligned}
\tag{2}
\]
The dual optimization problem in (2) has 2N decision variables instead of (D + 2N + 1), which significantly reduces the computational complexity. By replacing the term \(\mathbf{x}_i^\top \mathbf{x}_j\) with a kernel function \(k(\mathbf{x}_i, \mathbf{x}_j)\), kernel functions can be integrated into the model.
Several recent studies showed that different cancer types have similar or the same underlying biological mechanisms (The Cancer Genome Atlas Research Network et al., 2013; Choi et al., 2014; Damrauer et al., 2014; Hoadley et al., 2014; Lawrence et al., 2014; Yang et al., 2014; Khirade et al., 2015; Pappa et al., 2015; Wan et al., 2015; Anaya et al., 2016), which supports the joint modeling of multiple diseases. That is why there are existing multitask machine learning models that model multiple patient cohorts conjointly (Li et al., 2016; Wang et al., 2017). However, these methods use genomic features directly, and they are not able to extract the relative importance of pathways/gene sets.
3. Our Proposed Multitask MKL Algorithm for Survival Analysis
We extended the survival SVM algorithm towards multitask learning and MKL; the resulting algorithm is named Path2MSurv (Figure 1a). By doing so, we are able to model multiple cohorts simultaneously and to extract survival-related pathways/gene sets to identify shared biological mechanisms among these cohorts.
The training data sets defined over multiple cohorts are given as \(\{\{(\mathbf{x}_{ti}, \delta_{ti}, y_{ti})\}_{i=1}^{N_t}\}_{t=1}^{T}\), where T denotes the number of tasks (i.e., cohorts), \(N_t\) represents the total number of samples for task t, \(\mathbf{x}_{ti}\) is the feature vector of sample i of task t, \(\delta_{ti}\) is the binary indicator variable that shows whether the observed survival time of sample i of task t is censored (i.e., \(\delta_{ti} = 1\)) or not (i.e., \(\delta_{ti} = 0\)), and \(y_{ti} \in \mathbb{R}\) is the observed survival time of sample i of task t. The primal optimization problem of our formulation can be written as
\[
\begin{aligned}
\text{min.} \quad & \sum_{t=1}^{T} \left[ \frac{1}{2} \mathbf{w}_t^\top \mathbf{w}_t + C \sum_{i=1}^{N_t} \left(\xi_{ti}^+ + (1 - \delta_{ti})\,\xi_{ti}^-\right) \right] \\
\text{w.r.t.} \quad & \mathbf{w}_t \in \mathbb{R}^{D_t},\ \boldsymbol{\xi}_t^+ \in \mathbb{R}^{N_t},\ \boldsymbol{\xi}_t^- \in \mathbb{R}^{N_t},\ b_t \in \mathbb{R} \\
\text{s.t.} \quad & \epsilon + \xi_{ti}^+ \geq y_{ti} - \mathbf{w}_t^\top \mathbf{x}_{ti} - b_t \quad \forall (t, i) \\
& \epsilon + \xi_{ti}^- \geq \mathbf{w}_t^\top \mathbf{x}_{ti} + b_t - y_{ti} \quad \forall (t, i) \\
& \xi_{ti}^+ \geq 0 \quad \forall (t, i) \\
& \xi_{ti}^- \geq 0 \quad \forall (t, i),
\end{aligned}
\tag{3}
\]
where \(\mathbf{w}_t\) is the vector of weights assigned to features for task t, C is the non-negative regularization parameter, \(\boldsymbol{\xi}_t^+\) and \(\boldsymbol{\xi}_t^-\) are the sets of slack variables for task t, \(D_t\) is the number of input features for task t, \(\epsilon\) is the non-negative tube width parameter, and \(b_t\) is the bias parameter for task t.
We formulated the corresponding dual optimization problem, where we have a combined objective function over all tasks with a single set of constraints on the kernel weights:
\[
\begin{aligned}
\text{min.} \quad & \sum_{t=1}^{T} J_t(\boldsymbol{\eta}) \\
\text{w.r.t.} \quad & \boldsymbol{\eta} \in \mathbb{R}^P \\
\text{s.t.} \quad & \sum_{m=1}^{P} \eta_m = 1 \\
& \eta_m \geq 0 \quad \forall m.
\end{aligned}
\tag{4}
\]
The inner optimization model \(J_t(\boldsymbol{\eta})\) for each task is basically a single-kernel survival SVM defined as
\[
\begin{aligned}
\text{min.} \quad & -\sum_{i=1}^{N_t} y_{ti} (\alpha_{ti}^+ - \alpha_{ti}^-) + \epsilon \sum_{i=1}^{N_t} (\alpha_{ti}^+ + \alpha_{ti}^-) + \frac{1}{2} \sum_{i=1}^{N_t} \sum_{j=1}^{N_t} (\alpha_{ti}^+ - \alpha_{ti}^-)(\alpha_{tj}^+ - \alpha_{tj}^-)\, k_{\boldsymbol{\eta}}(\mathbf{x}_{ti}, \mathbf{x}_{tj}) \\
\text{w.r.t.} \quad & \boldsymbol{\alpha}_t^+ \in \mathbb{R}^{N_t},\ \boldsymbol{\alpha}_t^- \in \mathbb{R}^{N_t} \\
\text{s.t.} \quad & \sum_{i=1}^{N_t} (\alpha_{ti}^+ - \alpha_{ti}^-) = 0 \\
& C \geq \alpha_{ti}^+ \geq 0 \quad \forall i \\
& C(1 - \delta_{ti}) \geq \alpha_{ti}^- \geq 0 \quad \forall i,
\end{aligned}
\tag{5}
\]
where \(k_{\boldsymbol{\eta}}(\mathbf{x}_{ti}, \mathbf{x}_{tj})\) corresponds to \(\sum_{m=1}^{P} \eta_m k_m(\mathbf{x}_{ti}, \mathbf{x}_{tj})\). We are guaranteed to obtain a sparse set of kernel weights in Path2MSurv since \(\boldsymbol{\eta}\) lies on a simplex, i.e., \(\boldsymbol{\eta} \in \mathbb{R}^P\), \(\sum_{m=1}^{P} \eta_m = 1\), and \(\eta_m \geq 0\).
It is not possible to find the global optimal solution of
the overall optimization problem in (4) since it is not
jointly convex with respect to decision variables η and
[Figure 1 appears here. (a) Schematic of the Path2MSurv pipeline: gene expression matrices are sliced by pathway/gene set, turned into kernel matrices, combined with weights η_1, ..., η_P, and fed into survival analysis together with vital status, days to death, and days to last follow-up. (b) The 20 TCGA cohorts and their patient counts after filtering: BLCA (402), BRCA (1067), CESC (291), COAD (433), ESCA (160), GBM (152), HNSC (498), KIRC (526), KIRP (285), LAML (130), LGG (506), LIHC (365), LUAD (500), LUSC (493), OV (372), PAAD (176), READ (156), SARC (256), STAD (348), UCEC (539).]
Figure 1. Overview of the proposed Path2MSurv algorithm together with the summary of data sets used in our computational experiments. (a) The Path2MSurv algorithm takes gene expression profiles of patients from each cohort, i.e., \(\{X_t\}_{t=1}^{T}\), a pathway/gene set collection with P pathways/gene sets, and clinical information including vital status, days to death, and days to last follow-up, i.e., \(\{Y_t\}_{t=1}^{T}\), as its inputs. It then calculates kernel matrices, i.e., \(\{K_{t,p}\}_{p=1}^{P}\), on data matrix slices, i.e., \(\{X_{t,p}\}_{t=1,p=1}^{T,P}\), obtained by mapping pathways/gene sets onto gene expression profiles. The weighted sums of these kernel matrices, i.e., \(\{K_{t,\eta}\}_{t=1}^{T}\), are used to predict survival times of cancer patients using the prediction functions, i.e., \(\{f_t\}_{t=1}^{T}\). (b) Data sets used in our computational experiments and their corresponding numbers of patients after filtering steps.
\(\{(\boldsymbol{\alpha}_t^+, \boldsymbol{\alpha}_t^-)\}_{t=1}^{T}\). Instead, we formulated an alternating optimization approach to the overall optimization problem by following the idea proposed by Xu et al. (2010). Kernel weights are initialized to uniform values, i.e., \(\eta_m^{(s)} = 1/P\), at the first iteration, i.e., s = 0. In each iteration, using kernel weights \(\boldsymbol{\eta}^{(s)}\), we solve the inner optimization problem in (5) for each task to obtain its corresponding support vector coefficients \(\{\boldsymbol{\alpha}_t^{+(s)}, \boldsymbol{\alpha}_t^{-(s)}\}\). Kernel weights are then updated for the next iteration (s + 1) using the support vector coefficients of all tasks in the following update equation:
\[
\eta_m^{(s+1)} = \frac{\eta_m^{(s)} \sum_{t=1}^{T} \sqrt{\sum_{i=1}^{N_t} \sum_{j=1}^{N_t} \alpha_{ti}^{(s)} \alpha_{tj}^{(s)}\, k_m(\mathbf{x}_{ti}, \mathbf{x}_{tj})}}{\sum_{t=1}^{T} \sum_{o=1}^{P} \eta_o^{(s)} \sqrt{\sum_{i=1}^{N_t} \sum_{j=1}^{N_t} \alpha_{ti}^{(s)} \alpha_{tj}^{(s)}\, k_o(\mathbf{x}_{ti}, \mathbf{x}_{tj})}} \quad \forall m,
\]
where \(\alpha_{ti}^{(s)} = (\alpha_{ti}^{+(s)} - \alpha_{ti}^{-(s)})\). The convergence of this alternating optimization approach is guaranteed since we monotonically decrease the objective function value of (4).
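The kernel-weight step of this alternating scheme can be sketched as follows. The support vector coefficients would come from solving the QP in (5); here they are fixed toy values, and all function and variable names are illustrative, not the authors' implementation:

```python
import numpy as np

def update_kernel_weights(eta, alphas, kernels):
    """One kernel-weight update of the alternating optimization.

    eta: (P,) current weights on the simplex.
    alphas: list over tasks of (N_t,) vectors alpha_t = alpha_t^+ - alpha_t^-.
    kernels: list over tasks of (P, N_t, N_t) stacked kernel matrices.
    """
    P = len(eta)
    # norm[t, m] = sqrt(alpha_t^T K_{t,m} alpha_t)
    norm = np.array([[np.sqrt(max(a @ K[m] @ a, 0.0)) for m in range(P)]
                     for a, K in zip(alphas, kernels)])
    numer = eta * norm.sum(axis=0)   # eta_m * sum over tasks t of norm[t, m]
    denom = (eta * norm).sum()       # sum over t and o of eta_o * norm[t, o]
    return numer / denom

# Toy setting with T = 2 tasks and P = 3 kernels (identity matrices here,
# with the alpha-step of the alternation omitted for brevity).
rng = np.random.default_rng(0)
eta = np.full(3, 1.0 / 3)            # uniform initialization
alphas = [rng.normal(size=4), rng.normal(size=5)]
kernels = [np.stack([np.eye(4)] * 3), np.stack([np.eye(5)] * 3)]
for _ in range(10):
    eta = update_kernel_weights(eta, alphas, kernels)
print(np.isclose(eta.sum(), 1.0))    # the update keeps eta on the simplex
```

Because the denominator sums the numerator over all m, the updated weights always sum to one, which is what keeps η on the simplex across iterations.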
4. Experiments
We performed an extensive set of computational experiments on several cancer data sets, where we compared the predictive performance of our proposed method Path2MSurv against survival RF (Ishwaran et al., 2008), survival SVM (Shivaswamy et al., 2007; Khan & Zubek, 2008), and the single-task variant of our algorithm (i.e., Path2Surv) trained on each data set separately (Dereli et al., in press).
4.1. Data Sets
We used gene expression profiles and clinical annotation data of cancer patients provided by The Cancer Genome Atlas (TCGA) at the Genomics Data Commons (GDC) data portal (https://portal.gdc.cancer.gov). To integrate prior biological knowledge about cancer-specific pathways/gene sets into our model, we used one pathway and one gene set collection.
4.1.1. TCGA Data Sets
TCGA data sets include genomic characterizations and clinical information of more than 10,000 cancer patients for 33 different cancer types. We used gene expression profiles and survival characteristics (i.e., days to death for dead patients and days to last follow-up for alive patients) of patients. We downloaded 9,911 HTSeq-FPKM and 10,949 Clinical Supplement files to obtain gene expression profiles and clinical annotation data, respectively. We did not include metastatic tumors in this study since their underlying mechanisms might be significantly different from those of primary tumors. We included the patients who have both gene expression profile and survival information available in our analyses. Patients with vital status as Dead (Alive) and days to death (days to last followup) as non-positive or NA were also discarded. We only included cohorts with at least 20 patients having vital status as Dead and at least 100 patients in total. After these filtering steps, we obtained 20 TCGA data sets including 7,655 patients in total (Figure 1b). The following 20 cancer types were included in our experiments: bladder urothelial carcinoma (BLCA), breast invasive carcinoma (BRCA), cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC), colon adenocarcinoma (COAD), esophageal carcinoma (ESCA), glioblastoma multiforme (GBM), head and neck squamous cell carcinoma (HNSC), kidney renal clear cell carcinoma (KIRC), kidney renal papillary cell carcinoma (KIRP), acute myeloid leukemia (LAML), brain lower grade glioma (LGG), liver hepatocellular carcinoma (LIHC), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), ovarian serous cystadenocarcinoma (OV), pancreatic adenocarcinoma (PAAD), rectum adenocarcinoma (READ), sarcoma (SARC), stomach adenocarcinoma (STAD), and uterine corpus endometrial carcinoma (UCEC).
4.1.2. Pathway/Gene Set Databases
We used two pathway/gene set databases in addition to gene expression profiles of primary tumors to understand which biological mechanisms are predictive of overall survival times of cancer patients. We extracted gene sets in the Hallmark collection (Liberzon et al., 2015) and pathways in the Pathway Interaction Database (PID) collection (Schaefer et al., 2009) from the Molecular Signatures Database (MSigDB) (http://software.broadinstitute.org/gsea). These collections consist of groups of genes that play joint roles in metabolism, gene regulation, and signaling in cells. Hallmark is a computationally constructed gene set collection including 50 gene sets with sizes between 32 and 200. It summarizes and represents specific well-defined biological states or processes displaying coherent expression of gene sets. PID is a manually curated and peer-reviewed pathway collection including 196 human signaling and regulatory pathways with sizes between 10 and 137.
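The patient and cohort filters of Section 4.1.1 can be sketched with a pandas-style pipeline; the data frame layout and column names here are hypothetical, not the TCGA file schema:

```python
import pandas as pd

def filter_cohort(clinical: pd.DataFrame, min_dead: int = 20, min_total: int = 100):
    """Apply the patient/cohort filters described in Section 4.1.1.

    clinical: one row per patient with hypothetical columns
    'vital_status' ('Dead'/'Alive'), 'days_to_death', 'days_to_last_follow_up'.
    Returns the filtered frame, or None if the cohort is too small.
    """
    dead = clinical["vital_status"] == "Dead"
    # Dead patients need a positive days_to_death; Alive patients a positive
    # days_to_last_follow_up (non-positive or NA values are discarded, since
    # NaN comparisons evaluate to False).
    keep = (dead & (clinical["days_to_death"] > 0)) | (
        ~dead & (clinical["days_to_last_follow_up"] > 0)
    )
    cohort = clinical[keep]
    if (cohort["vital_status"] == "Dead").sum() < min_dead or len(cohort) < min_total:
        return None
    return cohort
```

With the paper's thresholds (at least 20 Dead patients and 100 patients in total), applying this per cohort would reproduce the kind of filtering that yielded the 20 data sets above.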
4.2. Experimental Settings
We divided each cohort into training and test partitions
by randomly picking 80% of samples as the training set
and using the remaining 20% as the test set. We tried
to keep the ratio between the number of patients having
vital status as Dead and the number of patients having vital status as Alive for training and test sets as
close as possible. We repeated this procedure 100 times for
each cohort to get more robust performance values.
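One way to keep the Dead/Alive ratio close across partitions is to split each vital-status group separately; this is a sketch under that assumption, not necessarily the authors' exact procedure:

```python
import numpy as np

def stratified_split(delta, test_fraction=0.2, rng=None):
    """Return train/test index arrays preserving the censoring ratio.

    delta: (N,) array with 1 for censored (Alive) and 0 for Dead patients.
    """
    rng = np.random.default_rng(rng)
    train, test = [], []
    for label in (0, 1):                  # split Dead and Alive separately
        idx = rng.permutation(np.flatnonzero(delta == label))
        n_test = int(round(test_fraction * len(idx)))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return np.array(train), np.array(test)

# Toy cohort: 30 Dead and 70 Alive patients.
delta = np.array([0] * 30 + [1] * 70)
train_idx, test_idx = stratified_split(delta, rng=0)
print(len(test_idx))  # 20 = 6 Dead + 14 Alive, matching the 30/70 ratio
```

Repeating the call 100 times with different seeds would give the 100 replications used in the experiments.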
We first log2-transformed the gene expression profiles. For each data set, we then normalized the training set to zero mean and unit standard deviation, whereas we used the mean and the standard deviation of the original training data to normalize the test set. In each replication, we used 4-fold inner cross-validation on the training set to pick the hyper-parameters of Path2MSurv (i.e., regularization parameter C) and the baseline algorithms (i.e., number of trees to grow, ntree, for survival RF; regularization parameter C for survival SVM and Path2Surv). We chose ntree from the set {500, 1000, ..., 2500} and C from the set \(\{10^{-4}, 10^{-3}, \ldots, 10^{+5}\}\).
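The preprocessing above can be sketched as follows; training statistics are reused on the test set so no test information leaks into normalization. The +1 pseudocount inside the log is our assumption (the paper states only that profiles were log2-transformed):

```python
import numpy as np

def preprocess(x_train, x_test):
    """log2-transform, then z-normalize with training statistics only.

    The +1 pseudocount is an assumption to keep log2 defined at zero
    expression (FPKM values can be zero).
    """
    x_train = np.log2(x_train + 1.0)
    x_test = np.log2(x_test + 1.0)
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0)
    std[std == 0] = 1.0                  # guard against constant genes
    return (x_train - mean) / std, (x_test - mean) / std

x_tr, x_te = preprocess(np.array([[0.0, 3.0], [3.0, 15.0]]),
                        np.array([[1.0, 7.0]]))
print(np.allclose(x_tr.mean(axis=0), 0.0))  # True: training set is centered
```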
For survival RF, we used the randomForestSRC R package version 2.5.1 (Ishwaran & Kogalur, 2017). We implemented survival SVM, Path2Surv, and Path2MSurv in R using CPLEX version 12.7.1 (IBM, 2017) to solve the quadratic optimization problems. Our implementations are publicly available at https://github.com/mehmetgonen/path2msurv. We used the Gaussian kernel function in the kernel-based algorithms, namely, survival SVM, Path2Surv, and Path2MSurv:
\[
k_G(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-(\mathbf{x}_i - \mathbf{x}_j)^\top (\mathbf{x}_i - \mathbf{x}_j) / (2\sigma^2)\right),
\]
where the kernel width parameter, i.e., σ, was set to the average pairwise Euclidean distance between training data points. We chose the Gaussian kernel function since it is more likely to capture the highly non-linear dependency between gene expression profiles and overall survival times.
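The Gaussian kernel with this σ heuristic can be sketched as (function names are ours, for illustration):

```python
import numpy as np

def gaussian_kernel(X, sigma=None):
    """Gaussian kernel on the rows of X; sigma defaults to the average
    pairwise Euclidean distance between training points, as in the text."""
    sq = np.sum(X ** 2, axis=1)
    # squared distances via the expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if sigma is None:
        n = X.shape[0]
        sigma = np.sqrt(d2).sum() / (n * (n - 1))  # mean over distinct pairs
    return np.exp(-d2 / (2.0 * sigma ** 2)), sigma

K, sigma = gaussian_kernel(np.array([[0.0, 0.0], [3.0, 4.0]]))
print(sigma)  # 5.0: the only pairwise distance in this toy example
```

Computing one such kernel per pathway/gene set, on the matching gene columns, would give the kernel matrices combined by the η weights in (5).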
We calculated kernel matrices on subsets of the gene expression profiles that include only the genes of the corresponding pathway/gene set. We set the tube width parameter, i.e., \(\epsilon\), in kernel-based algorithms to zero. For Path2Surv and Path2MSurv, convergence was usually observed within tens of iterations; we therefore set the number of iterations to 200 to guarantee convergence.
4.3. Performance Measure
We used the concordance index (C-index) to compare the
predictive performances of baseline algorithms and our proposed Path2MSurv algorithm. C-index gives the ratio between the number of concordant pairs and the number of all
comparable pairs. Comparable pairs consist of two Dead
patients, or one Dead patient and one Alive patient with an observed survival time longer than that of the Dead patient. A
comparable pair is called concordant if the predicted survival times can be ordered in the same way with the observed
survival times. Higher C-index indicates better predictive
performance. C-index can be formulated as
\[
\text{C-index} = \frac{\sum_{i=1}^{N} \sum_{j \neq i} \Delta_{ij}\, \mathbb{1}\left((y_i - y_j)(\hat{y}_i - \hat{y}_j) > 0\right)}{\sum_{i=1}^{N} \sum_{j \neq i} \Delta_{ij}},
\]
where \(\hat{y}_i\) is the predicted survival time of patient i and
\[
\Delta_{ij} = \begin{cases} 1, & (\delta_i = 0, \delta_j = 0) \text{ or } (\delta_i = 0, \delta_j = 1, y_i < y_j), \\ 0, & \text{otherwise.} \end{cases}
\]
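The C-index can be computed directly from this definition:

```python
import numpy as np

def concordance_index(y, y_pred, delta):
    """C-index: concordant pairs / comparable pairs.

    y: observed times, y_pred: predicted times,
    delta: 1 if censored (Alive), 0 if Dead.
    """
    concordant = comparable = 0
    N = len(y)
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            # (i, j) is comparable if both are Dead, or i is Dead and the
            # Alive patient j outlived patient i.
            if delta[i] == 0 and (delta[j] == 0 or y[i] < y[j]):
                comparable += 1
                # concordant if predictions order the pair like the observations
                if (y[i] - y[j]) * (y_pred[i] - y_pred[j]) > 0:
                    concordant += 1
    return concordant / comparable

y = np.array([2.0, 5.0, 3.0])
delta = np.array([0, 0, 1])          # third patient censored at time 3
print(concordance_index(y, np.array([1.0, 4.0, 2.0]), delta))  # 1.0
```

In the toy example, all three comparable pairs are ordered correctly by the predictions, giving the maximum value of 1.0; ties in predictions count as non-concordant under the strict inequality.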
4.4. Experimental Results
We compared four different machine learning algorithms,
namely, survival RF (denoted as RF), survival SVM
(denoted as SVM), single-task version of our algorithm
Path2Surv (denoted as MKL), and our multitask MKL algorithm Path2MSurv (denoted as MTMKL) on 20 TCGA
data sets (Figure 1b). We added [H] or [P] to the algorithm name when Hallmark or PID pathway/gene set
collection was used in the corresponding algorithm.
RF, SVM, MKL[P], MTMKL[P], MKL[H], and
MTMKL[H] are compared in terms of their predictive performance on 20 TCGA data sets in Figure 2, where
they predicted overall survival times of cancer patients
from their gene expression profiles at the diagnosis time.
For each cancer type, we reported C-index values over 100
replications in the corresponding box-and-whisker plots
and used two-tailed paired t-tests to see whether there is a
significant predictive performance difference between the
algorithm pairs. SVM, MKL, and MTMKL were compared
against RF. MTMKL was also compared against MKL to see
the added benefit of multitask learning.
Figure 2 indicates that Path2MSurv with PID pathways
(i.e., MTMKL[P]) and Path2MSurv with Hallmark gene
sets (i.e., MTMKL[H]) outperformed RF on 13 out of 20
data sets. On the other hand, RF outperformed MTMKL[P]
on COAD and READ data sets, while it outperformed
MTMKL[H] only on the READ data set. When compared against RF, the most successful predictive performance for overall survival times was obtained using MTMKL, which improved the C-index values by more than 4% especially on the CESC, GBM, HNSC, LUAD, LUSC, PAAD, and UCEC data sets. Single-task algorithms MKL[P] and MKL[H] outperformed RF on 10 and 13 out of 20 data sets, respectively.
However, both algorithms were outperformed by RF on
COAD, LAML, and READ data sets.
We observed that MTMKL[P] outperformed MKL[P] on 15 out of 20 data sets, improving prediction performance by more than 2% on the BLCA, CESC, GBM, OV, PAAD, SARC, and STAD data sets. When we considered the results obtained using Hallmark gene sets, MTMKL[H] outperformed MKL[H] on 14 out of 20 data sets, improving prediction performance by more than 2% on the BLCA, LAML, and UCEC data sets. MKL[P] and MKL[H]
outperformed MTMKL[P] and MTMKL[H] simultaneously
only on BRCA and LUSC data sets. These results clearly
showed the benefit of multitask learning over single-task
learning (i.e., modeling each cohort separately).
In our experiments, we observed that RF could not make
satisfactory predictions on GBM and LUSC, where the corresponding median C-index values were below 0.5. MKL[P],
MTMKL[P], and MKL[H] obtained median C-index values
higher than 0.5 on all data sets. Only MTMKL[H] gave a median C-index value lower than 0.5, on the READ data set. These results clearly showed that the MKL-based algorithms MKL and MTMKL are better at capturing the highly non-linear dependency between gene expression profiles and overall survival times.
Our method Path2MSurv used a shared set of kernel weights to identify informative pathways/gene sets during training across the included cohorts. A pathway/gene set was considered to be included in the final model if the corresponding kernel weight was greater than 0.01. For each pathway/gene set, we counted the number of replications in which that pathway/gene set was included in the final model. Figure 3
and Figure 4 show the selection frequencies of Hallmark
gene sets and top 50 PID pathways used by Path2MSurv
algorithm over 100 replications, respectively.
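Counting selection frequencies from the learned kernel weights can be sketched as follows; `weights` is a hypothetical (replications × pathways) array of final η vectors, not the actual experimental output:

```python
import numpy as np

def selection_frequencies(weights, threshold=0.01):
    """Per pathway/gene set, count the replications whose final kernel
    weight exceeds the inclusion threshold of 0.01."""
    return (np.asarray(weights) > threshold).sum(axis=0)

# Hypothetical weights from 3 replications over 4 pathways.
weights = np.array([[0.5, 0.0, 0.3, 0.2],
                    [0.9, 0.005, 0.095, 0.0],
                    [0.6, 0.2, 0.0, 0.2]])
print(selection_frequencies(weights))  # [3 1 2 2]
```

Sorting pathways by these counts, over 100 replications, gives the selection-frequency rankings shown in Figures 3 and 4.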
[Figure 2 appears here: box-and-whisker plots of C-index values over 100 replications for RF, SVM, MKL[P], MTMKL[P], MKL[H], and MTMKL[H] on the 20 TCGA data sets (panels include ESCA, KIRC, KIRP, LAML, and the remaining cohorts), annotated with p-values from two-tailed paired t-tests.]
●
●
●
●●
●
●
●
●●●
●●●
●
●
● ●
●
● ●●
●
●
●
●
●●
●
●
●●
● ●●●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●●
●●
● ●●●●●
●
● ●
●●●
●●
●●
●
●
●
●
●●●●
●
●
●●
●●
●
●●
●●
●
●●
●●●
●●
●● ●
●
●
●
●●●●
● ●●
● ●●
● ●
●
●
●●
●●
●
●●●●
●
●
●
● ●
●●
●●
●●
●
●
●
●
●●
●●
●●
● ●
●
●
●
●●
●●●●●
●●●
● ●
●
●●
●●●
● ●
●
●●
●●
●●
●
●
●● ●
●●
● ● ●●
●
●
●
●
●
●
●
●
0.4
LGG
LIHC
LUAD
LUSC
OV
p = 0.483
p = 0.541
p = 0.803
p = 0.026
p < 1e−3 p < 1e−3 p = 0.922
p = 0.008
p = 0.620
p = 0.003
p = 0.505
p = 0.349 p = 0.004 p < 1e−3
p < 1e−3
p < 1e−3
p < 1e−3
p < 1e−3
p < 1e−3 p < 1e−3 p < 1e−3
p < 1e−3
p < 1e−3
p < 1e−3
p < 1e−3
p < 1e−3 p < 1e−3 p < 1e−3
p < 1e−3
p = 0.001
p < 1e−3
p < 1e−3
p < 1e−3 p < 1e−3 p < 1e−3
●
●
●●●
●
●
●
●●●
●
●
●●
●●
●
●●
●
●●
●
●●●
●●
●●
●●
●●●●
●●●
●●●●
●
●
●
●●
●●
●
●●
●●●
●
●
●●
●●●
●●
●●●
●
●●●
●
●●●
●
●
●
●●
●●
●
●●
●
●
●
●
●
●●●
●
●
● ●
●
●●●
●●
●
●●●
●
●
●
●●
●
●
●
●
●
●
●
● ●●
●●●
●●●
●●
●●
●●
●●
●●
●
●●
●
●●
●
●
●
●
●●●
●
●
●●
●● ●
●●
●
●
●●●
●
●
●●●
●
●
●● ●
●
●
●
●●
●●
●●
●
●
●
●
●●
●
●●
●
●
●
●●●●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●●
●
●●
●●
●
● ●
●●
●
●
●
●●
●●
●
●
●
●
●●
●
●●
●●
●
●●
●●
●
● ●●
●
●
●
● ●●
●
●●
● ●
●● ●
●
●●
●●
●
●
●
●
●
●
●
●●
●
●●●●●
●●
●
●●
●
●
●
●●●
●
● ●
●
●●
●
●
●●
●●
●●●●
●
●●
●●●
●
●
●
●●
●●
●
●
●
●
●●
●●
●
●
●
●●
●
●●
●
●
●
●
●
●●
●●
●
●●
●
●●
●
●
●
●
●
●
●●
●
●●●
●
●●
●
●●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●●
●
●●
●
●●
●
●●
●
●
●
●
●●
●
●● ●
●
●●
●●
●●
●●
●
●
0.8
●
●
●● ●
●
●●
●
●● ●
●●
●●● ●●
●
●●●
●●
●●
●
●
●
●●
●
●
●●●
●●●●
●
●
●
●
●
●●
●●
●
●
●● ●
●●
● ●
●
●●
●●
●●
●
●
●
●
●
●●●●●
●
●●
●●
●●
●●●
●● ●
●
●
0.7
0.6
●
●
●
●
●● ●
●●
●
●
●
●●
●●
●●
●●
● ●●
●
●● ●
●●●● ●
●●
●
●
●
●●
●
●
●
●●●
●●●
●●
●
●
●
●●●
●
● ●
●● ●●
●
●
●●●
●●
●
●
●
●
●
●
●
●
●
●●●
●
●
●●● ●
● ●
●
● ●
●
●
●
●
● ●●
●
● ●
●●
●
●
●
●●●
●●
●● ●
●
●●
●
●
●
●●
●
●
●●
●●
●●
●
●
●●
● ●●
●●
●●
●
●
●
●●
●
●
●
●
●●●
●
●
●
●●
●●
●●
●●
●
●●
●
●
●●
●●
●
● ●
● ●
●
●
●●
●●
●
● ●
●
●● ●
●
●
●
●
●●
●
●●
●
●●●
●●
●
●
●●
●
●
●●
●
●
●
● ●
●●
●
●
● ●
●
●
●
●
●●
●
●
●●
●
●
●
●●
●
●
●●●●
●●
●●
●
●●
●
●
●●
●●
●●●●
● ●
● ●●
●
●
● ●
●
● ●
●
●●
●
●●●
● ●●
●●
● ●
●
● ●●
●●
●●
● ●
●●●
●
●
●●●
●
●
●
●
●
●
●●●
●●●
●
●
●●
●●
●
●●
●
●
●
●
●
●●
●●
●●
●
●●
●
●
●
●
●
●
●●
●●
●●
●
●
●●
●
●
●
●
0.8
●
●●
●●
●●
●●
●
●●●
●●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●●
●
●●
●
●●
●●
●
●
●
●
●●●
●
●●
●●●
●
●
●●●●
●●
●●
●
●●
●
●
●● ●
●
●●
●●
●●
●
●
●
●●
●
●
●●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●●
●●
●
●
●●
●
●
●●
●
●
●●●
●
●
●
●
●
●
●●
●●
●●
●
● ●
●
●●
●●
●
●
●
●
●●
●●●●●
●
●●
●
●●●
●
●
●
● ●
●●●
●
●●●●●
●●
● ●
●
●●
●
●
●● ●
●
●●
0.5
●
●
●
0.6
0.5
●
● ●
●●●●
●
●
●●
●●● ●
●
●
●
●●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●
●●●
● ●●●
●
●
●●●
●● ●
●
●
●
●●
●
●
●
●●
● ●●●
●
●
●
●
●
● ●●
●
●
●●
●
●●
●
●
0.4
0.7
0.8
●
0.7
●
●● ●
●●
●
●
●●● ●
● ●●●
●●●
●
●●
●
●
●●
●●
●●
●
●
●●
●
● ●●●●
●●●
●●
●
●
●
●
●●
●
●●
●
●
●● ●●
●
●
●●
●●
●●
●
●●●
●
●
●
●●
●
●
●
●
●
●
●
●
● ●
●●
●
●
● ●●
●
●
●●
●●
●●●●
●●
●
●●
●
●
● ●
●
●●
●●
●
●●●
●●●
●
●
● ●●●
●●
●●
●●●●
● ●
●●
●●
●
●●●
●
●●
●●●●●
●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●●
●
●
●
● ●●●
●
●● ●
● ●●
●
●
●
●●
●●
●●
●
● ●●
●
●
●●●
●
●●●●●
●●
●●●
●●
●
●●●
●
●
●
●
●
●
●
●
●
●
●●
●●
●●
●
●
●
●
●
●●
●●
●
●●
●
●
●
●
●
●
●
●
●●●●
●●
●
●●
●●
●
●
●
●●● ●
●
●
●
● ●● ●
●
●
●●●●
●
●●●
●
●
●● ●
●●
●●
●●●
●
●
●
●
●
●●
●
●●●
●
●●●
●●
●
●
●
●
●●
●
●
●
●
●
●●
●●
●
● ●
●
●
● ●● ●●
●
●
●●●●
●
●●
●
●
●●●
●● ●
●●●●
●
●●
●
●●●●
●●
●●
●●●●
●
●●
●●
●
●
● ●
●●
●
●●
●
●
●
●●●
●
●
●
●
●
●
●●
●●
●
●
●●●●●●●
●●
●●
●
●
●
● ●●● ●
●●
●●
●●
●
●
●
●
●
●
●
0.7
●
0.6
0.5
0.4
0.6
●
●
●
●
●●
●
●
●
●
●
●
●
●●●
●●
●
●
●● ●
●●
●
●
●●
●●●
●
●
●
●
●●
●
●
●●
●●
●
●
●
●●
●●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●●
●
●●●●
●
●●
●
●●
● ●● ●
●●●●●
●
● ●
●
● ●
●
●●
●●
●
●
●
●
●●
●●
●
●
●
●
●●
●
●●
●
●●
●●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●●●
●
●
●
●●
●
●
●
●●●
●
●●
●●
● ●●●●
●
●●
●●
● ●
●
●
●
●
●
●●
● ●
●●
●●
●
●
●
●●
●
●
●●
●
●
●
●
●
●●
●
●
●●
●●
●
●
●
●
●●●
●
●
●●●
●●
●
●
●
●
●
●
●
●●
●●●●
●
●
●●
●
●
●●
●
●
●●
●●
●
●●
●●
● ●●
●
●
●
●
●
●
●
●
●
●●
●
●
●●●
●●●●
●
●
●●●●
●
●●●
●●
●
●
●
●
●
●●●
●●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●●●
●●
●
●
● ●●
●
●●
●
●●●●●
●
●
●
●●
●
●
●●
●●
●
●
●●
●
● ● ●●
●●●
●
●
●
●
●
●●
●
●
●
●
●●●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●●
●
●●
●
●●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
● ●
● ●
●●
●
●
●
●●
●
●●
●●
●
●●
●●
●
● ●●●●●
●●
●●●
●
●●
●
●
●●
●●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●●
●●●
●
●
●●
●
●
●
●
●
●●●
●
●●
●
●
●●
●●●
●●● ●
●
●●●
●
●
0.5
●
●
●●
●●●
●
●●
● ●
●
●
●
●●
●
●
●
●●
●
● ●
●
●
●●
●
●●
●
●●
●●
●●
●
●
●●
●
●
●
●
●●
●●
●●●
●●
●●
●
●●
●
●
●
●
●
●●
●●
●
●● ●
●
●●
●●●
●
● ●●
●
●● ●
●●●
● ●●
●●
●●
●
●●
●●
●●
●
●●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●●
●●
●
●●
●
●
●
●
●●
●●
●●
●
●
●
●
●●
●●
●
●
●●
●●
●
●●
●●
●
●
● ●
●
● ●
●
●
●●●
● ●●
●
●
●
●
●
● ●
●●
●●●
●
●●●●
●
●● ●●●
●●
●●
●●
●
●●●
● ●●
●
●●
●
●
●●
●
●
●
●
●
●
●
●●●
●●
●
●
●
●
● ●●
●
●●●
●●
●●
●●
●
●●
●
●
●
● ●
●
●
● ●●
●● ●
●
●
●●
●●
●●
●
●
●●●
●
●
●
●●
●
●●
●●
●●
●
●
●●
●●
●
●●●
●●
●
●●●
●
●●
●
●●
●
●
●●
●
●●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
● ●
●●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●●
● ●
●
●●● ●
●●
●
●●
●●
●
●●
●
●
● ●●
●●
●
●
●
●●●
●●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●●
●
● ●
●●
●
● ●●●
●
●●
● ●●
●
●
●
●
●
●
●●●
●
●
●●
●● ●
●
●● ●
●
●
●●
●
● ●●
●●
●
●●
●
●●
●
●
●●
●●●
●●
●
●●
●
●
●
●●
●
●
●●
●●
●
●
●
●
●
● ●
●
●
●●
●
●
●
●
●●●
●
●●
●●●●
●
●●
●
●●
●●
●●
●●
●●
●●
●
● ●
●
● ●●
● ●
●
●
0.4
●
●
●
●
●
0.4
0.3
0.3
PAAD
READ
SARC
STAD
UCEC
p < 1e−3
p < 1e−3
p < 1e−3
p < 1e−3
p < 1e−3 p < 1e−3 p = 0.001
p < 1e−3
p < 1e−3
p < 1e−3
p < 1e−3
p < 1e−3 p = 0.681 p = 0.072
p < 1e−3
p = 0.001
p < 1e−3
p = 0.053
p = 0.001 p < 1e−3 p = 0.004
p < 1e−3
p = 0.010
p < 1e−3
p < 1e−3
p < 1e−3 p < 1e−3 p < 1e−3
p < 1e−3
p = 0.003
p < 1e−3
p = 0.002
p < 1e−3 p < 1e−3 p < 1e−3
0.6
0.5
●
●
●
●
●● ●
●
●●●
●●
●● ●
●●●●●
●
●● ●
●●
●
●
● ●
●●
●
●●
●
●●
●
●
●
●●
●●
●
●●
●
●●
●
●●
●●●
●
●●
●
●
●
●
●●
●
●
●
●
●●●
● ●●
●
●●
●●
●
●●
●
● ●●
●
●
●
●
●
●●
●
●
●●
●
● ●
● ●
●
●
●●
●
●
●
●
●●●
●●
●
●●●
●
●
●
●
●
●●
●
●●
●
●
●●
●
●●●
●●●
●
●
●
●
●
● ●
●
●●●
●●
●●
●
●
●●● ●
● ●
●
●●
●●
●●
●
●● ●
●
● ●
●
● ●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●●
● ●
●
●●●●
●●
●
●●
●
●
●●
●
●
●
●
●●
● ●●●●
●●●●
●●
●●
●
●●
●
●
●
●
●●
●
●
●
●●●●
●●
●
●
●●
● ● ●●●
●●
●
●
●
● ●
●●
●●●●
●
●●
●
●
● ●●
● ●
●
●
●●
●●●
●
●●●●●
●●
●
●
●●
●
●
●●
●●●
●●●
●
●
●●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●●
●
●
●
●
●●
●●
●●
●
●
●
●
●●●
●
●
●
●
●●
●●
●●
●●●
●●
●
●
●
●●
●●
●
●●●
●●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●●
●
●
●
●● ●
● ●●
●●
●●
●
●
●
●
●
●●
● ●
●
●●
●●●
●●●
●
●
●●
●
●● ●
●
●
0.8
0.5
0.4
●
● ●●
●
●●●
●●
● ●●
●
●
●
●●
●●
●
●●
●●
●
●●
●
●●
●
●
●●
●
●
●
●
●●●
●●
●
●●
●
●
●
●
●
●●
●
●●
●●
●●
●
●●
●
●
●
●
●●●●●
●●
●
●
●
●
● ●●
●
● ●●
●
●
●●●
●●
●●
●
●●
●●
●
●●
●●
●●
●●
● ●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
● ●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ●
● ●
●●
●●
●
●
●
● ●●
●
●
●
●● ●
●●
●
●
●●
● ●
●
●●
●
●
● ●
●●
●
●
●
●●
●●●
●
●●●
●
●●
●
●
●●
●
●
●●
●●
●
●
●
●
●
●●
●
●●
●
●●
●●
●
●
●
●
●
●●●●●
● ●
●●
●●
●●
●
●●
●●
●
●●●●
● ●●
●
●
●
●
●●
●●
● ●
●
●
●
●●●
● ●●
●
●
●●
●●
● ●●
●
●
●●
●
●●●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●●●
●●
●
●
●
●●
●
●
●
●●
●
●
●
●
●●
●●
●
● ●
●●●
●
● ●
●
●
●
● ●●
●●●●
● ●●
●●
●●● ●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●●●
●
●
●●●●●
●●
● ●
●
●
●
●●
●●
●
●●
●
●●
● ●
●●
●
● ●●
●
●●
●●
●
● ●
●
●●
●●
● ●
●
●
●●●
●
●●
●●●
●●●
●
●
●●
●● ●
●
●
●●
●●
●
●
●
●
●●
●
●
●
●
●
● ●●●
●●● ●
●
●●●
●●
●●
●
●
●●
●
●●
●● ●
● ●
●● ●
●
●
0.7
0.6
0.5
●
● ●●
●
●● ●●
●●●●
●
●
● ●
●
●
●
●●
●
● ●●
●●
●●
●
●
●
●● ●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●●●
●
●
●
●
●
●
●●
●
●●
●
●
●
●●
●
●
●
●●● ●
●
●
●
●
●●
●
●●
●●
● ●
● ●
●
●
●
●
●●
●
●
●
●
●●●●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●●
●●
●
●●
●●
●
● ●●
●●
●
●●●
●
●●
●
●● ●●
●
● ●●
●
●●
●● ●
● ●●
●
●
●
●
●●●
●
●
●
●●
●
●●
●●●
●
●●
●
●●
● ●
●●
●●
●●
●
●●
●
●
●
●
●
●●
●●
●●
●
●
●
●
●●
●●
●●
●●
●
●
●●●●
●●
●●
●
●
●●●
●●
●●
●
●
●
●
●
● ●●
●
●
●
●
●
●●
● ●● ●
●
●●
●
●●
●
●
●
●●●●
●●
●
●
●
●●
●
●●
●● ●
●●
●
●
●
●●
●●
●●●
●
●
●●
●
●
●
●●
●
●
●● ●
●●●
●
●●●
●
●
●
●●
●
●● ●
●●●
●
●
●●
●●
●
●
●●●
●●
● ●●
●
●●
●
● ●
●
●
●
●
●
●● ●
●
●●
●
●●●
●
●●
●●●
●●●●
●
●
●
●●
● ●
●● ●
●
●
●
●
●●
●
●●
●
●
●●
●
●
●
●●
●●●●
●
●
●
●
●
●
●●
●
●●
●
●
●
●●
●●
●●
●
●
●
●
●●
●●
●
●
●
●●
●●● ●
●
●
●
●●
●
●●
●
●
●
●●●
●●
●
●
●●●
●● ●
●
●●
●
●
●
●●
●
●
●
●●
●●●●
●
●
●
●
●
●
●
●
●
●
●●●
●
●●
●
●
●
●
●
●
●
●●●
●
●
●
●
●●
●
●
●
● ●
●●● ●
●●●
●
●
● ●
●
●● ● ●
● ●
●
●
●
0.4
●
0.6
0.3
0.4
MKL[H]
●
0.7
●
●
●●
●●●
●●
●
●
● ●●
●
●
●
●
●
●●
●
●
●
●●
●●
●
●●
●●
●●
●
●●
●
●
●●●
●●●
●
●
●
●
●
●
●
●
●
●
● ●●●●
●● ●●●●
●●●
●●
●●
●
●●●
●●●
●●
●●
●
●●
MTMKL[H]
●
●
●
0.8
MKL[P]
●
●●
●●●
●
●●
●●
●●
●●
●●
●
●
●●
●
●
●●●●
● ●●●●
●
●
●
●●●
●
●
●
●●●
●●
●●●● ●
●
●
●
●
●
●
●
●
●
●
●
●●
●●●
●
●
●
●●
●●
●●
●●●
●
●●●
●
●
●●
●● ●
●
MTMKL[P]
●
●
●●
●●
● ●●
● ●
●●●●
●●
●
●●●
●●
●
●●
●
●●
●
●
●
●●
●
●●
●
●
●●
●
●●
● ●●
●
● ●●
●
●
●●●
●
●●
●
●● ●
●
●●
●
●●
●●
●
●
●●
●
●●● ●
●
●●
●●●
● ●●
RF
●
●
●
●
●
●
●
●
●●
●●
●
●
●
● ●
●●
●
●
●
●●●● ●
●
●●
● ●●
●
●
●
●
●●
●
●
●
●●●
●
● ●
●
●
● ● ●●
●●●●
●●
●
●●
●
●●
●
●●
●●
●●
●
●
●
●
●●● ●
●●
●
●
●
●
● ●
● ● ●●
●
●
●●
●
●
SVM
MKL[H]
MTMKL[H]
MKL[P]
MTMKL[P]
RF
SVM
0.3
●
●
●
●●
●
●
●
●●
●●●
●● ●●
●
●
●●
●
●●●●
●
●●
●
●
●●
●●●
●●
●●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●
● ●
●●
●
●
●
●●
●
●
●
● ●
●●● ●
●
●●● ●
● ●●
●
●●
●
●
●
MKL[H]
●
●
●
●
●●
●●
●●
●●
●●●●
●●
●● ●●
●●
●●●
●●●
●●
●
●
●●●
●●●
●●
●
●●
●
●
●●
●●
●●
●●●
●
●●●●
●
●●●●
●
●●
●
●
●●
●
●
●
●
●●●
●●
●●●
●●
●
●●
●
MTMKL[H]
●
0.7
0.9
●
●
● ●
●
● ●
●
●●
●●
●●
●●●●●●
●●●
●
●
●
●●
●●
●●
●
●●●
●●
●●
●
●
●
●
●●
●
●
●
●
●● ●
● ●
●
●●●
●
●
●
●
●●
●
●
●●
●●
●
●
●
●●
●●
●
●●●
●
● ●
●
●
●
MKL[P]
●
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
RF
●
●
●●
●
●
●
●●
●
●●
●
●
●●●
●
●●
●●
●
●
●
●●
● ●
●
●●
●●
●
●
●●
●
●
●●●
●●
●●
●
●
●
●●
●
●
●
●
●●●
●
●●●
●
●●●●
●●
●
●
●
●
●
●
●
●
●
●● ●
●●
●
●
●
● ●●
●
●
SVM
●
●
●
● ●●
●
●●●
●
●
●●
●
●●
●
●
●
●
●
●●
●
●●● ●
●
●
● ●
●●
●
●
●
●●●
●●
●
●
●●●●
●●
●●●●
●
●
●
●
●
●
●
●
●
●●
●●
●●
●
●●
●
●
●
●
●●
●
●● ●●
●●
MKL[H]
●
●
●
●
●
● ●
●●
● ●
●
●●
●●
●
●
●
●●
●
●●
●
●
●●
●●
●●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●● ●
●●●●
●
●●
●
●●
●
●●
●
● ●●●
● ●
●
●●●
●
●
●
●
●
●
●●
●●
●● ●
MTMKL[H]
●
●
●
●
●
●●● ●
●
●●
●●●
●●●
● ●●●●●
●
●
●●
●
●
●
●
●
●
●
●
●●
● ●●
●●
●●
●●
●
●
●
●
●●
●●
●
●●
●●
●
●
●
●
● ● ●●
●
●●
●●●●
●●●●
●
●●
●
MKL[P]
●
●●
●
●
MTMKL[P]
●●
●
●●●●●
●
●
●
●
●●●
●●●
● ●●
●●●●
●●
●
●
●
●●
●
●
●
●
●●
●●
●
●
●
●
●●●●
●
●
●
●
●
●
●●
●
●●
● ●
●
●
●●● ●
●●
●● ●
● ● ●
●●
●●●●
●
●
●
● ●
● ●
● ●
●
●●●
●●●
●●
●●●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●●●
●●
●
●
●●
●●
●
●
●●
●●
●
●●
●
●●●
●●
●
●
●
●
●●
●
●
●●
●
●
●
●
● ●
● ●
●
● ●●
● ●
●
● ●●
● ●
●
●
RF
●
●
● ●
●
● ●
●
●
SVM
0.8
C−index
0.7
p = 0.331
p = 0.083
p = 0.778
p < 1e−3
p = 0.030 p < 1e−3 p < 1e−3
●
0.4
●●
●●
● ●●
●●●●
●
●
●
●
●
●
●●●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●●●
●
●●●
●
●
●
●
●●
●
●
●
●
●●
●●
●
● ●● ●●
●
●
●
●
●
●●
●●●
●
●
HNSC
0.5
0.5
●
0.3
●
0.6
●
●
●● ●
●
●
●●●
●●
●●
●● ●
●
●
●●
● ●●
●
●
●
●●
●
●●
●
●
●●
●●
●●
●
●●
●●
●
●
●
●●●
●
●●●
●●●
● ●
●●
●●●●●
●
●●
●●
●●
●
●
● ●●
●
●
●
●
●
p < 1e−3
p < 1e−3
p < 1e−3
p < 1e−3
p < 1e−3 p < 1e−3 p < 1e−3
0.6
0.7
●●● ●
●●●
●
●
●●
●●
●
●●●
●●
●
●
●●●● ●
●
●●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●●
●
●
●●
●
●
●●● ●
●● ●
●●● ●
●●
●
●
●●
●
●
●
●●
● ●
● ●
●● ●
●
MTMKL[P]
0.7
0.5
●●
●●●
●
●
●
● ●●●●
●●
●●●●
●
●●
●●
●
●●
●
●
●● ●●
●●
●●
●
●●
●
●●
●
●
●
●●●●●
●
●
●
●●
●●
●
●●●
●
●
●
●●●
●
●●
●
●
●
●●●
●●●
●
●
●●
●●
●
●
●●
RF
C−index
0.8
0.6
●●●
●●
●
●●
●●●
●
●
● ●●
●
●●
●
● ●●
●
●
●●
●
●
●
●
●
●
●●●
●
●●
●
●
●●
●
●
●
●
●
●
●●
●●
●
●●
●
●
●●
●
●●
●●
●
●
●
●
●
●
●●
●
● ●●
●●
● ●
● ●●
●
●
●
● ●
● ●
●
●
●
●
GBM
0.3
0.9
0.7
●
● ● ●●
●
●●●
●
●●
●
●●
●●
●●●
● ●●
●●
●●
●●
●●●
●
●
●●●
●
●●
●●
●●●
●●
●
●●
●
●
●●
●●
●
●
●
●
●
● ●
●
● ●●
●
● ●●
●●●●
● ●
● ●
●
●●
●
●
●
●●
●
●●
SVM
C−index
0.4
●●
●
●
●
0.8
0.8
p < 1e−3
p < 1e−3
p < 1e−3
p < 1e−3
p < 1e−3 p < 1e−3 p = 0.681
0.7
0.5
● ●
●●
●
●
●●
●●
●●
●
●●
●
●
●●●
●●
●●●●
●●
●
●●
●●●
●
●●
●●
●
●●
●
●
● ●
●●
●
●
●
●
●
●
●●
●●
●
●
●
● ●
●●
●
●
●●
●
●●
●●●●
●
●●
●
●●
●
●
●
●
●
● ●
0.4
0.8
0.6
●
●
●●●
●
● ●
●
●
●
● ●●
●
●
●●
●●●
●●
●
●
●
●●
●●
●●
●
●
●
●
●
●●●
●●
●●● ●
●
●●
●●●●
●
●
●●
●
●
●
●● ●
●
●
●
●●●●
●
●
●
●
●
●
●●
●
● ●
●●
●
●
●
●
●
●
●●
●●
● ●
●
●
0.3
●
●●
●
●●
●
●
●●
●●
●●
●●●
●●
●
●●●●
●●
●●●
●●●
● ●● ●
●
●
●
●●
●
●
●●
●●●
●
●
●●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●●
● ●●
●● ●
MKL[H]
0.4
COAD
p = 0.086
p < 1e−3
p < 1e−3
p < 1e−3
p < 1e−3 p = 0.012 p < 1e−3
MTMKL[H]
0.5
CESC
p < 1e−3
p < 1e−3
p < 1e−3
p < 1e−3
p < 1e−3 p < 1e−3 p < 1e−3
MKL[P]
0.6
BRCA
p < 1e−3
p < 1e−3
p < 1e−3
p < 1e−3
p < 1e−3 p < 1e−3 p = 0.023
MTMKL[P]
C−index
0.7
BLCA
p < 1e−3
p = 0.016
p < 1e−3
p = 0.640
p < 1e−3 p < 1e−3 p < 1e−3
1.0
0.8
0.6
0.4
0.2
0.0
G
L
ANYC
BI GI OLY
O
SPLE_ G SIS
E
KRERAC NE
I
AP ASMA D_ SIS
M
T
_
APOP S OG E
T
I
AL IC TO GN ENAB
A
S
U LO L_ IS ALI ESOLI
N
N IS S
S
G
U FO GR UR
_D M
V
A
W _R LD FT FA
N
ED _ C
N
E
ES T_ S _ R E
P
E
PE TR BE ON PR JE
ES ROOG TA_ SE OT CT
TR XI EN CA _U EIN ION
C
S _ T
H
P _R
H OL OG OM RE EN
ES
YP E
EN E S IN
P5 O S
PO _
PO
_R
TE
X
3
M _ IA R
N SIG
N
E
P
S
SE
T
SP
N
O
E
XE OR AT
L_
_E AL
H
O
IN
H
KR NO C1 W
N
A
O
SE
R G
A
M
LY
PA AS BIO_SI Y
_L
EO
TI GN
FA NC _S
ST ATE
C A
I
C TTY RE GN _M LIN
A
O
SI
A
E2 AG _A S_ ALI ETA G
S
N
BO
G
H F_ UL CID BE
_
EM TA A _
T
TI M A_ UP LIS
IL
R
O E
2 E
M
G _S _MGE N TA CE
2
T
BO LL
M M_ TAT ETA S
S
YO C 5 B
L
IS
TN G H _S O
M
AP FA ENEC IGN LIS
K
R IC _SI ES PO AL M
EA AL G IS I
N ING
IN C _ N
T
J A
PI TERTIV UN LIN
AD3K_ FE E_OCT G_
I
V
R
A
AN IP K O XI ON IA
O T
G
IN DR GE _MN_G EN _N
FK
M TEROG NE TO AM _S
B
I
TGTO FE EN SISR_SMAPEC
T
_
R
IL F_ IC O RE IG _R IE
N E S_
_
6
AL SP P
D _J BET SPN_ASP
N A
IN O AT
N A_ K_ A_ IND LP ON
G NS HW
O R S S L H S
I
T
PR C E TA G E A_ E
E A
P
N
Y
H OT H_ AI T3 AL RE
E
S R _
C DGEIN IG SIGING SP
O E _ N
O
N
N
M M H S A
A
SE
E L
Y P
M C_ LE OG CRING LIN
Y
G
EP C_TA ME _SI ET
R
N
G
I
T
O
IN ITH A GE T N N
R
A
O FLA EL GETS LIN
X
U ID MMIAL TS_V1 G
V_ AT A _M _
V
R IV TO E 2
ES E R S
PO_P Y_ EN
H
N O RE CH
SE S S Y
_DPH PO MA
N OR NS L_
YL E TR
AT A
IO NS
IT
N
IO
N
Selection frequency
Figure 2. Predictive performances of survival RF (RF) algorithm, survival SVM (SVM) algorithm, single-task MKL algorithm Path2Surv
with PID pathway collection (MKL[P]) and with Hallmark gene set collection (MKL[H]), multitask MKL algorithm Path2MSurv
with PID pathway collection (MTMKL[P]) and with Hallmark gene set collection (MTMKL[H]) on 20 cancer data sets. Each box-and-whisker plot shows C-index values over 100 replications. Two-tailed paired t-tests were used to test whether there are significant differences
between pairs of algorithms. For p-value results, red: RF is better; green: SVM is better; light blue: MKL[P] is better; dark blue:
MTMKL[P] is better; light magenta: MKL[H] is better; dark magenta: MTMKL[H] is better; black: no difference. Orange: baseline
performance level where C-index = 0.5.
Figure 3. Selection frequencies of 50 gene sets in the Hallmark collection over 100 replications by the Path2MSurv algorithm. The red line
shows where the selection frequency is 50%.
[Figure 4 plot content (PID pathway names on the x-axis, selection frequency on the y-axis) could not be recovered from text extraction; see the caption that follows.]
Figure 4. Selection frequencies of the top 50 of 196 pathways in the PID collection over 100 replications by the Path2MSurv algorithm. The red line shows where the selection frequency is 50%.
Figure 3 shows that 19 out of 50 Hallmark gene sets were selected as informative in at least 50 replications. The most informative gene sets were GLYCOLYSIS and ANGIOGENESIS, with 100% selection frequencies. These two gene sets are known to be key mechanisms that cancer cells benefit from. KRAS SIGNALING DN, SPERMATOGENESIS, APOPTOSIS, APICAL SURFACE, and BILE ACID METABOLISM were selected in more than 90 replications. Figure 4 indicates that 26 out of 196 PID pathways were selected as informative in at least 50 replications. The most informative pathways were P73PATHWAY, BETA CATENIN NUC PATHWAY, HIF2PATHWAY, CONE PATHWAY, HNF3B PATHWAY, MYC ACTIV PATHWAY, WNT SIGNALING PATHWAY, and IL23 PATHWAY, which were selected in almost all replications and are known to be key biological mechanisms in cancer.
We also observed that the multitask MKL algorithms (i.e., MTMKL[P] and MTMKL[H]) used slightly more pathways/gene sets than the single-task MKL algorithms (i.e., MKL[P] and MKL[H]), and the increased predictive performance of MTMKL can be attributed to this. MTMKL models multiple patient cohorts conjointly, so it needs more pathways/gene sets than MKL to capture the underlying survival mechanisms of all cohorts simultaneously. Even with this increased number of pathways/gene sets, MTMKL used significantly fewer gene expression features than RF and SVM.
5. Conclusions
Identification of biologically important mechanisms for predicting disease-related phenotypes (e.g., overall survival time) is quite important to better understand the formation and progression characteristics of diseases. In this study, we extended the survival SVM algorithm towards MKL (Gönen & Alpaydın, 2011) and multitask learning (Caruana, 1997),
which is known to improve the predictive performance of
machine learning algorithms when modeling related tasks.
To test our proposed Path2MSurv algorithm (Figure 1a), we
used gene expression profiles of patients from 20 different
cancer types provided by the TCGA consortium (Figure 1b).
We used two cancer-specific pathway/gene set databases,
namely, Hallmark gene set collection (Liberzon et al.,
2015) and PID pathway collection (Schaefer et al., 2009),
to identify key biological mechanisms for survival.
We reported the predictive performance of our Path2MSurv algorithm and compared it against survival RF (Ishwaran et al., 2008), survival SVM (Shivaswamy et al., 2007; Khan & Zubek, 2008), and the single-task variant of our algorithm (Figure 2). The Path2MSurv algorithm obtained the best predictive performance on most of the data sets while using significantly fewer gene expression features than the survival RF and survival SVM algorithms.
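The predictive performances compared above are measured with the concordance index (C-index) over comparable patient pairs, as shown in Figure 2. A minimal sketch of Harrell's C-index under right censoring (names are illustrative; here higher predictions mean longer predicted survival):

```python
import numpy as np

def concordance_index(times, events, predictions):
    """Harrell's C-index: fraction of comparable pairs whose predicted
    ordering agrees with the observed survival ordering.
    times: observed times; events: 1 = event observed, 0 = censored;
    predictions: predicted survival times (higher = longer survival)."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # pair (i, j) is comparable only if patient i is observed
            # to fail before patient j's observed time
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if predictions[i] < predictions[j]:
                    concordant += 1
                elif predictions[i] == predictions[j]:
                    concordant += 0.5  # ties count as half-concordant
    return concordant / comparable

times = np.array([2.0, 5.0, 3.0, 8.0])
events = np.array([1, 0, 1, 1])
preds = np.array([1.5, 6.0, 2.5, 9.0])  # perfectly ordered predictions
cindex = concordance_index(times, events, preds)  # -> 1.0
```

A C-index of 0.5 corresponds to random ordering, which is the orange baseline in Figure 2.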
We envision extending our work towards task clustering in
the future. In this study, we trained a shared MKL model
on all data sets, which makes sense if all of the tasks are
related. If we have disease groups with different underlying
biological mechanisms, forcing all tasks to use the same
pathways/gene sets for prediction might not be meaningful.
We will extend the Path2MSurv algorithm to conjointly perform
the following three steps: (i) clustering of data sets, (ii)
learning shared kernel weights for each cluster, and (iii)
learning a survival analysis model for each data set.
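The three steps above could be alternated in a simple block-coordinate loop. The following is a hypothetical sketch of that control flow only (not the paper's implementation): tasks are clustered by their current kernel-weight vectors, each cluster's shared weights are set to the member mean, and the per-task survival models would then be refit against the shared weights (omitted here):

```python
import numpy as np

def cluster_then_share(task_kernel_weights, n_clusters, n_iters=10, seed=0):
    """Toy alternation over (i) task clustering, (ii) shared kernel
    weights per cluster; step (iii), refitting a survival model per
    data set, is represented only by a comment."""
    rng = np.random.default_rng(seed)
    W = np.array(task_kernel_weights, dtype=float)
    # initialize cluster centers from randomly chosen tasks
    centers = W[rng.choice(len(W), n_clusters, replace=False)]
    for _ in range(n_iters):
        # (i) assign each task (data set) to its nearest cluster center
        dists = ((W[:, None, :] - centers[None]) ** 2).sum(axis=-1)
        labels = np.argmin(dists, axis=1)
        # (ii) shared kernel weights per cluster = mean over member tasks
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = W[labels == c].mean(axis=0)
        # (iii) refit a survival model for each data set using
        #       centers[labels] as its shared kernel weights (omitted)
    return labels, centers

# two clearly separated task groups over two kernels
labels, centers = cluster_then_share(
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]], n_clusters=2)
```

This is essentially k-means over task-level kernel weights; a full method would need to couple the clustering objective with the survival losses rather than treat them separately.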
Acknowledgments
This work was supported by the Scientific and Technological Research Council of Turkey (TÜBİTAK) under Grant EEEAG 117E181. Onur Dereli was supported by a Ph.D. scholarship (2211) from TÜBİTAK. Mehmet Gönen was supported by the Turkish Academy of Sciences (TÜBA-GEBİP; The Young Scientist Award Program) and the Science Academy of Turkey (BAGEP; The Young Scientist Award Program). Computational experiments were performed on the OHSU Exacloud high-performance computing cluster.
References
Anaya, J., Reon, B., Chen, W.-M., Bekiranov, S., and Dutta, A. A pan-cancer analysis of prognostic genes. PeerJ, 3:e1499, 2016.
Bakker, B., Heskes, T., Neijt, J., and Kappen, B. Improving Cox survival analysis with a neural-Bayesian approach. Stat. Med., 23:2989–3012, 2004.
Breiman, L. Random forests. Mach. Learn., 45:5–32, 2001.
Caruana, R. Multitask learning. Mach. Learn., 28:41–75, 1997.
Choi, W., Porten, S., Kim, S., Willis, D., Plimack, E., et al. Identification of distinct basal and luminal subtypes of muscle-invasive bladder cancer with different sensitivities to frontline chemotherapy. Cancer Cell, 25:152–165, 2014.
Cortes, C. and Vapnik, V. Support-vector networks. Mach. Learn., 20:273–297, 1995.
Costello, J. C., Heiser, L. M., Georgii, E., Gönen, M., Menden, M. P., et al. A community effort to assess and improve drug sensitivity prediction algorithms. Nat. Biotechnol., 32:1202–1212, 2014.
Cox, D. R. Regression models and life-tables. J. R. Stat. Soc. Ser. B-Stat. Methodol., 34:187–220, 1972.
Cox, D. R. and Oakes, D. Analysis of Survival Data. Chapman and Hall, London, 1984.
Damrauer, J., Hoadley, K., Chism, D., Fan, C., Tiganelli, C., et al. Intrinsic subtypes of high-grade bladder cancer reflect the hallmarks of breast cancer biology. Proc. Natl. Acad. Sci. U. S. A., 111:3110–3115, 2014.
Dereli, O., Oğuz, C., and Gönen, M. Path2Surv: Pathway/gene set-based survival analysis using multiple kernel learning. Bioinformatics, in press.
Evers, L. and Messow, C. M. Sparse kernel methods for high-dimensional survival data. Bioinformatics, 24:1632–1638, 2008.
Gönen, M. and Alpaydın, E. Multiple kernel learning algorithms. J. Mach. Learn. Res., 12:2211–2268, 2011.
Gönen, M., Weir, B. A., Cowley, G. S., Vazquez, F., Guan, Y. F., et al. A community challenge for inferring genetic predictors of gene essentialities through analysis of a functional screen of cancer cell lines. Cell Syst., 5:485–497, 2017.
Hoadley, K. A., Yau, C., Wolf, D. M., Cherniack, A. D., Tamborero, D., et al. Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin. Cell, 158:929–944, 2014.
IBM. ILOG CPLEX Interactive Optimizer. Version 12.7.1.0, 2017.
Ishwaran, H. and Kogalur, U. B. randomForestSRC: Random Forests for Survival, Regression, and Classification (RF-SRC). R package version 2.5.1, 2017.
Ishwaran, H., Kogalur, U. B., Blackstone, E. H., and Lauer, M. S. Random survival forests. Ann. Appl. Stat., 2:841–860, 2008.
Khan, F. M. and Zubek, V. B. Support vector regression for censored data (SVRc): A novel tool for survival analysis. In Proc. 8th IEEE ICDM, 2008.
Khirade, M. F., Lal, G., and Bapat, S. A. Derivation of a fifteen gene prognostic panel for six cancers. Sci. Rep., 5:13248, 2015.
Kiaee, F., Sheikhzadeh, H., and Mahabadi, S. E. Relevance vector machine for survival analysis. IEEE Trans. Neural Netw. Learn. Syst., 27:648–660, 2016.
Lawrence, M., Stojanov, P., Mermel, C., Robinson, J., Garraway, L., et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature, 505:495–501, 2014.
Li, Y., Wang, J., Ye, J., and Reddy, C. K. A multi-task learning formulation for survival analysis. In Proc. 22nd ACM KDD, 2016.
Liberzon, A., Birger, C., Thorvaldsdottir, H., Ghandi, M., Mesirov, J. P., and Tamayo, P. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst., 1:417–425, 2015.
Mogensen, U. B. and Gerds, T. A. A random forest approach for competing risks based on pseudo-values. Stat. Med., 32:3102–3114, 2013.
Pang, H., Datta, D., and Zhao, H. Pathway analysis using random forests with bivariate node-split for survival outcomes. Bioinformatics, 26:250–258, 2010.
Pang, H., Hauser, M., and Minvielle, S. Pathway-based identification of SNPs predictive of survival. Eur. J. Hum. Genet., 19:704–709, 2011.
Pang, H., George, S. L., Hui, K., and Tong, T. Gene selection using iterative feature elimination random forests for survival outcomes. IEEE-ACM Trans. Comput. Biol. Bioinform., 9:1422–1431, 2012.
Pappa, K. I., Polyzos, A., Jacob-Hirsch, J., Amariglio, N., Vlachos, G. D., et al. Profiling of discrete gynecological cancers reveals novel transcriptional modules and common features shared by other cancer types and embryonic stem cells. PLoS One, 10:e0142229, 2015.
Schaefer, C. F., Anthony, K., Krupa, S., Buchoff, J., Day, M., et al. PID: The Pathway Interaction Database. Nucleic Acids Res., 37:D674–D679, 2009.
Schölkopf, B. and Smola, A. J. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA, 2002.
Shivaswamy, P. K., Chu, W., and Jansche, M. A support vector approach to censored targets. In Proc. 7th IEEE ICDM, 2007.
The Cancer Genome Atlas Research Network, Weinstein, J. N., Collisson, E. A., Mills, G. B., Shaw, K. R. M., et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nature Genet., 45:1113–1120, 2013.
Van Belle, V., Pelckmans, K., Van Huffel, S., and Suykens, J. A. Support vector methods for survival analysis: A comparison between ranking and regression approaches. Artif. Intell. Med., 53:107–118, 2011a.
Van Belle, V., Pelckmans, K., Van Huffel, S., and Suykens, J. A. Improved performance on high-dimensional survival data by application of Survival-SVM. Bioinformatics, 27:87–94, 2011b.
Wan, Q., Dingerdissen, H., Fan, Y., Gulzar, N., Pan, Y., et al. BioXpress: An integrated RNA-seq-derived gene expression database for pan-cancer analysis. Database, 2015.
Wang, L., Li, Y., Zhou, J., Zhu, D., and Ye, J. Multi-task survival analysis. In Proc. 17th IEEE ICDM, 2017.
Wang, Y., Chen, T., and Zeng, D. Support vector hazards machine: A counting process framework for learning risk scores for censored outcomes. J. Mach. Learn. Res., 17:1–37, 2016.
Xu, Z., Jin, R., Yang, H., King, I., and Lyu, M. Simple and efficient multiple kernel learning by group Lasso. In Proc. 27th ICML, 2010.
Yang, Y., Han, L., Yuan, Y., Li, J., Hei, N., et al. Gene co-expression network analysis reveals common system-level properties of prognostic genes across cancer types. Nat. Commun., 5:3231, 2014.
Yousefi, S., Amrollahi, F., Amgad, M., Dong, C., Lewis, J. E., et al. Predicting clinical outcomes from large scale cancer genomic profiles with deep survival models. Sci. Rep., 7:11707, 2017.
Yuan, Y., Van Allen, E. M., Omberg, L., Wagle, N., Amin-Mansour, A., et al. Assessing the clinical utility of cancer genomic and proteomic data across tumor types. Nat. Biotechnol., 32:644–652, 2014.
Zhang, X., Li, Y., Akinyemiju, T., Ojesina, A. I., Buckhaults, P., et al. Pathway-structured predictive model for cancer survival prediction: A two-stage approach. Genetics, 205:89–100, 2017.