

Brain tumor classification based on long echo proton MRS signals

2004, Artificial Intelligence in Medicine

There has been a growing research interest in brain tumor classification based on proton magnetic resonance spectroscopy (¹H MRS) signals. Four research centers within the EU-funded INTERPRET project have acquired a significant number of long echo ¹H MRS signals for brain tumor classification. In this paper, we present an objective comparison of several classification techniques applied to the discrimination of four types of brain tumors: meningiomas, glioblastomas, astrocytomas grade II and metastases. Linear and non-linear classifiers are compared: linear discriminant analysis (LDA), support vector machines (SVM) and least squares SVM (LS-SVM) with a linear kernel as linear techniques and LS-SVM with a radial basis function (RBF) kernel as a non-linear technique. Kernel-based methods can perform well in processing high dimensional data. This motivates the inclusion of SVM and LS-SVM in this study. The analysis includes optimal input variable selection and (hyper-)parameter estimation, followed by performance evaluation. The classification performance is evaluated over 200 stratified random samplings of the dataset into training and test sets. Receiver operating characteristic (ROC) curve analysis measures the performance of binary classification, while for multiclass classification, we consider the accuracy as performance measure. Based on the complete magnitude spectra, automated binary classifiers are able to reach an area under the ROC curve (AUC) of more than 0.9 except for the hard case glioblastomas versus metastases. Although, based on the available long echo ¹H MRS data, we did not find any statistically significant difference between the performances of LDA and the kernel-based methods, the latter have the strength that no dimensionality reduction is required to obtain such a high performance.

Artificial Intelligence in Medicine (2004) 31, 73–89. doi:10.1016/j.artmed.2004.01.001

L. Lukas (a), A. Devos (a,*), J.A.K. Suykens (a), L. Vanhamme (a), F.A. Howe (b), C. Majós (c), A. Moreno-Torres (d), M. Van Der Graaf (e), A.R. Tate (b), C. Arús (f), S. Van Huffel (a)

(a) SCD-SISTA, Department of Electrical Engineering, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, 3001 Heverlee (Leuven), Belgium
(b) CRC Biomedical Magnetic Resonance Research Group, Department of Biochemistry and Immunology, St. George’s Hospital Medical School, Cranmer Terrace, London SW17 0RE, UK
(c) Institut de Diagnòstic per la Imatge (IDI), CSU de Bellvitge, Autovia de Castelldefels km 2.7, L’Hospitalet de Llobregat, 08907 Barcelona, Spain
(d) Centre Diagnòstic Pedralbes, Unitat Esplugues, C/Josep Anselm Clavé 100, 08950 Esplugues de Llobregat, Spain
(e) Department of Radiology, University Medical Center Nijmegen, PO Box 9101, 6500 HB Nijmegen, The Netherlands
(f) Departament de Bioquímica i Biologia Molecular, Unitat de Ciències, Edifici Cs, Universitat Autonòma de Barcelona, 08193 Cerdanyola del Vallès, Spain

(*) Corresponding author. Tel.: +32-16-321-926; fax: +32-16-321-970. E-mail address: andy.devos@esat.kuleuven.ac.be (A. Devos).

Received 28 April 2003; received in revised form 7 August 2003; accepted 17 January 2004

KEYWORDS Brain tumors; Classification; Magnetic resonance spectroscopy (MRS); Linear discriminant analysis (LDA); Support vector machine (SVM); Least squares support vector machine (LS-SVM)

0933-3657/$ – see front matter © 2004 Elsevier B.V. All rights reserved.

1. Introduction

Brain tumors are the second leading cause of cancer death in children under 15 years and young adults up to the age of 34. These tumors are also the second fastest growing cause of cancer death among humans older than 65 years [1]. Early detection and correct treatment based on accurate diagnosis are important steps to improve disease outcome. Currently, magnetic resonance spectroscopy (MRS) in combination with magnetic resonance imaging (MRI) are important tools to identify the location, size and type of brain tumors. So far, MRS has been proven to be an accurate non-invasive technique which can give detailed chemical information of metabolites present in the suspected brain tumors [2,3].
Under physiological conditions, several important metabolites are observed: NAA (N-acetyl aspartate) as a neuronal marker; Cho (choline-containing compounds) as membrane precursors and degradation products; Cr (total creatine) as a measure of the energy status; glucose; and mI (myo-inositol). Under pathological conditions, the presence of some resonances can be indicative: a doublet of Lac (lactate); lipids and/or some low molecular weight proteins which might occur even under normal conditions; Ace (acetate) and certain amino acids, such as Ala (alanine), Gln (glutamine), Glu (glutamate) and Gly (glycine).

In comparison to in vitro spectroscopy, in vivo spectroscopy signals are more difficult to analyze because of their broader resonances, strongly overlapping peaks, lower signal-to-noise ratio and higher number of artifacts. Cousins [4] discusses the influence of the echo time TE on the spectral pattern of an MRS signal. The above-mentioned metabolites can be detected in short echo ¹H MRS signals. However, short echo ¹H MRS signals are more difficult to analyze than long echo ¹H MRS signals due to a higher number of overlapping peaks, a stronger baseline and a higher sensitivity to artifacts. In comparison, long echo ¹H MRS signals are poorer in information but they allow a more reliable analysis and testing of classification methods.

Many studies have been performed to classify MRS signals. Lindon et al. [5] overviewed pattern recognition methods and their applications in biomedical magnetic resonance. Several studies [6–11] also show some progress in automated pattern recognition for brain tumor classification based on MR data. These studies are either based on MRI (e.g. [11]), MRI combined with MR spectroscopic imaging (MRSI) (e.g. [7]), long echo (e.g. [6,8,9]) or short echo ¹H MRS (e.g. [10]), but most of the papers investigate only one classification method and restrict data collection to one center only.
As performance measure either the training performance is considered or test performance on a specifically selected set. In our study we measure the binary classification performance based on the receiver operating characteristic (ROC) curve analysis over 200 stratified random samplings of training and test set. ROC analysis is commonly used in medicine [12] to objectively judge the discrimination ability of various statistical methods for predictive purposes, which can be measured by the area under the ROC curve (AUC). The AUC gives a global measure of the clinical efficiency over a range of test cut-off points on the ROC curve. This is in contrast to performance measures like the accuracy, e.g. used in [11], which is only based on a single cut-off point (e.g. for one specific value of the false-positive rate).

Various clinical studies focus on the prediction of the malignancy of tumors, more specifically for brain gliomas (e.g. [6,11]). Thereby, they consider only two classes: low-grade and high-grade gliomas. In our study, astrocytomas of grade II and glioblastomas (also called astrocytomas of grade IV) are included, which are large subtypes of, respectively, low-grade and high-grade gliomas. Additionally, we consider two other common brain tumor types, namely metastases and meningiomas.

Moreover, this paper reports the results of a comparative study on a multicenter dataset of MRS signals. This dataset was developed in the framework of the EU-funded INTERPRET project [13]. Several INTERPRET partners [7,9,10,14–19] have already published results for classification of brain tumors based on MR data available within the project. The papers [7,15,18] focus on the use of ¹H MRSI data, while others consider the use of short or long echo ¹H MRS. Nevertheless, most of these studies are based on a previous version of the dataset or focus on a specific technique.
For example, in [10], 144 short echo ¹H MRS spectra from three contributing centers were used, originating from three groups of brain tumors: meningiomas, low-grade astrocytomas and aggressive tumors. The latter group includes glioblastomas and metastases. Note that these groups correspond to the same four tumor groups as considered in this paper. However, Tate et al. select a specific training and test set: the data from two centers formed the training set (94 spectra) and the data from the third center were used for testing (50 spectra). Based on this specific test set an accuracy of 96% was obtained using LDA.

In this study several methods are applied on all histopathologically validated long echo ¹H MRS data from four common brain tumor types as available in the final status of the database development. We mention three additional points that differ from previous classification studies within the framework of the INTERPRET project. First, we investigate what can be obtained as typical performance on a representative test set. Therefore, we construct 200 different combinations of training and independent test set. Second, the discrimination ability was judged by the AUC, which is, in contrast to the accuracy, a global measure. Only in one other INTERPRET study [15] was ROC analysis also applied, to compare two diagnostic methods for classification based on ¹H MRSI. Third, four different techniques are applied for classification, linear as well as non-linear. We investigate binary as well as multiclass classification. Moreover, this analysis includes optimal input variable selection and (hyper-)parameter estimation. Several classification techniques are compared in this paper.
We evaluate the performance of linear discriminant analysis (LDA), support vector machines (SVMs) and the least squares version of support vector machines (LS-SVMs) in classifying brain tumors based on long echo ¹H MRS spectra. The support vector machine [20,21] is a training algorithm for learning classification and regression rules from data. It applies the idea of kernel representation from mathematical analysis, for example, using either linear, polynomial, radial basis functions (RBF) or multi-layer perceptrons (MLP) as its learning kernel. SVMs were first introduced by Vapnik in the 1960s for classification and have recently become an area of intense research owing to developments in the techniques and theory coupled with extensions to density estimation and regression. SVMs arose from statistical learning theory; the aim being to solve only the problem of interest without solving a more difficult problem as an intermediate step. SVMs are based on the structural risk minimization principle, closely related to regularization theory. This principle incorporates capacity control to prevent overfitting and is thus a partial solution to the bias-variance trade-off dilemma.

Least squares SVM [22] uses equality constraints and solves a set of linear equations in the dual space instead of solving a quadratic programming problem as for the standard SVM. This simplifies the computations and enhances the speed considerably. There exists a link between the LS-SVM classifier formulation and the well-known Fisher discriminant analysis, namely by extending it to a high-dimensional feature space. Some parameters have to be tuned to achieve a high level of performance of the (LS-)SVM, including the regularization parameter and the kernel parameter corresponding to the kernel type.

The paper is organized as follows. Section 2 explains the material and methods used for classification: a description of the data and a short explanation of the kernel-based methods SVM and LS-SVM.
Section 3 summarizes the results of binary classification using complete spectra, selected frequency regions and peak integrated values, consecutively. Afterwards, results of the multiclass classification approach are also mentioned. In Section 4, we discuss the classification performance of the classifiers, the limitations of the dataset and the influence of dimensionality reduction. Finally, Section 5 presents the conclusions.

2. Material and methods

2.1. Material

The data were provided by CDP (Centre Diagnòstic Pedralbes, Barcelona, Spain), IDI (Institut de Diagnòstic per la Imatge, Barcelona, Spain), SGHMS (St. George’s Hospital Medical School, London, UK) and UMCN (University Medical Center Nijmegen, Nijmegen, The Netherlands) in the framework of the INTERPRET project. It concerns long echo ¹H MRS data, acquired both with and without water suppression using a PRESS sequence (the repetition time TR is between 1500 and 2020 ms, the echo time TE = 135 or 136 ms, the spectral width SW = 1000 or 2500 Hz, the number of datapoints is 512 or 2048) (Table 1). Four main classes are considered, corresponding to four brain tumor types, i.e. glioblastomas, meningiomas, metastases and astrocytomas (grade II). They are labeled as class 1 (glio), class 2 (meni), class 3 (meta) and class 4 (astroII), respectively. All data have passed a quality control and validation process, which was regulated by strict rules agreed on by all INTERPRET partners. After thorough examinations, the brain tumors were histopathologically classified by three pathologists. These class assignments were based on the histological classification of tumors of the central nervous system (CNS) set up by the World Health Organization (WHO).

Table 1 Number of long echo ¹H MRS data of glioblastomas (class 1), meningiomas (2), metastases (3) and astrocytomas grade II (4)

Center (acquisition scheme)      1    2    3    4   Total
CDP (PRESS, TE = 135 ms)        38   16    5    6      65
IDI (PRESS, TE = 136 ms)        28   27   16    6      77
SGHMS (PRESS, TE = 136 ms)      10    9   11    7      37
UMCN (PRESS, TE = 136 ms)        1    1    0    2       4
Total                           77   53   32   21     183

The rows correspond to the acquisition center, while the columns mention the type of brain tumor. The acquisition scheme is a PRESS sequence and TE denotes the echo time.

The raw data are acquired in the time domain at the aforementioned centers. A few preprocessing steps are carried out: frequency alignment and phase correction with Klose’s method [23] and filtering of the dominating residual water peak using HSVD [24]. The initial point of the time domain signal was removed, because it was often affected by artifacts. The resulting signal is transformed to the frequency domain by an FFT. For each signal the L2-normalized magnitude spectrum (of unit length) is considered only in the frequency region of interest (4.17–0.94 ppm), corresponding to 108 input variables. Fig. 1 depicts the mean magnitude frequency spectra of the four considered classes.

[Figure 1: four panels, (a) glioblastomas, (b) meningiomas, (c) metastasis, (d) astrocytomas grade II; magnitude versus ppm, with the Cho, Cr, NAA and lipids/Lac peaks marked, and Ala marked for meningiomas.]

Figure 1 Mean L2-normalized magnitude frequency spectra of the four considered classes: class 1 (top-left), class 2 (top-right), class 3 (bottom-left) and class 4 (bottom-right) correspond to the glioblastomas, meningiomas, metastases and astrocytomas (grade II), respectively. The solid lines are the means, while the dotted lines are the means plus the standard deviations of each class.

2.2. Methods

Several classification techniques can be applied to separate the given MR spectra. The techniques we apply in this paper are chosen so that we consider linear as well as non-linear methods: LDA, SVM and LS-SVM.

Linear discriminant analysis [25,26] basically projects the data $x_k \in \mathbb{R}^n$ from the original input space into a one-dimensional variable $z_k \in \mathbb{R}$ and makes a discrimination using this projected variable. This approach tries to maximize the between-class variance and minimize the within-class variance for two given classes. Linear principal component analysis (PCA) is applied to select the input variables. It reduces the 108 given spectral variables to a minimal set of variables which cover 75% of the variance of the data.

Quite often, different classes do not have equally distributed datapoints and their distributions also overlap, which causes the problem to be linearly non-separable. Here, two kernel-based classifiers, SVM and LS-SVM (briefly explained below), are assessed. SVM and LS-SVM with a linear kernel can be regarded as regularized linear classifiers, while LS-SVM with an RBF kernel is regarded as a regularized non-linear classifier.

A support vector machine [20,21] is a universal learning machine, which has become more established and performs well in many classification problems. The principles of SVM are as follows:

(1) Consider the training samples $\{x_k, y_k\}_{k=1}^{N}$, $x_k \in \mathbb{R}^n$, $y_k \in \{-1, +1\}$, $k = 1, \ldots, N$. The classifier in the primal space is defined by

$$y(x) = \mathrm{sign}\left[w^T \varphi(x) + b\right],$$

in which $w$ is a weight vector.
(2) The SVM performs a non-linear mapping $\varphi$ of the input vectors $x_k \in \mathbb{R}^n$ from the input space into a high dimensional feature space. Some kernel functions can be used for this mapping, e.g. linear, polynomial, RBF kernels.

(3) In the feature space, an optimal linear decision rule is constructed by calculating a separating hyperplane which has the largest margin:

$$\min_{w, e_k} J(w, e_k) = \frac{1}{2} w^T w + C \sum_{k=1}^{N} e_k \quad \text{s.t.} \quad y_k\left[w^T \varphi(x_k) + b\right] \ge 1 - e_k, \quad e_k \ge 0, \quad k = 1, \ldots, N,$$

in which $C$ is a regularization constant.

(4) This hyperplane is the solution of the following quadratic programming (QP) problem:

$$\max_{\alpha} J(\alpha) = \sum_{k=1}^{N} \alpha_k - \frac{1}{2} \sum_{k=1}^{N} \sum_{l=1}^{N} \alpha_k \alpha_l y_k y_l K(x_k, x_l),$$

satisfying the constraints $\sum_{k=1}^{N} \alpha_k y_k = 0$ and $0 \le \alpha_k \le C$ for $k = 1, \ldots, N$, where $\{x_k \in \mathbb{R}^n \mid k = 1, \ldots, N\}$ is the training sample set and $\{y_k \in \{-1, +1\} \mid k = 1, \ldots, N\}$ the corresponding class labels. $K(x, x_k)$ is a symmetric kernel function in the input space which satisfies Mercer’s theorem: $K(x, x_k) = \varphi(x)^T \varphi(x_k)$.

(5) Those input vectors $x_k \in \mathbb{R}^n$ with corresponding non-zero $\alpha_k$ are called support vectors. They are located in the boundary margin and contribute to the construction of the separating hyperplane.

(6) Classification in the input space is calculated by mapping the separating hyperplane back into the input space (SV, set of support vectors):

$$y(x) = \mathrm{sign}\left[\sum_{x_k \in SV} \alpha_k y_k K(x, x_k) + b\right].$$

Recently, a least squares version (LS-SVM) has been proposed [22,27], incorporating equality instead of inequality constraints as in the SVM case. This simplifies the computation of the solution, namely by solving a set of linear equations. The modifications are:

(1) The constrained optimization problem in the primal space is reformulated as

$$\min_{w, b, e} J(w, b, e) = \frac{1}{2} w^T w + \gamma \frac{1}{2} \sum_{k=1}^{N} e_k^2 \quad \text{s.t.} \quad y_k\left[w^T \varphi(x_k) + b\right] = 1 - e_k, \quad k = 1, \ldots, N.$$

The conditions for optimality are $y_k\left[w^T \varphi(x_k) + b\right] - 1 + e_k = 0$, $\alpha_k = \gamma e_k$ for $k = 1, \ldots, N$, $\sum_{k=1}^{N} \alpha_k y_k = 0$ and $w = \sum_{k=1}^{N} \alpha_k y_k \varphi(x_k)$.
(2) Here, non-zero support values $\alpha_k$ are spread over all datapoints. Each $\alpha_k$ value is proportional to the error of the corresponding datapoint. No sparseness property arises, unlike in the standard SVM case. But, interestingly, in the LS-SVM case one can relate a high support value to a high contribution of the datapoint to the decision line.

(3) Elimination of $w$ and $e$ from the previous equations gives

$$\begin{bmatrix} 0 & Y^T \\ Y & \Omega + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ 1_v \end{bmatrix} \qquad (1)$$

with $Y = [y_1 \cdots y_N]^T$, $1_v = [1 \cdots 1]^T$, $e = [e_1 \cdots e_N]^T$, $\alpha = [\alpha_1 \cdots \alpha_N]^T$, $(\Omega)_{kl} = y_k y_l K(x_k, x_l)$. This set of linear equations is easier to solve than the QP problem of the standard SVM.

In certain problems, non-linear techniques could improve classification performance, especially when data are linearly non-separable. Therefore, in addition to the use of linear kernels in SVM and LS-SVM classifiers, we also apply LS-SVM classifiers with RBF kernels.

The MRS spectra were classified using Steve Gunn’s MATLAB Support Vector Machines toolbox [28,29] and KULeuven’s MATLAB/C LS-SVMlab toolbox [27,30,31] for LS-SVM classification with both linear and RBF kernels.

2.3. Selected frequency regions

It is well known that characteristic peaks at certain frequencies correspond to important metabolites in the brain [2,3,32–35]. These peaks might be used as discriminatory features to distinguish tumor types, in particular when their appearance clearly differs in size and shape between spectra of different tumor types. Instead of using complete spectra as input variables to the classifier, the most explanatory input features can be selected. One approach is based on selected frequency regions: only the input variables within certain regions of the magnitude spectrum, which are assumed to contain most of the information, are selected as input features. Hence, the redundancy produced by spectral noise and artefacts in the spectrum is reduced.
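Selecting frequency regions amounts to masking the ppm axis of each spectrum. The following is a minimal sketch of that idea, not the authors' implementation (they worked in MATLAB). It assumes a uniformly sampled axis of 108 points over 4.17 to 0.94 ppm, which the text does not specify, uses the ppm intervals that this section lists for Cho/Cr, NAc, Lac/Ala/lipid1 and lipid2, and runs on random stand-in spectra. The names `region_mask` and `select_regions` are hypothetical.

```python
import numpy as np

# Assumption: 108 points uniformly spaced over the region of interest
# (4.17 to 0.94 ppm); the actual frequency axis is not given in the text.
ppm = np.linspace(4.17, 0.94, 108)

# Frequency regions in ppm, as listed in Section 2.3.
REGIONS = {
    "Cho+Cr": (2.95, 3.30),
    "NAc": (1.95, 2.10),
    "Lac+Ala+lipid1": (1.15, 1.55),
    "lipid2": (0.90, 1.00),
}

def region_mask(ppm_axis, regions):
    """Boolean mask that is True inside any of the given ppm intervals."""
    mask = np.zeros(ppm_axis.shape, dtype=bool)
    for low, high in regions.values():
        mask |= (ppm_axis >= low) & (ppm_axis <= high)
    return mask

def select_regions(spectra, ppm_axis, regions=REGIONS):
    """Keep the columns of `spectra` (n_spectra x n_points) inside the regions."""
    return spectra[:, region_mask(ppm_axis, regions)]

# Hypothetical stand-in data: 5 random non-negative "magnitude spectra".
rng = np.random.default_rng(0)
spectra = np.abs(rng.normal(size=(5, ppm.size)))
spectra /= np.linalg.norm(spectra, axis=1, keepdims=True)  # L2-normalize rows

selected = select_regions(spectra, ppm)
print(selected.shape)
```

Because the axis here is assumed, the number of retained points need not match the count obtained on the real data.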
Characteristic metabolites can be observed in the following regions of the magnitude MRS spectrum: Cho and Cr (2.95–3.3 ppm); NAc (1.95–2.1 ppm); Lac, Ala and lipid1 (1.15–1.55 ppm); lipid2 (0.9–1.0 ppm). Note that these selected regions are based on the metabolites that are assumed to be most characteristic according to prior knowledge available from field experts participating in this study. Nevertheless, this selection is still subjective, as the size of the regions could be altered or some other resonances (e.g. from metabolites with a typically lower intensity at a long echo time: mI, Gln, Gly, etc.) could also have been included.

2.4. Peak integration

Another approach to select the most explanatory input is based on peak integration. The amplitude of a resonance is proportional to the integral of the corresponding peak in the spectrum. However, precise estimation of the peak integrals is difficult due to several factors, including nonzero baseline, peak overlap, noise and also the discrete nature of the spectrum. Peak integration is performed here by using the trapezoidal rule. For each selected metabolite the area under the frequency peak in the magnitude spectrum is calculated. These regions cover: Cho (3.1–3.3 ppm); Cr (2.95–3.05 ppm); NAc (1.95–2.1 ppm); Lac and Ala (1.25–1.55 ppm); and lipid1 (1.1–1.25 ppm).

2.5. Training and test data

2.5.1. Binary classification

Binary classification can be used to distinguish two different tumor types. Instead of using a one-against-all scheme, the classes are compared pairwise by means of a binary classifier. With four types of brain tumors, six binary classifiers can be constructed to separate the following pairs:

- glioblastomas versus meningiomas,
- glioblastomas versus metastases,
- glioblastomas versus astrocytomas grade II,
- meningiomas versus metastases,
- meningiomas versus astrocytomas grade II, and
- metastases versus astrocytomas grade II.
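The six pairs listed above are simply all unordered combinations of the four classes. As a small illustrative sketch (the variable names are hypothetical; the class sizes are the totals from Table 1), the pairs and their class balance can be enumerated as follows:

```python
from itertools import combinations

# Spectra per class, taken from the totals row of Table 1.
CLASS_COUNTS = {"glio": 77, "meni": 53, "meta": 32, "astroII": 21}

# All unordered pairs of the four classes: one binary classifier per pair.
pairs = list(combinations(CLASS_COUNTS, 2))

for a, b in pairs:
    n_a, n_b = CLASS_COUNTS[a], CLASS_COUNTS[b]
    # Fraction of the larger class: 0.5 would mean a perfectly balanced pair.
    majority = max(n_a, n_b) / (n_a + n_b)
    print(f"{a} vs {b}: {n_a + n_b} spectra, majority fraction {majority:.2f}")
```

The majority fraction gives a quick view of how unbalanced each pairwise problem is, which matters when interpreting a single-threshold measure such as accuracy.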
By classifying in pairs, we obtain more information about: (1) the distribution of two classes and their overlap, (2) the balance of the data distribution of the classes, and (3) the performance of the classifier, which can be measured using ROC analysis.

The dimension of the input features to LDA is reduced by PCA. The number of principal components is determined by the number of components that account for 75% of the total variance of the given data. Note that PCA is not used when peak integrated values are taken as input features, as peak integration already significantly reduces the dimension.

To achieve a high level of performance in SVMs, some hyperparameters must be tuned. These adjustable hyperparameters include a regularization parameter, which determines the tradeoff between minimizing the training errors and minimizing the model complexity. In case of an RBF kernel, a kernel parameter (the width $\sigma$) must also be selected. We choose the value of the hyperparameters $C$ for SVM, $\gamma$ for LS-SVM with a linear kernel and $(\sigma, \gamma)$ for LS-SVM with an RBF kernel through leave-one-out (LOO) cross-validation, while bounding the search to avoid overfitting.

The experiment consists of the following steps: (1) the data are divided into a training set (2/3 of the data) and a test set (remainder) using stratified random sampling, (2) the classifiers are trained and the test set is used to evaluate the performance, (3) the index of the misclassified spectra is noted. This randomization is repeated 200 times to avoid bias possibly introduced by selection of a specific training and test set. In this way we try to obtain a representative performance on the test set. ROC analysis [12] is used to evaluate the binary classifiers. The performance is then measured by the mean AUC and its pooled standard error calculated from 200 randomizations.

2.5.2. Multiclass classification

In the framework of binary classification we assume that a new MRS spectrum belongs to one of the two considered classes. Nevertheless, in medical practice, the number of possible tumor types is mostly not restricted to two types. This motivates the development of multiclass classifiers that handle all classes in one construction, which extends the classifiers mentioned in the previous section. With this setup, the classifier is expected to classify a certain spectrum as one of the four tumor types.

Various pattern recognition techniques have been tried to distinguish MRS spectra of class 1 (glio) and class 3 (meta), but none gives satisfactory results [9,36]. Alternatively, as was suggested by Tate et al. in [10], we can merge these two classes, obtaining a new group called class 5, containing only aggressive (aggr) tumors. This scheme is depicted as step 1, shown in the left part of Fig. 2. A voting scheme is applied to decide which class is chosen based on the three outputs of the contributing binary classifiers. With a minimum two-out-of-three vote, a certain class is taken if two or three of the binary classifiers give the same output; otherwise the classifier considers the output as undecided. Step 2 is carried out as illustrated in the right part: if the output of step 1 is class 5, then the spectrum is further classified either into class 1 (glio) or 3 (meta) using the binary classifier 13. If the output is class 2 or 4, then the output of step 2 is the same as the output of step 1. Four binary classifiers are the building blocks of this multiclass classifier: binary classifiers 24 and 13 are available from the previous section; additionally, two binary classifiers are required:

- meningiomas versus aggressive tumors (class 2 versus class 5),
- astrocytomas grade II versus aggressive tumors (class 4 versus class 5).

[Figure 2: block diagram of the multiclass classifier. The input data feed binary classifiers 24, 25 and 45, whose outputs enter a voting scheme; if the vote is class 5, binary classifier 13 produces the final output.]

Figure 2 Two-steps classification. The left part shows step 1, classification of three tumor classes: (2) meni, (4) astroII, and (5) aggressive tumors. The right part, or step 2, further refines the classification if the output is class 5 and assigns the spectra of this class either to class 1 (glio) or 3 (meta).

2.6. Statistical analysis

From 200 runs, the mean AUC is listed in the tables, as well as the standard error (SE) of the AUC. For each binary classifier $C$, the mean and standard error of the AUC are calculated. Consider two classifiers $C_1$ and $C_2$ that handle the same input data; e.g. $C_1$ is PCA/LDA and $C_2$ is LS-SVM with a linear kernel applied to the complete spectra of classes 1 and 2. Let the AUC of each classifier $C_i$, $i = 1, 2$, be $A_{i,l}$ with standard error $SE_{i,l}$, $l = 1, \ldots, M$, with $M$ the number of stratified randomizations ($M = 200$). The pooled statistics are then given by ($i = 1, 2$), where $n_l$ is the number of samples for the stratified randomization $l = 1, \ldots, M$:

$$\bar{A}_i = \frac{1}{M} \sum_{l=1}^{M} A_{i,l}, \qquad (2)$$

$$SE_i = \sqrt{\frac{1}{N - M} \sum_{l=1}^{M} (n_l - 1)\, SE_{i,l}^2}, \qquad (3)$$

$$SE_i = \sqrt{\frac{1}{M} \sum_{l=1}^{M} SE_{i,l}^2}. \qquad (4)$$

The last line is satisfied, since the test set contains an equal number of samples for each stratified randomization, i.e. $\forall l: n_l = n$, with $N = \sum_{l=1}^{M} n_l$.
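The pooled statistics of Eqs. (2)–(4) can be checked numerically. The sketch below is illustrative only: the per-run AUCs and standard errors are synthetic, the test-set size of 43 is a hypothetical value, and the function names are not from the paper. It also evaluates the critical ratio z that the text introduces next, using two mean AUC and SE values from Table 2 (meni-astroII, SVM lin versus PCA/LDA) and assuming r = 0, since the correlation r is not reported in this section.

```python
import numpy as np

def pooled_stats(auc, se, n):
    """Pooled mean AUC (Eq. 2) and pooled standard error (Eq. 3).

    auc, se: per-run AUC values and their standard errors (length M).
    n: test-set size of each run (length M).
    """
    auc, se, n = (np.asarray(v, dtype=float) for v in (auc, se, n))
    M = auc.size
    N = n.sum()
    mean_auc = auc.mean()
    pooled_se = np.sqrt(np.sum((n - 1.0) * se**2) / (N - M))
    return mean_auc, pooled_se

def critical_ratio(a1, se1, a2, se2, r=0.0):
    """Critical ratio z for two AUCs obtained on the same samples."""
    return (a1 - a2) / np.sqrt(se1**2 + se2**2 - 2.0 * r * se1 * se2)

# Synthetic per-run values (hypothetical, not the study's data).
rng = np.random.default_rng(1)
M, n_test = 200, 43
auc_runs = rng.uniform(0.90, 1.00, M)
se_runs = rng.uniform(0.02, 0.05, M)

mean_auc, pooled_se = pooled_stats(auc_runs, se_runs, np.full(M, n_test))
eq4 = np.sqrt(np.mean(se_runs**2))  # Eq. (4), valid when all n_l are equal

# Table 2, meni-astroII: SVM lin versus PCA/LDA; r = 0 is an assumption.
z = critical_ratio(0.9661, 0.0390, 0.9313, 0.0725, r=0.0)
print(f"Eq. (3): {pooled_se:.6f}  Eq. (4): {eq4:.6f}  z: {z:.2f}")
```

With equal test-set sizes the two standard-error expressions coincide, as stated above. A positive correlation r would shrink the denominator and enlarge z, so the value obtained with r = 0 should be read only as an illustration.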
A general approach to statistically test whether the areas under two ROC curves derived from the same samples differ significantly from each other is then given by the critical ratio z, defined as [37]:

$$z = \frac{\bar{A}_1 - \bar{A}_2}{\sqrt{SE_1^2 + SE_2^2 - 2 r\, SE_1 SE_2}},$$

in which r is a quantity representing the correlation introduced between the two areas by studying the same samples. In our study we calculate the z-value based on the pooled statistics $\bar{A}_i$, $SE_i$, $i = 1, 2$, from 200 runs as calculated in Eqs. (2)-(4). If the resulting z-value satisfies $z \geq 1.96$, then $\bar{A}_1$ and $\bar{A}_2$ are statistically different. The cut-off value 1.96 is taken as the quantity for which, under the hypothesis of equal AUCs ($\bar{A}_1 = \bar{A}_2$), $z \geq 1.96$ occurs with a probability of $\alpha = 0.05$ under a normal distribution.

This ROC analysis is performed for binary classification. Although ROC analysis has been extended to multiclass classification [38], the result is generally non-intuitive and computationally expensive. This motivates the use of the correct classification rate as performance measure for multiclass classification.

3. Results

3.1. Classification using complete spectra

In the following the classification performance of LDA, SVM, and LS-SVM (using linear and RBF kernels) is reported. The results using the complete spectra are summarized in Table 2, while Fig. 3 shows the boxplots corresponding to the same cases. Note that the boxplots display the median of the AUC values and the interquartile range (IQR), while the tables display the mean and standard error of the AUC values. The latter can be used to calculate the z-value (Section 2.6).

Table 2 Classification using complete spectra

Classes        PCA/LDA                 SVM lin            LS-SVM lin         LS-SVM RBF
glio-meni      0.9528 ± 0.0306 (8)     0.9519 ± 0.0335    0.9506 ± 0.0338    0.9560 ± 0.0304
glio-meta      0.5926 ± 0.1036 (6)     0.6323 ± 0.0942    0.6431 ± 0.0983    0.5851 ± 0.1037
glio-astroII   0.9180 ± 0.0627 (7)     0.9159 ± 0.0565    0.9351 ± 0.0524    0.9385 ± 0.0486
meni-meta      0.9605 ± 0.0375 (5)     0.9642 ± 0.0337    0.9711 ± 0.0307    0.9701 ± 0.0306
meni-astroII   0.9313 ± 0.0725 (10)    0.9661 ± 0.0390    0.9581 ± 0.0482    0.9595 ± 0.0456
meta-astroII   0.9612 ± 0.0533 (4)     0.9695 ± 0.0418    0.9740 ± 0.0393    0.9721 ± 0.0377

Average performance on the test set from 200 runs of stratified random samplings of the L2-normalized magnitude MRS spectra. As performance measure we use the mean AUC and its pooled standard error (SE). The number between the brackets mentions the number of principal components used.

Figure 3 Boxplots of the area under ROC curves (AUC) on 200 stratified randomly sampled test sets of the four models: (1) PCA-LDA, (2) SVM with linear kernel, (3) LS-SVM with linear kernel, (4) LS-SVM with RBF kernel. Six figures correspond to the binary classifiers using complete spectra: (a) glio vs. meni, (b) glio vs. meta, (c) glio vs. astroII, (d) meni vs. meta, (e) meni vs. astroII and (f) meta vs. astroII.

3.2. Classification using selected frequency regions

By selecting the values within specific frequency ranges in the spectra, the number of datapoints is reduced from 108 to 30. For the LDA classifier, PCA is applied to further reduce this input dimension, covering at least 75% of the variance. These input variables are different from those obtained for the complete spectrum, due to the higher degree of freedom in the latter case. The results of 200 runs of stratified random samplings of the L2-normalized magnitude MRS spectra are shown in Table 3 and Fig. 4.

Table 3 Classification using selected frequency regions

Classes        PCA/LDA                 SVM lin            LS-SVM lin         LS-SVM RBF
glio-meni      0.7643 ± 0.0722 (2)     0.8532 ± 0.0575    0.8922 ± 0.0494    0.9187 ± 0.0413
glio-meta      0.6381 ± 0.1004 (2)     0.5081 ± 0.1044    0.6368 ± 0.0998    0.5576 ± 0.1030
glio-astroII   0.8319 ± 0.0776 (2)     0.8692 ± 0.0713    0.8849 ± 0.0660    0.9012 ± 0.0594
meni-meta      0.9212 ± 0.0525 (2)     0.9098 ± 0.0594    0.9339 ± 0.0475    0.9534 ± 0.0374
meni-astroII   0.9079 ± 0.0645 (3)     0.9592 ± 0.0410    0.9619 ± 0.0422    0.9617 ± 0.0411
meta-astroII   0.9173 ± 0.0689 (2)     0.9459 ± 0.0549    0.9698 ± 0.0389    0.9642 ± 0.0429

Average performance on the test set from 200 runs of stratified random samplings of the L2-normalized magnitude MRS spectra. As performance measure we use the mean AUC and its pooled standard error (SE). The number between the brackets mentions the number of principal components used.

Figure 4 Boxplots of the area under ROC curves (AUC) on 200 stratified randomly sampled test sets of the four models: (1) PCA-LDA, (2) SVM with linear kernel, (3) LS-SVM with linear kernel, (4) LS-SVM with RBF kernel. Six figures correspond to the binary classifiers using selected frequency regions: (a) glio vs. meni, (b) glio vs. meta, (c) glio vs. astroII, (d) meni vs. meta, (e) meni vs. astroII and (f) meta vs. astroII.

3.3. Classification using peak integration

Table 4 and Fig. 5 show the results of the ROC analysis for classification using peak integration. Five peak integrated values are used as input of the classifiers. The linear classifier LDA is used without applying PCA.

Table 4 Classification using peak integrated values

Classes        LDA                SVM lin            LS-SVM lin         LS-SVM RBF
glio-meni      0.8504 ± 0.0586    0.8561 ± 0.0577    0.8448 ± 0.0593    0.8677 ± 0.0550
glio-meta      0.6252 ± 0.1007    0.6236 ± 0.1005    0.6434 ± 0.1006    0.6264 ± 0.0988
glio-astroII   0.8773 ± 0.0635    0.8916 ± 0.0571    0.8787 ± 0.0628    0.8818 ± 0.0631
meni-meta      0.9103 ± 0.0628    0.9113 ± 0.0618    0.9191 ± 0.0585    0.9357 ± 0.0473
meni-astroII   0.8441 ± 0.0858    0.8297 ± 0.0926    0.8485 ± 0.0851    0.8281 ± 0.0921
meta-astroII   0.9592 ± 0.0461    0.9727 ± 0.0376    0.9597 ± 0.0453    0.9521 ± 0.0528

Average performance on the test set from 200 runs of stratified random samplings of the L2-normalized magnitude MRS spectra. As performance measure we use the mean AUC and its pooled standard error (SE).

Figure 5 Boxplots of the area under ROC curves (AUC) on 200 stratified randomly sampled test sets of the four models: (1) LDA, (2) SVM with linear kernel, (3) LS-SVM with linear kernel, (4) LS-SVM with RBF kernel. Six figures correspond to the binary classifiers using peak integration: (a) glio vs. meni, (b) glio vs. meta, (c) glio vs. astroII, (d) meni vs. meta, (e) meni vs. astroII and (f) meta vs. astroII.

3.4. Multiclass approach

As mentioned above, two additional binary classifiers are constructed by merging glioblastomas and metastases into one class of aggressive tumors. Table 5 shows the performance of these classifiers using the complete spectra as input.

Table 5 Classification using complete spectra

Classes        LDA                    SVM lin            LS-SVM lin         LS-SVM RBF
meni-aggr      0.9433 ± 0.0306 (6)    0.9409 ± 0.0343    0.9620 ± 0.0279    0.9110 ± 0.1121
astroII-aggr   0.9230 ± 0.0674 (6)    0.9343 ± 0.0458    0.9416 ± 0.0502    0.9129 ± 0.1137

Average performance on the test set from 200 runs of stratified random samplings of the L2-normalized magnitude MRS spectra. As performance measure we use the mean AUC and its pooled standard error (SE). The number between the brackets mentions the number of principal components used.

3.4.1. Training performance

One way to train the multiclass classifier is by feeding all the spectra to the classifier and training each binary classifier with the corresponding classes. For example, use the spectra of class 2 and class 4 to train the binary classifier 24, and similarly for the others. Table 6 shows a comparison of the multiclass classifier performance. The first row shows the percentage of correctly classified spectra in the class of meningiomas, astrocytomas grade II and aggressive tumors. One undecided case arose when using PCA/LDA with the complete spectra; with the selected frequency regions as input variables, 15 undecided cases arose when using PCA/LDA and one when using LS-SVM. In the second step, we use classifier 13 to further subclassify the aggressive class. Using this subclassification, the multiclass classifier's performance is shown in Table 7.

Table 6 One-step classification using complete spectra

               PCA/LDA (%)    LS-SVM lin (%)    LS-SVM RBF (%)
Compl. spec.   84.6995        93.9891           97.8142
Disc. feat.    65.0273        84.6995           90.1639
Peak integ.    75.9563        77.0492           80.8743

Percentage of correctly classified spectra using all L2-normalized magnitude MRS spectra to assess the training performance.

Table 7 Two-steps classification using complete spectra

               PCA/LDA (%)    LS-SVM lin (%)    LS-SVM RBF (%)
Compl. spec.   71.0383        78.1421           83.6066
Disc. feat.    50.2732        68.8525           74.8634
Peak integ.    61.7486        62.2951           67.7596

Percentage of correctly classified spectra using all L2-normalized magnitude MRS spectra to assess the training performance.

3.4.2. Test performance

Besides using all the spectra to choose the hyperparameters and to train the classifiers, one can also select 2/3 of the dataset as training set and use the remainder as test set. This stratified random sampling is repeated for 200 runs. The results are shown in Table 8 for one-step classification, which assigns the spectra to one of the three following classes: 2, 4 or 5. Table 9 shows the classifier performance after two-steps classification, which assigns the spectra to one of the four following classes: 1, 2, 3 or 4. Each spectrum of class 5 in step 1 is assigned either to class 1 or class 3 in step 2. In Tables 8 and 9 we mention the mean correct classification rate, the mean misclassification rate and the mean percentage of undecided cases and their standard deviation. The correct classification rate is defined as the percentage of correctly classified spectra, while the misclassification rate is the percentage of misclassified cases.

Table 8 One-step classification using complete spectra

            PCA/LDA (%)         LS-SVM lin (%)      LS-SVM RBF (%)
Correct     80.1855 ± 4.2853    82.7823 ± 3.3449    83.5726 ± 3.5058
Misclass    14.0887 ± 4.0665    13.6532 ± 3.1140    12.5565 ± 3.3290
Undecided    5.7258 ± 2.6110     3.5645 ± 2.0870     3.8710 ± 2.1144

Average performance on the test set from 200 runs of stratified random samplings (2/3 of the data used for training, 1/3 for testing). The first, second and third rows give, respectively, the mean correct classification rate, the mean misclassification rate and the mean percentage of undecided cases, each with their standard deviation.

Table 9 Two-steps classification using complete spectra

            PCA/LDA (%)         LS-SVM lin (%)      LS-SVM RBF (%)
Correct     63.1532 ± 4.7255    65.7984 ± 3.3449    66.8145 ± 3.5058
Misclass    31.1210 ± 4.6858    30.6371 ± 3.1706    29.3145 ± 3.5954
Undecided    5.7258 ± 2.6110     3.5645 ± 2.0870     3.8710 ± 2.1144

Average performance on the test set from 200 runs of stratified random samplings (2/3 of the data used for training, 1/3 for testing). The first, second and third rows give, respectively, the mean correct classification rate, the mean misclassification rate and the mean percentage of undecided cases, each with their standard deviation. Note that the number of undecided cases is equal to that for the one-step classifier.

4. Discussion

In this section we discuss various issues concerning the results we obtained using the available long echo 1H MRS data. We do not necessarily claim that these remarks generally hold for similar analyses on other data.

4.1. Limitations

MRS signals of brain tumors contain chemical information about metabolites characteristic for the type of tumor. Nevertheless, there are still some factors making it hard to construct a classifier which is able to discriminate between different brain tumors using MRS signals:

(1) The limited number of available spectra per type of tumor (see Table 1). In particular, the number of available metastases and astrocytomas grade II is low. This makes it difficult to construct a classifier with a high generalization capacity.
(2) The presence of noise and artefacts in the spectra. Even after elimination of the dominating water peak, remaining artefacts might affect important peaks in the spectra.
(3) The large variances within each class and the overlap between spectra of different brain tumor types (see Fig. 1).
For example, the mean spectra of glio and meta show a very similar characteristic pattern, which makes the discrimination between glio and meta a very hard problem. This problem is also observed in [9,36]. Further discussion about this is addressed below.

4.2. Glioblastomas versus metastases

Although we obtained a low performance for distinguishing glioblastomas (glio) and metastases (meta), there are indications that these tumor types might be separable based on MR. In [8] Szabo De Edelenyi et al. introduced the so-called nosologic images, which is an approach to analyze 1H MRSI data of brain tumors. It is a tool that assigns the spectroscopic data of each voxel in the spectroscopic image to a histopathological class. Classification was carried out by LDA applied on six metabolite values obtained from long echo 1H MRSI spectra (TE = 272 ms), together with the unsuppressed water area. Their study included 77 images, of which 24 high-grade gliomas and 10 metastases, for which they obtained a training performance of 87% following a leave-one-out (LOO) procedure. For the high-grade gliomas and metastases, respectively, 19 and 6 spectra were correctly assigned. Researchers [39-41] have found a few metabolite peaks or ratios which might contribute to the discrimination of high-grade gliomas and metastases. Law et al. concluded out of a study based on MRSI that, despite the small size of their dataset (11 high-grade gliomas, 6 metastases), the Cho/Cr ratio was significantly higher in high-grade gliomas than in metastases; this was the case for the tumoral region as well as the peritumoral region. Also based on perfusion-weighted MRI they have found different characteristics. Opstad et al. [41] have considered short echo 1H MRS spectra (TE = 30 ms) from 25 glioblastomas and 34 metastases. Based on these data, they were able to find a significant difference in the ratio of the 1.3 ppm and the 0.9 ppm lipid/macromolecule peaks between these two tumor groups.
This lipid peak area (LPA) ratio was 2.6 ± 0.6 for glioblastomas and 3.8 ± 1.4 for metastases (P < 0.0001). Based on 1H MRS, Ishimaru et al. [39] have shown that the absence of Cr might indicate a diagnosis of metastasis, while in short echo the absence of lipids may exclude metastasis. In Fig. 1 we do indeed notice a large mean lipid peak in metastases, but also in glioblastomas. The latter might be due to the occurrence of necrotic tissue in part of the glioblastomas. This partially explains the large variation we observe within this class in particular, and its similarity with the class of metastases.

4.3. Classification techniques

In general, LDA as linear classifier, preceded by PCA (except for peak integration), performs quite well in solving the brain tumor classification problem. This agrees with [10]. However, due to its linear boundary, overlapping classes are very difficult to handle. As stated above, the small dataset available also forms a limitation for training. Therefore, the discrimination boundary will strongly correlate with the training set. LDA in particular requires a significant number of datapoints to be able to draw a linear separating line between overlapping classes. In addition, it is possible that the separating line is very dependent on the selected training set.

Kernel-based classifiers are less sensitive to the number of datapoints; although the dimension is larger than the number of datapoints, these classifiers could draw an optimal separating boundary without applying any dimensionality reduction (such as PCA). Kernel-based classifiers, SVM and LS-SVM, feature the advantage of automatically detecting important characteristics independently of the input pattern.

Based on the statistical analysis described in Section 2.6, no statistically significant difference was found between the AUC values for any of the classification techniques applied to the available long echo 1H MRS data.
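The comparison behind this statement relies on the critical ratio z of Section 2.6. A minimal sketch of that computation is given below, under loudly stated assumptions: the function names are hypothetical, the correlation r is not reported in this section and is simply set to 0 here, and the absolute value in the significance check is a choice of this sketch that makes the result independent of the order of the two classifiers. The input numbers are the glio-meni entries for PCA/LDA and LS-SVM lin in Table 3; because r is assumed, the resulting value is not the z-value obtained by the authors.

```python
import math


def critical_ratio(auc1, se1, auc2, se2, r=0.0):
    """Critical ratio z for two AUCs measured on the same samples.

    r is the correlation between the two areas; r = 0.0 is an
    assumption of this sketch, not a value taken from the paper.
    """
    variance = se1 ** 2 + se2 ** 2 - 2.0 * r * se1 * se2
    if variance <= 0.0:
        raise ValueError("non-positive variance; check se1, se2 and r")
    return (auc1 - auc2) / math.sqrt(variance)


def differs_significantly(z, cutoff=1.96):
    """Compare |z| with the cut-off used in Section 2.6 (1.96)."""
    return abs(z) >= cutoff


# Table 3, glio-meni: LS-SVM lin (0.8922 +/- 0.0494) versus
# PCA/LDA (0.7643 +/- 0.0722), with the correlation r assumed to be 0.
z_example = critical_ratio(0.8922, 0.0494, 0.7643, 0.0722, r=0.0)
significant = differs_significantly(z_example)
```

Under the r = 0 assumption the ratio stays below 1.96 for this pair, so the check returns `False`. A positive correlation shrinks the denominator and therefore increases z, which is why the actual value depends on the r used in the study.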
The highest z-value (1.72) was obtained when comparing PCA/LDA with LS-SVM with a linear kernel based on the selected frequency regions; this is still lower than the cut-off value (1.96). Also from visual inspection, we cannot conclude that there is a clear difference between the considered classification techniques. In particular, the best performing technique depends on the considered classes and the type of input. To be more specific, when comparing the classification techniques we can group them in two ways (e.g. consider only the results with the complete spectra as input):

- linear (LDA, SVM lin, LS-SVM lin) versus non-linear techniques (LS-SVM RBF): in the cases glio-meni and glio-astroII the mean AUC values are slightly higher for the non-linear technique, while in the other cases the AUC values are in the same range or slightly lower.
- LDA versus kernel-based techniques (SVM lin, LS-SVM lin, LS-SVM RBF): in the cases meni-meta, meni-astroII and meta-astroII the kernel-based techniques perform slightly better.

Additionally, we can still consider the comparison of SVM versus LS-SVM. Out of this we can only conclude that the best performing technique is also quite dependent on the case.

4.4. Influence of dimensionality reduction

For classification using selected frequency regions (Section 3.2) and peak integrated values (Section 3.3) the input dimension is reduced by selecting only spectral regions which contain resonances of important metabolites. The underlying idea for this dimensionality reduction is to remove any redundant input features and reduce the influence of noise and artefacts. Hence, we try to enhance the discriminatory chemical information present in the spectra. Contrary to our expectations, we observe that the results in Sections 3.2 and 3.3 on average are worse than in Section 3.1.
This seems to imply that this approach to dimensionality reduction also removes part of the valuable information that is present in the excluded frequency regions and that was important to explain the variance between the brain tumor classes. More specifically, here we are not considering resonances which typically have a low intensity at long echo times (because of a small T2-value or cancellation due to J-modulation) [42,43]: mI (e.g. with triplets and multiplet at 3.26 and 3.57 ppm); Glu (e.g. multiplets at 2.33 and 3.74 ppm); Gln (e.g. multiplets at 2.43 and 3.75 ppm); Gly (singlet at 3.55 ppm). In Fig. 1 we indeed notice a few small peaks around the specified resonances.

4.5. Multiclass classification

Multiclass classifiers handle all classes in one construction. We reduce this problem to a set of four binary classification problems, as explained in Section 2.5.2. Hence, we obtain four separating functions instead of one (one for each binary problem). Multiclass classifiers with the proposed scheme show a high learning capability. This is illustrated by the correct classification rates for the first step using the complete spectra as input: 84.7% (PCA/LDA), 93.9% (LS-SVM lin) and 97.8% (LS-SVM RBF). Given an independent test set as input, the classifiers on average give a quite good generalization performance: 80.2% (PCA/LDA), 82.8% (LS-SVM lin) and 83.6% (LS-SVM RBF). In the second step of the multiclass classifier we combine the output of the first step with the binary classifier that separates glio and meta. The test performances reduce to 63.1% correct classification (PCA/LDA), 65.8% (LS-SVM lin) and 66.8% (LS-SVM RBF). This can be explained by the hard binary problem glio versus meta. As observed in the binary classification, these two classes are very similar. Therefore, separating them from one single class 5 into class 1 and class 3 deteriorates the total performance of the classifier.
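The two-step scheme discussed above (three binary classifiers combined by voting, followed by classifier 13 when the vote is class 5) can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names, the `UNDECIDED` label and the rule that a three-way tie counts as undecided are assumptions of the sketch, and the binary classifiers are replaced by toy stand-ins that return fixed labels.

```python
from collections import Counter

UNDECIDED = 0  # hypothetical label for spectra without a majority vote


def majority_vote(votes):
    """Return the most frequent label, or UNDECIDED when the top two tie."""
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return UNDECIDED
    return ranked[0][0]


def two_step_classify(spectrum, clf24, clf25, clf45, clf13):
    """Return (step 1 label, final label) for one spectrum.

    Step 1 votes among classes 2 (meni), 4 (astroII) and 5 (aggressive).
    Step 2 refines class 5 into 1 (glio) or 3 (meta) with classifier 13.
    """
    step1 = majority_vote([clf24(spectrum), clf25(spectrum), clf45(spectrum)])
    if step1 == 5:
        return step1, clf13(spectrum)
    return step1, step1


# Toy stand-ins for trained binary classifiers (fixed outputs).
aggressive_case = two_step_classify(
    None, lambda s: 2, lambda s: 5, lambda s: 5, lambda s: 1)
astro_case = two_step_classify(
    None, lambda s: 4, lambda s: 2, lambda s: 4, lambda s: 3)
tie_case = two_step_classify(
    None, lambda s: 2, lambda s: 5, lambda s: 4, lambda s: 1)
```

In this sketch a spectrum that is undecided after step 1 stays undecided after step 2, which is consistent with the note under Table 9 that the number of undecided cases equals that of the one-step classifier. Any error made by classifier 13 is added to the step 1 errors, which is the mechanism behind the drop in test performance described above.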
Although no ROC analysis for multiclass classification was performed in order to test for significant differences between the classification techniques, the following indications can be noted, without drawing a general conclusion. The results, after step 1 as well as step 2, yield a clearly higher training performance for the kernel-based methods than for LDA. Moreover, the kernel-based methods on average perform slightly better on an independent test set than LDA. This is clear from the mean percentage of correctly classified spectra, which is a few percentage points higher for LS-SVM (see Tables 8 and 9 and Fig. 6). Also the mean percentage of undecided cases differs slightly in favor of LS-SVM. This indicates that kernel-based methods can generalize at least as well as LDA based on a small dataset.

Figure 6 Boxplots of the correct classification rate on the test set from 200 runs of stratified random samplings (2/3 of the data for training, 1/3 for testing) of the three models: (1) LDA, (2) LS-SVM with linear kernel, (3) LS-SVM with RBF kernel. Two figures correspond to the multiclass classifiers using complete spectra: (a) output of step 1 classifier, (b) output of step 2 classifier.

5. Conclusions

This paper shows a comparative study of brain tumor classification based on long echo 1H MRS signals. Linear as well as non-linear classifiers are compared. All techniques are applied automatically, including (hyper-) parameter selection, training and testing. Also for use in clinical practice, all techniques are easy to automate for analysis of independent data. Binary classification gives more insight into the distributions of each class and their overlap.
Except for the hard case glioblastomas versus metastases, all classifiers based on the complete magnitude spectra are able to reach an AUC of more than 0.9. Based on the available data, we were not able to statistically prove any difference in performances between the classification techniques, for binary as well as multiclass classification. This indicates that kernel-based methods and LDA statistically perform as well for classification of brain tumors based on a small set of long echo 1H MRS data. However, each of the applied techniques has its characteristics. LDA requires a prior dimensionality reduction of input variables (e.g. by applying PCA), while dimensionality reduction is done automatically in kernel-based methods.

We expected that dimensionality reduction, by selecting frequency regions or peak integration, would reduce the disturbing noise and artefacts in the spectra. However, the described approach for selecting resonance peaks of long echo 1H MRS spectra resulted in a lower performance. It might be necessary to include additional spectral information to increase classification performance. This also motivates further research in learning the peak pattern of short echo 1H MRS, for which data are also provided within the INTERPRET project.

By using magnitude spectra, phasing problems are avoided. Nevertheless, with respect to real spectra, in magnitude spectra there occurs more peak overlap. Also, in real spectra at long echo time TE (TE = 135, 136 ms) the peaks of Ala and Lac are inverted. This might reduce the ability to distinguish tumor types based on subtle differences in the spectral pattern. In order to test for this effect, in a future study also real spectra could be included as input features.
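The remark that magnitude spectra avoid phasing problems can be illustrated with a small synthetic example. The sketch below is an assumption-laden toy, not a model of the INTERPRET data: it uses a single complex Lorentzian line on a hypothetical frequency grid, applies a zero-order phase error only, and L2-normalizes the magnitude spectrum as done for the classifier inputs. All function names are hypothetical.

```python
import cmath
import math


def lorentzian(freqs, f0, width):
    """Complex Lorentzian line; its real part is the absorption peak."""
    return [1.0 / complex(1.0, (f - f0) / width) for f in freqs]


def apply_phase(spectrum, phase_rad):
    """Multiply every point by exp(i * phase): a zero-order phase error."""
    factor = cmath.exp(1j * phase_rad)
    return [s * factor for s in spectrum]


def l2_normalize(values):
    """Scale a real-valued vector to unit Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values]


# Hypothetical frequency grid and peak parameters (arbitrary units).
freqs = [i * 0.1 for i in range(100)]
spectrum = lorentzian(freqs, f0=5.0, width=0.3)
misphased = apply_phase(spectrum, math.radians(60.0))

magnitude_ref = l2_normalize([abs(s) for s in spectrum])
magnitude_mis = l2_normalize([abs(s) for s in misphased])
real_ref = [s.real for s in spectrum]
real_mis = [s.real for s in misphased]

max_magnitude_change = max(abs(a - b)
                           for a, b in zip(magnitude_ref, magnitude_mis))
max_real_change = max(abs(a - b) for a, b in zip(real_ref, real_mis))
```

The magnitude spectrum is unchanged by the phase error up to rounding, whereas the real part changes visibly. The price is the one named in the text: the magnitude line is broader than the absorption line, so neighbouring peaks overlap more.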
Discriminating aggressive tumor types, glioblastomas and metastases, using long echo 1H MRS spectra clearly is a very hard problem due to the highly similar pattern of the spectra from both classes, possibly due to the presence of some necrotic tissue. In order to address this problem, the discriminatory information present in the 1H MRS spectrum should be enhanced, potentially by improvements in the acquisition of 1H MRS signals. In particular, improvements are expected when processing short echo 1H MRS signals, since more metabolites are visible in these spectra. Moreover, this spectral information is spread out over a larger number of peaks, thereby enlarging the number of possible discriminatory features. This is part of future research.

Acknowledgements

This research work was carried out at the ESAT Laboratory and the Interdisciplinary Center of Neural Networks ICNN of the Katholieke Universiteit Leuven, in the framework of the Belgian Programme on Interuniversity Poles of Attraction, initiated by the Belgian State, Prime Minister's Office for Science, Technology and Culture (IUAP Phase V-22), the Concerted Action Project MEFISTO of the Flemish Community, the FWO projects G.0407.02 and G.0269.02 and the IDO/99/03 project. AD research financed by IWT grant of the Flemish Institute for the promotion of scientific-technological research in the industry. LVH is a postdoctoral researcher with the National Fund for Scientific Research FWO, Flanders. Use of the data provided by the EU funded INTERPRET project (IST-1999-10310; http://carbon.uab.es/INTERPRET/) is gratefully acknowledged.

References

[1] The Brain Tumor Society. http://www.tbts.org.
[2] Mukherji SK, editor. Clinical applications of magnetic resonance spectroscopy. Wiley-Liss, 1998.
[3] Smith ICP, Stewart LC. Magnetic resonance spectroscopy in medicine: clinical impact. Prog Nucl Mag Res Sp 2002;40:1-34.
[4] Cousins JP. Clinical MR spectroscopy: fundamentals, current applications, and future potential. AJR Am J Roentgenol 1995;164:1337-47.
[5] Lindon JC, Holmes E, Nicholson JK. Pattern recognition methods and applications in biomedical magnetic resonance. Prog Nucl Mag Res Sp 2001;39:1-40.
[6] Herminghaus S, Dierks T, Pilatus U, Möller-Hartmann W, Wittsack J, Marquardt G, et al. Determination of histopathological tumor grade in neuroepithelial brain tumors by using spectral pattern analysis of in vivo spectroscopic data. J Neurosurg 2003;98:74-81.
[7] Simonetti AW, Melssen WJ, van der Graaf M, Heerschap A, Buydens LMC. Brain tumor classification and probability maps using MRI and MRSI data. Anal Chem 2003;75(20):5352-61.
[8] Szabo De Edelenyi F, Rubin C, Estève F, Grand S, Décorps M, Lefournier V, et al. Nature Med. 2000;6:1287-9.
[9] Tate AR, Griffiths JR, Martínez-Pérez I, Moreno A, Barba I, Cabañas ME, et al. Towards a method for automated classification of 1H MRS spectra from brain tumours. NMR Biomed 1998;11:177-91.
[10] Tate AR, Majós C, Moreno A, Howe FA, Griffiths JR, Arús C. Automated classification of short echo time in vivo 1H brain tumor spectra: a multicenter study. Magn Reson Med 2003;49:29-36.
[11] Ye C-Z, Yang J, Geng D-Y, Zhou Y, Chen N-Y. Fuzzy rules to predict degree of malignancy in brain glioma. Med Biol Eng Comput 2002;40:145-52.
[12] Swets JA. ROC analysis applied to the evaluation of medical imaging techniques. Invest Radiol 1979;14(2):109-21.
[13] International network for pattern recognition of tumours using magnetic resonance. http://carbon.uab.es/INTERPRET/.
[14] Ladroue C, Tate AR, Howe FA, Griffiths JR. Exploring magnetic resonance data with independent component analysis. In: Proceedings of the 19th Annual Meeting of the European Society for Magnetic Resonance in Medicine and Biology (ESMRMB02), Cannes, France, August 22-25, 2002. p. 147-8.
[15] Lefournier V, Szabo De Edelenyi F, Estève F, Grand S, Bessou P, Boubagra K, et al. Nosologic images for classification of brain tumors with 1H MRSI: clinical performance. In: Proceedings of the 19th Annual Meeting of the European Society for Magnetic Resonance in Medicine and Biology (ESMRMB02), Cannes, France, August 22-25, 2002. p. 91-2.
[16] Lukas L, Devos A, Suykens JAK, Vanhamme L, Van Huffel S, Tate AR, et al. The use of LS-SVM in the classification of brain tumors based on magnetic resonance spectroscopy signals. In: Proceedings of the European Symposium for Artificial Neural Networks (ESANN), Bruges, Belgium, April 24-26, 2002. p. 131-5.
[17] Lukas L, Devos A, Suykens JAK, Vanhamme L, Van Huffel S, Tate AR, et al. The use of LS-SVM in the classification of brain tumors based on 1H-MR spectroscopy signals. In: Proceedings of the IEE Symposium on Medical Applications of Signal Processing, Savoy Place, London, UK, October 7, 2002. p. 15/1-5.
[18] Szabo De Edelenyi F, Estève F, Rémy C, Buydens L. Proceedings of the 19th Annual Meeting of the European Society for Magnetic Resonance in Medicine and Biology (ESMRMB02), Cannes, France, August 22-25, 2002. p. 91.
[19] Tate AR, Griffiths JR, Howe FA, Pujol J, Arús C. Differentiating types of human brain tumours by MRS. A comparison of pre-processing methods and echo times. In: Proceedings of the Ninth Scientific Meeting & Exhibition (ISMRM01), Glasgow, Scotland, April 21-27, 2001. p. 2284.
[20] Vapnik V. The nature of statistical learning theory. New York: Springer, 1995.
[21] Vapnik V. Statistical learning theory. New York: Wiley, 1998.
[22] Suykens JAK, Vandewalle J. Least squares support vector machine classifiers. Neur Proc Lett 1999;9(3):293-300.
[23] Klose U. In vivo proton spectroscopy in presence of eddy currents. Magn Reson Med 1990;14:26-30.
[24] Barkhuijsen H, De Beer R, Van Ormondt D. Improved algorithm for noniterative time-domain model fitting to exponentially damped magnetic resonance signals. J Magn Reson 1987;73:553-7.
[25] Duda RO, Hart PE, Stork DG. Pattern classification. 2nd ed. New York: Wiley, 2001.
[26] Ripley BD. Pattern recognition and neural networks. Cambridge: Cambridge University Press, 1996.
[27] Suykens JAK, Van Gestel T, De Brabanter J, De Moor B, Vandewalle J. Least squares support vector machines. Singapore: World Scientific, 2002.
[28] Gunn SR. Support vector machines for classification and regression. Technical Report. Image Speech and Intelligent Systems Research Group, University of Southampton, 1997.
[29] MATLAB support vector machines toolbox. http://www.isis.ecs.soton.ac.uk/isystems/kernel.
[30] MATLAB/C LS-SVMlab toolbox. http://www.esat.kuleuven.ac.be/sista/lssvmlab.
[31] Pelckmans K, Suykens JAK, Van Gestel T, De Brabanter J, Lukas L, Hamers B, et al. LS-SVMlab Toolbox User's Guide. Internal Report 02-145. ESAT-SISTA, K. U. Leuven, Leuven, Belgium, 2002.
[32] Howe FA, Barton SJ, Cudlip SA, Stubbs M, Saunders DE, Murphy M, et al. Metabolic profiles of human brain tumors using quantitative in vivo 1H magnetic resonance spectroscopy. Magn Reson Med 2003;49:223-32.
[33] Lecrerc X, Huisman TAGM, Sorensen AG. The potential of proton magnetic resonance spectroscopy (1H) in the diagnosis and management of patients with brain tumors. Curr Opin Oncol 2002;14:292-8.
[34] Majós C, Alonso J, Aguilera C, Serrallonga M, Acebes JJ, Arús C, et al. Adult primitive neuroectodermal tumor: proton MR spectroscopic findings with possible application for differential diagnosis. Radiology 2002;225:556-66.
[35] Murphy M, Loosemore A, Clifton AG, Howe FA, Tate AR, Cudlip SA, et al. The contribution of proton magnetic resonance spectroscopy (1H MRS) to clinical brain tumour diagnosis. Br J Neurosurg 2002;16(4):329-34.
[36] Poptani H, Kaartinen J, Gupta RK, Niemitz M, Hiltunen Y, Kauppinen RA. Diagnostic assessment of brain tumours and non-neoplastic brain disorders in vivo using proton nuclear magnetic resonance spectroscopy and artificial neural networks. J Cancer Res Clin Oncol 1999;125:343-9.
[37] Hanley JA, McNeil BJ. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 1983;148:839-43.
[38] Srinivasan A. Note on the location of optimal classifiers in n-dimensional ROC space. Technical Report PRG-TR-2-99. Oxford University Computing Laboratory, Oxford, England, 1999.
[39] Ishimaru H, Morikawa M, Iwanaga S, Kaminogo M, Ochi M, Hayashi K. Differentiation between high-grade glioma and metastatic brain tumor using single-voxel proton MR spectroscopy. Eur Radiol 2001;11:1784-91.
[40] Law M, Cha S, Knopp EA, Johnson G, Arnett J, Litt AW. High-grade gliomas and solitary metastases: differentiation by using perfusion and proton spectroscopic MR imaging. Radiology 2002;222:715-21.
[41] Opstad KS, Griffiths JR, Bell BA, Howe FA. In vivo lipid T2 relaxation time measurements in high-grade tumors: differentiation of glioblastomas and metastases. In: Proceedings of the 11th Scientific Meeting and Exhibition (ISMRM 03), Toronto, Canada, July 10-16, 2003. p. 754.
[42] Ernst T, Hennig J. Coupling effects in volume selective 1H spectroscopy of major brain metabolites. Magn Reson Med 1991;21:82-96.
[43] Govindaraju V, Young K, Maudsley AA. Proton NMR chemical shifts and coupling constants for brain metabolites. NMR Biomed 2000;13:129-53.