EP2710147A1

EP2710147A1 - Molecular analysis of acute myeloid leukemia

Info

Publication number: EP2710147A1
Application number: EP12724304.6A
Authority: EP
Inventors: Joachim Schultze; Andrea Hofmann; Andrea Staratschek-Jox
Original assignee: Rheinische Friedrich Wilhelms Universitaet Bonn
Current assignee: Rheinische Friedrich Wilhelms Universitaet Bonn
Priority date: 2011-05-18
Filing date: 2012-05-18
Publication date: 2014-03-26
Also published as: WO2012156515A1

Abstract

The present invention provides a method for molecular analysis of Acute Myeloid Leukemia (AML) based on the abundance of particular RNAs from blood samples, as well as diagnostic tools such as kits and arrays suitable for such method.

Description

Molecular Analysis of Acute Myeloid Leukemia

The present invention provides a method for molecular analysis of Acute Myeloid Leukemia (AML) based on the abundance of particular RNAs from blood samples, as well as diagnostic tools such as kits and arrays suitable for such method.

Background of the Invention

With the initial introduction of DNA microarray analysis for cancer diagnostics in the late 1990s (T. R. Golub et al., Science 286: 531-537 (1999)), a rush towards diagnostic, prognostic and even predictive gene signatures was initiated (J. A. Ludwig and J. N. Weinstein, Nat. Rev. Cancer 5: 845-856 (2005); A. Rosenwald et al., New England J. Med. 346: 1937-1947 (2002); E. J. Yeoh et al., Cancer Cell 1: 133-143 (2002); L. J. van 't Veer et al., Nature 415: 530-536 (2002); D. G. Beer et al., Nat. Med. 8: 816-824 (2002); A. Bhattacharjee et al., Proc. Natl. Acad. Sci. USA 98: 13790-13795 (2001); S. Ramaswamy et al., Nat. Genet. 33: 49-54 (2003); S. L. Pomeroy et al., Nature 415: 436-442 (2002)). Not much later, similar strategies were applied to other areas of medical sciences, e.g. infectious diseases (M. P. Berry et al., Nature 466: 973-977 (2010)). At the same time, there was a tremendous surge of novel analytical tools and mathematical algorithms to be utilized for the analysis of high throughput data for diagnostic purposes (M. D. Radmacher et al., J Comput Biol 9: 505-511 (2002); A. M. Glas et al., BMC Genomics 7: 278 (2006); Q. Liu et al., PLoS One 4: e8250 (2009); T. Reme et al., BMC Bioinformatics 9: 16 (2008); R. M. Parry et al., Pharmacogenomics J. 10: 292-309 (2010)). Irrespective of these significant advances, the number of gene signatures that have entered clinical practice is alarmingly small (FDA, FDA Clears Breast Cancer Specific Molecular Prognostic Test (2007) and F. M. Goodsaid et al., Nat Rev Drug Discov 9: 435-445 (2010)). More troubling, there have been serious concerns about the validity of several landmark studies in gene signature development (S. Michiels et al., Lancet 365: 488-492 (2005)) if not about the approach in general (S. Michiels et al., Lancet 365: 488-492 (2005); J. P. Ioannidis et al., Nat Genet 41: 149-155 (2009); A. Dupuy and R. M. Simon, J Natl Cancer Inst 99: 147-157 (2007)).
As an important result of these concerns, the MicroArray Quality Control (MAQC-I) project was successfully installed, demonstrating that the technology itself is reliable and reproducible (L. Shi et al., Nat Biotechnol 24: 1151-1161 (2006); L. Shi et al., Curr Opin Biotechnol 19: 10-18 (2008)). More recently, concerns about overoptimistic data presentation in clinical gene signature studies could be eased by improving the analytical processes used within these pilot studies (X. Fan et al., doi: 10.1158/1078-0432.CCR-09-1815). In addition, the MAQC-II study set the framework for further development of microarray-based predictive models (L. Shi et al., Nat Biotechnol 28: 827-838 (2010); S. Dudoit et al., J. Am. Stat. Assoc. 97: 77-87 (2002)). Several important points are derived from this large consortium effort. Most important, model prediction performance is largely (biological) endpoint dependent, probably the most critical finding supporting further development of gene signature technology. Further, internal validation performance from well-implemented, unbiased cross-validation shows a high degree of concordance with external validation performance. Nevertheless, external validation is a critical feature for signature development. Formerly questioned by others (L. Ein-Dor et al., Bioinformatics 21: 171-178 (2005); L. Ein-Dor et al., Proc Natl Acad Sci U S A 103: 5923-5928 (2006)), the MAQC-II study also clearly established that many classifiers with similar performance can be developed from a given data set. Not surprisingly, proficiency of investigators and good modeling practice lead to improved results (L. Shi et al., Nat Biotechnol 28: 827-838 (2010)).
WO2010/143941 discloses the subclassifying of juvenile leukemia via molecular signatures to predict the development of the disease. This subclassifying, however, requires that a primary diagnosis of AML has been established beforehand.
Finally, WO2006/071088 discloses gene expression analysis by RNA hybridization and quantitation based on a small number of patients. The genes determined in this analysis include CITED2, MGST1, BIN1, RAB32, ICAM3, PXN, PPGB and TAF15.

Summary of the Invention

Increasing data support the notion that biological high throughput data will transform molecular diagnostics of many diseases including cancer and infection. Among the most advanced approaches are gene signatures based on gene expression profiling of diseased tissue or peripheral blood. However, despite the enormous number of studies performed, translation of gene signatures into clinical use has been very limited and continues to be tremendously difficult. Here, we introduce adaptive learning and simulation approaches to significantly improve and accelerate the development of gene signature-based diagnostic biomarkers as exemplified for acute myeloid leukemia. In addition to current approaches determining optimal classifiers within a defined study setting (training and validation set), the overall study setting (n > 10,000) was permuted, thereby simulating the performance range (sensitivities, specificities, AUC) of potential disease classifiers in other study settings. With these significant improvements we establish an exceedingly robust and clinically applicable gene signature for the diagnosis of acute myeloid leukemia.

These comprehensive findings strongly suggest that high throughput gene expression data should be quickly developed into diagnostic tests, which requires addressing several unresolved issues. First, to better judge the validity of small pilot trials, a two-step validation approach is developed that is combined with randomized permutation ("10,000 clinical trial simulation"). Second, to predict the minimum size of a consecutive pivotal validation trial, an algorithm is described which combines sample simulation and adaptive learning approaches ("on the fly optimization strategy"). This approach can also estimate overall best test performance. Further, evidence is provided that patients included in such a pivotal trial can already benefit from these adaptive learning algorithms. Utilizing these approaches, a high-performance test for primary molecular diagnosis of leukemia is established.

A typical patient history for a patient with AML is characterized by early episodes of fever, abnormal fatigue, and signs of an infection or a cold. Such patients usually visit their private practitioner, and very often the early signs of leukemia are misclassified as viral or bacterial infections with leukocytosis. Only if symptoms persist for a prolonged time are patients sent to the hospital (see also Figure 7A). Particularly if such hospitals are not specialized in hematology, these patients have to be forwarded further to regional centers or university hospitals before they are correctly diagnosed with AML. At the centers, the primary diagnosis of AML is performed by an experienced hematologist using the patient's history, blood counts and light microscopy of cells derived from bone marrow aspirates. Based on the primary diagnosis of AML using these rather old technologies (light microscope, blood counts), a hematologist usually performs further tests for differential diagnosis, subclassification of the disease, prognosis of disease outcome, or therapy outcome prediction. These include flow cytometric analysis, cytogenetics, and PCR-based assays for genetic translocations.

Previous inventions in the field of gene expression profiling (GEP) are exclusively targeted at improving differential diagnosis, subclassification, prognosis of disease outcome, and therapy outcome prediction, and rely on primary diagnosis by current diagnostic procedures (patient history, physical exam, blood counts and light microscopy of bone marrow cells) prior to introduction of the inventions as new diagnostics. In other words, these tests are supposed to be used in addition to current methodology for differential diagnosis and subclassification and only add further value by introducing prognosis of disease outcome and therapy outcome prediction as endpoints. Figure 7A shows their relation to current standards in AML diagnostics. However, as described in WO2010/143941, the performance of GEP-based assays for these outcomes is currently still insufficient for a successful clinical application.

The present invention is directed to replacing currently used approaches (patient history, physical exam, blood counts and light microscopy of bone marrow cells) for primary diagnosis with gene expression profiling (GEP) of peripheral blood (grey box in Figure 7B). Moreover, our test is targeted at an earlier time point during the diagnostic process and should be available to the private practitioner rather than the specialist, e.g. a hematologist or an oncologist. This is shown in Figure 7B. It is envisioned that a private practitioner can order such a GEP-based test for primary diagnosis of AML from a specialized laboratory at an early time point of the diagnostic process. The specialized laboratory will provide the private practitioner with a probability for the patient being diagnosed with AML. If the patient is diagnosed with AML, the private practitioner can directly refer the patient to a center for treatment. The center can then quickly perform further diagnostics for subclassification, therapy outcome prediction and prognosis (e.g. by other GEP-based algorithms) and can immediately start with therapy.

Of note, the specificity and sensitivity for a correct diagnosis of AML patients are not known for the clinically based tentative diagnosis of AML by private practitioners, for the tentative diagnosis by regional hospitals, or for the primary diagnosis (using classical diagnostic procedures) in specialized centers. Estimates range from 60 to 95% for both specificity and sensitivity in the three scenarios. Therefore, the very high statistical performance of our GEP-based assay (> 99%) is considerably higher than that of current practice.

Another very important advantage of the present invention is the possibility to diagnose patients significantly earlier. Since our GEP-based assay can be performed from a small amount of blood in an unbiased fashion, the primary diagnosis does not require the expertise of specialized hematologists (in specialized centers). In other words, the private practitioner (with the help of a specialized laboratory) can be enabled to primarily diagnose AML in a rather short time frame.

The invention thus provides methods and kits for diagnosing, detecting, and screening for Acute Myeloid Leukemia (AML). Also provided is a method for preparing an RNA expression profile that is indicative of the presence or absence of AML in a subject. Further provided is the evaluation of the patient RNA expression profiles for the presence or absence of one or more RNA expression signatures that are indicative of AML. More concretely, the application provides a method for the detection of AML in a human subject based on RNA from a blood sample obtained from said subject, comprising:

measuring the abundance of at least 4 RNAs in the sample that are chosen from the RNAs listed in Table 2, and

concluding based on the measured abundance whether the subject has AML.

The above method is suitable as a primary test for AML, i.e. it does not require a preceding primary test by classical methods. The conclusion whether the patient has AML or not may comprise, in a preferred embodiment of the method, classifying the sample as being from a healthy individual or from an individual having AML based on the specific difference of the abundance of the at least 4 RNAs in healthy individuals versus the abundance of the at least 4 RNAs in individuals with AML (in a reference set). In the present method, a sample can be classified as being from a patient with AML or from a healthy individual without the necessity to run a reference sample of known origin (i.e. from an AML patient or a healthy individual) at the same time.

In a preferred embodiment, the method of the invention is a method for the detection of AML in a human individual based on RNA obtained from a blood sample obtained from the individual, comprising:

determining the abundance of at least 4 RNAs in the sample that are chosen from the RNAs listed in Table 2, and

classifying the sample as being from a healthy individual or from an individual having AML based on the specific difference of the abundance of the at least 4 RNAs in healthy individuals versus the abundance of the at least 4 RNAs in individuals with AML.

Particularly preferred for this method is that the abundance of at least 6 RNAs, of at least 8 RNAs, of at least 10 RNAs, of at least 12 RNAs, or of at least 14 RNAs listed in Table 2 is determined. It is further preferred to determine the highest ranked RNAs of Table 2 in the method of the invention, i.e. the first 4, first 6, first 8, first 12 or first 14 RNAs of Table 2.

In one aspect, the invention provides a method for preparing RNA expression profiles that are indicative of the presence or absence of AML. The RNA expression profiles are prepared from patient blood samples. The number of transcripts in the RNA expression profile may be selected so as to offer a convenient and cost effective means for screening samples for the presence or absence of AML with high sensitivity and high specificity. Generally, the RNA expression profile includes the expression level or "abundance" of from 4 to about 3000 transcripts. In certain embodiments, the expression profile includes the RNA levels of 2500 transcripts or less, 2000 transcripts or less, 1500 transcripts or less, 1000 transcripts or less, 500 transcripts or less, 250 transcripts or less, 100 transcripts or less, or 50 transcripts or less. In such embodiments, the profile may contain the abundance or expression level of at least 4 RNAs that are indicative of the presence or absence of AML, and specifically, as selected from Table 2, or may contain the expression level of at least 6, at least 8, at least 12 or at least 14 RNAs selected from Table 2. Where larger profiles are desired, the profile may contain the expression level or abundance of at least about 60, at least 100, at least 150, or 200 RNAs that are indicative of the presence or absence of AML, and such RNAs may be selected from Table 2. Combinations of genes and/or transcripts that make up or are included in expression profiles are available from Examples 1 to 15 shown in Tables 3, 4 and 5.

Such RNA expression profiles in accordance with this aspect may be evaluated for the presence or absence of an RNA expression signature indicative of AML. Generally, the sequential addition of transcripts from Table 2 to the expression profile provides for higher sensitivity and/or specificity for the detection of AML. For example, the area under the ROC curve (AUC) may be at least 0.8, or at least 0.82, or at least 0.85 or at least 0.9. The AUC is a quantitative parameter for the clinical utility (specificity and sensitivity) of the detection method described herein. An AUC of 1.0 refers to a sensitivity and specificity of 100 %.
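As a minimal illustration of how an AUC value of this kind could be computed from classifier scores, the following Python sketch uses the rank-based (Mann-Whitney) formulation of the area under the ROC curve. The function name `roc_auc` and the example scores are hypothetical and are not taken from the application; the sketch does not claim to reproduce the software used by the inventors.

```python
def roc_auc(scores_aml, scores_control):
    """Area under the ROC curve for two lists of classifier scores.

    Computed as the fraction of (AML, control) pairs in which the AML
    sample receives the higher score; ties count as one half.
    """
    if not scores_aml or not scores_control:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for a in scores_aml:
        for c in scores_control:
            if a > c:
                wins += 1.0
            elif a == c:
                wins += 0.5
    return wins / (len(scores_aml) * len(scores_control))


if __name__ == "__main__":
    # Hypothetical scores: perfectly separated groups give an AUC of 1.0.
    print(roc_auc([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))
    # One AML sample scores below one control, so 3 of 4 pairs are ordered correctly.
    print(roc_auc([0.9, 0.4], [0.5, 0.1]))
```

An AUC of 0.5 in this formulation corresponds to scores that do not separate the two groups at all, which is why thresholds such as 0.8 or 0.9 are read as increasing degrees of separation.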

Alternatively, the median Matthews Correlation Coefficient (MCC) may be at least 0.8, or at least 0.82, or at least 0.85 or at least 0.9. Again, the MCC is a quantitative parameter for the clinical utility (specificity and sensitivity) of the detection method described herein. An MCC of 1.0 refers to a sensitivity and specificity of 100 %.
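For orientation, the MCC can be computed from the four cells of a confusion matrix. The Python sketch below uses the standard textbook definition; the function name `mcc` and the example counts are illustrative assumptions and do not come from the application.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.

    tp/fn: AML samples classified as AML / as non-AML.
    tn/fp: control samples classified as non-AML / as AML.
    Returns 0.0 when a marginal total is zero (a common convention).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Hypothetical validation cohort of 25 cases and 25 controls.
    print(mcc(tp=25, tn=25, fp=0, fn=0))  # no errors: MCC of 1.0
    print(mcc(tp=20, tn=20, fp=5, fn=5))  # five errors per group: MCC of 0.6
```

With no false positives and no false negatives, sensitivity and specificity are both 100 % and the MCC equals 1.0, which matches the statement in the paragraph above.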

In a second aspect, the invention provides a method for detecting, diagnosing, or screening for AML. In this aspect, the method comprises preparing an RNA expression profile by measuring the abundance of at least 4, at least 6, at least 8, or at least 12, or at least 14 RNAs in a patient blood sample, where the abundance of such RNAs is indicative of the presence or absence of AML. The RNAs may be selected from the RNAs listed in Table 2. The method further comprises evaluating the profile for the presence or absence of an RNA expression signature indicative of AML, to thereby conclude whether the patient has or does not have AML. The method generally provides a sensitivity for the detection of AML of at least about 70 %, while providing a specificity of at least about 70 %.
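The sensitivity and specificity figures mentioned above follow directly from the counts of correct and incorrect calls in a validation cohort. The following Python sketch shows the arithmetic under the usual definitions; the function names, the 70 % check and the example counts are hypothetical and serve only as an illustration.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = fraction of AML samples classified as AML
    specificity = fraction of control samples classified as non-AML
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    return tp / (tp + fn), tn / (tn + fp)


def meets_minimum(tp, fn, tn, fp, minimum=0.70):
    """Check both values against a minimum such as the 70 % named in the text."""
    sensitivity, specificity = sensitivity_specificity(tp, fn, tn, fp)
    return sensitivity >= minimum and specificity >= minimum


if __name__ == "__main__":
    # Hypothetical cohort: 50 cases (35 detected) and 50 controls (45 correctly negative).
    print(sensitivity_specificity(tp=35, fn=15, tn=45, fp=5))
    print(meets_minimum(tp=35, fn=15, tn=45, fp=5))
```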

In various embodiments, the method comprises determining the abundance of at least 4 RNAs, at least 60 RNAs, at least 100 RNAs, at least 200, or of at least 500 RNAs chosen from the RNAs listed in Table 2, and classifying the sample as being indicative of AML, or not being indicative of AML.

In other aspects, the invention provides kits and custom arrays for preparing the gene expression profiles, and for determining the presence or absence of AML.

Short Description of the Figures

Figure 1: Development of a classifier to molecularly diagnose AML. (A) Schema of approach to define classifiers. A dataset was compiled out of a total of 17 individual studies containing a total of 2013 samples. Randomly, a total of 150 samples was drawn. Within these 150 samples, 75 cases of AML were included; the other samples were called non-AML or control samples. The 150 samples were evenly split into three independent cohorts of 50 samples, called training set (TS), validation set 1 (V1) and validation set 2 (V2). The classifier was built in TS and applied to V1 and V2. Classifier performance is only shown for external validation in V1 and V2. Classifiers were built according to a combined approach of 1) feature selection, 2) application of a classifier algorithm, and 3) 10-fold cross-validation (internal validation). Influence of 1) feature size, 2) classification algorithm, 3) ratio of cases and controls in the TS, and 4) the size of the TS were assessed by varying the respective parameters. As readout we assessed AUC, MCC, specificity and sensitivity. The process of generating three independent sets of data out of the 150 samples was performed 10,000 times and termed 'trial simulation approach' (TSA). (B) For V1 (black dots) and V2 (red dots), the performance of the 10,000 classifiers generated in this simulation approach is shown using a defined feature selection (transcripts defined by a combined fold change and p-value filter) combined with an SVM-based classification algorithm. It can be clearly seen that, dependent on the distribution of the samples into TS, V1 and V2, the performance read out as AUC can widely vary between 0.3 and 1. On the right panel the data are shown in an integrated format as a boxplot (mean, 25 to 75 percentiles, standard deviation and outliers). (C) The influence of different classification algorithms on classifier performance is shown. For each condition, 10,000 individual classifiers in 10,000 independent trial simulation approaches are shown for V1 and V2 independently. On the left, SVM with linear or radial kernel was used in combination with either t-test (t) or Wilcoxon-test (wilc). On the right, PAM or LDA in combination with t-test or Wilcoxon-test were used. (D) The influence of feature size on classifier performance was assessed again using 10,000 independent TSA settings. Filters based on differentially expressed genes (combined fold change, p-value filter, here abbreviated by FC only) were used. The larger the FC value, the smaller the number of transcripts per classifier. Shown is again classifier performance in V1 and V2 independently. (E) The influence of sample distribution in TS was assessed by decreasing the number of cases from 25 out of 50 in TS to 5 out of 30 in TS. Shown is again classifier performance in V1 and V2 independently. (F) The influence of sample size in TS was assessed by decreasing both the number of cases and controls from each 25 to each 5 in TS. Shown is again classifier performance in V1 and V2 independently.
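The trial simulation approach outlined for Figure 1 (repeatedly splitting the samples into a training set and two validation sets, building a classifier in the training set, and recording its performance in both validation sets) can be sketched in Python as follows. This is only a rough sketch under stated assumptions: the data are synthetic, a nearest-centroid rule stands in for the SVM, PAM and LDA classifiers named in the text, cases and controls are split separately so that each cohort contains equal numbers of both, and feature selection and cross-validation are omitted. All function names are hypothetical.

```python
import random
from statistics import mean


def make_cohort(n, shift, n_features, rng):
    """Synthetic expression vectors; 'shift' moves the mean of every feature."""
    return [[rng.gauss(shift, 1.0) for _ in range(n_features)] for _ in range(n)]


def centroid(rows):
    return [mean(column) for column in zip(*rows)]


def score(x, c_case, c_ctrl):
    """Positive when x lies closer to the case centroid than to the control centroid."""
    d_case = sum((a - b) ** 2 for a, b in zip(x, c_case))
    d_ctrl = sum((a - b) ** 2 for a, b in zip(x, c_ctrl))
    return d_ctrl - d_case


def auc(pos, neg):
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def split_thirds(items, rng):
    """Shuffle a copy and cut it into three equally sized, disjoint parts."""
    shuffled = items[:]
    rng.shuffle(shuffled)
    k = len(shuffled) // 3
    return shuffled[:k], shuffled[k:2 * k], shuffled[2 * k:3 * k]


def trial_simulation(cases, controls, n_trials, rng):
    """Return one (AUC in V1, AUC in V2) pair per simulated trial."""
    results = []
    for _ in range(n_trials):
        case_ts, case_v1, case_v2 = split_thirds(cases, rng)
        ctrl_ts, ctrl_v1, ctrl_v2 = split_thirds(controls, rng)
        # "Training": the classifier is defined by the two TS centroids only.
        c_case, c_ctrl = centroid(case_ts), centroid(ctrl_ts)
        aucs = []
        for v_case, v_ctrl in ((case_v1, ctrl_v1), (case_v2, ctrl_v2)):
            pos = [score(x, c_case, c_ctrl) for x in v_case]
            neg = [score(x, c_case, c_ctrl) for x in v_ctrl]
            aucs.append(auc(pos, neg))
        results.append(tuple(aucs))
    return results


if __name__ == "__main__":
    rng = random.Random(0)
    cases = make_cohort(75, 1.0, 5, rng)     # 75 synthetic "AML" samples
    controls = make_cohort(75, 0.0, 5, rng)  # 75 synthetic control samples
    results = trial_simulation(cases, controls, n_trials=100, rng=rng)
    v1 = [a for a, _ in results]
    print("V1 AUC range:", min(v1), max(v1))
```

Plotting the resulting pairs as in panel (B) would show how strongly the apparent performance depends on which samples happen to fall into TS, V1 and V2, which is the point the simulation is meant to expose.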

Figure 2: Development of a classifier for molecular diagnosis of AML using a compiled dataset of 2013 samples. (A) Schema of approach to define classifiers. A dataset was compiled out of a total of 17 individual studies containing a total of 2013 samples. The 2013 samples were evenly split into three independent cohorts of 671 samples, called training set (TS), validation set 1 (V1) and validation set 2 (V2). The classifier was built in TS and applied to V1 and V2. Classifier performance is only shown for external validation in V1 and V2. Classifiers were built according to a combined approach of 1) feature selection, 2) application of a classifier algorithm, and 3) 10-fold cross-validation (internal validation) (as defined in Figure 1). Influence of 1) feature size, 2) classification algorithm, 3) ratio of cases and controls in the TS, and 4) the size of the TS were assessed by varying the respective parameters. As readout we assessed AUC, MCC, specificity and sensitivity. The process of generating three independent sets of data out of the 2013 samples was performed 10,000 times (TSA). (B) For V1 (black dots) and V2 (red dots), the performance of the 10,000 classifiers generated in this simulation approach and the influence of feature size is shown. It can be clearly seen that the classifiers generated with a larger training set perform significantly better than data presented in Figure 1A. Depending on the distribution of the samples into TS, V1 and V2, the performance read out as AUC varied in a small range of 0.96 to 1. Data are shown in boxplots (mean, 25 to 75 percentiles, standard deviation and outliers). (C) Classifier performance (here AUC) was plotted against the frequency of classifiers reaching a certain AUC level. In red, performance within the small cohort of 150 samples described in Figure 1 is shown; in black, the performance of classifiers in the complete dataset (n = 2013) is shown. (D) Instead of AUC, MCC is shown.

Figure 3: Correlation of feature size and feature selection with classifier performance. (A) The number of features in each individual classifier of a total of 10,000 classifiers is plotted against the AUC of the respective classifier. For each level of filter settings (a total of five different levels of filter settings) the data are plotted separately. On the top panel, the data obtained in the small dataset (150 samples) are shown; on the lower panel, the data obtained in the complete dataset (2013 samples) are shown. It can be clearly seen that the variation in AUC is reduced with higher feature sizes in the small cohort, but this effect is not apparent anymore in the complete dataset. (B) For each transcript interrogated on the array, its participation in any of the 60,000 classifiers (6 levels of filtering) was calculated and ranked. If a transcript was part of at least 1 classifier, its participation frequency was plotted. In B the results of the small dataset of 150 samples are shown. (C) Similar analysis, but this time for the complete dataset.

Figure 4: Corresponding to Figure 1, instead of the AUC, the MCC results in the small sample cohort (n = 150) are shown. (A) The influence of feature size on classifier performance was assessed using 10,000 independent TSA settings for each filter setting. 5 different filter settings based on differentially expressed genes (combined fold change, p-value filter, here abbreviated by FC only) were used. The larger the FC value, the smaller the number of transcripts per classifier. Shown is classifier performance in V1 and V2 independently. (B) The influence of sample distribution in TS was assessed by decreasing the number of cases from 25 out of 50 in TS to 5 out of 30 in TS. Shown is again classifier performance in V1 and V2 independently. (C) The influence of sample size in TS was assessed by decreasing both the number of cases and controls from each 25 to each 5 in TS. Shown is again classifier performance in V1 and V2 independently.

Figure 5: Corresponding to Figure 2, instead of AUC, the MCC is shown for the complete dataset. (A) For V1 (black dots) and V2 (red dots), the performance of the 10,000 classifiers generated in this simulation approach and the influence of feature size is shown. It can be clearly seen that the classifiers generated with a larger training set perform significantly better than data presented in Figure 1A. Data are shown in boxplots (mean, 25 to 75 percentiles, standard deviation and outliers). (B) The influence of different classification algorithms on classifier performance is shown. For each condition, 10,000 individual classifiers in 10,000 independent trial simulation approaches are shown for V1 and V2 independently. SVM with linear or radial kernel was used in combination with either t-test (t) or Wilcoxon-test (wilc). (C) PAM or LDA in combination with t-test or Wilcoxon-test were used.

Figure 6: Correlation of feature size and feature selection with classifier performance. Shown here is the MCC as readout. (A) The number of features in each individual classifier of a total of 10,000 classifiers is plotted against the MCC of the respective classifier. For each level of filter settings (a total of five different levels of filter settings) the data are plotted separately. On the top panel, the data obtained in the small dataset (150 samples) are shown; the data obtained in the complete dataset (2013 samples) are shown on the lower panel. It can be clearly seen that the variation in MCC is reduced with higher feature sizes in the small cohort, but this effect is not apparent anymore in the complete dataset.

Figure 7: (A) Current time line for diagnosis of patients with AML. Use of classical diagnostic approaches. Gene expression profiling of bone marrow is targeted at further diagnostics (including differential diagnosis, subclassification, therapy outcome prediction and prognosis) as an additional part of standard diagnostics (add-on technology). (B) Scenario envisioning the use of the GEP-based technology for primary diagnosis in a setting where the private practitioner can cooperate with a specialized laboratory applying our GEP-based test to blood from patients with a tentative diagnosis of AML. The technology would substitute for previous technology (blood counts, light microscopy of bone marrow cells) = substitution technology. Time axes are based on estimates. Since the scenarios for further diagnostics by specialized hematologists (as the focus of previous inventions) and our invention targeting primary diagnosis by private practitioners are mutually exclusive, cited prior inventions cannot be seen as prior art in the field of primary diagnosis of AML. However, they can be seen as complementary.

Detailed Description of the invention

The invention provides methods and kits for screening, diagnosing, and detecting AML in human patients (subjects). A synonym for a patient with AML is "AML-case" or simply "case."

As disclosed herein, the present invention provides methods and kits for screening patient samples for those that are positive for AML, e.g., in the absence of surgery or any other diagnostic procedure.

The invention relates to the determination of the abundance of RNAs to detect AML in a human subject, wherein the determination of the abundance is based on RNA obtained (or isolated) from whole blood of the subject or from blood cells of the subject.

In various aspects, the invention involves preparing an RNA expression profile from a patient sample. The method may comprise isolating RNA from whole blood, and detecting the abundance or relative abundance of selected transcripts. The "RNAs" may be defined by reference to an expressed gene, or by reference to a transcript, or by reference to a particular oligonucleotide probe for detecting the RNA (or cDNA derived therefrom), each of which is listed in Table 2 for 680 RNAs that are indicative of the presence or absence of AML.

The number of transcripts in the RNA expression profile may be selected so as to offer a convenient and cost effective means for screening samples for the presence or absence of AML with high sensitivity and high specificity. For example, the RNA expression profile may include the expression level or "abundance" of from 4 to about 3000 transcripts. In certain embodiments, the expression profile includes the RNA levels of 2500 transcripts or less, 2000 transcripts or less, 1500 transcripts or less, 1000 transcripts or less, 500 transcripts or less, 250 transcripts or less, 200 transcripts or less, 100 transcripts or less, or 50 transcripts or less. Such profiles may be prepared, for example, using custom microarrays or multiplex gene expression assays as described in detail herein.

Such RNA expression profiles in accordance with this aspect may be evaluated for the presence or absence of an RNA expression signature indicative of AML. Generally, the sequential addition of transcripts from Table 2 to the expression profile provides for higher sensitivity and/or specificity for the detection of AML, as indicated by the AUC. A clinical utility is reached if the AUC is at least 0.8.

The inventors have surprisingly found that an AUC of 0.8 is reached if and only if at least 4 RNAs are measured that are chosen from the RNAs listed in Table 2. In other words, measuring 4 RNAs is necessary and sufficient for the detection of AML in a human subject based on RNA from a blood sample obtained from said subject by measuring the abundance of at least 4 RNAs in the sample, that are chosen from the RNAs listed in Table 2, and concluding based on the measured abundance whether the subject has AML or not. An analysis of 1, 2 or 3 RNAs chosen from the RNAs listed in Table 2, however, does not allow for this detection.

For example, the area under the ROC curve (AUC) may be at least 0.8, or at least 0.82, or at least 0.85 or at least 0.9. The AUC is a quantitative parameter for the clinical utility (specificity and sensitivity) of the detection method described herein. An AUC of 1.0 refers to a sensitivity and specificity of 100 %. In such embodiments, the profile may contain the expression level of at least 4 RNAs that are indicative of the presence or absence of AML, and specifically, as selected from Table 2, or may contain the expression level of at least 6, 8, 10, 12 or 14 RNAs selected from Table 2. Where larger profiles are desired, the profile may contain the expression level or abundance of at least 60, 100, 200, 500, or 680 RNAs that are indicative of the presence or absence of AML, and such RNAs may be (at least in part) selected from Table 2. Such RNAs may be defined by gene, or by transcript ID, or by probe ID.

The identities of genes and/or transcripts that make up, or are included in exemplary expression profiles are disclosed in Table 2. As shown herein, profiles selected from the RNAs of Table 2 support the detection of AML with high sensitivity and high specificity.

Thus, in various embodiments, the abundance of at least 4, at least 6, at least 8, at least 10, at least 12, at least 14, at least 60, at least 100, at least 200, or at least 500 distinct RNAs is measured, in order to arrive at a reliable diagnosis of AML. The set of RNAs may comprise, consist essentially of, or consist of, a set or subset of RNAs exemplified in Table 2. The term "consists essentially of" in this context allows for the expression level of additional transcripts to be determined that are not differentially expressed in AML subjects, and which may therefore be used as positive or negative expression level controls or for normalization of expression levels between samples.

Such RNA expression profiles may be evaluated for the presence or absence of an RNA expression signature indicative of AML. Generally, the sequential addition of transcripts from Table 2 to the expression profile provides for higher sensitivity and/or specificity and stability (i.e. independence from the sample analyzed) for the detection of AML. For example, the sensitivity and specificity of the methods provided herein may be equivalent to an area under the ROC curve (AUC) of at least 0.8, or at least 0.82, or at least 0.85, or at least 0.9.

The present invention provides an in-vitro diagnostic test system (IVD) that is trained (as described further below) for the detection of AML. For example, in order to determine whether a patient has AML, reference RNA abundance values for AML positive and negative samples are determined. The RNAs can be quantitatively measured on an adequate set of training samples comprising cases and controls, and with adequate clinical information on leukemia status, applying adequate quality control measures, and on an adequate set of test samples, for which the detection is yet to be made. With such quantitative values for the RNAs and the clinical data for the training samples, a classifier can be trained and applied to the test samples to calculate the probability of the presence or non-presence of AML. Therefore, in one embodiment of the present method, a sample can be classified as being from a patient with AML or from a healthy individual without the necessity to run a reference sample of known origin (i.e. from an AML patient or a healthy individual) at the same time.

Various classification schemes are known for classifying samples between two or more classes or groups, and these include, without limitation: Naive Bayes, Support Vector Machines, Nearest Neighbors, Decision Trees, Logistic Regression, Artificial Neural Networks, and Rule-based schemes. In addition, the predictions from multiple models can be combined to generate an overall prediction. Thus, a classification algorithm or "class predictor" may be constructed to classify samples. The process for preparing a suitable class predictor is reviewed in R. Simon, Diagnostic and prognostic prediction using gene expression profiles in high-dimensional microarray data, British Journal of Cancer (2003) 89, 1599-1604, which review is hereby incorporated by reference.

In this context, the invention teaches an in-vitro diagnostic test system (IVD) that is trained in the detection of AML as referred to above, comprising at least 4 RNAs, which can be quantitatively measured on an adequate set of training samples comprising cases and controls, with adequate clinical information on leukemia status, applying adequate quality control measures, and on an adequate set of test samples, for which the detection has yet to be made. Given the quantitative values for the RNAs and the clinical data for the training samples, a classifier can be trained and applied to the test samples to calculate the probability of the presence or absence of AML.

The present invention provides methods for detecting, diagnosing, or screening for AML in a human subject with a high sensitivity and specificity. Specifically, the sensitivity and specificity of the methods provided herein are equivalent to an area under the ROC curve (AUC) of at least 0.8, or at least 0.82, or at least 0.85, or at least 0.9.

Without wishing to be bound by any particular theory, the above finding may be due to the fact that an organism such as a human systemically reacts to the development of an AML by altering the expression levels of genes in different pathways. Although the change in expression (abundance) might be small for each gene in a particular signature, measuring a set of at least 4 genes, preferably even larger numbers such as 6, 8, 10, 12, 14, 100, 200, 500 or even more RNAs, for example at least 5, at least 8, at least 120, at least 160 RNAs at the same time allows for the detection of AML in a human with high sensitivity and high specificity.

In this context, an RNA obtained from a subject's blood sample, i.e. an RNA biomarker, is an RNA molecule with a particular base sequence whose presence within a blood sample from a human subject can be quantitatively measured. The measurement can be based on a part of the RNA molecule, namely a part of the RNA molecule that has a certain base sequence, which allows for its detection and thereby allows for the measurement of its abundance in a sample. The measurement can be by methods known in the art, for example analysis on a solid phase device, or in solution (for example, by RT-PCR). Probes for the particular RNAs can either be bought commercially, or designed based on the respective RNA sequence.

In the method of the invention, the abundance of several RNA molecules (e.g. mRNA or pre-spliced RNA, intron-lariat RNA, micro RNA, small nuclear RNA, or fragments thereof) is determined in a relative or an absolute manner, wherein an absolute measurement of RNA abundance is preferred. The RNA abundance is, if applicable, compared with that of other individuals, or with multivariate quantitative thresholds.

The determination of the abundance of the RNAs described herein is performed from blood samples using quantitative methods. In particular, RNA is isolated from a blood sample obtained from a human subject that is to undergo AML testing, i.e. for example a smoker or a person with high fever and weakness. Although the examples described herein use microarray-based methods, the invention is not limited thereto. For example, RNA abundance can be measured by in situ hybridization, amplification assays such as the polymerase chain reaction (PCR), sequencing, or microarray-based methods. Other methods that can be used include polymerase-based assays, such as RT-PCR (e.g., TAQMAN), hybridization-based assays, such as DNA microarray analysis, as well as direct mRNA capture with branched DNA (QUANTIGENE) or HYBRID CAPTURE (DIGENE).

In certain embodiments, the invention employs a microarray. A "microarray" includes a specific set of probes, such as oligonucleotides and/or cDNAs (e.g., expressed sequence tags, "ESTs") corresponding in whole or in part, and/or continuously or discontinuously, to regions of RNAs that can be extracted from a blood sample of a human subject. The probes are bound to a solid support. The support may be selected from beads (magnetic, paramagnetic, etc.), glass slides, and silicon wafers. The probes can correspond in sequence to the RNAs of the invention such that hybridization between the RNA from the subject sample (or cDNA derived therefrom) and the probe occurs. In the microarray embodiments, the sample RNA can optionally be amplified before hybridization to the microarray. Prior to hybridization, the sample RNA is fluorescently labeled. Upon hybridization to the array and excitation at the appropriate wavelength, fluorescence emission is quantified. Fluorescence emission for each particular RNA is directly correlated with the amount of the particular RNA in the sample. The signal can be detected and together with its location on the support can be used to determine which probe hybridized with RNA from the subject's blood sample.

Accordingly, in certain aspects, the invention is directed to a kit or microarray for detecting the level of expression or abundance of RNAs in the subject's blood sample, where this "profile" allows for the conclusion of whether the subject has AML or not (at a level of accuracy described herein). In another aspect, the invention relates to a probe set that allows for the detection of the RNAs associated with AML. If these particular RNAs are present in a sample, they (or corresponding cDNA) will hybridize with their respective probe (i.e., a complementary nucleic acid sequence), which will yield a detectable signal. Probes are designed to minimize cross reactivity and false positives.

Thus, the invention in certain aspects provides a microarray, which generally comprises a solid support and a set of oligonucleotide probes. The set of probes generally contains from 4 to about 3,000 probes, including at least 4 probes deduced from Table 2. In certain embodiments, the set contains 2000 probes or less, or 1000 probes or less, 500 probes or less, 200 probes or less, or 100 probes or less.

The conclusion whether the subject has AML or not is preferably reached on the basis of a classification algorithm, which can be developed using e.g. a random forest method, a support vector machine (SVM), a K-nearest neighbor method (K-NN), such as a 3-nearest neighbor method (3-NN), a linear discrimination analysis (LDA), or a prediction analysis for microarrays (PAM), as known in the art.

Preferably, F-statistics (ANOVA) are used to identify specific differences between the abundance of the at least 4 RNAs in healthy individuals and the abundance of the at least 4 RNAs in individuals with AML.

"Sensitivity" (S⁺ or true positive fraction (TPF)) refers to the count of positive test results among all true positive disease states divided by the count of all true positive disease states.

"Specificity" (S⁻ or true negative fraction (TNF)) refers to the count of negative test results among all true negative disease states divided by the count of all true negative disease states. "Correct Classification Rate" (CCR or true fraction (TF)) refers to the sum of the count of positive test results among all true positive disease states and the count of negative test results among all true negative disease states divided by the sum of all cases. The measures S⁺, S⁻, and CCR address the question: To what degree does the test reflect the true disease state?

"Positive Predictive Value" (PV⁺ or PPV) refers to the count of true positive disease states among all positive test results divided by the count of all positive test results.

"Negative Predictive Value" (PV⁻ or NPV) refers to the count of true negative disease states among all negative test results divided by the count of all negative test results. The predictive values address the question: How likely is the disease given the test results?

The preferred RNA molecules that can be used in combinations described herein for diagnosing and detecting AML in a subject according to the invention can be found in Table 2. The inventors have shown that the selection of at least 4 or more RNAs of the markers listed in Table 2 can be used to diagnose or detect AML in a subject using a blood sample from that subject. The RNA molecules that can be used for detecting, screening and diagnosing AML are selected from the RNAs provided in Table 2.
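For illustration only, the performance measures defined above (sensitivity, specificity, correct classification rate and the two predictive values) can be computed from the four cells of a 2x2 table of test result versus true disease state. The following is a minimal sketch; the function name `diagnostic_metrics` and the counts in the example are hypothetical and do not reproduce results reported herein.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Performance measures from a 2x2 table of test result vs. true disease state.

    tp: diseased subjects with a positive test    fn: diseased subjects with a negative test
    tn: healthy subjects with a negative test     fp: healthy subjects with a positive test
    """
    return {
        "sensitivity": tp / (tp + fn),          # true positive fraction
        "specificity": tn / (tn + fp),          # true negative fraction
        "ccr": (tp + tn) / (tp + fn + tn + fp),  # correct classification rate
        "ppv": tp / (tp + fp),                  # positive predictive value
        "npv": tn / (tn + fn),                  # negative predictive value
    }


# Hypothetical validation cohort with 25 AML cases and 25 controls.
example_metrics = diagnostic_metrics(tp=24, fn=1, tn=23, fp=2)
```

Sensitivity and specificity are properties of the test alone, whereas the predictive values additionally depend on the proportion of diseased subjects in the cohort that is tested.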

Specifically, the method of the invention comprises at least the following steps: measuring the abundance of at least 4 RNAs (preferably 9 RNAs or 10 RNAs) in the sample, that are chosen from the RNAs listed in Table 2, and concluding, based on the measured abundance, whether the subject has AML or not. Measuring the abundance of RNAs may comprise isolating RNA from blood samples as described, and hybridizing the RNA or cDNA prepared therefrom to a microarray. Alternatively, other methods for determining RNA levels may be employed .

Examples for sets of 4 or more RNAs that are measured together, i.e. sequentially or preferably simultaneously, are shown in Examples 1 to 15 of Tables 3, 4 and 5. The sets of at least 4 RNAs of Tables 3 and 4 are defined by a common threshold of AUC>0.8.

In a preferred embodiment of the invention as mentioned herein, the abundance of at least 4 RNAs (preferably 6, 8, 10, or 12 RNAs) in the sample is measured, wherein the at least 4 RNAs are chosen from the RNAs listed in Table 2. Examples for sets of 4 RNAs that can be measured together, i.e. sequentially or preferably simultaneously, to detect AML in a human subject are shown in Table 2.

An example for a set of 680 RNAs of which the abundance can be measured in the method of the invention is listed in Table 2.

The wording "at least a number of RNAs" refers to a minimum number of RNAs that are measured. It is possible to use up to 10,000 or 20,000 genes in the invention, a fraction of which can be RNAs listed in Table 2. In preferred embodiments of the invention, abundance of up to 5,000, 2,500, 2,000, 1,000, 500, 250, 100, 80, 70, 60, 50, 40, 30, 20, 10, 5, 4, 3, 2, or 1 RNA of randomly chosen RNAs that are not listed in Table 2 is measured in addition to RNAs of Table 2 (or subsets thereof).

In a preferred embodiment, only RNAs that are mentioned in Table 2 are measured.

The expression profile or abundance of RNA markers for AML, for example the at least 4 RNAs described above (or more RNAs as disclosed above and herein), is determined preferably by measuring the quantity of the transcribed RNA of the marker gene. This quantity of the mRNA of the marker gene can be determined for example through chip technology (microarray), (RT-) PCR (for example also on fixated material), Northern hybridization, dot-blotting, sequencing, or in situ hybridization.

The microarray technology, which is most preferred, allows for the simultaneous measurement of RNA abundance of up to many thousand RNAs and is therefore an important tool for determining differential expression (or differences in RNA abundance), in particular between two biological samples or groups of biological samples. In order to apply the microarray technology, the RNAs of the sample need to be amplified and labeled and the hybridization and detection procedure can be performed as known to a person of skill in the art.

As will be understood by those of ordinary skill in the art, the analysis can also be performed through single reverse transcriptase-PCR, competitive PCR, real time PCR, differential display RT-PCR, Northern blot analysis, sequencing, and other related methods. In general, the larger the number of markers is that are to be measured, the more preferred is the use of the microarray technology. However, multiplex PCR, for example, real time multiplex PCR is known in the art and is amenable for use with the present invention, in order to detect the presence of 2 or more genes or RNAs simultaneously.

The RNA whose abundance is measured in the method of the invention can be mRNA, cDNA, unspliced RNA, or its fragments. Measurements can be performed using the complementary DNA (cDNA) or complementary RNA (cRNA), which is produced on the basis of the RNA to be analyzed, e.g. using microarrays. A great number of different arrays as well as their manufacture are known to a person of skill in the art and are described for example in the U.S. Patent Nos. 5,445,934; 5,532,128; 5,556,752; 5,242,974; 5,384,261; 5,405,783; 5,412,087; 5,424,186; 5,429,807; 5,436,327; 5,472,672; 5,527,681; 5,529,756; 5,545,331; 5,554,501; 5,561,071; 5,571,639; 5,593,839; 5,599,695; 5,624,711; 5,658,734; and 5,700,637.

Preferably, the decision whether the subject has AML comprises the step of training a classification algorithm on an adequate training set of cases and controls and applying it to RNA abundance data that was experimentally determined based on the blood sample from the human subject to be diagnosed.

The classification method can be a random forest method, a support vector machine (SVM), or a K-nearest neighbor method (K-NN), such as 3-NN.
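For illustration only, a 3-nearest neighbor classification of the kind mentioned above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `knn_predict`, the class labels and the four-value abundance profiles are hypothetical, and Euclidean distance is assumed as the distance measure.

```python
import math
from collections import Counter


def knn_predict(train_profiles, train_labels, sample, k=3):
    """Assign the majority label among the k training profiles closest to the sample."""
    # Euclidean distance between abundance profiles (assumed distance measure).
    distances = sorted(
        (math.dist(profile, sample), label)
        for profile, label in zip(train_profiles, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical log-scale abundances of 4 RNAs in six training samples.
train_profiles = [
    [8.0, 7.5, 3.0, 2.5], [8.2, 7.9, 2.8, 2.2], [7.9, 7.7, 3.1, 2.4],
    [4.0, 4.2, 6.5, 6.8], [4.1, 3.9, 6.9, 7.0], [3.8, 4.0, 6.6, 6.7],
]
train_labels = ["AML", "AML", "AML", "control", "control", "control"]

prediction = knn_predict(train_profiles, train_labels, [8.1, 7.6, 2.9, 2.3])
```

Using an odd k such as 3 avoids tied votes when only two classes (AML and control) are distinguished.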

For the development of a model that allows for the classification for a given set of biomarkers, such as RNAs, methods generally known to a person of skill in the art are sufficient, i.e., new algorithms need not be developed.

The major steps of such a model are:

1] condensation of the raw measurement data (for example combining probes of a microarray to probe set data, and/or normalizing measurement data against common controls);

2] training and applying a classifier (i.e. a mathematical model that generalizes properties of the different classes (leukemia vs. healthy individual) from the training data and applies them to the test data, resulting in a classification for each test sample).

For example, the raw data from microarray hybridizations can first be condensed with FARMS as shown by Hochreiter et al., Bioinformatics 22(8): 943-9 (2006). Alternative methods for condensation such as Robust Multi-Array Analysis (RMA, GC-RMA, see Irizarry et al., Biostatistics 4: 249-264 (2003)) can be used. Similar to condensation, classification of the test data set through a support vector machine or other classification algorithms is known to a person of skill in the art, for example classification and regression trees, penalized logistic regression, sparse linear discriminant analysis, Fisher linear discriminant analysis, K-nearest neighbors, shrunken centroids, and artificial neural networks (see V. Vapnik, The Nature of Statistical Learning Theory, Springer Verlag, New York, NY, USA, 1995; Bernhard Schölkopf, Alex Smola: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, Cambridge, MA, 2002; S. Kotsiantis, Informatica J. 31: 249-268 (2007)).
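For illustration only, the two major steps described above (condensation of raw data, then training and applying a classifier) can be sketched as follows. This is a simplified stand-in and not FARMS, RMA or a support vector machine: probes are condensed by their median and divided by the median of control probes, and the classifier is a plain nearest-centroid rule without shrinkage. All function names, probe set identifiers and values are hypothetical.

```python
from statistics import mean, median


def condense(probe_values, control_values):
    """Step 1: summarize each probe set by its median and normalize to control probes.

    probe_values: {probe_set_id: [probe intensities]} for one array
    control_values: intensities of control probes on the same array
    """
    reference = median(control_values)
    return {ps: median(values) / reference for ps, values in probe_values.items()}


def train_centroids(samples, labels):
    """Step 2a: compute one mean profile (centroid) per class from the training data."""
    centroids = {}
    for cls in set(labels):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        centroids[cls] = {ps: mean(r[ps] for r in rows) for ps in rows[0]}
    return centroids


def classify(centroids, sample):
    """Step 2b: assign the class whose centroid has the smallest squared distance."""
    def squared_distance(centroid):
        return sum((sample[ps] - centroid[ps]) ** 2 for ps in centroid)
    return min(centroids, key=lambda cls: squared_distance(centroids[cls]))


# Hypothetical raw array: two probe sets with three probes each, plus control probes.
condensed = condense({"rna_a": [10, 12, 14], "rna_b": [2, 4, 6]}, [2, 2, 2])

training = [
    {"rna_a": 6.0, "rna_b": 2.0}, {"rna_a": 5.0, "rna_b": 1.0},
    {"rna_a": 1.0, "rna_b": 5.0}, {"rna_a": 2.0, "rna_b": 6.0},
]
centroids = train_centroids(training, ["AML", "AML", "control", "control"])
call = classify(centroids, condensed)
```

Keeping condensation separate from classification means that the same trained classifier can be applied to any test array that has been condensed in the same way as the training arrays.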

The key component of these classifier training and classification techniques is the choice of RNA biomarkers that are used as input to the classification algorithm. In a further aspect, the invention refers to the use of a method as described above and herein for the detection of AML in a human subject, based on RNA from a blood sample.

In a further aspect, the invention also refers to the use of a microarray for the detection of AML in a human subject based on RNA from a blood sample. According to the invention, such a use can comprise measuring the abundance of at least 4 RNAs (or more, as described above and herein) that are listed in Table 2. Accordingly, the microarray comprises at least 4 probes for measuring the abundance of the at least 4 RNAs. Commercially available microarrays, such as from Illumina or Affymetrix, may be used.

In another embodiment, the abundance of the at least 4 RNAs is measured by multiplex RT-PCR. In a further embodiment, the RT-PCR includes real time detection, e.g., with fluorescent probes such as Molecular beacons or TaqMan® probes.

In a preferred embodiment, the microarray comprises probes for measuring only RNAs that are listed in Table 2 (or subsets thereof).

In yet a further aspect, the invention also refers to a kit for the detection of AML in a human subject based on RNA obtained from a blood sample. Such a kit comprises a means for measuring the abundance of at least 4 RNAs that are chosen from the RNAs listed in Table 2. The means for measuring expression can be probes that allow for the detection of RNA in the sample or primers that allow for the amplification of RNA in the sample. Ways to devise probes and primers for such a kit are known to a person of skill in the art.

Further, the invention refers to the use of a kit as described above and herein for the detection of AML in a human subject based on RNA from a blood sample comprising means for measuring the abundance of at least 4 RNAs that are chosen from the RNAs listed in Table 2. Such a use may comprise the following steps: contacting at least one component of the kit with RNA from a blood sample from a human subject, measuring the abundance of at least 4 RNAs (or more as described above and herein) that are chosen from the RNAs listed in Table 2 using the means for measuring the abundance of at least 4 RNAs, and concluding, based on the measured abundance, whether the subject has AML.

In yet a further aspect, the invention also refers to a method for preparing an RNA expression profile that is indicative of the presence or absence of AML, comprising: isolating RNA from a whole blood sample, and determining the level or abundance of from 4 to about 3000 RNAs, including at least 4 RNAs selected from Table 2.

Preferably, the expression profile contains the level or abundance of 680 RNAs or less, of 500 RNAs or less, of 150 RNAs or less, or of 100 RNAs or less. Further, it is preferred that at least 10 RNAs, at least 30 RNAs, or at least 100 RNAs are listed in Table 2.

In yet a further aspect, the invention also refers to a microarray, comprising a solid support and a set of oligonucleotide probes, the set containing from 4 to about 3,000 probes, and including at least 4 probes selected from Table 2. Preferably, the set contains 680 probes or less (such as e.g. 500 probes, or less). At least 10 probes can be those listed in Table 2. At least 30 probes can be those listed in Table 2. In another embodiment, at least 100 probes are listed in Table 2.

Features of the invention that were described herein in combination with a method, a microarray, a kit, or a use also refer, if applicable, to all other aspects of the invention.

The evaluation of simulation and adaptive learning approaches to accelerate biomarker development first required the establishment of a larger set of high-throughput data suitable for assessing a clinically relevant endpoint. We chose primary molecular diagnosis for acute myeloid leukemia (AML) as the first model endpoint, since several studies independently reported gene expression profiling (GEP)-based classifiers for disease subclassification, outcome prediction and differential diagnosis (Miesner et al., Blood 116: 2742-51 (2010)), while primary molecular diagnosis was not a primary endpoint in these studies. As AML has a low prevalence but is a deadly disease if not diagnosed in time, a test used for screening or primary diagnosis of AML would have to achieve a sensitivity and specificity greater than 90%, preferably greater than 95%, preferably greater than 98%, preferably greater than 99%, to minimize false-negative results while avoiding unacceptable levels of false-positive results.

A total of 2013 microarray samples (Affymetrix U133A chip) from 17 individual studies were compiled to form a new dataset (AML dataset, Table 1, Fig. 1A). Samples were only included in the study when passing all quality control checks. Following recent guidelines suggested by the MAQC consortium, the preferred methodologies for data processing, feature selection and classifier development were established using one of the datasets provided by the MAQC consortium (Fig. 4 and protocol of decision making below) prior to application to the AML dataset. To simulate a typical pilot trial setting as a first step of test development, 150 samples were drawn randomly from the complete dataset and distributed into three sets (training set (TS), validation cohorts V1 and V2, each containing 25 AML cases and 25 controls) (Figure 2A). This initial setting allowed the development of a classifier within TS and two independent validations (in V1 and V2, respectively).
This procedure was performed three times and representative data of one experiment are shown. Since classifier performance in subsequent validation cohorts might be strongly influenced by the characteristics of the patient population within TS, it was already suggested in MAQC-II to perform swap analysis of training and validation cohorts. In principle, swapping independent patient cohorts is just a special case of random permutation of samples. We therefore extended this approach from a single swap to 10,000 permutations (termed '10,000 trial simulation approach'; TSA), each containing a TS and two independent validation cohorts (Fig. 2A). Applied to AML, a mean AUC of 0.9644 (V1) and 0.9636 (V2) was achieved by TSA (Fig. 1B). However, depending on the sample distribution to the three independent cohorts (TS, V1, V2), the performance of a significant number of classifiers would not have supported further classifier development. Moreover, mean classifier performance did not reach the preferred target of specificity and sensitivity (> 0.99).

To elucidate the dependency on methodology, we varied SVM settings and compared SVM to LDA and PAM algorithms (n = 8 x 10,000 classifiers, Fig. 1C). While the results of SVM were further optimized using a linear kernel combined with a t-test instead of a radial kernel and a Wilcoxon test, neither PAM nor LDA reached a similarly high mean AUC. This was similarly true when reading out MCC (Fig. 4). As TSA clearly established the framework for classifier performance (range, median, 75% percentile), the next issues addressed were the dependency of classifier performance on feature size (n = 5 x 10,000 classifiers, Fig. 1D), the sample distribution in TS (n = 5 x 10,000 classifiers, Fig. 1E) and the sample size in TS (n = 5 x 10,000 classifiers, Fig. 1F) using TSA.
Unexpectedly, reducing feature size resulted in inferior classifier performance, suggesting that due to the overall small sample size, more features are required to correctly classify in independent validation cohorts (Fig. 1D). In contrast, TSA clearly established that further reduction of sample size (Fig. 1E) or unequal distribution of samples (Fig. 1F) in TS results in reduced overall classifier performance. Taken together, TSA is well-suited to establish the overall classifier performance that can be expected independent of the actual clinical situation with subsequent patient recruitment into TS and validation cohorts. Moreover, dependencies of classifier performance are easily uncovered.

In clinical biomarker development, results from small pilot trials are supposed to form the basis for larger validation trials; however, prediction of classifier performance in the larger cohorts is still an unsolved issue. Furthermore, classifier performance in larger cohorts is expected to improve. To capture the overall improvement by enlarging the cohorts (TS, V1, V2), we repeated the trial simulation approach on the complete AML dataset (Fig. 2A). As shown in Fig. 2B, this improved classifier performance dramatically, with 61.8% of all tests reaching an AUC>0.99 and 98.4% reaching an AUC>0.98. Although there was still a slight improvement of the spectrum of classifiers when increasing the feature size (FC>4, left panel), all 60,000 tests performed at least with an AUC of 0.9638. Similar improvements were observed when reading out MCC (Fig. 4). When directly comparing the initial small AML dataset with the complete AML dataset, it became clear that only the larger dataset results in a sufficiently high AUC in the majority of classifiers developed (Fig. 2B). In fact, not even 60% of all classifiers generated in the small dataset showed an AUC>0.95.
Assessing the MCC showed similar results: while basically all classifiers generated in the large dataset reached an MCC>0.98, not even 80% of all classifiers within the small AML dataset reached an MCC>0.95 (Figs. 2C and 2D).

To elucidate whether the improvement in the larger dataset is associated with differences in feature distribution, all features (n = 23,000) were evaluated for being part of at least one of 60,000 classifiers (Figs. 3B and 3C). While a total of 3540 features were part of at least one classifier in the small AML dataset, only 680 features were identified in the large dataset. Even more striking, while no feature was identified to be present in all classifiers in the small AML dataset, 8 features were present in all 60,000 classifiers in the large dataset. When assessing feature size and classifier performance (here AUC), an enormous variance became apparent in the small AML dataset, while there was clearly less variance in the larger dataset (Fig. 3 and Fig. 4). Interestingly, reduced variance in feature size in the large dataset was seen irrespective of the filter criteria settings that determine the potential feature size. Together, these results support a robust gene expression profiling-based classifier as a test for primary diagnosis of AML with a sensitivity and specificity of greater than 99.5%. At the same time, these results also indicate that even initial pilot trials would require larger patient cohorts for robust classifier development, a requirement that can rarely be met.
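For illustration only, the repeated random distribution of samples into a training set and two validation cohorts described above can be sketched as follows. This is a minimal sketch of one reading of the trial simulation approach: the function names, the sample identifiers and the `evaluate` callback are hypothetical, and the actual classifier training is left to the caller.

```python
import random


def draw_cohorts(case_ids, control_ids, per_class=25, n_cohorts=3, rng=None):
    """Randomly distribute cases and controls into n_cohorts balanced cohorts."""
    if rng is None:
        rng = random.Random()
    # Sampling without replacement; with a pool of exactly 75 this is a permutation.
    cases = rng.sample(case_ids, per_class * n_cohorts)
    controls = rng.sample(control_ids, per_class * n_cohorts)
    cohorts = []
    for i in range(n_cohorts):
        lo, hi = i * per_class, (i + 1) * per_class
        cohorts.append({"cases": cases[lo:hi], "controls": controls[lo:hi]})
    return cohorts  # by convention here: [TS, V1, V2]


def simulate_trials(case_ids, control_ids, evaluate, n_trials, seed=0):
    """Repeat the random distribution n_trials times and collect evaluate(ts, v1, v2)."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        ts, v1, v2 = draw_cohorts(case_ids, control_ids, rng=rng)
        results.append(evaluate(ts, v1, v2))
    return results


# Hypothetical pilot dataset: 75 AML cases and 75 controls identified by integers.
pilot_cases = list(range(75))
pilot_controls = list(range(100, 175))
ts, v1, v2 = draw_cohorts(pilot_cases, pilot_controls, rng=random.Random(1))
```

In such a sketch, `evaluate` would train a classifier on TS and return, for example, the AUC obtained in V1 and V2, so that the range, median and percentiles of classifier performance over all permutations can be summarized.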

The invention is further described in the following Examples and Figures, which are, however, not to be construed as limiting the invention.

Examples

Methods Summary: In this study a total of 2013 gene expression profiling (GEP) samples in 17 independent datasets derived from 17 different studies were included (Table 1). Following the standards suggested by MAQC-II, a procedure for data processing, feature selection and classifier optimization was developed using Bioconductor (R packages) as the basis for all subsequent assessments. Using a dataset of peripheral blood derived GEP samples (n = 2013), a smaller dataset (n₂ = 150) from this study was randomly generated prior to applying a permutation approach (n = 10,000) to simulate a typical clinical pilot trial setting (trial simulation approach, TSA) consisting of three small but independent patient cohorts, here termed 'training set' (TS), validation set 1 (V1), and validation set 2 (V2) (Figure 1A). To assess patient distribution issues as well as the range and dependencies of classifier performance in this permutation approach, molecular diagnosis of acute myeloid leukemia (AML) was chosen as a primary endpoint. Using AUC, MCC, sensitivity and specificity as readouts for the generated classifiers, 1) the range of performance of the classifiers, and 2) their dependency on a) the classifier algorithms used, b) feature size, c) sample size within TS, and d) class distribution were assessed.

Description of gene expression profiling (GEP) datasets.

For this analysis new data were generated in our laboratory and previously published datasets were also used. The complete list of datasets is summarized in Table 1. Dataset for the development of a primary diagnostic test for AML: A dataset of gene expression profiles (GEP) was derived from peripheral blood mononuclear cells (PBMC) collected from a total of 19 individual studies, including our own unpublished samples (in total n = 2013). Patients with AML (n = 725), ALL (n = 218), chronic leukemias (CML, CLL, n = 98), infectious diseases (n = 262), coronary artery disease (n = 101), Parkinson's disease (n = 85), Colitis ulcerosa and Crohn's disease (n = 85), Huntington's disease (n = 19), post-infectious chronic fatigue syndrome (n = 8), and 560 healthy controls from these studies were included. All GEP samples were generated using the Affymetrix U133A microarray.

Authors (first/last) | GEO accession | Title
Borovecki F / Krainc D | GSE1767 | Genome-wide expression profiling of human blood reveals biomarkers for Huntington's disease.
Burczynski ME / Dorner AJ | GSE3365 | Molecular classification of Crohn's disease and ulcerative colitis patients using transcriptional profiles in peripheral blood mononuclear cells.
Burczynski ME / Dorner AJ | GSE3365 | -
Ramilo O / Chaussabel D | GSE6269 | Gene expression patterns in blood leukocytes discriminate patients with acute infections
Ramilo O / Chaussabel D | GSE6269 | -
Connolly PH / Cooper DM. | GSE1140 | http://www.ncbi.nlm.nih.gov/pubmed/15194674
Debey-Pascher S / Schultze JL | To be loaded | -
Gow JW / Chaudhuri A. | GSE14577 | A gene signature for post-infectious chronic fatigue syndrome.
Gow JW / Chaudhuri A. | - | -
Sinnaeve PR / Granger CB. | GSE12288 | Gene expression patterns in peripheral blood correlate with the extent of coronary artery disease.
Sinnaeve PR / Granger CB. | - | -
Scherzer CR / Gullans SR | GSE6613 | Molecular markers of early Parkinson's disease based on gene expression in blood.
Scherzer CR / Gullans SR | GSE6613 | -
Watford | GSE12839 | Tpl2 kinase regulates T cell interferon-γ production and host resistance to Toxoplasma gondii

Table 1 summarizes the origin of the samples that were used for the development of a test for the molecular diagnosis of AML.

Protocol of decision making during feature selection and model development.

Following recommendations made by MAQC-II, the decision-making process has been documented electronically (see supplemental information). All statistical and bioinformatic analyses were performed using R software (www.r-project.org) version 2.10.1 and packages from the Bioconductor project (http://www.bioconductor.org/). Using one of the datasets (multiple myeloma) provided by the MAQC-II project (GSE24080), the preferred protocol for decision making during feature selection and model development was established. In principle, raw data (for Affymetrix: CEL files) were downloaded from the GEO website and prepared for further analysis with R.

Quality check procedure: Samples were subjected to an extended quality check prior to use. First, the distribution of raw expression values was inspected visually using pairwise scatterplots of expression values from all arrays of a dataset. The overall median correlation within a combined dataset was required to be above 0.8. Next, the present call rate had to reach a threshold determined within the dataset as a median present rate > 0.3. Third, the overall sample distribution was analyzed visually using density plots.
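The two numeric criteria of the quality check can be illustrated with a minimal Python sketch. It is not the R code used for the analysis. The function names, the input layout (one expression vector and one present-call fraction per array) and the reading of both thresholds as dataset-level medians are assumptions made for illustration.

```python
from itertools import combinations
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equally long lists (undefined for constant input)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def passes_quality_check(arrays, present_rates, min_corr=0.8, min_present=0.3):
    """Apply the two numeric thresholds to one dataset.

    arrays: list of per-array expression vectors (hypothetical input layout).
    present_rates: per-array fraction of probe sets called 'present'.
    """
    # Median of all pairwise array-to-array correlations (assumed interpretation).
    pairwise = [pearson(a, b) for a, b in combinations(arrays, 2)]
    return median(pairwise) > min_corr and median(present_rates) > min_present
```

The visual checks (scatterplots and density plots) are not covered by this sketch.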

Normalization procedures: When Affymetrix microarrays were used, all samples of a dataset were normalized using the MAS5 method, a method comprising background correction, signal intensity and scaling calculations, from the affy package.

Batch-effect removal: In line with our overall strategy, and to better mimic the expected clinical routine of subsequent data generation, which is naturally prone to batch effects, we decided against batch-effect removal.

Feature selection approach: As the preferred feature selection approach we used a combination of fold changes and p-values based on t-tests (two-sided, unpaired, unequal variance, i.e. Welch's test; "FC+P") between experimental groups in the training set. The p-values were adjusted for multiple testing using the Benjamini-Hochberg correction. If not stated otherwise, we used FC > 2 and p < 0.05. To minimize overfitting, the number of features was usually kept small. Overall, endpoints with higher predictability showed a larger number of features that could be used successfully to obtain efficient classifiers, an observation that also became apparent during MAQC-II.
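The "FC+P" filter can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the R implementation used here: the Welch test p-values are taken as given input, the function names are hypothetical, and the fold-change criterion is assumed to apply in both directions (FC > 2 or FC < 1/2), since the transcript list contains both increased and decreased transcripts.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_features(mean_case, mean_control, p_values, min_fc=2.0, alpha=0.05):
    """Indices of features passing both the fold-change and the adjusted p-value cut."""
    adj = benjamini_hochberg(p_values)
    selected = []
    for i, (case, control) in enumerate(zip(mean_case, mean_control)):
        fc = case / control  # fold change as ratio of group means
        # Assumption: the FC threshold is applied symmetrically.
        if (fc > min_fc or fc < 1.0 / min_fc) and adj[i] < alpha:
            selected.append(i)
    return selected
```

Raising `min_fc` yields a shorter feature list.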

Classification algorithms: During setup we tested several classification algorithms: support vector machine (SVM), linear discriminant analysis (LDA) and prediction analysis for microarrays (PAM). The different classifiers were built and optimized on the training set using a 10-fold cross-validation design repeated 10 times. The training set was divided 10 times into an internal training set and an internal validation set at a ratio of 9:1 (for the distribution to the internal validation groups see supplementary Table 1). In the internal training set, the genes differentially expressed between positive and negative samples (cases and controls) were determined using a t-test. Using the feature list extracted by the feature selection method, the different algorithms were trained on the internal training set and used to calculate the probability score for each case of the respective internal validation set. This was repeated for each of the 10 splits of the 10-fold cross-validation. For each of the 10 cross-validation steps, the area under the receiver operating characteristic curve (AUC) and the median Matthews correlation coefficient (MCC) were calculated for the internal validation set. The optimal classification algorithm was selected according to the maximum AUC and the maximum MCC reached among all algorithms tested. In our hands, SVM performed best and was therefore chosen for further analysis.
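The splitting scheme of the repeated 10-fold cross-validation can be illustrated as follows. This Python sketch covers only the generation of internal training and internal validation indices. The function name is hypothetical, and keeping the case/control ratio equal across folds (stratification) is an assumption that the text does not state explicitly.

```python
import random


def repeated_kfold(labels, k=10, repeats=10, seed=0):
    """Yield (train_idx, valid_idx) pairs for `repeats` rounds of k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        # Shuffle each class separately and deal its samples round-robin,
        # so every fold keeps roughly the overall case/control ratio.
        for cls in sorted(set(labels)):
            members = [i for i, lab in enumerate(labels) if lab == cls]
            rng.shuffle(members)
            for pos, idx in enumerate(members):
                folds[pos % k].append(idx)
        for held_out in range(k):
            valid = sorted(folds[held_out])
            train = sorted(i for f, fold in enumerate(folds) if f != held_out for i in fold)
            yield train, valid
```

Feature selection and classifier training would then be run on each `train` index set only, with probability scores computed for the matching `valid` set.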

Validation of the optimized classifier: The optimized classifier was then applied to the validation cohorts. AUC and median MCC were used to measure the quality of the classifier. Sensitivity and specificity were calculated at the maximum Youden index (sensitivity + specificity - 1). AUC and MCC values were calculated from prediction probabilities as implemented in the ROCR package. For a description of the specificity controls see the following paragraph.
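The performance measures named above can be written out in plain Python for illustration. The analysis itself relied on the ROCR package in R, so the functions below are a stand-in with hypothetical names. Labels are assumed to be coded 1 for cases and 0 for controls, and scores at or above a threshold are assumed to be called positive.

```python
def auc(scores, labels):
    """Area under the ROC curve via the rank-sum identity (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion(scores, labels, threshold):
    """Counts (tp, fp, tn, fn) when scores >= threshold are called positive."""
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    tn = sum(s < threshold and y == 0 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when a margin is empty."""
    denom = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_youden(scores, labels):
    """Return (threshold, sensitivity, specificity) maximizing sensitivity + specificity - 1."""
    best = None
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion(scores, labels, t)
        sens, spec = tp / (tp + fn), tn / (tn + fp)
        # Strict comparison keeps the lowest threshold among ties.
        if best is None or sens + spec - 1 > best[1] + best[2] - 1:
            best = (t, sens, spec)
    return best
```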

Randomized permutation of training and validation sets: To assess the robustness of classification, a permutation approach was applied in which each classification was repeated in 10,000 iterations (Trial Simulation Approach, TSA). In this re-sampling design, the dataset was randomly divided into one training set and two validation sets. If not stated otherwise, each of the three sets comprised one third of the entire dataset. The classifier was built in the training set based on differentially expressed genes selected by statistical testing in the training set and then applied to the two independent validation sets.
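The re-sampling design can be sketched in Python as shown below. The function names are hypothetical, and `evaluate` stands for a caller-supplied routine (not defined in this document) that builds the classifier on the training set and returns its performance on the two validation sets.

```python
import random


def split_into_thirds(sample_ids, rng):
    """Randomly divide sample IDs into a training set and two validation sets."""
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    third = len(shuffled) // 3
    # Any remainder from the integer division ends up in the last set.
    return shuffled[:third], shuffled[third:2 * third], shuffled[2 * third:]


def trial_simulation(sample_ids, evaluate, iterations=10_000, seed=0):
    """Repeat the random split and collect evaluate(ts, v1, v2) per iteration."""
    rng = random.Random(seed)
    return [evaluate(*split_into_thirds(sample_ids, rng)) for _ in range(iterations)]
```

Fixing `seed` makes the sequence of splits reproducible.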

Discussion of Experiments with reference to the Figures

In total, gene expression profiles of samples derived from peripheral blood mononuclear cells in 17 independent studies were collected from public databases (GEO) or generated in our laboratory, quality-controlled, and normalized for further analysis. A total of 2013 samples passed all quality check filters.

To develop classifiers to diagnose AML, the procedure shown in Fig. 1A was performed initially. First, 150 samples (75 AML samples and 75 non-AML samples) were randomly drawn from the complete dataset to simulate a typical pilot trial situation. These 150 samples were divided into three independent datasets, each containing 25 AML cases and 25 non-AML samples. One of the three datasets became the training set (TS), one the first validation cohort (V1) and the third the second validation cohort (V2). Within TS, the classifier was generated using a defined approach for feature selection, a defined classification algorithm and a 10x cross-validation (internal validation). The classifier built in TS was then validated in the independent datasets V1 and V2. To better understand the role of feature size, classification algorithm, ratio of cases and controls, and size of the training set, these variables were varied (see figures below). The setting described so far could reflect a clinical situation where TS would have been drawn first, followed by V1 and V2. However, a classifier identified in such a typical clinical setting might not be generalizable. In fact, it might be biased and might underperform. To address the influence of patient entry into the independent cohorts, 10,000 clinical trial settings (Trial Simulation Approach, TSA) were tested, and for each setting a classifier for V1 and V2, respectively, was determined. In such a scenario, it is possible to determine the average performance as well as the 25th, 75th, 90th and 95th percentiles of the test performance to be expected irrespective of patient entry.
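The balanced pilot draw described above (75 AML and 75 non-AML samples, divided into three cohorts of 25 + 25) can be sketched in Python. The function name, the ID lists and the dictionary layout are hypothetical and serve only to illustrate the design.

```python
import random


def draw_pilot_cohorts(aml_ids, control_ids, per_group=25, n_sets=3, seed=0):
    """Draw n_sets disjoint cohorts, each with per_group AML and per_group non-AML samples."""
    rng = random.Random(seed)
    # Sampling without replacement guarantees that the cohorts do not overlap.
    aml = rng.sample(list(aml_ids), per_group * n_sets)
    controls = rng.sample(list(control_ids), per_group * n_sets)
    cohorts = []
    for k in range(n_sets):
        block = slice(k * per_group, (k + 1) * per_group)
        cohorts.append({"aml": aml[block], "control": controls[block]})
    return cohorts  # read here as [TS, V1, V2]
```

Calling the function with 10,000 different seeds would correspond to the 10,000 simulated trial settings.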

Fig. 1B shows the AUC of each of the 10,000 classifiers in V1 (black circles) and V2 (red circles) (left panel). In the right panel, the mean (line in grey box), 25th/75th percentiles (boxes) and 95th percentiles (lines) as well as outliers (dots) are shown.

The data clearly demonstrate that there is no significant difference between V1 and V2 concerning the AUC results of the 10,000 classifiers. The data also show that rather high AUC values can already be observed using only a rather small number of samples. Nevertheless, the analysis also indicates that under certain circumstances classifiers would have been generated whose performance (e.g. AUC < 0.9) would have led to the dismissal of any further development.
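Summarizing the distribution of AUC values over all simulated trials can be illustrated with the Python standard library. The function name and the choice of the "inclusive" percentile method are assumptions; the percentile definition used for the figures is not stated.

```python
from statistics import mean, quantiles


def summarize_performance(values):
    """Mean and the 25th, 75th, 90th and 95th percentiles of AUC (or MCC) values."""
    # With n=100, cuts[p - 1] is the p-th percentile.
    cuts = quantiles(values, n=100, method="inclusive")
    return {
        "mean": mean(values),
        "q25": cuts[24],
        "q75": cuts[74],
        "q90": cuts[89],
        "q95": cuts[94],
    }
```

Applied separately to the V1 and V2 results, such a summary allows the two validation cohorts to be compared directly.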

As further statistical readouts of classifier performance, MCC, sensitivity and specificity were determined as well. These data are available upon request. Figure 1C shows the influence of the classification algorithm on classifier performance. With respect to methodology, we varied the SVM (support vector machine) settings and compared SVM to the LDA (linear discriminant analysis) and PAM (prediction analysis for microarrays) algorithms (n = 8 x 10,000 classifiers, Fig. 1C). While the results of SVM were further improved by using a linear kernel combined with a t-test instead of a radial kernel and a Wilcoxon test, neither PAM nor LDA reached a similarly high mean AUC. The same held when reading out MCC, specificity and sensitivity.

In Figure 1D, the influence of feature size on classifier performance is shown. The number of features was altered based on a filter combining different levels of significance (p-value) and fold change for each feature (transcript on the array) between the groups AML and non-AML. Under these conditions, slightly better performance was observed in V1 than in V2. More importantly, for both validation cohorts, classifier performance was better with higher numbers of features (FC > 4 = more features).

In Fig. 1E, the influence of sample size and distribution in the training set (TS) on classifier performance is shown. In TS, the number of AML cases was varied between 5 and 25.

Reducing the number of AML cases in TS reduces the performance of the classifier in both validation cohorts V1 and V2.

In Fig. 1F, the influence of sample size in the training set (TS) on classifier performance is shown. In TS, the number of AML cases as well as the number of non-AML samples (controls) was varied between 5 and 25.

Reducing the number of both AML cases and controls in TS reduces the performance of the classifier in both validation cohorts V1 and V2.

These data indicate that increasing the number of samples would greatly improve the quality of the classifiers that could be obtained from such an approach.

As a next step, the whole dataset comprising all 2013 samples was used (Fig. 2A). These 2013 samples were divided into three independent datasets, each containing 637 samples with equal distribution between AML and non-AML samples. One of the three datasets became the training set (TS), one the first validation cohort (V1) and the third the second validation cohort (V2). Within TS, the classifier was generated using a defined approach for feature selection, a defined classification algorithm and a 10x cross-validation (internal validation). The classifier built in TS was then validated in the independent datasets V1 and V2. To better understand the role of feature size, classification algorithm, ratio of cases and controls, and size of the training set, these variables were varied (see figures below). The setting described so far could reflect a clinical situation where TS would have been drawn first, followed by V1 and V2. However, a classifier identified in such a typical clinical setting might not be generalizable. In fact, it might be biased and might underperform. To address the influence of patient entry into the independent cohorts, 10,000 clinical trial settings (Trial Simulation Approach, TSA) were tested, and for each setting a classifier for V1 and V2, respectively, was determined. In such a scenario, it is possible to determine the average performance as well as the 25th, 75th, 90th and 95th percentiles of the test performance to be expected irrespective of patient entry.

In Fig. 2B the AUC in V1 and V2 of all classifiers generated in the respective TS is shown for 6 different feature size criteria (FC > 4 to FC > 12). In total, the figure shows the result (here AUC) of 60,000 classifiers in both V1 and V2. The left panel uses the same y-axis scale as Figure 1D; the right panel restricts the y-axis to the range from 0.96 to 1.

As shown in Fig. 2B, this improved classifier performance dramatically, with 61.8% of all tests reaching an AUC > 0.99 and 98.4% an AUC > 0.98. Although there was still a slight improvement of the spectrum of classifiers when increasing the feature size (FC > 4, left panel), all 60,000 tests performed with an AUC of at least 0.9638. Similar improvements were observed when reading out MCC (Fig. 4). When directly comparing the initial small AML dataset with the complete AML dataset, it became clear that only the larger dataset results in a sufficiently high AUC in the majority of classifiers developed (Fig. 2B). In fact, not even 60% of all classifiers generated in the small dataset showed an AUC > 0.95. Assessing the MCC showed similar results: while basically all classifiers generated in the large dataset reached an MCC > 0.98, not even 80% of all classifiers within the small AML dataset reached an MCC > 0.95.

Fig. 2C correlates the percentage of generated classifiers with their result (here AUC) for the small cohort (150 samples, see Figure 1, red dots) and for the complete cohort (2013 samples, Fig. 2B, black dots).

Not even 60% of all classifiers generated in the small dataset showed an AUC > 0.95. In contrast, in the complete cohort, 100% of all classifiers reached an AUC of at least 0.98 and more than 80% reached an AUC of at least 0.99. This clearly indicates that the analysis in the large dataset generates numerous classifiers reaching the quality required for primary molecular diagnosis of AML.
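The relation between a performance threshold and the percentage of classifiers reaching it can be computed as in the following Python sketch. The function names are hypothetical and the input is simply the list of AUC (or MCC) values of all classifiers of one cohort.

```python
def percent_at_least(values, threshold):
    """Percentage of classifiers whose metric is at least `threshold`."""
    return 100.0 * sum(v >= threshold for v in values) / len(values)


def performance_curve(values, thresholds):
    """Pairs (threshold, percentage of classifiers reaching it), one per threshold."""
    return [(t, percent_at_least(values, t)) for t in thresholds]
```

For example, with four toy AUC values `[0.995, 0.985, 0.991, 0.97]`, half of the classifiers reach an AUC of at least 0.99.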

Fig. 2D correlates the percentage of generated classifiers with their result (here MCC) for the small cohort (150 samples, see Figure 1, red dots) and for the complete cohort (2013 samples, Fig. 2B, black dots).

Assessing the MCC showed similar results: while basically all classifiers generated in the large dataset reached an MCC > 0.98, not even 80% of all classifiers within the small AML dataset reached an MCC > 0.95.

Fig. 3A correlates the feature size of each classifier with its AUC in V1 and V2 for both the small cohort (upper panel) and the complete cohort (lower panel). Again, a total of 60,000 classifiers in 5 panels is shown for the small cohort and the same number of classifiers for the large cohort.

It can be clearly seen that the variance of the AUC is large in the small cohort and is seen across the whole range of feature sizes, with a tendency towards smaller variance when classifiers with larger feature sizes were generated.

In stark contrast, there was almost no variance in the complete cohort, irrespective of feature size. The range of feature sizes was also clearly smaller in the complete cohort (10-378 features per classifier versus 1-837 features per classifier for the small cohort).

Fig. 3B addresses the question of how often specific features (transcripts) are part of a classifier. For this purpose, all 3540 transcripts that appeared in at least one of the 60,000 classifiers were plotted against their participation (in percent) in all classifiers.

It can be clearly seen that the majority of all transcripts (> 3000 transcripts) are part of less than 40% of all classifiers. Only a very small subset of transcripts (21) is part of at least 50% of all classifiers. Not a single transcript is observed in more than 90% of all classifiers.

Fig. 3C addresses the question of how often specific features (transcripts) are part of a classifier in the complete dataset. For this purpose, all 680 transcripts that appeared in at least one of the 60,000 classifiers were plotted against their participation (in percent) in all classifiers.

It can be clearly seen that the majority of all transcripts (> 600 transcripts) are part of less than 40% of all classifiers. However, 45 transcripts are present in at least 50% of all classifiers, 25 transcripts in more than 80% of all classifiers, 19 transcripts in more than 90% of all classifiers and 8 transcripts even in all classifiers.

These data indicate that there is a small set of transcripts that are always part of a classifier irrespective of the distribution of patients into training and validation sets. These few transcripts are the prime candidates for building the test for the primary molecular diagnosis of AML.
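The participation analysis can be illustrated with a short Python sketch. The function names are hypothetical, and the input is assumed to be one list of transcript identifiers per generated classifier.

```python
from collections import Counter


def participation(classifier_features):
    """Map each transcript to the percentage of classifiers that contain it."""
    # set() guards against a transcript being listed twice in one classifier.
    counts = Counter(f for features in classifier_features for f in set(features))
    total = len(classifier_features)
    return {feature: 100.0 * n / total for feature, n in counts.items()}


def core_transcripts(classifier_features, min_percent=100.0):
    """Transcripts present in at least min_percent of all classifiers, sorted by name."""
    share = participation(classifier_features)
    return sorted(f for f, pct in share.items() if pct >= min_percent)
```

With `min_percent=100.0` the function returns only those transcripts that are part of every classifier, i.e. the candidates referred to above.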

Together, these results support a robust gene expression profiling-based classifier as a test for the primary diagnosis of AML with a sensitivity and specificity greater than 99.5%.

Table 2

SEQ ID Probe.Set.ID Gene.Symbol Rank mean_AML mean_Control fold change

1 203434_s_at MME 1 6,484842407 194,828945 0,0332848

2 203435_s_at MME 2 11,7774216 390,5671937 0,030154662

3 204007_at FCGR3B 3 96,36588679 2547,897377 0,03782173

4 207008_at CXCR2 4 19,22025237 420,9509119 0,04565913

5 207094_at CXCR1 5 8,434680603 296,4180862 0,028455351

6 210084_x_at TPSAB1 6 469,5698231 14,19884381 33,07099009

7 211163_s_at TNFRSF10C 7 13,7516325 286,181047 0,048052213

8 217023_x_at TPSAB1 /// TPSB2 8 357,8705911 14,72675465 24,30070981

9 209905_at HOXA9 10 331,4165733 15,65550111 21,16933664

10 203691_at PI3 9 9,495926826 169,7706524 0,055933854

11 216782_at — 11 6,005876026 92,38991825 0,065005751

12 204006_s_at FCGR3A /// FCGR3B 12 72,28041694 1134,121235 0,063732531

13 210119_at KCNJ15 13 14,43091797 241,644622 0,059719591

14 201427_s_at SEPP1 18 292,5521236 19,48037107 15,01779008

15 216474_x_at TPSAB1 /// TPSB2 15 483,0454115 32,49118243 14,86696929

16 39318_at TCL1A 17 45,49184435 677,2316601 0,067173239

17 207907_at TNFSF14 14 5,520702158 79,03808 0,069848637

18 209995_s_at TCL1A 19 56,42459636 805,2634375 0,070069736

19 221345_at FFAR2 16 14,15907694 196,3522654 0,072110586

20 214651_s_at HOXA9 21 508,5936926 36,96465891 13,75891751

21 210321_at GZMH 20 25,44312502 344,5439601 0,073845802

22 204561_x_at APOC2 27 154,8462107 11,33686201 13,65864827

23 203828_s_at IL32 22 21,61826865 281,8012063 0,076714606

24 203948_s_at MPO 23 2980,815153 231,3270405 12,88571862

25 215382_x_at TPSAB1 24 419,4320099 33,08337558 12,67802945

26 205683_x_at TPSAB1 26 553,9765291 44,19819564 12,5339173

27 207134_x_at TPSB2 28 468,6948008 38,86841261 12,05850122

28 206622_at TRH 55 175,8067271 15,36030018 11,44552678

29 210549_s_at CCL23 32 65,28562885 5,894866402 11,07499719

30 203949_at MPO 30 2889,547982 264,6356365 10,91896776

31 205051_s_at KIT 25 365,945213 33,69167105 10,86159284

32 207741_x_at TPSAB1 34 350,7145365 33,07766809 10,6027588

33 204885_s_at MSLN 71 59,36785281 5,529934065 10,73572526

34 220010_at KCNE1L 63 144,630379 13,98374647 10,34274894

35 205131_x_at CLEC11A 29 446,0704585 44,07899432 10,11979664

36 220068_at VPREB3 37 26,77972855 266,9641689 0,100312071

37 204698_at ISG20 33 35,03623094 342,703004 0,102234969

38 204891_s_at LCK 31 61,8281313 604,9303348 0,102207027

39 205798_at IL7R 35 108,240112 1032,103319 0,10487333

40 207826_s_at ID3 56 7,932373391 75,09538195 0,105630642

41 211796_s_at TRBC1 36 152,617601 1388,961527 0,109878926

42 205568_at AQP9 41 83,23465063 748,7517669 0,111164547

43 37145_at GNLY 38 96,9066424 872,656328 0,111047888

44 206310_at SPINK2 60 541,0837849 61,01626776 8,86786106

45 214575_s_at AZU1 64 1705,900814 193,187188 8,830299939

46 210783_x_at CLEC11A 39 283,5868231 31,88617234 8,893724219

47 221558_s_at LEF1 40 73,31851205 641,0090366 0,114379842

48 206591_at RAG1 148 9,613271244 83,70008021 0,114853788

49 AFFX-r2-Bs-dap-5_at — 62 1,649184496 14,32383753 0,115135661

50 220418_at UBASH3A 43 5,870364325 50,46535814 0,116324634

51 205366_s_at HOXB6 42 71,07813035 8,302351445 8,561204716

52 206804_at CD3G 44 14,19499845 121,4526087 0,116876851

53 205495_s_at GNLY 45 87,77323834 748,2775156 0,117300382

54 209488_s_at RBPMS 78 51,68384724 6,148101098 8,406473221

55 209757_s_at MYCN 77 121,3497462 14,80685543 8,195510973

56 212775_at OBSL1 57 80,16776939 9,722280977 8,245777876

57 210972_x_at TRA@ /// TRAC /// TRAJ17 /// TRAV20 46 129,3957012 1059,666672 0,122109815

58 209671_x_at TRA@ /// TRAC 47 101,2817483 829,5941872 0,122085894

59 205922_at VNN2 58 80,6765747 658,1964947 0,122572173

60 207339_s_at LTB 48 109,1885328 885,6579051 0,123285224

61 209670_at TRAC 49 86,08434501 694,3831361 0,123972402

62 210915_x_at TRBC1 50 172,9595219 1396,137195 0,123884331

63 210031_at CD247 51 74,38866909 600,5255353 0,123872616

64 206413_s_at TCL1B 165 15,43778348 124,0294703 0,124468672

65 210484_s_at MGC31957 /// TNFRSF10C 52 16,24082917 130,0153704 0,124914686

66 210997_at HGF 149 64,7825388 8,337912254 7,769635471

67 213150_at HOXA10 66 177,0410268 22,9407217 7,717325945

68 221349_at VPREB1 173 24,19474485 180,146343 0,134306056

69 210998_s_at HGF 150 41,2827123 5,376523535 7,678328203

70 204115_at GNG11 76 22,76278451 172,8273674 0,131708218

71 219243_at GIMAP4 68 58,29859795 439,4511796 0,132662286

72 206067_s_at WT1 88 86,95974851 12,02375551 7,232328405

73 206666_at GZMK 59 34,74787804 259,2225125 0,134046529

74 204468_s_at TIE1 83 43,57479354 5,951716432 7,321382669

75 204548_at STAR 61 71,84858889 9,586105161 7,495076226

76 211902_x_at TRA@ 53 74,8070672 563,8647769 0,132668452

77 205488_at GZMA 54 41,56099931 311,1555885 0,133569831

78 206135_at ST18 194 34,95344191 5,034439616 6,942866452

79 219630_at PDZK1IP1 69 34,61887348 252,9530427 0,136858893

80 AFFX-r2-Bs-dap-M_at — 91 3,175115317 22,61724812 0,140384688

81 205267_at POU2AF1 70 43,69031275 312,8932685 0,139633278

82 207815_at PF4V1 72 7,484984187 53,6660255 0,139473421

83 205119_s_at FPR1 80 284,3281629 2014,406449 0,141147365

84 221602_s_at FAIM3 79 45,84000399 323,4024518 0,141742908

85 210164_at GZMB 82 56,30162949 394,0862387 0,142866266

86 205254_x_at TCF7 75 13,51171938 94,6088088 0,142816716

87 222285_at IGHD 81 8,170971952 56,88368827 0,143643498

88 213193_x_at TRBC1 67 244,4105938 1711,030032 0,142844129

89 204890_s_at LCK 73 42,01306176 292,0235388 0,143868751

90 205624_at CPA3 86 676,0833449 97,65229344 6,923373954

91 206150_at CD27 65 50,32670569 344,7974439 0,145960205

92 205484_at SIT1 74 15,68390877 107,6818621 0,145650423

93 213844_at HOXA5 85 365,8363745 53,03963106 6,897415521

94 41469_at PI3 84 25,91120807 175,5752987 0,147578892

95 211709_s_at CLEC11A 89 794,0224893 119,3520958 6,652773745

96 211339_s_at ITK 87 66,57609306 437,1517676 0,152295148

97 217418_x_at MS4A1 90 56,81209315 374,0850183 0,151869469

98 220416_at ATP8B4 93 342,5641787 52,08948971 6,576454878

99 222222_s_at HOMER3 92 27,07998807 4,114495956 6,581605223

100 206871_at ELANE 156 1723,797103 261,233626 6,598679997

101 206515_at CYP4F3 94 29,85213119 194,1115237 0,153788557

102 202718_at IGFBP2 151 190,6137939 28,86367281 6,603934127

103 220807_at HBQ1 105 22,50679864 146,2910049 0,153849505

104 213258_at TFPI 96 122,6600569 18,88299885 6,495793274

105 206255_at BLK 157 11,56607876 74,85865397 0,154505567

106 206222_at TNFRSF10C 95 28,52518936 181,4515694 0,157205526

107 219529_at CLIC3 98 10,6577767 68,70552516 0,155122556

108 205609_at ANGPT1 97 160,9818384 25,16903735 6,396026839

109 214567_s_at XCL1 /// XCL2 101 8,624927227 55,52068817 0,155346187

110 208406_s_at GRAP2 104 5,636358491 36,03533169 0,156412005

111 213110_s_at COL4A5 99 99,11357882 15,75970386 6,289050843

112 209395_at CHI3L1 158 32,13301161 204,0712832 0,157459742

113 212776_s_at OBSL1 100 70,13493895 11,14462397 6,293163334

114 210724_at EMR3 102 18,43840105 115,2397638 0,160000337

115 203066_at CHST15 103 76,54644213 479,3878118 0,159675403

116 214470_at KLRB1 106 61,39466397 382,8952588 0,160343234

117 216565_x_at — 107 108,6709256 668,4902474 0,162561722

118 217572_at — 108 29,79450706 184,073138 0,16186233

119 44790_s_at C13orf18 109 21,57755211 132,6182676 0,162704222

120 205590_at RASGRP1 110 39,44573031 240,2518309 0,164184931

121 205831_at CD2 111 90,90211133 555,0670764 0,163767795

122 210439_at ICOS 152 5,062850753 30,93240296 0,163674667

123 220005_at P2RY13 112 57,08634227 348,4278317 0,163839789

124 208304_at CCR3 113 14,71133034 89,50159919 0,164369469

125 204581_at CD22 159 11,15111378 67,24471136 0,165828859

126 221958_s_at WLS 114 7,625327055 45,98332511 0,165828092

127 205174_s_at QPCT 115 31,87833796 191,2661475 0,166670048

128 218963_s_at KRT23 116 21,0687926 126,3074299 0,166805647

129 207651_at GPR171 117 18,18767815 108,4152486 0,16775941

130 201242_s_at ATP1B1 118 194,195005 32,61092831 5,954905764

131 205899_at CCNA1 119 235,6262453 40,0148415 5,888471287

132 209396_s_at CHI3L1 160 46,96279853 275,7761577 0,170293179

133 220744_s_at IFT122 120 73,23634146 12,42326923 5,895094125

134 213830_at TRD@ 198 32,16267358 178,5521652 0,180130404

135 218805_at GIMAP5 121 79,09887161 463,707162 0,170579361

136 206674_at FLT3 122 657,0140317 113,5788776 5,784649802

137 220057_at XAGE1A /// XAGE1B /// XAGE1C /// XAGE1D /// XAGE1E 211 30,73374656 5,28138168 5,819262538

138 210356_x_at MS4A1 123 72,73134865 423,6405427 0,171681747

139 205931_s_at CREB5 124 13,79652321 80,0034599 0,172449082

140 221601_s_at FAIM3 125 122,8955756 718,390153 0,171070796

141 221211_s_at C21orf7 126 22,21700459 127,5605364 0,174168322

142 214022_s_at IFITM1 127 481,6096005 2790,278586 0,172602694

143 219054_at C5orf23 176 255,0090238 45,90585443 5,555043621

144 201601_x_at IFITM1 128 340,0437507 1955,181704 0,173919258

145 201506_at TGFBI 161 103,2399435 572,9778234 0,180181395

146 206765_at KCNJ2 129 33,088223 186,4384185 0,177475347

147 201189_s_at ITPR3 130 18,63391075 104,4596169 0,17838387

148 220646_s_at KLRF1 131 30,89890892 176,2038458 0,175358879

149 202890_at MAP7 169 85,36716348 15,50625813 5,505336154

150 221234_s_at BACH2 132 29,10573499 164,6899425 0,176730495

151 206785_s_at KLRC1 /// KLRC2 133 8,121891973 45,34929275 0,179096332

152 213958_at CD6 134 77,20360954 440,6890877 0,175188385

153 206390_x_at PF4 181 184,7433193 993,0692085 0,186032673

154 AFFX-BioB-3_at — 172 114,6197576 629,9850895 0,181940429

155 201058_s_at MYL9 217 9,680994213 49,69003618 0,194827675

156 213906_at MYBL1 135 18,41132161 102,1215408 0,180288326

157 200935_at CALR 136 246,3586224 44,54281106 5,530827906

158 217143_s_at TRA@ /// TRD@ 200 57,53383859 293,0047225 0,196358059

159 205653_at CTSG 209 887,3045511 174,1565166 5,094868503

160 207341_at PRTN3 215 598,6288261 118,041971 5,071321845

161 216191_s_at TRA@ /// TRD@ 235 41,97835302 210,121277 0,199781543

162 210948_s_at LEF1 137 19,45058329 103,8370026 0,18731842

163 220377_at FAM30A 216 131,5058081 26,27473391 5,005029113

164 209960_at HGF 197 67,87936228 13,19224493 5,145398881

165 202761_s_at SYNE2 138 32,0041479 176,3319556 0,181499421

166 206337_at CCR7 139 73,8188086 397,0296006 0,18592772

167 205671_s_at HLA-DOB 140 19,27426004 102,1656849 0,188656887

168 204777_s_at MAL 153 61,10519311 319,5547608 0,191219786

169 206480_at LTC4S 166 49,81632329 9,580914719 5,199537284

170 208601_s_at TUBB1 174 19,20991684 98,76030769 0,1945105

171 207460_at GZMM 141 15,49596558 83,97740658 0,184525412

172 212914_at CBX7 142 39,4554564 215,0751427 0,183449635

173 216667_at LOC643332 /// RNASE2 175 255,0062238 49,50545939 5,151072771

174 210483_at MGC31957 154 11,78876243 61,64553789 0,191234643

175 205049_s_at CD79A 143 74,47784552 389,7401354 0,191096166

176 220421_at BTNL8 184 6,950711076 34,79597569 0,199756177

177 206398_s_at CD19 188 59,34682501 294,1320186 0,201769346

178 214617_at PRF1 162 176,8364486 902,0238807 0,196044087

179 202075_s_at PLTP 144 91,32586184 17,43885341 5,236918947

180 213539_at CD3D 195 161,8678226 791,3866312 0,204536969

181 212538_at DOCK9 179 9,386344607 46,42889036 0,202166034

182 221969_at PAX5 167 54,66248961 274,2889631 0,199287966

183 202016_at MEST 177 513,8088003 103,1788319 4,979788885

184 207979_s_at CD8B 145 36,7856376 192,1824056 0,191410017

185 210606_x_at KLRD1 163 30,34968459 154,4087646 0,196554157

186 217326_x_at IL23A /// TRBV19 155 8,796525844 44,06721966 0,199616085

187 214014_at CDC42EP2 168 3,982186857 19,87844669 0,200326862

188 219201_s_at TWSG1 182 23,91781181 4,853478362 4,927973305

189 210933_s_at FSCN1 170 97,93080862 19,79532619 4,947168219

190 205758_at CD8A 146 76,32138489 378,7377981 0,201515099

191 219812_at PVRIG 147 39,64281582 201,9232452 0,196326162

192 33304_at ISG20 164 100,3888068 479,7976027 0,209231572

193 210607_at FLT3LG 171 13,42088257 63,58015417 0,211086034

194 216268_s_at JAG1 187 158,1367533 32,30953482 4,894429899

195 206082_at HCP5 178 31,30560915 145,5687142 0,215057262

196 218999_at TMEM140 180 42,22485105 193,8173124 0,217859027

197 210755_at HGF 202 122,3041546 24,68334512 4,954926247

198 210805_x_at RUNX1 183 22,61797142 4,674066365 4,839035146

199 202269_x_at GBP1 201 52,49052695 254,8562973 0,205961271

200 219471_at C13orf18 186 52,07303611 251,4217343 0,207114298

201 209099_x_at JAG1 192 205,8446797 42,32087979 4,863903602

202 204118_at CD48 185 329,0406975 1491,037992 0,220678949

203 207890_s_at MMP25 189 68,0053477 308,9291247 0,220132523

204 215332_s_at CD8B 190 9,004046574 41,27744739 0,218134772

205 215967_s_at LY9 191 21,97460499 102,3662852 0,21466643

206 220987_s_at C11orf17 /// NUAK2 193 72,87829115 319,5156383 0,228089904

207 210113_s_at NLRP1 196 34,24453351 156,2955704 0,219101113

208 219837_s_at CYTL1 229 248,4767409 51,46081148 4,828465269

209 203485_at RTN1 210 12,01325143 57,37660195 0,209375443

210 219528_s_at BCL11B 199 47,46514885 219,0741265 0,216662504

211 213611_at AQP5 206 5,767866015 27,36226467 0,210796368

212 210772_at FPR2 203 32,79449117 147,7694912 0,221930054

213 206420_at IGSF6 204 48,67005371 218,4338085 0,22281374

214 208087_s_at ZBP1 205 7,428788811 32,96717009 0,22533899

215 205821_at KLRK1 207 66,35915665 290,6477224 0,228314731

216 211596_s_at LRIG1 208 17,77256431 80,33400834 0,221233382

217 206366_x_at XCL1 212 18,53302351 83,5298982 0,221872933

218 211532_x_at KIR2DS2 214 3,612556812 16,71297558 0,216152821

219 219521_at B3GAT1 213 6,325672366 28,85097756 0,219253311

220 206980_s_at FLT3LG 218 12,25700708 53,99433053 0,227005446

221 206560_s_at MIA 219 7,174218359 32,40053375 0,221422845

222 205544_s_at CR2 232 7,929147176 37,68130215 0,21042657

223 217078_s_at CD300A 220 49,84409702 218,9016351 0,227700889

224 212827_at IGHM 221 211,0969518 930,2018958 0,226936704

225 204959_at MNDA 222 498,2110764 2175,509169 0,229008953

226 218858_at DEPDC6 223 156,82729 35,79659864 4,381066805

227 201655_s_at HSPG2 228 48,36542278 10,49165472 4,609894632

228 204731_at TGFBR3 224 28,14059147 121,7104946 0,231209244

229 208190_s_at LSR 225 5,466221877 24,41918876 0,223849446

230 219024_at PLEKHA1 226 16,59069748 71,97474514 0,230507207

231 219478_at WFDC1 227 41,33698495 9,246594785 4,470508972

232 203561_at FCGR2A 230 164,7501278 723,1536389 0,22782175

233 201162_at IGFBP7 231 525,9651412 115,7009149 4,545903044

234 210665_at TFPI 236 47,97070476 10,33030902 4,643685361

235 206343_s_at NRG1 239 9,731740853 45,30222202 0,214818179

236 202270_at GBP1 242 21,94638578 101,3709845 0,216495735

237 203413_at NELL2 233 60,31409784 256,9574596 0,234724059

238 1405_i_at CCL5 234 199,3787132 872,8199562 0,228430516

239 208029_s_at LAPTM4B 249 205,7679967 44,51654258 4,622281624

240 213135_at TIAM1 237 44,09140961 192,4269415 0,229133245

241 221724_s_at CLEC4A 238 36,38533489 160,3125276 0,226965013

242 215894_at PTGDR 240 10,09097165 43,50172171 0,231967179

243 213668_s_at SOX4 241 153,0294372 33,88793262 4,51575016

244 207850_at CXCL3 265 126,5750632 27,43978233 4,612830441

245 212097_at CAV1 252 46,70628007 10,05645496 4,644408019

246 211583_x_at NCR3 243 20,58903269 86,96146449 0,236760418

247 205239_at AREG 245 274,5081477 61,12600985 4,490856648

248 219789_at NPR3 244 128,0553738 29,15718266 4,391898053

249 214146_s_at PPBP 246 284,4743409 1260,489401 0,225685627

250 205403_at IL1R2 269 48,40448289 217,90211 0,222138661

251 215401_at — 247 39,77525385 9,189621184 4,32828003

252 209840_s_at LRRN3 248 14,59045825 61,97676034 0,235418214

253 64064_at GIMAP5 250 108,5107297 452,2601507 0,239929893

254 207072_at IL18RAP 251 50,29770937 216,3647967 0,232467158

255 202687_s_at TNFSF10 253 68,11015856 289,9564719 0,234897873

256 AFFX-r2-Ec-bioB-3_at — 254 151,0729408 653,8535262 0,231050128

257 210992_x_at FCGR2C 255 64,18957197 276,2000709 0,232402446

258 213147_at HOXA10 256 159,4771971 37,58347194 4,24328006

259 208602_x_at CD6 257 7,956666243 34,82064492 0,228504276

260 208105_at GIPR 258 6,472909399 28,02840403 0,230941062

261 205826_at MYOM2 259 16,83532566 73,644085 0,228603908

262 210773_s_at FPR2 260 35,49593155 150,6824345 0,235567813

263 220187_at STEAP4 261 5,554693253 23,8798066 0,232610479

264 206301_at TEC 262 23,47813332 5,470628278 4,291670377

265 201596_x_at KRT18 263 163,6557229 38,7918178 4,218820673

266 215447_at — 264 65,63625353 15,33516558 4,28011378

267 211719_x_at FN1 275 27,30741833 5,766984105 4,735129806

268 217683_at HBE1 266 12,76149291 53,87722483 0,236862477

269 201830_s_at NET1 267 182,0513217 43,27333631 4,207009148

270 205118_at FPR1 268 17,74688024 75,79984767 0,234128178

271 205253_at PBX1 291 3,606617912 15,4798246 0,232988293

272 214974_x_at CXCL5 270 11,67376168 49,65947571 0,235076217

273 221764_at C19orf22 271 235,1097173 977,6804763 0,240477051

274 201315_x_at IFITM2 272 1051,742923 4313,573375 0,24382173

275 205259_at NR3C2 273 9,400127322 39,03005123 0,240843325

276 AFFX-BioB-M_at — 274 225,6345859 942,2412407 0,239465835

277 205472_s_at DACH1 276 26,06541147 6,109951983 4,266058316

278 209602_s_at GATA3 277 5,478401298 23,161233 0,236533232

279 211688_x_at KIR3DL1 /// KIR3DL2 /// LOC727787 278 6,52011132 27,02964596 0,241220744

280 210244_at CAMP 284 90,01931097 383,9264596 0,234470193

281 202723_s_at FOXO1 279 55,6684428 231,5998035 0,24036481

282 204647_at HOMER3 280 204,5931968 49,91107525 4,099154262

283 205221_at HGD 281 5,137262489 22,65238755 0,2267868

284 213716_s_at SECTM1 282 76,1690302 314,2311226 0,24239811

285 205442_at MFAP3L 283 11,91382367 48,90532924 0,243609927

286 209871_s_at APBA2 285 11,87705149 48,99932907 0,242392125

287 211396_at FCGR2C 286 8,740357118 35,97454095 0,242959518

288 214032_at ZAP70 287 50,44304687 206,0124128 0,244854406

289 205255_x_at TCF7 288 228,8036835 929,503869 0,246156784

290 215925_s_at CD72 289 19,29299684 79,24384826 0,243463654

291 216050_at — 290 4,318961656 17,70995806 0,24387193

292 205382_s_at CFD 292 1245,680507 306,7572469 4,060802212

293 220118_at ZBTB32 293 3,615032013 14,9413273 0,241948519

294 207567_at SLC13A2 294 6,442853014 26,26980642 0,245256966

295 210664_s_at TFPI 295 92,30790656 22,75010566 4,057471553

296 201069_at MMP2 296 183,2057337 45,1295109 4,059555046

297 201171_at ATP6V0E1 297 12,17161755 49,19727804 0,247404288

298 206726_at HPGDS 298 128,7199149 31,66110819 4,065553047

299 203675_at NUCB2 299 581,0657373 144,3560154 4,025227044

300 207321_s_at ABCB9 300 9,597679218 38,66261506 0,248241853

301 201324_at EMP1 301 153,7211847 38,08502784 4,036262893

302 215101_s_at CXCL5 302 3,35769507 13,88400719 0,241839047

303 213880_at LGR5 305 6,158370932 24,63759123 0,249958321

304 201243_s_at ATP1B1 303 209,5042172 52,32143794 4,004175447

305 215783_s_at ALPL 304 19,19983355 76,42576535 0,251221999

306 204070_at RARRES3 306 93,47552988 372,5548527 0,250904073

307 221790_s_at LDLRAP1 307 35,08165706 139,9222606 0,250722486

308 210450_at LOC90925 308 5,026756025 19,97748911 0,251621012

309 210279_at GPR18 309 39,03136373 155,3738119 0,251209411

310 205627_at CDA 310 72,29770311 286,2792658 0,252542576

311 206208_at CA4 311 7,276368268 28,73302408 0,2532406

312 202768_at FOSB 312 813,7193284 206,7508908 3,935747629

313 210397_at DEFB1 313 100,8985594 25,76022308 3,916835622

314 202478_at TRIB2 314 46,28447853 181,8258135 0,254553947

315 212768_s_at OLFM4 317 21,86567681 82,21592322 0,265954282

316 210517_s_at AKAP12 315 21,06046949 80,9125085 0,260286943

317 206157_at PTX3 316 175,4752753 45,54020632 3,853194561

318 211372_s_at IL1R2 319 32,1615205 117,4894062 0,273739749

319 207892_at CD40LG 318 9,579237346 37,4450169 0,255821419

320 206145_at RHAG 320 155,697322 40,93290691 3,803720129

321 219790_s_at NPR3 321 25,48030957 6,591865991 3,865416803

322 209101_at CTGF 333 31,43797681 98,65688962 0,31865972

323 207384_at PGLYRP1 327 37,55728221 134,0435242 0,280187218

324 217889_s_at CYBRD1 322 33,92266153 8,813028867 3,849149032

325 AFFX-DapX-M_at — 323 10,45116394 40,57710595 0,257563069

326 AFFX-DapX-3_at — 324 3,009980604 11,57277719 0,260091468

327 213479_at NPTX2 325 57,5121763 15,41351592 3,731282116

328 204163_at EMILIN1 326 19,79347365 5,157990463 3,837438978

329 219753_at STAG3 328 34,41725114 130,8446191 0,263039102

330 213418_at HSPA6 329 94,75691341 360,5381526 0,262820766

331 209687_at CXCL12 330 38,62336714 10,74669311 3,593976933

332 209774_x_at CXCL2 331 377,0516652 101,2161224 3,725213497

333 207840_at CD160 332 23,59459132 92,22980292 0,255823937

334 209368_at EPHX2 334 7,575194058 28,70908945 0,263860478

335 204482_at CLDN5 335 11,52757686 43,96198629 0,262216925

336 210762_s_at DLC1 336 131,6311686 34,33703476 3,833504249

337 201325_s_at EMP1 337 106,590224 28,47481783 3,743315397

338 201564_s_at FSCN1 338 90,18765003 23,30620864 3,869683458

339 205041_s_at ORM1 /// ORM2 340 13,62207281 49,31113214 0,276247416

340 209487_at RBPMS 339 49,81636296 13,37095141 3,725715653

341 212094_at PEG10 341 6,11628344 22,72014038 0,269200953

342 219737_s_at PCDH9 347 40,74671663 140,1220029 0,290794563

343 219947_at CLEC4A 342 52,315492 199,6909314 0,261982313

344 202458_at PRSS23 343 11,88231903 45,51876042 0,261042237

345 205040_at ORM1 345 16,70943711 59,31793055 0,281692853

346 207143_at CDK6 344 98,47496801 25,58440479 3,849023216

347 214735_at IPCEF1 346 29,16411265 113,2214675 0,257584655

348 219396_s_at NEIL1 348 19,29679235 74,06164868 0,260550402

349 209772_s_at CD24 349 53,50458827 196,620889 0,272120569

350 202889_x_at MAP7 350 64,75898466 16,95485668 3,819494666

351 205456_at CD3E 351 88,70541382 340,1003006 0,260821333

352 210426_x_at RORA 352 24,4999862 93,27138492 0,262674198

353 220528_at VNN3 353 20,57181028 78,35327718 0,262552008

354 211893_x_at CD6 354 22,71738031 85,90547549 0,264446244

355 209570_s_at D4S234E /// FOXP1 355 10,86988873 41,60539973 0,26126149

356 207802_at CRISP3 361 33,0800047 107,7207623 0,307090332

357 206298_at ARHGAP22 356 50,68511461 13,5239308 3,747809374

358 213589_s_at B3GNTL1 357 66,46884401 17,42019768 3,815619389

359 206851_at RNASE3 358 792,4918987 222,5644658 3,560729678

360 204627_s_at ITGB3 359 33,67278139 117,0891953 0,287582311

361 208963_x_at FADS1 360 72,86226083 19,36492074 3,762590192

362 37152_at PPARD 362 21,34283775 81,33684836 0,262400599

363 219491_at LRFN4 363 43,87539292 11,54224652 3,80128711

364 219840_s_at TCL6 364 4,902334453 18,15862754 0,269972741

365 210432_s_at SCN3A 367 12,13879748 36,92689918 0,328725069

366 204150_at STAB1 365 536,3675127 145,002979 3,6990103

367 210847_x_at TNFRSF25 366 13,26552709 49,5194275 0,267885308

368 AFFX-r2-Ec-bioB-M_at — 368 225,151872 844,6451295 0,266563867

369 210495_x_at FN1 369 43,91255749 13,55438379 3,23973101

370 215352_at — 370 4,698420824 17,25699317 0,272261846

371 205780_at BIK 371 69,12782123 18,83026074 3,67110271

372 204363_at F3 372 21,74795736 7,063272999 3,079019792

373 210222_s_at RTN1 373 17,05262168 62,39665066 0,273293863

374 206111_at RNASE2 374 1897,890733 510,5854218 3,717087587

375 204420_at FOSL1 375 64,43432114 17,36979658 3,709561068

376 214039_s_at LAPTM4B 376 407,0318192 114,4323026 3,556966082

377 206371_at FOLR3 377 32,78225635 108,9641646 0,300853556

378 214406_s_at SLC7A4 378 3,572451858 13,2279157 0,270069143

379 40850_at FKBP8 379 92,04355549 320,0965874 0,287549318

380 202208_s_at ARL4C 380 83,03493325 308,9671664 0,268750024

381 221557_s_at LEF1 381 6,665880362 24,47212025 0,272386712

382 207723_s_at KLRC3 382 9,003876751 33,35409327 0,269948179

383 209604_s_at GATA3 383 84,75773903 311,082066 0,272461027

384 221166_at FGF23 384 6,890303801 25,16691863 0,273784165

385 216052_x_at ARTN 385 41,92526647 11,54855094 3,630348664

386 212531_at LCN2 386 123,9381705 400,6931348 0,309309443

387 211341_at POU4F1 387 136,563044 44,81182385 3,047477927

388 208789_at PTRF 388 45,82371052 13,50580322 3,392890433

389 212417_at SCAMP1 389 83,08371823 22,29484986 3,726587923

390 203153_at IFIT1 390 50,77713759 161,3290932 0,314742596

391 210327_s_at AGXT 391 11,93635623 3,564682525 3,348504711

392 212956_at TBC1D9 392 43,9621454 156,9219381 0,28015296

393 213345_at NFATC4 393 12,98766488 46,92612515 0,276768321

394 201744_s_at LUM 394 8,553660712 2,246540094 3,807481885

395 203716_s_at DPP4 395 3,900478927 10,60569133 0,367772246

396 204304_s_at PROM1 396 248,231266 76,72629975 3,235282646

397 204823_at NAV3 397 22,83940387 7,865496467 2,903745995

398 211451_s_at KCNJ4 398 5,901161865 21,48271516 0,274693484

399 206385_s_at ANK3 399 8,003621356 28,73572287 0,278525144

400 202178_at PRKCZ 400 28,72043762 103,694556 0,276971509

401 210479_s_at RORA 401 26,24105362 94,9665145 0,276319014

402 206641_at TNFRSF17 402 7,383637239 25,19867483 0,293016886

403 219519_s_at SIGLEC1 403 11,35222793 33,21540395 0,341776001

404 221004_s_at ITM2C 404 399,7257188 110,3751871 3,621517928

405 205297_s_at CD79B 405 49,25287699 180,2145746 0,273301297

406 205801_s_at RASGRP3 406 67,14611383 20,29521765 3,308469758

407 217977_at SEPX1 407 275,6100752 1016,988696 0,271006036

408 205291_at IL2RB 408 108,9317352 413,417277 0,263491008

409 205767_at EREG 409 160,5319644 47,62248521 3,370927907

410 206522_at MGAM 410 52,86947241 179,8121031 0,294026217

411 219355_at CXorf57 411 2,509719861 8,364075645 0,300059441

412 204439_at IFI44L 412 57,48378003 171,7652467 0,334664789

413 209850_s_at CDC42EP2 413 21,30414858 79,03557646 0,269551378

414 207533_at CCL1 414 11,19043384 3,970571602 2,818343291

415 204081_at NRGN 415 166,0205022 603,4671564 0,275111082

416 202524_s_at SPOCK2 416 111,5556972 415,7774729 0,268306256

417 213194_at ROBO1 417 29,67578968 9,434148553 3,145571592

418 209191_at TUBB6 418 382,1031007 103,235872 3,701262878

419 206545_at CD28 419 22,28602698 82,60916056 0,269776703

420 204044_at QPRT 420 56,55366927 16,02144715 3,529872724

421 219174_at IFT74 421 22,74132093 6,215253954 3,658952812

422 221698_s_at CLEC7A 422 135,8671929 473,0917455 0,287189946

423 205237_at FCN1 423 601,272474 2093,548815 0,28720251

424 219315_s_at TMEM204 424 53,21239446 198,6300266 0,267897031

425 216331_at ITGA7 425 17,87263636 5,3325625 3,35160373

426 221658_s_at IL21R 426 16,91898847 60,9356509 0,277653364

427 210202_s_at BIN1 427 28,64089372 103,5160191 0,276680788

428 214491_at SSTR3 428 5,028913459 17,99671709 0,279435046

429 202444_s_at ERLIN1 429 208,5380356 56,23837595 3,708109135

430 AFFX-r2-Bs-dap-3_at — 430 2,914327944 9,518616748 0,306171371

431 202688_at TNFSF10 431 171,8167109 610,3845345 0,281489293

432 214761_at ZNF423 432 12,86219673 41,62111802 0,309030544

433 211781_x_at — 433 87,3203394 277,0059158 0,31522915

434 214183_s_at TKTL1 434 67,05697902 26,06824719 2,572362404

435 219218_at BAHCC1 435 241,8785928 70,54449223 3,428738164

436 214208_at KLHL35 436 9,84969892 34,6065618 0,284619402

437 219360_s_at TRPM4 437 22,2462255 8,029596359 2,770528493

438 213474_at KCTD7 438 18,1145844 65,90770755 0,274847739

439 214073_at CTTN 439 6,909059922 23,352207 0,295863253

440 210550_s_at RASGRF1 440 4,651249578 15,96492653 0,291341747

441 204661_at CD52 441 421,8538252 1490,368782 0,283053316

442 215229_at LOC100129973 442 1,879865887 6,350893496 0,296000222

443 205730_s_at ABLIM3 443 4,290350355 14,44952868 0,296919744

444 209469_at GPM6A 444 4,418308303 13,47079877 0,32799156

445 211135_x_at LILRB3 445 104,7735173 358,2493216 0,292459779

446 219073_s_at OSBPL10 446 14,56733828 50,48107682 0,288570276

447 216831_s_at RUNX1T1 447 5,55785056 1,8521834 3,000702069

448 209892_at FUT4 448 488,1113339 142,4203247 3,427258959

449 205476_at CCL20 449 20,24927654 13,37704654 1,513732981

450 205987_at CD1C 450 33,44184112 104,0186473 0,32149852

451 211743_s_at PRG2 451 388,5653205 147,9242207 2,626786328

452 206394_at MYBPC2 452 3,726424062 11,84805956 0,314517668

453 207655_s_at BLNK 453 79,61767293 253,0620248 0,314617229

454 211429_s_at SERPINA1 454 602,5901045 2073,691338 0,290588138

455 213395_at MLC1 455 196,9332 54,84027081 3,591032596

456 215599_at GUSBP3 456 115,1078605 33,24731704 3,462169905

457 210459_at PSMD4 457 5,513675912 18,15317821 0,303730611

458 217152_at — 458 13,85537991 49,19475683 0,281643427

459 214195_at TPP1 459 9,963782387 34,87030505 0,28573832

460 220848_x_at OBP2A 460 5,149303189 17,61071259 0,292396072

461 203936_s_at MMP9 461 198,6648713 608,7010696 0,326375098

462 220110_s_at NXF3 462 29,29537075 9,418501985 3,110406602

463 206647_at HBZ 463 16,22106639 44,42459798 0,365137044

464 216598_s_at CCL2 464 17,15333473 27,46638387 0,624521044

465 215215_s_at LOC81691 465 67,00882524 19,02725285 3,521728847

466 205240_at GPSM2 466 47,6865216 14,18149453 3,362587877

467 210233_at IL1RAP 467 51,1505373 15,56395547 3,286474149

468 220485_s_at SIRPG 468 27,77533293 100,7989732 0,275551745

469 202728_s_at LTBP1 469 49,61464554 15,85609793 3,129057715

470 203131_at PDGFRA 470 7,612340927 18,83485909 0,404162351

471 213317_at CLIC5 471 9,585497195 29,5976418 0,323860166

472 213759_at — 472 6,452569733 21,92831689 0,29425741

473 205528_s_at RUNX1T1 473 22,96892059 7,761091583 2,959496141

474 209160_at AKR1C3 474 196,6497244 60,56969443 3,24666859

475 209560_s_at DLK1 475 76,04599909 28,4041981 2,677280267

476 213606_s_at ARHGDIA 476 100,5644113 29,04451502 3,462423497

477 217996_at PHLDA1 477 129,2736235 40,81200475 3,167539167

478 219839_x_at TCL6 478 3,376230181 10,9349598 0,30875561

479 AFFX-HUMRGE/M10098_5_at — 479 155,1917966 367,4382667 0,422361552

480 211794_at FYB 480 47,56925293 165,3015158 0,287772636

481 211005_at LAT /// SPNS1 481 74,56530095 254,7457726 0,292704763

482 32625_at NPR1 482 9,413226801 31,45345243 0,299274835

483 206001_at NPY 483 33,90019072 107,7172116 0,314714707

484 202555_s_at MYLK 484 17,6498803 56,54429021 0,312142574

485 202947_s_at GYPC 485 438,0492591 1473,404162 0,297304209

486 208502_s_at PITX1 486 6,475260259 2,891004469 2,239796004

487 206881_s_at LILRA3 487 53,01479537 157,2896643 0,337051996

488 217032_at FOXD4 /// FOXD4L1 488 4,435979139 14,07091796 0,315258688

489 220105_at RTDR1 489 2,35459886 7,523745178 0,31295569

490 206760_s_at FCER2 490 8,269909642 18,27351255 0,452562671

491 202083_s_at SEC14L1 491 50,24268811 178,2359714 0,281888598

492 218876_at TPPP3 492 28,96722244 9,533025961 3,038617806

493 218717_s_at LEPREL1 493 5,797160007 19,54504803 0,296605053

494 213241_at PLXNC1 494 137,3833808 474,617399 0,289461324

495 216442_x_at FN1 495 44,7763732 16,68457747 2,683698359

496 219093_at PID1 496 17,69716266 56,70471044 0,312093343

497 206500_s_at C14orf106 497 78,58177815 23,25790405 3,378712802

498 204724_s_at COL9A3 498 7,192744843 19,95750597 0,360402991

499 204187_at GMPR 499 82,32427111 262,0740554 0,314125986

500 214200_s_at COL6A1 500 5,661479341 18,40994903 0,307522815

501 221081_s_at DENND2D 501 67,74893896 244,401497 0,277203453

502 206762_at KCNA5 502 6,458960684 13,90921891 0,464365449

503 205837_s_at GYPA /// GYPB 503 45,09590857 16,67651741 2,704156237

504 203411_s_at LMNA 504 334,7553006 96,68020934 3,462500784

505 205900_at KRT1 505 70,31502319 192,2066359 0,365830362

506 208107_s_at LOC81691 506 91,60864249 26,65203461 3,437210098

507 217422_s_at CD22 507 13,092683 41,22798088 0,317567893

508 213558_at PCLO 508 18,52869849 49,81515017 0,371949064

509 AFFX-r2-Hs18SrRNA-5_at — 509 170,9156246 396,3749727 0,431196812

510 205768_s_at SLC27A2 510 69,33620639 21,02451925 3,297873571

511 210784_x_at LILRB3 511 112,3644722 363,2603716 0,309322131

512 216953_s_at WT1 512 23,62027137 7,256229692 3,2551714

513 222288_at — 513 16,56522555 6,898363806 2,401326752

514 205529_s_at RUNX1T1 514 60,82331353 21,10287 2,882229457

515 211010_s_at NCR3 515 9,280737065 31,33873876 0,296142647

516 201465_s_at JUN 516 245,8359286 84,29698208 2,916307589

517 206823_at L3MBTL 517 14,56830388 48,37964621 0,301124647

518 210548_at CCL23 518 50,97203163 15,97634998 3,190467892

519 214957_at ACTL8 519 8,586441992 28,71619083 0,29901048

520 210225_x_at LILRB3 520 107,2733578 344,2690325 0,311597465

521 202219_at SLC6A8 521 63,51307239 167,8602635 0,378368716

522 206258_at ST8SIA5 522 4,193388344 13,15678551 0,318724383

523 210487_at DNTT 523 136,3501608 348,2639946 0,391513803

524 204793_at GPRASP1 524 28,76781848 98,98179248 0,290637477

525 217646_at SURF1 525 3,398341487 11,58454026 0,293351433

526 206889_at PDIA2 526 6,995534807 23,2639881 0,300702303

527 206865_at HRK 527 4,284780741 13,15347348 0,325752794

528 220051_at PRSS21 528 127,828969 38,114854 3,353783514

529 208173_at IFNB1 529 14,78109024 6,924916205 2,134479293

530 209013_x_at TRIO 530 60,00903118 17,9124573 3,350128359

531 213419_at APBB2 531 3,948544808 11,26179798 0,350614068

532 221054_s_at TCL6 532 2,088266245 6,061255695 0,344527001

533 202391_at BASP1 533 465,5514334 1569,632379 0,296599025

534 216439_at TNK2 534 3,244107408 9,983822017 0,324936422

535 201743_at CD14 535 363,4036127 1079,531628 0,336630816

536 218614_at C12orf35 536 103,3335574 369,2586796 0,279840565

537 219332_at MICALL2 537 108,3058109 30,92565246 3,502135034

538 215184_at DAPK2 538 8,938536759 29,47403245 0,3032682

539 200706_s_at LITAF 539 268,9395082 961,1513014 0,279809753

540 210123_s_at CHRFAM7A /// CHRNA7 540 43,20985297 12,71264868 3,398965396

541 210693_at SPPL2B 541 34,74849862 11,04668857 3,145603173

542 215621_s_at IGHD 542 24,29191874 73,99951176 0,328271338

543 221011_s_at LBH 543 106,8910618 373,5487677 0,286150219

544 205898_at CX3CR1 544 290,80575 893,4310906 0,325493206

545 203382_s_at APOE 545 22,4631062 7,633104462 2,942853241

546 210215_at TFR2 546 110,6857898 36,2278831 3,055265181

547 211965_at ZFP36L1 547 33,5120891 108,2544284 0,309567836

548 218454_at PLBD1 548 376,9197365 1164,065785 0,323795907

549 219667_s_at BANK1 549 49,6639925 163,6574876 0,303463002

550 209890_at TSPAN5 550 56,71657358 176,9493755 0,320524294

551 222313_at — 551 47,22704184 15,0289847 3,14239736

552 203386_at TBC1D4 552 31,95159083 110,4209654 0,289361633

553 206023_at NMU 553 10,27969494 4,515081122 2,276746455

554 215116_s_at DNM1 554 107,4454296 31,12432687 3,452136653

555 202833_s_at SERPINA1 555 395,5598101 1204,491906 0,328403876

556 220448_at KCNK12 556 10,51237011 22,24892968 0,472488801

557 204655_at CCL5 557 203,9761701 673,5873305 0,302820675

558 214433_s_at SELENBP1 558 205,4009036 550,9771577 0,372793864

559 201163_s_at IGFBP7 559 436,8289333 129,6484475 3,369334087

560 205020_s_at ARL4A 560 333,8241393 100,3344959 3,327112339

561 201667_at GJA1 561 26,84940228 10,5567582 2,543337811

562 201951_at ALCAM 562 98,41207931 28,33146363 3,473596725

563 207900_at CCL17 563 5,395262864 14,21679318 0,379499286

564 204961_s_at NCF1 /// NCF1B /// NCF1C 564 227,2586246 703,0853062 0,323230514

565 220085_at HELLS 565 27,58974469 9,313601559 2,962306742

566 202086_at MX1 566 145,8156484 433,4498026 0,336407232

567 206707_x_at FAM65B 567 208,9449058 718,834561 0,290671758

568 214049_x_at CD7 568 57,34040344 150,5641058 0,38083714

569 221210_s_at NPL 569 43,84888418 137,6447879 0,318565525

570 205559_s_at PCSK5 570 13,67756241 47,57753162 0,287479446

571 217147_s_at TRAT1 571 24,10594227 74,64821006 0,322927264

572 209173_at AGR2 572 16,44547721 3,727531975 4,411894337

573 209870_s_at APBA2 573 39,47313158 131,8007164 0,299491025

574 218000_s_at PHLDA1 574 18,11620036 6,156224782 2,942745108

575 204915_s_at SOX11 575 8,008870153 19,88810888 0,402696415

576 209098_s_at JAG1 576 52,41753592 16,05357925 3,265161938

577 211062_s_at CPZ 577 4,993545378 15,3553649 0,325198744

578 215666_at HLA-DRB4 578 11,54287437 28,64390811 0,402978334

579 218788_s_at SMYD3 579 241,2241714 70,15763805 3,438316598

580 201393_s_at IGF2R 580 95,90477435 305,094262 0,314344733

581 212464_s_at FN1 581 28,33237471 12,24010107 2,314717383

582 204836_at GLDC 582 12,2735886 36,41385321 0,337058222

583 204914_s_at SOX11 583 9,743656564 17,28567813 0,56368379

584 205569_at LAMP3 584 7,533652404 23,56021919 0,319761558

585 205947_s_at VIPR2 585 3,856028319 11,72181071 0,328961832

586 213809_x_at — 586 6,924154548 22,70759828 0,304926768

587 203290_at HLA-DQA1 587 59,06591664 147,3332677 0,400900065

588 205551_at SV2B 588 31,02442444 10,38834724 2,986463941

589 210571_s_at CMAH 589 57,00508063 17,60843778 3,237372976

590 206237_s_at NRG1 590 4,193441279 13,14575131 0,318995939

591 211560_s_at ALAS2 591 449,6626641 1171,81268 0,383732547

592 216560_x_at IGL@ 592 16,20418794 32,65910164 0,496161472

593 205805_s_at ROR1 593 6,999813141 20,09972135 0,348254238

594 206589_at GFI1 594 171,5736844 53,47557389 3,208449613

595 213737_x_at LOC728498 595 1421,978175 409,5601867 3,471963881

596 220832_at TLR8 596 30,10469076 81,48279796 0,36946069

597 202207_at ARL4C 597 188,0838384 608,8454928 0,308918832

598 206025_s_at TNFAIP6 598 37,87001587 109,062072 0,3472336

599 209949_at NCF2 599 428,0947114 1394,03312 0,307090775

600 211791_s_at KCNAB2 600 70,2584383 21,02265115 3,342035112

601 215408_at — 601 5,061611061 15,48399674 0,326893059

602 220757_s_at UBXN6 602 78,62345214 253,0770162 0,310670061

603 38487_at STAB1 603 627,2784611 211,1342638 2,970993195

604 205769_at SLC27A2 604 85,24096679 28,03232222 3,040810038

605 206655_s_at GP1BB /// SEPT5 605 52,43712352 134,5859121 0,38961822

606 210370_s_at LY9 606 27,88002563 92,89604242 0,300120704

607 219971_at IL21R 607 11,13659814 36,49920802 0,305118898

608 221933_at NLGN4X 608 5,041460688 12,93262025 0,389825154

609 204446_s_at ALOX5 609 275,4781873 890,9996409 0,309178786

610 205863_at S100A12 610 408,92268 1180,59847 0,346368973

611 206584_at LY96 611 65,72672895 195,2368465 0,336651253

612 210254_at MS4A3 612 494,4628085 196,6079032 2,514969136

613 210794_s_at MEG3 613 53,90626914 24,46522773 2,203383093

614 211322_s_at SARDH 614 9,099867531 28,71755245 0,31687476

615 200965_s_at ABLIM1 615 139,7876868 428,2493685 0,326416563

616 206126_at CXCR5 616 17,05125613 54,53607654 0,312660118

617 210332_at LOC100287322 617 7,822669126 24,37482588 0,320932308

618 214223_at — 618 11,11240596 34,02430162 0,326602029

619 217002_s_at HTR3A 619 6,304007273 19,99209746 0,315324957

620 217007_s_at ADAM15 620 40,64499847 12,57143926 3,233122129

621 219541_at LIME1 621 31,325066 96,91941105 0,32320735

622 220139_at DNMT3L 622 4,900931854 15,28529644 0,320630475

623 220359_s_at ARPP21 623 8,167500742 17,12517946 0,476929352

624 220684_at TBX21 624 49,15091719 172,7943202 0,284447528

625 221908_at RNFT2 625 9,84315517 28,14751293 0,349698931

626 202007_at NID1 626 43,83762291 15,37301939 2,851594849

627 202074_s_at OPTN 627 100,9324187 300,4300815 0,335959762

628 204560_at FKBP5 628 90,55096559 32,97152091 2,746338752

629 206363_at MAF 629 19,28349034 52,38644688 0,368100749

630 206940_s_at POU4F1 630 162,354841 73,96280564 2,195087647

631 206983_at CCR6 631 13,09213821 43,87107396 0,298423016

632 207550_at MPL 632 66,92029938 23,30423292 2,87159417

633 210873_x_at APOBEC3A 633 89,05940092 247,2303209 0,360228473

634 211728_s_at HYAL3 634 50,42111505 15,88323173 3,174487152

635 212400_at FAM102A 635 79,22250974 272,917962 0,290279574

636 212526_at SPG20 636 170,4150398 49,90562301 3,414746266

637 212951_at GPR116 637 4,368074965 13,02488495 0,335363804

638 213217_at ADCY2 638 25,58959654 8,693213393 2,943629171

639 214639_s_at HOXA1 639 14,19516046 4,768993962 2,976552408

640 219511_s_at SNCAIP 640 21,91485939 9,387562922 2,334456724

641 201188_s_at ITPR3 641 21,00405595 66,65039799 0,315137742

642 201291_s_at TOP2A 642 113,3517316 41,65536344 2,721179755

643 201418_s_at SOX4 643 266,2805664 89,39503398 2,978695287

644 202411_at IFI27 644 67,01374124 132,2353661 0,506776237

645 203038_at PTPRK 645 18,86888439 53,44034439 0,353083136

646 203979_at CYP27A1 646 26,03933076 64,9624031 0,400836938

647 204011_at SPRY2 647 149,8074771 58,50625409 2,560537834

648 205268_s_at ADD2 648 74,82011176 29,71762923 2,517701234

649 205414_s_at RICH2 649 5,75002671 14,22008435 0,404359536

650 206338_at ELAVL3 650 6,189514338 18,93997948 0,326796254

651 206478_at KIAA0125 651 305,2692051 103,5767082 2,947276569

652 206759_at FCER2 652 19,31055036 45,29108946 0,426365331

653 206964_at NAT8B 653 13,39217244 43,83345397 0,305524006

654 207166_at GNGT1 654 2,651431532 6,160516593 0,430391103

655 208582_s_at DUX1 /// DUX3 /// DUX5 655 2,817130586 7,464028784 0,377427616

656 208605_s_at NTRK1 656 13,36495444 6,104661657 2,189303058

657 209116_x_at HBB 657 5151,46449 15690,4194 0,328319107

658 209318_x_at PLAGL1 658 339,5678397 98,19916762 3,457950286

659 209387_s_at TM4SF1 659 27,32845715 9,592380299 2,848975572

660 209969_s_at STAT1 660 77,05106756 227,0532379 0,339352428

661 211397_x_at KIR2DL2 661 10,7564824 34,88724019 0,30832139

662 211699_x_at HBA1 /// HBA2 662 5042,097129 16013,90625 0,314857415

663 212396_s_at KIAA0090 663 54,75826763 17,21615004 3,180633737

664 212589_at RRAS2 664 20,46611417 56,71154848 0,360880891

665 213069_at HEG1 665 12,77477924 36,8291775 0,346865722

666 213094_at GPR126 666 37,38863864 16,22611479 2,304226189

667 213354_s_at LOC100287051 667 2,193403893 5,890678695 0,372351643

668 214142_at ZG16 668 3,992833875 12,25666984 0,325768249

669 214920_at THSD7A 669 29,31285529 13,850589 2,116361643

670 215117_at RAG2 670 5,775528094 15,78789807 0,365819951

671 216485_s_at TPSAB1 671 7,580271498 22,89962185 0,331021689

672 217062_at LOC100287204 672 3,276983203 10,48640413 0,312498275

673 217394_at — 673 11,13730093 36,86690698 0,302094801

674 217523_at CD44 674 265,8406832 86,63753958 3,06842374

675 219463_at C20orf103 675 106,8746644 57,17508763 1,869252306

676 220751_s_at C5orf4 676 53,36359127 126,6530725 0,421336729

677 221920_s_at SLC25A37 677 188,720358 552,4597632 0,341600186

678 222139_at KIAA1466 678 20,74906306 67,17899751 0,30886235

679 222211_x_at SCAND2 679 7,051228533 21,42200102 0,329158258

680 34210_at CD52 680 329,6937304 1033,462373 0,31901861
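
As a plausibility check on the rows above: in each row sampled, the final value equals the quotient of the two values that precede it (for example, 46,70628007 / 10,05645496 ≈ 4,644408019 for 212097_at). The following Python sketch is an illustration only and not part of the original disclosure. It parses one row, converts the decimal commas, and tests that relationship. The field names (`value_a`, `value_b`, `ratio`) are hypothetical labels, since the column headings are not reproduced in this part of the table.

```python
def parse_row(line):
    """Split one table row into fields (assumed layout, see lead-in).

    Layout assumed: index, probe set ID, gene symbol(s), a second index,
    then three decimal-comma numbers.
    """
    parts = line.split()
    # The last three tokens are numeric and use a decimal comma.
    value_a, value_b, ratio = (float(p.replace(",", ".")) for p in parts[-3:])
    return {
        "index": int(parts[0]),
        "probe_set": parts[1],
        # Gene symbols may span several tokens, e.g. "ORM1 /// ORM2".
        "gene": " ".join(parts[2:-4]),
        "second_index": int(parts[-4]),
        "value_a": value_a,
        "value_b": value_b,
        "ratio": ratio,
    }


def ratio_matches(row, rel_tol=1e-6):
    """Return True if the last column equals value_a / value_b within rel_tol."""
    computed = row["value_a"] / row["value_b"]
    return abs(computed - row["ratio"]) <= rel_tol * abs(row["ratio"])


row = parse_row("245 212097_at CAV1 252 46,70628007 10,05645496 4,644408019")
print(row["probe_set"], row["gene"], ratio_matches(row))
```

A row that fails this check would be a candidate for a transcription error in one of its three numeric fields.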

Table 3

Example 1 Example 2 Example 3 Example 4 Example 5

201242_s_at 201242_s_at 201427_s_at 200935_at 201058_s_at

201427_s_at 201427_s_at 202761_s_at 201189_s_at 201242_s_at

201601_x_at 201506_at 203434_s_at 201242_s_at 201427_s_at

202075_s_at 202718_at 203435_s_at 201427_s_at 202718_at

202718_at 202890_at 203691_at 201506_at 203066_at

202890_at 203066_at 203828_s_at 202075_s_at 203434_s_at

203066_at 203434_s_at 203948_s_at 202718_at 203435_s_at

203434_s_at 203435_s_at 203949_at 203066_at 203691_at

203435_s_at 203691_at 204006_s_at 203434_s_at 203828_s_at

203691_at 203828_s_at 204007_at 203435_s_at 203948_s_at

203828_s_at 203948_s_at 204115_at 203485_at 203949_at

203948_s_at 203949_at 204468_s_at 203691_at 204006_s_at

203949_at 204006_s_at 204548_at 203828_s_at 204007_at

204006_s_at 204007_at 204561_x_at 203948_s_at 204115_at

204007_at 204115_at 204698_at 203949_at 204548_at

204115_at 204468_s_at 204885_s_at 204006_s_at 204561_x_at

204468_s_at 204548_at 204890_s_at 204007_at 204698_at

204548_at 204561_x_at 204891_s_at 204115_at 204885_s_at

204561_x_at 204698_at 205051_s_at 204468_s_at 204890_s_at

204581_at 204885_s_at 205119_s_at 204548_at 204891_s_at

204698_at 204890_s_at 205131_x_at 204561_x_at 205051_s_at

204885_s_at 204891_s_at 205254_x_at 204581_at 205119_s_at

204890_s_at 205051_s_at 205267_at 204698_at 205131_x_at

204891_s_at 205119_s_at 205366_s_at 204885_s_at 205174_s_at

205051_s_at 205131_x_at 205484_at 204890_s_at 205254_x_at

205119_s_at 205239_at 205488_at 204891_s_at 205267_at

205131_x_at 205254_x_at 205495_s_at 205051_s_at 205366_s_at

205254_x_at 205267_at 205568_at 205119_s_at 205484_at

205267_at 205366_s_at 205590_at 205131_x_at 205488_at

205366_s_at 205484_at 205609_at 205174_s_at 205495_s_at

205484_at 205488_at 205624_at 205254_x_at 205568_at

205488_at 205495_s_at 205683_x_at 205267_at 205624_at

205495_s_at 205568_at 205798_at 205366_s_at 205683_x_at

205568_at 205590_at 205831_at 205403_at 205798_at

205590_at 205609_at 205922_at 205484_at 205831_at

205609_at 205624_at 206067_s_at 205488_at 205922_at

205624_at 205683_x_at 206150_at 205495_s_at 205931_s_at

205683_x_at 205798_at 206222_at 205568_at 206067_s_at

205798_at 205831_at 206255_at 205590_at 206135_at

205831_at 205922_at 206310_at 205609_at 206150_at

205922_at 206067_s_at 206413_s_at 205624_at 206222_at

206067_s_at 206135_at 206515_at 205653_at 206310_at

206135_at 206150_at 206591_at 205683_x_at 206413_s_at

206150_at 206222_at 206622_at 205798_at 206515_at

206222_at 206310_at 206666_at 205831_at 206591_at

206255_at 206413_s_at 206785_s_at 205899_at 206622_at

206310_at 206515_at 206804_at 205922_at 206666_at

206413_s_at 206591_at 206871_at 205931_s_at 206765_at

206515_at 206622_at 207008_at 206067_s_at 206804_at

206591_at 206666_at 207094_at 206135_at 207008_at

206622_at 206674_at 207134_x_at 206150_at 207094_at

206666_at 206765_at 207339_s_at 206222_at 207134_x_at

206804_at 206785_s_at 207741_x_at 206255_at 207339_s_at

206871_at 206804_at 207815_at 206310_at 207651_at

207008_at 206871_at 207826_s_at 206413_s_at 207741_x_at

207094_at 207008_at 207850_at 206480_at 207815_at

207134_x_at 207094_at 207907_at 206515_at 207826_s_at

207339_s_at 207134_x_at 209395_at 206591_at 207907_at

207651_at 207339_s_at 209396_s_at 206622_at 208304_at

207741_x_at 207741_x_at 209488_s_at 206666_at 208406_s_at

207815_at 207815_at 209670_at 206674_at 209488_s_at

207826_s_at 207826_s_at 209671_x_at 206765_at 209670_at

207850_at 207907_at 209757_s_at 206804_at 209671_x_at

207907_at 208406_s_at 209905_at 206871_at 209757_s_at

208406_s_at 209395_at 209995_s_at 207008_at 209905_at

209488_s_at 209396_s_at 210031_at 207094_at 209995_s_at

209670_at 209488_s_at 210084_x_at 207134_x_at 210031_at

209671_x_at 209670_at 210119_at 207339_s_at 210084_x_at

209757_s_at 209671_x_at 210164_at 207341_at 210119_at

209905_at 209757_s_at 210321_at 207651_at 210164_at

209995_s_at 209905_at 210439_at 207741_x_at 210321_at

210031_at 209995_s_at 210484_s_at 207815_at 210484_s_at

210084_x_at 210031_at 210549_s_at 207826_s_at 210549_s_at

210119_at 210084_x_at 210783_x_at 207907_at 210724_at

210164_at 210119_at 210915_x_at 208304_at 210783_x_at

210321_at 210164_at 210972_x_at 208406_s_at 210915_x_at

210439_at 210321_at 210997_at 209395_at 210972_x_at

210484_s_at 210356_x_at 210998_s_at 209396_s_at 210997_at

210549_s_at 210484_s_at 211163_s_at 209488_s_at 210998_s_at

210783_x_at 210549_s_at 211339_s_at 209602_s_at 211163_s_at

210915_x_at 210724_at 211796_s_at 209670_at 211339_s_at

210972_x_at 210783_x_at 211902_x_at 209671_x_at 211709_s_at

210997_at 210915_x_at 212775_at 209757_s_at 211796_s_at

210998_s_at 210972_x_at 212776_s_at 209905_at 211902_x_at

211163_s_at 210997_at 213110_s_at 209995_s_at 212775_at

211339_s_at 211163_s_at 213150_at 210031_at 213110_s_at

211709_s_at 211339_s_at 213193_x_at 210084_x_at 213150_at

211796_s_at 211709_s_at 213844_at 210119_at 213193_x_at

211902_x_at 211796_s_at 213906_at 210164_at 213611_at

212775_at 211902_x_at 214470_at 210321_at 213844_at

212776_s_at 212775_at 214567_s_at 210439_at 214470_at

213110_s_at 212776_s_at 214575_s_at 210484_s_at 214575_s_at

213150_at 212914_at 214651_s_at 210549_s_at 214651_s_at

213193_x_at 213110_s_at 215382_x_at 210724_at 215382_x_at

213258_at 213150_at 216474_x_at 210783_x_at 216474_x_at

213844_at 213193_x_at 216782_at 210915_x_at 216565_x_at

213958_at 213258_at 217023_x_at 210972_x_at 216782_at

214470_at 213844_at 217418_x_at 210997_at 217023_x_at

214567_s_at 214470_at 217572_at 210998_s_at 217418_x_at

214575_s_at 214575_s_at 219243_at 211163_s_at 218805_at

214651_s_at 214651_s_at 219529_at 211339_s_at 218963_s_at

215382_x_at 215382_x_at 219630_at 211709_s_at 219243_at

216474_x_at 216474_x_at 220010_at 211796_s_at 219630_at

216782_at 216782_at 220068_at 211902_x_at 220005_at

217023_x_at 217023_x_at 220416_at 212775_at 220010_at

217418_x_at 217418_x_at 220418_at 212776_s_at 220068_at

218963_s_at 217572_at 220807_at 213110_s_at 220416_at

219243_at 218805_at 221345_at 213150_at 220418_at

219529_at 218963_s_at 221349_at 213193_x_at 220744_s_at

219630_at 219054_at 221558_s_at 213258_at 220807_at

220010_at 219243_at 221601_s_at 213830_at 221211_s_at

220068_at 219630_at 221602_s_at 213844_at 221345_at

220416_at 219837_s_at 222222_s_at 214470_at 221558_s_at

220418_at 220005_at 222285_at 214567_s_at 221602_s_at

220646_s_at 220010_at 37145_at 214575_s_at 221958_s_at

220807_at 220068_at 39318_at 214651_s_at 222222_s_at

221345_at 220377_at 41469_at 215382_x_at 222285_at

221349_at 220416_at 44790_s_at 216474_x_at 37145_at

221558_s_at 220418_at AFFX-BioB-3_at 216565_x_at 39318_at

221602_s_at 220646_s_at AFFX-r2-Bs-dap-5_at 216782_at 41469_at

221958_s_at 220807_at AFFX-r2-Bs-dap-M_at 217023_x_at AFFX-r2-Bs-dap-5_at

222222_s_at 221211_s_at 217418_x_at AFFX-r2-Bs-dap-M_at

222285_at 221345_at 218963_s_at

37145_at 221558_s_at 219054_at

39318_at 221602_s_at 219243_at

41469_at 221958_s_at 219630_at

AFFX-r2-Bs-dap-5_at 222285_at 219837_s_at

AFFX-r2-Bs-dap-M_at 37145_at 220005_at

39318_at 220010_at

41469_at 220057_at

44790_s_at 220068_at

AFFX-r2-Bs-dap-5_at 220416_at

AFFX-r2-Bs-dap-M_at 220418_at

220744_s_at

221211_s_at

221345_at

221349_at

221558_s_at

221602_s_at

221958_s_at

222222_s_at

222285_at

37145_at

39318_at

41469_at

44790_s_at

AFFX-BioB-3_at

AFFX-r2-Bs-dap-5_at

AFFX-r2-Bs-dap-M_at

Table 4

Example 6 Example 7 Example 8 Example 9 Example 10

201189_s_at 201189_s_at 201427_s_at 201058_s_at 200935_at

201427_s_at 201242_s_at 201506_at 201242_s_at 201058_s_at

202718_at 201427_s_at 203066_at 201427_s_at 201242_s_at

203066_at 201506_at 203434_s_at 202718_at 201427_s_at

203434_s_at 202890_at 203435_s_at 203066_at 201601_x_at

203435_s_at 203066_at 203691_at 203434_s_at 202718_at

203691_at 203434_s_at 203828_s_at 203435_s_at 203066_at

203828_s_at 203435_s_at 203948_s_at 203691_at 203434_s_at

203948_s_at 203691_at 203949_at 203828_s_at 203435_s_at

203949_at 203828_s_at 204006_s_at 203948_s_at 203691_at

204006_s_at 203948_s_at 204007_at 203949_at 203828_s_at

204007_at 203949_at 204115_at 204006_s_at 203948_s_at

204115_at 204006_s_at 204468_s_at 204007_at 203949_at

204468_s_at 204007_at 204548_at 204115_at 204006_s_at

204548_at 204115_at 204561_x_at 204468_s_at 204007_at

204561_x_at 204468_s_at 204698_at 204548_at 204115_at

204581_at 204548_at 204885_s_at 204561_x_at 204468_s_at

204698_at 204561_x_at 204891_s_at 204698_at 204548_at

204885_s_at 204581_at 205051_s_at 204885_s_at 204561_x_at

204890_s_at 204698_at 205119_s_at 204890_s_at 204581_at

204891_s_at 204885_s_at 205131_x_at 204891_s_at 204698_at

205051_s_at 204890_s_at 205174_s_at 205051_s_at 204885_s_at

205119_s_at 204891_s_at 205267_at 205119_s_at 204890_s_at

205131_x_at 205051_s_at 205366_s_at 205131_x_at 204891_s_at

205174_s_at 205119_s_at 205484_at 205254_x_at 205051_s_at

205254_x_at 205131_x_at 205488_at 205267_at 205119_s_at

205267_at 205174_s_at 205495_s_at 205366_s_at 205131_x_at

205366_s_at 205254_x_at 205568_at 205484_at 205254_x_at

205484_at 205267_at 205609_at 205488_at 205267_at

205488_at 205366_s_at 205624_at 205495_s_at 205366_s_at

205495_s_at 205484_at 205683_x_at 205568_at 205484_at

205568_at 205488_at 205798_at 205609_at 205488_at

205590_at 205495_s_at 205922_at 205624_at 205495_s_at

205624_at 205568_at 205931_s_at 205683_x_at 205568_at

205683_x_at 205590_at 206067_s_at 205798_at 205590_at

205798_at 205609_at 206135_at 205899_at 205609_at

205831_at 205624_at 206150_at 205922_at 205624_at

205922_at 205683_x_at 206222_at 206067_s_at 205653_at

206150_at 205798_at 206310_at 206135_at 205683_x_at

206255_at 205831_at 206413_s_at 206150_at 205798_at

206310_at 205899_at 206515_at 206222_at 205831_at

206398_s_at 205922_at 206591_at 206255_at 205922_at

206413_s_at 205931_s_at 206622_at 206310_at 206067_s_at

206515_at 206067_s_at 206666_at 206413_s_at 206135_at

206591_at 206135_at 206674_at 206480_at 206150_at

206622_at 206150_at 206804_at 206591_at 206222_at

206666_at 206222_at 206871_at 206622_at 206255_at

206804_at 206255_at 207008_at 206666_at 206310_at

206871_at 206310_at 207094_at 206804_at 206413_s_at

207008_at 206413_s_at 207134_x_at 206871_at 206591_at

207094_at 206515_at 207339_s_at 207008_at 206622_at

207134_x_at 206591_at 207341_at 207094_at 206666_at

207339_s_at 206622_at 207741_x_at 207134_x_at 206674_at

207741_x_at 206666_at 207815_at 207339_s_at 206804_at

207815_at 206674_at 207826_s_at 207741_x_at 206871_at

207826_s_at 206765_at 207907_at 207815_at 207008_at

207907_at 206804_at 208406_s_at 207826_s_at 207094_at

208406_s_at 207008_at 209395_at 207850_at 207134_x_at

209101_at 207094_at 209396_s_at 207907_at 207339_s_at

209395_at 207134_x_at 209488_s_at 208406_s_at 207651_at

209488_s_at 207339_s_at 209670_at 209488_s_at 207741_x_at

209670_at 207651_at 209671_x_at 209670_at 207815_at

209671_x_at 207741_x_at 209757_s_at 209671_x_at 207826_s_at

209757_s_at 207815_at 209905_at 209757_s_at 207907_at

209905_at 207826_s_at 209995_s_at 209905_at 208304_at

209995_s_at 207907_at 210031_at 209995_s_at 208406_s_at

210031_at 208304_at 210084_x_at 210031_at 209395_at

210084_x_at 208406_s_at 210119_at 210084_x_at 209396_s_at

210119_at 209488_s_at 210164_at 210119_at 209488_s_at

210164_at 209670_at 210321_at 210164_at 209670_at

210321_at 209671_x_at 210439_at 210321_at 209671_x_at

210439_at 209757_s_at 210484_s_at 210439_at 209757_s_at

210484_s_at 209905_at 210549_s_at 210484_s_at 209905_at

210549_s_at 209995_s_at 210724_at 210549_s_at 209995_s_at

210783_x_at 210031_at 210783_x_at 210724_at 210031_at

210915_x_at 210084_x_at 210915_x_at 210783_x_at 210084_x_at

210972_x_at 210119_at 210972_x_at 210915_x_at 210119_at

210997_at 210164_at 210997_at 210972_x_at 210164_at

210998_s_at 210321_at 210998_s_at 210997_at 210321_at

211163_s_at 210484_s_at 211163_s_at 210998_s_at 210439_at

211339_s_at 210549_s_at 211339_s_at 211163_s_at 210484_s_at

211796_s_at 210724_at 211709_s_at 211339_s_at 210549_s_at

211902_x_at 210783_x_at 211796_s_at 211709_s_at 210724_at

212775_at 210915_x_at 211902_x_at 211796_s_at 210783_x_at

213110_s_at 210972_x_at 212775_at 211902_x_at 210915_x_at

213150_at 210997_at 212776_s_at 212775_at 210972_x_at

213193_x_at 210998_s_at 213110_s_at 212776_s_at 210997_at

213258_at 211163_s_at 213150_at 213110_s_at 210998_s_at

213830_at 211339_s_at 213193_x_at 213150_at 211163_s_at

213844_at 211796_s_at 213258_at 213193_x_at 211339_s_at

214575_s_at 211902_x_at 213844_at 213258_at 211709_s_at

214651_s_at 212775_at 214470_at 213844_at 211796_s_at

215382_x_at 212776_s_at 214567_s_at 214470_at 211902_x_at

216191_s_at 212914_at 214575_s_at 214567_s_at 212775_at

216474_x_at 213110_s_at 214651_s_at 214575_s_at 212776_s_at

216782_at 213150_at 215382_x_at 214651_s_at 213110_s_at

217023_x_at 213193_x_at 216474_x_at 215382_x_at 213150_at

217418_x_at 213258_at 216565_x_at 216474_x_at 213193_x_at

217572_at 213844_at 216782_at 216565_x_at 213258_at

218963_s_at 213906_at 217023_x_at 216782_at 213844_at

219243_at 214470_at 217418_x_at 217023_x_at 214022_s_at

219529_at 214567_s_at 217572_at 217418_x_at 214470_at

219630_at 214575_s_at 218963_s_at 219054_at 214567_s_at

220010_at 214651_s_at 219243_at 219243_at 214575_s_at

220068_at 215382_x_at 219529_at 219529_at 214651_s_at

220416_at 216474_x_at 219630_at 219630_at 215382_x_at

220418_at 216565_x_at 219737_s_at 219737_s_at 216474_x_at

220807_at 216782_at 219837_s_at 220005_at 216565_x_at

221234_s_at 217023_x_at 220005_at 220010_at 216782_at

221345_at 217418_x_at 220010_at 220068_at 217023_x_at

221349_at 217572_at 220068_at 220416_at 217418_x_at

221558_s_at 218805_at 220377_at 220418_at 217572_at

221601_s_at 218963_s_at 220416_at 220646_s_at 218805_at

221602_s_at 219054_at 220418_at 220744_s_at 218963_s_at

221958_s_at 219243_at 220807_at 220807_at 219054_at

222222_s_at 219630_at 221211_s_at 221345_at 219243_at

222285_at 220005_at 221345_at 221349_at 219630_at

37145_at 220010_at 221558_s_at 221558_s_at 220005_at

39318_at 220068_at 221602_s_at 221958_s_at 220010_at

41469_at 220377_at 222285_at 222222_s_at 220068_at

44790_s_at 220416_at 37145_at 222285_at 220416_at

AFFX-r2-Bs-dap-5_at 220418_at 39318_at 37145_at 220418_at

AFFX-r2-Bs-dap-M_at 220744_s_at 41469_at 39318_at 220807_at

220807_at 44790_s_at 41469_at 221345_at

221211_s_at AFFX-r2-Bs-dap-5_at 44790_s_at 221349_at

221345_at AFFX-r2-Bs-dap-M_at AFFX-r2-Bs-dap-5_at 221558_s_at

221349_at AFFX-r2-Bs-dap-M_at 221601_s_at

221558_s_at 221602_s_at

221602_s_at 222285_at

221958_s_at 37145_at

222285_at 39318_at

37145_at 41469_at

39318_at AFFX-r2-Bs-dap-5_at

41469_at AFFX-r2-Bs-dap-M_at

44790_s_at

AFFX-r2-Bs-dap-5_at

Table 5

Example 11 Example 12 Example 13 Example 14 Example 15

AUC VS1 0.9591275 0.9591275 0.9591275 0.9591275 0.9591275

AUC VS2 0.8669514 0.8669514 0.8669514 0.8669514 0.8669514

203434_s_at 203434_s_at 203434_s_at 203434_s_at 203434_s_at

203435_s_at 203435_s_at 203435_s_at 203435_s_at 203435_s_at

204007_at 204007_at 204007_at 204007_at 204007_at

207008_at 207008_at 207008_at 207008_at 207008_at

207094_at 207094_at 207094_at 207094_at

210084_x_at 210084_x_at 210084_x_at 210084_x_at

211163_s_at 211163_s_at 211163_s_at

217023_x_at 217023_x_at 217023_x_at

209905_at 209905_at

203691_at 203691_at

216782_at 204006_s_at
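
The AUC rows above can be read as the probability that a randomly chosen AML sample receives a higher classifier score than a randomly chosen control sample. The following is a minimal sketch of that calculation, assuming per-sample classifier scores are available. The function name `auc` and the toy scores are hypothetical and are not taken from the validation sets VS1 or VS2.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve by pairwise comparison (Mann-Whitney U).

    Returns the fraction of (case, control) pairs in which the case has the
    higher score; a tie counts as half a pair.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical classifier scores, for illustration only.
aml_scores = [0.9, 0.8, 0.7, 0.4]
healthy_scores = [0.6, 0.3, 0.2, 0.1]
print(auc(aml_scores, healthy_scores))
```

An AUC of 1.0 would mean every AML sample outscores every control, and 0.5 corresponds to chance-level separation.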

Claims

1. A method for the detection of Acute Myeloid Leukemia (AML) in a human subject based on RNA from a blood sample obtained from said subject, comprising:

measuring the abundance of at least 4 RNAs in the sample, wherein the at least 4 RNAs are chosen from the RNAs listed in Table 2, and

concluding based on the measured abundance whether the subject has AML.

2. The method of claim 1, wherein

(i) the abundance of the first 4 RNAs listed in Table 2 is measured;

(ii) the abundance of at least 6 RNAs listed in Table 2, preferably of the first 6 RNAs listed in Table 2, is measured;

(iii) the abundance of at least 8 RNAs listed in Table 2, preferably of the first 8 RNAs listed in Table 2, is measured;

(iv) the abundance of at least 12 RNAs listed in Table 2, preferably of the first 12 RNAs listed in Table 2, is measured;

(v) the abundance of at least 14 RNAs listed in Table 2, preferably of the first 14 RNAs listed in Table 2, is measured.

3. The method of claim 1 or 2, wherein concluding comprises classifying the sample as being from a healthy individual or from an individual having AML based on the specific difference of the abundance of the at least 4 RNAs in healthy individuals versus the abundance of the at least 4 RNAs in individuals with AML.

4. The method of any one of claims 1 to 3, wherein concluding whether the subject has AML comprises the step of training a classification algorithm on a training set of cases and controls, and applying it to the measured RNA abundance.

5. The method of claim 3 or 4, wherein the classifying is achieved by applying a random forest method, a support vector machine (SVM), a K-nearest neighbor method (K-NN) such as a 3-nearest neighbor method (3-NN), linear discriminant analysis (LDA), or prediction analysis for microarrays (PAM).

6. The method of any one of claims 1 to 5, wherein the measuring of RNA abundance is performed using a microarray, a real-time polymerase chain reaction, or sequencing.

7. The method of any one of claims 1 to 6, wherein the measuring of the abundance is performed through hybridization with probes for determining the abundance of the at least 4 RNAs of Table 2, wherein said probes preferably comprise 15 to 150, most preferably 30 to 70, consecutive nucleotides with a reverse complementary sequence to the at least 4 RNAs whose abundance is to be determined.

8. The method of any one of claims 1 to 7, wherein

(i) the RNA of the sample to be determined is mRNA, cDNA, micro RNA, small nuclear RNA, unspliced RNA, or its fragments; and/or

(ii) the abundance of the RNAs in the sample is increased or decreased as shown in Table 2.

9. A microarray, comprising a solid support and a set of oligonucleotide probes, the set containing from 4 to about 3,000 probes, and including at least 4 probes for detecting an RNA selected from Table 2.

10. The microarray of claim 9, wherein the probes comprise 15 to 150, preferably 30 to 70 consecutive nucleotides with a reverse complementary sequence to the at least 4 RNAs.

11. Use of a microarray of claim 9 or 10 for detection of AML in a human subject based on RNA from a blood sample, comprising measuring the abundance of at least 4 RNAs listed in Table 2.

12. A kit for the detection of AML in a human subject based on RNA obtained from a blood sample, comprising means for measuring the abundance of at least 4 RNAs that are chosen from the RNAs listed in Table 2, preferably comprising means for exclusively measuring the abundance of RNAs that are chosen from Table 2.

13. The kit of claim 12, which comprises

probes comprising 15 to 150, preferably 30 to 70 consecutive nucleotides with a reverse complementary sequence to the at least 4 RNAs whose abundance is to be determined, or

a microarray comprising probes with a reverse complementary sequence to the at least 4 RNAs whose abundance is to be determined.

14. Use of a kit of claim 12 or 13 for the detection of AML in a human subject based on RNA from a blood sample, comprising means for measuring the abundance of at least 4 RNAs that are chosen from the RNAs listed in Table 2, comprising measuring the abundance of at least 4 RNAs in a blood sample from a human subject, wherein the at least 4 RNAs are chosen from the RNAs listed in Table 2, and

concluding based on the measured abundance whether the subject has AML.

15. A method for preparing an RNA expression profile that is indicative of the presence or absence of AML in a subject, comprising:

isolating RNA from a blood sample obtained from the subject, and

determining the abundance of from 4 to about 3,000 RNAs, including at least 4 RNAs selected from Table 2.
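
The classification step recited in claims 4 and 5 (training on a set of cases and controls, then applying the classifier to measured RNA abundance) can be sketched as follows. This is a minimal illustration of a 3-nearest-neighbor vote under stated assumptions, not the implementation used in the examples. The function name `knn_classify`, the choice of Euclidean distance, and all abundance values are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train_profiles, train_labels, profile, k=3):
    """Label one expression profile by majority vote among its k nearest
    training profiles, using Euclidean distance (an assumed metric)."""
    if len(train_profiles) != len(train_labels):
        raise ValueError("profiles and labels must have the same length")
    if not 1 <= k <= len(train_profiles):
        raise ValueError("k must be between 1 and the training set size")
    # Rank training samples by their distance to the query profile.
    ranked = sorted(
        zip(train_profiles, train_labels),
        key=lambda pair: math.dist(pair[0], profile),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical log2 abundances for 4 RNAs; not data from the tables.
training_profiles = [
    (9.1, 8.7, 3.2, 2.9),  # AML
    (8.8, 9.0, 3.5, 3.1),  # AML
    (9.3, 8.5, 2.8, 3.3),  # AML
    (4.2, 4.0, 7.9, 8.1),  # healthy
    (3.9, 4.4, 8.2, 7.7),  # healthy
    (4.5, 4.1, 7.6, 8.0),  # healthy
]
training_labels = ["AML", "AML", "AML", "healthy", "healthy", "healthy"]

print(knn_classify(training_profiles, training_labels, (8.9, 8.8, 3.0, 3.0)))
```

With two classes and an odd k, the vote cannot tie, which is one practical reason a 3-NN setting is convenient for a case-versus-control decision.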