MX2008003932A

MX2008003932A - Methods and materials for identifying the origin of a carcinoma of unknown primary origin

Info

Publication number: MX2008003932A
Application number: MXMX/A/2008/003932A
Authority: MX
Inventors: Yixin Wang; Abhijit Mazumder; Timothy Jatkoe; Dmitri Talantov; Jonathan Baden
Original assignee: Veridex Llc
Priority date: 2005-09-19
Filing date: 2008-03-19
Publication date: 2008-09-02

Abstract

The present invention provides a method of identifying the origin of a metastasis of unknown origin by obtaining a sample containing metastatic cells; measuring biomarkers associated with at least two different carcinomas; combining the data from the biomarkers in an algorithm, where the algorithm normalizes the biomarkers against a reference, imposes a cut-off that optimizes the sensitivity and specificity of each biomarker, weights the prevalence of the carcinomas, and selects a tissue of origin; determining the origin based on the highest probability determined by the algorithm, or determining that the carcinoma is not derived from a particular set of carcinomas; and optionally measuring biomarkers specific for one or more additional different carcinomas and repeating the steps for the additional biomarkers.

Description

METHODS AND MATERIALS TO IDENTIFY THE ORIGIN OF A CARCINOMA OF UNKNOWN PRIMARY ORIGIN

TECHNICAL FIELD

This invention provides materials, methods, algorithms, kits, etc., to identify the origin of a carcinoma of unknown primary origin.

BACKGROUND OF THE INVENTION

Carcinoma of unknown primary origin (CUP) is a heterogeneous set of biopsy-confirmed malignancies in which patients present with metastatic disease but no primary tumor or tissue of origin (ToO) can be identified. CUP accounts for approximately 3-5% of all cancers, making it the seventh most common malignancy. Ghosh et al. (2005); and Mintzer et al. (2004). Patient prognosis and therapeutic regimen depend on the origin of the primary tumor, underscoring the need to identify the primary tumor site. Greco et al. (2004); Lembersky et al. (1996); and Schlag et al. (1994). A variety of methods are currently used to address this problem, several of which are diagrammed in Figures 1-2. Serum tumor markers can be used for differential diagnosis.

Although serum tumor markers lack adequate specificity, they can be used in combination with pathological and clinical information. Ghosh et al. (2005). Immunohistochemistry (IHC) can be used to identify tumor lineage, but very few IHC markers are 100% specific. Pathologists therefore often use a panel of IHC markers; several studies have reported accuracies of 66-88% using four to 14 IHC markers. Brown et al. (1997); DeYoung et al. (2000); and Dennis et al. (2005a). More expensive diagnostic procedures include imaging methods such as chest x-rays, computed tomography (CT) scans, and positron emission tomography (PET) scans. Each of these methods can identify the primary site in 30 to 50% of cases. Ghosh et al. (2005); and Pavlidis et al. (2003). Despite these sophisticated technologies, only 20-30% of CUP cases are resolved ante mortem. Pavlidis et al. (2003); and Varadhachary et al. (2004). A promising new approach is based on the ability of genome-wide gene expression profiling to identify the origin of tumors. Ma et al. (2006); Dennis et al. (2005b); Su et al. (2001); Ramaswamy et al. (2001); Bloom et al. (2004); Giordano et al. (2001); and 20060094035. These studies demonstrated the feasibility of identifying the tissue of origin from gene expression profiles. For these expression profiling technologies to be useful in the clinical setting, two major obstacles must be overcome. First, since gene expression profiling was performed entirely on primary tissues, candidate marker genes must be validated on metastatic tissues to confirm that their tissue-specific expression is conserved in the metastasis. Second, the gene expression profiling technology must be able to use formalin-fixed paraffin-embedded (FFPE) tissue, since fixed tissue samples are the standard material in current practice. Formalin fixation results in RNA degradation (Lewis et al. (2001); and Masuda et al. (1999)), so existing microarray protocols do not perform as reliably. Bibikova et al. (2004). In addition, the profiling technology must be robust, reproducible and easily accessible. Quantitative RT-PCR (qRTPCR) has been shown to generate reliable results from FFPE tissue. Abrahamsen et al. (2003); Specht et al. (2001); Godfrey et al. (2000); and Cronin et al. (2004). Therefore, a more practical approach is to use a genome-wide method as a discovery tool and to develop a diagnostic test based on a more robust technology. Ramaswamy (2004). This paradigm, however, requires identifying a smaller set of genes. Oien et al. used serial analysis of gene expression (SAGE) to identify 61 tumor markers, from which they developed an RT-PCR method based on eleven genes for five tumor types. Dennis et al. (2002). Another study that coupled SAGE and qRTPCR developed a panel of five genes for four tumor types and achieved an accuracy of 81%. Buckhaults et al. (2003). A more recent study coupled microarray profiling with qRTPCR but used 79 markers. Tothill et al. (2005).

BRIEF DESCRIPTION OF THE INVENTION

The present invention provides a method of identifying the origin of a metastasis of unknown origin by obtaining a sample containing metastatic cells; measuring biomarkers associated with at least two different carcinomas; combining the data from the biomarkers in an algorithm, where the algorithm normalizes the biomarkers against a reference, imposes a cut-off that optimizes the sensitivity and specificity of each biomarker, weights the prevalence of the carcinomas, and selects a tissue of origin; determining the origin based on the highest probability determined by the algorithm, or determining that the carcinoma is not derived from a particular set of carcinomas; and optionally measuring specific biomarkers for one or more additional different carcinomas and repeating the steps as necessary for the additional biomarkers.
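The claimed steps can be illustrated with a minimal Python sketch. It is not the patented algorithm: the marker-to-tissue map, the delta-Ct cut-offs, the prevalence weights and the 0.5 "not in panel" threshold are all hypothetical; the cut-off helper assumes Youden's J (sensitivity + specificity - 1) as the quantity being optimized; and the prevalence-weighted vote, normalized to sum to one, is a crude score standing in for the probabilistic classifier described later in the specification.

```python
# Hypothetical marker-to-tissue map, delta-Ct cut-offs and prevalence weights;
# none of these values come from the specification.
MARKER_TISSUE = {"SP-B": "lung", "TTF": "lung", "F5": "pancreas", "CDH17": "colon"}
CUTOFF = {"SP-B": 5.0, "TTF": 6.0, "F5": 7.0, "CDH17": 5.5}
PREVALENCE = {"lung": 0.5, "pancreas": 0.2, "colon": 0.3}


def choose_cutoff(delta_values, is_target):
    """Pick the delta-Ct threshold maximizing sensitivity + specificity - 1 (Youden's J).

    Lower delta-Ct means higher expression, so a sample is called positive
    when its delta-Ct is at or below the threshold. Assumes both target and
    non-target samples are present.
    """
    best_cut, best_j = None, -1.0
    for cut in sorted(set(delta_values)):
        calls = [d <= cut for d in delta_values]
        tp = sum(c and t for c, t in zip(calls, is_target))
        fn = sum(t and not c for c, t in zip(calls, is_target))
        tn = sum(not c and not t for c, t in zip(calls, is_target))
        fp = sum(c and not t for c, t in zip(calls, is_target))
        j = tp / (tp + fn) + tn / (tn + fp) - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut


def predict_origin(ct, reference_ct, min_score=0.5):
    """Return (tissue, score), or ("not in panel", score) if no tissue is convincing."""
    # 1. Normalize each marker against the reference transcript (delta-Ct).
    delta = {m: ct[m] - reference_ct for m in ct}
    # 2. Apply each marker's cut-off to obtain a positive/negative call.
    positive = {m: delta[m] <= CUTOFF[m] for m in delta}
    # 3. Score tissues by the fraction of their markers called positive,
    #    weighted by the (hypothetical) prevalence of each carcinoma.
    scores = {}
    for tissue, prior in PREVALENCE.items():
        markers = [m for m, t in MARKER_TISSUE.items() if t == tissue and m in positive]
        frac = sum(positive[m] for m in markers) / len(markers) if markers else 0.0
        scores[tissue] = prior * frac
    total = sum(scores.values())
    if total == 0:
        return "not in panel", 0.0
    # 4. Normalize scores to sum to 1 and keep the best tissue only if it
    #    clears the threshold; otherwise report that no panel tissue fits.
    probs = {t: s / total for t, s in scores.items()}
    best = max(probs, key=probs.get)
    if probs[best] < min_score:
        return "not in panel", probs[best]
    return best, probs[best]
```

For example, a sample whose two lung markers fall below their cut-offs while the other markers are negative is called lung; when several tissues score similarly and none clears the threshold, the sketch reports that the metastasis is not derived from the tested set of carcinomas.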

BRIEF DESCRIPTION OF THE DRAWINGS

Figures 1-2 illustrate prior-art methods for identifying the origin of a metastasis of unknown origin. Figure 3 illustrates the present CUP diagnostic algorithm. Figures 4A and 4B show microarray data for the intensities of two genes across a tissue panel: (A) prostate stem cell antigen (PSCA); (B) coagulation factor V (F5). The bar graphs show intensity on the y axis and tissue on the x axis. Panc Ca, pancreatic cancer; Panc N, normal pancreas. Figures 5A and 5B show electropherograms obtained from an Agilent Bioanalyzer. RNA was isolated from FFPE tissue using a proteinase K digestion of three hours (A) or sixteen hours (B). Sample C22 (red) was from a one-year-old block, while sample C23 (blue) was from a five-year-old block. A size ladder is shown in green. Figures 6A-6C compare Ct values obtained with three qRTPCR methods: random hexamer priming in reverse transcription followed by qPCR of the resulting cDNA (RH 2 steps); gene-specific priming (reverse primer) in reverse transcription followed by qPCR of the resulting cDNA (GSP 2 steps); or gene-specific priming with reverse transcription and qPCR in a one-step reaction (GSP 1 step). RNA from eleven samples was divided among the three methods, and RNA levels for three genes were measured: β-actin (A), HUMSPB (B), and TTF (C). The median Ct obtained with each method is indicated by the solid line. Figure 7 shows diagrams of the CUP test plates. Figures 8A-8C are a series of graphs illustrating test performance over a range of RNA concentrations. Figure 9A is an experimental workflow diagram for the nomination and validation of candidate markers; Figure 9B is a flow chart of test optimization and of the construction and testing of the prediction algorithm. Figures 10A-10J illustrate the expression of selected tissue-specific candidate marker genes in FFPE metastatic carcinomas and in primary adenocarcinoma of the prostate.
For each graph, the x axis represents the normalized marker expression value. Figures 11A-11D illustrate test optimization. (A and B) Electropherograms obtained from an Agilent Bioanalyzer. RNA was isolated from FFPE tissue using a proteinase K digestion of three hours (A) or sixteen hours (B). Sample C22 (red) was from a one-year-old block, while sample C23 (blue) was from a five-year-old block. A size ladder is shown in green. (C and D) Comparison of Ct values obtained with three qRTPCR methods: random hexamer priming in reverse transcription followed by qPCR of the resulting cDNA (RH 2 steps); gene-specific priming (reverse primer) in reverse transcription followed by qPCR of the resulting cDNA (GSP 2 steps); or gene-specific priming with reverse transcription and qPCR in a one-step reaction (GSP 1 step). RNA from eleven samples was divided among the three methods and RNA levels for three genes were measured; panels show β-actin (C) and HUMSPB (D).

The median Ct obtained with each method is indicated by the solid line. Figure 12 is a heat map showing the relative expression levels of the 10-marker panel across 239 samples. Red indicates higher expression.

DETAILED DESCRIPTION OF THE INVENTION

Identification of the primary site in patients with metastatic carcinoma of unknown primary origin (CUP) may allow the application of specific therapeutic regimens and may prolong survival. The marker candidates were then validated by reverse transcriptase polymerase chain reaction (RT-PCR) on 205 FFPE metastatic carcinomas originating from these six tissues, as well as on metastases originating from other cancer types, to determine specificity. A ten-gene signature was selected to predict the tissue of origin of metastatic carcinomas for these six cancer types. Next, the RNA isolation and qRTPCR methods were optimized for these ten markers, and the qRTPCR tests were applied to a set of 260 metastatic tumors, yielding an overall accuracy of 78%. Finally, an independent set of 48 metastatic samples was tested. Importantly, thirty-seven samples in this set either had a known primary origin or initially presented as CUP but were subsequently resolved, and the test demonstrated an accuracy of 78%. A biomarker is any indication of the expression level of an indicated marker gene. The indications can be direct or indirect and measure overexpression or underexpression of the gene given the physiological parameters and in comparison to an internal control, normal tissue or another carcinoma. Biomarkers include, without limitation, nucleic acids (both overexpression and underexpression, as well as direct and indirect). The use of nucleic acids as biomarkers can include any method known in the art including, without limitation, measurement of DNA amplification, RNA, microRNA, loss of heterozygosity (LOH), single nucleotide polymorphisms (SNPs, Brookes (1999)), microsatellite DNA, and DNA hypo- or hypermethylation.
The use of proteins as biomarkers includes any method known in the art including, without limitation, measuring the amount, activity, or modifications such as glycosylation, phosphorylation, ADP-ribosylation, ubiquitination, etc., or immunohistochemistry (IHC). Other biomarkers include imaging, cell counting and apoptosis markers. The marker genes provided herein are those associated with a particular tumor or tissue type. A marker gene may be associated with numerous cancers, but provided that its expression is sufficiently associated with a tumor or tissue type to be identified as specific for a particular origin using the algorithm described herein, the gene may be used in the claimed invention to determine the tissue of origin of a carcinoma of unknown primary origin (CUP). Numerous genes associated with one or more cancers are known in the art. The present invention provides preferred markers and even more preferred combinations of marker genes, which are described here in detail. "Origin", as in "tissue of origin", means either the tissue type (lung, colon, etc.) or the histological type (adenocarcinoma, squamous cell carcinoma, etc.), depending on the particular medical circumstances, as will be understood by one skilled in the art. A marker gene corresponds to the sequence designated by a SEQ ID NO when it contains that sequence. A gene segment or fragment corresponds to the sequence of the gene when it contains a portion of the referenced sequence, or its complement, sufficient to distinguish it as being the sequence of the gene.
A gene expression product corresponds to that sequence when its RNA, mRNA, or cDNA hybridizes to a composition having that sequence (e.g., a probe) or, in the case of a peptide or protein, when it is encoded by that mRNA. A segment or fragment of a gene expression product corresponds to the sequence of the gene or gene expression product when it contains a portion of the referenced gene expression product, or its complement, sufficient to distinguish it as the sequence of the gene or gene expression product. The inventive methods, compositions, articles and kits described and claimed in this specification include one or more marker genes. "Marker" or "marker gene" is used throughout this specification to refer to genes and gene expression products that correspond to any gene whose over- or underexpression is associated with a tumor or tissue type. The preferred marker genes are described in more detail in Table 1.

TABLE 1 CUP Panel

The present invention provides a method of identifying the origin of a metastasis of unknown origin by measuring biomarkers associated with at least two different carcinomas in a sample containing metastatic cells; combining the data from the biomarkers in an algorithm, where the algorithm normalizes the biomarkers against a reference, imposes a cut-off that optimizes the sensitivity and specificity of each biomarker, weights the prevalence of the carcinomas, and selects a tissue of origin; determining the origin based on the highest probability determined by the algorithm, or determining that the carcinoma is not derived from a particular set of carcinomas; and optionally measuring specific biomarkers for one or more additional different carcinomas and repeating the steps as necessary for the additional biomarkers. The present invention also provides a method of identifying the origin of a metastasis of unknown origin by obtaining a sample containing metastatic cells; measuring biomarkers associated with at least two different carcinomas; combining the data from the biomarkers in an algorithm, where the algorithm i) normalizes the biomarkers against a reference and ii) imposes a cut-off that optimizes the sensitivity and specificity of each biomarker, weights the prevalence of the carcinomas, and selects a tissue of origin; determining the origin based on the highest probability determined by the algorithm, or determining that the carcinoma is not derived from a particular set of carcinomas; and optionally measuring specific biomarkers for one or more additional different carcinomas and repeating steps c) and d) for the additional biomarkers. In one embodiment, the marker genes are selected from i) SP-B, TTF, DSG3, KRT6F, p73H, or SFTPC; ii) F5, PSCA, ITGB6, KLK10, CLDN18, TR10 or FKBP10; and/or iii) CDH17, CDX1 or FABP1. Preferably, the marker genes are SP-B, TTF, DSG3, KRT6F, p73H, and/or SFTPC. Most preferably, the marker genes are SP-B, TTF and/or DSG3.
The marker genes can also include or be replaced by KRT6F, p73H, and/or SFTPC. In one embodiment, the marker genes are F5, PSCA, ITGB6, KLK10, CLDN18, TR10 and/or FKBP10. Most preferably, the marker genes are F5 and/or PSCA. Preferably, the marker genes can include or be replaced by ITGB6, KLK10, CLDN18, TR10 and/or FKBP10. In another embodiment, the marker genes are CDH17, CDX1 and/or FABP1, preferably CDH17. The marker genes can also include or be replaced by CDX1 and/or FABP1. In one embodiment, gene expression is measured using at least one of SEQ ID NOs: 11-58. The present invention also encompasses methods that measure gene expression by obtaining and measuring the formation of at least one of the amplicons SEQ ID NOs: 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54 and/or 58. In one embodiment, the marker genes may be selected from a gender-specific marker selected from at least one of: i) in the case of a male patient, KLK3, KLK2, NGEP or NPY; or ii) in the case of a female patient, PDEF, MGB, PIP, B305D, B726 or GABA-Pi; and/or WT1, PAX8, STAR or EMX2. Preferably, the marker gene is KLK2 or KLK3. In this embodiment, the marker genes may include or be replaced by NGEP and/or NPY. In one embodiment, the marker genes are PDEF, MGB, PIP, B305D, B726 or GABA-Pi, preferably PDEF and MGB. In this embodiment, the marker genes can include or be replaced by PIP, B305D, B726 or GABA-Pi. In one embodiment, the marker genes are WT1, PAX8, STAR or EMX2, preferably WT1. In this embodiment, the marker genes can include or be replaced by PAX8, STAR or EMX2.
The present invention provides methods to obtain additional clinical information, including the site of metastasis, to determine the origin of the carcinoma; to obtain sets of optimal biomarkers for carcinomas by using metastases of known origin, determining biomarkers for them, and comparing these biomarkers with the biomarkers of metastases of unknown origin; to provide therapy direction by determining the origin of a metastasis of unknown origin and identifying the appropriate treatment for it; and to provide a prognosis by determining the origin of a metastasis of unknown origin and identifying the corresponding prognosis for it. The present invention also provides methods for finding biomarkers by determining the expression level of a marker gene in a particular metastasis, measuring a biomarker for the marker gene to determine its expression, analyzing the expression of the marker gene in accordance with any of the methods provided herein or known in the art, and determining whether the marker gene is indeed specific for the tumor of origin. The present invention further provides a composition that contains at least one isolated sequence selected from SEQ ID NOs: 11-58. The present invention also provides kits for conducting an assay in accordance with the methods provided herein, further containing biomarker detection reagents. The present invention also provides microarrays or gene chips to perform the methods described herein. The present invention also provides diagnostic/prognostic portfolios containing isolated nucleic acid sequences, their complements, or portions thereof, of a combination of genes as described herein, wherein the combination is sufficient to measure or characterize gene expression in a biological sample containing metastatic cells relative to cells of different carcinomas or normal tissue.
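The marker embodiments listed above, including the gender-specific ones, can be encoded as a simple lookup table. The sketch below is illustrative only: the split into "core" and "alternates" follows the preferred markers and the "include or be replaced by" markers named in the text, while the dictionary layout and the function name are hypothetical.

```python
# Hypothetical encoding of the preferred ("core") and add-on/substitute
# ("alternates") markers described for each tissue of origin.
PANELS = {
    "lung": {"core": ["SP-B", "TTF", "DSG3"], "alternates": ["KRT6F", "p73H", "SFTPC"]},
    "pancreas": {"core": ["F5", "PSCA"],
                 "alternates": ["ITGB6", "KLK10", "CLDN18", "TR10", "FKBP10"]},
    "colon": {"core": ["CDH17"], "alternates": ["CDX1", "FABP1"]},
    "prostate": {"core": ["KLK2", "KLK3"], "alternates": ["NGEP", "NPY"]},
    "breast": {"core": ["PDEF", "MGB"], "alternates": ["PIP", "B305D", "B726", "GABA-Pi"]},
    "ovary": {"core": ["WT1"], "alternates": ["PAX8", "STAR", "EMX2"]},
}
MALE_ONLY = {"prostate"}
FEMALE_ONLY = {"breast", "ovary"}


def markers_for_patient(sex=None, include_alternates=False):
    """Return {tissue: markers}, dropping tissues that cannot occur in this patient.

    sex is "male", "female", or None (no gender-specific restriction).
    """
    if sex == "male":
        excluded = FEMALE_ONLY
    elif sex == "female":
        excluded = MALE_ONLY
    else:
        excluded = set()
    panel = {}
    for tissue, spec in PANELS.items():
        if tissue in excluded:
            continue
        markers = list(spec["core"])  # copy so the table itself is never modified
        if include_alternates:
            markers += spec["alternates"]
        panel[tissue] = markers
    return panel
```

Keeping the alternates separate mirrors the text's distinction between markers measured by preference and markers that may be added or substituted.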

Any method described in the present invention may further include measuring the expression of at least one gene constitutively expressed in the sample. Preferably, the markers for pancreatic cancer are coagulation factor V (F5), prostate stem cell antigen (PSCA), integrin β6 (ITGB6), kallikrein 10 (KLK10), claudin 18 (CLDN18), trio isoform (TR10), and hypothetical protein FLJ22041 similar to FK506 binding proteins (FKBP10). Preferably, the biomarkers for F5 and PSCA are measured together. The biomarkers for ITGB6, KLK10, CLDN18, TR10, and FKBP10 can be measured in addition to or instead of F5 and/or PSCA. F5 is described, for example, by 20040076955; 20040005563; and WO2004031412. PSCA is described, for example, by WO1998040403; 20030232350; and WO2004063355. ITGB6 is described, for example, by WO2004018999; and 6339148. KLK10 is described, for example, by WO2004077060; and 20030235820. CLDN18 is described, for example, by WO2004063355; and WO2005005601. TR10 is described, for example, by 20020055627. FKBP10 is described, for example, by WO2000055320. Preferably, the marker genes for colon cancer are intestinal peptide-associated transporter HPT-1 (CDH17), caudal-type homeobox transcription factor 1 (CDX1) and fatty acid-binding protein 1 (FABP1). Preferably, a biomarker for CDH17 is measured alone. The biomarkers for CDX1 and FABP1 can be measured in addition to or instead of a biomarker for CDH17. CDH17 is described, for example, by Takamura et al. (2004); and WO2004063355. CDX1 is described, for example, by Pilozzi et al. (2004); 20050059008; and 20010029020. FABP1 is described, for example, by Borchers et al. (1997); Chan et al. (1985); Chen et al. (1986); and Lowe et al. (1985). Preferably, the marker genes for lung cancer are surfactant protein B (SP-B), thyroid transcription factor (TTF), desmoglein 3 (DSG3), keratin 6F isoform 6 (KRT6F), p53-related gene (p73H), and surfactant protein C (SFTPC).
Preferably, the biomarkers for SP-B, TTF and DSG3 are measured together. The biomarkers for KRT6F, p73H and SFTPC can be measured in addition to, or in lieu of, any of the biomarkers for SP-B, TTF and/or DSG3. SP-B is described, for example, by Pilot-Mathias et al. (1989); 20030219760; and 20030232350. TTF is described, for example, by Jones et al. (2005); US20040219575; WO1998056953; WO2002073204; 20030138793; and WO2004063355. DSG3 is described, for example, by Wan et al. (2003); 20030232350; WO2004030615; and WO2002101357. KRT6F is described, for example, by Takahashi et al. (1995); 20040146862; and 20040219572. p73H is described, for example, by Senoo et al. (1998); and 20030138793. SFTPC is described, for example, by Glasser et al. (1988). The marker genes can also be selected from a gender-specific marker such as, in the case of a male patient, KLK3, KLK2, NGEP or NPY; or, in the case of a female patient, PDEF, MGB, PIP, B305D, B726 or GABA-Pi; and/or WT1, PAX8, STAR or EMX2. Preferably, the marker genes for breast cancer are prostate-derived epithelial factor (PDEF), mammaglobin (MG), prolactin-inducible protein (PIP), B305D, B726, and GABA-Pi. Preferably, the biomarkers for PDEF and MG are measured together. The biomarkers for PIP, B305D, B726 and GABA-Pi can be measured in addition to or instead of the biomarkers for PDEF and/or MG. PDEF is described, for example, by WO2004030615; WO2000006589; WO2001073032; Wallace et al. (2005); Feldman et al. (2003); and Oettgen et al. (2000). MG is described, for example, by WO2004030615; 20030124128; Fleming et al. (2000); Watson et al. (1996 and 1998); and 5668267. PIP is described, for example, by Autiero et al. (2002); Clark et al. (1999); Myal et al. (1991); and Murphy et al. (1987). B305D, B726 and GABA-Pi are described by Reinholz et al. (2005). NGEP is described, for example, by Bera et al. (2004).
Preferably, the markers for ovarian cancer are Wilms tumor 1 (WT1), PAX8, steroidogenic acute regulatory protein (STAR) and EMX2. Preferably, the biomarkers for WT1 are measured. The biomarkers for STAR and EMX2 can be measured in addition to or instead of the biomarkers for WT1. WT1 is described, for example, by 5350840; 6232073; 6225051; 20040005563; and Bentov et al. (2003). PAX8 is described, for example, by 20050037010; Poleev et al. (1992); Di Palma et al. (2003); Marques et al. (2002); Cheung et al. (2003); Goldstein et al. (2002); Oji et al. (2003); Rauscher et al. (1993); Zapata-Benavides et al. (2002); and Dwight et al. (2003). STAR is described, for example, by Gradi et al. (1995); and Kim et al. (2003). EMX2 is described, for example, by Noonan et al. (2001). Preferably, the markers for prostate cancer are KLK3, KLK2, NGEP and NPY. Preferably, the biomarkers for KLK3 are measured. The biomarkers for KLK2, NGEP and NPY can be measured in addition to or instead of KLK3. KLK2 and KLK3 are described, for example, by Magklara et al. (2002). KLK2 is described, for example, by 20030215835; and 5786148. KLK3 is described, for example, by 6261766. The method may also include obtaining additional clinical information, including the site of metastasis, to determine the origin of the carcinoma. A flow chart is provided in Figure 3. The invention also provides a method for obtaining sets of optimal biomarkers for carcinomas by using metastases of known origin, determining biomarkers for them, and comparing these biomarkers with the biomarkers of metastases of unknown origin. The invention further provides a method for providing therapy direction by determining the origin of a metastasis of unknown origin in accordance with the methods described herein and identifying the appropriate treatment therefor.
The invention also provides a method for providing a prognosis by determining the origin of a metastasis of unknown origin in accordance with the methods described herein and identifying the corresponding prognosis for it.

The invention also provides a method for finding biomarkers comprising determining the expression level of a marker gene in a particular metastasis, measuring a biomarker for the marker gene to determine its expression, analyzing the expression of the marker gene in accordance with the methods described herein, and determining whether the marker gene is indeed specific for the tumor of origin. The invention further provides compositions comprising at least one isolated sequence selected from SEQ ID NOs: 11-58. The invention also provides kits, articles, microarrays or gene chips, and diagnostic/prognostic portfolios to conduct the tests described herein, as well as patient reports to report the results obtained by the present methods. The mere presence of particular nucleic acid sequences in a tissue sample has only rarely been found to have diagnostic or prognostic value. Information about the expression of various proteins, peptides or mRNA, on the other hand, is increasingly viewed as important. The mere presence within the genome of nucleic acid sequences that have the potential to express proteins, peptides, or mRNA (such sequences being referred to as "genes") does not by itself determine whether a protein, peptide, or mRNA is expressed in a given cell. Whether a given gene capable of expressing proteins, peptides or mRNA does so, and to what extent such expression occurs, if at all, is determined by a variety of complex factors. Regardless of the difficulties in understanding and evaluating these factors, gene expression testing can provide useful information about the occurrence of important events such as tumorigenesis, metastasis, apoptosis and other clinically relevant phenomena. Relative indications of the degree to which genes are active or inactive can be found in gene expression profiles. The gene expression profiles of this invention are used to diagnose and treat patients with CUP. Sample preparation requires the collection of patient samples.
The patient samples used in the method of the invention are those suspected to contain diseased cells, such as cells taken from a nodule by fine needle aspiration (FNA). Bulk tissue preparations obtained from a biopsy or a surgical specimen, and laser capture microdissection, are also suitable for use. Laser capture microdissection (LCM) technology is a way to select the cells to be studied, minimizing the variability caused by cell type heterogeneity. Consequently, moderate or small changes in the expression of marker genes between normal or benign and cancerous cells can be readily detected. The samples may also comprise circulating epithelial cells extracted from peripheral blood. These can be obtained by a number of methods, but the most preferred method is the magnetic separation technique described in 6136182. Once the sample containing the cells of interest has been obtained, a gene expression profile is obtained using biomarkers for the genes in the appropriate portfolio. Preferred methods for establishing gene expression profiles include determining the amount of RNA produced by a gene that can encode a protein or peptide. This is achieved by reverse transcriptase PCR (RT-PCR), competitive RT-PCR, real-time RT-PCR, differential display RT-PCR, Northern blot analysis and other related tests. Although it is possible to conduct these techniques using individual PCR reactions, it is preferable to amplify complementary DNA (cDNA) or complementary RNA (cRNA) produced from mRNA and analyze it by microarray. A number of different array configurations and methods for their production are known to those skilled in the art and are described, for example, in 5445934; 5532128; 5556752; 5242974; 5384261; 5405783; 5412087; 5424186; 5429807; 5436327; 5472672; 5527681; 5529756; 5545531; 5554501; 5561071; 5571639; 5593839; 5599695; 5624711; 5658734; and 5700637.
Microarray technology allows simultaneous measurement of the steady-state mRNA levels of thousands of genes, providing a powerful tool for identifying effects such as the initiation, interruption or modulation of uncontrolled cell proliferation. Two microarray technologies are currently in wide use: cDNA arrays and oligonucleotide arrays. Although these chips differ in construction, essentially all downstream data analysis and output are the same. The products of these analyses are typically measurements of the intensity of the signal received from a labeled probe used to detect a cDNA sequence from the sample that hybridizes to a nucleic acid sequence at a known location on the microarray. Typically, the signal intensity is proportional to the amount of cDNA, and therefore mRNA, expressed in the cells of the sample. A large number of such techniques are available and useful. Preferred methods for determining gene expression can be found in 6271002; 6218122; 6218114; and 5004755. The analysis of expression levels is conducted by comparing these signal intensities. This is best done by generating a ratio matrix of the gene expression intensities in a test sample versus those in a control sample. For example, the expression intensities of genes from diseased tissue can be compared with the expression intensities generated from benign or normal tissue of the same type. The ratio of these expression intensities indicates the fold change in gene expression between the test and control samples. Selection can be based on statistical tests that produce ranked lists reflecting the evidence of significance of each gene's differential expression between factors related to the tumor's site of origin. Examples of such tests include ANOVA and Kruskal-Wallis.
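The ratio matrix and the ranking step can be sketched with NumPy. The function names are hypothetical; the ranking uses a one-way ANOVA F statistic, one of the two tests named above, computed directly so the example needs no extra dependencies (scipy.stats.f_oneway and scipy.stats.kruskal are library versions of ANOVA and Kruskal-Wallis).

```python
import numpy as np


def log2_ratio_matrix(test, control):
    """Log2 ratios (fold changes) of test vs. control intensities, genes x samples."""
    return np.log2(np.asarray(test, float) / np.asarray(control, float))


def anova_f(values, labels):
    """One-way ANOVA F statistic for one gene across tissue-of-origin groups."""
    values, labels = np.asarray(values, float), np.asarray(labels)
    groups = [values[labels == g] for g in np.unique(labels)]
    k, n, grand = len(groups), len(values), values.mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_genes(expr, labels):
    """Gene (row) indices ordered from strongest to weakest evidence (largest F first)."""
    f_stats = np.array([anova_f(row, labels) for row in expr])
    return [int(i) for i in np.argsort(f_stats)[::-1]]
```

A log2 ratio of 1 corresponds to a two-fold increase in the test sample; the ranked gene list is the kind of input the weighting and cut-off step described next would operate on.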
The rankings can be used as weights in a model designed to interpret the sum of these weights, up to a cut-off, as the preponderance of evidence in favor of one class over another. Prior evidence described in the literature can also be used to adjust the weights. In the present invention, 10 markers were chosen that showed significant evidence of differential expression among 6 tumor types. The selection process included an ad hoc collection of statistical tests, mean-variance optimization and expert knowledge. In an alternative embodiment, feature extraction methods could be automated to select and test markers through supervised learning approaches. As the database grows, marker selection can be repeated in order to produce the highest possible diagnostic accuracy at any given state of the database. A preferred embodiment is to normalize each measurement by identifying a stable control set and scaling this set to zero variance across all samples. This control set is defined as any individual endogenous transcript, or set of endogenous transcripts, that is affected by systematic error in the assay and is not known to change independently of that error. All markers are adjusted by the sample-specific factor that yields zero variance for a descriptive statistic of the control set, such as the mean or median, or for a direct measurement. Alternatively, if the premise that control variation reflects only systematic error does not hold, but the resulting classification error is lower when normalization is performed, the control set is still used as established. Non-endogenous spike-in controls could also be useful, but they are not preferred. After marker selection, the selected variables are used in a classifier designed to achieve the highest possible classification accuracy.
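A minimal sketch of the control-set normalization, assuming measurements on a log scale (such as Ct values), where a sample-specific multiplicative error becomes an additive offset. Each sample is shifted so that the chosen statistic of the control transcripts is identical across samples, that is, it has zero variance; the function name and the choice of re-centering on the mean control level are assumptions for illustration.

```python
import numpy as np


def normalize_to_controls(values, control_idx, stat=np.mean):
    """Remove a sample-specific offset so the control-set statistic is constant.

    values: (n_samples, n_genes) array on a log scale (e.g. Ct or log2 intensity).
    control_idx: column indices of the stable endogenous control transcripts.
    stat: descriptive statistic of the control set (np.mean or np.median).
    """
    values = np.asarray(values, float)
    per_sample = stat(values[:, control_idx], axis=1)  # one factor per sample
    offset = per_sample - per_sample.mean()            # deviation from the common level
    return values - offset[:, None]                    # same shift applied to every marker
```

After this adjustment the control statistic is the same in every sample, so any remaining between-sample differences in the markers are not explained by the systematic error the controls track.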
A supervised learning algorithm, designed to relate a set of input measurements to a set of output predictions, can be used to build a model from the 10 inputs that predicts the tissue of origin. The problem can be stated as follows: given training data $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, produce a classifier $h: X \to Y$ that maps a sample $x \in X$ to its tissue-of-origin label $y \in Y$. The predictions are based on previously resolved cases contained in the database, which therefore make up the training set. The supervised learning algorithm must find parameters, based on the relationships between the input variables and the known outputs, that minimize the expected classification error. These parameters can then be used to predict the tissue of origin from the inputs of a new sample. Examples of such algorithms include linear classification models, quadratic classifiers, tree-based methods, neural networks and prototype methods such as k-nearest-neighbor classifiers or learning vector quantization algorithms. A specific embodiment for modeling the 10 normalized markers is the LDA method, using default parameters, as described in Venables and Ripley (2002). This method is based on Fisher's linear discriminant analysis: given means $\mu_{y=0}$, $\mu_{y=1}$ and covariances $\Sigma_{y=0}$, $\Sigma_{y=1}$ for class labels 0 and 1, we seek a linear combination $w \cdot x$, with means $w \cdot \mu_{y=i}$ and variances $w^{T}\Sigma_{y=i}w$ for $i = 0, 1$, that maximizes the ratio of the between-class variance to the within-class variance:
$$S = \frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{within}}} = \frac{(w \cdot \mu_{y=1} - w \cdot \mu_{y=0})^2}{w^{T}\Sigma_{y=1}w + w^{T}\Sigma_{y=0}w}$$
LDA can be generalized to multiple discriminant analysis, where $y$ has $N$ possible states instead of just two. The class means and covariances are estimated from the values contained in the database for the chosen markers. In a preferred embodiment, the covariance matrix is weighted by equal prior probabilities for each tumor type, except where the patient's gender or clinical history rules out a tissue of origin.
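For the two-class case, the Fisher criterion described above has a closed-form maximizer (up to scale), $w \propto (\Sigma_{y=0} + \Sigma_{y=1})^{-1}(\mu_{y=1} - \mu_{y=0})$. The plain-Python sketch below computes that direction so the ratio can be checked numerically. It is not a reimplementation of `MASS::lda`, which handles multiple classes, priors and scaling; all function names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    return [sum(r[j] for r in rows) / len(rows) for j in range(len(rows[0]))]


def covariance(rows: Sequence[Sequence[float]]) -> Matrix:
    """Sample covariance matrix (denominator n - 1)."""
    mu, n = mean_vector(rows), len(rows)
    d = len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def quad_form(w: Vector, s: Matrix) -> float:
    return sum(w[i] * s[i][j] * w[j] for i in range(len(w)) for j in range(len(w)))


def fisher_ratio(w: Vector, class0, class1) -> float:
    """S = (w.mu1 - w.mu0)^2 / (w' Sigma1 w + w' Sigma0 w)."""
    mu0, mu1 = mean_vector(class0), mean_vector(class1)
    between = sum(wi * (b - a) for wi, a, b in zip(w, mu0, mu1)) ** 2
    return between / (quad_form(w, covariance(class1)) + quad_form(w, covariance(class0)))


def fisher_direction(class0, class1) -> Vector:
    """Maximizer of S up to scale: w = (Sigma0 + Sigma1)^-1 (mu1 - mu0)."""
    mu0, mu1 = mean_vector(class0), mean_vector(class1)
    s0, s1 = covariance(class0), covariance(class1)
    d = len(mu0)
    within = [[s0[i][j] + s1[i][j] for j in range(d)] for i in range(d)]
    return solve(within, [b - a for a, b in zip(mu0, mu1)])
```

Because S is invariant to rescaling w, any positive multiple of the returned vector gives the same ratio, and no other direction gives a larger one.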
Male patients are scored by a model in which the priors are zero for each female reproductive organ tumor class. Likewise, female patients are scored by a model in which the priors are zero for male reproductive organ tumor classes. In the present invention, the priors are zero for prostate when testing female patients and zero for breast and ovary when testing male patients. In addition, samples whose clinical history matches a class label are tested by a model in which the prior probability is zero for that particular class label. The problem above can be viewed as maximizing a Rayleigh quotient, which is handled as a generalized eigenvalue problem. The reduced subspaces are used for classification by calculating the distance of each sample to each class centroid in the chosen subspace. The model can be fitted by maximum likelihood, and posterior probabilities are calculated using Bayes' theorem. An alternative method finds a map from an n-dimensional feature space, where n is the number of variables used, to a set of class labels; this involves dividing the feature space into regions and then assigning a class to each region. The scores of nearest-neighbor-type algorithms are related to the distance from decision boundaries and do not necessarily translate into class probabilities. If there are too many variables to select from, and many of them are random noise, then variable selection and model overfitting become problems. Therefore, ranked lists truncated at various cut-offs are often used as inputs to limit the number of variables. Search algorithms such as a genetic algorithm can also be used to select a subset of variables by evaluating a cost function. Simulated annealing can be attempted to limit the risk of the cost function becoming trapped in a local optimum. However, these procedures must be validated with samples independent of the selection and modeling process.
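The prior rules just described can be sketched as follows, under stated assumptions: equal priors over the tissues that remain allowed, and class-conditional log-likelihoods supplied by some already fitted model (the `log_likelihood` input and all function names are hypothetical). Posteriors follow from Bayes' theorem, so a class with a zero prior always receives a zero posterior.

```python
import math
from typing import Dict, Optional, Sequence

FEMALE_ONLY = {"breast", "ovary"}  # prior set to zero for male patients
MALE_ONLY = {"prostate"}           # prior set to zero for female patients


def make_priors(classes: Sequence[str], sex: str,
                history: Optional[str] = None) -> Dict[str, float]:
    """Equal priors over the allowed classes; zero for excluded ones."""
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")
    excluded = set(MALE_ONLY if sex == "female" else FEMALE_ONLY)
    if history in classes:
        excluded.add(history)  # a class matching the clinical history gets prior 0
    allowed = [c for c in classes if c not in excluded]
    return {c: (1.0 / len(allowed) if c in allowed else 0.0) for c in classes}


def posteriors(log_likelihood: Dict[str, float],
               priors: Dict[str, float]) -> Dict[str, float]:
    """Bayes' theorem: posterior is proportional to prior times likelihood."""
    log_joint = {c: log_likelihood[c] + math.log(p) for c, p in priors.items() if p > 0}
    top = max(log_joint.values())  # subtract the maximum for numerical stability
    joint = {c: math.exp(v - top) for c, v in log_joint.items()}
    total = sum(joint.values())
    return {c: joint.get(c, 0.0) / total for c in priors}
```

For a male patient, for example, breast and ovary always end with posterior 0, and the remaining probability mass is shared among the other tissues according to their likelihoods.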
Latent variable approaches can also be used. Any unsupervised learning algorithm for estimating low-dimensional manifolds from a high-dimensional space can be used to discover associations among the input variables that can also be fitted by a smaller set of latent variables. Although estimates of the effectiveness of such reductions are subjective, a supervised algorithm can be applied to the reduced set of variables in order to estimate classification accuracy. Therefore, a classifier that can be constructed from latent variables can also be constructed from a set of variables significantly correlated with those latent variables. An example would be using variables correlated with the principal components from a principal component analysis as inputs to any supervised classification model. These algorithms can be implemented in any software code that has methods to input the variables, fit the samples to a function, score a sample against the model, and output the results to a console. R, Octave, C, C++, Fortran, Java, Perl, and Python all have libraries available under open source licenses to perform many of the functions listed above. Commercial packages such as S+ and Matlab also include many of these methods. The code performs the following steps, in order, using R version 2.2.1 (http://www.r-project.org) with the MASS library (Venables et al., 2002) installed. The term LDA refers to the function lda in the MASS namespace.
1) CT values for the 10 marker genes and 2 controls are stored on a hard drive for all available training-set samples.
2) For each sample, the values of the 10 marker genes are normalized by subtracting the sample-specific average of the controls from each marker.
3) The training data set is composed of metastases with known sites of origin for which at least one of the markers specific to the labeled tissue has a normalized CT value of less than 5.
4) The lda function builds 4 sets of 2 LDA models from the training data in (3). In each set, one model is specific for male subjects and has the prior probabilities for breast and ovary set to zero, with the prior probability for prostate set equal to the priors of the other class labels. The other model in each pair is specific for female subjects, with the prior probability for prostate set to zero and the priors for breast and ovary set equal to those of the other class labels.
a. The first set is used to test CUP samples found in the colon; the prior probability for colon is set to zero and all other non-reproductive class labels are given equal priors.
b. A second model set is specified for a CUP found in the ovary, with the prior probability for ovary set to zero and all other non-reproductive class labels given equal priors.
c. A third set is for a CUP found in the lung, with the prior probability for lung set to zero. All other non-reproductive class labels have equal priors.
d. The general model is used for all other clinical histories. All priors are set equal, except for the reproductive class labels, which are set as defined in step 4.
To test a sample, the inventors executed a program that performs the following:
1) Reads in a set of test data.
2) Computes a sample-specific average of both controls.
3) For each sample, subtracts the sample-specific average from each marker.
4) Replaces any normalized CT derived from an empirical CT of 40 with 12.

5) For each sample in the test set, the following is evaluated:
a. If the average of both controls is greater than 34, the sample is flagged "CTR_FAILURE" and all posterior probabilities are set to zero.
b. The clinical history is checked for colon, ovary or lung. If a match is found, the gender is also checked, and the corresponding history- and gender-specific model is used to evaluate the sample.
c. If breast, pancreas, SCC lung, or prostate is found as the history label, the sample is flagged "FAILURE_ineligible_sample" and all posterior probabilities are set to zero.
d. The general model for male or female subjects, as appropriate, is used for all other samples.
The results are formatted and written to a file.
The present invention includes gene expression portfolios obtained by this process. Gene expression profiles can be displayed in a number of ways. The most common is to arrange raw fluorescence intensities or a ratio matrix into a graphical dendrogram in which the columns indicate test samples and the rows indicate genes. The data are arranged so that genes with similar expression profiles are close to each other. The expression ratio of each gene is displayed as a color. For example, a ratio less than one (down-regulation) appears in the blue portion of the spectrum, whereas a ratio greater than one (up-regulation) appears in the red portion of the spectrum. Commercially available software programs for displaying such data include "GeneSpring" (Silicon Genetics, Inc.) and "Discovery" and "Infer" (Partek, Inc.). Measurements of the abundance of unique RNA species are collected from primary tumors or from metastatic tumors of known primary origin. These readings, along with clinical records including but not limited to patient age, gender, site of origin of the primary tumor, and site of metastasis (if applicable), are used to generate a relational database.
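The scoring program in steps 1) to 5) above, together with the training-set rule in step 3) of the training procedure, can be sketched in Python. This is an illustration only: the original was written in R 2.2.1 with MASS, the control keys `beta_actin` and `PBGD` and the `"OK"` flag are hypothetical, and each fitted model is represented as an opaque callable that maps normalized CTs to posterior probabilities. The thresholds (34, 40, 12, 5), the failure flags and the marker panel are taken from the text.

```python
from statistics import mean
from typing import Callable, Dict, Optional, Tuple

# Marker-to-tissue assignment of the 10-marker panel used in this test.
MARKERS_BY_TISSUE = {
    "lung": ("HUMPSPB", "TTF1", "DSG3"),
    "colon": ("CDH17",),
    "breast": ("MG", "PDEF"),
    "ovary": ("WT1",),
    "pancreas": ("PSCA", "F5"),
    "prostate": ("KLK3",),
}
CONTROLS = ("beta_actin", "PBGD")  # hypothetical keys for the two control genes
NO_CALL_CT = 40.0                  # empirical CT reported when nothing amplified
NO_CALL_REPLACEMENT = 12.0
CONTROL_FAILURE_CT = 34.0
HISTORY_MODELS = {"colon", "ovary", "lung"}
INELIGIBLE_HISTORIES = {"breast", "pancreas", "SCC lung", "prostate"}

Model = Callable[[Dict[str, float]], Dict[str, float]]  # normalized CTs -> posteriors


def normalize(raw: Dict[str, float]) -> Dict[str, float]:
    """Subtract the sample's control average; map an empirical CT of 40 to 12."""
    control_avg = mean(raw[c] for c in CONTROLS)
    return {m: (NO_CALL_REPLACEMENT if raw[m] >= NO_CALL_CT else raw[m] - control_avg)
            for markers in MARKERS_BY_TISSUE.values() for m in markers}


def usable_for_training(normalized: Dict[str, float], tissue: str) -> bool:
    """Training-set rule: some marker of the labeled tissue has normalized CT < 5."""
    return any(normalized[m] < 5 for m in MARKERS_BY_TISSUE[tissue])


def score_sample(raw: Dict[str, float], sex: str, history: Optional[str],
                 models: Dict[Tuple[str, str], Model]) -> Tuple[str, Dict[str, float]]:
    zeros = {tissue: 0.0 for tissue in MARKERS_BY_TISSUE}
    if mean(raw[c] for c in CONTROLS) > CONTROL_FAILURE_CT:
        return "CTR_FAILURE", zeros
    if history in INELIGIBLE_HISTORIES:
        return "FAILURE_ineligible_sample", zeros
    # History- and gender-specific model when one exists, else the general model.
    key = (history if history in HISTORY_MODELS else "general", sex)
    return "OK", models[key](normalize(raw))
```

Keeping the eight fitted models in a dictionary keyed by (history, sex) mirrors the "4 sets of 2 models" structure, so model selection is a single lookup.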
The database is used to select RNA transcripts and clinical factors that can be used as marker variables to predict the primary origin of a metastatic tumor. When protein levels are measured to determine gene expression, any method known in the art is suitable as long as it provides adequate specificity and sensitivity. For example, protein levels can be measured by binding to an antibody or antibody fragment specific for the protein and measuring the amount of antibody-bound protein. The antibodies can be labeled with radioactive, fluorescent or other detectable reagents to facilitate detection. Detection methods include, without limitation, enzyme-linked immunosorbent assay (ELISA) and immunoblotting techniques. The modulated genes used in the methods of the invention are described in the examples. Genes that are differentially expressed are either up-regulated or down-regulated in patients with carcinoma of a particular origin relative to those with carcinomas of different origins. Up-regulation and down-regulation are relative terms meaning that a detectable difference (beyond the contribution of noise in the system used to measure it) is found in the amount of expression of the genes relative to some baseline. In this case, the baseline is determined based on the algorithm. The genes of interest in the diseased cells are then either up-regulated or down-regulated relative to the baseline level using the same measurement method. Diseased, in this context, refers to an alteration of the state of a body that interrupts or disturbs, or has the potential to disturb, the proper performance of bodily functions, as occurs with the uncontrolled proliferation of cells. A person is diagnosed with a disease when some aspect of the person's genotype or phenotype is consistent with the presence of the disease.
However, the act of conducting a diagnosis or prognosis may include determining aspects of the disease/condition such as the likelihood of relapse, the type of therapy, and therapy monitoring. In therapy monitoring, clinical judgments are made regarding the effect of a given course of therapy by comparing gene expression over time to determine whether gene expression profiles have changed or are changing to patterns more consistent with normal tissue. Genes can be grouped so that the information obtained about the set of genes in the group provides a sound basis for making a clinically relevant judgment such as a diagnosis, prognosis or choice of treatment. These sets of genes constitute the portfolios of the invention. As with most prognostic markers, it is often desirable to use the fewest markers sufficient to make a correct medical judgment. This avoids delays in treatment pending additional analysis as well as unproductive use of time and resources. One method of establishing gene expression portfolios is through the use of optimization algorithms such as the mean-variance algorithm widely used in establishing stock portfolios. This method is described in detail in document 20030194734. Essentially, the method calls for establishing a set of inputs (stocks in financial applications, expression as measured by intensity here) that optimizes the return (e.g., the signal generated) received for using it while minimizing the variability of that return. Many commercial software programs are available to conduct such operations. "Wagner Associates Mean-Variance Optimization Application", referred to as "Wagner Software" throughout this specification, is preferred. This software uses functions from the "Wagner Associates Mean-Variance Optimization Library" to determine an efficient frontier and optimal portfolios in the Markowitz sense. Markowitz (1952).
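As a toy illustration of the mean-variance idea (not of the Wagner Software, whose API is not reproduced here), the sketch below treats each gene's mean signal as its "return" and the covariance of the signals as "risk", and computes the global minimum-variance portfolio $w = \Sigma^{-1}\mathbf{1} / (\mathbf{1}^{T}\Sigma^{-1}\mathbf{1})$, the lowest-risk point of the Markowitz efficient frontier. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def portfolio_stats(w: Sequence[float], mu: Sequence[float],
                    cov: Matrix) -> Tuple[float, float]:
    """Expected 'return' (weighted mean signal) and 'risk' (variance w' cov w)."""
    ret = sum(wi * mi for wi, mi in zip(w, mu))
    risk = sum(w[i] * cov[i][j] * w[j] for i in range(len(w)) for j in range(len(w)))
    return ret, risk


def min_variance_weights(cov: Matrix) -> Vector:
    """Global minimum-variance portfolio: w = cov^-1 1 / (1' cov^-1 1)."""
    raw = solve(cov, [1.0] * len(cov))
    total = sum(raw)
    return [x / total for x in raw]
```

This closed form allows negative weights; a gene portfolio restricted to non-negative weights, or traced along the full frontier for several target returns, would need a constrained quadratic optimizer.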
Use of this type of software requires that the microarray data be transformed so that they can be treated as inputs in the same way that stock return and risk measurements are used when the software is applied to its intended financial analysis purpose. The portfolio selection procedure can also include the application of heuristic rules. Preferably, such rules are formulated on the basis of biology and an understanding of the technology used to produce clinical results. More preferably, they are applied to the output of the optimization method. For example, the mean-variance method of portfolio selection can be applied to microarray data for a number of genes differentially expressed in subjects with cancer. The output of the method would be an optimized set of genes that could include some genes that are expressed in peripheral blood as well as in diseased tissue. If the samples used in the testing method are obtained from peripheral blood, and certain genes differentially expressed in cancer could also be differentially expressed in peripheral blood, then a heuristic rule can be applied in which a portfolio is selected from the efficient frontier excluding those genes that are differentially expressed in peripheral blood. Of course, the rule can also be applied before the efficient frontier is formed, for example, during data preselection. Other heuristic rules may be applied that are not necessarily related to the biology in question. For example, a rule can be applied that only a given percentage of the portfolio can be represented by a particular gene or group of genes. Commercially available software such as the Wagner Software readily accommodates these types of heuristics. This can be useful, for example, when factors other than accuracy and precision (e.g., anticipated licensing rights) have an impact on the desire to include one or more genes.
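The two heuristic rules above can be sketched as a preselection filter and a per-gene cap. Both helpers are hypothetical. The cap is enforced here by clipping and proportional redistribution, which is one simple choice; a clipped portfolio is no longer guaranteed to lie on the efficient frontier, and an optimizer that takes the cap as a constraint would be needed for that.

```python
from typing import Dict, Iterable, List, Set


def preselect(genes: Iterable[str], blood_expressed: Set[str]) -> List[str]:
    """Rule applied before optimization: drop genes that are differentially
    expressed in peripheral blood."""
    return [g for g in genes if g not in blood_expressed]


def cap_weights(weights: Dict[str, float], cap: float) -> Dict[str, float]:
    """Limit each gene's share of the portfolio to `cap`, redistributing the
    excess proportionally among uncapped genes. Assumes non-negative weights
    that sum to 1."""
    if cap * len(weights) < 1.0 - 1e-12:
        raise ValueError("cap is too small for the number of genes")
    w = dict(weights)
    capped: Set[str] = set()
    while True:
        over = [g for g in w if g not in capped and w[g] > cap + 1e-12]
        if not over:
            return w
        for g in over:
            w[g] = cap
            capped.add(g)
        free = [g for g in w if g not in capped]
        remaining = 1.0 - cap * len(capped)
        free_total = sum(w[g] for g in free)
        for g in free:
            # Spread evenly if all free weights are zero, else proportionally.
            w[g] = remaining * (w[g] / free_total if free_total > 0 else 1.0 / len(free))
```

Each pass caps at least one more gene, so the loop terminates after at most as many passes as there are genes.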
The gene expression profiles of this invention can also be used in conjunction with other non-genetic diagnostic methods useful in the diagnosis, prognosis or monitoring of cancer treatment. For example, in some circumstances it is beneficial to combine the diagnostic power of the gene expression-based methods described above with data from conventional markers such as serum protein markers (e.g., cancer antigen 27.29 ("CA 27.29")). A range of such markers exists, including analytes such as CA 27.29. In such a method, blood is periodically obtained from a treated patient and then subjected to an enzyme immunoassay for one of the serum markers described above. When the concentration of the marker suggests the return of tumors or therapy failure, a sample source amenable to gene expression analysis is taken. Where there is a suspicious mass, a fine needle aspirate (FNA) is taken and the gene expression profiles of the cells taken from the mass are then analyzed as described above. Alternatively, tissue samples may be taken from areas adjacent to the tissue from which a tumor was previously removed. This approach can be particularly useful when other tests produce ambiguous results. Kits made in accordance with the invention include formatted assays for determining gene expression profiles. These may include all or some of the materials needed to conduct the assays, such as reagents and instructions, and a medium through which biomarkers are assayed. The articles of this invention include representations of gene expression profiles useful for treatment, diagnosis, prognosis and other types of disease evaluation. These profile representations are reduced to a medium that can be read automatically by a machine, such as computer-readable media (magnetic, optical and the like). The articles also include instructions for evaluating the gene expression profiles in such media.
For example, the articles may comprise a CD-ROM having computer instructions for comparing gene expression profiles of the gene portfolios described above. The articles may also have gene expression profiles digitally recorded therein so that they can be compared with gene expression data from patient samples.

Alternatively, the profiles can be recorded in different representation formats. A graphical record is one such format. Clustering algorithms such as those incorporated in the "DISCOVERY" and "INFER" software from Partek, Inc. mentioned above can help in visualizing such data. Different types of articles of manufacture according to the invention are media or formatted assays used to reveal gene expression profiles. These may comprise, for example, microarrays in which sequence complements or probes are fixed to a matrix to which the sequences indicative of the genes of interest bind, creating a readable determinant of their presence. Alternatively, articles according to the invention can be fashioned into reagent kits for conducting hybridization, amplification and signal generation indicative of the expression level of the genes of interest in order to detect cancer. The following examples are provided to illustrate but not limit the claimed invention. All references cited herein are incorporated herein by reference.
EXAMPLE 1
Materials and methods
Discovery of pancreatic cancer marker genes
RNA was isolated from normal pancreatic tissue and from pancreatic, lung, colon, breast and ovarian tumor tissue using Trizol. The RNA was then used to generate amplified labeled RNA (Lipshutz et al. (1999)), which was hybridized to Affymetrix U133A arrays. The data were then analyzed in two ways. In the first method, the data set was filtered to retain only those genes with at least two present calls across the data set. This filtering left 14,547 genes. Of these, 2,736 genes were overexpressed in pancreatic cancer versus normal pancreas with a p-value less than 0.05. Forty-five of the 2,736 genes were also overexpressed at least two-fold relative to the highest intensity found in lung and colon tissues.
Finally, six probe sets were found to be overexpressed at least two-fold relative to the highest intensity found in lung, colon, breast and ovarian tissues. In the second method, the data set was filtered to retain only those genes with no more than two present calls in breast, colon, lung and ovarian tissues. This filtering left 4,654 genes. Of these 4,654 genes, 160 had at least two present calls in the pancreatic tissues (normal and cancer). Finally, eight probe sets were selected that showed the highest differential expression between pancreatic cancer and normal pancreatic tissues.
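The two filtering methods of this example can be sketched as follows. Several assumptions are made that the text does not pin down: "present calls" are read as array detection calls counted per probe set, the p-value comes from whatever cancer-versus-normal test was used, fold changes compare the mean pancreatic-cancer intensity with the highest intensity in each other tissue, and the second method ranks by the cancer-to-normal intensity ratio. The `ProbeSet` record and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProbeSet:
    name: str
    present_all: int       # present calls across the whole data set
    present_other: int     # present calls in breast, colon, lung and ovarian tissue
    present_pancreas: int  # present calls in pancreatic tissue (normal and cancer)
    p_value: float         # pancreatic cancer vs. normal pancreas
    cancer_mean: float     # mean intensity in pancreatic cancer
    normal_mean: float     # mean intensity in normal pancreas
    other_max: Dict[str, float] = field(default_factory=dict)  # highest intensity per other tissue


def first_method(probes: List[ProbeSet], fold: float = 2.0,
                 alpha: float = 0.05) -> List[str]:
    kept = [p for p in probes if p.present_all >= 2]
    up = [p for p in kept if p.p_value < alpha and p.cancer_mean > p.normal_mean]
    vs_lung_colon = [p for p in up
                     if p.cancer_mean >= fold * max(p.other_max["lung"], p.other_max["colon"])]
    return [p.name for p in vs_lung_colon
            if p.cancer_mean >= fold * max(p.other_max.values())]


def second_method(probes: List[ProbeSet], top: int = 8) -> List[str]:
    kept = [p for p in probes if p.present_other <= 2 and p.present_pancreas >= 2]
    kept.sort(key=lambda p: p.cancer_mean / p.normal_mean, reverse=True)
    return [p.name for p in kept[:top]]
```

The first method narrows stepwise (present calls, significance, two-fold over lung and colon, then two-fold over all four other tissues), while the second starts from genes that are essentially absent outside the pancreas.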

Tissue samples
A total of 260 FFPE metastases and primary tissues were purchased from a variety of commercial providers. The samples tested included 30 breast metastases, 30 colorectal metastases, 56 lung metastases, 49 ovarian metastases, 43 pancreatic metastases, 18 primary prostate tumors, 2 prostate metastases, and 32 samples of other origins (6 stomach, 6 kidney, 3 larynx, 2 liver, 1 esophagus, 1 pharynx, 1 bile duct, 1 pleura, 3 bladder, 5 melanoma, 3 lymphoma).

RNA extraction
Isolation of RNA from paraffin tissue sections was based on the methods and reagents described in the High Pure RNA Paraffin Kit manual (Roche), with the following modifications. Paraffin-embedded tissue samples were sectioned according to the size of the embedded metastasis (2-5 mm: 9 × 10 μm sections; 6-8 mm: 6 × 10 μm; 8 to >10 mm: 3 × 10 μm) and placed in RNase/DNase-free 1.5 ml Eppendorf tubes. The sections were deparaffinized by incubation in 1 ml of xylene for 2-5 minutes at room temperature after vortexing for 10-20 seconds. The tubes were then centrifuged, the supernatant was removed, and the dewaxing step was repeated. After the supernatant was removed, 1 ml of ethanol was added and the sample was vortexed for 1 minute and centrifuged, and the supernatant was removed. This procedure was repeated once more.

Residual ethanol was removed and the pellet was dried in an oven at 55 °C for 5-10 minutes and resuspended in 100 μl of tissue lysis buffer, 16 μl of 10% SDS and 80 μl of proteinase K. The samples were vortexed and incubated in a thermomixer set at 400 rpm for 2 hours at 55 °C. Then 325 μl of binding buffer and 325 μl of ethanol were added to each sample, which was mixed and centrifuged, and the supernatant was added to the filter column. The filter column and collection tube were centrifuged for 1 minute at 8000 rpm and the flow-through was discarded. A series of sequential washes was performed (500 μl of wash buffer I, 500 μl of wash buffer II, 300 μl of wash buffer II), in which each solution was added to the column and centrifuged, and the flow-through was discarded. The column was then centrifuged at full speed for 2 minutes and placed in a new 1.5 ml tube, and 90 μl of elution buffer was added. RNA was recovered after 1 minute of incubation at room temperature followed by 1 minute of centrifugation at 8000 rpm. The sample was treated with DNase by adding 10 μl of DNase incubation buffer and 2 μl of DNase I and incubating for 30 minutes at 37 °C. The DNase was inactivated by adding 20 μl of tissue lysis buffer, 18 μl of 10% SDS and 40 μl of proteinase K. Again, 325 μl of binding buffer and 325 μl of ethanol were added to each sample, which was then mixed and centrifuged, and the supernatant was added onto the filter column. Sequential washes and RNA elution proceeded as set forth above, except that 50 μl of elution buffer was used to elute the RNA. To remove glass-fiber carryover from the column, the eluate was centrifuged for 2 minutes at full speed and the supernatant was transferred to a new 1.5 ml Eppendorf tube. The samples were quantified by OD 260/280 readings on a spectrophotometer and diluted to 50 ng/μl. The isolated RNA was stored in RNase-free water at -80 °C until use.

TaqMan primer and probe design
Appropriate mRNA reference sequence accession numbers, together with Oligo 6.0, were used to develop the TaqMan® CUP assays (lung markers: human surfactant protein B, lung-associated (HUMPSPBA), thyroid transcription factor 1 (TTF1), desmoglein 3 (DSG3); colorectal marker: cadherin 17 (CDH17); breast markers: mammaglobin (MG) and prostate-derived Ets transcription factor (PDEF); ovarian marker: Wilms tumor 1 (WT1); pancreatic markers: prostate stem cell antigen (PSCA), coagulation factor V (F5); prostate marker: kallikrein 3 (KLK3)) and the housekeeping assays β-actin and hydroxymethylbilane synthase (PBGD). Primers and hydrolysis probes for each assay are listed in Table 2. Amplification of genomic DNA was excluded by designing assays around exon-intron splice sites. Hydrolysis probes were labeled at the 5' nucleotide with FAM as the reporter dye and at the 3' nucleotide with BHQ1-TT as the quencher dye.

Quantitative real-time polymerase chain reaction
Quantification of gene-specific RNA was carried out in 384-well plates on an ABI Prism 7900HT Sequence Detection System (Applied Biosystems). Calibrators and standard curves were amplified in each thermocycler run. The calibrators for each marker consisted of in vitro transcripts of the target gene diluted in rat kidney carrier RNA to 1 × 10^5 copies. Standard curves for the housekeeping markers consisted of in vitro transcripts of the target gene serially diluted in rat kidney carrier RNA to 1 × 10^7, 1 × 10^5 and 1 × 10^3 copies. No-target controls were included in each run to ensure a lack of environmental contamination. All samples and controls were run in duplicate. qRTPCR was performed with general laboratory-use reagents in a 10 μl reaction containing: RT-PCR buffer (50 nM Bicine/KOH pH 8.2, 115 nM KAc, 8% glycerol, 2.5 mM MgCl2, 3.5 mM MnSO4, 0.5 mM each of dCTP, dATP, dGTP and dTTP), additives (2 mM Tris-Cl pH 8, 0.2 mM bovine albumin, 150 mM trehalose, 0.002% Tween 20), enzyme mix (2 U Tth (Roche), 0.4 mg/μl Ab TP6-25), and primer and probe mix (0.2 μM probe, 0.5 μM primer). The following cycling parameters were used: 1 cycle at 95 °C for 1 minute; 1 cycle at 55 °C for 2 minutes; 5% ramp; 1 cycle at 70 °C for 2 minutes; and 40 cycles of 95 °C for 15 seconds and 58 °C for 30 seconds. After the PCR reaction was completed, the baseline and threshold values were set in the ABI Prism 7900HT software and the calculated Ct values were exported to Microsoft Excel.
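Serial-dilution standard curves are conventionally analyzed by fitting Ct against log10(copies), deriving the amplification efficiency from the slope, and inverting the line to quantify unknowns. The sketch below assumes that conventional use; the text does not state how the curves were analyzed, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_standard_curve(copies: Sequence[float], cts: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mx, my = sum(xs) / n, sum(cts) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, my - slope * mx


def amplification_efficiency(slope: float) -> float:
    """Standard qPCR relation E = 10^(-1/slope) - 1; a slope near -3.32
    corresponds to roughly 100% efficiency (doubling per cycle)."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate input copies for an observed Ct."""
    return 10 ** ((ct - intercept) / slope)
```

For example, invented Ct values of 14.76, 21.40 and 28.04 at 10^7, 10^5 and 10^3 copies give a slope of -3.32 (efficiency close to 100%), and a Ct of 21.40 then maps back to about 10^5 copies.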

One-step vs. two-step reaction
First-strand synthesis was carried out using either 100 mg of random hexamers or gene-specific primers per reaction. In the first step, 11.5 μl of mixture-1 (primers and 1 μg of total RNA) was heated at 65 °C for 5 minutes and then cooled on ice. Then 8.5 μl of mixture-2 (1× buffer, 0.01 mM DTT, 0.5 mM each dNTP, 0.25 U/μl RNasin®, 10 U/μl SuperScript III) was added to mixture-1 and incubated at 50 °C for 60 minutes followed by 95 °C for 5 minutes. The cDNA was stored at -20 °C until use. The second-step qRTPCR of the two-step reaction was performed as stated above, with the cycling parameters: 1 cycle at 95 °C for 1 minute; 40 cycles of 95 °C for 15 seconds and 58 °C for 30 seconds. qRTPCR for the one-step reaction was performed exactly as described in the preceding section. Both the one-step and two-step reactions were performed on 100 ng of template (RNA/cDNA). After the PCR reaction was completed, the baseline and threshold values were set in the ABI Prism 7900HT software and the calculated Ct values were exported to Microsoft Excel.

Generation of a heat map
For each sample, ΔCt was calculated by taking the mean Ct of each CUP marker and subtracting the average of the mean Ct values of the housekeeping markers (ΔCt = Ct(CUP marker) - Ct(average HK marker)). The minimum ΔCt within each tissue-of-origin marker set (lung, breast, prostate, colon, ovary and pancreas) was determined for each sample. The tissue of origin with the lowest overall ΔCt was given a score of one, and the other tissues of origin were given a score of zero. The data were sorted according to pathological diagnosis. Partek Pro was populated with the modified data and an intensity plot was generated.
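The heat-map scoring reduces to a few lines. In this sketch (names hypothetical), each marker and housekeeping gene is given as its duplicate Ct values, ΔCt follows the formula above, and the tissue whose marker set contains the lowest ΔCt scores 1 while every other tissue scores 0.

```python
from statistics import mean
from typing import Dict, Sequence


def delta_ct(marker_reps: Dict[str, Sequence[float]],
             housekeeping_reps: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """dCt = mean Ct of each CUP marker minus the average of the
    housekeeping-marker mean Cts."""
    hk_average = mean(mean(reps) for reps in housekeeping_reps.values())
    return {marker: mean(reps) - hk_average for marker, reps in marker_reps.items()}


def heat_map_row(dct: Dict[str, float],
                 markers_by_tissue: Dict[str, Sequence[str]]) -> Dict[str, int]:
    """Score 1 for the tissue whose marker set holds the lowest dCt, 0 otherwise."""
    per_tissue = {t: min(dct[m] for m in ms) for t, ms in markers_by_tissue.items()}
    best = min(per_tissue, key=lambda t: per_tissue[t])
    return {t: int(t == best) for t in per_tissue}
```

Stacking one such row per sample, sorted by pathological diagnosis, gives the 0/1 matrix that the intensity plot displays.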

Results
Discovery of new pancreatic tissue-of-origin and cancer-status markers
First, five pancreatic marker candidates were analyzed: prostate stem cell antigen (PSCA), serine protease inhibitor clade A member 1 (SERPINA1), cytokeratin 7 (KRT7), matrix metalloproteinase 11 (MMP11), and mucin 4 (MUC4) (Varadhachary et al. (2004); Fukushima et al. (2004); Argani et al. (2001); Jones et al. (2004); Prasad et al. (2005); and Moniaux et al. (2004)), using a DNA microarray and a panel of 13 pancreatic duct adenocarcinomas, five normal pancreatic tissues, and 98 breast, colorectal, lung and ovarian tumor samples. PSCA alone showed moderate sensitivity (six of thirteen, or 46%, of pancreatic tumors were detected) at high specificity (91 of 98, or 93%, were correctly identified as not being of pancreatic origin) (Figure 4A). KRT7, SERPINA1, MMP11, and MUC4 demonstrated sensitivities of 38%, 31%, 85% and 31%, respectively, at specificities of 66%, 91%, 82%, and 81%, respectively. These data agreed well with qRTPCR performed on 27 metastases of pancreatic origin and 39 non-pancreatic metastases for all markers except MMP11, which showed poor sensitivity and specificity by qRTPCR on metastases. In conclusion, microarray data on frozen primary tissue serve as a good indicator of a marker's ability to identify FFPE metastases as pancreatic in origin by qRTPCR, but additional markers may be useful for optimal performance. Because pancreatic duct adenocarcinoma develops from ductal epithelial cells, which make up only a small percentage of all pancreatic cells (acinar cells and islet cells make up the majority), and because pancreatic adenocarcinoma tissues contain a significant amount of adjacent normal tissue (Prasad et al. (2005); and Ishikawa et al. (2005)), it has been difficult to identify pancreatic cancer markers (that is, markers up-regulated in cancer) that would also differentiate this organ from other organs. Such differentiation is necessary for use in a CUP panel.
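The sensitivity and specificity figures are simple proportions. The sketch below (hypothetical function name) computes them from per-sample truth and call labels and, as a usage example, rebuilds the PSCA microarray result: 6 detected among the panel of 13 pancreatic duct adenocarcinomas (6/13, about 46%) and 91 of the 98 non-pancreatic tumors correctly called negative.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(truth: Sequence[bool],
                            called: Sequence[bool]) -> Tuple[float, float]:
    """truth: sample really is pancreatic; called: marker called it pancreatic."""
    pairs = list(zip(truth, called))
    tp = sum(1 for t, c in pairs if t and c)
    fn = sum(1 for t, c in pairs if t and not c)
    tn = sum(1 for t, c in pairs if not t and not c)
    fp = sum(1 for t, c in pairs if not t and c)
    return tp / (tp + fn), tn / (tn + fp)


# Usage: PSCA on the microarray panel.
truth = [True] * 13 + [False] * 98
called = [True] * 6 + [False] * 7 + [False] * 91 + [True] * 7
sens, spec = sensitivity_specificity(truth, called)
print(round(sens, 2), round(spec, 2))  # 0.46 0.93
```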

The first filtering method (see Materials and methods) returned six probe sets: coagulation factor V (F5), hypothetical protein FLJ22041 similar to FK506 binding proteins (FKBP10), integrin β6 (ITGB6), transglutaminase 2 (TGM2), heterogeneous nuclear ribonucleoprotein A0 (HNRPA0), and BAX delta (BAX). The second filtering method (see Materials and methods) returned eight probe sets: F5, TGM2, paired-like homeodomain transcription factor 1 (PITX1), trio mRNA isoform (TRIO), mRNA for p73H (p73), an unknown protein for MGC:10264 (SCD), and two probe sets for claudin 18. F5 and TGM2 were present in both results, and of the two, F5 appeared to be the more promising (Figure 4B).

Optimization of sample preparation and qRTPCR using FFPE tissues
Next, the RNA isolation and qRTPCR methods were optimized using fixed tissues before examining the performance of the marker panel. First, the effect of reducing the proteinase K incubation time from sixteen hours to 3 hours was analyzed. There was no effect on yield. However, some samples showed longer RNA fragments when the shorter proteinase K step was used (Figures 5A and 5B). For example, when RNA was isolated from a one-year-old block (C22), no difference was observed in the electropherograms. However, when RNA was isolated from a five-year-old block (23), a larger fraction of higher-molecular-weight RNA, seen as a shoulder in the electropherogram, was observed when the shorter proteinase K digestion was used. This trend generally held when other samples were processed, regardless of the organ of origin of the FFPE metastasis. In conclusion, shortening the proteinase K digestion time does not sacrifice RNA yield and may help isolate longer, less degraded RNA. Next, three different reverse transcription methods were compared: reverse transcription with random hexamers followed by qPCR (two-step), reverse transcription with a gene-specific primer followed by qPCR (two-step), and one-step qRTPCR using gene-specific primers. RNA was isolated from eleven metastases and the Ct values for β-actin, human surfactant protein B (HUMSPB), and thyroid transcription factor (TTF) were compared across the three methods (Figures 6A-6C). There were statistically significant differences (p < 0.001) for all comparisons. For all three genes, reverse transcription with random hexamers followed by qPCR (two-step reaction) gave the highest Ct values, while reverse transcription with a gene-specific primer followed by qPCR (two-step reaction) gave Ct values slightly (but statistically significantly) lower than the corresponding one-step reaction.
However, the two-step RTPCR with gene-specific primers used a longer reverse transcription step. When the HUMSPB and TTF Ct values were normalized to the corresponding β-actin value for each sample, the normalized Ct values did not differ across the three methods. In conclusion, optimizing the RTPCR reaction conditions can lower Ct values, which may help in analyzing older paraffin blocks (Cronin et al. (2004)), and a one-step RTPCR reaction with gene-specific primers can generate Ct values comparable to those of the corresponding two-step reaction.
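The normalization described above can be illustrated with a minimal R sketch (R being the language of the algorithm example in Example 3). The Ct values are invented, and treating the difference between methods as a uniform shift across genes is a simplifying assumption; the point is only that a shift shared by the target and the β-actin reference cancels in the ΔCt.

```r
# Delta Ct: Ct of the target gene minus Ct of the reference (beta-actin)
# measured in the same sample.
delta_ct <- function(ct_target, ct_actin) {
  ct_target - ct_actin
}

# Hypothetical Ct values for one sample under two RT methods.
# Method B is assumed to lower every Ct by the same 1.5 cycles.
method_a <- c(actin = 24.0, HUMSPB = 29.5, TTF = 31.0)
method_b <- method_a - 1.5

dct_a <- delta_ct(method_a[c("HUMSPB", "TTF")], method_a[["actin"]])
dct_b <- delta_ct(method_b[c("HUMSPB", "TTF")], method_b[["actin"]])

print(dct_a)  # HUMSPB 5.5, TTF 7.0
print(dct_b)  # same values: the shared shift cancels
```

Real method differences need not be uniform across genes, which is why the document reports that normalized values agreed empirically rather than by construction.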

Diagnostic performance of a CUP qRTPCR test. Next, 12 qRTPCR reactions (10 markers and two housekeeping genes) were performed on 239 FFPE metastases. The markers used for the test are shown in Table 2. The lung markers were human surfactant pulmonary-associated protein B (HUMPSPB), thyroid transcription factor 1 (TTF1) and desmoglein 3 (DSG3). The colorectal marker was cadherin 17 (CDH17).

The breast markers were mammaglobin (MG) and prostate-derived Ets transcription factor (PDEF). The ovarian marker was Wilms tumor 1 (WT1). The pancreas markers were prostate stem cell antigen (PSCA) and coagulation factor V (F5), and the prostate marker was kallikrein 3 (KLK3). For gene descriptions, see Table 31.

TABLE 2 Primer and probe sequences, accession numbers and amplicon lengths. *The probes are labeled 5'-FAM and 3'-BHQ1-TT.

Heat-map analysis of the normalized Ct values revealed high specificity of the breast and prostate markers, moderate specificity of the colon, lung and ovary markers, and somewhat lower specificity of the pancreas markers. Combining standardized qRTPCR data with computational refinement improves the performance of the marker panel. Results were obtained from the standardized qRTPCR data combined with the algorithm, and the accuracy of the qRTPCR test was determined.

Discussion. In this example, microarray-based expression profiling of primary tumors was used to identify candidate markers for use with metastases. That primary tumors can be used to discover tissue-of-origin markers for metastases is consistent with several recent findings. For example, Weigelt et al. have shown that the gene expression profiles of primary breast tumors are maintained in distant metastases. Weigelt et al. (2003). Italiano and colleagues found that EGFR status assessed by IHC was similar in 80 primary colorectal tumors and the 80 matched metastases. Italiano et al. (2005). Only five of the 80 showed discordant EGFR status. Italiano et al. (2005). Backus et al. identified putative markers for detecting breast cancer metastasis using genome-wide gene expression analysis of breast and other tissues, and demonstrated that mammaglobin and CK19 detected clinically actionable metastases in breast sentinel lymph nodes with 90% sensitivity and 94% specificity. Backus et al. (2005). The microarray studies with primary tissue confirmed the specificity and sensitivity of known markers. As a result, with the exception of F5, all markers used here have high specificity for the tissue studied. Argani et al. (2001); Backus et al. (2005); Cunha et al. (2005); Borgono et al. (2004); McCarthy et al. (2003); Hwang et al. (2004); Fleming et al. (2000); Nakamura et al. (2002); and Khoor et al. (1997). A recent study using IHC determined that PSCA is overexpressed in prostate cancer metastases. Lam et al. (2005). et al. (2002) also demonstrated that PSCA could be used as a tissue-of-origin marker for pancreas and prostate. As shown here, strong PSCA expression is found at the RNA level in some prostate tissues, but because PSA is included in the test, prostate and pancreatic cancers can now be distinguished. A novel finding of this study was the use of F5 as a marker complementary to PSCA for pancreatic tissue of origin. In both the microarray data set with primary tissue and the qRTPCR data set with FFPE metastases, F5 complemented PSCA (Figures 4A and 4B and Table 3).

TABLE 3 Feasibility data

Previous investigators have developed CUP tests using IHC or microarrays. Su et al. (2001); Ramaswamy et al. (2001); and Bloom et al. (2004). More recently, SAGE has been coupled with a small panel of qRTPCR markers. Dennis et al. (2002); and Buckhaults et al. (2003). This study is the first to combine microarray-based expression profiling with a small panel of qRTPCR tests. Microarray studies with primary tissue identified some, but not all, of the tissue-of-origin markers previously identified by SAGE studies. Some studies have shown moderate agreement between SAGE and DNA microarray profiling data, with the correlation improving for genes with higher expression levels. van Ruissen et al. (2005); and Kim (2003). For example, Dennis et al. (2002) identified PSA, MG, PSCA and HUMSPB, while Buckhaults et al. (2003) identified PDEF. Running the CUP test by qRTPCR is preferred because qRTPCR is a robust technology and may have performance advantages over IHC. Al-Mulla et al. (2005); and Haas et al. (2005). As shown here, the qRTPCR protocol was improved by using gene-specific primers in a one-step reaction. This is the first demonstration of gene-specific primers in a one-step qRTPCR reaction with FFPE tissue. Other investigators have used two-step qRTPCR (cDNA synthesis in one reaction followed by qPCR) or have used random hexamers or truncated gene-specific primers. Abrahamsen et al. (2003); Specht et al. (2001); Godfrey et al. (2000); Cronin et al. (2004); and Mikhitarian et al. (2004).

EXAMPLE 2 Total RNA isolation protocol for CUP FFPE samples (High Pure kit, Cat # 3270289)
Purpose: RNA isolation from FFPE tissue.
Procedure: Preparation of working solutions
1. Proteinase K (PK) supplied in the kit: dissolve the lyophilized enzyme in 4.5 ml of elution buffer. Aliquot and store at -20 °C; stable for 12 months. PK 4 x 250 mg (cat # 3115852): dissolve the lyophilized enzyme in 12.5 ml of elution buffer (1x TE buffer, pH 7.4-7). Aliquot and store at -20 °C.
2. Wash Buffer I: add 60 ml of absolute ethanol to Wash Buffer I; store at room temperature.
3. Wash Buffer II: add 200 ml of absolute ethanol to Wash Buffer II; store at room temperature.
4. DNase I: dissolve the lyophilized enzyme in 400 μl of elution buffer. Aliquot and store at -20 °C; stable for 12 months.
Sectioning of paraffin blocks (~30-45 minutes for 12 blocks; 12 blocks x 2 tubes = 24 tubes). Sections cut from the block must be processed immediately for RNA extraction.
1. Using a clean, sharp razor blade on the microtome, cut 6 x 10 µm sections from the trimmed tissue blocks (size 3-4 x 5-1 mm). Note: for a new block, discard wax sections until tissue is reached; for a previously used block, discard the first 3 tissue sections.
2. Immediately place the cut tissue in 1.5 ml microfuge tubes and cap tightly to minimize moisture.
3. It is recommended to choose the number of sections based on the tissue size, as shown in Table 4.

TABLE 4
Deparaffinization (~30-45 minutes)
1. Add 1.0 ml of xylene to each sample, vortex vigorously for 10-20 seconds and incubate at room temperature for 2-5 min. Centrifuge at full speed for 2 min. Carefully remove the supernatant. Note: if the tissue appears to be floating, centrifuge for an additional 2 minutes.
2. Repeat step 1.
3. Centrifuge at full speed for 2 minutes. Remove the supernatant.
4. Add 1 ml of absolute ethanol and vortex vigorously for 1 minute. Centrifuge at full speed for 2 minutes. Remove the supernatant.
5. Repeat step 4.
6. Blot the tube briefly on a paper towel to remove ethanol residue.
7. Dry the tissue pellet in an oven at 55 °C for 5-10 minutes. Note: the ethanol must be completely removed and the pellet completely dry, because residual ethanol can inhibit PK digestion. Note: if PK is stored at -20 °C, warm it to room temperature for 20-30 minutes.

RNA extraction (~2.5-3 hours)
1. Add 100 μl of tissue lysis buffer, 16 μl of 10% SDS and 80 μl of proteinase K working solution to the tissue pellet, vortex briefly at intervals, and incubate for 2 h at 55 °C with shaking at 400 rpm.
2. Add 325 μl of binding buffer and 325 μl of absolute ethanol. Mix gently by pipetting up and down.
3. Centrifuge the lysate at full speed for 2 minutes.
4. Assemble the filter tubes and collection tubes (12 tubes), and pipette the lysate supernatant onto the filter.
5. Centrifuge for 30 seconds at 8000 rpm and discard the flow-through. Note: repeat steps 4-5 if RNA is to be pooled from 2 additional tissue pellet preparations.
6. Centrifuge again at 8000 rpm for 30 seconds to dry the filter.
7. Add 500 μl of Wash Buffer I working solution to the column, centrifuge for 15-30 seconds at 8000 rpm and discard the flow-through.
8. Add 500 μl of Wash Buffer II working solution, centrifuge for 15-30 seconds at 8000 rpm and discard the flow-through.
9. Add 300 μl of Wash Buffer II working solution, centrifuge for 15-30 seconds at 8000 rpm and discard the flow-through.
10. Centrifuge the High Pure filter for 2 minutes at maximum speed.
11. Place the High Pure filter tube in a fresh 1.5 ml tube and add 90 μl of elution buffer. Incubate for 1-2 minutes at room temperature. Centrifuge for 1 minute at 8000 rpm.

DNase I treatment (~1.5 hours)
12. Add 10 μl of 10x DNase incubation buffer and 1.0 μl of DNase I working solution to the eluate and mix. Incubate for 45 minutes at 37 °C (or use 2.0 μl of DNase I for 30 min).
13. Add 20 μl of tissue lysis buffer, 18 μl of % SDS and 40 μl of proteinase K working solution. Vortex briefly. Incubate for 30 min (30-60 min) at 55 °C.
14. Add 325 μl of binding buffer and 325 μl of absolute ethanol. Mix and pipette onto a fresh High Pure filter tube with collection tube (12 tubes).
15. Centrifuge for 30 seconds at 8000 rpm and discard the flow-through.
16. Centrifuge again at 8000 rpm for 30 seconds to dry the filter.
17. Add 500 μl of Wash Buffer I working solution to the column. Centrifuge for 15 seconds at 8000 rpm and discard the flow-through.
18. Add 500 μl of Wash Buffer II working solution. Centrifuge for 15 seconds at 8000 rpm and discard the flow-through.
19. Add 300 μl of Wash Buffer II working solution. Centrifuge for 15 seconds at 8000 rpm and discard the flow-through.
20. Centrifuge the High Pure filter for 2 minutes at maximum speed.
21. Place the High Pure filter tube in a fresh 1.5 ml tube. Add 50 μl of elution buffer and incubate for 1-2 min at room temperature. Centrifuge for 1 minute at 8000 rpm to collect the eluted RNA.
22. Centrifuge the eluate for 2 minutes at full speed and transfer the supernatant to a new tube without disturbing the glass fibers at the bottom.
23. Read the OD 260/280 and dilute to 50 ng/μl. Store at -80 °C.
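The final dilution in step 23 is a C1V1 = C2V2 calculation. A minimal R sketch, assuming the common conversion of 40 ng/µl RNA per A260 unit (1 cm path length) and a hypothetical reading taken on a 1:10 pre-dilution; the function names are illustrative, not part of the kit protocol.

```r
# RNA concentration from an A260 reading, assuming 40 ng/ul per A260 unit
# and that the reading was taken on a sample pre-diluted by `dilution`.
rna_conc_ng_per_ul <- function(a260, dilution = 1) {
  a260 * 40 * dilution
}

# Volumes of eluate and water needed to reach `target` ng/ul in
# `final_volume` ul, using C1 * V1 = C2 * V2.
dilution_volumes <- function(conc, target = 50, final_volume = 50) {
  if (conc < target) stop("eluate is already below the target concentration")
  v_rna <- target * final_volume / conc
  c(rna_ul = v_rna, water_ul = final_volume - v_rna)
}

conc <- rna_conc_ng_per_ul(a260 = 0.25, dilution = 10)  # 100 ng/ul
vols <- dilution_volumes(conc)                           # 25 ul eluate + 25 ul water
print(vols)
```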

CUP ASR Test Protocol (ABI 7900)
Purpose: to use qRTPCR to determine the tissue of origin of a CUP sample.
Control preparation:
1. Positive controls (see Table 5 and plate C in the plate preparation, Figure 7).

TABLE 5 Serial dilutions of IVT: 5 μl of 1X10 in 470 μl H2O + 25 μl of 10000 rRNA (1E6). Dilute 50,000 CE/μl rRNA to 500 CE/μl: 5 μl of 50,000 CE/μl + 495 μl H2O. Aliquot 10 μl per tube (2 plates); store the mixture at -80 °C until ready to use.
2. Standard curves (see Table 6 and plate C in the plate preparation, Figure 7). Step 1: set up the standard curve exactly as shown in Table 6.

TABLE 6 Table 7. Stock solution: 1 x 10^8 IVT. Dilute 50,000 CE/μl rRNA to 500 CE/μl: 5 μl of 50,000 CE/μl + 495 μl H2O. Aliquot 10 μl per tube (2 plates); store the mixture at -80 °C until ready to use.

Enzyme mixture:
1. Master mix: enzyme (Tth/antibody (TP6-25)); see Table 7.

TABLE 7 Aliquot 500 μl per tube and freeze at -20 °C.

CUP master mix:
1. 2.5X CUP master mix (Tables 8-11):
TABLE 8 Allow the reagent to mix thoroughly (>15 minutes).
TABLE 9 Allow the reagent to mix thoroughly (>15 minutes); combine the above mixtures in a sterile container and add the following:
TABLE 10 Allow the reagent to mix thoroughly (>15 minutes); aliquot 1.8 ml per tube and freeze at -20 °C.
TABLE 11 Primer and probe mixture: aliquot 250 μl per tube and freeze at -20 °C.
Reaction mixture:
1. CUP master mix (CMM) (see Tables 12-14 and plate A in the plate preparation, Figure 7).
TABLE 12 Preferably, each run/plate will have no more than 356 reactions: 12 samples with 12 markers (288 reactions, 2 replicates each) + 10 standard curve controls in duplicate (20) + 2 positive and 2 negative controls for each marker (4 x 12 = 48). Adjust the water to the sample volume (maximum sample volume 4.3 μl); mix well.

TABLE 13 2. ToO markers: mix well.

TABLE 14 β-actin and PBGD markers: mix well.
Sample preparation:
TABLE 15
1. CUP samples: 12 samples in a 96-well plate, wells A1-A12 (see Table 16 and plate B in the plate preparation, Figure 7); aliquot 50 μl at 50 ng/μl (2 μl per reaction).
Plate loading:
1. Preparation of the 384-well plate (see plate D in the plate preparation, Figure 7). Load 2 μl of sample and 8 μl of CMM per well (sample = 50 ng/μl), or 4 μl of sample and 6 μl of CMM (sample = 25 ng/μl). Seal and label the plate. Centrifuge at 2000 rpm for 1 minute.
Preparation of the ABI 7900HT: place the plate in the ABI 7900, select the program "CUP 384" and start the run.
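The reaction budget in Table 12 and the two plate-loading options can be checked with simple arithmetic. This R sketch only restates the counts given in the protocol; the helper name is illustrative.

```r
# Plate budget from Table 12: 12 samples x 12 markers in duplicate,
# plus standard-curve and control wells.
samples <- 12; markers <- 12; replicates <- 2
sample_rxns <- samples * markers * replicates  # 288
std_curve   <- 10 * 2                          # 10 standard-curve controls in duplicate
controls    <- (2 + 2) * markers               # 2 positive + 2 negative per marker = 48
total       <- sample_rxns + std_curve + controls
print(total)  # 356, which fits on a 384-well plate

# Both loading options deliver the same RNA input per reaction.
input_ng <- function(volume_ul, conc_ng_per_ul) volume_ul * conc_ng_per_ul
input_ng(2, 50)  # 100 ng
input_ng(4, 25)  # 100 ng
```

The 100 ng per reaction matches the upper end of the 100-12.5 ng working range reported in Example 5.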

TABLE 16 ROX on. The data are analyzed, and the Ct values are extracted and entered into the algorithm.

EXAMPLE 3 CUP Algorithm. The actin-normalized Ct values (ΔCt) for HPT, MGB, PDEF, PSA, SP-B, TTF, DSG3, WT1, PSCA and F5 are placed in 6 sets according to the tissue of origin for which each marker was originally selected. The constants 9.00, 11.00, 7.50, 5.00, 10.00, 9.50, 6.50, 8.00, 9.00 and 8.00 are subtracted from each ΔCt, respectively. Then, for each sample, the minimum value in each of the 6 sets (HPT; min(MGB, PDEF); PSA; min(SP-B, TTF, DSG3); WT1; and min(PSCA, F5)) is selected as the representative variable for that set. These variables and the site of metastasis are used to classify the sample by linear discriminant analysis. Two different models, one for male and one for female subjects, are constructed from the training data using the 'lda' function of the MASS library (Venables et al. (2002)) in R (version 2.0.1). A posterior probability for each ToO is then calculated using the 'predict' function with either the male or the female model. The variables used in the male model are HPT, PSA, min('SP-B', 'TTF', 'DSG3'), min('PSCA', 'F5'), and the site of metastasis. The metastatic-site variable has 4 levels, corresponding to colon, lung, ovary, and all other tissues. In the female model, the variables are HPT, min('MGB', 'PDEF'), min('SP-B', 'TTF', 'DSG3'), WT1, min('PSCA', 'F5'), and the site of metastasis.
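A minimal R sketch of the feature construction just described, assuming a data frame whose column names (HPT, MGB, PDEF, PSA, SP.B, TTF, DSG3, WT1, PSCA, F5) are hypothetical stand-ins for the marker ΔCt values. Because a lower ΔCt means more transcript relative to β-actin, taking the minimum keeps the most strongly expressed marker of each set. The output column names mirror those used in the Example 3 R code.

```r
# Per-marker constants subtracted from each delta Ct (values from the text).
offsets <- c(HPT = 9.00, MGB = 11.00, PDEF = 7.50, PSA = 5.00, SP.B = 10.00,
             TTF = 9.50, DSG3 = 6.50, WT1 = 8.00, PSCA = 9.00, F5 = 8.00)

# The 6 tissue-of-origin sets; each becomes one classifier variable.
tissue_sets <- list(
  HPT           = "HPT",
  MGB.PDEF      = c("MGB", "PDEF"),
  PSA           = "PSA",
  SP.B.TTF.DSG3 = c("SP.B", "TTF", "DSG3"),
  WT1           = "WT1",
  PSCA.F5       = c("PSCA", "F5")
)

# dct: data frame of actin-normalized delta Ct values, one row per sample.
cup_features <- function(dct) {
  shifted <- sweep(as.matrix(dct[, names(offsets)]), 2, offsets, "-")
  out <- sapply(tissue_sets,
                function(cols) apply(shifted[, cols, drop = FALSE], 1, min))
  # matrix() makes the result shape the same for one or many samples
  as.data.frame(matrix(out, nrow = nrow(dct),
                       dimnames = list(NULL, names(tissue_sets))))
}

# Hypothetical single sample
example <- data.frame(HPT = 10, MGB = 12, PDEF = 10, PSA = 8, SP.B = 11,
                      TTF = 10, DSG3 = 9, WT1 = 9, PSCA = 12, F5 = 9)
feats <- cup_features(example)
print(feats)
```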

Example R code:
library(MASS)  # provides lda()

# Training of the male model
dat.m <- CUP2.MIN.NORM[, c('HPT', 'PSA', 'SP.B.TTF.DSG3', 'PSCA.F5', 'Class', 'background')]
# Priors are given in the order of the factor levels of Class
prior.m <- c(0, 0.09, 0.23, 0.43, 0, 0.16, 0.02)
CUP.lda.m <- lda(Class ~ ., dat.m, prior = prior.m / sum(prior.m))

# Training of the female model
dat.f <- CUP2.MIN.NORM[, c('HPT', 'MGB.PDEF', 'SP.B.TTF.DSG3', 'WT1', 'PSCA.F5', 'Class', 'background')]
prior.f <- c(0.03, 0.09, 0.23, 0.43, 0.04, 0.16, 0)
CUP.lda.f <- lda(Class ~ ., dat.f, prior = prior.f / sum(prior.f))

# If the unknown sample (row i) is male
predict(CUP.lda.m, CUP2.MIN.NORM.TEST[i, ])
# If the unknown sample (row i) is female
predict(CUP.lda.f, CUP2.MIN.NORM.TEST[i, ])

To run this code, a data frame named CUP2.MIN.NORM must contain the training data, with the minimum value calculated for each tissue-of-origin set as described above.

The Class column corresponds to the tissue of origin, and the background column corresponds to the metastatic sites described above.

The test data can be provided in CUP2.MIN.NORM.TEST, and the sample in a specific row i can be classified using the predict function. Again, the test data must be in the same format as the training set and must have the same minimum-value adjustments applied.

EXAMPLE 4 Resolved CUP samples. Forty-eight resolved and unresolved CUP samples were compared to determine the correlation with true CUP samples. The methods were those described in Examples 1-3. The results are presented in Table 17. Eleven unresolved CUP samples were tested; a diagnosis was made for 8, and 3 were from another category.

TABLE 17

EXAMPLE 5 CUP test limits. Figures 8A-8C illustrate the results obtained, using the methods described in Examples 1-3, in determining the limits of the CUP test. Test performance was evaluated over a range of RNA inputs, and the CUP test was found to perform efficiently from 100 ng down to 12.5 ng of RNA.

EXAMPLE 6 qRTPCR test. Materials and methods. Frozen tissue samples for microarray analysis. A total of 700 frozen primary human tissues were used for microarray gene expression profiling. Samples were obtained from a variety of academic institutions, including Washington University (St. Louis, MO) and Erasmus Medical Center (Rotterdam, The Netherlands), and from commercial tissue banks, including Genomics Collaborative, Inc. (Cambridge, MA), Asterand (Detroit, MI), Oncomatrix (La Jolla, CA) and Clinomics Biosciences (Pittsfield, MA). Demographic, clinical and pathological information was collected for each specimen. The histopathological features of each sample were reviewed to confirm the diagnosis and to estimate sample preservation and tumor content. RNA extraction and GeneChip hybridization. Frozen cancer samples containing more than 70% tumor cells, as well as benign and normal samples, were cut and homogenized with a mechanical homogenizer (Ultra-Turrax T8, Germany) in Trizol reagent (Invitrogen, Carlsbad, CA), following the standard Trizol protocol for RNA isolation from frozen tissues. After centrifugation, the upper aqueous phase was collected and total RNA was precipitated with isopropyl alcohol at -20 °C. RNA pellets were washed with 75% ethanol, dissolved in water and stored at -80 °C until use. RNA quality was examined with the Agilent 2100 Bioanalyzer RNA 6000 Nano Assay (Agilent Technologies, Palo Alto, CA). Labeled cRNA was prepared and hybridized to the Hu133A GeneChip high-density oligonucleotide array (Affymetrix, Santa Clara, CA), which contains a total of 22,000 probe sets, according to the manufacturer's standard protocol. The arrays were scanned using Affymetrix protocols and scanners. For subsequent analysis, each probe set was considered a separate gene.
Expression values for each gene were calculated using the Affymetrix GeneChip analysis software MAS 5.0. All chips met three quality control standards: the percentage of "present" calls was greater than 35%, the scale factor was less than 12 when scaled to a global target intensity of 600, and the average background level was less than 150. Selection of candidate markers. To select tissue-of-origin (ToO) marker candidates for lung, colon, breast, ovary and prostate, probe set expression levels were measured in RNA samples covering a total of 682 normal, benign and cancerous tissues of breast, colon, lung, ovary and prostate.
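The three chip-level quality criteria can be expressed as a single filter. In this R sketch the data frame and its column names are hypothetical; only the thresholds come from the text.

```r
# TRUE for chips meeting all three stated QC criteria.
passes_qc <- function(qc) {
  qc$percent_present > 35 &
    qc$scale_factor < 12 &   # scale factor at a global target intensity of 600
    qc$avg_background < 150
}

# Hypothetical MAS 5.0 summary statistics, one row per chip
chips <- data.frame(
  chip            = c("A", "B", "C"),
  percent_present = c(42, 31, 50),
  scale_factor    = c(3.1, 2.5, 14.0),
  avg_background  = c(80, 95, 120),
  stringsAsFactors = FALSE
)
kept <- chips[passes_qc(chips), "chip"]
print(kept)  # "A": B fails on % present, C fails on scale factor
```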

Tissue-specific marker candidates were selected based on a number of statistical queries. To generate pancreatic candidates, the gene expression profiles of 13 primary pancreatic ductal adenocarcinomas, 5 normal pancreas specimens and 98 lung, colon, breast and ovarian cancers were used to select pancreatic adenocarcinoma markers. Two queries were made. In the first query, a data set was created containing the 14,547 genes with at least 2 "present" calls in pancreatic samples. A total of 2,736 genes overexpressed in pancreatic cancer compared with normal pancreas were identified by t-test (p < 0.05). Genes whose expression at the 11th percentile of pancreatic cancer was at least 2 times higher than the maximum in colon and lung cancer were selected, yielding 45 probe sets. As a final step, 6 genes whose maximal expression was at least 2 times greater than the maximum expression in colon, lung, breast and ovarian cancers were selected. In the second query, a data set of 4,654 probe sets with at most 2 "present" calls across all breast, colon, lung and ovary specimens was created. A total of 160 genes with at least 2 "present" calls in normal pancreas and pancreatic cancer samples were selected. From these 160 genes, 10 were selected after comparing their expression levels between pancreatic cancer and normal tissue. The results of both pancreas queries were combined. In addition to the gene expression profile analysis, a few markers were selected from the literature. The results of all queries were combined to produce a short list of ToO marker candidates for each tissue type. The sensitivity and specificity of each marker were estimated. The markers that best differentiated tissues by origin were nominated for RT-PCR testing based on marker redundancy and complementarity. FFPE metastatic carcinomas of known origin and CUP tissues.
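The first pancreatic query can be sketched in R as follows. The expression matrix, the group labels and the use of a one-sided Welch t-test (the R default for two samples) are assumptions; the text only states a t-test at p < 0.05 followed by the 2-fold, 11th-percentile rule, and the final breast/ovary filter is omitted here.

```r
# expr: hypothetical expression matrix (rows = probe sets, columns = samples)
# group: one label per column
select_pancreas_candidates <- function(expr, group) {
  panc <- group == "pancreas_cancer"
  norm <- group == "pancreas_normal"
  cl   <- group %in% c("colon_cancer", "lung_cancer")
  keep <- apply(expr, 1, function(x) {
    # over-expression in pancreatic cancer versus normal pancreas
    p <- t.test(x[panc], x[norm], alternative = "greater")$p.value
    # 11th percentile in pancreatic cancer at least 2x the colon/lung maximum
    p < 0.05 && quantile(x[panc], 0.11) >= 2 * max(x[cl])
  })
  rownames(expr)[keep]
}

# Hypothetical data: 13 pancreatic cancers, 5 normal pancreas, 4 colon/lung
group <- c(rep("pancreas_cancer", 13), rep("pancreas_normal", 5),
           rep("colon_cancer", 2), rep("lung_cancer", 2))
expr <- rbind(
  g1 = c(100 + 1:13, 10 + 1:5, 20 + 1:4),  # passes both filters
  g2 = c(10 + 1:13, 10 + 1:5, 20 + 1:4),   # too low in pancreatic cancer
  g3 = c(100 + 1:13, 10 + 1:5, 60 + 1:4)   # colon/lung maximum too high
)
candidates <- select_pancreas_candidates(expr, group)
print(candidates)
```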
A total of 386 FFPE metastatic carcinomas (stage III-IV) of known origin and 24 primary FFPE prostate adenocarcinomas were purchased from a variety of commercial providers, including Proteogenex (Los Angeles, CA), Genomics Collaborative, Inc. (Cambridge, MA), Asterand (Detroit, MI), Ardais (Lexington, MA) and Oncomatrix (La Jolla, CA). An independent set of 48 metastatic carcinomas of known primary origin and CUP tissues was obtained from Albany Medical College (Albany, NY). Demographic, clinical and pathological information was collected for each patient specimen. The histopathological features of each sample were reviewed to confirm the diagnosis and to estimate sample preservation and tumor content. For metastatic samples, the metastatic carcinoma and ToO diagnoses were unequivocally established based on the patient's clinical history and histological comparison of the metastatic carcinoma with the corresponding primary tumor.

Isolation of RNA from FFPE samples. RNA was isolated from paraffin-embedded tissue sections as described in the High Pure RNA Paraffin Kit manual (Roche), with the following modifications. Paraffin-embedded tissue samples were sectioned according to the size of the embedded metastasis (2-5 mm = 9 x 10 μm, 6-8 mm = 6 x 10 μm, 8->10 mm = 3 x 10 μm). The sections were deparaffinized as described in the kit manual; the tissue pellet was dried in an oven at 55 °C for 5-10 minutes and resuspended in 100 μl of tissue lysis buffer, 16 μl of 10% SDS and 80 μl of proteinase K. The samples were vortexed and incubated in a thermomixer set at 400 rpm for 2 hours at 55 °C. Subsequent processing followed the High Pure RNA Paraffin Kit manual. The samples were quantified by OD 260/280 readings on a spectrophotometer and diluted to 50 ng/μl. The isolated RNA was stored in RNase-free water at -80 °C until use. qRTPCR for preselection of marker candidates. One μg of total RNA from each sample was reverse transcribed with random hexamers using SuperScript II reverse transcriptase according to the manufacturer's instructions (Invitrogen, Carlsbad, CA). Primers and MGB probes for the candidate genes and the ACTB control gene were either designed using Primer Express software (Applied Biosystems, Foster City, CA) or taken from ABI Assays-on-Demand (Applied Biosystems, Foster City, CA). All internally designed primers and probes were tested for optimal amplification efficiency above 90%. RT-PCR amplification was carried out in a 20 μl reaction mixture containing 200 ng of template cDNA, 2x TaqMan® Universal PCR Master Mix (10 μl) (Applied Biosystems, Foster City, CA), 500 nM of forward and reverse primers and 250 nM of probe. The reactions were run on an ABI PRISM 7900HT Sequence Detection System (Applied Biosystems, Foster City, CA).
Cycling conditions were: 2 min of AmpErase UNG activation at 50 °C, 10 min of polymerase activation at 95 °C, and 50 cycles of 95 °C for 15 seconds and annealing (60 °C) for 60 seconds. In each assay, a no-template control was included in duplicate, together with template cDNA, for both the gene of interest and the control gene. The relative expression of each target gene was represented as ΔCt, equal to the Ct of the target gene minus the Ct of the control gene (ACTB). Optimized one-step qRTPCR. The appropriate mRNA reference sequence accession numbers were used with Oligo 6.0 to develop the CUP TaqMan® assays (lung markers: human surfactant pulmonary-associated protein B (HUMPSPBA), thyroid transcription factor 1 (TTF1) and desmoglein 3 (DSG3); colorectal marker: cadherin 17 (CDH17); breast markers: mammaglobin (MG) and prostate-derived Ets transcription factor (PDEF); ovarian marker: Wilms tumor 1 (WT1); pancreas markers: prostate stem cell antigen (PSCA) and coagulation factor V (F5); prostate marker: kallikrein 3 (KLK3)) and the housekeeping assays for beta-actin (β-actin) and hydroxymethylbilane synthase (PBGD). Gene-specific primers and hydrolysis probes for the optimized one-step qRT-PCR assay are listed in Table 2 (SEQ ID NOs: 11-58). Amplification of genomic DNA was excluded by designing the assays across exon-intron splice sites. The hydrolysis probes were labeled at the 5' nucleotide with FAM as the reporter dye and at the 3' nucleotide with BHQ1-TT as the internal quencher. Gene-specific RNA was quantified in 384-well plates on an ABI Prism 7900HT Sequence Detection System (Applied Biosystems). Calibrators and standard curves were amplified in each thermocycler run. The calibrators for each marker consisted of in vitro transcripts of the target gene diluted in rat kidney carrier RNA to 1 x 10^5 copies.
Standard curves for the housekeeping markers consisted of in vitro transcripts of the target gene serially diluted in rat kidney carrier RNA to 1 x 10^7, 1 x 10^5 and 1 x 10^3 copies. No-target controls were included in each run to ensure the absence of environmental contamination. All samples and controls were run in duplicate. qRTPCR was performed with general-purpose laboratory reagents in a 10 μl reaction containing: RT-PCR buffer (50 nM Bicine/KOH pH 8.2, 115 nM KOAc, 8% glycerol, 2.5 mM MgCl2, 3.5 mM MnSO4, 0.5 mM each of dCTP, dATP, dGTP and dTTP), additives (2 mM Tris-Cl pH 8, 0.2 mM bovine albumin, 150 mM trehalose, 0.002% Tween 20), enzyme mixture (2 U Tth (Roche), 0.4 mg/μl antibody TP6-25) and primer and probe mixture (0.2 μM probe, 0.5 μM primer). The following cycling parameters were used: 1 cycle at 95 °C for 1 minute; 1 cycle at 55 °C for 2 minutes; 5% ramp; 1 cycle at 70 °C for 2 minutes; and 40 cycles of 95 °C for 15 seconds and 58 °C for 30 seconds. After the PCR was complete, the baseline and threshold values were set in the ABI 7900HT Prism software and the calculated Ct values were exported to Microsoft Excel. One-step versus two-step reaction. To compare one-step and two-step RT-PCR, first-strand synthesis for the two-step reaction was carried out using either 100 ng of random hexamers or gene-specific primers per reaction. In the first step, 11.5 μl of mixture 1 (primers and 1 μg of total RNA) was heated at 65 °C for 5 minutes and then chilled on ice. Then 8.5 μl of mixture 2 (1x buffer, 0.01 mM DTT, 0.5 mM of each dNTP, 0.25 U/μl RNasin®, 10 U/μl SuperScript III) was added to mixture 1 and incubated at 50 °C for 60 minutes, followed by 95 °C for 5 minutes. The cDNA was stored at -20 °C until use.
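The standard curves allow amplification efficiency and copy number to be derived from the usual log-linear fit. A minimal R sketch with hypothetical Ct values for the 10^7, 10^5 and 10^3 copy standards: efficiency = 10^(-1/slope) - 1, so a slope near -3.32 corresponds to roughly 100% efficiency, above the 90% threshold mentioned for the preselection assays. The helper name is illustrative.

```r
# Hypothetical Ct values for the three standards
copies <- c(1e7, 1e5, 1e3)
ct     <- c(14.76, 21.40, 28.04)

# Linear fit of Ct against log10(copy number)
fit   <- lm(ct ~ log10(copies))
slope <- unname(coef(fit)[2])

# 1.0 means perfect doubling each cycle (100% efficiency)
efficiency <- 10^(-1 / slope) - 1

# Interpolate the copy number of an unknown from its Ct
copies_from_ct <- function(ct_obs, fit) {
  b <- coef(fit)
  unname(10^((ct_obs - b[1]) / b[2]))
}

print(slope)
print(efficiency)
print(copies_from_ct(21.40, fit))  # about 1e5 copies
```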
qRTPCR for the second step of the two-step reaction was performed as described above, with the cycling parameters: 1 cycle at 95 °C for 1 minute; 40 cycles of 95 °C for 15 seconds and 58 °C for 30 seconds. qRTPCR for the one-step reaction was performed exactly as described in the preceding paragraph. Both the one-step and two-step reactions used 100 ng of template (RNA/cDNA). After the PCR was complete, the baseline and threshold values were set in the ABI 7900HT Prism software and the calculated Ct values were exported to Microsoft Excel. Algorithm development. Linear discriminants were constructed using the 'lda' function of the MASS library (Venables and Ripley) in R (version 2.1.1). The model used depends on the tissue from which the metastasis was taken, as well as on the patient's sex. When the metastasis was found in lung, colon or ovary, the prior probability of the class corresponding to the metastatic site was set to zero. In addition, the priors for the breast and ovary classes were set to zero for male patients, whereas for female patients the prior for the prostate class was set to zero. All other priors used in the models were equal. Each sample was assigned to the class with the highest posterior probability determined by the model. To estimate model performance, leave-one-out cross-validation was performed. In addition, the data sets were randomly divided into halves, serving as training and test sets, while preserving the proportion of each class. This random division was repeated three times. Results. The goal of this study was to develop a qRTPCR test to predict the tissue of origin of metastatic carcinoma. The experimental work consisted of two main parts.
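The prior rules and the stratified halving described above can be sketched in R. The class and site label strings are assumptions, and this sketch follows the equal-prior rule stated here (the Example 3 code instead supplies explicit prior vectors). For the leave-one-out step, `MASS::lda(..., CV = TRUE)` returns leave-one-out classes and posteriors and is one way to run it.

```r
# Hypothetical class labels for the seven tissue-of-origin classes
classes <- c("breast", "colorectal", "lung", "other", "ovary", "pancreas", "prostate")

# Equal priors, with zeros set by patient sex and by metastatic site
make_prior <- function(sex, met_site) {
  prior <- setNames(rep(1, length(classes)), classes)
  if (sex == "male")   prior[c("breast", "ovary")] <- 0
  if (sex == "female") prior["prostate"] <- 0
  # A metastasis found in lung, colon or ovary is not assigned to that organ
  site_to_class <- c(lung = "lung", colon = "colorectal", ovary = "ovary")
  if (met_site %in% names(site_to_class)) prior[site_to_class[[met_site]]] <- 0
  prior / sum(prior)
}

# Random split into halves that preserves class proportions.
# Note: set.seed() changes the global random number generator state.
stratified_halves <- function(labels, seed = 1) {
  set.seed(seed)
  idx <- split(seq_along(labels), labels)
  train <- unlist(lapply(idx, function(i) i[sample.int(length(i), floor(length(i) / 2))]),
                  use.names = FALSE)
  list(train = sort(train), test = setdiff(seq_along(labels), train))
}

p_male_lung  <- make_prior("male", "lung")
p_fem_colon  <- make_prior("female", "colon")
labels <- rep(c("a", "b"), times = c(10, 6))
halves <- stratified_halves(labels)
```

Repeating the split with three different seeds would mirror the three random divisions described in the text.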
The first part included the nomination of tissue-specific marker candidates, their validation on FFPE metastatic carcinoma tissues and the selection of ten markers for the test (Figure 9A). The second part included qRTPCR test optimization, followed by testing on another set of FFPE metastatic carcinomas, construction of a prediction algorithm, its cross-validation and its validation on an independent sample set (Figure 9B). Sample characteristics. RNA from a total of 700 frozen primary tissue samples was used for gene expression profiling and identification of tissue-specific genes. The samples included 545 primary carcinomas (29 lung, 13 pancreas, 315 breast, 128 colorectal, 38 prostate, 22 ovary), 37 benign lesions (1 lung, 4 colorectal, 6 breast, 26 prostate) and 118 normal tissues (36 lung, 5 pancreas, 36 colorectal, 14 breast, 3 prostate, 24 ovary). A total of 375 metastatic carcinomas of known origin (stage III-IV) and 26 primary prostate adenocarcinomas were used in the study. The metastatic carcinomas originated from lung, pancreas, colorectal, breast, ovarian and prostate cancers, as well as other cancers. The "other" sample category consisted of metastases derived from tissues other than lung, pancreas, colon, breast, ovary and prostate. Patient characteristics are summarized in Table 18.

TABLE 18 Characteristics of the metastatic and CUP sample sets
Characteristic | Metastatic | CUP
Total number | 401 | 48
Average age | 57.8 ± 11* | 62.13 ± 11.7
Gender: female | 241 | 20
Gender: male | 160 | 28
Tissue of origin: lung | 65 | 9
Tissue of origin: pancreas | 63 | 2
Tissue of origin: colorectal | 61 | 4
Tissue of origin: breast | 63 | 5
Tissue of origin: ovary | 82 | 2
Tissue of origin: prostate | 27 | 2
Tissue of origin: kidney | 8 | 8
Tissue of origin: stomach | 7 | 0
Tissue of origin: other** | 25 | 5
Carcinoma of unknown primary origin | | 11
Histopathology: adenocarcinoma, moderately/well differentiated | 306 | 27
Histopathology: adenocarcinoma, poorly differentiated | 49 | 4
Histopathology: squamous cell carcinoma | 16 | 5
Histopathology: poorly differentiated carcinoma | 16 | 10
Histopathology: small cell carcinoma | 3 |
Histopathology: melanoma | 5 |
Histopathology: lymphoma | 3 |
Histopathology: hepatocellular carcinoma | 2 |
Histopathology: mesothelioma | 1 |
Histopathology: other*** | 14 | 2
Metastatic site: lymph nodes | 73 | 1
Metastatic site: brain | 17 | 14
Metastatic site: lung | 20 | 7
Metastatic site: liver | 75 | 11
Metastatic site: pelvic region (ovary, bladder, fallopian tubes) | 53 | 2
Metastatic site: abdomen (omentum, mesentery, colon, peritoneum) | 91 | 5
Metastatic site: other (skin, thyroid, chest wall, navel) | 44 | 8
Metastatic site: unknown | 2 |
Primary (prostate) | 26 |
* Age is unknown for 26 patients.
** Esophagus, bladder, pleura, liver, gallbladder, bile ducts, larynx, pharynx, non-Hodgkin lymphoma.
*** Small cell, mesothelioma, hepatocellular, melanoma, lymphoma.

The samples were separated into two sets: the validation set (205 specimens), used to validate tissue-specific differential expression of the candidates, and the training set (260 specimens), used for testing the optimized one-step qRTPCR procedure and for training a prediction algorithm. The first set of 205 samples included 25 lung, 41 pancreas, 31 colorectal, 33 breast, 33 ovary, 1 prostate and 23 other cancer metastases, and 18 primary prostate cancers. The second set of 260 samples included 56 lung, 43 pancreas, 30 colorectal, 30 breast, 49 ovary and 32 other cancer metastases, and 20 primary prostate cancers. Sixty-four specimens, including 16 lung, 21 pancreas and 15 other metastases and 12 primary prostate carcinomas, were included in both sets.
The independent sample set obtained from Albany Medical College comprised 33 CUP specimens, 22 of which had a suggested primary origin, and 15 metastatic carcinomas of known origin. For CUPs with a suggested primary origin, the diagnosis was based on morphological characteristics and/or test results with a panel of IHC markers. The demographic, clinical and pathological characteristics of the patients are presented in Table 18. Marker candidate selection. Analysis of the gene expression profiles of five primary tissue types (lung, colon, breast, ovary, prostate) resulted in the nomination of 13 tissue-specific marker candidates for qRT-PCR testing. Many of these candidates had been identified in previous studies of cancer. Argani et al. (2001); Backus et al. (2005); Cunha et al. (2005); Borgono et al. (2004); McCarthy et al. (2003); Hwang et al. (2004); Fleming et al. (2000); Nakamura et al. (2002); and Khoor et al. (1997). In addition to the microarray analysis, two markers were selected from the literature: DSG3, a complementary marker for squamous cell carcinoma of the lung, and the breast marker PDEF. Backus et al. (2005). The microarray data confirmed the high sensitivity and specificity of these markers. A dedicated approach was used to identify pancreas-specific markers. First, five candidate pancreatic markers were analyzed: prostate stem cell antigen (PSCA), serine protease inhibitor, clade A, member 1 (SERPINA1), cytokeratin 7 (KRT7), matrix metalloproteinase 11 (MMP11), and mucin 4 (MUC4) (Varadhachary et al. (2004); Argani et al. (2001); Jones et al. (2004); Prasad et al. (2005); and Moniaux et al. (2004)). These candidates were evaluated with DNA microarrays on a panel of 13 pancreatic duct adenocarcinomas, five normal pancreas tissues, and 98 breast, colorectal, lung and ovarian tumors.
Only PSCA showed moderate sensitivity (six of thirteen, or 46%, of pancreatic tumors detected) at high specificity (91 of 98, or 93%, correctly identified as not of pancreatic origin). By contrast, KRT7, SERPINA1, MMP11 and MUC4 showed sensitivities of 38%, 31%, 85% and 31%, respectively, at specificities of 66%, 91%, 82% and 81%, respectively. These data agreed well with qRT-PCR performed on 27 metastases of pancreatic origin and 39 metastases of non-pancreatic origin for all markers except MMP11, which showed lower sensitivity and specificity by qRT-PCR on metastases. In conclusion, microarray data from frozen primary tissue are a good indicator of a marker's ability to identify an FFPE metastasis as being of pancreatic origin by qRT-PCR, but additional markers may be useful for optimal performance. Pancreatic duct adenocarcinoma develops from duct epithelial cells, which make up only a small percentage of all cells in the normal pancreas (acinar and islet cells make up the majority). In addition, pancreatic adenocarcinoma tissue contains a significant amount of adjacent normal tissue. Prasad et al. (2005); and Ishikawa et al. (2005). For these reasons, the search for candidate pancreas markers was enriched for genes elevated in pancreatic adenocarcinoma relative to normal pancreatic cells. The first query method returned six probe sets: coagulation factor V (F5), hypothetical protein FLJ22041 similar to FK506 binding proteins (FKBP10), integrin beta 6 (ITGB6), transglutaminase 2 (TGM2), heterogeneous nuclear ribonucleoprotein A0 (HNRPA0), and BAX delta (BAX). The second query method (see the materials and methods section for details) returned eight probe sets: F5, TGM2, paired-like homeodomain transcription factor 1 (PITX1), trio isoform mRNA (TRIO), mRNA for p73H (p73), hypothetical protein MGC:10264 (SCD), and two probe sets for claudin 18.
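The sensitivity and specificity percentages quoted for the pancreatic candidates are simple ratios of counts. As a minimal sketch (the helper names are illustrative and not part of the described method), the PSCA figures can be reproduced from the counts given in the text:

```python
def sensitivity(detected: int, total_target: int) -> float:
    """Fraction of tumors of the target origin that the marker detects."""
    return detected / total_target


def specificity(correct_negatives: int, total_non_target: int) -> float:
    """Fraction of tumors of other origins that are correctly called negative."""
    return correct_negatives / total_non_target


# PSCA on the microarray panel: 6 of 13 pancreatic duct adenocarcinomas detected,
# and 91 of 98 non-pancreatic tumors correctly called negative.
psca_sensitivity = sensitivity(6, 13)
psca_specificity = specificity(91, 98)
print(f"PSCA: sensitivity {psca_sensitivity:.0%}, specificity {psca_specificity:.0%}")
```

The same two functions apply to the other candidates once their detection and rejection counts are known.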
A total of 23 candidate tissue-specific markers were selected for further validation by qRT-PCR on FFPE metastatic carcinoma tissue. The marker candidates were tested on 205 FFPE carcinomas: metastases originating from lung, pancreas, colon, breast, ovary and prostate, together with primary prostate carcinomas. Table 19 lists the gene symbols of the tissue-specific markers selected for RT-PCR validation and summarizes the test results obtained with these markers.

TABLE 19

Of the 23 markers tested, thirteen were rejected because of cross-reactivity, low expression in the corresponding metastatic tissues, or redundancy. Ten markers were selected for the final version of the test. The lung markers were human pulmonary surfactant-associated protein B (HUMPSPB), thyroid transcription factor 1 (TTF1) and desmoglein 3 (DSG3). The pancreas markers were prostate stem cell antigen (PSCA) and coagulation factor V (F5), and the prostate marker was kallikrein 3 (KLK3). The colorectal marker was cadherin 17 (CDH17). The breast markers were mammaglobin (MG) and prostate-derived Ets transcription factor (PDEF). The ovarian marker was Wilms tumor 1 (WT1). The means of the normalized relative expression values of the selected markers in the different metastatic tissues are presented in Figures 10A-10J. Optimization of sample preparation and qRT-PCR using FFPE tissues. Next, the RNA isolation and qRT-PCR methods were optimized on fixed tissues before the performance of the marker panel was examined. First, the effect of reducing the proteinase K incubation time from sixteen hours to three hours was analyzed. There was no effect on RNA yield. However, some samples yielded longer RNA fragments when the shorter proteinase K step was used (Figure 11A, B). For example, when RNA was isolated from a one-year-old block (C22), no difference was observed in the electropherograms. However, when RNA was isolated from a five-year-old block (C23), a larger fraction of higher molecular weight RNA, seen as a hump on the shoulder of the electropherogram, was observed when the shorter proteinase K digestion was used. This trend generally held when other samples were processed, regardless of the organ of origin of the FFPE metastasis. In conclusion, shortening the proteinase K digestion time does not sacrifice RNA yield and may help isolate longer, less degraded RNA.
We next compared three reverse transcription methods: reverse transcription with random hexamers followed by qPCR (two steps), reverse transcription with a gene-specific primer followed by qPCR (two steps), and one-step qRT-PCR with gene-specific primers. RNA was isolated from eleven metastases, and Ct values were compared across the three methods for β-actin, HUMSPB (Figure 11C, D) and TTF1. The results showed statistically significant differences (p < 0.001) for all comparisons. For both genes, reverse transcription with random hexamers followed by qPCR (two-step reaction) gave the highest Ct values, while reverse transcription with a gene-specific primer followed by qPCR (two-step reaction) gave Ct values slightly (but statistically significantly) lower than the corresponding one-step reaction. However, the two-step RT-PCR with gene-specific primers required a longer reverse transcription step. When the HUMSPB Ct values were normalized to the corresponding β-actin value for each sample, there were no differences in the normalized Ct values across the three methods. In conclusion, optimizing the RT-PCR reaction conditions can produce lower Ct values, which helps in analyzing older paraffin blocks (Cronin et al. (2004)), and a one-step reaction with gene-specific primers can produce Ct values comparable to those of the corresponding two-step reaction. Diagnostic performance of the optimized qRT-PCR test. Twelve qRT-PCR reactions (10 markers and 2 housekeeping genes) were performed on a new set of 260 FFPE metastases. Twenty-one samples gave high Ct values for the housekeeping genes, so only 239 were used in the heat map analysis. The heat map of normalized Ct values revealed high specificity of the breast and prostate markers, moderate specificity of the colon, lung and ovary markers, and somewhat lower specificity of the pancreas markers (Figure 12).
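Why normalizing HUMSPB to β-actin removed the differences between the three reverse transcription methods follows from how a normalized Ct is formed: if a method shifts the target and reference Ct values of a sample by the same number of cycles, their difference is unchanged. The sketch below assumes exactly that equal shift; the Ct values are hypothetical, chosen only to mimic the ordering reported above (random hexamers highest, gene-specific two-step slightly below one-step).

```python
# Hypothetical Ct values for one FFPE sample under the three RT methods.
# Each method is assumed to shift the target (HUMSPB) and the reference
# (ACTB, beta-actin) by the same number of cycles.
raw_ct = {
    "random_hexamer_two_step": {"HUMSPB": 30.5, "ACTB": 26.0},
    "gene_specific_two_step": {"HUMSPB": 28.2, "ACTB": 23.7},
    "gene_specific_one_step": {"HUMSPB": 28.9, "ACTB": 24.4},
}


def normalized_ct(target_ct: float, reference_ct: float) -> float:
    """Normalized Ct: target Ct minus reference (housekeeping) Ct."""
    return target_ct - reference_ct


normalized = {
    method: round(normalized_ct(cts["HUMSPB"], cts["ACTB"]), 2)
    for method, cts in raw_ct.items()
}
print(normalized)  # raw Ct values differ, normalized values agree
```

In real data the shifts are only approximately equal, which is why the text reports the absence of differences as an empirical finding rather than a guarantee.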
Combining the normalized qRT-PCR data with computational refinement improves the performance of the marker panel. Using expression values normalized to the average expression of the two housekeeping genes, an algorithm to predict the tissue of origin of a metastasis was developed, and the accuracy of the qRT-PCR test was determined by leave-one-out cross-validation (LOOCV). For each of the six tissue types included in the test, two kinds of error were estimated separately: the number of false-positive calls, in which a sample was erroneously predicted as another tumor type included in the test (for example, pancreas as colon), and the number of times a sample was not predicted as any of the tissue types included in the test (other). The results of the LOOCV are presented in Table 20.
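The leave-one-out procedure can be sketched as below. The text does not specify the classifier used inside the loop, so this sketch swaps in a simple nearest-centroid rule on normalized marker values, with a hypothetical distance cutoff above which a sample is called "other". It illustrates the cross-validation loop only, not the prediction algorithm itself.

```python
import math
from collections import defaultdict

# Each sample: (normalized marker values, tissue-of-origin label).
Sample = tuple[list[float], str]


def fit_centroids(samples: list[Sample]) -> dict[str, list[float]]:
    """Average the marker vectors of each tissue type (nearest-centroid model)."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = defaultdict(int)
    for features, label in samples:
        running = sums.setdefault(label, [0.0] * len(features))
        sums[label] = [s + f for s, f in zip(running, features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids: dict[str, list[float]], features: list[float],
            max_distance: float = math.inf) -> str:
    """Return the closest tissue type, or 'other' if no centroid is close enough."""
    best = min(centroids, key=lambda label: math.dist(centroids[label], features))
    return best if math.dist(centroids[best], features) <= max_distance else "other"


def loocv_accuracy(samples: list[Sample], max_distance: float = math.inf) -> float:
    """Hold out each sample in turn, fit on the rest, and score the held-out call."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        model = fit_centroids(samples[:i] + samples[i + 1:])
        if predict(model, features, max_distance) == label:
            correct += 1
    return correct / len(samples)


# Toy data: hypothetical normalized values for two markers.
toy = [([1.0, 8.0], "lung"), ([1.5, 7.5], "lung"),
       ([8.0, 1.0], "colon"), ([7.5, 1.5], "colon")]
print(loocv_accuracy(toy))
```

For scale, the 204 correct calls out of 260 reported for the real test correspond to 204 / 260, or about 0.78.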

TABLE 20

The tissue of origin was correctly predicted for 204 of the 260 samples tested, an overall accuracy of 78%. A significant proportion of the false-positive calls was due to cross-reactivity of the markers in histologically similar tissues. For example, three metastatic squamous cell carcinomas originating from the pharynx, larynx and esophagus were erroneously predicted as lung because of DSG3 expression in these tissues. Expression of CDH17 in gastrointestinal (GI) carcinomas other than colon, including stomach and pancreas, caused 4 of the 6 stomach cancer metastases and 3 of the 43 pancreatic cancer metastases tested to be misclassified as colon. In addition to LOOCV, the data were randomly divided into three separate pairs of training and test sets, each division containing approximately 50% of the samples of each class. In these three 50/50 divisions, the overall test classification accuracies were 77%, 71% and 75%, confirming the stability of test performance. Finally, another independent set of 48 FFPE metastatic carcinomas was tested, which included metastatic carcinomas of known origin, CUP specimens with a tissue-of-origin diagnosis made by pathological evaluation including IHC, and CUP specimens that remained CUP after IHC testing. Prediction accuracy for tissue of origin was estimated separately for each sample category. Table 21 summarizes the results.
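The three random 50/50 divisions can be sketched as follows, assuming that "approximately 50% of the samples of each class" means shuffling each class separately and sending half of it (rounded down) to training. The seeds are arbitrary, the class sizes are those of the 260-sample training set, and the model-fitting step is omitted.

```python
import random
from collections import defaultdict


def stratified_half_split(labels: list[str], seed: int) -> tuple[list[int], list[int]]:
    """Randomly assign about half of the samples of each class to training."""
    rng = random.Random(seed)
    by_class: dict[str, list[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train: list[int] = []
    test: list[int] = []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        train.extend(shuffled[:half])
        test.extend(shuffled[half:])
    return sorted(train), sorted(test)


# Class sizes of the 260-sample training set described in this example.
class_sizes = {"lung": 56, "pancreas": 43, "colorectal": 30, "breast": 30,
               "ovary": 49, "other": 32, "prostate (primary)": 20}
labels = [label for label, n in class_sizes.items() for _ in range(n)]

splits = [stratified_half_split(labels, seed) for seed in (1, 2, 3)]
for train, test in splits:
    print(len(train), len(test))
```

Stratifying by class keeps small classes such as the primary prostate samples represented in both halves of every split.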

TABLE 21

With only a few exceptions, the predicted tissue of origin was consistent with the suggested primary origin or the known tissue of origin established by clinical/pathological evaluation including IHC. As in the training set, the test could not differentiate squamous cell carcinomas originating from different sources and falsely predicted them as lung. The test also made a putative tissue diagnosis for eight of the eleven samples that remained CUP after standard diagnostic tests. One of the CUP cases was especially interesting. A male patient with a history of prostate cancer was diagnosed with metastatic carcinoma of the lung and pleura. Serum PSA and IHC with PSA antibodies on the metastatic tissue were negative, so the pathologist's diagnosis was CUP with an inclination toward gastrointestinal tumors. The test strongly predicted colon as the tissue of origin (posterior probability of 0.99). Discussion. In this study, microarray-based expression profiling of primary tumors was used to identify candidate markers for use with metastases. That primary tumors can be used to discover tissue-of-origin markers for metastases is consistent with several recent findings. For example, Weigelt et al. have shown that the gene expression profiles of primary breast tumors are maintained in their distant metastases. Weigelt et al. (2003). Backus et al. identified putative markers for detecting breast cancer metastasis using genome-wide gene expression analysis of breast and other tissues, and demonstrated that mammaglobin and CK19 detected clinically actionable metastases in breast sentinel lymph nodes with 90% sensitivity and 94% specificity. Backus et al. (2005). During test development, the selection focused on six cancer types: lung, pancreas and colon, which are among the most prevalent in CUP (Ghosh et al. (2005) and Pavlidis et al.
(2005)), and breast, ovary and prostate, for which treatment could potentially be more beneficial to patients. Ghosh et al. (2005). However, additional tissue types and markers may be added to the panel as long as the overall accuracy of the test is not compromised and, where applicable, the logistics of the RT-PCR reactions are not encumbered. The microarray-based studies with primary tissue confirmed the specificity and sensitivity of known markers. As a result, most of the tissue-specific markers have high specificity for the tissues studied here. A recent IHC study found that PSCA is overexpressed in metastases from prostate cancer. Lam et al. (2005). Dennis et al. (2002) also showed that PSCA could be used as a tissue-of-origin marker for pancreas and prostate. Strong PSCA expression was present at the RNA level in some prostate tissues but, due to the inclusion of PSA in the test, prostate and pancreatic cancers could not be segregated at this time. A new finding from this study was the use of F5 as a complementary marker (to PSCA) for tissue of pancreatic origin. In both the microarray data set from primary tissue and the qRT-PCR data set from FFPE metastases, F5 was found to complement PSCA. Previous researchers have developed CUP tests using IHC (Brown et al. (1997); DeYoung et al. (2000); and Dennis et al. (2005a)) or microarrays (Su et al. (2001); Ramaswamy et al. (2001); and Bloom et al. (2004)). More recently, SAGE has been coupled to a small qRT-PCR marker panel (Dennis et al. (2002); and Buckhaults et al. (2003)). This study is the first to combine microarray-based expression profiling with a small panel of qRT-PCR tests.
Microarray studies with primary tissue identified some, but not all, of the tissue-of-origin markers previously identified by SAGE studies. This finding is not surprising given studies showing only moderate agreement between SAGE-based and DNA microarray-based profiling data, with the correlation improving for genes with higher expression levels. Ruissen et al. (2005); and Kim et al. (2003). For example, Dennis et al. identified PSA, MG, PSCA and HUMSPB, while Buckhaults et al. (Buckhaults et al. (2003)) identified PDEF. The CUP test is preferably performed by qRT-PCR because it is a robust technology and may have performance advantages over IHC. Al-Mulla et al. (2005); and Haas et al. (2005). Further, as shown here, the qRT-PCR protocol has been improved by the use of gene-specific primers in a one-step reaction. This is the first demonstration of the use of gene-specific primers in a one-step qRT-PCR reaction with FFPE tissue. Other investigators have used two-step qRT-PCR (cDNA synthesis in one reaction followed by qPCR) or have used random hexamers or truncated gene-specific primers. Abrahamsen et al. (2003); Specht et al. (2001); Godfrey et al. (2000); Cronin et al. (2004); and Mikhitarian et al. (2004). In summary, the 78% overall accuracy of the test for six tissue types compares favorably with other studies. Brown et al. (1997); DeYoung et al. (2000); Dennis et al. (2005a); Su et al. (2001); Ramaswamy et al. (2001); and Bloom et al. (2004).

EXAMPLE 7

In this study, a classifier was constructed from gene marker portfolios selected by MVO, and this classifier was used to predict tissue of origin and cancer status for five major cancer types: breast, colon, lung, ovarian and prostate. Three hundred seventy-eight primary cancers, 23 proliferative epithelial lesions and 103 normal frozen human tissue specimens were analyzed using the Affymetrix U133A Human GeneChip. Leukocyte samples were also analyzed in order to subtract the expression of genes potentially confounded by co-expression in background leukocytes. A novel MVO-based bioinformatics method was developed to select portfolios of gene markers for tissue of origin and cancer status. The data demonstrated that a panel of 26 genes could be used as a classifier to accurately predict the tissue of origin and cancer status among the five cancer types. Therefore, multiple cancers can be classified by determining the expression profile of a relatively small number of gene markers. Table 22 shows the markers identified for the indicated tissues of origin. For gene descriptions, see Table 31.
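The MVO procedure itself is not spelled out in this example. Assuming MVO refers to Markowitz-style mean-variance optimization, one textbook formulation treats each candidate gene's mean discriminating signal as its "return" and the covariance of that signal across samples as its "risk", and weights the genes in proportion to the inverse covariance matrix times the mean-signal vector (the unconstrained tangency portfolio, rescaled to sum to one). The sketch below implements only that formulation in pure Python; the gene names, input numbers and the weight cutoff used for selection are hypothetical.

```python
def solve_linear(matrix: list[list[float]], rhs: list[float]) -> list[float]:
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - tail) / a[r][r]
    return x


def mean_variance_weights(mean_signal: list[float],
                          covariance: list[list[float]]) -> list[float]:
    """Weights proportional to inverse(covariance) @ mean_signal, summing to 1."""
    raw = solve_linear(covariance, mean_signal)
    total = sum(raw)
    return [w / total for w in raw]


def select_portfolio(genes: list[str], weights: list[float],
                     min_weight: float) -> list[str]:
    """Keep genes whose portfolio weight reaches a (hypothetical) cutoff."""
    return [g for g, w in zip(genes, weights) if w >= min_weight]


genes = ["GENE_A", "GENE_B", "GENE_C"]  # hypothetical gene names
mean_signal = [1.0, 1.0, 0.2]           # hypothetical mean tumor-vs-other signal
covariance = [[1.0, 0.0, 0.0],          # hypothetical signal covariance
              [0.0, 4.0, 0.0],
              [0.0, 0.0, 1.0]]
weights = mean_variance_weights(mean_signal, covariance)
print([round(w, 3) for w in weights])
print(select_portfolio(genes, weights, min_weight=0.15))
```

GENE_A and GENE_B have the same mean signal, but GENE_B's higher variance earns it a smaller weight; that risk penalty is the point of a mean-variance formulation.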

TABLE 22

The sample set included a total of 299 carcinomas: metastatic carcinomas of colon, breast, pancreas, ovary, prostate, lung and other origin, and samples of primary prostate cancer. Quality control (QC) based on histological evaluation, RNA yield and expression of the beta-actin control gene was implemented. The "other" category of samples included metastases originating from stomach (5), kidney (6), cholangiocarcinoma/gallbladder (4), liver (2), head and neck (4) and ileum (1) carcinomas, and mesothelioma. Table 23 summarizes the results.

TABLE 23

Testing of these samples narrowed the marker set to those in Table 24, with the results shown in Table 25.

TABLE 24

TABLE 25

Of 205 paraffin-embedded metastatic tumors, 166 samples (81%) gave conclusive test results (Table 26).

TABLE 26

Many of the false-positive results involved histologically and embryologically similar tissues (Table 27).

TABLE 27

The following parameters were considered for model development. Markers were separated into male and female sets, and CUP probabilities were calculated separately for male and female patients. The male set included SP_B, TTF1, DSG3, PSCA, F5, PSA and HPT1; the female set included SP_B, TTF1, DSG3, PSCA, F5, HPT1, MGB, PDEF and WT1. Background expression was excluded from the test results as follows: lung, SP_B, TTF1 and DSG3; ovary, WT1; and colon, HPT1. The CUP model was adjusted to the frequency of each site of origin among CUP cases (%): lung 23, pancreas 16, colorectal 9, breast 3, ovary 4, prostate 2, other 43. The frequency for breast and ovary was set to 0% for male patients, and that for prostate was set to 0% for female patients. The following steps were taken. First, the markers were placed on a similar scale, and the number of variables was reduced from 12 to 8 by selecting the minimum value within each tissue-specific marker set. For leave-one-out validation, one sample was left out, a model was constructed from the remaining samples, and the left-out sample was tested; this was repeated until 100% of the samples had been tested. For split-sample validation, approximately 50% of the samples (approximately 50% per tissue) were left out at random, a model was constructed from the remaining samples, and the left-out 50% were tested; this was repeated for 3 different random divisions. The classification accuracy was adjusted to the frequency of the cancer types to produce the results summarized in Table 28, with the empirical data shown in Table 29.
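The sex-specific marker sets, the minimum-per-tissue reduction and the frequency weighting can be sketched as below. Several points are assumptions, flagged again in the code comments: the frequencies are renormalized to sum to 1 after the excluded sites are set to 0%, "other" is kept as its own category, and the per-site likelihoods come from a classifier that the text does not specify. The posterior is then the standard Bayes combination of likelihood and prior. The sketch illustrates taking the minimum within each tissue-specific marker set; it does not reproduce the exact 12-to-8 variable count, which depends on details not given here.

```python
# CUP frequencies (%) given in the text.
CUP_FREQUENCY = {"lung": 23, "pancreas": 16, "colorectal": 9, "breast": 3,
                 "ovary": 4, "prostate": 2, "other": 43}

# Sites whose frequency is set to 0% for each sex, per the text.
EXCLUDED_SITES = {"male": {"breast", "ovary"}, "female": {"prostate"}}

# Tissue-specific marker sets, grouped from the male and female lists in the text.
TISSUE_MARKERS = {"lung": ["SP_B", "TTF1", "DSG3"], "pancreas": ["PSCA", "F5"],
                  "colorectal": ["HPT1"], "breast": ["MGB", "PDEF"],
                  "ovary": ["WT1"], "prostate": ["PSA"]}


def sex_adjusted_priors(sex: str) -> dict[str, float]:
    """Drop sex-excluded sites, then renormalize (renormalizing is an assumption)."""
    kept = {site: f for site, f in CUP_FREQUENCY.items()
            if site not in EXCLUDED_SITES[sex]}
    total = sum(kept.values())
    return {site: f / total for site, f in kept.items()}


def tissue_scores(marker_values: dict[str, float], sex: str) -> dict[str, float]:
    """Collapse each tissue's marker set to its minimum value."""
    return {tissue: min(marker_values[m] for m in markers)
            for tissue, markers in TISSUE_MARKERS.items()
            if tissue not in EXCLUDED_SITES[sex]}


def posterior(likelihoods: dict[str, float],
              priors: dict[str, float]) -> dict[str, float]:
    """Bayes' rule: posterior proportional to likelihood x prior.
    The likelihoods are assumed to come from an unspecified classifier."""
    joint = {site: likelihoods[site] * p for site, p in priors.items()}
    total = sum(joint.values())
    return {site: v / total for site, v in joint.items()}


male_priors = sex_adjusted_priors("male")
print(round(male_priors["lung"], 3))  # 23 / 93

# Hypothetical scaled marker values for one male patient.
example_values = {"SP_B": 1.2, "TTF1": 0.4, "DSG3": 3.0, "PSCA": 2.5,
                  "F5": 1.9, "HPT1": 4.0, "PSA": 5.0}
print(tissue_scores(example_values, "male"))
```

With uniform likelihoods the posterior simply returns the sex-adjusted priors, which makes the effect of the frequency weighting easy to inspect.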

TABLE 28

TABLE 29
Sample ID | Gender | Origin | BK | Prediction | BACTIN | PBGD | Prom | CDH17 | DSG3 | F5 | HUMP | KLK3 | MG | PDEF | PSCA | TTF1 | WT1
128 | | breast | lung | other | 23.37 | 30.04 | 26.71 | 40.00 | 37.78 | 35.74 | 22.19 | 40.00 | 40.00 | 30.36 | 29.96 | 29.39 | 34.85
134 | | breast | uk | breast | 19.60 | 27.00 | 23.30 | 40.00 | 31.27 | 30.83 | 40.00 | 40.00 | 29.51 | 25.07 | 24.67 | 40.00 | 34.13
166 | | breast | uk | breast | 23.47 | 27.95 | 25.71 | 40.00 | 40.00 | 26.66 | 40.00 | 28.20 | 24.78 | 25.19 | 30.69 | 40.00 | 35.32
331 | | breast | ovary | breast | 25.12 | 31.40 | 28.26 | 40.00 | 40.00 | 40.00 | 40.00 | 40.00 | 22.26 | 26.01 | 40.00 | 40.00 | 40.00
356 | | breast | uk | breast | 28.59 | 33.89 | 31.24 | 40.00 | 34.01 | 40.00 | 40.00 | 40.00 | 35.73 | 33.19 | 30.72 | 40.00 | 40.00
163 | | colon | uk | colon | 24.69 | 30.34 | 27.52 | 29.39 | 40.00 | 26.52 | 40.00 | 40.00 | 40.00 | 37.72 | 40.00 | 40.00 | 36.17
184 | m | colon | uk | colon | 22.47 | 28.63 | 25.55 | 26.22 | 33.26 | 28.76 | 40.00 | 40.00 | 40.00 | 34.07 | 33.44 | 40.00 | 31.64
339 | f | colon | uk | colon | 28.35 | 34.29 | 31.32 | 33.76 | 40.00 | 40.00 | 40.00 | 40.00 | 40.00 | 35.99 | 40.00 | 40.00 | 40.00
346 | m | colon | lung | colon | 23.15 | 28.77 | 25.96 | 26.36 | 40.00 | 32.64 | 20.89 | 40.00 | 40.00 | 32.47 | 40.00 | 26.75 | 30.58
363 | m | colon | uk | colon | 24.46 | 30.62 | 27.54 | 26.20 | 31.84 | 29.98 | 34.44 | 40.00 | 40.00 | 30.45 | 35.00 | 40.00 | 30.35
101 | m | lung | lung | | 24.68 | 28.79 | 26.74 | 40.00 | 40.00 | 39.34 | 21.57 | 40.00 | 40.00 | 28.21 | 27.47 | 40.00 | 35.76
106 | m | lung | uk | lung | 22.05 | 27.50 | 24.78 | 40.00 | 40.00 | 32.24 | 23.68 | 40.00 | 40.00 | 25.79 | 25.02 | 26.42 | 37.27
110 | m | lung | uk | lung | 29.19 | 32.32 | 30.76 | 40.00 | 40.00 | 40.00 | 21.21 | 40.00 | 40.00 | 32.77 | 32.43 | 30.70 | 36.13
112 | m | lung | uk | other | 22.48 | 27.79 | 25.14 | 40.00 | 37.05 | 37.38 | 36.08 | 40.00 | 40.00 | 37.12 | 36.04 | 40.00 | 37.45
199 | f | lung | uk | lung | 21.21 | 27.07 | 24.14 | 35.65 | 25.56 | 31.23 | 40.00 | 40.00 | 28.94 | 32.19 | 27.95 | 32.14 | 31.60
200 | m | lung | lung | | 22.16 | 26.94 | 24.55 | 40.00 | 24.53 | 33.69 | 40.00 | 40.00 | 40.00 | 36.67 | 38.34 | 38.61 | 33.55
313 | m | lung | uk | other | 24.76 | 30.05 | 27.41 | 38.40 | 40.00 | 40.00 | 40.00 | 40.00 | 40.00 | 40.00 | 40.00 | 40.00 | 35.11
323 | m | lung | uk | pancreas | 23.82 | 30.24 | 27.03 | 32.43 | 31.82 | 33.81 | 40.00 | 40.00 | 40.00 | 33.60 | 28.12 | 40.00 | 31.87
325 | m | lung | lung | lung | 22.09 | 27.97 | 25.03 | 40.00 | 26.84 | 34.88 | 38.61 | 40.00 | 38.04 | 34.29 | 27.31 | 39.21 | 31.23
335 | m | lung | uk | other | 24.89 | 29.73 | 27.31 | 40.00 | 29.62 | 38.00 | 40.00 | 40.00 | 40.00 | 39.23 | 40.00 | 31.12 | 32.12
347 | m | lung | uk | lung | 23.40 | 29.08 | 26.24 | 40.00 | 26.72 | 37.21 | 40.00 | 40.00 | 40.00 | 36.10 | 30.76 | 40.00 | 39.44
374 | m | lung | lung | uk | 22.50 | 28.23 | 25.37 | 40.00 | 40.00 | 38.76 | 21.38 | 40.00 | 37.26 | 26.56 | 38.26 | 24.86 | 36.60
385 | f | lung | uk | lung | 21.65 | 26.44 | 24.05 | 37.05 | 40.00 | 34.51 | 19.89 | 40.00 | 40.00 | 27.36 | 40.00 | 23.72 | 37.09
114 | f | other | lung | other | 24.80 | 30.56 | 27.68 | 40.00 | 40.00 | 28.16 | 21.51 | 40.00 | 40.00 | 35.76 | 37.85 | 28.19 | 37.21
129 | m | other | lung | other | 21.49 | 28.25 | 24.87 | 39.47 | 40.00 | 28.86 | 20.65 | | | | | |
179 | f | other | uk | other | 23.97 | 30.45 | 27.21 | 40.00 | 40.00 | 29.79 | 40.00 | 40.00 | 40.00 | 40.00 | 40.00 | 40.00 | 32.64
194 | m | other | uk | other | 25.28 | 32.47 | 28.88 | 40.00 | 40.00 | 28.90 | 40.00 | 40.00 | 40.00 | 40.00 | 40.00 | 34.75 | 35.41
302 | f | other | colon | breast | 25.67 | 31.47 | 28.57 | 34.17 | 40.00 | 40.00 | 40.00 | 40.00 | 40.00 | 30.55 | 32.47 | 40.00 | 38.20
305 | m | other | uk | other | 23.80 | 29.74 | 26.77 | 29.64 | 40.00 | 34.06 | 40.00 | 40.00 | 40.00 | 31.82 | 40.00 | 40.00 | 40.00
317 | m | other | uk | pancreas | 25.90 | 30.62 | 28.26 | 40.00 | 40.00 | 27.75 | 40.00 | 40.00 | 40.00 | 31.89 | 33.06 | 40.00 | 35.12
333 | f | other | uk | other | 22.45 | 28.82 | 25.64 | 30.54 | 40.00 | 37.01 | 40.00 | 40.00 | 40.00 | 37.85 | 40.00 | 40.00 | 40.00
334 | m | other | uk | other | 22.14 | 29.20 | 25.67 | 31.79 | 40.00 | 36.27 | 40.00 | 40.00 | 40.00 | 34.69 | 40.00 | 40.00 | 40.00
342 | f | other | uk | pancreas | 27.32 | 31.37 | 29.35 | 32.36 | 40.00 | 29.24 | 40.00 | 40.00 | 40.00 | 32.89 | 40.00 | 40.00 | 38.18
382 | m | other | uk | other | 25.04 | 30.22 | 27.63 | 40.00 | 40.00 | 36.13 | 40.00 | 40.00 | 40.00 | 38.30 | 40.00 | 40.00 | 34.91
404 | m | other | uk | other | 23.27 | 30.16 | 26.72 | 40.00 | 39.36 | 34.75 | 40.00 | 40.00 | 40.00 | 39.02 | 40.00 | 40.00 | 34.24
354 | f | ovary | uk | ovary | 24.62 | 31.54 | 28.08 | 40.00 | 40.00 | 34.90 | 40.00 | 40.00 | 40.00 | 36.62 | 40.00 | 40.00 | 29.71
148 | f | ovary | uk | pancreas | 23.55 | 29.88 | 26.72 | 40.00 | 40.00 | 30.60 | 38.84 | 40.00 | 40.00 | 32.12 | 31.76 | 40.00 | 38.59
417 | f | pancreas | uk | pancreas | 23.42 | 29.46 | 26.44 | 28.28 | 38.96 | 29.05 | 37.01 | 40.00 | 40.00 | 30.15 | 30.23 | 40.00 | 30.69
136 | m | prostate | lung | prostate | 22.37 | 26.95 | 24.66 | 40.00 | 40.00 | 29.47 | 23.69 | 21.38 | 40.00 | 24.70 | 24.28 | 30.89 | 31.16
407 | m | prostate | lung | prostate | 28.20 | 31.87 | 30.04 | 40.00 | 40.00 | 40.00 | 27.70 | 25.98 | 40.00 | 27.65 | 40.00 | 39.13 | 38.76
116 | f | CUP | uk | lung SCC | 21.66 | 27.31 | 24.49 | 28.95 | 27.86 | 31.06 | 40.00 | 40.00 | 30.28 | 33.49 | 29.31 | 40.00 | 38.11
123 | m | CUP | lung | colon | 27.09 | 30.59 | 28.84 | 27.92 | 36.01 | 40.00 | 40.00 | 40.00 | 40.00 | 40.00 | 40.00 | 40.00 | 36.65
157 | m | CUP | uk | pancreas | 26.81 | 31.94 | 29.38 | 40.00 | 40.00 | 26.82 | 40.00 | 40.00 | 40.00 | 36.68 | 40.00 | 40.00 | 40.00
177 | m | CUP | uk | pancreas | 25.44 | 31.52 | 28.48 | 40.00 | 40.00 | 27.15 | 40.00 | 40.00 | 40.00 | 39.67 | 40.00 | 40.00 | 34.71
306 | m | CUP | uk | lung | 23.15 | 28.38 | 25.77 | 37.30 | 40.00 | 34.94 | 19.71 | 40.00 | 40.00 | 30.81 | 40.00 | 25.45 | 39.28
360 | m | CUP | uk | other | 21.14 | 27.43 | 24.29 | 33.97 | 36.98 | 32.72 | 40.00 | 40.00 | 40.00 | 27.75 | 40.00 | 40.00 | 40.00
372 | f | CUP | uk | ovary | 23.16 | 29.12 | 26.14 | 40.00 | 40.00 | 34.07 | 40.00 | 40.00 | 40.00 | 32.93 | 40.00 | 40.00 | 25.28
187 | f | CUP | uk | colon | 24.44 | 29.80 | 27.12 | 26.83 | 35.91 | 26.32 | 30.55 | 40.00 | 40.00 | 40.00 | 40.00 | 29.75 | 40.00
Prom is the mean of the BACTIN and PBGD values for each sample.

EXAMPLE 8

Prospective gene signature study of metastatic cancer of unknown primary site (CUP) to predict tissue of origin. The specific purpose of this study was to determine the ability of the 10-gene signature to predict the tissue of origin of metastatic carcinoma in patients with carcinoma of unknown primary origin (CUP). Primary objective: to confirm the feasibility of conducting gene analysis on core biopsy samples from consecutive patients with CUP. Secondary objective: to correlate the results of the RT-PCR test of the 10-gene signature with the diagnostic work-up performed at the M.D. Anderson Cancer Center (MDACC). Tertiary objective: to correlate the frequency of the 6 cancer types predicted by the test with the frequency derived from the literature and from MDACC experience. The method described here was used to perform a microarray gene expression analysis of 700 frozen primary carcinoma, benign and normal specimens, which identified gene marker candidates specific for carcinomas of the lung, pancreas, colon, breast, prostate and ovary. The gene marker candidates were tested by RT-PCR on 205 formalin-fixed, paraffin-embedded (FFPE) specimens of metastatic carcinoma (stage III-IV) originating from lung, pancreas, colon, breast, ovary and prostate, as well as on metastases originating from other cancer types as specificity controls.
Other types of metastatic cancer included gastric, renal cell, hepatocellular, cholangiocarcinoma/gallbladder and head and neck carcinomas. The results allowed the selection of a 10-gene signature that predicted the tissue of origin of metastatic carcinoma with an overall accuracy of 76%. The average CV for repeated measurements in RT-PCR experiments is 1.5%, calculated from 4 replicate data points. Beta-actin (ACTB) was used as the housekeeping gene, and its median expression was similar in metastatic samples of different origin (CV = 5.6%). The specific purpose of this study was to validate the ability of the 10-gene signature to predict the tissue of origin of metastatic carcinoma in patients with CUP, compared with the full diagnostic work-up.
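As a minimal sketch of the coefficient of variation (CV) quoted above, the code assumes the sample standard deviation expressed as a percentage of the mean. The replicate Ct values are hypothetical, and averaging per-gene CVs is only one plausible reading of "average CV"; the text does not state how the averaging was done.

```python
import statistics


def coefficient_of_variation(values: list[float]) -> float:
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Hypothetical Ct values from 4 replicate measurements of one gene in one sample.
replicates = [25.0, 25.4, 24.6, 25.0]
print(f"CV = {coefficient_of_variation(replicates):.2f}%")

# Averaging CVs over several hypothetical replicate sets.
replicate_sets = [replicates, [30.1, 30.5, 29.9, 30.3]]
average_cv = statistics.mean(coefficient_of_variation(r) for r in replicate_sets)
print(f"average CV = {average_cv:.2f}%")
```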

Patient selection. Patients had to be at least 18 years of age with an ECOG performance status of 0-2. Patients with a diagnosis of adenocarcinoma or of poorly differentiated carcinoma were accepted; the adenocarcinoma group included well, moderately and poorly differentiated tumors. Patients had to meet the criteria for CUP: no primary tumor detected after a full evaluation, defined as a complete history and physical examination, detailed laboratory examination, imaging studies, and invasive studies directed by symptoms and signs. Only untreated patients were allowed in the study. A patient previously treated with chemotherapy or radiation could participate if pre-treatment tissue was available as archived blocks collected within the previous 10 years. Patients provided written consent/authorization to participate in this study.

Study design. Patients with a CUP diagnosis who underwent core needle biopsy or excision of the most accessible metastatic lesion were allowed in the study. Patients who underwent fine-needle aspiration (FNA) biopsy were not eligible. The first 60 consecutive patients presenting who met the inclusion criteria and consented to the study were enrolled. If a repeat biopsy was required at MDACC for diagnostic or treatment purposes, additional tissue was obtained for the study if the patient had given consent. All participants were registered in the institutional protocol data management system (PDMS). A complete diagnostic work-up, including clinical and pathological evaluation, was performed on all enrolled patients in accordance with MDACC standards. The pathology portion of the work-up may have included immunohistochemistry (IHC) with markers such as CK-7, CK-20, TTF-1 and others, as considered appropriate by the pathologist. This is part of the routine work-up of all patients with CUP.

Collection of tissue samples. The study included formalin-fixed, paraffin-embedded specimens of metastatic carcinoma collected from patients with CUP. Six 10 μm sections were used for RNA isolation; smaller tissue specimens required nine 10 μm sections. The histopathological diagnosis and tumor content were confirmed for each sample used for RNA isolation on an additional section stained with hematoxylin and eosin (HE). The tumor sample had to have more than 30% tumor content in the HE section. The clinical data were provided anonymously to Veridex and included patient age, gender, tumor histology by light microscopy, tumor grade (differentiation), site of metastasis, date of specimen collection, and a description of the diagnostic work-up performed for the individual patient.

Tissue processing and RT-PCR experiments. Total RNA was extracted from each tissue sample using the protocol described above. Only samples that yielded more than 1 μg of total RNA from the standard amount of tissue were used for subsequent RT-PCR testing; samples with less RNA were considered degraded and excluded from subsequent experiments. An RNA integrity control based on housekeeping gene expression was implemented to exclude samples with degraded RNA, in accordance with the standard Veridex procedure. The RT-PCR test, comprising a panel of 10 genes and 1-2 control genes, was used to analyze the RNA samples. Reverse transcription and PCR were performed using the protocols described above. The relative expression value for each tested gene, expressed as ΔCt (the Ct of the target gene minus the Ct of the control genes), was calculated and used to predict the tissue of origin.
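A minimal sketch of the QC gate and the ΔCt calculation follows. It assumes the yield threshold is 1 μg and that, when two control genes are run, their Ct values are averaged (as the Prom column of Table 29 does for BACTIN and PBGD). The maximum control-gene Ct used for the integrity check is a hypothetical placeholder, since the standard Veridex procedure is not given in the text. The Ct values are taken from sample 106 in Table 29; the RNA yields are hypothetical.

```python
from statistics import mean

MIN_RNA_YIELD_UG = 1.0   # yield threshold from the text (assumed to be micrograms)
MAX_CONTROL_CT = 35.0    # hypothetical integrity cutoff, not from the text


def passes_qc(rna_yield_ug: float, control_cts: list[float]) -> bool:
    """Require enough RNA and a control-gene Ct low enough to suggest intact RNA."""
    return rna_yield_ug > MIN_RNA_YIELD_UG and mean(control_cts) <= MAX_CONTROL_CT


def delta_ct(target_ct: float, control_cts: list[float]) -> float:
    """Delta Ct: target-gene Ct minus the (mean) Ct of the 1-2 control genes."""
    return target_ct - mean(control_cts)


# Sample 106 in Table 29: BACTIN 22.05, PBGD 27.50, HUMP 23.68.
controls = [22.05, 27.50]
print(passes_qc(rna_yield_ug=2.0, control_cts=controls))  # yield is hypothetical
print(round(delta_ct(23.68, controls), 3))
```

A negative ΔCt means the target amplifies earlier than the averaged controls, that is, it is relatively highly expressed.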

Sample size and interpretation of data. A limited sample size of 60 patients was studied because of the exploratory nature of this pilot study. To date, 22 patients have been tested. Samples from one patient did not yield enough RNA for the RT-PCR test, and samples from 3 patients did not pass the QC evaluated by RT-PCR with the control genes. A total of 18 patients were used to determine the probable tissue of origin of each patient's metastatic lesion. The statistical model was used to determine the probability that the metastatic carcinoma originated from each of the following seven categories: lung, pancreas, colon, breast, prostate, ovary, and not in the test (other). For each sample, the probability for each category is calculated from a linear classification model. The test results are summarized in Table 30. The probability that a patient's metastatic lesion (with a known primary) originates from one of these 6 sites (colon, pancreas, lung, prostate, ovary, breast) is approximately 76%. This figure is derived from the literature, given the incidence of the various cancers and their potential to disseminate, and from unpublished data from the M.D. Anderson tumor registry. For the samples tested, the frequency of these 6 sites was 67% (12 of 18 samples tested), which is consistent with the previous observations.
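The text states that the per-category probabilities come from a linear classification model but does not give its form. A common way to turn linear scores into probabilities is the softmax function; the sketch below assumes that choice, and its weights, biases and marker values are hypothetical.

```python
import math

CATEGORIES = ["lung", "pancreas", "colon", "breast", "prostate", "ovary", "other"]


def linear_scores(features: list[float], weights: dict[str, list[float]],
                  biases: dict[str, float]) -> dict[str, float]:
    """One linear score per category: bias + weights . features."""
    return {c: biases[c] + sum(w * x for w, x in zip(weights[c], features))
            for c in CATEGORIES}


def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Convert scores to probabilities summing to 1 (max-subtracted for stability)."""
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


# Hypothetical two-feature model in which only the lung score depends on the input.
weights = {c: [0.0, 0.0] for c in CATEGORIES}
weights["lung"] = [2.0, 0.0]
biases = {c: 0.0 for c in CATEGORIES}

probabilities = softmax(linear_scores([1.0, 0.5], weights, biases))
best = max(probabilities, key=probabilities.get)
print(best, round(probabilities[best], 3))
```

Subtracting the maximum score before exponentiating leaves the probabilities unchanged but avoids overflow for large scores.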

TABLE 30

Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, the descriptions and examples should not be construed as limiting the scope of the invention.

TABLE 31

REFERENCES

U.S. patents and patent application publications
5242974, 5350840, 5384261, 5405783, 5412087, 5424186, 5429807, 5436327, 5445934, 5472672, 5527681, 5529756, 5532128, 5545531, 5554501, 5556752, 5561071, 5571639, 5593839, 5599695, 5624711, 5658734, 5700637, 5786148, 6004755, 6136182, 6218114, 6218122, 6225051, 6232073, 6261766, 6271002, 6339148
20010029020, 20020055627, 20020068288, 20020168647, 20030044859, 20030087818, 20030104448, 20030124128, 20030124579, 20030138793, 20030190656, 20030194733, 20030198970, 20030215803, 20030215835, 20030219760, 20030219767, 20030232350, 20030235820, 20040005563, 20040009154, 20040009489, 20040018969, 20040029114, 20040076955, 20040126808, 20040146862, 20040219572, 20040219575, 20050037010, 20050059008, 20060094035

Foreign patents and patent publications
WO1998040403, WO1998056953, WO2000006589, WO2000055320, WO2001031342, WO2001073032, WO2002046467, WO2002073204, WO2002101357, WO2004018999, WO2004030615, WO2004031412, WO2004063355, WO2004077060, WO2005005601

Journal articles
Abrahamsen et al. (2003) Towards quantitative mRNA analysis in paraffin-embedded tissues using real-time reverse transcriptase-polymerase chain reaction. J Mol Diagn 5: 34-41
Al-Mulla et al. (2005) BRCA1 gene expression in breast cancer: a correlative study between real-time RT-PCR and immunohistochemistry. J Histochem Cytochem 53: 621-629
Argani et al. (2001) Discovery of new markers of cancer through serial analysis of gene expression: prostate stem cell antigen is overexpressed in pancreatic adenocarcinoma. Cancer Res 61: 4320-4324
Autiero et al. (2002) Intragenic amplification and formation of extrachromosomal small circular DNA molecules from the PIP gene on chromosome 7 in primary breast carcinomas. Int J Cancer 99: 370-377
Backus et al. (2005) Identification and characterization of optimal gene expression markers for detection of breast cancer metastasis. J Mol Diagn 7: 327-336
Bentov et al. (2003) The WT1 Wilms' tumor suppressor gene: a novel target for insulin-like growth factor-I action. Endocrinol 144: 4276-4279
Bera et al. (2004) NGEP, a gene encoding a membrane protein detected only in prostate cancer and normal prostate. Proc Natl Acad Sci USA 101: 3059-3064
Bibikova et al. (2004) Quantitative gene expression profiling in formalin-fixed, paraffin-embedded tissues using universal bead arrays. Am J Pathol 165: 1799-1807
Bloom et al. (2004) Multi-platform, multi-site, microarray-based human tumor classification. Am J Pathol 164: 9-16
Borchers et al. (1997) Heart-type fatty acid binding protein: involvement in growth inhibition and differentiation. Prostaglandins Leukot Essent Fatty Acids 57: 77-84
Borgono et al. (2004) Human tissue kallikreins: physiologic roles and applications in cancer. Mol Cancer Res 2: 257-280
Brookes (1999) The essence of SNPs. Gene 23: 177-186
Brown et al. (1997) Immunohistochemical identification of tumor markers in metastatic adenocarcinoma. A diagnostic adjunct in the determination of primary site. Am J Clin Pathol 107: 12-19
Buckhaults et al. (2003) Identifying tumor origin using a gene expression-based classification map. Cancer Res 63: 4144-4149
Chan et al. (1985) Human liver fatty acid binding protein cDNA and amino acid sequence. Functional and evolutionary implications. J Biol Chem 260: 2629-2632
Chen et al. (1986) Human liver fatty acid binding protein gene is located on chromosome 2. Somat Cell Mol Genet 12: 303-306
Cheung et al. (2003) Detection of the PAX8-PPAR gamma fusion oncogene in both follicular thyroid carcinomas and adenomas. J Clin Endocrinol Metab 88: 354-357
Clark et al. (1999) The potential role for prolactin-inducible protein (PIP) as a marker of human breast cancer micrometastasis. Br J Cancer 81: 1002-1008
Cronin et al. (2004) Measurement of gene expression in archival paraffin-embedded tissue. Am J Pathol 164: 35-42
Cunha et al. (2006) Tissue-specificity of prostate specific antigens: comparative analysis of transcript levels in prostate and non-prostatic tissues. Cancer Lett 236: 229-238
Dennis et al. (2002) Identification from public data of molecular markers of adenocarcinoma characteristic of the site of origin. Cancer Res 62: 5999-6005
Dennis et al. (2005a) Hunting the primary: novel strategies for defining the origin of tumors. J Pathol 205: 236-247
Dennis et al. (2005b) Markers of adenocarcinoma characteristic of the site of origin: development of a diagnostic algorithm. Clin Cancer Res 11: 3766-3772
DeYoung et al. (2000) Immunohistologic evaluation of metastatic carcinomas of unknown origin: an algorithmic approach. Semin Diagn Pathol 17: 184-193
Di Palma et al. (2003) The paired domain-containing factor Pax8 and the homeodomain-containing factor TTF-1 directly interact and synergistically activate transcription. J Biol Chem 278: 3395-3402
Dwight et al. (2003) Involvement of the PAX8/peroxisome proliferator-activated receptor gamma rearrangement in follicular thyroid tumors. J Clin Endocrinol Metab 88: 4440-4445
Feldman et al. (2003) PDEF expression in human breast cancer is correlated with invasive potential and altered gene expression. Cancer Res 63: 4626-4631
Fleming et al. (2000) Mammaglobin, a breast-specific gene, and its utility as a marker for breast cancer. Ann N Y Acad Sci 923: 78-89
Fukushima et al. (2004) Characterization of gene expression in mucinous cystic neoplasms of the pancreas using oligonucleotide microarrays. Oncogene 23: 9042-9051
Ghosh et al. (2005) Management of patients with metastatic cancer of unknown primary. Curr Probl Surg 42: 12-66
Giordano et al. (2001) Organ-specific molecular classification of primary lung, colon, and ovarian adenocarcinomas using gene expression profiles. Am J Pathol 159: 1231-1238
Glasser et al. (1988) cDNA, deduced polypeptide structure and chromosomal assignment of human pulmonary surfactant proteolipid, SPL(pVal). J Biol Chem 263: 9-12
Godfrey et al. (2000) Quantitative mRNA expression analysis from formalin-fixed, paraffin-embedded tissues using 5' nuclease quantitative reverse transcription-polymerase chain reaction. J Mol Diagn 2: 84-91
Goldstein et al. (2002) WT1 immunoreactivity in uterine papillary serous carcinomas is different from ovarian serous carcinomas. Am J Clin Pathol 117: 541-545
Gradi et al. (1995) The human steroidogenic acute regulatory (StAR) gene is expressed in the urogenital system and encodes a mitochondrial polypeptide. Biochim Biophys Acta 1258: 228-233
Greco et al. (2004) Carcinoma of unknown primary site: sequential treatment with paclitaxel/carboplatin/etoposide and gemcitabine/irinotecan: a Minnie Pearl Cancer Research Network phase II trial. The Oncologist 9: 644-652
Haas et al. (2005) Combined application of RT-PCR and immunohistochemistry on paraffin embedded sentinel lymph nodes of prostate cancer patients. Pathol Res Pract 200: 763-770
Hwang et al. (2004) Wilms tumor gene product: sensitive and contextually specific marker of serous carcinomas of ovarian surface epithelial origin. Appl Immunohistochem Mol Morphol 12: 122-126
Ishikawa et al. (2005) Experimental trial for diagnosis of pancreatic ductal carcinoma based on gene expression profiles of pancreatic ductal cells. Cancer Sci 96: 387-393
Italiano et al. (2005) Epidermal growth factor receptor (EGFR) status in primary colorectal tumors correlates with EGFR expression in related metastatic sites: biological and clinical implications. Ann Oncol 16: 1503-1507
Jones et al. (2004) Comprehensive analysis of matrix metalloproteinase and tissue inhibitor expression in pancreatic cancer: increased expression of matrix metalloproteinase-7 predicts poor survival. Clin Cancer Res 10: 2832-2845
Jones et al. (2005) Thyroid transcription factor 1 expression in small cell carcinoma of the urinary bladder: an immunohistochemical profile of 44 cases. Hum Pathol 36: 718-723
Khoor et al. (1997) Expression of surfactant protein B precursor and surfactant protein B mRNA in adenocarcinoma of the lung. Mod Pathol 10: 62-67
Kim (2003) Comparison of oligonucleotide-microarray and serial analysis of gene expression (SAGE) in transcript profiling analysis of megakaryocytes derived from CD34+ cells. Exp Mol Med 35: 460-466
Kim et al. (2003) Steroidogenic acute regulatory protein expression in the normal human brain and intracranial tumors. Brain Res 978: 245-249
Lam et al. (2005) Prostate stem cell antigen is overexpressed in prostate cancer metastases. Clin Cancer Res 11: 2591-2596
Lembersky et al. (1996) Metastases of unknown primary site. Med Clin North Am 80: 153-171
Lewis et al. (2001) Unlocking the archive: gene expression in paraffin-embedded tissue. J Pathol 195: 66-71
Lipshutz et al. (1999) High density synthetic oligonucleotide arrays. Nature Genetics 21: S20-24
Lowe et al. (1985) Human liver fatty acid binding protein. Isolation of a full length cDNA and comparative sequence analyses of orthologous and paralogous proteins. J Biol Chem 260: 3413-3417
Ma et al. (2006) Molecular classification of human cancers using a 92-gene real-time quantitative polymerase chain reaction assay. Arch Pathol Lab Med 130: 465-473
Magklara et al. (2002) Characterization of androgen receptor and nuclear receptor co-regulator expression in human breast cancer cell lines exhibiting differential regulation of kallikreins 2 and 3. Int J Cancer 100: 507-514
Markowitz (1952) Portfolio selection. J Finance 7: 77-91
Marques et al. (2002) Expression of PAX8-PPAR gamma 1 rearrangements in both follicular thyroid carcinomas and adenomas. J Clin Endocrinol Metab 87: 3947-3952
Masuda et al. (1999) Analysis of chemical modification of RNA from formalin-fixed samples and optimization of molecular biology applications for such samples. Nucleic Acids Res 27: 4436-4443
McCarthy et al. (2003) Novel markers of pancreatic adenocarcinoma in fine-needle aspiration: mesothelin and prostate stem cell antigen labeling increases accuracy in cytologically borderline cases. Appl Immunohistochem Mol Morphol 11: 238-243
Mikhitarian et al. (2004) Enhanced detection of RNA from paraffin-embedded tissue using a panel of truncated gene-specific primers for reverse transcription. BioTechniques 36: 1-4
Mintzer et al. (2004) Cancer of unknown primary: changing approaches. A multidisciplinary case presentation from the Joan Karnell Cancer Center of Pennsylvania Hospital. The Oncologist 9: 330-338
Moniaux et al. (2004) Multiple roles of mucins in pancreatic cancer, a lethal and challenging malignancy. Br J Cancer 91: 1633-1638
Murphy et al. (1987) Isolation and sequencing of a cDNA clone for a prolactin-inducible protein (PIP). Regulation of PIP gene expression in the human breast cancer cell line, T-47D. J Biol Chem 262: 15236-15241
Myal et al. (1991) The prolactin-inducible protein (PIP/GCDFP-15) gene: cloning, structure and regulation. J Mol Cell Endocrinol 80: 165-175
Nakamura et al. (2002) Expression of thyroid transcription factor-1 in normal and neoplastic lung tissues. Mod Pathol 15: 1058-1067
Noonan et al. (2001) Characterization of the homeodomain gene EMX2: sequence conservation, expression analysis, and a search for mutations in endometrial cancers. Genomics 76: 37-44
Oettgen et al. (2000) PDEF, a novel prostate epithelium-specific Ets transcription factor, interacts with the androgen receptor and activates prostate-specific antigen gene expression. J Biol Chem 275: 1216-1225
Oji et al. (2003) Overexpression of the Wilms' tumor gene WT1 in head and neck squamous cell carcinoma. Cancer Sci 94: 523-529
Pavlidis et al. (2003) Diagnostic and therapeutic management of cancer of an unknown primary. Eur J Cancer 39: 1990-2005
Pilot-Mathias et al. (1989) Structure and organization of the gene encoding human pulmonary surfactant proteolipid SP-B. DNA 8: 75-86
Pilozzi et al. (2004) CDX1 expression is reduced in colorectal carcinoma and is associated with promoter hypermethylation. J Pathol 204: 289-295
Poleev et al. (1992) PAX8, a human paired box gene: isolation and expression in developing thyroid, kidney and Wilms' tumors. Development 116: 611-623
Prasad et al. (2005) Gene expression profiles in pancreatic intraepithelial neoplasia reflecting the effects of Hedgehog signaling on pancreatic ductal epithelial cells. Cancer Res 65: 1619-1626
Ramaswamy (2004) Translating cancer genomics into clinical oncology. N Engl J Med 350: 1814-1816
Ramaswamy et al. (2001) Multiclass cancer diagnosis using tumor gene expression signatures. Proc Natl Acad Sci USA 98: 15149-15154
Rauscher (1993) The WT1 Wilms tumor gene product: a developmentally regulated transcription factor in the kidney that functions as a tumor suppressor. FASEB J 7: 896-903
Reinholz et al. (2005) Evaluation of a panel of tumor markers for molecular detection of circulating cancer cells in women with suspected breast cancer. Clin Cancer Res 11: 3722
Schlag et al. (1994) Cancer of unknown primary site. Ann Chir Gynaecol 83: 8-12
Senoo et al. (1998) A second p53-related protein, p73L, with high homology to p73. Biochem Biophys Res Commun 248: 603-607
Specht et al. (2001) Quantitative gene expression analysis in microdissected archival formalin-fixed and paraffin-embedded tumor tissue. Am J Pathol 158: 419-429
Su et al. (2001) Molecular classification of human carcinomas by use of gene expression signatures. Cancer Res 61: 7388-7393
Takahashi et al. (1995) Cloning and characterization of multiple human genes and cDNAs encoding highly related type II keratin 6 isoforms. J Biol Chem 270: 18581-18592
Takamura et al. (2004) Reduced expression of liver-intestine cadherin is associated with progression and lymph node metastasis of human colorectal carcinoma. Cancer Lett 212: 253-259
Tothill et al. (2005) An expression-based site of origin diagnostic method designed for clinical application to cancer of unknown origin. Cancer Res 65: 4031-4040
van Ruissen et al. (2005) Evaluation of the similarity of gene expression data estimated with SAGE and Affymetrix GeneChips. BMC Genomics 6: 91
Varadhachary et al. (2004) Diagnostic strategies for unknown primary cancer. Cancer 100: 1776-1785
Venables et al. (2002) Modern Applied Statistics with S. Fourth edition. Springer
Wallace et al. (2005) Accurate molecular detection of non-small cell lung cancer metastases in mediastinal lymph nodes sampled by endoscopic ultrasound-guided needle aspiration. Chest 127: 430-437
Wan et al. (2003) Desmosomal proteins, including desmoglein 3, serve as novel negative markers for epidermal stem cell-containing population of keratinocytes. J Cell Sci 116: 4239-4248
Watson et al. (1996) Mammaglobin, a mammary-specific member of the uteroglobin gene family, is overexpressed in human breast cancer. Cancer Res 56: 860-865
Watson et al. (1998) Structure and transcriptional regulation of the human mammaglobin gene, a breast cancer associated member of the uteroglobin gene family localized to chromosome 11q13. Oncogene 16: 817-824
Weigelt et al. (2003) Gene expression profiles of primary breast tumors maintained in distant metastases. Proc Natl Acad Sci USA 100: 15901-15905
Zapata-Benavides et al. (2002) Downregulation of Wilms' tumor 1 protein inhibits breast cancer proliferation. Biochem Biophys Res Commun 295: 784-790

Claims

NOVELTY OF THE INVENTION CLAIMS

1. An in vitro method of identifying the origin of a metastasis of unknown origin, comprising the steps of: a. measuring biomarkers associated with at least two different carcinomas in a sample containing metastatic cells; b. combining the data from the biomarkers in an algorithm, wherein the algorithm: i. normalizes the biomarkers against a reference; and ii. imposes a cut-off that optimizes the sensitivity and specificity of each biomarker, weights the prevalence of the carcinomas, and selects a tissue of origin; c. determining the origin based on the highest probability determined by the algorithm, or determining that the carcinoma is not derived from a particular set of carcinomas; and d. optionally measuring biomarkers specific for one or more additional different carcinomas, and repeating steps b) and c) for the additional biomarkers.

2. The method according to claim 1, further characterized in that the marker genes are selected from at least one of a group corresponding to: i. SP-B, TTF, DSG3, KRT6F, p73H or SFTPC; ii. F5, PSCA, ITGB6, KLK10, CLDN18, TR10 or FKBP10; or iii. CDH17, CDX1 or FABP1.

3. The method according to claim 2, further characterized in that the marker genes are SP-B, TTF, DSG3, KRT6F, p73H, or SFTPC.

4. The method according to claim 3, further characterized in that the marker genes are SP-B, TTF and DSG3.

5. The method according to claim 4, further characterized in that the marker genes also comprise or are replaced by KRT6F, p73H and/or SFTPC.

6. The method according to claim 2, further characterized in that the marker genes are F5, PSCA, ITGB6, KLK10, CLDN18, TR10 or FKBP10.

7. The method according to claim 6, further characterized in that the marker genes are F5 and PSCA.

8. The method according to claim 7, further characterized in that the marker genes further comprise or are replaced by ITGB6, KLK10, CLDN18, TR10 and/or FKBP10.

9. The method according to claim 1, further characterized in that the marker genes are CDH17, CDX1 or FABP1.

10. The method according to claim 9, further characterized in that the marker gene is CDH17.

11. The method according to claim 10, further characterized in that the marker gene also comprises or is replaced by CDX1 and/or FABP1.

12. The method according to any of claims 1-11, further characterized in that gene expression is measured using at least one of SEQ ID NOs: 11-58.

13. The method according to claim 2, further characterized in that the marker genes further include a gender-specific marker selected from at least one of: i. in the case of a male patient, KLK3, KLK2, NGEP or NPY; or ii. in the case of a female patient, PDEF, MGB, PIP, B305D, B726 or GABA-Pi; and/or WT1, PAX8, STAR or EMX2.

14. The method according to claim 13, further characterized in that the marker gene is KLK2.

15. The method according to claim 14, further characterized in that the marker gene is KLK3.

16. The method according to claim 15, further characterized in that the marker gene further comprises or is replaced by NGEP and/or NPY.

17. The method according to claim 13, further characterized in that the marker genes are PDEF, MGB, PIP, B305D, B726 or GABA-Pi.

18. The method according to claim 17, further characterized in that the marker genes are PDEF and MGB.

19. The method according to claim 18, further characterized in that the marker genes also comprise or are replaced by PIP, B305D, B726 or GABA-Pi.

20. The method according to claim 13, further characterized in that the marker genes are WT1, PAX8, STAR or EMX2.

21. The method according to claim 20, further characterized in that the marker gene is WT1.

22. The method according to claim 21, further characterized in that the marker gene also comprises or is replaced by PAX8, STAR or EMX2.

23. The method according to any of claims 13-22, further characterized in that gene expression is measured using at least one of SEQ ID NOs: 11-58.

24. The method according to claim 1 or 2, further characterized in that it further comprises obtaining additional clinical information, including the site of metastasis, to determine the origin of the carcinoma.

25. A method for obtaining optimal biomarker sets for carcinomas, comprising the steps of using metastases of known origin, determining biomarkers for them, and comparing those biomarkers with biomarkers of metastases of unknown origin.

26. A method for directing therapy, comprising determining the origin of a metastasis of unknown origin according to any of claims 1-3 and identifying the appropriate treatment for it.

27. A method for providing a prognosis, comprising determining the origin of a metastasis of unknown origin according to any of claims 1-3 and identifying the corresponding prognosis for it.

28. A method for finding biomarkers, comprising determining the level of expression of a marker gene in a particular metastasis, measuring a biomarker for the marker gene to determine its expression, analyzing the expression of the marker gene according to claim 1, and determining whether the marker gene is effectively specific for the tumor of origin.

29. A composition comprising at least one isolated sequence selected from SEQ ID NOs: 11-58.

30. A device for conducting a test according to any of claims 1-3, comprising biomarker address reagents.

31. A microarray or gene chip for performing the method of any of claims 1-3.

32. A diagnostic/prognostic portfolio comprising isolated nucleic acid sequences, their complements, or portions thereof, of a combination of genes according to claims 2-11 or 13-22, wherein the combination is sufficient to measure or characterize the expression of genes in a biological sample containing metastatic cells relative to cells of different carcinomas or normal tissue.

33. The method according to any of claims 2-11 or 13-22, further characterized in that it further comprises measuring the expression of at least one gene constitutively expressed in the sample.
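Claim 1 describes the scoring steps only in outline: normalize against a reference, apply a cut-off that optimizes sensitivity and specificity, weight by carcinoma prevalence, and pick the most probable tissue or conclude that the tumor lies outside the tested set. The Python sketch below shows one way those steps could fit together. It is not the patented algorithm. Each of the following is an assumption: delta-Ct normalization against a single reference gene, cut-offs chosen by maximizing sensitivity + specificity (Youden's J), prevalence used as the prior of a naive-Bayes posterior, and an "other" class for tumors outside the panel. All function names, tissue labels and numbers are hypothetical.

```python
# Hypothetical sketch of the claim 1 scoring steps; not the patent's own code.
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    name: str
    tissue: str          # tissue the marker is taken to indicate (illustrative)
    cutoff: float        # threshold on the normalized expression value
    sensitivity: float   # P(call positive | tumor from `tissue`), 0 < x < 1
    specificity: float   # P(call negative | tumor from elsewhere), 0 < x < 1


def normalize(ct_marker: float, ct_reference: float) -> float:
    """Delta-Ct against a reference gene. Larger means more transcript,
    because qPCR Ct values fall as template abundance rises."""
    return ct_reference - ct_marker


def choose_cutoff(pos_values: list[float], neg_values: list[float]) -> tuple[float, float, float]:
    """Threshold maximizing sensitivity + specificity (Youden's J) on training
    data from metastases of known origin. Returns (cutoff, sens, spec)."""
    best = None
    for c in sorted(set(pos_values) | set(neg_values)):
        sens = sum(v >= c for v in pos_values) / len(pos_values)
        spec = sum(v < c for v in neg_values) / len(neg_values)
        if best is None or sens + spec > best[1] + best[2]:
            best = (c, sens, spec)
    return best


def tissue_posteriors(markers, sample_ct, ct_reference, prevalence):
    """Prevalence-weighted naive-Bayes posterior over candidate tissues.

    `prevalence` maps each tissue, plus "other" for tumors outside the panel,
    to a prior. A marker's call is scored with its sensitivity when the
    candidate is the marker's own tissue, otherwise with 1 - specificity.
    """
    scores = {}
    for tissue, prior in prevalence.items():
        likelihood = 1.0
        for m in markers:
            positive = normalize(sample_ct[m.name], ct_reference) >= m.cutoff
            p_pos = m.sensitivity if m.tissue == tissue else 1.0 - m.specificity
            likelihood *= p_pos if positive else 1.0 - p_pos
        scores[tissue] = prior * likelihood
    total = sum(scores.values())
    return {t: s / total for t, s in scores.items()}


def call_origin(posteriors: dict[str, float]) -> str:
    """Most probable tissue. "other" means not derived from the tested set."""
    return max(posteriors, key=posteriors.get)
```

For example, take `markers = [Marker("SP-B", "lung", 0.0, 0.9, 0.95), Marker("CDH17", "colon", 0.0, 0.85, 0.9)]` with priors `{"lung": 0.5, "colon": 0.3, "other": 0.2}`. All of these values are invented for illustration. A sample with Ct values `{"SP-B": 24.0, "CDH17": 32.0}` against a reference Ct of 25.0 is called `"lung"`, with a posterior of about 0.97. Under this sketch, a sample negative for both markers is called `"other"`, which corresponds to the claim's "not derived from a particular set of carcinomas" outcome.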
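Claims 2-11, 13 and 33 together specify which genes make up the assay. Each of the three marker groups has core genes that may "further comprise or be replaced by" additional genes. A sex-specific marker set is added, and a constitutively expressed gene is measured alongside. The Python sketch below is one hypothetical way to assemble such a gene list. The substitution rule (a core gene that fails, for example by not amplifying, is swapped for the next unused alternate) is an assumption. So are the group labels, which reflect that the claims do not name the tissue each group targets, and the `REF_GENE` placeholder for the unnamed constitutive gene. Placing WT1, PAX8, STAR and EMX2 in the female branch is one reading of claim 13's "and/or" clause.

```python
# Hypothetical sketch of assay assembly for claims 2-11, 13 and 33.
from collections.abc import Collection

# Claims 2-11: core genes, then genes that may supplement or replace them.
# Group labels are placeholders; the claims do not name the target tissues.
PANELS = {
    "group_i": {"core": ["SP-B", "TTF", "DSG3"],
                "alternates": ["KRT6F", "p73H", "SFTPC"]},
    "group_ii": {"core": ["F5", "PSCA"],
                 "alternates": ["ITGB6", "KLK10", "CLDN18", "TR10", "FKBP10"]},
    "group_iii": {"core": ["CDH17"], "alternates": ["CDX1", "FABP1"]},
}

# Claim 13: sex-specific markers (female branch includes WT1/PAX8/STAR/EMX2
# under one reading of the claim's "and/or" clause).
SEX_SPECIFIC = {
    "male": ["KLK3", "KLK2", "NGEP", "NPY"],
    "female": ["PDEF", "MGB", "PIP", "B305D", "B726", "GABA-Pi",
               "WT1", "PAX8", "STAR", "EMX2"],
}


def assemble_group(group: str, failed: Collection[str] = (), extend: bool = False) -> list[str]:
    """Core genes of one group; a failed core gene is swapped for the next usable alternate."""
    spare = [g for g in PANELS[group]["alternates"] if g not in failed]
    panel = []
    for gene in PANELS[group]["core"]:
        if gene not in failed:
            panel.append(gene)
        elif spare:
            panel.append(spare.pop(0))  # core gene "is replaced by" an alternate
    if extend:
        panel.extend(spare)  # panel "further comprises" the remaining alternates
    return panel


def build_assay(sex: str, failed: Collection[str] = (), reference_gene: str = "REF_GENE") -> list[str]:
    """All group panels, the sex-specific markers, and one reference gene (claim 33).

    REF_GENE is a hypothetical placeholder; the claims do not name the
    constitutively expressed gene.
    """
    if sex not in SEX_SPECIFIC:
        raise ValueError(f"unsupported value for sex: {sex!r}")
    genes = [g for group in PANELS for g in assemble_group(group, failed)]
    genes += [g for g in SEX_SPECIFIC[sex] if g not in failed]
    genes.append(reference_gene)
    return genes
```

For instance, `assemble_group("group_i", failed={"TTF"})` yields `["SP-B", "KRT6F", "DSG3"]`, and `build_assay("male")` lists the six core genes, the four male-specific markers and the reference placeholder. Claims 14-22 select narrower subsets, so a real assay might measure fewer markers than this sketch includes.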