CN101115848A

CN101115848A - Transcriptome microarray technology and methods of using the same

Info

Publication number: CN101115848A
Application number: CNA2005800457745A
Authority: CN
Inventors: Paul Harkin; Patrick Johnston; Karl Mulligan; Austin Tanney
Original assignee: Almac Diagnostics Ltd
Current assignee: Almac Diagnostics Ltd
Priority date: 2004-11-03
Filing date: 2005-11-03
Publication date: 2008-01-30

Abstract

Arrays containing a transcriptome of a diseased tissue and methods of using the arrays for diagnosis, prognosis, screening, and identification of disease are provided herein. The transcriptome arrays from diseased tissue are useful for diagnosis of a disease by analysis of the genetic profile of a tissue sample specific to a disease state. The genetic profiles are then correlated with data on the effectiveness of specific therapeutic agents. Correlating expression profiles to the effectiveness of therapeutic agents provides a way to screen and select further patients predicted to respond to those therapeutic agents, thereby minimizing needless exposure to ineffective therapy.

Description

Transcriptome microarray technology and methods of using the same

Priority claim and cross-reference to related applications

This application claims priority to European Patent Applications 04105479.2, 04105482.6, 04105483.4, 04105484.2, 04105507.0, and 04105485.9, filed November 3, 2004; U.S. Provisional Patent Application 60/662,276, filed March 14, 2005; and U.S. Provisional Patent Application 60/700,293, filed July 18, 2005.

Technical field

This application relates to the field of gene and RNA expression array technology, and in particular to sequences comprising transcripts expressed in diseased tissue and their use in diagnosis and treatment planning.

Index of files on the concurrently submitted CD-R

Three identical CD-R discs (labeled "Copy 1", "Copy 2" and "Copy 3") are submitted herewith, each containing the following electronic text files. The CD-R discs were created on November 1, 2005, and the size of each file is listed below. All electronic files on the CD-R discs are hereby incorporated by reference in their entirety.

List of genes A.txt	(30.7Mb)	List of genes S.txt	(6.1Mb)
List of genes B.txt	(1.9Mb)	List of genes T.txt	(29.6Mb)
List of genes C.txt	(2Mb)	List of genes U.txt	(1.7Mb)
List of genes D.txt	(1.1Mb)	List of genes V.txt	(13.3Mb)
List of genes E.txt	(58.6Mb)	List of genes W.txt	(18.9Mb)
List of genes F.txt	(3.5Mb)	List of genes X.txt	(10kb)
List of genes G.txt	(30.7Mb)	List of genes Y.txt	(28kb)
List of genes H.txt	(4.1Mb)	List of genes Z.txt	(5.7Mb)
List of genes I.txt	(30Mb)	List of genes AA.txt	(14.6Mb)
List of genes J.txt	(18kb)	List of genes BB.txt	(5.1Mb)
List of genes K.txt	(20kb)	List of genes CC.txt	(34Mb)
List of genes L.txt	(9.7Mb)	List of genes DD.txt	(26.6Mb)
List of genes M.txt	(5.1Mb)	List of genes EE.txt	(4kb)
List of genes N.txt	(238kb)	List of genes FF.txt	(324kb)
List of genes O.txt	(35.8Mb)	List of genes GG.txt	(8.6Mb)
List of genes P.txt	(11.8Mb)	List of genes HH.txt	(18.8Mb)
List of genes Q.txt	(3.9Mb)	List of genes II.txt	(9.6Mb)
List of genes R.txt	(10.1Mb)	List of genes JJ.txt	(46.1Mb)

In addition, three identical CD-R discs (electronic media labeled "Copy 1-Sequence Listing Part", "Copy 2-Sequence Listing Part" and "Copy 3-Sequence Listing Part") are submitted herewith, each containing the sequence listing of all sequences described herein. In accordance with Section 801 of the PCT Administrative Instructions, concerning international applications containing large nucleotide and/or amino acid sequence listings and/or tables relating thereto, the sequence listing is submitted only on the computer-readable medium referred to in Section 802. The computer-readable electronic media on the CD-R discs are incorporated herein by reference in their entirety.

Background

The pharmaceutical industry continually pursues new drug treatment options that are more effective, more specific, or have fewer side effects than the drugs currently in use. Alternatives for drug therapy are constantly being developed because human genetic variation causes substantial differences in the effectiveness of many drugs. Thus, although a wide variety of drug treatment options is currently available, additional therapies are often needed when a patient fails to respond.

Traditionally, the treatment paradigm used by physicians has been to prescribe a first-line drug therapy that yields the highest possible success rate in treating the disease. If the first drug therapy is ineffective, an alternative drug therapy is prescribed. Clearly, this paradigm is not the best treatment approach for certain diseases. In diseases such as human cancer, for example, the first treatment is often the most important and offers the best chance of successful therapy, so there is a heightened need to select the initial drug that will be most effective against the particular patient's disease.

Identifying the best first-line drug has been impossible because no method has been available for predicting which drug therapy will be most effective for a particular cancer physiology. As a result, patients often needlessly undergo ineffective, toxic drug therapy. In colorectal cancer, for example, there is no method for determining which patients will respond to adjuvant chemotherapy after surgery. Of the 40% of patients at risk of relapse after surgery, one third benefit from chemotherapy. This means that administering adjuvant chemotherapy causes many patients to receive unnecessary treatment. Cancer therapy and colorectal cancer clinical trials are still explored on the basis of the availability of new active compounds, rather than on an integrated pharmacogenomic approach that uses the genetic makeup of the tumor and the genotype of the patient.

The advent of microarrays and molecular genomics has the potential to have a significant impact on the diagnostic capability and prognostic classification of disease, which can help predict the response of an individual patient to a defined treatment regimen. Microarrays are used to analyze large amounts of genetic information, thereby providing a genetic fingerprint of an individual. It is widely believed that this technology will ultimately provide the necessary tools for tailoring therapeutic regimens. However, compiling the correct information for fully characterizing and predicting an individual's response to a specific drug therapy remains problematic, and the high expectations for applied pharmacogenomics have been somewhat disappointed (Nebert et al. 2003. Am J Pharmacogenomics; 3(6):361-70).

A major problem with current microarrays is that they are usually based on generic information content derived from partial sequencing projects, in which expressed sequence tag (EST) information is generated across different tissue types. Alternatively, this information can come from genome sequencing projects that use algorithms to predict the existence of genes. A significant problem with this approach is that microarray manufacturers continually update the information content as more sequence information becomes available. The approach has thus produced multiple array versions, each carrying more information content than its predecessor. This has created a significant barrier to the routine application of the technology in case management, because investigators face multiple array platforms with different content, which makes data validation very difficult. Even within a single manufacturer's array platform, it is difficult to cross-validate information between early and late array versions, which makes long-term study design very difficult.

Another problem with currently available microarrays is that different forms of a disease may respond differently to treatment with different therapeutic agents. The usefulness of an array is limited by how well the transcripts of the specific diseased tissue are represented on it. Conventional whole-genome arrays therefore offer no advantage, because extraneous signals from genes unrelated to the disease state produce a high volume of assay noise, which complicates analysis of the disease transcriptome.

The information that traditional generic arrays provide across multiple genotypes is limited, and these arrays do not contain detailed content on the specific transcripts expressed in a given discrete setting. The general approach of the generic microarray industry has been to increase the density and capacity of information as more information becomes available. This has led to confusion in the use of the technology in pharmacogenomics-based research. The main problem lies in the difficulty of comparing different builds of generic arrays: it is difficult to correlate data derived from a 20k-sequence array with data derived from a 40k-sequence array. This confusion arises from comparative annotation problems and other differences between builds.

Summary of the invention

The invention provides arrays comprising biomolecules corresponding to the transcriptome of a diseased tissue, and methods of using these arrays in analysis. Described herein are arrays comprising nucleic acid molecules corresponding to the transcriptome of a diseased tissue and methods of using these arrays in analysis. A diseased-tissue transcriptome is the collection of nucleic acid transcripts, including coding and non-coding nucleic acid sequences, that are expressed in a specific diseased tissue. Also described herein are arrays comprising other biomolecules corresponding to the transcriptome of a diseased tissue. These biomolecules include proteins, polypeptides, and antibodies. The arrays provide a powerful tool for studying the global expression profile of diseased tissue and for identifying new transcripts associated with a disease state.

The microarrays described herein solve the problems encountered with previously used arrays by adopting a unique approach: defining the entire transcriptome information content in a given disease setting and placing that content on the array. The complete information content is derived from multiple diseased tissue samples at different stages of disease progression and encompasses population and disease heterogeneity. This approach ensures that all relevant information in a given disease setting is available for interrogation, which greatly increases the potential for developing robust signatures that are diagnostic, prognostic, or predictive of response to therapy in that disease setting. In addition, the approach yields a complete-information-content array that does not require repeated updating, which lends itself to stable, long-term study design. Moreover, because the approach provides a complete and stable platform, it facilitates cross-validation studies between multiple patient cohorts in a given disease setting.

The disease-specific transcriptome array contains the complete information content in a given disease setting and therefore offers a stable, long-term solution for pharmacogenomics-based study design.

In one aspect of the methods provided herein, the transcriptome array is used to diagnose a disease by identifying the genetic signature of a patient's diseased tissue sample. The genetic signature is identified by reacting transcripts from a diseased tissue sample, or from a tissue sample suspected of being diseased, with the transcriptome array. Hybridization or binding of the transcripts to complementary sequences on the array is then detected. Preferably, the transcriptome array is an array immobilized on a computer chip, and hybridization of the nucleic acid molecules in the sample to the array is detected using computer technology. The genetic signature of the diseased tissue sample is then correlated with data on the effectiveness of, and responsiveness to, specific therapeutic agents for that signature. Correlating the resulting expression profile with the effectiveness of therapeutic agents provides a way to screen and select further patients predicted to respond to a particular therapeutic agent, thereby minimizing needless exposure of patients to unsuccessful therapy.
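As a rough illustration of the correlation step described in this aspect, the following Python sketch assigns a new sample to the class whose average expression profile it correlates with most strongly. The expression values, the two reference groups, and the function names are all hypothetical, and the nearest-centroid rule with Pearson correlation is a stand-in chosen for brevity; the specification does not prescribe a particular statistical method.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def centroid(profiles):
    """Average expression per probe across a group of samples."""
    return [sum(column) / len(column) for column in zip(*profiles)]

def predict_response(sample, responders, non_responders):
    """Label a sample by the reference group it correlates with best."""
    r_resp = pearson(sample, centroid(responders))
    r_non = pearson(sample, centroid(non_responders))
    return "responder" if r_resp > r_non else "non-responder"

# Hypothetical expression values for four probes (arbitrary units).
responders = [[8.1, 2.0, 5.5, 1.2], [7.9, 2.4, 5.1, 1.0]]
non_responders = [[2.2, 7.8, 1.4, 6.0], [2.6, 8.1, 1.1, 5.7]]
new_sample = [7.5, 2.9, 4.8, 1.5]

print(predict_response(new_sample, responders, non_responders))
```

In practice a signature would involve far more probes and samples, and would need validation on independent patient cohorts before any clinical use.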

Another aspect of the methods of the invention involves using the transcriptome analysis described herein (such as array analysis) to detect early-stage diseases and conditions in an organism that are undetectable by other methods. Such organisms include humans, animals, plants, and bacteria.

The arrays described herein and the methods of using them employ the transcriptome to detect, monitor, and identify many diseases and conditions. Diseases can generally be classified as neoplastic, inflammatory, or degenerative. These classes include, without limitation, diseases such as cancer, arthritis, asthma, neurodegenerative disease, cardiovascular disease, hypertension, psychiatric disorders, infectious disease, metabolic disease, and immunological disease.

In one embodiment, the transcriptome array provides what is believed to be the most complete compilation of the colorectal transcriptome identified to date. Approximately 69,000 transcripts derived from colorectal tissue were compiled and used to generate a colorectal, transcriptome-based, high-density oligonucleotide array. Approximately 40,000 of these transcripts are described in U.S. Provisional Patent Application No. 60/662,276. Approximately 23,000 additional transcripts derived from colorectal tissue, and approximately 5,000 antisense transcripts, are described herein to supplement the colorectal transcriptome sequences described in U.S. Provisional Patent Application No. 60/662,276.

The transcriptomes provided herein for use on arrays are believed to be the most complete versions of the lung, breast, colon/rectum, liver, and brain tissue transcriptomes identified to date. The transcripts compiled herein are used to generate transcriptome-based, high-density oligonucleotide arrays for diseased lung, breast, colon/rectum, liver, and brain tissue.

Thus, the arrays described herein provide a wealth of information on the important changes that may underlie disease progression or resistance to treatment.

Pharmacogenomics has the potential to greatly reduce the estimated 100,000 deaths and 2,000,000 hospitalizations caused by adverse drug reactions (ADRs) in the United States (Lazarou et al. JAMA. Apr 15, 1998. 279(15):1200-5). Rather than using the standard trial-and-error method to match patients with drugs, the arrays and analyses described herein enable physicians to analyze the genetic signature of a patient sample and give that patient the most suitable drug therapy from the initial diagnostic stage. The arrays described herein not only provide a way to improve the accuracy of prescribing an effective drug the first time, but also improve safety, because the likelihood of an ADR is reduced.

Accordingly, it is an object of the invention to provide nucleic acid arrays comprising genes, polynucleotides, nucleotides, and fragments from diseased tissue, for screening the expression of disease-related genes in a target sample.

Another object of the invention is to provide methods for identifying new nucleic acid transcripts expressed in diseased tissue.

Another object of the invention is to provide methods for screening tissue for genetic variation indicating the presence of a disease or condition that is undetectable by other methods.

Another object of the invention is to provide methods for diagnosing disease based on analysis of the transcriptome in diseased tissue.

Another object of the invention is to provide methods for the complete analysis of changes in RNA expression affecting all genes or transcripts identified in a particular disease.

Another object of the invention is to provide methods for characterizing the expression profile of specific genes/RNAs of an individual in diseased tissue and correlating RNA expression with appropriate and effective treatment regimens.

Another object of the invention is to provide methods for distinguishing different forms of a disease and correlating expression profiles with successful therapeutic agent treatment regimens.

Another object of the invention is to provide methods for correlating expression profiles with appropriate therapeutic agent treatment regimens.

Another object of the invention is to provide methods for predicting relapse following cancer therapy.

These and other objects, features, and advantages of the invention will become apparent after a review of the following detailed description of the disclosed embodiments and the appended claims.

Brief description of the drawings

Fig. 1 provides a diagram of a transcriptome microarray showing the expression profiles of a therapeutic agent-sensitive tumor and a therapeutic agent-resistant tumor.

Fig. 2 provides a schematic of a BLAST comparison of all publicly available colon, prostate, and breast tissue data.

Detailed description of the invention

Provided herein are transcriptome arrays and methods of using them. Transcriptome arrays are described that comprise nucleic acid molecules derived from the transcripts of a diseased tissue, wherein the nucleic acid molecules are arranged in an array format. The nucleic acid molecules on the array hybridize to complementary nucleic acid transcriptome sequences from a diseased tissue sample. A disease-specific transcriptome is defined herein as the collection of coding and non-coding transcripts transcribed in a specific diseased tissue. Other arrays described herein comprise other biomolecules, such as polypeptides or antibodies representing transcripts from the diseased-tissue transcriptome.

Thus, the arrays provided herein include nucleic acid arrays, polypeptide arrays, and antibody arrays. Herein, unless the context requires otherwise, where a nucleic acid array is referred to in a particular embodiment, it should be understood that the corresponding protein array and antibody array are also contemplated. In such embodiments, the nucleic acid is replaced by the polypeptide encoded by the transcript or by an antibody specific for that polypeptide.

The compositions and methods described herein may be understood more readily by reference to the following detailed description of specific embodiments. Although the compositions and methods are described with reference to specific details of certain embodiments, these details should not be construed as limitations on the scope of the invention.

Those skilled in the art will appreciate that cellular DNA in the form of genes is transcribed into RNA; coding RNA is translated into protein; and RNA may optionally be reverse transcribed into cDNA. Preferably, the transcriptome arrays described herein comprise all or substantially all of the RNA transcripts of a diseased tissue.

A disease-specific transcriptome includes transcripts of known and unknown function, and optionally includes the proteins translated from the coding RNA transcripts as an extension and reflection of gene transcription within the transcriptome. A disease-specific transcriptome may change as the disease progresses or in response to external stimuli or influences such as chemotherapy or radiotherapy.

As used herein, the term "transcript" means an RNA molecule produced by transcription from a DNA or cDNA template. A transcript may also be represented by the protein translated from the RNA transcript or by the cDNA molecule formed by reverse transcription of the RNA transcript.

As used herein, the term "gene product" means an RNA molecule produced by transcription from a DNA or cDNA template, and the polypeptide molecule translated from that RNA molecule.

As used herein, the term "transcriptome" means the collection of coding and non-coding RNA transcripts transcribed in a specific tissue, and preferably comprises all or substantially all of the RNA transcripts produced in the tissue. These transcripts include messenger RNA (mRNA), alternatively spliced mRNA, ribosomal RNA (rRNA), and transfer RNA (tRNA), as well as a large number of other transcripts that are not translated into protein, such as small nuclear RNA (snRNA), antisense molecules such as small interfering RNA (siRNA) and microRNA, and other RNA transcripts of unknown function. The transcriptome also includes the proteins translated from RNA transcripts within the transcriptome, as an extension and reflection of gene transcription within the transcriptome.

As used herein, the term "diseased tissue" means tissue from a particular organ or tissue type that has a particular disease classification associated with that tissue (such as colorectal cancer, breast cancer, or neurodegenerative disease). Diseased tissue also refers to a single cell type from a diseased tissue, such as epithelial cells, stromal cells, or stem cells. For example, diseased colorectal tissue refers to any colorectal tissue diagnosed with a disease or condition such as cancer. Although cancer types are distinguished in certain embodiments, in most embodiments of the transcriptome arrays of the invention no deliberate attempt is made to distinguish between the various cancer types within a tissue.

In addition, it is understood that a diseased tissue sample may contain some normal, non-diseased tissue or cells alongside the diseased tissue.

Nucleic acids

The nucleic acid molecules, nucleic acid elements, or polynucleotides included in the arrays provided herein may be any type of nucleic acid or nucleic acid analog, including without limitation RNA, DNA, peptide nucleic acids, or mixtures and/or fragments thereof. As used herein, the term "fragment" refers to a partial sequence of a sequence such as those provided herein, where the fragment retains enough of the nucleotide sequence to preserve the specificity and selectivity of the whole sequence from which it is derived. A fragment may be complementary to the whole sequence and retains the ability to hybridize selectively with the whole sequence. The nucleic acid molecules may be isolated, cloned, or synthetically prepared. A nucleic acid element may include vector sequence or may be substantially pure. Under conventional hybridization conditions, a nucleic acid element can hybridize to a complementary transcript in a nucleic acid sample derived from a tissue sample. Those of ordinary skill in the art can select hybridization elements that provide optimal hybridization and signal production for a given hybridization, and that provide the required resolution between different genes and genomic locations.
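The notion of complementarity used in this section can be sketched in a few lines of Python. The sequences and function names below are made up for illustration and are not taken from the sequence listing; exact reverse-complement matching is a simplification, since hybridization in practice tolerates some mismatches and depends on the hybridization conditions.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def can_hybridize(probe: str, transcript: str) -> bool:
    """True if the probe is perfectly complementary to a stretch of the transcript."""
    return reverse_complement(probe).upper() in transcript.upper()

# Hypothetical cDNA-style transcript and a 20-base probe designed against it.
transcript = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
probe = reverse_complement(transcript[5:25])

print(can_hybridize(probe, transcript))           # probe designed against the transcript
print(can_hybridize("AAAAAAAAAAAA", transcript))  # unrelated probe
```

A fragment in the sense defined above corresponds to a probe built from only part of the transcript, as in the slice `transcript[5:25]`.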

The following transcript lists provide sequences specific to particular diseased tissues. The lists are summarized in Table 1 below. As used in the table and throughout the specification, the term "list of genes" means "list of nucleic acid transcripts" and includes both coding and non-coding regions.

Table 1: Summary of the transcript lists in the sequence listing

Tissue / list of genes	Number of sequences	Sequence listing range
Colorectal sequences
List of genes A	16,350	SEQ ID NO:1 to SEQ ID NO:16,350
List of genes B	2,773	SEQ ID NO:16,351 to SEQ ID NO:19,123
List of genes C	1,805	SEQ ID NO:19,124 to SEQ ID NO:20,928
List of genes D	1,318	SEQ ID NO:20,929 to SEQ ID NO:22,246
List of genes E	10,556	SEQ ID NO:22,247 to SEQ ID NO:32,802
List of genes F	7,134	SEQ ID NO:32,803 to SEQ ID NO:39,936
List of genes G	22,376	SEQ ID NO:39,937 to SEQ ID NO:62,312
List of genes H	5,672	SEQ ID NO:62,313 to SEQ ID NO:67,984
Lung sequences
List of genes I	36,431	SEQ ID NO:67,985 to SEQ ID NO:104,415
List of genes J	24	SEQ ID NO:104,416 to SEQ ID NO:104,439
List of genes K	22	SEQ ID NO:104,440 to SEQ ID NO:104,461
List of genes L	9,727	SEQ ID NO:104,462 to SEQ ID NO:114,188
List of genes M	5,208	SEQ ID NO:114,189 to SEQ ID NO:119,396
List of genes N	452	SEQ ID NO:119,397 to SEQ ID NO:119,848
List of genes O	42,790	SEQ ID NO:119,849 to SEQ ID NO:162,638
Breast sequences
List of genes P	17,291	SEQ ID NO:162,639 to SEQ ID NO:179,929
List of genes Q	3,278	SEQ ID NO:179,930 to SEQ ID NO:183,207
List of genes R	6,915	SEQ ID NO:183,208 to SEQ ID NO:190,122
List of genes S	4,857	SEQ ID NO:190,123 to SEQ ID NO:194,979
List of genes T	34,141	SEQ ID NO:194,980 to SEQ ID NO:229,120
List of genes U	3,911	SEQ ID NO:229,121 to SEQ ID NO:233,031
List of genes V	16,666	SEQ ID NO:233,032 to SEQ ID NO:249,697
Liver sequences
List of genes W	24,744	SEQ ID NO:249,698 to SEQ ID NO:274,441
List of genes X	13	SEQ ID NO:274,442 to SEQ ID NO:274,454
List of genes Y	32	SEQ ID NO:274,455 to SEQ ID NO:274,486
List of genes Z	6,565	SEQ ID NO:274,487 to SEQ ID NO:281,051
List of genes AA	14,789	SEQ ID NO:281,052 to SEQ ID NO:295,840
List of genes BB	11,851	SEQ ID NO:295,841 to SEQ ID NO:307,691
List of genes CC	39,979	SEQ ID NO:307,692 to SEQ ID NO:347,670
Brain sequences
List of genes DD	33,275	SEQ ID NO:347,671 to SEQ ID NO:380,945
List of genes EE	5	SEQ ID NO:380,946 to SEQ ID NO:380,950
List of genes FF	341	SEQ ID NO:380,951 to SEQ ID NO:381,291
List of genes GG	8,486	SEQ ID NO:381,292 to SEQ ID NO:389,777
List of genes HH	19,081	SEQ ID NO:389,778 to SEQ ID NO:408,858
List of genes II	21,845	SEQ ID NO:408,859 to SEQ ID NO:430,703
List of genes JJ	53,293	SEQ ID NO:430,704 to SEQ ID NO:483,996

The sequences in each of List of genes A through List of genes JJ are contained on the CD-R accompanying this specification, and are all incorporated herein by reference.
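The SEQ ID NO ranges in Table 1 are expected to be contiguous, with each range spanning exactly the stated number of sequences. A minimal Python sketch of that consistency check, transcribing only the colorectal lists A through H, might look as follows. The function name and tuple layout are illustrative assumptions, and the count for List of genes E is taken as 10,556, the figure given in the List of genes E section and implied by its SEQ ID NO range.

```python
# (list name, stated number of sequences, first SEQ ID NO, last SEQ ID NO)
COLORECTAL_LISTS = [
    ("A", 16350, 1, 16350),
    ("B", 2773, 16351, 19123),
    ("C", 1805, 19124, 20928),
    ("D", 1318, 20929, 22246),
    ("E", 10556, 22247, 32802),
    ("F", 7134, 32803, 39936),
    ("G", 22376, 39937, 62312),
    ("H", 5672, 62313, 67984),
]

def check_ranges(rows):
    """Return a list of human-readable problems; empty if the table is consistent."""
    problems = []
    expected_first = rows[0][2]
    for name, count, first, last in rows:
        if first != expected_first:
            problems.append(f"List {name}: starts at {first}, expected {expected_first}")
        if last - first + 1 != count:
            problems.append(f"List {name}: range spans {last - first + 1}, stated {count}")
        expected_first = last + 1
    return problems

print(check_ranges(COLORECTAL_LISTS))               # no problems reported: []
print(sum(count for _, count, _, _ in COLORECTAL_LISTS))  # 67984
```

The same check extends to the lung, breast, liver, and brain lists by appending their rows to the table.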

Transcripts in diseased colorectal tissue

List of genes A (SEQ ID NO:1 to SEQ ID NO:16,350)

Provided herein is a set of 16,350 transcripts previously identified as expressed in colorectal tissue.

Thus, in one embodiment, an array is provided comprising nucleic acid molecules complementary to at least 4,000 of the nucleic acid molecules listed in List of genes A. In another embodiment, the array comprises nucleic acid molecules complementary to at least 6,000, 8,000, 10,000, 12,000, 14,000, or 16,000 of the sequences listed in List of genes A.
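One way to read the "at least N" thresholds in these embodiments is as a coverage test: count how many distinct SEQ ID NOs within a list's range are targeted by the array and compare that count with the threshold. The sketch below assumes a hypothetical representation in which an array design is simply a collection of targeted SEQ ID NOs; the function names and the example design are illustrative only.

```python
from typing import Iterable

def covered_count(targets: Iterable[int], first: int, last: int) -> int:
    """Count distinct targeted SEQ ID NOs that fall within [first, last]."""
    return len({seq_id for seq_id in targets if first <= seq_id <= last})

def meets_threshold(targets: Iterable[int], first: int, last: int, minimum: int) -> bool:
    """True if the design targets at least `minimum` sequences of the list."""
    return covered_count(targets, first, last) >= minimum

# List of genes A spans SEQ ID NO:1 to SEQ ID NO:16,350.
LIST_A_FIRST, LIST_A_LAST = 1, 16350

# Hypothetical design: 5,000 targets inside List A plus two outside it.
design = list(range(1, 5001)) + [20000, 20001]

print(covered_count(design, LIST_A_FIRST, LIST_A_LAST))          # 5000
print(meets_threshold(design, LIST_A_FIRST, LIST_A_LAST, 4000))  # True
print(meets_threshold(design, LIST_A_FIRST, LIST_A_LAST, 6000))  # False
```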

List of genes B (SEQ ID NO:16,351 to SEQ ID NO:19,123)

Described the set of 2,773 transcripts, these transcripts neither contradict with public's available expressed sequence tag library that the rectum cancer produces, and also do not contradict with note gene among the Genebank.Herein, these genes are identified recently.

Therefore, in one embodiment, provide the array that comprises the nucleic acid molecule that is complementary at least 1,000 listed among list of genes B nucleic acid molecule.In another embodiment, array comprise be complementary among the list of genes B listed at least 50,100,500,1, the nucleic acid molecule of 000,1,500,2000 or 2500 sequence.

List of genes C (SEQ ID NO:19,124 to SEQ ID NO:20,928)

The cDNA library produces the people's colorectum tissue from disease, and this paper identified 1,805 nucleotide sequence by high-flux sequence, and their are former is also expressed in the colorectum cancerous tissue by evaluation.

Therefore, in one embodiment, provide the array that comprises the nucleic acid molecule that is complementary at least 500 listed among list of genes C nucleic acid molecule.In another embodiment, array comprises the nucleic acid molecule that is complementary at least 50,200,500,750,1,000,1,400 or 1,750 listed among list of genes C sequences.

List of genes D (SEQ ID NO:20,929 to SEQ ID NO:22,246)

Alternative pre-mRNA splicing is a major cellular process by which functionally distinct proteins are produced from the primary transcript of a single gene, often in a tissue-specific pattern.

Newly identified herein is a set of 1,318 nucleic acid sequences that are present (expressed) in colorectal cancer tissue as significantly altered (spliced) forms of previously annotated genes or ESTs. Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 500 of the nucleic acid molecules listed in list of genes D. In another embodiment, the array comprises nucleic acid molecules complementary to at least 50, 100, 250, 500, 750, 1,000, or 1,250 of the sequences listed in list of genes D.

List of genes E (SEQ ID NO:22,247 to SEQ ID NO:32,802)

A cDNA library was generated from diseased human colorectal tissue, and 10,556 nucleic acid sequences were identified herein that had not previously been identified as expressed in colorectal cancer tissue.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 500 of the nucleic acid molecules listed in list of genes E. In another embodiment, the array comprises nucleic acid molecules complementary to at least 1,000, 2,000, 5,000, or 10,000 of the sequences listed in list of genes E.

List of genes F (SEQ ID NO:32,803 to SEQ ID NO:39,936)

A cDNA library was generated from diseased human colorectal tissue, and 7,134 nucleic acid sequences were identified herein that had not previously been identified as expressed in colorectal cancer tissue.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 500 of the nucleic acid molecules listed in list of genes F. In another embodiment, the array comprises nucleic acid molecules complementary to at least 1,000, 2,500, 5,000, or 7,000 of the sequences listed in list of genes F.

List of genes G (SEQ ID NO:39,937 to SEQ ID NO:62,312)

Identified herein is a set of 22,376 nucleic acid sequences that had not previously been identified as expressed in colorectal cancer tissue.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 4,000 of the nucleic acid molecules listed in list of genes G. In another embodiment, the array comprises nucleic acid molecules complementary to at least 6,000, 8,000, 10,000, 12,000, 14,000, 16,000, or 19,000 of the sequences listed in list of genes G.

List of genes H (SEQ ID NO:62,313 to SEQ ID NO:67,984)

Newly identified herein is a set of 5,672 nucleic acid sequences that constitute antisense transcripts and their corresponding reverse-complement transcripts.

The inclusion of antisense transcripts together with their corresponding sense transcripts is an important feature of the array. Commercially available arrays generally focus on detecting sense transcripts that encode proteins. With growing interest in the role of endogenous antisense RNA transcripts in cancer and other diseases, antisense sequences in the colorectal transcriptome have now been identified.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 2,000 of the nucleic acid molecules listed in list of genes H. In another embodiment, the array comprises nucleic acid molecules complementary to at least 3,000, 4,000, or 5,000 of the sequences listed in list of genes H.
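The SEQ ID NO ranges given in the headings for lists of genes B to H can be checked against the set sizes stated in the text. The following Python sketch is illustrative only: the dictionary and function names are hypothetical and form no part of the disclosure. It records the stated ranges, computes the inclusive length of each range, and looks up the list to which a given SEQ ID NO belongs.

```python
from typing import Dict, Optional, Tuple

# SEQ ID NO ranges (inclusive) as stated in the headings for lists of genes B to H.
GENE_LIST_RANGES: Dict[str, Tuple[int, int]] = {
    "B": (16_351, 19_123),
    "C": (19_124, 20_928),
    "D": (20_929, 22_246),
    "E": (22_247, 32_802),
    "F": (32_803, 39_936),
    "G": (39_937, 62_312),
    "H": (62_313, 67_984),
}

# Set sizes as stated in the descriptive text for each list.
STATED_SIZES: Dict[str, int] = {
    "B": 2_773, "C": 1_805, "D": 1_318, "E": 10_556,
    "F": 7_134, "G": 22_376, "H": 5_672,
}


def list_size(name: str) -> int:
    """Number of sequences in a list, computed from its inclusive range."""
    first, last = GENE_LIST_RANGES[name]
    return last - first + 1


def list_for_seq_id(seq_id: int) -> Optional[str]:
    """Return the name of the list whose range contains seq_id, or None."""
    for name, (first, last) in GENE_LIST_RANGES.items():
        if first <= seq_id <= last:
            return name
    return None


def ranges_match_stated_sizes() -> bool:
    """True if every range length equals the size stated in the text."""
    return all(list_size(n) == STATED_SIZES[n] for n in GENE_LIST_RANGES)
```

The same check can be applied to any "at least N" embodiment: a threshold is only satisfiable when N does not exceed `list_size` for the list in question.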

Transcripts in diseased lung tissue

List of genes I (SEQ ID NO:67,985 to SEQ ID NO:104,415)

Provided herein is a set of 36,431 transcripts previously shown to be involved in lung cancer.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 4,000 of the nucleic acid molecules listed in list of genes I. In another embodiment, the array comprises nucleic acid molecules complementary to at least 6,000, 8,000, 15,000, 20,000, 30,000, or 35,000 of the sequences listed in list of genes I.

List of genes J (SEQ ID NO:104,416 to SEQ ID NO:104,439)

Described herein is a set of 24 transcripts that correspond neither to publicly available EST libraries prepared from lung cancer tissue nor to annotated genes in GenBank. These genes are newly identified herein.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 5 of the nucleic acid molecules listed in list of genes J. In another embodiment, the array comprises nucleic acid molecules complementary to at least 6, 10, 15, 18, 20, or 22 of the sequences listed in list of genes J.

List of genes K (SEQ ID NO:104,440 to SEQ ID NO:104,461)

Identified herein by high-throughput sequencing is a set of 22 expressed sequence tags not previously reported as expressed in lung tissue.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 5 of the nucleic acid molecules listed in list of genes K. In another embodiment, the array comprises nucleic acid molecules complementary to at least 6, 10, 15, 18, or 20 of the sequences listed in list of genes K.

List of genes L (SEQ ID NO:104,462 to SEQ ID NO:114,188)

Newly identified herein is a set of 9,727 transcripts containing sequences that are present as significantly altered (spliced) forms of previously annotated lung cancer-associated genes or ESTs.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 3,000 of the nucleic acid molecules listed in list of genes L. In another embodiment, the array comprises nucleic acid molecules complementary to at least 4,000, 5,000, 7,000, or 9,000 of the sequences listed in list of genes L.

List of genes M (SEQ ID NO:114,189 to SEQ ID NO:119,396)

Newly identified herein is a set of 5,208 annotated genes that have been identified as expressed in diseased lung tissue.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 2,500 of the nucleic acid molecules listed in list of genes M. In another embodiment, the array comprises nucleic acid molecules complementary to at least 3,000, 4,000, or 5,000 of the sequences listed in list of genes M.

List of genes N (SEQ ID NO:119,397 to SEQ ID NO:119,848)

Identified herein is a set of 452 transcripts that are single-copy EST nucleic acid sequences; these transcripts are expressed in lung cancer tissue and were not previously identified as annotated genes.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 200 of the nucleic acid molecules listed in list of genes N. In another embodiment, the array comprises nucleic acid molecules complementary to at least 250, 300, 350, or 400 of the sequences listed in list of genes N.

List of genes O (SEQ ID NO:119,849 to SEQ ID NO:162,638)

Newly identified herein is a set of 42,790 transcripts that constitute the antisense and corresponding reverse-complement transcripts of sequences expressed in lung cancer tissue.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 20,000 of the nucleic acid molecules listed in list of genes O. In another embodiment, the array comprises nucleic acid molecules complementary to at least 25,000, 30,000, 35,000, or 40,000 of the sequences listed in list of genes O.

List of genes P (SEQ ID NO:162,639 to SEQ ID NO:179,929)

Provided herein is a set of 17,291 expressed sequence tags previously shown to be expressed in breast cancer tissue.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 3,000 of the nucleic acid molecules listed in list of genes P. In another embodiment, the array comprises nucleic acid molecules complementary to at least 4,000, 5,000, 7,000, 10,000, 12,000, 15,000, or 17,000 of the sequences listed in list of genes P.

List of genes Q (SEQ ID NO:179,930 to SEQ ID NO:183,207)

Described herein is a set of 3,278 transcripts that correspond neither to publicly available EST libraries prepared from breast cancer tissue nor to annotated genes in GenBank. These genes are newly identified herein.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 1,000 of the nucleic acid molecules listed in list of genes Q. In another embodiment, the array comprises nucleic acid molecules complementary to at least 4,000 or 6,000 of the sequences listed in list of genes Q.

List of genes R (SEQ ID NO:183,208 to SEQ ID NO:190,122)

Identified herein by high-throughput sequencing is a set of 6,915 transcripts not previously reported as expressed in diseased breast tissue.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 2,000 of the nucleic acid molecules listed in list of genes R. In another embodiment, the array comprises nucleic acid molecules complementary to at least 4,000 or 6,000 of the sequences listed in list of genes R.

List of genes S (SEQ ID NO:190,123 to SEQ ID NO:194,979)

Newly identified herein is a set of 4,857 transcripts containing sequences that are present in diseased breast tissue as significantly altered (spliced) forms of previously annotated genes or ESTs.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 1,000 of the nucleic acid molecules listed in list of genes S. In another embodiment, the array comprises nucleic acid molecules complementary to at least 2,000 or 4,000 of the sequences listed in list of genes S.

List of genes T (SEQ ID NO:194,980 to SEQ ID NO:229,120)

Identified herein is a set of 34,141 transcripts expressed in breast tissue. These transcripts had not previously been confirmed as expressed in breast cancer tissue.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 10,000 of the nucleic acid molecules listed in list of genes T. In another embodiment, the array comprises nucleic acid molecules complementary to at least 15,000, 20,000, 25,000, or 30,000 of the sequences listed in list of genes T.

List of genes U (SEQ ID NO:229,121 to SEQ ID NO:233,031)

Identified herein is a set of 3,911 transcripts that are single-copy EST nucleic acid sequences; these transcripts are expressed in breast cancer tissue and were not previously identified as annotated genes.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 1,000 of the nucleic acid molecules listed in list of genes U. In another embodiment, the array comprises nucleic acid molecules complementary to at least 1,500, 2,000, 2,500, or 3,000 of the sequences listed in list of genes U.

List of genes V (SEQ ID NO:233,032 to SEQ ID NO:249,697)

Newly identified herein is a set of 16,666 transcripts that constitute the antisense of sequences expressed in breast cancer tissue and their corresponding sense transcripts.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 8,000 of the nucleic acid molecules listed in list of genes V. In another embodiment, the array comprises nucleic acid molecules complementary to at least 10,000, 12,000, 14,000, or 16,000 of the sequences listed in list of genes V.

Transcripts in diseased liver tissue

List of genes W (SEQ ID NO:249,698 to SEQ ID NO:274,441)

Provided herein is a set of 24,744 transcripts previously identified as expressed in hepatitis-associated liver tissue.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 4,000 of the nucleic acid molecules listed in list of genes W. In another embodiment, the array comprises nucleic acid molecules complementary to at least 6,000, 8,000, 10,000, 12,000, 14,000, 16,000, 19,000, or 21,000 of the sequences listed in list of genes W.

List of genes X (SEQ ID NO:274,442 to SEQ ID NO:274,454)

Described herein is a set of 13 transcripts that correspond neither to publicly available EST libraries prepared from hepatitis-associated liver tissue nor to annotated genes in GenBank. These genes are newly identified herein.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 8 of the nucleic acid molecules listed in list of genes X. In another embodiment, the array comprises nucleic acid molecules complementary to at least 10 or 12 of the sequences listed in list of genes X.

List of genes Y (SEQ ID NO:274,455 to SEQ ID NO:274,486)

Identified herein by high-throughput screening is a set of 32 transcripts not previously reported as expressed in hepatitis-associated liver tissue.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 15 of the nucleic acid molecules listed in list of genes Y. In another embodiment, the array comprises nucleic acid molecules complementary to at least 20, 25, or 30 of the sequences listed in list of genes Y.

List of genes Z (SEQ ID NO:274,487 to SEQ ID NO:281,051)

Identified herein is a set of 6,565 transcripts that are present and expressed in hepatitis-associated liver tissue as significantly altered (spliced) forms of previously annotated genes or ESTs.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 3,000 of the nucleic acid molecules listed in list of genes Z. In another embodiment, the array comprises nucleic acid molecules complementary to at least 4,000, 5,000, or 6,000 of the sequences listed in list of genes Z.

List of genes AA (SEQ ID NO:281,052 to SEQ ID NO:295,840)

Newly identified herein is a set of 14,789 transcripts expressed in hepatitis-associated liver tissue.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 8,000 of the nucleic acid molecules listed in list of genes AA. In another embodiment, the array comprises nucleic acid molecules complementary to at least 8,000, 10,000, 12,000, or 14,000 of the sequences listed in list of genes AA.

List of genes BB (SEQ ID NO:295,841 to SEQ ID NO:307,691)

Identified herein is a set of 11,851 transcripts that are single-copy EST nucleic acid sequences; these transcripts are expressed in hepatitis-associated liver tissue and were not previously identified as annotated genes.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 6,000 of the nucleic acid molecules listed in list of genes BB. In another embodiment, the array comprises nucleic acid molecules complementary to at least 8,000 or 10,000 of the sequences listed in list of genes BB.

List of genes CC (SEQ ID NO:307,692 to SEQ ID NO:347,670)

Newly identified herein is a set of 39,979 transcripts that constitute the antisense of sequences expressed in hepatitis-associated liver tissue and their corresponding sense transcripts.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 20,000 of the nucleic acid molecules listed in list of genes CC. In another embodiment, the array comprises nucleic acid molecules complementary to at least 25,000, 30,000, or 35,000 of the sequences listed in list of genes CC.

Transcripts in diseased brain tissue

List of genes DD (SEQ ID NO:347,671 to SEQ ID NO:380,945)

Provided herein is a set of 33,275 transcripts previously identified as expressed in neurodegenerative disease-associated brain tissue.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 15,000 of the nucleic acid molecules listed in list of genes DD. In another embodiment, the array comprises nucleic acid molecules complementary to at least 20,000, 25,000, or 30,000 of the sequences listed in list of genes DD.

List of genes EE (SEQ ID NO:380,946 to SEQ ID NO:380,950)

Newly identified herein is a set of 5 transcripts containing sequences that correspond neither to publicly available EST libraries prepared from neurodegenerative disease-associated brain tissue nor to annotated genes in GenBank.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 3 of the nucleic acid molecules listed in list of genes EE.

List of genes FF (SEQ ID NO:380,951 to SEQ ID NO:381,291)

Identified herein by high-throughput sequencing is a set of 341 transcripts not previously reported as expressed in neurodegenerative disease-associated brain tissue.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 150 of the nucleic acid molecules listed in list of genes FF. In another embodiment, the array comprises nucleic acid molecules complementary to at least 200 or 300 of the sequences listed in list of genes FF.

List of genes GG (SEQ ID NO:381,292 to SEQ ID NO:389,777)

Newly identified herein is a set of 8,486 transcripts that are present and expressed in neurodegenerative disease-associated brain tissue as significantly altered (spliced) forms of previously annotated genes or ESTs.

List of genes HH (SEQ ID NO:389,778 to SEQ ID NO:408,858)

Provided herein is a set of 19,081 newly identified transcripts expressed in neurodegenerative disease-associated brain tissue.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 8,000 of the nucleic acid molecules listed in list of genes HH. In another embodiment, the array comprises nucleic acid molecules complementary to at least 12,000, 15,000, 17,000, or 19,000 of the sequences listed in list of genes HH.

List of genes II (SEQ ID NO:408,859 to SEQ ID NO:430,703)

Identified herein is a set of 21,845 transcripts that are single-copy EST nucleic acid sequences; these transcripts are expressed in neurodegenerative disease-associated brain tissue and were not previously identified as annotated genes.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 10,000 of the nucleic acid molecules listed in list of genes II. In another embodiment, the array comprises nucleic acid molecules complementary to at least 12,000, 15,000, 17,000, or 20,000 of the sequences listed in list of genes II.

List of genes JJ (SEQ ID NO:430,704 to SEQ ID NO:483,996)

Newly identified herein is a set of 53,293 transcripts that constitute the antisense of sequences expressed in neurodegenerative disease-associated brain tissue and their corresponding sense transcripts.

Therefore, in one embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 30,000 of the nucleic acid molecules listed in list of genes JJ. In another embodiment, the array comprises nucleic acid molecules complementary to at least 35,000, 40,000, 45,000, or 50,000 of the sequences listed in list of genes JJ.

Array

As described above, the transcript lists provided herein can be used to prepare diseased-tissue transcriptome arrays using nucleic acid molecules complementary to the sequences provided herein. The terms "array" and "microarray" are used interchangeably herein. Those skilled in the art commonly use the latter term to denote a type of miniature array associated with computer chips. As used herein, the term "tissue-specific element" refers to a biomolecule attached to the array that is derived from a transcript-specific element of the diseased target sample, including nucleic acid, polypeptide, and antibody molecules.

Lists of genes A-H provide transcript sequences associated with diseased colorectal tissue. In one embodiment, an array is provided that comprises at least one nucleic acid molecule complementary to a diseased colorectal tissue transcript in list of genes B, C, D, E, F, G, H, or a combination thereof. In another embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 70%, for example at least 80% or at least 90%, of the diseased colorectal tissue transcripts in list of genes B, C, D, E, F, G, H, or a combination thereof. In another embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 70%, for example at least 80% or at least 90%, of the diseased colorectal tissue transcripts in each of lists of genes B, C, D, E, F, G, and H.

Lists of genes I-O provide transcript sequences associated with diseased lung tissue. In one embodiment, an array is provided that comprises nucleic acid molecules complementary to diseased lung tissue transcripts in list of genes J, K, L, M, N, O, or a combination thereof. In another embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 70%, for example at least 80% or at least 90%, of the diseased lung tissue transcripts in list of genes J, K, L, M, N, O, or a combination thereof. In another embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 70%, for example 80% or 90%, of the diseased lung tissue transcripts in each of lists of genes J, K, L, M, N, and O.

Lists of genes P-V provide transcript sequences associated with diseased breast tissue. In one embodiment, an array is provided that comprises nucleic acid molecules complementary to diseased breast tissue transcripts in list of genes Q, R, S, T, U, V, or a combination thereof. In another embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 70%, for example at least 80% or at least 90%, of the diseased breast tissue transcripts in list of genes Q, R, S, T, U, V, or a combination thereof. In another embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 70%, for example at least 80% or at least 90%, of the diseased breast tissue transcripts in each of lists of genes Q, R, S, T, U, and V.

Lists of genes W-CC provide transcript sequences associated with diseased liver tissue. In one embodiment, an array is provided that comprises nucleic acid molecules complementary to diseased liver tissue transcripts in list of genes X, Y, Z, AA, BB, CC, or a combination thereof. In another embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 70%, for example at least 80% or at least 90%, of the diseased liver tissue transcripts in list of genes X, Y, Z, AA, BB, CC, or a combination thereof. In another embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 70%, for example at least 80% or at least 90%, of the diseased liver tissue transcripts in each of lists of genes X, Y, Z, AA, BB, and CC.

Lists of genes DD-JJ provide transcript sequences associated with diseased brain tissue. In one embodiment, an array is provided that comprises nucleic acid molecules complementary to diseased brain tissue transcripts in list of genes EE, FF, GG, HH, II, JJ, or a combination thereof. In another embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 70%, for example at least 80% or at least 90%, of the diseased brain tissue transcripts in list of genes EE, FF, GG, HH, II, JJ, or a combination thereof. In another embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 70%, for example at least 80% or at least 90%, of the diseased brain tissue transcripts in each of lists of genes EE, FF, GG, HH, II, and JJ.

In another embodiment, an array is provided that comprises nucleic acid molecules complementary to nucleic acid sequences in lists of genes A-H, J-O, and Q-V from two or more different cancer tissues, so as to target multiple types of cancer. In another embodiment, an array is provided that comprises nucleic acid molecules complementary to at least 70%, for example at least 80% or at least 90%, of the transcripts in lists of genes A-H, J-O, and Q-V, or a combination thereof.
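The percentage embodiments above distinguish between coverage of a combination of lists and coverage of each list individually. The following Python sketch illustrates that distinction under simplifying assumptions: transcripts are represented by integer identifiers, the list names and contents are hypothetical toy data, and an array "covers" a transcript if it carries any molecule complementary to it. It is not a description of how the disclosed arrays were evaluated.

```python
from typing import Dict, Iterable, Set


def covered_count(array_targets: Set[int], gene_list: Iterable[int]) -> int:
    """Count transcripts of a list that have a complementary molecule on the array."""
    return sum(1 for seq_id in gene_list if seq_id in array_targets)


def covers_each_list(array_targets: Set[int],
                     gene_lists: Dict[str, range],
                     min_percent: int = 70) -> bool:
    """True if the array covers at least min_percent of EACH list."""
    return all(
        covered_count(array_targets, members) * 100 >= min_percent * len(members)
        for members in gene_lists.values()
    )


def covers_combination(array_targets: Set[int],
                       gene_lists: Dict[str, range],
                       min_percent: int = 70) -> bool:
    """True if the array covers at least min_percent of the lists taken TOGETHER."""
    total = sum(len(members) for members in gene_lists.values())
    hits = sum(covered_count(array_targets, m) for m in gene_lists.values())
    return hits * 100 >= min_percent * total


# Hypothetical toy data: two small lists and one array design.
toy_lists = {"list_1": range(1, 11), "list_2": range(11, 31)}
toy_array = set(range(1, 4)) | set(range(11, 31))  # 3 of 10 and 20 of 20
```

With the toy data, the array covers 23 of 30 transcripts overall, so `covers_combination` is satisfied at 70%, while `covers_each_list` is not, because only 3 of the 10 transcripts in `list_1` are covered.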

Preferably, the array in each embodiment described herein comprises one or more of the nucleic acid molecules newly identified herein, or a combination of the nucleic acid molecules newly identified herein. Such combinations include those containing nucleic acid molecules newly identified for a particular disease, a type of disease, or a broad range of diseases.

Expression

To obtain expression of a nucleic acid sequence encoding a protein, the sequence can be incorporated into a vector having one or more control sequences operably linked to the nucleic acid to control its expression. The vector optionally includes other sequences, such as a promoter or enhancer to drive expression of the inserted nucleic acid, nucleic acid sequences so that the peptide is produced as a fusion protein, and/or a nucleic acid encoding a secretion signal so that the polypeptide produced in the host cell is secreted from the cell.

In one aspect of the invention, a vector comprising the isolated polynucleotide is provided.

In another aspect of the invention, a host cell comprising the vector is provided.

The peptide is obtained by transfecting a vector incorporating the particular nucleic acid sequence into a host cell in which the vector is functional; culturing the host cell so that the peptide is produced; and recovering the peptide from the host cell or the surrounding medium.

Thus, methods of producing a polypeptide are within the scope of the invention. The method comprises expressing the polypeptide from the encoding nucleic acid molecule. This can conveniently be achieved by culturing a host cell containing the vector under appropriate conditions that cause or allow expression of the polypeptide.

Carrier and host cell

Suitable vectors can be chosen or constructed to contain appropriate regulatory sequences, including but not limited to promoter sequences, terminator fragments, polyadenylation sequences, enhancer sequences, marker genes, and other sequences as appropriate.

Vectors may be plasmids or viral, e.g., phage or phagemid, as appropriate. For further details see, for example, MOLECULAR CLONING: A LABORATORY MANUAL, 2nd edition, Sambrook et al., 1989, Cold Spring Harbor Laboratory Press. Many known techniques and protocols for manipulating nucleic acids, for example in the preparation of nucleic acid constructs, mutagenesis, sequencing, introduction of DNA into cells, gene expression, and protein analysis, are described in detail in CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, Ausubel et al. eds., John Wiley & Sons, 1992.

Systems for cloning and expressing polypeptides in a variety of host cells are well known. Suitable host cells include bacteria, eukaryotic cells such as mammalian and yeast cells, and baculovirus systems.

Accordingly, the invention further provides a host cell comprising a heterologous nucleic acid disclosed herein.

Array preparation

The transcriptome arrays described herein are designed and constructed using polynucleotides. In one embodiment, nucleic acid elements are arranged to prepare a single-transcriptome array, although the array can comprise elements corresponding to multiple transcriptomes where desired. A transcriptome can comprise a plurality of diseased-tissue transcripts from one disease or from multiple diseases. A disease-specific array comprises the transcripts transcribed in the transcriptome of a given disease.

For example, in colorectal cancer, these transcripts may be found transcribed in a range of cell types within the colorectal tumor cell microenvironment, including, e.g., stromal cells, epithelial cells, lymphocytes, endothelial cells, and stem cells. In another embodiment, pre-cancerous cells or cancerous tumor cells alter, through matrix interactions or the secretion of specific proteins, the expression of transcripts in their surrounding cells (such as the stromal, endothelial, or lymphoid cells found in the tumor microenvironment), thereby producing transcripts characteristic of colorectal cancer, and these transcripts are included on the disease-specific array. Moreover, when the disease-specific array is used as a tool to identify diagnostic, prognostic, or predictive gene signatures, the actual signature may comprise transcripts derived from some or all of these individual cell populations.

The arrays provided herein can be used for any suitable purpose, such as, but not limited to, diagnosis, prognosis, drug therapy, and drug screening. For a given array, each nucleic acid element can be a complete sequence or a sequence split into fragments of different lengths. Not all fragments making up the complete sequence need be represented on the array.

In one embodiment, tissue-specific nucleic acid elements representing transcripts and transcript fragments are immobilized on the array at a plurality of physically separate sites using nucleic acid immobilization or binding techniques known in the art. The fragments at the plurality of physically separate sites together make up the entire transcript or a discrete portion of the transcript. The fragments can be complementary to contiguous or non-contiguous portions of the transcript. Hybridization of nucleic acid molecules from the target sample to the fragments on the array indicates the presence of the target transcript in the sample. Hybridization is detected by methods routine in the art; hybridization and its detection are described in detail below.
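The notion of complementarity between an array fragment and a target transcript can be illustrated computationally. The following Python sketch is a deliberate simplification and rests on stated assumptions: sequences are DNA strings over A, C, G, and T; "hybridization" is modeled as perfect reverse complementarity to a stretch of the target, which ignores mismatches, melting temperature, and wash stringency; and the function names and example sequences are hypothetical.

```python
from typing import List

# Translation table for Watson-Crick complements of DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def hybridizes(probe: str, target: str) -> bool:
    """Model hybridization as perfect complementarity of the probe
    to some contiguous stretch of the target (a simplification)."""
    return reverse_complement(probe) in target.upper()


def sites_hybridized(probes: List[str], sample_fragments: List[str]) -> int:
    """Count array sites whose probe hybridizes to at least one sample fragment."""
    return sum(
        1 for probe in probes
        if any(hybridizes(probe, fragment) for fragment in sample_fragments)
    )


# Hypothetical example: two probes at separate sites, complementary to
# non-contiguous portions of one toy transcript.
toy_transcript = "ATGGCCATTGTAATGGGCCGC"
toy_probes = ["CAATGGC", "GCGGCCC"]  # reverse complements of GCCATTG and GGGCCGC
```

In this toy model, a transcript would be called present when `sites_hybridized` returns a count at or above whatever number of sites the user chooses to require.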

In one embodiment, multiple probes are used to distinguish the target sequence from other nucleic acid sequences in the diseased tissue sample. In some embodiments, at least 2% of the target sequence is represented on the array by the combination of probes. In another embodiment, at least 5%, at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90% of the target sequence is represented on the array. Alternatively, where the sequences represent larger sequences or transcripts, at least 60% of the sequences of lists of genes A to JJ are represented on the array by the combination of probes. In further embodiments, at least 70%, at least 80%, or at least 90% of the transcripts of lists of genes A to JJ are represented on the array by the combination of probes. Hybridization of nucleic acid fragments in the sample to those fragments on the array indicates the presence of the full transcript in the tissue sample.
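The fraction of a target sequence represented by a combination of probes can be computed as the union of the positions the probes cover, so that overlapping probes are not counted twice. The following Python sketch assumes that each probe is described by a half-open interval (start, end) of positions on the target; the function name and example coordinates are hypothetical.

```python
from typing import Iterable, Tuple


def fraction_represented(target_length: int,
                         probe_spans: Iterable[Tuple[int, int]]) -> float:
    """Fraction of target positions covered by the union of probe spans.

    Each span is a half-open interval (start, end) on the target.
    Overlapping spans are merged so shared positions count once.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    covered = 0
    covered_up_to = 0  # right edge of the region already counted
    for start, end in sorted(probe_spans):
        start = max(start, covered_up_to, 0)
        end = min(end, target_length)
        if end > start:
            covered += end - start
            covered_up_to = end
    return covered / target_length


# Hypothetical example: three 25-base probes on a 1,000-base target,
# two of which overlap by 15 bases.
toy_spans = [(0, 25), (10, 35), (500, 525)]
toy_fraction = fraction_represented(1000, toy_spans)  # 60 covered positions
```

Here the three probes cover 60 distinct positions, or 6% of the toy target, which would satisfy an "at least 2%" or "at least 5%" threshold but not an "at least 10%" threshold.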

In another embodiment, the nucleic acid elements corresponding with full transcript or full transcript fragment is only in a physics independence site, be fixed on the array with " array of spots " form.A plurality of copies of specific nucleic acid element can be attached to array substrate in site independently.Preferably, " array of spots " of the type comprises the new one or more nucleic acid molecule identified of this paper.

As indicated above, the array preferably comprises one or more nucleic acid elements corresponding to the transcript-specific elements, or fragments thereof, provided in Gene Lists A-JJ. As also indicated above, an array specific to a disease such as a particular cancer can be designed to comprise all, or a predetermined percentage, of the transcriptome associated with the specific disease. For example, in one embodiment, the array can comprise all or a selected subset of the nucleotide sequences in the gene lists provided above that are associated with a specific disease such as colorectal cancer (Gene Lists A-H). In another embodiment, the array can comprise a transcriptome that includes, for example, all or a selected subset of the nucleotide sequences in the gene lists provided above that are associated with a general type of disease such as cancer (Gene Lists A-V). In still other embodiments, the array can comprise a transcriptome that includes, for example, all or a selected subset of the nucleotide sequences in the gene lists provided above that are associated with a particular type of organ and disease, such as liver tissue associated with hepatitis (Gene Lists W-CC) or brain tissue associated with neurodegenerative disease (Gene Lists DD-JJ).

In other embodiments, the array comprises nucleic acid elements corresponding to at least 50% of the transcript-specific elements provided in Gene Lists A-JJ. In other embodiments, the array comprises nucleic acid elements corresponding to at least 60%, for example at least 70%, at least 80%, or greater than 90%, of the transcript-specific elements provided in Gene Lists A-JJ. Hybridization of a target transcript-specific element from the diseased tissue sample to the corresponding nucleic acid element on the array indicates the presence of the target gene in the sample. Additional nucleic acid elements, or fragments thereof, corresponding to other transcripts provided in Gene Lists A-JJ can be placed at discrete, physically distinct sites on the array.

Those skilled in the art will appreciate that the nucleic acid elements on a given array are complementary to the transcript-specific sequences in a given target sample. Arrays comprising native sequences can also be designed to identify the presence of antisense molecules in the target sample. Endogenous antisense RNA transcripts are of interest because recent literature has implicated endogenous antisense in cancer and other diseases.

In one embodiment, the array is an array of nucleic acid elements that represent the transcriptome of diseased colorectal tissue, diseased lung tissue, or diseased breast tissue. In such an array, preferably more than 75%, 80%, 90%, 95%, or 98% of the total transcripts in the transcriptome of the diseased colorectal, lung, or breast tissue, respectively, are present. In some embodiments, the remaining nucleic acid elements are control elements.

The arrays provided herein for use in the assays described herein are constructed by suitable techniques known in the art. See, for example, U.S. Patent Nos. 5,486,452; 5,830,645; 5,807,552; 5,800,992; and 5,445,934. In each array, an individual nucleic acid element may appear only once or may be replicated. The array can optionally also comprise control nucleic acid elements.

Any suitable substrate can be used as the solid phase to which the nucleic acid elements are immobilized or bound. For example, the substrate can be glass, plastic, metal, a metal-coated substrate, or a filter of any material. The surface of the substrate can be of any suitable configuration. For example, the surface can be planar, or can have ridges or grooves that separate the nucleic acid elements immobilized on the substrate. In an alternative embodiment, the nucleic acids are attached to beads, which are individually distinguishable. The nucleic acid elements are attached to the substrate in any suitable manner that makes them available for hybridization, including covalent or non-covalent binding.

In other embodiments, the polynucleotide or protein molecules of the transcriptome can be grouped on the array according to whether expression of the transcript is associated with sensitivity or resistance to an agent for a specific disease. Such grouping provides regions on the array in which the set of transcripts indicates whether an individual with a particular array profile will or will not respond to a particular therapeutic agent (see, for example, Fig. 1).

Diseased tissue sample

Any suitable tissue or cell of interest can be used as the diseased tissue sample in the methods described herein. Those skilled in the art will appreciate that the term "diseased tissue sample" includes abnormal samples, samples suspected of being diseased, and normal samples analyzed as part of routine screening tests.

The diseased tissue sample is preferably processed to obtain one or more transcript-specific elements, which are then contacted with the array to allow hybridization and detection of the transcript-specific elements bound to the array. As used herein, the term "transcript-specific element" includes any suitable nucleic acid derived from an RNA transcript in the sample, such as DNA or RNA. A nucleic acid derived from an RNA transcript can be cDNA reverse-transcribed from mRNA, RNA transcribed from that cDNA, DNA amplified from that cDNA, RNA transcribed from the amplified DNA, and so on. When the purpose is to measure changes in gene copy number, genomic DNA is preferably used. Alternatively, when the expression level of one or more transcripts is to be detected, RNA or cDNA is preferably used. For example, for quantifying expression, the transcript-specific element can be any type of transcribed RNA molecule, for example messenger RNA (mRNA), alternatively spliced mRNA, ribosomal RNA (rRNA), transfer RNA (tRNA), and the wide range of other transcripts that are not translated into protein, such as small nuclear RNA (snRNA), and antisense molecules such as siRNA and microRNA. The transcript-specific element can also be a nucleic acid derived from RNA.

Depending on the purpose of the method, one of ordinary skill in the art will select suitable diseased target cells and tissues. For example, in a method of identifying transcripts associated with a particular pathological condition, any biological sample, cell, or tissue known to exhibit or express symptoms of the pathological condition can be used.

The arrays described herein are used to identify transcripts that are differentially induced in cancer. In this case, the target cell can be a tumor cell, for example a colon cancer cell or a stomach cancer cell. Target cells are derived from any tissue source, including human and animal tissue, such as, but not limited to, freshly obtained samples, frozen samples, biopsy samples, body fluid samples, blood samples, preserved tissue such as paraffin-embedded fixed tissue samples (that is, tissue blocks), or cell cultures.

For diagnosis, the diseased tissue test sample is preferably a biological sample from an individual suspected of being diseased. Ideally, the tissue sample corresponds to, or is applied to, an array comprising the complete transcriptome, or one or more substantial portions of it, from the same tissue. The term "substantial portion" is defined herein as approximately more than 50%, 75%, 80%, 90%, 95%, or 98% of the whole transcriptome. For example, for diagnosis, transcript-specific elements derived from a lung tissue sample are applied to an array comprising all, or a substantial portion, of the whole transcriptome of diseased lung tissue.

The set of transcript-specific elements can be obtained from the diseased target tissue or cells by any suitable nucleic acid isolation or purification method known in the art. For example, commercially available kits for nucleic acid isolation, such as the QIAAMP Tissue Kit for DNA isolation from QIAGEN (Alameda, CA), can be used in the methods described herein. In addition, methods for isolating and purifying nucleic acids are described in Chapter 3 of LABORATORY TECHNIQUES IN BIOCHEMISTRY AND MOLECULAR BIOLOGY: HYBRIDIZATION WITH NUCLEIC ACID PROBES, PART I. THEORY AND NUCLEIC ACID PREPARATION, P. Tijssen, ed., Elsevier, N.Y. (1993).

Depending on the size of the sample and the method of isolation, the transcript-specific elements obtained can be used with or without amplification. Suitable amplification methods include, but are not limited to, polymerase chain reaction (PCR) (Innis et al., PCR PROTOCOLS: A GUIDE TO METHODS AND APPLICATIONS, Academic Press, Inc., San Diego (1990)), ligase chain reaction (LCR) (see Wu and Wallace, Genomics, 4:560 (1989); Landegren et al., Science, 241:1077 (1988); and Barringer et al., Gene, 89:117 (1990)), transcription amplification (Kwoh et al., Proc. Natl. Acad. Sci. USA, 86:1173 (1989)), and self-sustained sequence replication (Guatelli et al., Proc. Natl. Acad. Sci. USA, 87:1874 (1990)). Details regarding quantitative PCR are provided in PCR PROTOCOLS: A GUIDE TO METHODS AND APPLICATIONS, Innis et al., Academic Press, Inc., N.Y. (1990).

In certain embodiments, it is only necessary to detect the presence or absence of a specific transcript-specific element. In such cases, detection of a hybridization signal indicates the presence of the transcript-specific element in the sample. In other embodiments, the expression of one or more transcript-specific elements in the sample needs to be quantified. In this case, the concentration of the transcript-specific element in the sample is proportional to the detected hybridization signal. The skilled person will understand that the proportionality need not be strict (strict proportionality being, for example, a doubling of the transcription rate that leads to a doubling of the mRNA transcript and a doubling of the hybridization signal). A looser proportionality, for example one in which a 10-fold difference in target mRNA concentration results in a 5- to 15-fold difference in hybridization intensity, is acceptable. When more precise quantification is required, suitable standards can be used to correct for variation introduced by sample preparation and hybridization.
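The loose proportionality described above can be expressed as a simple check. The sketch below assumes that the 10-fold to 5- to 15-fold example generalizes to an acceptance band of 0.5 to 1.5 times the concentration fold difference; that generalization, the function name, and the parameter names are illustrative assumptions and are not stated in the description.

```python
def is_acceptably_proportional(concentration_fold, intensity_fold,
                               lower=0.5, upper=1.5):
    """Check whether a hybridization-intensity fold difference tracks a
    concentration fold difference within a loose band.

    lower/upper are assumed bounds derived from the example in the text
    (10-fold concentration difference, 5- to 15-fold intensity difference).
    """
    ratio = intensity_fold / concentration_fold
    return lower <= ratio <= upper


print(is_acceptably_proportional(10, 5))   # lower edge of the example band
print(is_acceptably_proportional(10, 20))  # outside the example band
```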

Hybridization

In the methods provided herein, transcript-specific elements from the diseased tissue sample are hybridized to the array under conditions of suitably selected stringency. The skilled person knows how to vary the stringency of the hybridization conditions to select those best suited to the sample. For example, using a non-stringent wash buffer (for example 6x SSPE, 0.01% Tween-20) and a stringent buffer (such as 100 mM MES, 0.1 M [Na+], 0.01% Tween-20), one of ordinary skill in the art can vary the number of washes (typically 0-20) and the wash temperature (typically 15-50 °C) for each buffer to obtain optimal hybridization. Methods of optimizing hybridization conditions are well known to those skilled in the art (see LABORATORY TECHNIQUES IN BIOCHEMISTRY AND MOLECULAR BIOLOGY, Vol. 24: Hybridization With Nucleic Acid Probes, P. Tijssen, ed., Elsevier, N.Y. (1993)).

In one embodiment, hybridization is performed at low stringency, followed by successive washes at progressively increasing stringency to eliminate mismatched duplexes until the desired level of hybridization specificity is obtained. Hybridization specificity is assessed by comparing hybridization to the gene-specific elements with hybridization to the various controls present.

Labeling and detection

Transcript-specific elements hybridized to the nucleic acid elements of the arrays provided herein are preferably detected by detecting one or more labels attached to the sample transcript-specific elements from the diseased tissue sample.

The label can be incorporated before, during, or after hybridization by any suitable method known in the art for attaching labels to nucleic acids. Suitable methods include adding the label directly to the original transcript-specific elements of the sample (such as mRNA, polyA mRNA, cDNA, etc.), or to the amplification product during or after amplification of the sample's transcript-specific elements, for example by using labeled primers or labeled nucleotides.

Labels suitable for the methods described herein include, but are not limited to, biotin for staining with a labeled streptavidin conjugate, magnetic beads (for example Dynabeads), fluorescent dyes (such as fluorescein, Texas red, rhodamine, green fluorescent protein, and the like), radiolabels (such as ³H, ¹²⁵I, ³⁵S, ¹⁴C, or ³²P), enzymes (such as horseradish peroxidase, alkaline phosphatase, and other enzymes used in ELISA), and colorimetric labels such as colloidal gold and colored glass or plastic (such as polystyrene, polypropylene, latex, etc.) beads.

Depending on the label selected, the skilled person can choose a suitable method known in the art for detecting the label. For a detailed description of methods for labeling nucleic acids and detecting labeled hybridized nucleic acids, see LABORATORY TECHNIQUES IN BIOCHEMISTRY AND MOLECULAR BIOLOGY, Vol. 24: Hybridization With Nucleic Acid Probes, P. Tijssen, ed., Elsevier, N.Y. (1993).

Protein arrays

In other embodiments, protein arrays are designed and constructed. As used herein, the terms "protein" and "polypeptide" are interchangeable. The tissue-specific elements in these arrays include proteins, peptides, antibodies, peptide nucleic acids, and the like. Antibodies raised against polypeptide molecules encoded by the diseased transcriptome can be immobilized at discrete sites on the array and bound to polypeptides that carry a detectable label and are specific for the antibody. Proteins isolated from the target sample can be contacted with the labeled array; any displacement of labeled protein from the immobilized antibody can be visualized as loss of the detectable label at a discrete site on the array. The pattern of protein displacement on the array may be correlated with the response or non-response, to a specific therapeutic agent, of individuals expressing that array profile.

Alternatively, the protein array can comprise the polypeptide molecules encoded by the diseased transcriptome. The polypeptide molecules can be attached at discrete sites on the transcriptome protein array and detected with antibodies isolated from an individual expressing the diseased transcriptome.

Antibodies can be polyclonal or, more preferably, monoclonal. An intact antibody, or a fragment thereof (such as Fab or F(ab')₂), can be used. The term "labeled", with regard to a probe or antibody, is intended to encompass direct labeling of the probe or antibody by coupling (that is, physically linking) a detectable substance to it, as well as indirect labeling of the probe or antibody by reaction with another reagent that is directly labeled. Examples of indirect labeling include detection of a primary antibody using a fluorescently labeled secondary antibody, and end-labeling of a DNA probe with biotin so that it can be detected with fluorescently labeled streptavidin. The term "biological sample" is intended to include tissues, cells, and biological fluids isolated from a subject, as well as tissues, cells, and fluids present within a subject. That is, the detection methods can be used to detect RNA, protein, and genomic DNA in a biological sample in vitro as well as in vivo. For example, in vitro techniques for detecting RNA include Northern hybridization and in situ hybridization. In vitro techniques for detecting protein include enzyme-linked immunosorbent assays (ELISAs), Western blotting, immunoprecipitation, and immunofluorescence. In vitro techniques for detecting genomic DNA include Southern hybridization. Furthermore, in vivo techniques for detecting protein include introducing a labeled antibody into the subject. For example, the antibody can be labeled with a radioactive marker whose presence and location can be detected by standard imaging techniques.

Kits

Provided herein are kits for detecting the presence of, or quantifying, transcript-specific elements in a cancerous or diseased tissue sample. For example, the kit can comprise one or more arrays of the transcriptome from one or more diseased tissues. The molecules on the array can be the polynucleotide, polypeptide, or antibody molecules described herein. The kit optionally also comprises a detectable label, or a labeled compound or agent capable of detecting expression of a gene product in a biological sample, and reagents for labeling the sample and for effecting hybridization to complementary sequences on the array. The kit optionally also comprises means for determining the amount of transcript in the sample, such as a colorimetric chart or device.

The kit can comprise more than one array, wherein each array corresponds to tissue afflicted by a different disease and wherein each array comprises a plurality of transcriptomes corresponding to tissue afflicted by one disease. The compounds and agents can be packaged in suitable containers. The kit can further comprise instructions for using the kit to detect protein or nucleic acid.

Methods of use in predictive medicine

Provided herein are methods of using the arrays described above in the field of predictive medicine. This field includes diagnostic assays, prognostic assays, predictive assays, pharmacogenomics, and monitoring of clinical trials for various diseases.

The terms "disease" and "disease state" include all diseases that cause, or could potentially cause, a change in the small-molecule profile of a cell, cellular compartment, or organelle in an organism afflicted with the disease. Such diseases can be grouped into three main categories: neoplastic disease, inflammatory disease, and degenerative disease. Examples of diseases include, but are not limited to, metabolic diseases (for example obesity, cachexia, diabetes, anorexia, etc.), cardiovascular diseases (for example atherosclerosis, ischemia/reperfusion, hypertension, myocardial infarction, restenosis, cardiomyopathy, arteritis, etc.), immunological disorders (for example chronic inflammatory diseases and disorders, such as Crohn's disease, inflammatory bowel disease, reactive arthritis, rheumatoid arthritis, osteoarthritis, including lymphatic disease, insulin-dependent diabetes, organ-specific autoimmunity, including multiple sclerosis, Hashimoto's thyroiditis and Graves' disease, contact dermatitis, psoriasis, transplant rejection, graft-versus-host disease, sarcoidosis, atopic conditions such as asthma and allergy, including allergic rhinitis, gastrointestinal allergies, including food allergies, eosinophilia, conjunctivitis, glomerulonephritis, susceptibility to certain pathogens such as helminths (for example leishmaniasis) and certain viral infections, including HIV, and bacterial infections, including tuberculosis and lepromatous leprosy, etc.), myopathies (for example polymyositis, muscular dystrophy, central core disease, centronuclear (myotubular) myopathy, myotonia congenita, nemaline myopathy, paramyotonia congenita, periodic paralysis, mitochondrial myopathy, etc.), neurological disorders (for example neuropathy, Alzheimer's disease, Parkinson's disease, Huntington's disease, amyotrophic lateral sclerosis, motor neuron disease, traumatic nerve injury, multiple sclerosis, acute disseminated encephalomyelitis, acute necrotizing hemorrhagic leukoencephalitis, dysmyelination disease, mitochondrial disease, migraine disorders, bacterial infection, fungal infection, stroke, aging, dementia, peripheral nervous system diseases, and mental disorders such as depression and schizophrenia, etc.), oncological disorders (for example leukemia, brain cancer, prostate cancer, liver cancer, ovarian cancer, stomach cancer, colorectal cancer, throat cancer, breast cancer, skin cancer, melanoma, lung cancer, sarcoma, cervical cancer, testicular cancer, bladder cancer, endocrine cancer, endometrial cancer, esophageal cancer, glioma, lymphoma, neuroblastoma, osteosarcoma, pancreatic cancer, pituitary cancer, kidney cancer, etc.), and ophthalmic diseases (for example retinitis pigmentosa and macular degeneration). The terms also include disorders, known and unknown, that result from oxidative stress, hereditary cancer syndromes, and metabolic diseases.

In general, the methods for predictive medicine are carried out as follows: transcript-specific elements from diseased target cells or tissue, or from cells or tissue suspected of having a pathological condition, are contacted with an array described herein and incubated for a sufficient time under conditions that allow the transcript-specific elements to hybridize to the nucleic acid molecules of the array, and hybridization is then detected. Detection of hybridization indicates the presence of diseased tissue in the sample; alternatively, the pattern of transcript expression is analyzed and compared with a reference pattern of expression of transcript-specific elements from a reference sample, to provide information about diagnosis, prognosis, drug screening, drug resistance, choice of treatment, and so on, as described in more detail below.

Diagnostic assays

Diagnostic assays using the arrays described herein are provided for determining protein and/or nucleic acid expression and activity in a biological sample (such as blood, serum, cells, or tissue), and thereby determining whether an individual is afflicted with a disease or disorder, shows signs of disease, or is at risk of developing a disease or disorder associated with aberrant protein or nucleic acid expression or activity. Early diagnosis facilitates treatment and increases the chance of successful treatment; it can also allow a physician to treat an individual prophylactically before the onset of symptoms of the disease or disorder.

The arrays described herein can be used to identify nucleic acid molecules that are differentially expressed under pathological conditions, such as diseased states of colorectal tissue, lung tissue, breast tissue, liver tissue, or brain tissue.

An exemplary method for detecting the presence or absence of an RNA transcript or gene product in a biological sample comprises obtaining a biological sample comprising nucleic acid elements from a test subject, and contacting the biological sample with a compound or agent capable of detecting protein or nucleic acid, such that the presence in the biological sample of transcripts that hybridize to an array described herein is detected. The agent for detecting RNA or genomic DNA is preferably a labeled nucleic acid probe capable of hybridizing to RNA or genomic DNA from the sample. The nucleic acid probe can be, for example, a full-length nucleic acid or a portion thereof, such as an oligonucleotide of at least 11, 15, 30, 50, 100, 250, 500, 1,000 or more nucleotides in length that is sufficient to hybridize specifically to the RNA or genomic DNA under stringent conditions.

The biological sample is contacted with the array to detect transcript-specific elements in the biological sample. In one embodiment, the biological sample comprises protein molecules from the test subject. Alternatively, the biological sample comprises nucleic acid elements from the test subject, such as RNA molecules or genomic DNA molecules. A preferred biological sample is a biological fluid (such as serum), a cell sample, or a biopsy sample isolated from the subject by conventional means such as needle biopsy.

The arrays can also be used to identify mutations in genes that give rise to transcripts present in diseased tissue. Thus, the invention provides a method of identifying a disease or disorder associated with aberrant RNA expression or activity, in which a test sample is obtained from a subject and protein or nucleic acid (such as RNA or genomic DNA) is then detected, wherein the presence of the protein or nucleic acid is diagnostic for a subject having, or at risk of developing, a disease or disorder associated with aberrant gene expression or activity.

Diagnostic assays provide a method of identifying one or more transcript-specific elements in a sample that are associated with susceptibility to a pathological condition (such as an early-stage cancerous condition that is presymptomatic and cannot be detected by any other method), or of identifying the actual presence of a diseased state. If the hybridization or expression pattern of the sample is compared with a reference pattern of transcript-specific element expression from a non-diseased reference sample, differential expression between the transcript-specific elements of the target cells and those of the reference sample indicates an association with the pathological condition. Likewise, if the expression pattern is compared with a reference pattern of transcript-specific element expression from a diseased reference sample with a particular pathological condition, substantial agreement between the hybridization or expression pattern and the reference pattern indicates the presence of, or susceptibility to, the pathological condition in the sample tissue or cells.
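One way to picture the comparison of a sample pattern with a reference pattern is a correlation test over the transcripts the two patterns share. The sketch below is an illustrative assumption, not the method of the description: Pearson correlation is used as the measure of agreement, the 0.9 cut-off for "substantial agreement" is arbitrary, and the transcript identifiers and expression values are invented for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def substantially_matches(sample, reference, min_correlation=0.9):
    """Compare two expression patterns (dicts of transcript id -> level).

    min_correlation is an assumed threshold for "substantial agreement".
    """
    shared = sorted(set(sample) & set(reference))
    r = pearson([sample[t] for t in shared], [reference[t] for t in shared])
    return r >= min_correlation


# Hypothetical expression levels for four transcripts.
sample = {"T1": 8.0, "T2": 2.0, "T3": 5.0, "T4": 1.0}
diseased_reference = {"T1": 7.5, "T2": 2.5, "T3": 5.5, "T4": 1.5}
non_diseased_reference = {"T1": 1.0, "T2": 6.0, "T3": 3.0, "T4": 7.0}

print(substantially_matches(sample, diseased_reference))
print(substantially_matches(sample, non_diseased_reference))
```

In this toy data the sample tracks the diseased reference and runs opposite to the non-diseased reference, which corresponds to the two comparisons described in the text.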

The predetermined reference pattern can consist of the expression pattern across the full array or across a subset of the nucleic acid molecules, for example a subset determined to have a particular association with a specific pathological condition. Such a new subset of nucleic acid molecules can be used to construct an array of nucleic acid elements associated with the specific pathological condition. Such new arrays form another aspect of the invention.

Differential expression can be qualitative or quantitative. For example, the difference in expression relative to the reference sample can be up-regulation or down-regulation of the expression of one or more transcript-specific elements of the target cells of the sample. The differential expression measured can be an increase in the expression of one or more transcripts compared with expression of the corresponding transcript(s) in a non-diseased reference sample (or control), or an increase in the expression of one or more transcripts together with a decrease in the expression of one or more other transcripts, compared with expression of the corresponding transcripts in the non-diseased reference sample (or control). The pattern of modulated expression can therefore be indicative of a specific cell or tissue function.

In preferred embodiments, the RNA species or gene associated with the pathological condition hybridizes to a nucleotide sequence complementary to one or more sequences from Gene Lists A-JJ. The pathological condition can be any disease state; for example, the pathological condition can be cancer. The arrays described herein can be used to distinguish the type of cancer and the subtypes of cancer associated with a given tissue (such as breast, colorectal, lung, etc.).

In one embodiment, expression of a transcript-specific element in the target cells is considered up-regulated or down-regulated if it is higher or lower than expression of the corresponding element in the reference sample by 0.1-fold, 0.5-fold, 1-fold, 1.5-fold, 2-fold, 5-fold, 10-fold, or more. Of course, when evaluating such differences, a correction factor is applied to the measured expression levels, for example based on the measured expression of a reference nucleic acid element known to be expressed in both the target cells and the reference sample. Any suitable non-diseased reference sample (or control) can be used. For example, the reference sample can be cells from the same tissue and/or organ and/or subject as the target cells, or can comprise the mean of the expression values of the gene element in a plurality of cells that do not have the relevant pathology.
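The fold thresholds and the correction factor can be sketched as follows. This is a minimal illustration under stated assumptions: each measurement is divided by the level of a control element expressed in both samples, and "N-fold higher" is read as a normalized ratio of at least 1 + N (with the reciprocal used for down-regulation). That reading of the thresholds, the function names, and the identifiers `CTRL` and `T1` to `T3` are hypothetical and are not taken from the description.

```python
def normalize(expression, control_id):
    """Divide every level by the control element's level (correction factor)."""
    control = expression[control_id]
    return {t: v / control for t, v in expression.items() if t != control_id}


def call_regulation(target, reference, control_id, fold=1.0):
    """Label each shared transcript 'up', 'down', or 'unchanged'.

    fold: assumed reading of "N-fold higher" as ratio >= 1 + N;
    down-regulation uses the reciprocal bound.
    """
    t_norm = normalize(target, control_id)
    r_norm = normalize(reference, control_id)
    calls = {}
    for transcript in sorted(set(t_norm) & set(r_norm)):
        ratio = t_norm[transcript] / r_norm[transcript]
        if ratio >= 1 + fold:
            calls[transcript] = "up"
        elif ratio <= 1 / (1 + fold):
            calls[transcript] = "down"
        else:
            calls[transcript] = "unchanged"
    return calls


# Hypothetical raw intensities; the target array is twice as bright overall,
# which the control element corrects for.
target = {"CTRL": 200.0, "T1": 800.0, "T2": 100.0, "T3": 220.0}
reference = {"CTRL": 100.0, "T1": 100.0, "T2": 200.0, "T3": 100.0}
print(call_regulation(target, reference, "CTRL", fold=1.0))
```

Normalizing before taking the ratio keeps a globally brighter array from being mistaken for widespread up-regulation.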

As described herein, array described herein can be estimated the expression of transcribing group of significant proportion in the specific diseased tissue, and therefore can be used for the evaluation of the differential expression pattern of a large amount of gene elements relevant with the particular pathologies situation.

Once the association of a transcript or gene with a pathological condition has been determined, the presence, copy number, or expression level of that transcript can be used in methods of diagnosing the presence of, or susceptibility to, the condition. This use represents a further independent aspect of the methods described herein.

Prognostic assays

Also provided herein are prognostic assays for determining whether an individual diagnosed with a disease or disorder associated with aberrant protein or nucleic acid expression or activity will recover or relapse, either without a primary medical intervention or following a primary medical intervention such as surgery.

The prognostic assays described herein can be used to determine whether overall survival is favorable or unfavorable in the absence of any treatment or after a primary medical intervention, and thus whether a predictive assay should be carried out to identify the most effective further treatment. For example, the assay can be used to determine whether a patient should receive surgical intervention alone or could receive a cocktail of pharmaceutical, biological, or therapeutic agents before or after surgical intervention. These agents are particularly useful for individuals who have a poor prognosis and who will not recover from the disease or disorder without treatment and medical intervention. In such embodiments, hybridization of the transcript-specific elements to the disease transcriptome array indicates the likelihood that the disease will progress or recur, either after surgery or chemotherapy or without surgery or chemotherapy.

In a preferred prognostic assay, the array is used to compare the hybridization pattern from a sample with hybridization patterns from known diseased tissue, where the known diseased tissue responded negatively or positively to a specific treatment and accordingly did or did not experience recurrence of disease, such as recurrence of cancer after remission.

Predictive assays

Predictive assays are provided for selecting a therapeutic or prophylactic agent suited to treating the disease or disorder affecting a particular individual. Therapeutic agents include, but are not limited to, small-molecule compounds, agonists, antagonists, proteins (including peptides and antibodies or antibody fragments), peptidomimetics, nucleic acids, gene therapy vectors, radiotherapy, chemotherapy, and other candidate therapeutic agents.

The information obtained is then used to determine the response of disease-associated tissue to drug treatment. These methods include determining the response to a particular treatment after resection of a patient's tumor and, after metastasis or recurrence of the tumor, the response of the tumor to radiotherapy, post-operative radiotherapy, or chemotherapy.

In addition to screening candidate agents for modulating activity, the arrays described herein can be used to determine the mode of action of an agent used as a therapeutic agent.

Pharmacogenomics is analyzed

Array described herein also is used to detect albumen, expression of nucleic acid or the activity that the genotype by individuality causes, measuring individual response capacity to medicaments, thus select to be specific to the suitable treatment of this individuality or prevention reagent (as medicine) (this paper middle finger " Pharmacogenomics ").

In this ability, array described herein can be used for prognosis or forecast analysis to identify that the patient is to reactivity and resistance based on the particular medication of genetic map.In this analysis, the patient is relevant with the crossing pattern from the transcript specificity element of these patient illing tissue samples to the historical data of pharmacological agent reaction.Then this information is used to measure the reaction of following patient to the same medicine treatment.These methods comprise to be measured the prognosis after patient tumors excision back, the tumour diffusion recurrence and measures the reaction of tumour to radiotherapy, postoperation radiotherapy or chemotherapy.
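
As an illustration only, and not the method of this application, the comparison of a new patient's hybridization pattern with historical patterns from responders and non-responders could be sketched as a nearest-centroid comparison based on Pearson correlation. The function names, the choice of correlation as the similarity measure, and all intensity values below are assumptions made for the sketch.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length intensity vectors."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0


def centroid(patterns):
    """Element-wise mean of several hybridization patterns."""
    return [mean(column) for column in zip(*patterns)]


def predict_response(sample, responders, non_responders):
    """Label the sample by the reference group whose mean pattern it correlates with best."""
    r_resp = pearson(sample, centroid(responders))
    r_non = pearson(sample, centroid(non_responders))
    return "responder" if r_resp >= r_non else "non-responder"


# Hypothetical log-intensities for four transcript-specific elements.
responders = [[8.0, 2.0, 7.5, 1.0], [7.0, 3.0, 8.0, 2.0]]
non_responders = [[2.0, 8.0, 1.5, 7.0], [3.0, 7.5, 2.0, 8.0]]
new_patient = [7.5, 2.5, 7.0, 1.5]

label = predict_response(new_patient, responders, non_responders)
print(label)
```

In practice the historical groups would contain many patients and thousands of elements, and the choice of similarity measure and classifier would need to be validated on independent samples.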

Exemplary therapeutic treatments for which the transcriptome analysis provided herein will be used include, but are not limited to, arthritis drugs, chemotherapeutic agents, therapeutic antibodies, therapeutic proteins or peptides, therapeutic nucleic acids, antipsychotics, antidepressants, antiasthmatics, antiviral and antibacterial drugs, antihypertensives, cholesterol-lowering drugs, and antifungal drugs. The arrays can also be used to identify the stage of disease progression and the aggressiveness of the disease, and to assess tumor recurrence.

The arrays provided herein can also be used to determine the extent of an individual's adverse reaction to a particular therapeutic agent, so that dosage and treatment duration can be titrated accurately and fewer adverse drug reactions result. Different polymorphisms can cause increased or decreased metabolism of a particular therapeutic agent. If the activity of the normal degrading enzyme is reduced because of a polymorphism, a standard dose may cause a more adverse reaction than usual. Genetic polymorphisms in drug-metabolizing enzymes, transporters, receptors, and other drug targets are associated with inter-individual differences in the efficacy and toxicity of many drugs. For example, variation in thiopurine methyltransferase (TPMT) alters degradation of the commonly prescribed agent 6-mercaptopurine (6-MP) (McLeod and Yu, 2003, Cancer Invest. 21(4): 630-40). This genetic variation has significant clinical implications, because patients who are homozygous for function-associated mutations in the TPMT gene experience extreme or fatal toxicity after administration of a conventional dose of 6-MP. In this embodiment, the expression pattern of the sample is compared with reference patterns of transcript expression from reference samples; when the expression pattern corresponds substantially to one or more of the predictive reference patterns, this indicates that the individual may experience an adverse reaction to the treatment.

In a preferred embodiment, a control sample of target cells or tissue that has also been contacted with the therapeutic agent is likewise bound to an array for comparison.

The arrays described herein can also be used in clinical trials of new or existing treatments. In particular, the arrays are used to preselect patients from a patient population having a pathological condition, or to prescreen patients having a pathological condition, so that the preselected or screened patients are given the test therapeutic agent undergoing clinical trial, or another therapeutic agent, to treat the pathological condition and thereby respond optimally to the drug.

Drug discovery and research analysis

The arrays provided herein can be used in drug discovery and research methods. For example, the arrays can be used to determine the response of one or more transcripts or genes of a transcriptome to test therapeutic agents, newly synthesized compounds, and other agents of interest. An agent can be one known to have therapeutic use or a newly developed candidate therapeutic agent.

In this way, the arrays described herein can be used to screen large numbers of candidate agents for the ability to modulate target cell or tissue function. According to the method, the hybridization pattern of a sample treated with a candidate therapeutic agent, on one or more arrays described herein, is compared with the hybridization pattern of an untreated control sample. Differences between the treated sample and the control sample in hybridization to transcript-specific elements indicate the ability of the candidate agent to modulate target cell or tissue function.
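
The treated-versus-control comparison can be illustrated with a minimal sketch that reports the log2 fold change for each transcript-specific element and keeps the elements that change by at least a chosen amount. The two-fold threshold, the intensity floor, the element names, and the intensity values are assumptions for illustration; the application does not specify a particular statistic.

```python
from math import log2


def modulated_elements(treated, control, min_abs_log2_fc=1.0, floor=1.0):
    """Return {element: log2 fold change} for elements that differ between samples.

    treated and control map element IDs to background-corrected intensities.
    The threshold and the floor are illustrative assumptions.
    """
    changed = {}
    for element in treated.keys() & control.keys():
        t = max(treated[element], floor)  # floor avoids taking the log of zero
        c = max(control[element], floor)
        fold_change = log2(t / c)
        if abs(fold_change) >= min_abs_log2_fc:
            changed[element] = fold_change
    return changed


# Hypothetical intensities for three transcript-specific elements.
treated = {"elem_A": 800.0, "elem_B": 210.0, "elem_C": 50.0}
control = {"elem_A": 200.0, "elem_B": 200.0, "elem_C": 400.0}

hits = modulated_elements(treated, control)
for element in sorted(hits):
    print(element, round(hits[element], 2))
```

A real screen would use replicate arrays and a statistical test rather than a single fold-change cut-off.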

The compositions and methods provided herein are described in greater detail in the following examples. The examples are illustrative and are not intended to limit or define the invention in any way.

Examples

Example 1: Initial list of colorectal cancer transcriptome sequences

The initial colorectal cancer transcriptome array sequences were obtained by the following method, which is disclosed in European patent applications EP 04105479.2, EP 04105482.6, EP 04105483.4, EP 04105484.2, EP 04105507.0, and EP 04105485.9 and in U.S. Provisional Patent Applications 60/662,276 and 60/700,293.

Materials and methods

Screening of public data

All public expressed sequence tags (ESTs) obtained from the downloaded libraries were converted to FASTA format, and all 921 libraries were concatenated into a single sequence file containing 272,686 individual ESTs. These ESTs were then filtered using a combination of specific filters in the Paracel Filtering Package (PFP) (available at www.paracel.com) to ensure that undesirable sequence elements did not enter the assembly process. Settings were chosen to mask low-complexity regions, vector sequences, and repetitive sequences. Sequences containing contaminating E. coli sequence, mitochondrial DNA, or ribosomal RNA were filtered out. After these filtering steps, the low-quality terminal regions masked in the preceding stage, and any sequences consisting mainly of low-complexity repeats, were removed with the "trimjunk" algorithm (Paracel Filtering Package). Finally, sequences containing fewer than 100 good bases were filtered out.
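
The last two filtering steps (removing masked terminal regions and discarding sequences with fewer than 100 good bases) can be sketched as follows. This is not the Paracel Filtering Package or its "trimjunk" algorithm; it assumes that an upstream masking step has already replaced unwanted bases with `N` or `X`, and the function names and example records are hypothetical.

```python
def trim_masked_ends(seq, mask_chars="NX"):
    """Strip masked characters from both ends of a sequence."""
    return seq.strip(mask_chars)


def good_base_count(seq, mask_chars="NX"):
    """Count bases that are not masked."""
    return sum(1 for base in seq if base not in mask_chars)


def filter_ests(records, min_good_bases=100):
    """Keep (id, sequence) pairs with at least min_good_bases unmasked bases after trimming."""
    kept = []
    for seq_id, seq in records:
        trimmed = trim_masked_ends(seq.upper())
        if good_base_count(trimmed) >= min_good_bases:
            kept.append((seq_id, trimmed))
    return kept


# Hypothetical masked ESTs.
records = [
    ("est_1", "NNNN" + "ACGT" * 30 + "XXXX"),  # 120 good bases after trimming
    ("est_2", "ACGT" * 10 + "N" * 80),         # only 40 good bases
]

kept = filter_ests(records)
print([seq_id for seq_id, _ in kept])
```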

Screening of in-house data

Filtering of the ESTs was performed on the "Phred" output files rather than on raw FASTA sequence files. The "Phred" files contain quality information for the sequences, namely a statistical significance value for each base call. This also allowed the use of an additional filtering algorithm known as "qualclean". Qualclean removes low-quality sequence from the beginning and end of a sequence file. The other filtering algorithms used were the same as those listed for the public data.
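
Quality-based end trimming of this kind can be illustrated with a short sketch that removes bases from both ends of a read until a base with an acceptable Phred score is reached. This is not the qualclean algorithm itself, whose details are not given here; the threshold of 20 and the example read are assumptions.

```python
def quality_trim(seq, quals, min_q=20):
    """Trim bases from both ends until a base with Phred score >= min_q is reached.

    Low-quality bases in the interior of the read are left untouched.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality list must have equal length")
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


# Hypothetical read with poor quality at both ends.
seq = "ACGTACGTAC"
quals = [5, 10, 30, 35, 12, 40, 38, 25, 8, 3]

trimmed_seq, trimmed_quals = quality_trim(seq, quals)
print(trimmed_seq, trimmed_quals)
```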

Data assembly

Assembly of the public and in-house data was performed using the Paracel software "Paracel TranscriptAssembler (PTA)" (see www.paracel.com) at a clustering threshold of 50. The assembled sequences (contigs) were then searched with BLAST against the GenBank NT database for annotation and to identify the orientation of the sequences. Where a contig was identified as being in the opposite orientation to that listed in GenBank, the data were reverse complemented and both orientations were included in the final data set.
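
The orientation step can be sketched as follows: a contig whose best database hit lies on the opposite strand is reverse complemented, and both orientations are kept. The `hit_strand` mapping, the `_rc` suffix, and the example contigs are hypothetical; in practice the strand would be read from the BLAST output.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient_contigs(contigs, hit_strand):
    """Keep every contig and add a reverse-complemented copy for minus-strand hits.

    contigs maps contig IDs to sequences; hit_strand maps contig IDs to
    "plus" or "minus" (assumed to come from the best database hit).
    """
    oriented = {}
    for contig_id, seq in contigs.items():
        oriented[contig_id] = seq
        if hit_strand.get(contig_id) == "minus":
            oriented[contig_id + "_rc"] = reverse_complement(seq)
    return oriented


# Hypothetical contigs and strand calls.
contigs = {"contig_1": "AACGT", "contig_2": "GGGA"}
hit_strand = {"contig_1": "minus", "contig_2": "plus"}

oriented = orient_contigs(contigs, hit_strand)
print(oriented)
```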

Results

Assembly of colorectal-derived sequences in public databases

To identify sequences likely to be expressed in colorectal tissue, the Cancer Genome Anatomy Project (CGAP) portal on the U.S. National Institutes of Health website (see cgap.nci.nih.gov) was searched for sequence information derived from colorectal tissue, colorectal cancer tissue, or colorectal-derived cell lines. A list of 921 EST libraries in total was identified with CGAP. The libraries themselves were then retrieved from the UniGene database. Collating the information into a single database gave a total of 272,686 individual sequences. The individual sequences were then assembled with the Paracel Transcript Assembler, giving a total of 18,721 contigs and 41,023 singletons (singlets). The 18,721 contigs were then compared with the contigs generated by the sequencing project described below. This comparison showed only limited redundancy, leaving a final number of 16,350 contigs of public origin.

Identification of novel colorectal-expressed sequences

To identify additional transcripts likely to be expressed in colorectal tissue, whether normal or malignant, a cDNA library was generated from a pool of RNA from 80 normal and malignant colorectal tumor tissues. The RNA was reverse transcribed and directionally cloned into a cloning vector. The library was then transformed into bacteria and plated to produce individual clones. A total of 50,000 clones were picked and sequenced to determine their identity. The 50,000 clones were then clustered to give a total of 10,396 unique sequences, which assembled into 4,129 contigs and 6,267 singletons. The sequence information from the 4,129 contigs and 6,267 singletons was then searched with BLAST against publicly available databases, including GenBank, to identify entirely novel sequences, and again against a database generated from all publicly available colon tissue clone libraries to identify sequences not previously reported as expressed in colorectal cancer. This analysis identified a total of 2,773 novel sequences not previously reported in GenBank as annotated genes or ESTs.

Example 2: Further identification of colorectal cancer sequences

Additional colorectal sequence information was obtained by identifying further transcripts expressed in colorectal tissue through detection on microarrays containing publicly available information. These sequences supplement the initial transcriptome array sequence information and provide an array that more completely represents the colorectal cancer transcriptome.

Methods

RNA from 40 colorectal tissues (27 tumor and 13 normal) was labeled and hybridized to microarrays containing publicly available information. A list of transcripts was derived from these arrays, comprising those targets that were present on at least one of the arrays and are described in the background art (that is, transcripts identified as expressed in at least one colorectal sample).
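
Selecting transcripts expressed in at least one sample can be sketched as a simple filter over per-array signal values. The use of a single background threshold, the threshold value, and the transcript names are assumptions; the application does not state how presence was called.

```python
def expressed_in_any(intensities, background):
    """Return sorted transcript IDs whose signal exceeds background on at least one array."""
    return sorted(
        transcript
        for transcript, per_array in intensities.items()
        if any(value > background for value in per_array)
    )


# Hypothetical signal values for three transcripts across three arrays.
intensities = {
    "transcript_1": [35.0, 410.0, 28.0],
    "transcript_2": [22.0, 31.0, 18.0],
    "transcript_3": [95.0, 120.0, 101.0],
}

expressed = expressed_in_any(intensities, background=50.0)
print(expressed)
```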

Initial work using the GI numbers or other identifiers associated with the probe sets on the chip showed some discrepancies between the target sequences and the full sequences of the annotated targets. It was therefore decided to use the actual target sequences to search public sequence databases and to retrieve from the public data the sequences that best represent the targets empirically determined by array experiments to be expressed in colorectal tissue.

These sequences were then extracted from the full sequences and searched with BLAST against the sequence listing of the provisional patent application (that is, the transcripts identified from the in-house sequencing and public database assemblies). A list of 21,909 transcripts was thereby derived, none of which appear in the sequence listing of U.S. Provisional Patent Application 60/662,276.

This entire list of sequences was searched with BLAST against the public EST database (dbEST) under high-stringency conditions (covering 90% of the target). The sequences that matched dbEST were then retrieved from the public database. A set of 16,377 sequences was successfully retrieved in this way.

The remaining 6,635 sequences were searched with BLAST against the RefSeq database. A set of 1,663 of these targets produced strong matches with RefSeq, and these sequences were retrieved from the public database.

For the remaining 4,972 targets, the GI numbers were extracted and used to retrieve the corresponding sequences from the public database.

These three sequence lists were concatenated into a single file and screened with in-house duplicate sequence detection software. This produced a final list of 22,376 non-redundant sequences.
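
Merging several sequence lists and removing duplicates can be sketched as below. The in-house duplicate detection software is not described, so this sketch assumes the simplest case, removal of exact duplicates (ignoring case); the function name and the example records are hypothetical.

```python
def merge_unique(*sequence_lists):
    """Concatenate lists of (id, sequence) and keep the first record of each distinct sequence."""
    seen = set()
    merged = []
    for records in sequence_lists:
        for seq_id, seq in records:
            key = seq.upper()  # assumption: duplicates are exact matches, ignoring case
            if key not in seen:
                seen.add(key)
                merged.append((seq_id, seq))
    return merged


# Hypothetical records standing in for the three retrieved lists.
from_dbest = [("est_a", "ACGTACGT"), ("est_b", "TTGACCA")]
from_refseq = [("ref_a", "acgtacgt"), ("ref_b", "GGGCATT")]
from_gi = [("gi_a", "TTGACCA"), ("gi_b", "CCATGAA")]

final_list = merge_unique(from_dbest, from_refseq, from_gi)
print([seq_id for seq_id, _ in final_list])
```

Detecting near-duplicates or contained subsequences would require alignment or clustering rather than exact matching.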

Example 3: Antisense sequences of the colorectal cancer transcriptome

With growing scientific interest in the role of endogenous antisense RNA transcripts, the colorectal cancer database was examined for the presence of antisense transcripts.

Methods

After assembly, the in-house and public data contigs were searched with BLAST against the GenBank NT database for annotation and to identify the orientation of the sequences. Where a contig was identified as being in the opposite orientation to that listed in GenBank, the data were reverse complemented and both orientations were included in the final data set. In this way, antisense transcripts were combined with the corresponding sense transcripts to form a gene list of 5,672 transcripts (Gene List H).

Example 4: List of lung cancer transcriptome sequences

The methods used to derive the lung cancer transcriptome array sequences described in Gene List I to Gene List O are similar to those used to derive the colorectal cancer sequences.

These 55,626 lung cancer sequences are the result of in-house assembly of publicly available lung EST libraries. They are a unique collection of data previously shown to be relevant to lung cancer. Some of these sequences are expressed in lung cancer and have not previously been identified as annotated genes.

Results

Assembly of lung-derived sequences in public databases

To identify sequences likely to be expressed in lung tissue, the CGAP portal was searched for sequence information derived from lung tissue, lung tumor tissue, and lung tumor-derived cell lines. A list of 301 EST libraries in total was identified using the CGAP portal. The libraries themselves were then retrieved from the UniGene database. Collating the information into a single database gave a total of 471,630 individual sequences. The individual sequences were then assembled with the Paracel Transcript Assembler, giving a total of 36,431 contigs and 19,195 singletons.

Identification of novel lung-expressed sequences

To identify additional transcripts likely to be expressed in lung tissue, whether normal or malignant, a cDNA library was constructed from a pool of RNA from more than 80 normal and malignant lung tumor tissues. The RNA was reverse transcribed and directionally cloned into a cloning vector. The library was then transformed into bacteria and cultured to produce individual clones. A total of 4,032 clones were picked and sequenced to determine their identity. The clones were then filtered to give a total of 3,450 unique sequences, which were assembled to give 602 contigs and 1,589 singletons. The sequence information from the contigs and singletons was then searched with BLAST against publicly available databases, including GenBank, to identify entirely novel sequences, and again against a database generated from all publicly available lung tissue libraries to identify sequences not previously reported as expressed in lung cancer. This analysis identified a total of 24 novel sequences not previously reported in GenBank as annotated genes or ESTs.

Example 5: List of breast cancer transcriptome sequences

The methods used to derive the breast cancer transcriptome array sequences described in Gene List P to Gene List V are similar to those used to obtain the colorectal cancer and lung cancer sequences.

These 87,059 breast cancer sequences are the result of in-house assembly of publicly available breast EST libraries. They are a unique collection of data previously shown to be relevant to breast cancer. Some of these sequences are expressed in breast cancer and have not previously been identified as annotated genes.

Results

Assembly of breast-derived sequences in public databases

To identify sequences likely to be expressed in breast tissue, the CGAP portal was searched for sequence information derived from breast tissue, breast tumor tissue, and breast tumor-derived cell lines. A list of 1,130 EST libraries in total was identified using the CGAP portal. The libraries themselves were then retrieved from the UniGene database. Collating the information into a single database gave a total of 288,854 individual sequences. The individual sequences were then assembled with the Paracel Transcript Assembler, giving a total of 17,2911 contigs and 24,178 singletons.

Identification of novel breast-expressed sequences

To identify additional transcripts likely to be expressed in breast tissue, whether normal or malignant, a cDNA library was constructed from a pool of RNA from more than 120 normal and malignant breast tumor tissues. The RNA was reverse transcribed and directionally cloned into a cloning vector. The library was then transformed into bacteria and cultured to produce individual clones. A total of 157,260 clones were picked and sequenced to determine their identity. The clones were then filtered to give a total of 127,306 unique sequences, which were assembled to give 14,489 contigs and 24,308 singletons. The sequence information from the contigs and singletons was then searched with BLAST against publicly available databases, including GenBank, to identify entirely novel sequences, and again against a database generated from all publicly available breast tissue libraries to identify sequences not previously reported as expressed in breast cancer. This analysis identified a total of 3,278 novel sequences not previously reported in GenBank as annotated genes or ESTs.

Example 6: List of transcriptome sequences derived from hepatitis-associated liver tissue

The methods used to derive the transcriptome array sequences of hepatitis-associated liver tissue described in Gene List W to Gene List CC are similar to those used to derive the colorectal cancer and lung cancer sequences.

These 86,122 diseased liver tissue sequences are the result of in-house assembly of publicly available liver EST libraries. They are a unique collection of data previously shown to relate to hepatitis-associated liver tissue. Some of these sequences are expressed in hepatitis-associated liver tissue and have not previously been identified as annotated genes.

Results

Assembly of diseased liver sequences in public databases

To identify sequences likely to be expressed in hepatitis-associated liver tissue, the CGAP portal was searched for sequence information derived from liver tissue, hepatitis-associated liver tissue, and cell lines derived from hepatitis-associated liver tissue. A list of 63 EST libraries in total was identified using the CGAP portal. The libraries themselves were then retrieved from the UniGene database. Collating the information into a single database gave a total of 326,079 individual sequences. The individual sequences were then assembled with the Paracel Transcript Assembler, giving a total of 24,744 contigs and 37,503 singletons. The contigs were then compared with the contigs generated by the sequencing project described below, giving a final 24,744 contigs of public origin.

Identification of novel sequences expressed in hepatitis-associated liver tissue

To identify additional transcripts likely to be expressed in hepatitis-associated liver tissue, a cDNA library was constructed from a pool of RNA from more than 40 normal and diseased liver tissue samples. The RNA was reverse transcribed and directionally cloned into a vector. The library was then transformed into bacteria and cultured to produce individual clones. A total of 4,944 clones were picked and sequenced to determine their identity. The clones were then filtered to give a total of 2,869 unique sequences, which were assembled to give 45 contigs and 2,300 singletons. The sequence information from the contigs and singletons was then searched with BLAST against publicly available databases (including the NCBI RefSeq collection) to identify entirely novel sequences, and again against a database generated from all publicly available liver tissue libraries to identify sequences not previously reported as expressed in hepatitis-associated liver tissue. This analysis identified a total of 13 novel sequences not previously reported in GenBank as annotated genes or ESTs.

Example 7: List of transcriptome sequences derived from brain tissue associated with neurodegeneration

The methods used to derive the transcriptome array sequences of neurodegenerative brain tissue described in Gene List DD to Gene List JJ are similar to those used to derive the colorectal cancer and lung cancer sequences.

These 136,326 diseased brain tissue sequences are the result of in-house assembly of publicly available brain EST libraries. They are a unique collection of data previously shown to relate to brain tissue affected by neurodegeneration. Some of these sequences are expressed in neurodegeneration-associated brain tissue and have not previously been identified as annotated genes.

Results

Assembly of diseased brain tissue sequences in public databases

To identify sequences likely to be expressed in neurodegeneration-associated brain tissue, the CGAP portal was searched for sequence information derived from brain tissue, neurodegeneration-associated brain tissue, and cell lines derived from neurodegeneration-associated brain tissue. A list of 674 EST libraries in total was identified using public databases. The libraries themselves were then retrieved from the UniGene database. Collating the information into a single database gave a total of 656,559 individual sequences. The individual sequences were then assembled with the Paracel Transcript Assembler, giving a total of 33,275 contigs and 65,022 singletons.

Identification of novel sequences expressed in neurodegeneration-associated brain tissue

To identify additional transcripts likely to be expressed in neurodegeneration-associated brain tissue, a cDNA library was constructed from a pool of RNA from more than 20 normal and diseased brain tissue samples. The RNA was reverse transcribed and directionally cloned into a vector. The library was then transformed into bacteria and plated to produce individual clones. A total of 7,200 clones were picked and sequenced to determine their identity. The clones were then filtered to give a total of 3,115 sequences, which were assembled to give 346 contigs and 1,671 singletons. The sequence information from the contigs and singletons was then searched with BLAST against publicly available databases (including the NCBI RefSeq collection) to identify entirely novel sequences, and again against a database generated from all publicly available brain tissue libraries to identify sequences not previously reported as expressed in neurodegeneration-associated brain tissue. This analysis identified a total of 5 novel sequences not previously reported in GenBank as annotated genes or ESTs.

Example 8: Comparison of sequences from colorectal, prostate, and breast tumors

Sequences from colorectal, prostate, and breast tumors were compared to identify commonly expressed sequences. Fig. 2 provides a chart of BLAST comparisons of the publicly available sequences representing all colon, prostate, and breast tissue. This is a comparison of all sequences after assembly of the publicly available sequences, obtained as described above. The parameters used to BLAST these sequences were a cut-off E-value of 0.1 and a percent identity of 90%. These standard cut-off values were derived from manual inspection and visualization of thousands of individual BLAST results. While allowing for an appropriate degree of difference between sequences, hits that meet these criteria can be clearly classified as "identical" hits. Hits that fail to meet the criteria are considered not to meet the requirements for array design purposes. For each result, two values are given. The "zero homology" result shows the number of sequences having no homology to the database searched by BLAST. The second value is defined as "no hit"; in this case the query strand has a percent coverage of less than 50%, that is, the query sequence represents less than 50% of the length of the target sequence.
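
The two reported values can be tallied from best-hit records as in the sketch below. The record layout (E-value, percent identity, percent coverage) and the example hits are hypothetical, and treating hits that fail the E-value or identity cut-off as "no hit" is an assumption of the sketch. The number of sequences shared between two populations is taken here as the total minus the "no hit" count.

```python
def summarize_hits(queries, best_hits, max_evalue=0.1, min_identity=90.0, min_coverage=50.0):
    """Count zero-homology, no-hit, and common sequences for one comparison.

    best_hits maps a query ID to (evalue, percent_identity, percent_coverage);
    a query missing from best_hits produced no alignment at all.
    """
    zero_homology = 0
    no_hit = 0
    for query in queries:
        hit = best_hits.get(query)
        if hit is None:
            zero_homology += 1  # no alignment: also counted as "no hit"
            no_hit += 1
            continue
        evalue, identity, coverage = hit
        # Assumption: failing any of the three criteria counts as "no hit".
        if evalue > max_evalue or identity < min_identity or coverage < min_coverage:
            no_hit += 1
    return {
        "total": len(queries),
        "zero_homology": zero_homology,
        "no_hit": no_hit,
        "common": len(queries) - no_hit,
    }


# Hypothetical best hits for four query sequences.
queries = ["q1", "q2", "q3", "q4"]
best_hits = {
    "q1": (1e-50, 98.0, 95.0),  # meets all criteria
    "q2": (1e-10, 95.0, 30.0),  # coverage below 50%
    "q4": (1e-5, 85.0, 80.0),   # identity below 90%
}

summary = summarize_hits(queries, best_hits)
print(summary)
```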

Zero-homology sequences are a subset of the "no hit" sequences. Subtracting the number of "no hit" sequences from the total number of sequences gives the number of sequences common to the two populations. All documents mentioned in this specification are incorporated herein by reference.

Various modifications and variations of the described embodiments of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention that are obvious to those skilled in the art are intended to be covered by the present invention.

Claims

1. An array comprising a diseased tissue transcriptome.

2. The array of claim 1, wherein the diseased tissue comprises tissue affected by a neoplastic disease, an inflammatory disease, or a degenerative disease.

3. The array of claim 1, wherein the diseased tissue comprises tissue affected by colorectal cancer, lung cancer, or breast cancer.

4. The array of any one of claims 1-3, wherein the transcriptome comprises one or more tissue-specific elements, each representing a transcript from a diseased colorectal tissue sequence, each said transcript being independently selected from the transcripts set out in Gene List B, Gene List C, Gene List D, Gene List E, Gene List F, Gene List G, or Gene List H.

5. The array of any one of claims 1-3, wherein the transcriptome comprises one or more tissue-specific elements, each representing a transcript from a diseased lung tissue sequence, each said transcript being independently selected from the transcripts set out in Gene List J, Gene List K, Gene List L, Gene List M, Gene List N, or Gene List O.

6. The array of any one of claims 1-3, wherein the transcriptome comprises one or more tissue-specific elements, each representing a transcript from a diseased breast tissue sequence, each said transcript being independently selected from the transcripts set out in Gene List Q, Gene List R, Gene List S, Gene List T, Gene List U, or Gene List V.

7. The array of any one of claims 1-3, wherein the transcriptome comprises one or more tissue-specific elements, each representing a transcript from a diseased liver tissue sequence, each said transcript being independently selected from the transcripts set out in Gene List X, Gene List Y, Gene List Z, Gene List AA, Gene List BB, or Gene List CC.

8. The array of any one of claims 1-3, wherein the transcriptome comprises one or more tissue-specific elements, each representing a transcript from a diseased brain tissue sequence, each said transcript being independently selected from the transcripts set out in Gene List EE, Gene List FF, Gene List GG, Gene List HH, Gene List II, or Gene List JJ.

9. The array of any one of claims 1-3, wherein the transcriptome comprises one or more tissue-specific elements, each representing a transcript from a cancer tissue sequence, each said transcript being independently selected from the transcripts set out in Gene List B, C, D, E, F, G, H, J, K, L, M, N, O, Q, R, S, T, U, or V.

10. The array of any one of claims 4-9, wherein the transcriptome comprises tissue-specific elements representing 70% of the transcripts in at least one of said gene lists.

11. The array of claim 10, wherein the transcriptome comprises tissue-specific elements representing 70% of the transcripts in each of said gene lists.

12. The array of any one of claims 4-11, wherein the tissue-specific element representing said transcript is a nucleic acid molecule having a sequence complementary to said transcript.

13. The array of any one of claims 4-11, wherein the tissue-specific element representing said transcript is a polypeptide encoded by said transcript.

14. The array of any one of claims 4-11, wherein the tissue-specific element representing said transcript is an antibody specific for a polypeptide encoded by said transcript.

15. The array of any preceding claim, wherein the transcriptome comprises nucleic acid molecules of coding and non-coding transcripts derived from diseased tissue.

16. Use of the array of any preceding claim in a method of diagnosing a pathological condition in a patient, comprising:

a) contacting transcript-specific elements from a biological sample of the patient with the array; and

b) detecting binding of the transcript-specific elements to the array;

wherein detection of binding indicates a diagnosis of the pathological condition.

17. Use of the array of any preceding claim in a method of determining whether a patient who has been diagnosed with a disease or condition will recover or relapse after initial medical intervention.

18. Use of the array of any preceding claim in a method of determining the responsiveness of a patient suffering from a pathological condition to a therapeutic agent for treating the pathological condition, comprising:

b) detecting binding of the transcript-specific elements to the array;

wherein detection of binding indicates the responsiveness of the patient's pathological condition to treatment with the therapeutic agent.

19. The use according to claim 16, 17, or 18, when dependent on claim 12, wherein, in step b), the detection of binding is detection of hybridization.