EP1222258A2 - Molecules for disease detection and treatment

Molecules for disease detection and treatment

Info

Publication number
EP1222258A2
EP1222258A2 EP00965340A EP00965340A EP1222258A2 EP 1222258 A2 EP1222258 A2 EP 1222258A2 EP 00965340 A EP00965340 A EP 00965340A EP 00965340 A EP00965340 A EP 00965340A EP 1222258 A2 EP1222258 A2 EP 1222258A2
Authority
EP
European Patent Office
Prior art keywords
dec
polynucleotide
sequence
mddt
sequences
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP00965340A
Other languages
German (de)
French (fr)
Inventor
David M. Hodgson
Stephen E. Lincoln
Frank D. Russo
Peter A. Spiro
Steven C. Banville
Shawn R. Bratcher
Gerard E. Dufour
Howard J. Cohen
Bruce H. Rosen
Purvi Shah
Michael S. Chalup
Jennifer L. Hillman
Anissa L. Jones
Jimmy Y. Yu
Lila B. Greenawalt
Scott R. Panzer
Ann M. Roseberry
Rachel J. Wright
Wensheng Chen
Tommy F. Liu
Pierre E. Yap
Theresa K. Stockdreher
Stefan Amshey
Willy T. Fong
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Incyte Corp
Original Assignee
Incyte Genomics Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Incyte Genomics Inc
Publication of EP1222258A2
Legal status: Withdrawn

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C07K14/46 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates
    • C07K14/47 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates from mammals
    • A HUMAN NECESSITIES
    • A01 AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
    • A01K ANIMAL HUSBANDRY; AVICULTURE; APICULTURE; PISCICULTURE; FISHING; REARING OR BREEDING ANIMALS, NOT OTHERWISE PROVIDED FOR; NEW BREEDS OF ANIMALS
    • A01K2217/00 Genetically modified animals
    • A01K2217/05 Animals comprising random inserted nucleic acids (transgenic)
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K38/00 Medicinal preparations containing peptides
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/158 Expression markers

Definitions

  • the present invention relates to molecules for disease detection and treatment and to the use of these sequences in the diagnosis, study, prevention, and treatment of diseases associated with, as well as effects of exogenous compounds on, the expression of molecules for disease detection and treatment.
  • The human genome comprises thousands of genes, many encoding gene products that function in the maintenance and growth of the various cells and tissues in the body. Aberrant expression of, or mutations in, these genes and their products cause, or are associated with, a variety of human diseases such as cancer and other cell proliferative disorders. The identification of these genes and their products is the basis of an ever-expanding effort to find markers for early detection of diseases, and targets for their prevention and treatment.
  • cancer represents a type of cell proliferative disorder that affects nearly every tissue in the body.
  • a wide variety of molecules, either aberrantly expressed or mutated, can be the cause of, or involved with, various cancers because tissue growth involves complex and ordered patterns of cell proliferation, cell differentiation, and apoptosis.
  • Cell proliferation must be regulated to maintain both the number of cells and their spatial organization. This regulation depends upon the appropriate expression of proteins which control cell cycle progression in response to extracellular signals such as growth factors and other mitogens, and intracellular cues such as DNA damage or nutrient starvation.
  • Molecules which directly or indirectly modulate cell cycle progression fall into several categories, including growth factors and their receptors, second messenger and signal transduction proteins, oncogene products, tumor-suppressor proteins, and mitosis-promoting factors. Aberrant expression or mutations in any of these gene products can result in cell proliferative disorders such as cancer.
  • Oncogenes are genes generally derived from normal genes that, through abnormal expression or mutation, can effect the transformation of a normal cell to a malignant one (oncogenesis).
  • Oncoproteins, encoded by oncogenes, can affect cell proliferation in a variety of ways and include growth factors, growth factor receptors, intracellular signal transducers, nuclear transcription factors, and cell-cycle control proteins.
  • Tumor-suppressor genes are involved in inhibiting cell proliferation. Mutations which cause reduced function or loss of function in tumor-suppressor genes result in aberrant cell proliferation and cancer. Thus a wide variety of genes and their products have been found that are associated with cell proliferative disorders such as cancer, but many more may exist that are yet to be discovered.
  • DNA-based arrays can provide a simple way to explore the expression of a single polymorphic gene or a large number of genes. When the expression of a single gene is explored, DNA-based arrays are employed to detect the expression of specific gene variants. For example, a p53 tumor suppressor gene array is used to determine whether individuals are carrying mutations that predispose them to cancer. A cytochrome p450 gene array is useful to determine whether individuals have one of a number of specific mutations that could result in increased drug metabolism, drug resistance or drug toxicity. DNA-based array technology is especially relevant for the rapid screening of expression of a large number of genes. There is a growing awareness that gene expression is affected in a global fashion.
  • a genetic predisposition, disease or therapeutic treatment may affect, directly or indirectly, the expression of a large number of genes.
  • the interactions may be expected, such as when the genes are part of the same signaling pathway. In other cases, such as when the genes participate in separate signaling pathways, the interactions may be totally unexpected. Therefore, DNA-based arrays can be used to investigate how genetic predisposition, disease, or therapeutic treatment affects the expression of a large number of genes.
  • the present invention relates to human disease detection and treatment molecule polynucleotides (mddt) as presented in the Sequence Listing.
  • the invention provides an isolated polynucleotide comprising a polynucleotide sequence selected from the group consisting of a) a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; b) a naturally occurring polynucleotide sequence having at least 90% sequence identity to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; c) a polynucleotide sequence complementary to a); d) a polynucleotide sequence complementary to b); and e) an RNA equivalent of a) through d).
  • the polynucleotide comprises a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25. In another alternative, the polynucleotide comprises at least 60 contiguous nucleotides of a polynucleotide sequence selected from the group consisting of a) a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; b) a naturally occurring polynucleotide sequence having at least 90% sequence identity to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-
  • the invention further provides a composition for the detection of expression of disease detection and treatment molecule polynucleotides comprising at least one isolated polynucleotide comprising a polynucleotide sequence selected from the group consisting of a) a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; b) a naturally occurring polynucleotide sequence having at least 90% sequence identity to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; c) a polynucleotide sequence complementary to a); d) a polynucleotide sequence complementary to b); and e) an RNA equivalent of a) through d); and a detectable label.
  • the invention also provides a method for detecting a target polynucleotide in a sample, said target polynucleotide comprising a polynucleotide sequence selected from the group consisting of a) a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; b) a naturally occurring polynucleotide sequence having at least 90% sequence identity to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; c) a polynucleotide sequence complementary to a); d) a polynucleotide sequence complementary to b); and e) an RNA equivalent of a) through d).
  • the method comprises a) hybridizing the sample with a probe comprising at least 20 contiguous nucleotides comprising a sequence complementary to said target polynucleotide in the sample, and which probe specifically hybridizes to said target polynucleotide, under conditions whereby a hybridization complex is formed between said probe and said target polynucleotide, and b) detecting the presence or absence of said hybridization complex, and, optionally, if present, the amount thereof.
  • the probe comprises at least 30 contiguous nucleotides.
  • the probe comprises at least 60 contiguous nucleotides.
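The probe-length requirements above (at least 20, 30, or 60 contiguous nucleotides complementary to the target) can be illustrated with a minimal sketch. It enumerates every candidate probe of a chosen length for a target sequence. The function names and the example target are hypothetical and are not taken from the Sequence Listing; real probe design would also have to check specificity, which this sketch does not attempt.

```python
# Illustrative sketch only: function names and the example sequence are
# hypothetical and do not come from the Sequence Listing.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def candidate_probes(target: str, min_length: int = 20) -> list:
    """List every probe of exactly min_length nucleotides that is complementary
    to a contiguous stretch of the target (each probe is written 5'->3')."""
    target = target.upper()
    if len(target) < min_length:
        return []
    return [
        reverse_complement(target[i:i + min_length])
        for i in range(len(target) - min_length + 1)
    ]


target = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"  # made-up target
probes = candidate_probes(target, min_length=20)
print(len(probes))  # number of 20-nucleotide windows in the target
print(probes[0])    # probe complementary to the first 20 target positions
```

Raising `min_length` to 30 or 60 corresponds to the longer probe alternatives; a target shorter than the requested length simply yields no candidates.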
  • the invention further provides a recombinant polynucleotide comprising a promoter sequence operably linked to an isolated polynucleotide comprising a polynucleotide sequence selected from the group consisting of a) a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; b) a naturally occurring polynucleotide sequence having at least 90% sequence identity to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; c) a polynucleotide sequence complementary to a); d) a polynucleotide sequence complementary to b); and e) an RNA equivalent of a) through d).
  • the invention provides a cell transformed with the recombinant polynucleotide.
  • the invention provides a transgenic organism comprising the recombinant polynucleotide.
  • the invention provides a method for producing a disease detection and treatment molecule polypeptide, the method comprising a) culturing a cell under conditions suitable for expression of the disease detection and treatment molecule polypeptide, wherein said cell is transformed with the recombinant polynucleotide, and b) recovering the disease detection and treatment molecule polypeptide so expressed.
  • the invention also provides a purified disease detection and treatment molecule polypeptide (MDDT) encoded by at least one polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25. Additionally, the invention provides an isolated antibody which specifically binds to the disease detection and treatment molecule polypeptide.
  • the invention further provides a method of identifying a test compound which specifically binds to the disease detection and treatment molecule polypeptide, the method comprising the steps of a) providing a test compound; b) combining the disease detection and treatment molecule polypeptide with the test compound for a sufficient time and under suitable conditions for binding; and c) detecting binding of the disease detection and treatment molecule polypeptide to the test compound, thereby identifying the test compound which specifically binds the disease detection and treatment molecule polypeptide.
  • the invention further provides a microarray wherein at least one element of the microarray is an isolated polynucleotide comprising at least 60 contiguous nucleotides of a polynucleotide comprising a polynucleotide sequence selected from the group consisting of a) a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; b) a naturally occurring polynucleotide sequence having at least 90% sequence identity to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; c) a polynucleotide sequence complementary to a); d) a polynucleotide sequence complementary to b); and e) an RNA equivalent of a) through d).
  • the invention also provides a method for generating a transcript image of a sample which contains polynucleotides.
  • the method comprises a) labeling the polynucleotides of the sample, b) contacting the elements of the microarray with the labeled polynucleotides of the sample under conditions suitable for the formation of a hybridization complex, and c) quantifying the expression of the polynucleotides in the sample.
  • the invention provides a method for screening a compound for effectiveness in altering expression of a target polynucleotide, wherein said target polynucleotide comprises a polynucleotide sequence selected from the group consisting of a) a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; b) a naturally occurring polynucleotide sequence having at least 90% sequence identity to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; c) a polynucleotide sequence complementary to a); d) a polynucleotide sequence complementary to b); and e) an RNA equivalent of a) through d).
  • the method comprises a) exposing a sample comprising the target polynucleotide to a compound, and b) detecting altered expression of the target polynucleotide.
  • the invention further provides a method for assessing toxicity of a test compound, said method comprising a) treating a biological sample containing nucleic acids with the test compound; b) hybridizing the nucleic acids of the treated biological sample with a probe comprising at least 20 contiguous nucleotides of a polynucleotide comprising a polynucleotide sequence selected from the group consisting of i) a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-
  • an RNA equivalent of i)-iv).
  • Hybridization occurs under conditions whereby a specific hybridization complex is formed between said probe and a target polynucleotide in the biological sample, said target polynucleotide comprising a polynucleotide sequence selected from the group consisting of i) a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; ii) a naturally occurring polynucleotide sequence having at least 90% sequence identity to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; iii) a polynucleotide sequence complementary to i), iv) a polynucleotide sequence complementary to ii), and v) an RNA equivalent of i)-iv), and alternatively, the target polynucleotide comprises a fragment of a polynucleotide sequence selected from the group consisting of i-v above; c) quantifying the amount of hybridization complex; and d)
  • Table 1 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with their GenBank hits (GI Numbers), probability scores, and functional annotations corresponding to the GenBank hits.
  • Table 2 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with polynucleotide segments of each template sequence as defined by the indicated “start” and “stop” nucleotide positions.
  • Table 3 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with polynucleotide segments of each template sequence as defined by the indicated “start” and “stop” nucleotide positions.
  • the reading frames of the polynucleotide segments are shown, and the polypeptides encoded by the polynucleotide segments constitute either signal peptide (SP) or transmembrane (TM) domains, as indicated.
  • Table 4A and Table 4B show the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with component sequence identification numbers (component IDs) corresponding to each template.
  • the component sequences, which were used to assemble the template sequences, are defined by the indicated “start” and “stop” nucleotide positions along each template.
  • Table 5 shows the tissue distribution profiles for the templates of the invention.
  • Table 6 summarizes the bioinformatics tools which are useful for analysis of the polynucleotides of the present invention.
  • the first column of Table 6 lists analytical tools, programs, and algorithms, the second column provides brief descriptions thereof, the third column presents appropriate references, all of which are incorporated by reference herein in their entirety, and the fourth column presents, where applicable, the scores, probability values, and other parameters used to evaluate the strength of a match between two sequences (the higher the score, the greater the homology between two sequences).
  • mddt refers to a nucleic acid sequence
  • MDDT refers to an amino acid sequence encoded by mddt
  • a “full-length” mddt refers to a nucleic acid sequence containing the entire coding region of a gene endogenously expressed in human tissue.
  • adjuvants are materials such as Freund's adjuvant, mineral gels (aluminum hydroxide), and surface active substances (lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet hemocyanin, and dinitrophenol) which may be administered to increase a host's immunological response.
  • “Alleles” refers to alternative forms of a nucleic acid sequence. Alleles result from a “mutation,” a change or an alternative reading of the genetic code. Any given gene may have none, one, or many allelic forms. Mutations which give rise to alleles include deletions, additions, or substitutions of nucleotides. Each of these changes may occur alone, or in combination with the others, one or more times in a given nucleic acid sequence.
  • the present invention encompasses allelic mddt.
  • amino acid sequence refers to a peptide, a polypeptide, or a protein of either natural or synthetic origin.
  • the amino acid sequence is not limited to the complete, endogenous amino acid sequence and may be a fragment, epitope, variant, or derivative of a protein expressed by a nucleic acid sequence.
  • “Amplification” refers to the production of additional copies of a sequence and is carried out using polymerase chain reaction (PCR) technologies well known in the art.
  • “Antibody” refers to intact molecules as well as to fragments thereof, such as Fab, F(ab')2, and Fv fragments, which are capable of binding the epitopic determinant.
  • Antibodies that bind MDDT polypeptides can be prepared using intact polypeptides or using fragments containing small peptides of interest as the immunizing antigen.
  • the polypeptide or peptide used to immunize an animal (e.g., a mouse, a rat, or a rabbit)
  • Antisense sequence refers to a sequence capable of specifically hybridizing to a target sequence.
  • the antisense sequence may include DNA, RNA, or any nucleic acid mimic or analog such as peptide nucleic acid (PNA); oligonucleotides having modified backbone linkages such as phosphorothioates, methylphosphonates, or benzylphosphonates; oligonucleotides having modified sugar groups such as 2'-methoxyethyl sugars or 2'-methoxyethoxy sugars; or oligonucleotides having modified bases such as 5-methyl cytosine, 2'-deoxyuracil, or 7-deaza-2'-deoxyguanosine.
  • Antisense technology refers to any technology which relies on the specific hybridization of an antisense sequence to a target sequence.
  • a “bin” is a portion of computer memory space used by a computer program for storage of data, and bounded in such a manner that data stored in a bin may be retrieved by the program.
  • “Biologically active” refers to an amino acid sequence having a structural, regulatory, or biochemical function of a naturally occurring amino acid sequence.
  • “Clone joining” is a process for combining gene bins based upon the bins' containing sequence information from the same clone.
  • the sequences may assemble into a primary gene transcript as well as one or more splice variants.
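As a rough illustration of clone joining, the sketch below merges gene bins that contain sequence information from the same clone. The source does not say how the process is implemented; the bin IDs, clone IDs, dictionary layout, and the union-find approach are all assumptions made for the example.

```python
# Illustrative sketch only: bin IDs, clone IDs, and the data layout are
# hypothetical; the source does not specify how clone joining is implemented.

def join_bins_by_clone(bins: dict) -> list:
    """Group bin IDs so that bins sharing at least one clone ID end up together.

    `bins` maps a bin ID to the set of clone IDs whose sequences it contains.
    Returns a list of sets of bin IDs.
    """
    parent = {bin_id: bin_id for bin_id in bins}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_bin_for_clone = {}
    for bin_id, clones in bins.items():
        for clone in clones:
            if clone in first_bin_for_clone:
                # The same clone appears in another bin: merge the two groups.
                parent[find(bin_id)] = find(first_bin_for_clone[clone])
            else:
                first_bin_for_clone[clone] = bin_id

    groups = {}
    for bin_id in bins:
        groups.setdefault(find(bin_id), set()).add(bin_id)
    return list(groups.values())


example = {
    "bin1": {"cloneA", "cloneB"},
    "bin2": {"cloneB"},
    "bin3": {"cloneC"},
}
print(join_bins_by_clone(example))
```

In this example `bin1` and `bin2` are joined because both contain `cloneB`, while `bin3` stays on its own. Joining is transitive: two bins with no clone in common are still grouped if a third bin shares a clone with each of them.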
  • “Complementary” describes the relationship between two single-stranded nucleic acid sequences that anneal by base-pairing (5'-A-G-T-3' pairs with its complement 3'-T-C-A-5').
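The base-pairing example in the definition of “complementary” can be reproduced with a short sketch. The function names are hypothetical. The `rna_equivalent` helper assumes the usual convention that an RNA equivalent has the same linear sequence as the DNA with uracil in place of thymine; that convention is not spelled out in this section.

```python
# Illustrative sketch only; function names are hypothetical.

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Base-by-base complement; if seq is read 5'->3', the result reads 3'->5'."""
    return seq.upper().translate(_DNA_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Complementary strand rewritten in the conventional 5'->3' direction."""
    return complement(seq)[::-1]


def rna_equivalent(seq: str) -> str:
    """Assumed convention: same linear sequence with uracil replacing thymine."""
    return seq.upper().replace("T", "U")


print(complement("AGT"))          # TCA, i.e. 3'-T-C-A-5'
print(reverse_complement("AGT"))  # ACT, the same strand written 5'->3'
print(rna_equivalent("AGT"))      # AGU
```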
  • a “component sequence” is a nucleic acid sequence selected by a computer program such as PHRED and used to assemble a consensus or template sequence from one or more component sequences.
  • a “consensus sequence” or “template sequence” is a nucleic acid sequence which has been assembled from overlapping sequences, using a computer program for fragment assembly such as the GELVIEW fragment assembly system (Genetics Computer Group (GCG), Madison WI) or using a relational database management system (RDMS).
  • “Conservative amino acid substitutions” are those substitutions that, when made, least interfere with the properties of the original protein, i.e., the structure and especially the function of the protein are conserved and not significantly changed by such substitutions.
  • the table below shows amino acids which may be substituted for an original amino acid in
  • Conservative substitutions generally maintain (a) the structure of the polypeptide backbone in the area of the substitution, for example, as a beta sheet or alpha helical conformation, (b) the charge or hydrophobicity of the molecule at the target site, or (c) the bulk of the side chain.
  • “Deletion” refers to a change in either a nucleic or amino acid sequence in which at least one nucleotide or amino acid residue, respectively, is absent.
  • “Derivative” refers to the chemical modification of a nucleic acid sequence, such as by replacement of hydrogen by an alkyl, acyl, amino, hydroxyl, or other group.
  • “Element” and “array element” refer to a polynucleotide, polypeptide, or other chemical compound having a unique and defined position on a microarray.
  • E-value refers to the statistical probability that a match between two sequences occurred by chance.
  • a “fragment” is a unique portion of mddt or MDDT which is identical in sequence to but shorter in length than the parent sequence.
  • a fragment may comprise up to the entire length of the defined sequence, minus one nucleotide/amino acid residue.
  • a fragment may comprise from 10 to 1000 contiguous amino acid residues or nucleotides.
  • a fragment used as a probe, primer, antigen, therapeutic molecule, or for other purposes, may be at least 5, 10, 15, 16, 20, 25, 30, 40, 50, 60, 75, 100, 150, 250 or at least 500 contiguous amino acid residues or nucleotides in length. Fragments may be preferentially selected from certain regions of a molecule.
  • a polypeptide fragment may comprise a certain length of contiguous amino acids selected from the first 250 or 500 amino acids (or first 25% or 50%) of a polypeptide as shown in a certain defined sequence.
  • these lengths are exemplary, and any length that is supported by the specification, including the Sequence Listing and the figures, may be encompassed by the present embodiments.
  • a fragment of mddt comprises a region of unique polynucleotide sequence that specifically identifies mddt, for example, as distinct from any other sequence in the same genome.
  • a fragment of mddt is useful, for example, in hybridization and amplification technologies and in analogous methods that distinguish mddt from related polynucleotide sequences.
  • the precise length of a fragment of mddt and the region of mddt to which the fragment corresponds are routinely determinable by one of ordinary skill in the art based on the intended purpose for the fragment.
  • a fragment of MDDT is encoded by a fragment of mddt.
  • a fragment of MDDT comprises a region of unique amino acid sequence that specifically identifies MDDT.
  • a fragment of MDDT is useful as an immunogenic peptide for the development of antibodies that specifically recognize MDDT.
  • the precise length of a fragment of MDDT and the region of MDDT to which the fragment corresponds are routinely determinable by one of ordinary skill in the art based on the intended purpose for the fragment.
  • a “full length” nucleotide sequence is one containing at least a start site for translation to a protein sequence, followed by an open reading frame and a stop site, and encoding a “full length” polypeptide.
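A minimal sketch of the “full length” criterion follows: it looks for a start site, followed by an open reading frame and a stop site, on the forward strand. It assumes the standard start codon ATG and the standard stop codons TAA, TAG, and TGA. The function name and test sequences are hypothetical, and the sketch ignores the reverse strand and any biological evidence that the frame is actually expressed.

```python
# Illustrative sketch only: assumes the standard start codon (ATG) and the
# standard stop codons; the function name and sequences are hypothetical.
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_complete_orf(seq: str) -> Optional[str]:
    """Return the first ATG...stop open reading frame on the forward strand
    (stop codon included), or None if no complete frame is present."""
    seq = seq.upper()
    start = seq.find("ATG")
    while start != -1:
        # Walk codon by codon in the frame set by this start codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                return seq[start:pos + 3]
        start = seq.find("ATG", start + 1)
    return None


print(first_complete_orf("CCATGGCCTAAGG"))  # ATGGCCTAA
print(first_complete_orf("ATGGCC"))         # None: no in-frame stop codon
```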
  • “Hit” refers to a sequence whose annotation will be used to describe a given template. Criteria for selecting the top hit are as follows: if the template has one or more exact nucleic acid matches, the top hit is the exact match with highest percent identity. If the template has no exact matches but has significant protein hits, the top hit is the protein hit with the lowest E-value. If the template has no significant protein hits, but does have significant non-exact nucleotide hits, the top hit is the nucleotide hit with the lowest E-value.
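The top-hit criteria can be sketched as a three-step selection. The `Hit` record, its field names, and the significance cutoff `max_e_value` are assumptions made for illustration; the text does not state what E-value counts as “significant”.

```python
# Illustrative sketch only: the Hit record, its field names, and the default
# significance cutoff are hypothetical and are not taken from the source.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    accession: str           # hypothetical identifier for the matched sequence
    kind: str                # "nucleotide" or "protein"
    exact: bool              # True for an exact nucleic acid match
    percent_identity: float
    e_value: float


def select_top_hit(hits: List[Hit], max_e_value: float = 1e-8) -> Optional[Hit]:
    """Apply the three selection rules in order; return None if none applies."""
    # Rule 1: exact nucleic acid matches win; take the highest percent identity.
    exact = [h for h in hits if h.kind == "nucleotide" and h.exact]
    if exact:
        return max(exact, key=lambda h: h.percent_identity)
    # Rule 2: otherwise the significant protein hit with the lowest E-value.
    proteins = [h for h in hits if h.kind == "protein" and h.e_value <= max_e_value]
    if proteins:
        return min(proteins, key=lambda h: h.e_value)
    # Rule 3: otherwise the significant non-exact nucleotide hit with the lowest E-value.
    nucleotides = [
        h for h in hits
        if h.kind == "nucleotide" and not h.exact and h.e_value <= max_e_value
    ]
    if nucleotides:
        return min(nucleotides, key=lambda h: h.e_value)
    return None


hits = [
    Hit("prot1", "protein", False, 62.0, 1e-30),
    Hit("prot2", "protein", False, 55.0, 1e-12),
    Hit("nuc1", "nucleotide", False, 91.0, 1e-50),
]
print(select_top_hit(hits).accession)  # prot1: no exact match, best protein hit
```

Note that a significant protein hit outranks a non-exact nucleotide hit even when the nucleotide hit has the lower E-value, which follows the ordering of the criteria as written.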
  • Homology refers to sequence similarity either between a reference nucleic acid sequence and at least a fragment of an mddt or between a reference amino acid sequence and a fragment of an MDDT.
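To make the “at least 90% sequence identity” threshold concrete, the sketch below computes percent identity for two sequences that have already been aligned. It is not the CLUSTAL V procedure and does not perform the alignment itself. Gap handling (`-` for a gap, gap columns counted as mismatches) and the function names are assumptions made for illustration.

```python
# Illustrative sketch only: a simple identity count over a precomputed
# alignment, not the CLUSTAL V algorithm. Function names and the gap
# convention are hypothetical.

def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns where both sequences carry the same residue.

    Columns where both sequences have a gap ('-') are ignored; columns with a
    gap in only one sequence count as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must already be aligned to equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 90.0) -> bool:
    """True if the aligned pair reaches the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))          # 90.0
print(meets_identity_threshold("ACGTACGTAC", "ACGTTCGTAC"))  # True
```

Different tools use different denominators (alignment length, shorter sequence length, and so on), so values from this sketch should not be expected to match the output of a particular alignment program.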
  • Hybridization refers to the process by which a strand of nucleotides anneals with a complementary strand through base pairing. Specific hybridization is an indication that two nucleic acid sequences share a high degree of identity. Specific hybridization complexes form under defined annealing conditions, and remain hybridized after the "washing" step.
  • the defined hybridization conditions include the annealing conditions and the washing step(s), the latter of which is particularly important in determining the stringency of the hybridization process, with more stringent conditions allowing less non-specific binding, i.e., binding between pairs of nucleic acid probes that are not perfectly matched.
  • Permissive conditions for annealing of nucleic acid sequences are routinely determinable and may be consistent among hybridization experiments, whereas wash conditions may be varied among experiments to achieve the desired stringency.
  • Tm: thermal melting point
  • High stringency conditions for hybridization between polynucleotides of the present invention include wash conditions of 68°C in the presence of about 0.2 x SSC and about 0.1 % SDS, for 1 hour. Alternatively, temperatures of about 65°C, 60°C, or 55°C may be used. SSC concentration may be varied from about 0.2 to 2 x SSC, with SDS being present at about 0.1%.
  • Blocking reagents are used to block non-specific hybridization. Such blocking reagents include, for instance, denatured salmon sperm DNA at about 100-200 µg/ml. Useful variations on these conditions will be readily apparent to those skilled in the art.
  • Hybridization, particularly under high stringency conditions, may be suggestive of evolutionary similarity between the nucleotides. Such similarity is strongly indicative of a similar role for the nucleotides and their resultant proteins. Other parameters, such as temperature, salt concentration, and detergent concentration may be varied to achieve the desired stringency. Denaturants, such as formamide at a concentration of about 35-50% v/v, may also be used under particular circumstances, such as RNA:DNA hybridizations. Appropriate hybridization conditions are routinely determinable by one of ordinary skill in the art.
  • Immunogenic describes the potential for a natural, recombinant, or synthetic peptide, epitope, polypeptide, or protein to induce antibody production in appropriate animals, cells, or cell lines.
  • “Insertion” or “addition” refers to a change in either a nucleic or amino acid sequence in which at least one nucleotide or residue, respectively, is added to the sequence.
  • Labeling refers to the covalent or noncovalent joining of a polynucleotide, polypeptide, or antibody with a reporter molecule capable of producing a detectable or measurable signal.
  • “Microarray” is any arrangement of nucleic acids, amino acids, antibodies, etc., on a substrate.
  • the substrate may be a solid support such as beads, glass, paper, nitrocellulose, nylon, or an appropriate membrane.
  • Linkers are short stretches of nucleotide sequence which may be added to a vector or an mddt to create restriction endonuclease sites to facilitate cloning.
  • Polylinkers are engineered to incorporate multiple restriction enzyme sites and to provide for the use of enzymes which leave 5' or 3' overhangs (e.g., BamHI, EcoRI, and HindIII) and those which provide blunt ends (e.g., EcoRV, SnaBI, and StuI).
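  • As an illustration of how the restriction sites in a linker or polylinker might be located, the following minimal Python sketch scans a sequence for the recognition sequences of the six enzymes named above. The function name `find_sites` and the example polylinker are hypothetical and are not part of the disclosure. Because all six recognition sequences are palindromic, scanning one strand is sufficient.

```python
# Recognition sequences (5'->3') of the enzymes named in the text,
# with the end type each enzyme leaves as stated in the text.
RECOGNITION_SITES = {
    "BamHI": ("GGATCC", "overhang"),
    "EcoRI": ("GAATTC", "overhang"),
    "HindIII": ("AAGCTT", "overhang"),
    "EcoRV": ("GATATC", "blunt"),
    "SnaBI": ("TACGTA", "blunt"),
    "StuI": ("AGGCCT", "blunt"),
}


def find_sites(sequence):
    """Return {enzyme: [0-based start positions]} for each site present."""
    seq = sequence.upper()
    hits = {}
    for enzyme, (site, _end_type) in RECOGNITION_SITES.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)  # allow overlapping matches
        if positions:
            hits[enzyme] = positions
    return hits


# Hypothetical polylinker: BamHI, EcoRI, and EcoRV sites placed back to back.
polylinker = "GGATCCGAATTCGATATC"
print(find_sites(polylinker))
```

  • A real cloning design would also check that each chosen site is absent from the insert; that check is omitted here for brevity.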
  • Naturally occurring refers to an endogenous polynucleotide or polypeptide that may be isolated from viruses or prokaryotic or eukaryotic cells.
  • Nucleic acid sequence refers to the specific order of nucleotides joined by phosphodiester bonds in a linear, polymeric arrangement. Depending on the number of nucleotides, the nucleic acid sequence can be considered an oligomer, oligonucleotide, or polynucleotide.
  • The nucleic acid can be DNA, RNA, or any nucleic acid analog, such as PNA, may be of genomic or synthetic origin, may be either double-stranded or single-stranded, and can represent either the sense or antisense (complementary) strand.
  • Oligomer refers to a nucleic acid sequence of at least about 6 nucleotides and as many as about 60 nucleotides, preferably about 15 to 40 nucleotides, and most preferably between about 20 and 30 nucleotides, that may be used in hybridization or amplification technologies. Oligomers may be used as, e.g., primers for PCR, and are usually chemically synthesized.
  • "Operably linked" refers to the situation in which a first nucleic acid sequence is placed in a functional relationship with a second nucleic acid sequence.
  • A promoter is operably linked to a coding sequence if the promoter affects the transcription or expression of the coding sequence.
  • Operably linked DNA sequences may be in close proximity or contiguous and, where necessary to join two protein coding regions, in the same reading frame.
  • "Protein nucleic acid" (PNA) refers to a DNA mimic in which nucleotide bases are attached to a pseudopeptide backbone to increase stability. PNAs, also designated antigene agents, can prevent gene expression by targeting complementary messenger RNA.
  • Percent identity between polynucleotide sequences may be determined using the default parameters of the CLUSTAL V algorithm as incorporated into the MEGALIGN version 3.12e sequence alignment program. This program is part of the LASERGENE software package, a suite of molecular biological analysis programs (DNASTAR, Madison WI). CLUSTAL V is described in Higgins, D.G. and Sharp, P.M. (1989) CABIOS 5:151-153 and in Higgins, D.G. et al. (1992) CABIOS 8:189-191.
  • The "weighted" residue weight table is selected as the default.
  • Percent identity is reported by CLUSTAL V as the "percent similarity" between aligned polynucleotide sequence pairs.
  • The National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST) (Altschul, S.F. et al. (1990) J. Mol. Biol. 215:403-410) is available from several sources, including the NCBI, Bethesda, MD, and on the Internet at http://www.ncbi.nlm.nih.gov/BLAST/.
  • The BLAST software suite includes various sequence analysis programs including "blastn," that is used to determine alignment between a known polynucleotide sequence and other sequences on a variety of databases. Also available is a tool called "BLAST 2 Sequences" that is used for direct pairwise comparison of two nucleotide sequences. "BLAST 2 Sequences" can be accessed and used interactively at http://www.ncbi.nlm.nih.gov/gorf/bl2/.
  • BLAST 2 Sequences can be used for both blastn and blastp (discussed below). BLAST programs are commonly used with gap and other parameters set to default settings. For example, to compare two nucleotide sequences, one may use blastn with the "BLAST 2 Sequences” tool Version 2.0.9 (May-07-1999) set at default parameters. Such default parameters may be, for example:
  • Percent identity may be measured over the length of an entire defined sequence, for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for example, over the length of a fragment taken from a larger, defined sequence, for instance, a fragment of at least 20, at least 30, at least 40, at least 50, at least 70, at least 100, or at least 200 contiguous nucleotides.
  • Such lengths are exemplary only, and it is understood that any fragment length supported by the sequences shown herein, in figures or Sequence Listings, may be used to describe a length over which percentage identity may be measured.
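  • To make concrete how percent identity may be measured over an entire sequence or over a shorter fragment, the following minimal Python sketch counts matching positions in two sequences that have already been aligned. The function name, the gap symbol, and the choice of denominator (aligned columns, skipping columns where both sequences carry a gap) are assumptions made for illustration; CLUSTAL V and BLAST apply their own alignment and scoring rules, and this sketch does not reproduce them.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned columns in which both sequences carry the same residue.

    Both inputs must already be aligned (equal length, gaps as `gap`).
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:  # a gap in only one sequence never matches a residue
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned pair differing at one of eight positions.
seq_a = "ACGTACGT"
seq_b = "ACGTTCGT"
print(percent_identity(seq_a, seq_b))          # over the full length
print(percent_identity(seq_a[:4], seq_b[:4]))  # over a 4-nucleotide fragment
```

  • The same pair can therefore yield different values depending on the length over which identity is measured, which is why the length is stated alongside the percentage.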
  • Nucleic acid sequences that do not show a high degree of identity may nevertheless encode similar amino acid sequences due to the degeneracy of the genetic code. It is understood that changes in nucleic acid sequence can be made using this degeneracy to produce multiple nucleic acid sequences that all encode substantially the same protein.
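  • The degeneracy of the genetic code can be illustrated with a short Python sketch that translates two different coding sequences with the standard genetic code. The two example sequences are hypothetical and were chosen only so that they differ at half of their positions while encoding the same peptide.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence in frame 0; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical sequences: identical at only 6 of 12 nucleotide positions.
seq_a = "ATGCTGTCCGGC"
seq_b = "ATGTTAAGTGGA"
print(translate(seq_a), translate(seq_b))
```

  • Both sequences translate to the same four-residue peptide, even though their nucleotide identity is low.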
  • "Percent identity" and "% identity," as applied to polypeptide sequences, refer to the percentage of residue matches between at least two polypeptide sequences aligned using a standardized algorithm.
  • Methods of polypeptide sequence alignment are well-known. Some alignment methods take into account conservative amino acid substitutions. Such conservative substitutions, explained in more detail above, generally preserve the hydrophobicity and acidity of the substituted residue, thus preserving the structure (and therefore function) of the folded polypeptide.
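  • For a pairwise comparison of two polypeptide sequences, one may use, for example, the "BLAST 2 Sequences" tool (March-07-1999) with blastp set at default parameters.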
  • Such default parameters may be, for example:
  • Percent identity may be measured over the length of an entire defined polypeptide sequence, for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for example, over the length of a fragment taken from a larger, defined polypeptide sequence, for instance, a fragment of at least 15, at least 20, at least 30, at least 40, at least 50, at least 70 or at least 150 contiguous residues.
  • Such lengths are exemplary only, and it is understood that any fragment length supported by the sequences shown herein, in figures or Sequence Listings, may be used to describe a length over which percentage identity may be measured.
  • Post-translational modification of an MDDT may involve lipidation, glycosylation, phosphorylation, acetylation, racemization, proteolytic cleavage, and other modifications known in the art. These processes may occur synthetically or biochemically. Biochemical modifications will vary by cell type depending on the enzymatic milieu and the MDDT.
  • Probe refers to mddt or fragments thereof, which are used to detect identical, allelic or related nucleic acid sequences. Probes are isolated oligonucleotides or polynucleotides attached to a detectable label or reporter molecule. Typical labels include radioactive isotopes, ligands, chemiluminescent agents, and enzymes.
  • Primers are short nucleic acids, usually DNA oligonucleotides, which may be annealed to a target polynucleotide by complementary base-pairing. The primer may then be extended along the target DNA strand by a DNA polymerase enzyme.
  • Primer pairs can be used for amplification (and identification) of a nucleic acid sequence, e.g., by the polymerase chain reaction (PCR).
  • Probes and primers as used in the present invention typically comprise at least 15 contiguous nucleotides of a known sequence. In order to enhance specificity, longer probes and primers may also be employed, such as probes and primers that comprise at least 20, 30, 40, 50, 60, 70, 80, 90, 100, or at least 150 consecutive nucleotides of the disclosed nucleic acid sequences. Probes and primers may be considerably longer than these examples, and it is understood that any length supported by the specification, including the figures and Sequence Listing, may be used.
  • PCR primer pairs can be derived from a known sequence, for example, by using computer programs intended for that purpose such as Primer (Version 0.5, 1991, Whitehead Institute for Biomedical Research, Cambridge MA).
  • Oligonucleotides for use as primers are selected using software known in the art for such purpose. For example, OLIGO 4.06 software is useful for the selection of PCR primer pairs of up to 100 nucleotides each, and for the analysis of oligonucleotides and larger polynucleotides of up to 5,000 nucleotides from an input polynucleotide sequence of up to 32 kilobases. Similar primer selection programs have incorporated additional features for expanded capabilities. For example, the PrimOU primer selection program (available to the public from the Genome Center at University of Texas South West Medical Center, Dallas TX) is capable of choosing specific primers from megabase sequences and is thus useful for designing primers on a genome-wide scope.
  • The Primer3 primer selection program (available to the public from the Whitehead Institute/MIT Center for Genome Research, Cambridge MA) allows the user to input a "mispriming library," in which sequences to avoid as primer binding sites are user-specified. Primer3 is useful, in particular, for the selection of oligonucleotides for microarrays. (The source code for the latter two primer selection programs may also be obtained from their respective sources and modified to meet the user's specific needs.)
  • The PrimeGen program (available to the public from the UK Human Genome Mapping Project Resource Centre, Cambridge UK) designs primers based on multiple sequence alignments, thereby allowing selection of primers that hybridize to either the most conserved or least conserved regions of aligned nucleic acid sequences.
  • This program is useful for identification of both unique and conserved oligonucleotides and polynucleotide fragments.
  • The oligonucleotides and polynucleotide fragments identified by any of the above selection methods are useful in hybridization technologies, for example, as PCR or sequencing primers, microarray elements, or specific probes to identify fully or partially complementary polynucleotides in a sample of nucleic acids. Methods of oligonucleotide selection are not limited to those described above.
  • “Purified” refers to molecules, either polynucleotides or polypeptides, that are isolated or separated from their natural environment and are at least 60% free, preferably at least 75% free, and most preferably at least 90% free from other compounds with which they are naturally associated.
  • A "recombinant nucleic acid" is a sequence that is not naturally occurring or has a sequence that is made by an artificial combination of two or more otherwise separated segments of sequence. This artificial combination is often accomplished by chemical synthesis or, more commonly, by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques such as those described in Sambrook, supra.
  • The term recombinant includes nucleic acids that have been altered solely by addition, substitution, or deletion of a portion of the nucleic acid.
  • A recombinant nucleic acid may include a nucleic acid sequence operably linked to a promoter sequence.
  • A recombinant nucleic acid may be part of a vector that is used, for example, to transform a cell.
  • Such recombinant nucleic acids may be part of a viral vector, e.g., based on a vaccinia virus, that could be used to vaccinate a mammal wherein the recombinant nucleic acid is expressed, inducing a protective immunological response in the mammal.
  • Regulatory element refers to a nucleic acid sequence from nontranslated regions of a gene, and includes enhancers, promoters, introns, and 3' untranslated regions, which interact with host proteins to carry out or regulate transcription or translation.
  • Reporter molecules are chemical or biochemical moieties used for labeling a nucleic acid, an amino acid, or an antibody. They include radionuclides; enzymes; fluorescent, chemiluminescent, or chromogenic agents; substrates; cofactors; inhibitors; magnetic particles; and other moieties known in the art.
  • An "RNA equivalent," in reference to a DNA sequence, is composed of the same linear sequence of nucleotides as the reference DNA sequence with the exception that all occurrences of the nitrogenous base thymine are replaced with uracil, and the sugar backbone is composed of ribose instead of deoxyribose.
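  • At the level of the written sequence, the RNA equivalent defined above amounts to replacing every thymine with uracil. A minimal Python sketch follows; the function name is hypothetical, and the change of sugar backbone is a chemical property that a letter string does not capture.

```python
# Map thymine to uracil while preserving the case of the input letters.
_T_TO_U = str.maketrans("Tt", "Uu")


def rna_equivalent(dna):
    """Return the RNA equivalent of a DNA sequence (same order, T replaced by U)."""
    return dna.translate(_T_TO_U)


print(rna_equivalent("ATGCTTGA"))
```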
  • Samples may contain nucleic or amino acids, antibodies, or other materials, and may be derived from any source (e.g., bodily fluids including, but not limited to, saliva, blood, and urine; chromosome(s), organelles, or membranes isolated from a cell; genomic DNA, RNA, or cDNA in solution or bound to a substrate; and cleared cells or tissues or blots or imprints from such cells or tissues).
  • Specific binding or “specifically binding” refers to the interaction between a protein or peptide and its agonist, antibody, antagonist, or other binding partner. The interaction is dependent upon the presence of a particular structure of the protein, e.g., the antigenic determinant or epitope, recognized by the binding molecule. For example, if an antibody is specific for epitope "A,” the presence of a polypeptide containing epitope A, or the presence of free unlabeled A, in a reaction containing free labeled A and the antibody will reduce the amount of labeled A that binds to the antibody.
  • "Substitution" refers to the replacement of at least one nucleotide or amino acid by a different nucleotide or amino acid.
  • Substrate refers to any suitable rigid or semi-rigid support including, e.g., membranes, filters, chips, slides, wafers, fibers, magnetic or nonmagnetic beads, gels, tubing, plates, polymers, microparticles or capillaries.
  • The substrate can have a variety of surface forms, such as wells, trenches, pins, channels and pores, to which polynucleotides or polypeptides are bound.
  • A “transcript image” refers to the collective pattern of gene expression by a particular tissue or cell type under given conditions at a given time.
  • Transformation refers to a process by which exogenous DNA enters a recipient cell. Transformation may occur under natural or artificial conditions using various methods well known in the art. Transformation may rely on any known method for the insertion of foreign nucleic acid sequences into a prokaryotic or eukaryotic host cell. The method is selected based on the host cell being transformed.
  • Transformants include stably transformed cells in which the inserted DNA is capable of replication either as an autonomously replicating plasmid or as part of the host chromosome, as well as cells which transiently express inserted DNA or RNA.
  • A "transgenic organism," as used herein, is any organism, including but not limited to animals and plants, in which one or more of the cells of the organism contains heterologous nucleic acid introduced by way of human intervention, such as by transgenic techniques well known in the art.
  • The nucleic acid is introduced into the cell, directly or indirectly by introduction into a precursor of the cell, by way of deliberate genetic manipulation, such as by microinjection or by infection with a recombinant virus.
  • The term genetic manipulation does not include classical cross-breeding, or in vitro fertilization, but rather is directed to the introduction of a recombinant DNA molecule.
  • The transgenic organisms contemplated in accordance with the present invention include bacteria, cyanobacteria, fungi, and plants and animals.
  • The isolated DNA of the present invention can be introduced into the host by methods known in the art, for example infection, transfection, transformation or transconjugation. Techniques for transferring the DNA of the present invention into such organisms are widely known and provided in references such as Sambrook et al. (1989), supra.
  • A "variant" of a particular nucleic acid sequence is defined as a nucleic acid sequence having at least 25% sequence identity to the particular nucleic acid sequence over a certain length of one of the nucleic acid sequences using blastn with the "BLAST 2 Sequences" tool Version 2.0.9 (May-07-1999) set at default parameters.
  • Such a pair of nucleic acids may show, for example, at least 30%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95% or even at least 98% or greater sequence identity over a certain defined length.
  • The variant may result in "conservative" amino acid changes which do not affect structural and/or chemical properties.
  • A variant may be described as, for example, an "allelic" (as defined above), "splice," "species," or "polymorphic" variant.
  • A splice variant may have significant identity to a reference molecule, but will generally have a greater or lesser number of polynucleotides due to alternate splicing of exons during mRNA processing.
  • The corresponding polypeptide may possess additional functional domains or lack domains that are present in the reference molecule.
  • Species variants are polynucleotide sequences that vary from one species to another. The resulting polypeptides generally will have significant amino acid identity relative to each other.
  • A polymorphic variant is a variation in the polynucleotide sequence of a particular gene between individuals of a given species.
  • Polymorphic variants also may encompass "single nucleotide polymorphisms" (SNPs) in which the polynucleotide sequence varies by one base.
  • The presence of SNPs may be indicative of, for example, a certain population, a disease state, or a propensity for a disease state.
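  • As a simple illustration of single-base variation between two copies of a sequence, the Python sketch below lists the positions at which two equal-length, pre-aligned sequences differ. The function name and the example alleles are hypothetical. Real SNP calling works from many reads and applies quality filters, none of which is modeled here.

```python
def single_base_differences(reference, sample):
    """List (0-based position, reference base, sample base) for each mismatch.

    Assumes the two sequences are the same length and already aligned
    without gaps; unidentified bases written as N are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have equal length")
    differences = []
    for position, (ref_base, sample_base) in enumerate(
        zip(reference.upper(), sample.upper())
    ):
        if "N" in (ref_base, sample_base):
            continue
        if ref_base != sample_base:
            differences.append((position, ref_base, sample_base))
    return differences


# Hypothetical alleles differing by one base.
print(single_base_differences("ACGTAC", "ACGAAC"))
```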
  • Variants of the polynucleotides of the present invention may be generated through recombinant methods.
  • One possible method is a DNA shuffling technique such as MOLECULARBREEDING (Maxygen Inc., Santa Clara CA; described in U.S. Patent Number 5,837,458; Chang, C.-C. et al. (1999) Nat. Biotechnol.
  • DNA shuffling is a process by which a library of gene variants is produced using PCR-mediated recombination of gene fragments. The library is then subjected to selection or screening procedures that identify those gene variants with the desired properties. These preferred variants may then be pooled and further subjected to recursive rounds of DNA shuffling and selection/screening.
  • Fragments of a single gene containing random point mutations may be recombined, screened, and then reshuffled until the desired properties are optimized.
  • Fragments of a given gene may be recombined with fragments of homologous genes in the same gene family, either from the same or different species, thereby maximizing the genetic diversity of multiple naturally occurring genes in a directed and controllable manner.
  • A "variant" of a particular polypeptide sequence is defined as a polypeptide sequence having at least 40% sequence identity to the particular polypeptide sequence over a certain length of one of the polypeptide sequences using blastp with the "BLAST 2 Sequences" tool Version 2.0.9 (May-07-1999) set at default parameters.
  • Such a pair of polypeptides may show, for example, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 98% or greater sequence identity over a certain defined length of one of the polypeptides.
  • cDNA sequences derived from human tissues and cell lines were aligned based on nucleotide sequence identity and assembled into "consensus" or "template" sequences which are designated by the template identification numbers (template IDs) in column 2 of Table 1.
  • The sequence identification numbers (SEQ ID NO:s) corresponding to the template IDs are shown in column 1.
  • The template sequences have similarity to GenBank sequences, or "hits," as designated by the GI Numbers in column 3.
  • The statistical probability of each GenBank hit is indicated by a probability score in column 4, and the functional annotation corresponding to each GenBank hit is listed in column 5.
  • The invention incorporates the nucleic acid sequences of these templates as disclosed in the Sequence Listing and the use of these sequences in the diagnosis and treatment of disease states characterized by defects in disease detection and treatment molecules.
  • The invention further utilizes these sequences in hybridization and amplification technologies, and in particular, in technologies which assess gene expression patterns correlated with specific cells or tissues and their responses in vivo or in vitro to pharmaceutical agents, toxins, and other treatments. In this manner, the sequences of the present invention are used to develop a transcript image for a particular cell or tissue.
Derivation of Nucleic Acid Sequences
  • cDNA was isolated from libraries constructed using RNA derived from normal and diseased human tissues and cell lines.
  • The human tissues and cell lines used for cDNA library construction were selected from a broad range of sources to provide a diverse population of cDNAs representative of gene transcription throughout the human body. Descriptions of the human tissues and cell lines used for cDNA library construction are provided in the LIFESEQ database (Incyte Genomics, Inc. (Incyte), Palo Alto CA). Human tissues were broadly selected from, for example, cardiovascular, dermatologic, endocrine, gastrointestinal, hematopoietic/immune system, musculoskeletal, neural, reproductive, and urologic sources.
  • Cell lines used for cDNA library construction were derived from, for example, leukemic cells, teratocarcinomas, neuroepitheliomas, cervical carcinoma, lung fibroblasts, and endothelial cells. Such cell lines include, for example, THP-1, Jurkat, HUVEC, hNT2, WI38, HeLa, and other cell lines commonly used and available from public depositories (American Type Culture Collection, Manassas VA).
  • Prior to mRNA isolation, cell lines were untreated, treated with a pharmaceutical agent such as 5'-aza-2'-deoxycytidine, treated with an activating agent such as lipopolysaccharide in the case of leukocytic cell lines, or, in the case of endothelial cell lines, subjected to shear stress.
  • Chain termination reaction products may be electrophoresed on urea-polyacrylamide gels and detected either by autoradiography (for radioisotope-labeled nucleotides) or by fluorescence (for fluorophore-labeled nucleotides).
  • Automated methods for mechanized reaction preparation, sequencing, and analysis using fluorescence detection methods have been developed.
  • Machines used to prepare cDNAs for sequencing can include the MICROLAB 2200 liquid transfer system (Hamilton Company (Hamilton), Reno NV), Peltier thermal cycler (PTC200; MJ Research, Inc.
  • Sequencing can be carried out using, for example, the ABI 373 or 377 (PE Biosystems) or MEGABACE 1000 (Molecular Dynamics, Inc. (Molecular Dynamics), Sunnyvale CA) DNA sequencing systems, or other automated and manual sequencing systems well known in the art.
  • The nucleotide sequences of the Sequence Listing have been prepared by current, state-of-the-art, automated methods and, as such, may contain occasional sequencing errors or unidentified nucleotides. Such unidentified nucleotides are designated by an N. These infrequent unidentified bases do not represent a hindrance to practicing the invention for those skilled in the art.
  • Human polynucleotide sequences may be assembled using programs or algorithms well known in the art. Sequences to be assembled are related, wholly or in part, and may be derived from a single or many different transcripts. Assembly of the sequences can be performed using such programs as PHRAP (Phils Revised Assembly Program) and the GELVIEW fragment assembly system (GCG), or other methods known in the art.
  • cDNA sequences are used as "component" sequences that are assembled into “template” or “consensus” sequences as follows. Sequence chromatograms are processed, verified, and quality scores are obtained using PHRED. Raw sequences are edited using an editing pathway known as Block 1 (See, e.g., the LIFESEQ Assembled User Guide, Incyte Genomics, Palo Alto, CA). A series of BLAST comparisons is performed and low-information segments and repetitive elements
  • RNA sequences are then loaded into a relational database management system (RDMS) which assigns edited sequences to existing templates, if available.
  • A process is initiated which modifies existing templates or creates new templates from works in progress (i.e., nonfinal assembled sequences) containing queued sequences or the sequences themselves.
  • The templates can be merged into bins. If multiple templates exist in one bin, the bin can be split and the templates reannotated.
  • Bins are "clone joined" based upon clone information. Clone joining occurs when the 5' sequence of one clone is present in one bin and the 3' sequence from the same clone is present in a different bin, indicating that the two bins should be merged into a single bin. Only bins which share at least two different clones are merged.
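  • The clone-joining rule described above can be sketched in Python as follows. This is an illustrative reading of the rule, not the assembly pipeline itself: the data layout (each bin as a set of `(clone_id, end)` pairs), the function names, and the single pass over bin pairs are all assumptions.

```python
from itertools import combinations


def linking_clones(bin_a, bin_b):
    """Clones whose 5' read lies in one bin and whose 3' read lies in the other."""
    links = set()
    for clone, end in bin_a:
        other_end = "3'" if end == "5'" else "5'"
        if (clone, other_end) in bin_b:
            links.add(clone)
    return links


def clone_join(bins, min_shared=2):
    """Merge bins that are linked by at least `min_shared` different clones.

    `bins` maps a bin identifier to a set of (clone_id, end) pairs,
    where end is "5'" or "3'". Returns the merged bins keyed by a
    representative bin identifier.
    """
    parent = {name: name for name in bins}

    def find(name):
        # Follow parent links to the representative of this bin's group.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(sorted(bins), 2):
        if len(linking_clones(bins[a], bins[b])) >= min_shared:
            parent[find(b)] = find(a)

    merged = {}
    for name, reads in bins.items():
        merged.setdefault(find(name), set()).update(reads)
    return merged


# Hypothetical bins: bin1 and bin2 share clones c1 and c2; bin3 shares only c3.
example_bins = {
    "bin1": {("c1", "5'"), ("c2", "5'"), ("c3", "5'")},
    "bin2": {("c1", "3'"), ("c2", "3'")},
    "bin3": {("c3", "3'")},
}
print(sorted(clone_join(example_bins)))
```

  • In this example bin1 and bin2 are merged, while bin3 stays separate because it is linked to bin1 by a single clone only.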
  • a resultant template sequence may contain either a partial or a full length open reading frame, or all or part of a genetic regulatory element. This variation is due in part to the fact that the full length cDNAs of many genes are several hundred, and sometimes several thousand, bases in length. With current technology, cDNAs comprising the coding regions of large genes cannot be cloned because of vector limitations, incomplete reverse transcription of the mRNA, or incomplete "second strand" synthesis. Template sequences may be extended to include additional contiguous sequences derived from the parent RNA transcript using a variety of methods known to those of skill in the art. Extension may thus be used to achieve the full length coding sequence of a gene.
  • The cDNA sequences are analyzed using a variety of programs and algorithms which are well known in the art. (See, e.g., Ausubel, 1997, supra, Chapter 1.1; Meyers, R.A. (Ed.) (1995) Molecular Biology and Biotechnology, Wiley VCH, New York NY, pp. 856-853; and Table 6.) These analyses comprise both reading frame determinations, e.g., based on triplet codon periodicity for particular organisms (Fickett, J.W. (1982) Nucleic Acids Res. 10:5303-5318); analyses of potential start and stop codons; and homology searches.
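  • As a rough illustration of an analysis of potential start and stop codons, the Python sketch below scans the three forward reading frames of a sequence for stretches that begin with ATG and end at the first in-frame stop codon. It is a deliberately simple stand-in and is not the codon-periodicity method of Fickett; the function name, the minimum-length parameter, and the example sequence are hypothetical, and the reverse strand is not examined.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (frame, start, end) for each ATG...stop stretch on the forward strand.

    `start` and `end` are 0-based, with `end` exclusive and including the
    stop codon. `min_codons` counts codons from ATG up to, but excluding,
    the stop codon.
    """
    seq = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


# Hypothetical sequence with one short open reading frame in frame 2.
example = "CCATGAAATTTTAGGG"
for frame, start, end in find_orfs(example):
    print(frame, example[start:end])
```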
  • BLAST (Altschul, S.F. (1993) J. Mol. Evol. 36:290-300; Altschul, S.F. et al. (1990) J. Mol. Biol. 215:403-410).
  • BLAST is especially useful in determining exact matches and comparing two sequence fragments of arbitrary but equal lengths, whose alignment is locally maximal and for which the alignment score meets or exceeds a threshold or cutoff score set by the user (Karlin, S. et al. (1988) Proc. Natl. Acad. Sci. USA 85:841-845).
  • Databases (e.g., GenBank, SwissProt, BLOCKS, PFAM, and other databases) may be searched for sequences containing regions of homology to a query mddt or MDDT of the present invention.
  • The mddt of the present invention may be used for a variety of diagnostic and therapeutic purposes.
  • An mddt may be used to diagnose a particular condition, disease, or disorder associated with disease detection and treatment molecules.
  • Such conditions, diseases, and disorders include, but are not limited to, a cell proliferative disorder, such as actinic keratosis, arteriosclerosis, atherosclerosis, bursitis, cirrhosis, hepatitis, mixed connective tissue disease (MCTD), myelofibrosis, paroxysmal nocturnal hemoglobinuria, polycythemia vera, psoriasis, primary thrombocythemia, and cancers including adenocarcinoma, leukemia, lymphoma, melanoma, myeloma, sarcoma, teratocarcinoma, and, in particular, a cancer of the adrenal gland, bladder, bone, bone marrow, brain, breast, cervix
  • The mddt can be used to detect the presence of, or to quantify the amount of, an mddt-related polynucleotide in a sample. This information is then compared to information obtained from appropriate reference samples, and a diagnosis is established.
  • A polynucleotide complementary to a given mddt can inhibit or inactivate a therapeutically relevant gene related to the mddt.
  • The expression of mddt may be routinely assessed by hybridization-based methods to determine, for example, the tissue-specificity, disease-specificity, or developmental stage-specificity of mddt expression.
  • The level of expression of mddt may be compared among different cell types or tissues, among diseased and normal cell types or tissues, among cell types or tissues at different developmental stages, or among cell types or tissues undergoing various treatments.
  • This type of analysis is useful, for example, to assess the relative levels of mddt expression in fully or partially differentiated cells or tissues, to determine if changes in mddt expression levels are correlated with the development or progression of specific disease states, and to assess the response of a cell or tissue to a specific therapy, for example, in pharmacological or toxicological studies.
  • Methods for the analysis of mddt expression are based on hybridization and amplification technologies and include membrane-based procedures such as northern blot analysis, high-throughput procedures that utilize, for example, microarrays, and PCR-based procedures.
  • The mddt, their fragments, or complementary sequences may be used to identify the presence of and/or to determine the degree of similarity between two (or more) nucleic acid sequences.
  • the mddt may be hybridized to naturally occurring or recombinant nucleic acid sequences under appropriately selected temperatures and salt concentrations. Hybridization with a probe based on the nucleic acid sequence of at least one of the mddt allows for the detection of nucleic acid sequences, including genomic sequences, which are identical or related to the mddt of the Sequence Listing. Probes may be selected from non-conserved or unique regions of at least one of the polynucleotides of SEQ ID NO: 1-25 and tested for their ability to identify or amplify the target nucleic acid sequence using standard protocols.
  • Polynucleotide sequences that are capable of hybridizing, in particular, to those shown in SEQ ID NO: 1-25 and fragments thereof, can be identified using various conditions of stringency. (See, e.g., Wahl, G.M. and S.L. Berger (1987) Methods Enzymol. 152:399-407; Kimmel, A.R. (1987) Methods Enzymol. 152:507-511.) Hybridization conditions are discussed in "Definitions.”
  • a probe for use in Southern or northern hybridization may be derived from a fragment of an mddt sequence, or its complement, that is up to several hundred nucleotides in length and is either single-stranded or double-stranded. Such probes may be hybridized in solution to biological materials such as plasmids, bacterial, yeast, or human artificial chromosomes, cleared or sectioned tissues, or to artificial substrates containing mddt. Microarrays are particularly suitable for identifying the presence of and detecting the level of expression for multiple genes of interest by examining gene expression correlated with, e.g., various stages of development, treatment with a drug or compound, or disease progression. An array analogous to a dot or slot blot may be used to arrange and link polynucleotides to the surface of a substrate using one or more of the following: mechanical
  • Such an array may contain any number of mddt and may be produced by hand or by using available devices, materials, and machines.
  • Microarrays may be prepared, used, and analyzed using methods known in the art. (See, e.g., Brennan, T.M. et al. (1995) U.S. Patent No. 5,474,796; Schena, M. et al. (1996) Proc. Natl. Acad. Sci. USA 93:10614-10619; Baldeschweiler et al. (1995) PCT application WO95/251116; Shalon, D. et al. (1995) PCT application WO95/35505; Heller, R.A. et al. (1997) Proc. Natl. Acad. Sci. USA 94:2150-2155; and Heller, M.J. et al. (1997) U.S. Patent No. 5,605,662.)
  • Probes may be labeled by either PCR or enzymatic techniques using a variety of commercially available reporter molecules.
  • commercial kits are available for radioactive and chemiluminescent labeling (Amersham Pharmacia Biotech) and for alkaline phosphatase labeling (Life Technologies).
  • mddt may be cloned into commercially available vectors for the production of RNA probes.
  • Such probes may be transcribed in the presence of at least one labeled nucleotide (e.g., ³²P-ATP, Amersham Pharmacia Biotech).
  • polynucleotides of SEQ ID NO: 1-25 or suitable fragments thereof can be used to isolate full length cDNA sequences utilizing hybridization and/or amplification procedures well known in the art, e.g., cDNA library screening, PCR amplification, etc.
  • the molecular cloning of such full length cDNA sequences may employ the method of cDNA library screening with probes using the hybridization, stringency, washing, and probing strategies described above and in Ausubel, supra, Chapters 3, 5, and 6.
  • These procedures may also be employed with genomic libraries to isolate genomic sequences of mddt in order to analyze, e.g., regulatory elements.
  • Gene identification and mapping are important in the investigation and treatment of almost all conditions, diseases, and disorders. Cancer, cardiovascular disease, Alzheimer's disease, arthritis, diabetes, and mental illnesses are of particular interest. Each of these conditions is more complex than the single gene defects of sickle cell anemia or cystic fibrosis, with select groups of genes being predictive of predisposition for a particular condition, disease, or disorder.
  • cardiovascular disease may result from malfunctioning receptor molecules that fail to clear cholesterol from the bloodstream
  • diabetes may result when a particular individual's immune system is activated by an infection and attacks the insulin-producing cells of the pancreas.
  • Alzheimer's disease has been linked to a gene on chromosome 21; other studies predict a different gene and location. Mapping of disease genes is a complex and reiterative process and generally proceeds from genetic linkage analysis to physical mapping.
  • a genetic linkage map traces parts of chromosomes that are inherited in the same pattern as the condition.
  • Statistics link the inheritance of particular conditions to particular regions of chromosomes, as defined by RFLP or other markers.
  • RFLP restriction fragment length polymorphism
  • Occasionally, markers and their locations are known from previous studies. More often, however, the markers are simply stretches of DNA that differ among individuals. Examples of genetic linkage maps can be found in various scientific journals or at the Online Mendelian Inheritance in Man (OMIM) World Wide Web site.
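The text does not name the statistic used to link inheritance of a condition to a chromosomal region. One widely used choice in linkage analysis is the LOD score, which compares the likelihood of the observed recombinants at a recombination fraction θ against free recombination (θ = 0.5). The sketch below assumes phase-known meioses; the function name `lod_score` is hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """LOD = log10( L(theta) / L(0.5) ) for phase-known meioses.

    `recombinants` of the `meioses` scored show recombination between the
    marker and the condition; `theta` is the candidate recombination fraction.
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    non_recombinants = meioses - recombinants
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1 - theta))
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null
```

By convention a LOD score of 3 or more is commonly taken as evidence for linkage; that threshold is a general convention and is not stated in the source.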
  • mddt sequences may be used to generate hybridization probes useful in chromosomal mapping of naturally occurring genomic sequences.
  • Either coding or noncoding sequences of mddt may be used, and in some instances, noncoding sequences may be preferable over coding sequences.
  • conservation of an mddt coding sequence among members of a multi-gene family may potentially cause undesired cross hybridization during chromosomal mapping.
  • the sequences may be mapped to a particular chromosome, to a specific region of a chromosome, or to artificial chromosome constructions, e.g., human artificial chromosomes (HACs), yeast artificial chromosomes (YACs), bacterial artificial chromosomes (BACs), bacterial P1 constructions, or single chromosome cDNA libraries.
  • HACs human artificial chromosomes
  • YACs yeast artificial chromosomes
  • BACs bacterial artificial chromosomes
  • Fluorescent in situ hybridization may be correlated with other physical chromosome mapping techniques and genetic map data. (See, e.g., Meyers, supra, pp. 965-968.) Correlation between the location of mddt on a physical chromosomal map and a specific disorder, or a predisposition to a specific disorder, may help define the region of DNA associated with that disorder.
  • the mddt sequences may also be used to detect polymorphisms that are genetically linked to the inheritance of a particular condition, disease, or disorder.
  • In situ hybridization of chromosomal preparations and genetic mapping techniques may be used for extending existing genetic maps. Often the placement of a gene on the chromosome of another mammalian species, such as mouse, may reveal associated markers even if the number or arm of the corresponding human chromosome is not known. These new marker sequences can be mapped to human chromosomes and may provide valuable information to investigators searching for disease genes using positional cloning or other gene discovery techniques.
  • any sequences mapping to that area may represent associated or regulatory genes for further investigation.
  • the nucleotide sequences of the subject invention may also be used to detect differences in chromosomal architecture due to translocation, inversion, etc., among normal, carrier, or affected individuals.
  • Once a disease-associated gene is mapped to a chromosomal region, the gene must be cloned in order to identify mutations or other alterations (e.g., translocations or inversions) that may be correlated with disease.
  • This process requires a physical map of the chromosomal region containing the disease-gene of interest along with associated markers. A physical map is necessary for determining the nucleotide sequence of and order of marker genes on a particular chromosomal region. Physical mapping techniques are well known in the art and require the generation of overlapping sets of cloned DNA fragments from a particular organelle, chromosome, or genome. These clones are analyzed to reconstruct and catalog their order. Once the position of a marker is determined, the DNA from that region is obtained by consulting the catalog and selecting clones from that region. The gene of interest is located through positional cloning techniques using hybridization or similar methods.
  • the mddt of the present invention may be used to design probes useful in diagnostic assays. Such assays, well known to those skilled in the art, may be used to detect or confirm conditions, disorders, or diseases associated with abnormal levels of mddt expression. Labeled probes developed from mddt sequences are added to a sample under hybridizing conditions of desired stringency. In some instances, mddt, or fragments or oligonucleotides derived from mddt, may be used as primers in amplification steps prior to hybridization. The amount of hybridization complex formed is quantified and compared with standards for that cell or tissue. If mddt expression varies significantly from the standard, the assay indicates the presence of the condition, disorder, or disease.
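The comparison of a quantified hybridization signal against standards can be illustrated with a simple z-score. This is a sketch under stated assumptions: the source says only that expression "varies significantly" from the standard, so the cutoff `z_cutoff` and the function name `deviates_from_standard` are hypothetical.

```python
from statistics import mean, stdev


def deviates_from_standard(sample_signal, standard_signals, z_cutoff=2.0):
    """Return (z, flag), where flag is True when the sample signal lies more
    than `z_cutoff` standard deviations from the mean of the standards."""
    mu = mean(standard_signals)
    sigma = stdev(standard_signals)  # sample standard deviation, needs >= 2 values
    if sigma == 0:
        raise ValueError("standard signals show no variation")
    z = (sample_signal - mu) / sigma
    return z, abs(z) > z_cutoff
```

In practice the standards would be measured for the same cell or tissue type as the sample, as the passage above requires.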
  • Qualitative or quantitative diagnostic methods may include northern, dot blot, or other membrane or dip-stick based technologies or multiple-sample format technologies such as PCR, enzyme-linked immunosorbent assay (ELISA)-like, pin, or chip-based assays.
  • ELISA enzyme-linked immunosorbent assay
  • the probes described above may also be used to monitor the progress of conditions, disorders, or diseases associated with abnormal levels of mddt expression, or to evaluate the efficacy of a particular therapeutic treatment.
  • the candidate probe may be identified from the mddt that are specific to a given human tissue and have not been observed in GenBank or other genome databases. Such a probe may be used in animal studies, preclinical tests, clinical trials, or in monitoring the treatment of an individual patient.
  • standard expression is established by methods well known in the art for use as a basis of comparison, samples from patients affected by the disorder or disease are combined with the probe to evaluate any deviation from the standard profile, and a therapeutic agent is administered and effects are monitored to generate a treatment profile.
  • Efficacy is evaluated by determining whether the expression progresses toward or returns to the standard normal pattern. Treatment profiles may be generated over a period of several days or several months. Statistical methods well known to those skilled in the art may be used to determine the significance of such therapeutic agents.
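Whether expression "progresses toward or returns to the standard normal pattern" can be tracked as a distance between each treatment time point and the standard profile. The sketch below assumes profiles are dictionaries of gene name to expression level and uses Euclidean distance; both the representation and the function names are illustrative assumptions.

```python
import math


def distance_from_standard(profile, standard):
    """Euclidean distance between one expression profile and the standard,
    computed over the genes present in the standard."""
    return math.sqrt(sum((profile[g] - standard[g]) ** 2 for g in standard))


def treatment_trend(time_series, standard):
    """Distance from the standard at each successive time point."""
    return [distance_from_standard(p, standard) for p in time_series]


def is_converging(distances):
    """True when the distance never increases from one time point to the next."""
    return all(later <= earlier
               for earlier, later in zip(distances, distances[1:]))
```

A monotonic check is deliberately strict; a statistical test over replicate measurements would be needed to judge significance.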
  • the polynucleotides are also useful for identifying individuals from minute biological samples, for example, by matching the RFLP pattern of a sample's DNA to that of an individual's DNA.
  • the polynucleotides of the present invention can also be used to determine the actual base-by-base DNA sequence of selected portions of an individual's genome. These sequences can be used to prepare PCR primers for amplifying and isolating such selected DNA, which can then be sequenced. Using this technique, an individual can be identified through a unique set of DNA sequences. Once a unique ID database is established for an individual, positive identification of that individual can be made from extremely small tissue samples.
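Matching RFLP patterns can be illustrated with an in silico digest. The sketch assumes linear DNA given as a string and uses the EcoRI recognition site GAATTC, cut after the first base, as an example enzyme; the source does not name an enzyme, and the function names are hypothetical. Fragment lengths are compared exactly, whereas a gel resolves them only approximately.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Return fragment lengths (longest first) after cutting a linear sequence
    at every occurrence of `site`, `cut_offset` bases into the site."""
    sequence = sequence.upper()
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(len(sequence) - start)  # trailing fragment
    return sorted(fragments, reverse=True)


def same_rflp_pattern(seq_a, seq_b, site="GAATTC", cut_offset=1):
    """True when both sequences yield identical fragment-length patterns."""
    return digest(seq_a, site, cut_offset) == digest(seq_b, site, cut_offset)
```

A single base change inside a recognition site removes a cut and merges two fragments, which is what makes the pattern informative for identification.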
  • oligonucleotide primers derived from the mddt of the invention may be used to detect single nucleotide polymorphisms (SNPs).
  • SNPs are substitutions, insertions and deletions that are a frequent cause of inherited or acquired genetic disease in humans.
  • Methods of SNP detection include, but are not limited to, single-stranded conformation polymorphism (SSCP) and fluorescent SSCP (fSSCP) methods.
  • SSCP single-stranded conformation polymorphism
  • fSSCP fluorescent SSCP
  • oligonucleotide primers derived from mddt are used to amplify DNA using the polymerase chain reaction (PCR).
  • the DNA may be derived, for example, from diseased or normal tissue, biopsy samples, bodily fluids, and the like.
  • SNPs in the DNA cause differences in the secondary and tertiary structures of PCR products in single-stranded form, and these differences are detectable using gel electrophoresis in non-denaturing gels.
  • the oligonucleotide primers are fluorescently labeled, which allows detection of the amplimers in high-throughput equipment such as DNA sequencing machines.
  • sequence database analysis methods termed in silico SNP (isSNP) are capable of identifying polymorphisms by comparing the sequences of individual overlapping DNA fragments which assemble into a common consensus sequence.
  • SNPs may be detected and characterized by mass spectrometry using, for example, the high throughput MASSARRAY system (Sequenom, Inc., San Diego CA).
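The in silico SNP idea, comparing overlapping fragments that assemble into one consensus, can be sketched as a column-by-column tally. This is a simplified illustration, not the isSNP method itself: it assumes the fragments are already aligned to consensus coordinates without gaps, and the parameter `min_minor_count`, used to ignore single discordant reads, is hypothetical.

```python
from collections import Counter


def in_silico_snps(fragments, min_minor_count=2):
    """Find candidate substitution SNPs.

    `fragments` is a list of (offset, sequence) pairs, where offset is the
    consensus coordinate of the fragment's first base. A position is reported
    when its second most frequent base appears at least `min_minor_count` times.
    """
    columns = {}
    for offset, seq in fragments:
        for i, base in enumerate(seq.upper()):
            columns.setdefault(offset + i, Counter())[base] += 1
    snps = []
    for pos in sorted(columns):
        counts = columns[pos].most_common()
        if len(counts) > 1 and counts[1][1] >= min_minor_count:
            snps.append((pos, dict(counts)))
    return snps
```

Insertions and deletions would require a gapped alignment and are outside the scope of this sketch.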
  • DNA-based identification techniques are critical in forensic technology. DNA sequences taken from very small biological samples such as tissues, e.g., hair or skin, or body fluids, e.g., blood, saliva, semen, etc., can be amplified using, e.g., PCR, to identify individuals. (See, e.g., Erlich, H. (1992) PCR Technology, Freeman and Co., New York, NY).
  • polynucleotides of the present invention can be used as polymorphic markers. There is also a need for reagents capable of identifying the source of a particular tissue.
  • reagents can comprise, for example, DNA probes or primers prepared from the sequences of the present invention that are specific for particular tissues. Panels of such reagents can identify tissue by species and/or by organ type. In a similar fashion, these reagents can be used to screen tissue cultures for contamination.
  • the polynucleotides of the present invention can also be used as molecular weight markers on nucleic acid gels or Southern blots, as diagnostic probes for the presence of a specific mRNA in a particular cell type, in the creation of subtracted cDNA libraries which aid in the discovery of novel polynucleotides, in selection and synthesis of oligomers for attachment to an array or other support, and as an antigen to elicit an immune response.
  • the mddt of the invention or their mammalian homologs may be "knocked out" in an animal model system using homologous recombination in embryonic stem (ES) cells.
  • ES embryonic stem
  • Such techniques are well known in the art and are useful for the generation of animal models of human disease.
  • mouse ES cells such as the mouse 129/SvJ cell line, are derived from the early mouse embryo and grown in culture.
  • the ES cells are transformed with a vector containing the gene of interest disrupted by a marker gene, e.g., the neomycin phosphotransferase gene (neo; Capecchi, M.R. (1989) Science 244:1288-1292).
  • the vector integrates into the corresponding region of the host genome by homologous recombination.
  • homologous recombination takes place using the Cre-loxP system to knockout a gene of interest in a tissue- or developmental stage-specific manner (Marth, J.D. (1996) J. Clin. Invest. 97:1999-2002; Wagner, K.U. et al. (1997) Nucleic Acids Res. 25:4323-4330).
  • Transformed ES cells are identified and microinjected into mouse cell blastocysts such as those from the C57BL/6 mouse strain.
  • the blastocysts are surgically transferred to pseudopregnant dams, and the resulting chimeric progeny are genotyped and bred to produce heterozygous or homozygous strains.
  • Transgenic animals thus generated may be tested with potential therapeutic or toxic agents.
  • the mddt of the invention may also be manipulated in vitro in ES cells derived from human blastocysts.
  • Human ES cells have the potential to differentiate into at least eight separate cell lineages including endoderm, mesoderm, and ectodermal cell types. These cell lineages differentiate into, for example, neural cells, hematopoietic lineages, and cardiomyocytes (Thomson, J.A. et al. (1998) Science 282:1145-1147).
  • the mddt of the invention can also be used to create "knockin" humanized animals (pigs) or transgenic animals (mice or rats) to model human disease.
  • With knockin technology, a region of mddt is injected into animal ES cells, and the injected sequence integrates into the animal cell genome.
  • Transformed cells are injected into blastulae, and the blastulae are implanted as described above.
  • Transgenic progeny or inbred lines are studied and treated with potential pharmaceutical agents to obtain information on treatment of a human disease.
  • a mammal inbred to overexpress mddt resulting, e.g., in the secretion of MDDT in its milk, may also serve as a convenient source of that protein (Janne, J. et al. (1998) Biotechnol. Annu. Rev. 4:55-74).
  • MDDT encoded by polynucleotides of the present invention may be used to screen for molecules that bind to or are bound by the encoded polypeptides.
  • the binding of the polypeptide and the molecule may activate (agonist), increase, inhibit (antagonist), or decrease activity of the polypeptide or the bound molecule.
  • Examples of such molecules include antibodies, oligonucleotides, proteins (e.g., receptors), or small molecules.
  • the molecule is closely related to the natural ligand of the polypeptide, e.g., a ligand or fragment thereof, a natural substrate, or a structural or functional mimetic.
  • the molecule can be closely related to the natural receptor to which the polypeptide binds, or to at least a fragment of the receptor, e.g., the active site. In either case, the molecule can be rationally designed using known techniques.
  • the screening for these molecules involves producing appropriate cells which express the polypeptide, either as a secreted protein or on the cell membrane.
  • Preferred cells include cells from mammals, yeast, Drosophila, or E. coli. Cells expressing the polypeptide or cell membrane fractions which contain the expressed polypeptide are then contacted with a test compound and binding, stimulation, or inhibition of activity of either the polypeptide or the molecule is analyzed.
  • An assay may simply test binding of a candidate compound to the polypeptide, wherein binding is detected by a fluorophore, radioisotope, enzyme conjugate, or other detectable label. Alternatively, the assay may assess binding in the presence of a labeled competitor.
  • the assay can be carried out using cell-free preparations, polypeptide/molecule affixed to a solid support, chemical libraries, or natural product mixtures.
  • the assay may also simply comprise the steps of mixing a candidate compound with a solution containing a polypeptide, measuring polypeptide/molecule activity or binding, and comparing the polypeptide/molecule activity or binding to a standard.
  • an ELISA assay using, e.g., a monoclonal or polyclonal antibody can measure polypeptide level in a sample.
  • the antibody can measure polypeptide level by either binding, directly or indirectly, to the polypeptide or by competing with the polypeptide for a substrate.
  • All of the above assays can be used in a diagnostic or prognostic context.
  • the molecules discovered using these assays can be used to treat disease or to bring about a particular result in a patient (e.g., blood vessel growth) by activating or inhibiting the polypeptide/molecule.
  • the assays can discover agents which may inhibit or enhance the production of the polypeptide from suitably manipulated cells or tissues.
  • a transcript image represents the global pattern of gene expression by a particular tissue or cell type. Global gene expression patterns are analyzed by quantifying the number of expressed genes and their relative abundance under given conditions and at a given time. (See Seilhamer et al., "Comparative Gene Transcript Analysis," U.S. Patent Number 5,840,484, expressly incorporated by reference herein.)
  • a transcript image may be generated by hybridizing the polynucleotides of the present invention or their complements to the totality of transcripts or reverse transcripts of a particular tissue or cell type.
  • the hybridization takes place in high-throughput format, wherein the polynucleotides of the present invention or their complements comprise a subset of a plurality of elements on a microarray.
  • the resultant transcript image would provide a profile of gene activity pertaining to disease detection and treatment molecules.
  • Transcript images which profile mddt expression may be generated using transcripts isolated from tissues, cell lines, biopsies, or other biological samples.
  • the transcript image may thus reflect mddt expression in vivo, as in the case of a tissue or biopsy sample, or in vitro, as in the case of a cell line.
  • Transcript images which profile mddt expression may also be used in conjunction with in vitro model systems and preclinical evaluation of pharmaceuticals, as well as toxicological testing of industrial and naturally-occurring environmental compounds. All compounds induce characteristic gene expression patterns, frequently termed molecular fingerprints or toxicant signatures, which are indicative of mechanisms of action and toxicity (Nuwaysir, E. F. et al. (1999) Mol. Carcinog. 24:153-159; Steiner, S. and Anderson, N. L. (2000) Toxicol. Lett. 112-113:467-71, expressly incorporated by reference herein). If a test compound has a signature similar to that of a compound with known toxicity, it is likely to share those toxic properties.
  • fingerprints or signatures are most useful and refined when they contain expression information from a large number of genes and gene families. Ideally, a genome-wide measurement of expression provides the highest quality signature. Even genes whose expression is not altered by any tested compounds are important as well, as the levels of expression of these genes are used to normalize the rest of the expression data. The normalization procedure is useful for comparison of expression data after treatment with different compounds. While the assignment of gene function to elements of a toxicant signature aids in interpretation of toxicity mechanisms, knowledge of gene function is not necessary for the statistical matching of signatures which leads to prediction of toxicity.
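Normalization against unaltered genes and statistical matching of signatures can be sketched as follows. The source does not specify either calculation, so two assumptions are made here: expression is scaled by the mean level of the reference (unaltered) genes, and signatures are compared by Pearson correlation of log2 ratios. All function names are hypothetical, and expression values are assumed to be positive.

```python
import math


def normalize(expression, reference_genes):
    """Scale every gene by the mean level of the reference genes."""
    scale = sum(expression[g] for g in reference_genes) / len(reference_genes)
    return {g: v / scale for g, v in expression.items()}


def signature(treated, untreated, reference_genes):
    """log2(treated / untreated) per non-reference gene, after normalization."""
    refs = set(reference_genes)
    t = normalize(treated, reference_genes)
    u = normalize(untreated, reference_genes)
    return {g: math.log2(t[g] / u[g]) for g in t if g in u and g not in refs}


def pearson(sig_a, sig_b):
    """Pearson correlation over the genes shared by two signatures."""
    genes = sorted(set(sig_a) & set(sig_b))
    xs = [sig_a[g] for g in genes]
    ys = [sig_b[g] for g in genes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a signature with no variation cannot be correlated")
    return cov / (sx * sy)
```

A correlation near 1 between a test compound's signature and that of a known toxicant would suggest shared properties; note that the matching needs no knowledge of gene function, consistent with the passage above.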
  • the toxicity of a test compound is assessed by treating a biological sample containing nucleic acids with the test compound.
  • Nucleic acids that are expressed in the treated biological sample are hybridized with one or more probes specific to the polynucleotides of the present invention, so that transcript levels corresponding to the polynucleotides of the present invention may be quantified.
  • the transcript levels in the treated biological sample are compared with levels in an untreated biological sample. Differences in the transcript levels between the two samples are indicative of a toxic response caused by the test compound in the treated sample.
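The treated-versus-untreated comparison can be sketched as a log2 fold change per transcript. The threshold `min_abs_log2` and the `pseudocount` (added so that zero counts do not cause division by zero) are hypothetical parameters; the source does not state how large a difference must be to indicate a toxic response.

```python
import math


def changed_transcripts(treated, untreated, min_abs_log2=1.0, pseudocount=1.0):
    """Return {gene: log2 fold change} for genes measured in both samples whose
    absolute log2 fold change is at least `min_abs_log2`."""
    changes = {}
    for gene in treated.keys() & untreated.keys():
        ratio = (treated[gene] + pseudocount) / (untreated[gene] + pseudocount)
        lfc = math.log2(ratio)
        if abs(lfc) >= min_abs_log2:
            changes[gene] = lfc
    return changes
```

With the default threshold, only transcripts that at least double or halve after treatment are reported.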
  • proteome refers to the global pattern of protein expression in a particular tissue or cell type.
  • proteome expression patterns, or profiles are analyzed by quantifying the number of expressed proteins and their relative abundance under given conditions and at a given time.
  • a profile of a cell's proteome may thus be generated by separating and analyzing the polypeptides of a particular tissue or cell type.
  • the separation is achieved using two-dimensional gel electrophoresis, in which proteins from a sample are separated by isoelectric focusing in the first dimension, and then according to molecular weight by sodium dodecyl sulfate slab gel electrophoresis in the second dimension (Steiner and Anderson, supra).
  • the proteins are visualized in the gel as discrete and uniquely positioned spots, typically by staining the gel with an agent such as Coomassie Blue or silver or fluorescent stains.
  • the optical density of each protein spot is generally proportional to the level of the protein in the sample.
  • the optical densities of equivalently positioned protein spots from different samples, for example, from biological samples either treated or untreated with a test compound or therapeutic agent, are compared to identify any changes in protein spot density related to the treatment.
  • the proteins in the spots are partially sequenced using, for example, standard methods employing chemical or enzymatic cleavage followed by mass spectrometry.
  • the identity of the protein in a spot may be determined by comparing its partial sequence, preferably of at least 5 contiguous amino acid residues, to the polypeptide sequences of the present invention. In some cases, further sequence data may be obtained for definitive protein identification.
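Comparing a partial sequence of at least 5 contiguous residues to a set of polypeptide sequences can be sketched as an exact substring search. The function name and the dictionary layout are hypothetical; a real comparison would usually tolerate sequencing ambiguities, which this sketch does not.

```python
def match_partial_sequence(partial, polypeptides, min_length=5):
    """Return the names of polypeptides containing `partial` as an exact,
    contiguous stretch. `polypeptides` maps name -> amino acid sequence."""
    partial = partial.upper()
    if len(partial) < min_length:
        raise ValueError(f"need at least {min_length} contiguous residues")
    return [name for name, seq in polypeptides.items()
            if partial in seq.upper()]
```

A short match that hits several polypeptides is ambiguous, which is why further sequence data may be needed for definitive identification.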
  • a proteomic profile may also be generated using antibodies specific for MDDT to quantify the levels of MDDT expression.
  • the antibodies are used as elements on a microarray, and protein expression levels are quantified by exposing the microarray to the sample and detecting the levels of protein bound to each array element (Lueking, A. et al. (1999) Anal. Biochem. 270:103-111; Mendoza, L. G. et al. (1999) Biotechniques 27:778-88). Detection may be performed by a variety of methods known in the art, for example, by reacting the proteins in the sample with a thiol- or amino-reactive fluorescent compound and detecting the amount of fluorescence bound at each array element.
  • Toxicant signatures at the proteome level are also useful for toxicological screening, and should be analyzed in parallel with toxicant signatures at the transcript level.
  • There is a poor correlation between transcript and protein abundances for some proteins in some tissues (Anderson, N. L. and Seilhamer, J. (1997) Electrophoresis 18:533-537), so proteome toxicant signatures may be useful in the analysis of compounds which do not significantly affect the transcript image, but which alter the proteomic profile.
  • the analysis of transcripts in body fluids is difficult, due to rapid degradation of mRNA, so proteomic profiling may be more reliable and informative in such cases.
  • the toxicity of a test compound is assessed by treating a biological sample containing proteins with the test compound.
  • Proteins that are expressed in the treated biological sample are separated so that the amount of each protein can be quantified.
  • the amount of each protein is compared to the amount of the corresponding protein in an untreated biological sample. A difference in the amount of protein between the two samples is indicative of a toxic response to the test compound in the treated sample.
  • Individual proteins are identified by sequencing the amino acid residues of the individual proteins and comparing these partial sequences to the MDDT encoded by polynucleotides of the present invention.
  • the toxicity of a test compound is assessed by treating a biological sample containing proteins with the test compound. Proteins from the biological sample are incubated with antibodies specific to the MDDT encoded by polynucleotides of the present invention. The amount of protein recognized by the antibodies is quantified. The amount of protein in the treated biological sample is compared with the amount in an untreated biological sample. A difference in the amount of protein between the two samples is indicative of a toxic response to the test compound in the treated sample.
  • Transcript images may be used to profile mddt expression in distinct tissue types. This process can be used to determine disease detection and treatment molecule activity in a particular tissue type relative to this activity in a different tissue type. Transcript images may be used to generate a profile of mddt expression characteristic of diseased tissue. Transcript images of tissues before and after treatment may be used for diagnostic purposes, to monitor the progression of disease, and to monitor the efficacy of drug treatments for diseases which affect the activity of disease detection and treatment molecules.
  • Transcript images of cell lines can be used to assess disease detection and treatment molecule activity and/or to identify cell lines that lack or misregulate this activity. Such cell lines may then be treated with pharmaceutical agents, and a transcript image following treatment may indicate the efficacy of these agents in restoring desired levels of this activity. A similar approach may be used to assess the toxicity of pharmaceutical agents as reflected by undesirable changes in disease detection and treatment molecule activity. Candidate pharmaceutical agents may be evaluated by comparing their associated transcript images with those of pharmaceutical agents of known effectiveness.
  • Antisense Molecules
  • The polynucleotides of the present invention are useful in antisense technology. Antisense technology or therapy relies on the modulation of expression of a target protein through the specific binding of an antisense sequence to a target sequence encoding the target protein or directing its expression.
  • Agrawal, S., ed. (1996) Antisense Therapeutics, Humana Press Inc., Totowa NJ; Alama, A. et al. (1997) Pharmacol. Res. 36(3):171-178; Crooke, S.T. (1997) Adv. Pharmacol. 40:1-49; Sharma, H.W. and R.
  • An antisense sequence is a polynucleotide sequence capable of specifically hybridizing to at least a portion of the target sequence. Antisense sequences bind to cellular mRNA and/or genomic DNA, affecting translation and/or transcription. Antisense sequences can be DNA, RNA, or nucleic acid mimics and analogs. (See, e.g., Rossi, J.J. et al. (1991) Antisense Res. Dev. 1(3):285-288; Lee, R. et al.
  • the binding which results in modulation of expression occurs through hybridization or binding of complementary base pairs.
  • Antisense sequences can also bind to DNA duplexes through specific interactions in the major groove of the double helix.
  • the polynucleotides of the present invention and fragments thereof can be used as antisense sequences to modify the expression of the polypeptide encoded by mddt.
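Because an antisense sequence hybridizes through complementary base pairing, a DNA antisense oligonucleotide for a chosen target region is the reverse complement of that region. The sketch below assumes the target is given 5' to 3' as a string of A, C, G, and T or U; the function name and the choice of region are hypothetical, and characters outside that alphabet are passed through unchanged.

```python
# Map each target base (DNA or RNA) to its complementary DNA base.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def antisense_dna(target, start, length):
    """Return the DNA antisense sequence (5' to 3') for
    target[start:start + length]."""
    if start < 0:
        raise ValueError("start must be non-negative")
    region = target[start:start + length]
    if len(region) != length:
        raise ValueError("requested region extends past the target")
    return region.translate(_COMPLEMENT)[::-1].upper()
```

For example, the region `AUGGCC` of an mRNA gives the antisense DNA `GGCCAT`.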
  • the antisense sequences can be produced ex vivo, such as by using any of the ABI nucleic acid synthesizer series (PE Biosystems) or other automated systems known in the art.
  • Antisense sequences can also be produced biologically, such as by transforming an appropriate host cell with an expression vector containing the sequence of interest. (See, e.g., Agrawal, supra.) In therapeutic use, any gene delivery system suitable for introduction of the antisense sequences into appropriate target cells can be used.
  • Antisense sequences can be delivered intracellularly in the form of an expression plasmid which, upon transcription, produces a sequence complementary to at least a portion of the cellular sequence encoding the target protein.
  • Antisense sequences can also be introduced intracellularly through the use of viral vectors, such as retrovirus and adeno-associated virus vectors.
  • the nucleotide sequences encoding MDDT or fragments thereof may be inserted into an appropriate expression vector, i.e., a vector which contains the necessary elements for transcriptional and translational control of the inserted coding sequence in a suitable host.
  • Methods which are well known to those skilled in the art may be used to construct expression vectors containing sequences encoding MDDT and appropriate transcriptional and translational control elements. These methods include in vitro recombinant DNA techniques, synthetic techniques, and in vivo genetic recombination. (See, e.g., Sambrook, supra, Chapters 4, 8, 16, and 17; and Ausubel, supra, Chapters 9, 10, 13, and 16.)
  • a variety of expression vector/host systems may be utilized to contain and express sequences encoding MDDT. These include, but are not limited to, microorganisms such as bacteria transformed with recombinant bacteriophage, plasmid, or cosmid DNA expression vectors; yeast transformed with yeast expression vectors; insect cell systems infected with viral expression vectors (e.g., baculovirus); plant cell systems transformed with viral expression vectors (e.g., cauliflower mosaic virus, CaMV, or tobacco mosaic virus, TMV) or with bacterial expression vectors (e.g., Ti or pBR322 plasmids); or animal (mammalian) cell systems.
  • Expression vectors derived from retroviruses, adenoviruses, or herpes or vaccinia viruses, or from various bacterial plasmids may be used for delivery of nucleotide sequences to the targeted organ, tissue, or cell population.
  • the invention is not limited by the host cell employed.
  • sequences encoding MDDT can be transformed into cell lines using expression vectors which may contain viral origins of replication and/or endogenous expression elements and a selectable marker gene on the same or on a separate vector. Any number of selection systems may be used to recover transformed cell lines.
  • the mddt of the invention may be used for somatic or germline gene therapy.
  • Gene therapy may be performed to (i) correct a genetic deficiency (e.g., in the cases of severe combined immunodeficiency (SCID)-X1 disease characterized by X-linked inheritance (Cavazzana-Calvo, M. et al. (2000) Science 288:669-672), severe combined immunodeficiency syndrome associated with an inherited adenosine deaminase (ADA) deficiency (Blaese, R.M. et al. (1995) Science 270:475-480; Bordignon, C. et al.
  • mddt hepatitis B or C virus
  • fungal parasites such as Candida albicans and Paracoccidioides brasiliensis
  • protozoan parasites such as Plasmodium falciparum and Trypanosoma cruzi
  • diseases or disorders caused by deficiencies in mddt are treated by constructing mammalian expression vectors comprising mddt and introducing these vectors by mechanical means into mddt-deficient cells.
  • Mechanical transfer technologies for use with cells in vivo or ex vitro include (i) direct DNA microinjection into individual cells, (ii) ballistic gold particle delivery, (iii) liposome-mediated transfection, (iv) receptor-mediated gene transfer, and (v) the use of DNA transposons (Morgan, R.A. and Anderson, W.F. (1993) Annu. Rev. Biochem. 62:191-217; Ivics, Z. (1997) Cell 91:501-510; Boulay, J-L. and Recipon, H. (1998) Curr. Opin. Biotechnol. 9:445-450).
  • Expression vectors that may be effective for the expression of mddt include, but are not limited to, the PCDNA 3.1, EPITAG, PRCCMV2, PREP, PVAX vectors (Invitrogen, Carlsbad CA), PCMV-SCRIPT, PCMV-TAG, PEGSH/PERV (Stratagene, La Jolla CA), and PTET-OFF, PTET-ON, PTRE2, PTRE2-LUC, PTK-HYG (Clontech, Palo Alto CA).
  • the mddt of the invention may be expressed using (i) a constitutively active promoter, (e.g., from cytomegalovirus (CMV), Rous sarcoma virus (RSV), SV40 virus, thymidine kinase (TK), or β-actin genes), (ii) an inducible promoter (e.g., the tetracycline-regulated promoter (Gossen, M. and Bujard, H. (1992) Proc. Natl. Acad. Sci. U.S.A. 89:5547-5551; Gossen, M. et al., (1995) Science 268:1766-1769; Rossi, F.M.V. and Blau, H.M.
  • TRANSFECTION KIT available from Invitrogen
  • transformation is performed using the calcium phosphate method (Graham, F.L. and Eb, A.J. (1973) Virology 52:456-467), or by electroporation (Neumann, E. et al. (1982) EMBO J. 1:841-845).
  • the introduction of DNA to primary cells requires modification of these standardized mammalian transfection protocols.
  • diseases or disorders caused by genetic defects with respect to mddt expression are treated by constructing a retrovirus vector consisting of (i) mddt under the control of an independent promoter or the retrovirus long terminal repeat (LTR) promoter, (ii) appropriate RNA packaging signals, and (iii) a Rev-responsive element (RRE) along with additional retrovirus cis-acting RNA sequences and coding sequences required for efficient vector propagation.
  • Retrovirus vectors (e.g., PFB and PFBNEO) are commercially available (Stratagene) and are based on published data (Riviere, I. et al. (1995) Proc. Natl. Acad. Sci. U.S.A.
  • the vector is propagated in an appropriate vector producing cell line (VPCL) that expresses an envelope gene with a tropism for receptors on the target cells or a promiscuous envelope protein such as VSVg (Armentano, D. et al. (1987) J. Virol. 61:1647-1650; Bender, M.A. et al. (1987) J. Virol. 61:1639-1646; Adam, M.A. and Miller, A.D. (1988) J. Virol. 62:3802-3806; Dull, T. et al. (1998) J. Virol. 72:8463-8471; Zufferey, R.
  • U.S. Patent Number 5,910,434 to Rigg discloses a method for obtaining retrovirus packaging cell lines and is hereby incorporated by reference. Propagation of retrovirus vectors, transduction of a population of cells (e.g., CD4+ T-cells), and the return of transduced cells to a patient are procedures well known to persons skilled in the art of gene therapy and have been well documented (Ranga, U. et al. (1997) J. Virol. 71:7020-7029; Bauer, G. et al. (1997) Blood 89:2259-2267;
  • an adenovirus-based gene therapy delivery system is used to deliver mddt to cells which have one or more genetic abnormalities with respect to the expression of mddt.
  • the construction and packaging of adenovirus-based vectors are well known to those with ordinary skill in the art. Replication defective adenovirus vectors have proven to be versatile for importing genes encoding immunoregulatory proteins into intact islets in the pancreas (Csete, M.E. et al. (1995) Transplantation 27:263-268). Potentially useful adenoviral vectors are described in U.S. Patent Number 5,707,618 to Armentano ("Adenovirus vectors for gene therapy"), hereby incorporated by reference.
  • a herpes-based gene therapy delivery system is used to deliver mddt to target cells which have one or more genetic abnormalities with respect to the expression of mddt.
  • HSV herpes simplex virus
  • Patent Number 5,804,413 teaches the use of recombinant HSV d92 which consists of a genome containing at least one exogenous gene to be transferred to a cell under the control of the appropriate promoter for purposes including human gene therapy. Also taught by this patent are the construction and use of recombinant HSV strains deleted for ICP4, ICP27 and ICP22. For HSV vectors, see also Goins, W.F. et al. (1999) J. Virol. 73:519-532 and Xu, H. et al. (1994) Dev. Biol. 163:152-161, hereby incorporated by reference.
  • The manipulation of cloned herpesvirus sequences, the generation of recombinant virus following the transfection of multiple plasmids containing different segments of the large herpesvirus genomes, the growth and propagation of herpesvirus, and the infection of cells with herpesvirus are techniques well known to those of ordinary skill in the art.
  • an alphavirus (positive, single-stranded RNA virus) vector is used to deliver mddt to target cells.
  • the biology of the prototypic alphavirus, Semliki Forest Virus (SFV) has been studied extensively and gene transfer vectors have been based on the SFV genome (Garoff, H. and Li, K-J. (1998) Curr. Opin. Biotech.
  • RNA that normally encodes the viral capsid proteins. This subgenomic RNA replicates to higher levels than the full-length genomic RNA, resulting in the overproduction of capsid proteins relative to the viral proteins with enzymatic activity (e.g., protease and polymerase).
  • inserting mddt into the alphavirus genome in place of the capsid-coding region results in the production of a large number of mddt RNAs and the synthesis of high levels of MDDT in vector transduced cells.
  • alphavirus infection is typically associated with cell lysis within a few days
  • the ability to establish a persistent infection in hamster normal kidney cells (BHK-21) with a variant of Sindbis virus (SIN) indicates that the lytic replication of alphaviruses can be altered to suit the needs of the gene therapy application (Dryga, S.A. et al. (1997) Virology 228:74-83).
  • the wide host range of alphaviruses will allow the introduction of mddt into a variety of cell types.
  • the specific transduction of a subset of cells in a population may require the sorting of cells prior to transduction.
  • the methods of manipulating infectious cDNA clones of alphaviruses, performing alphavirus cDNA and RNA transfections, and performing alphavirus infections, are well known to those with ordinary skill in the art.
  • Anti-MDDT antibodies may be used to analyze protein expression levels. Such antibodies include, but are not limited to, polyclonal, monoclonal, chimeric, single chain, and Fab fragments. For descriptions of and protocols of antibody technologies, see, e.g., Pound J.D. (1998) Immunochemical Protocols, Humana Press, Totowa, NJ.
  • amino acid sequence encoded by the mddt of the Sequence Listing may be analyzed by appropriate software (e.g., LASERGENE NAVIGATOR software, DNASTAR) to determine regions of high immunogenicity.
  • the optimal sequences for immunization are selected from the C-terminus, the N-terminus, and those intervening, hydrophilic regions of the polypeptide which are likely to be exposed to the external environment when the polypeptide is in its natural conformation. Analysis used to select appropriate epitopes is also described by Ausubel (1997, supra. Chapter 11.7).
  • Peptides used for antibody induction do not need to have biological activity; however, they must be antigenic.
  • Peptides used to induce specific antibodies may have an amino acid sequence consisting of at least five amino acids, preferably at least 10 amino acids, and most preferably 15 amino acids.
  • a peptide which mimics an antigenic fragment of the natural polypeptide may be fused with another protein such as keyhole limpet hemocyanin (KLH; Sigma, St. Louis MO) for antibody production.
  • a peptide encompassing an antigenic region may be expressed from an mddt, synthesized as described above, or purified from human cells.
  • mice, goats, and rabbits may be immunized by injection with a peptide.
  • various adjuvants may be used to increase immunological response.
  • peptides about 15 residues in length may be synthesized using an ABI 431A peptide synthesizer (PE Biosystems) using Fmoc chemistry and coupled to KLH (Sigma) by reaction with m-maleimidobenzoyl-N-hydroxysuccinimide ester (Ausubel, 1995, supra).
  • Rabbits are immunized with the peptide-KLH complex in complete Freund's adjuvant.
  • the resulting antisera are tested for antipeptide activity by binding the peptide to plastic, blocking with 1% bovine serum albumin (BSA), reacting with rabbit antisera, washing, and reacting with radioiodinated goat anti-rabbit IgG.
  • Antisera with antipeptide activity are tested for anti-MDDT activity using protocols well known in the art, including ELISA, radioimmunoassay (RIA), and immunoblotting.
  • isolated and purified peptide may be used to immunize mice (about 100 µg of peptide) or rabbits (about 1 mg of peptide). Subsequently, the peptide is radioiodinated and used to screen the immunized animals' B-lymphocytes for production of antipeptide antibodies. Positive cells are then used to produce hybridomas using standard techniques. About 20 mg of peptide is sufficient for labeling and screening several thousand clones.
  • Hybridomas of interest are detected by screening with radioiodinated peptide to identify those fusions producing peptide-specific monoclonal antibody.
  • wells of a multi-well plate (FAST, Becton-Dickinson, Palo Alto, CA) are coated with affinity-purified, specific rabbit-anti-mouse (or suitable anti-species) IgG antibodies at 10 mg/ml.
  • the coated wells are blocked with 1% BSA and washed and exposed to supernatants from hybridomas. After incubation, the wells are exposed to radiolabeled peptide at 1 mg/ml.
  • Clones producing antibodies bind a quantity of labeled peptide that is detectable above background. Such clones are expanded and subjected to 2 cycles of cloning.
  • Cloned hybridomas are injected into pristane-treated mice to produce ascites, and monoclonal antibody is purified from the ascitic fluid by affinity chromatography on protein A (Amersham Pharmacia Biotech). Several procedures for the production of monoclonal antibodies, including in vitro production, are described in Pound (supra). Monoclonal antibodies with antipeptide activity are tested for anti-MDDT activity using protocols well known in the art, including ELISA, RIA, and immunoblotting.
  • Antibody fragments containing specific binding sites for an epitope may also be generated.
  • such fragments include, but are not limited to, the F(ab')2 fragments produced by pepsin digestion of the antibody molecule, and the Fab fragments generated by reducing the disulfide bridges of the F(ab')2 fragments.
  • construction of Fab expression libraries in filamentous bacteriophage allows rapid and easy identification of monoclonal fragments with desired specificity (Pound, supra. Chaps. 45-47).
  • Antibodies generated against polypeptide encoded by mddt can be used to purify and characterize full-length MDDT protein and its activity, binding partners, etc.
  • Anti-MDDT antibodies may be used in assays to quantify the amount of MDDT found in a particular human cell. Such assays include methods utilizing the antibody and a label to detect expression level under normal or disease conditions.
  • the peptides and antibodies of the invention may be used with or without modification or labeled by joining them, either covalently or noncovalently, with a reporter molecule.
  • Protocols for detecting and measuring protein expression using either polyclonal or monoclonal antibodies are well known in the art. Examples include ELISA, RIA, and fluorescence-activated cell sorting (FACS). Such immunoassays typically involve the formation of complexes between the MDDT and its specific antibody and the measurement of such complexes. These and other assays are described in Pound (supra).
  • RNA was treated with DNase.
  • poly(A+) RNA was isolated using oligo d(T)-coupled paramagnetic particles (Promega Corporation (Promega), Madison WI), OLIGOTEX latex particles (QIAGEN, Inc. (QIAGEN), Valencia CA), or an OLIGOTEX mRNA purification kit (QIAGEN).
  • RNA was isolated directly from tissue lysates using other RNA isolation kits, e.g., the POLY(A)PURE mRNA purification kit (Ambion, Inc., Austin TX).
  • RNA was provided with RNA and constructed the corresponding cDNA libraries.
  • cDNA was synthesized and cDNA libraries were constructed with the UNIZAP vector system (Stratagene Cloning Systems, Inc. (Stratagene), La Jolla CA) or SUPERSCRIPT plasmid system (Life Technologies), using the recommended procedures or similar methods known in the art. (See, e.g., Ausubel, 1997, supra. Chapters 5.1 through 6.6.) Reverse transcription was initiated using oligo d(T) or random primers. Synthetic oligonucleotide adapters were ligated to double stranded cDNA, and the cDNA was digested with the appropriate restriction enzyme or enzymes. For most libraries, the cDNA was size-selected (300-1000 bp) using SEPHACRYL S1000, SEPHAROSE CL2B, or SEPHAROSE CL4B column chromatography (Amersham Pharmacia
  • cDNAs were ligated into compatible restriction enzyme sites of the polylinker of a suitable plasmid, e.g., PBLUESCRIPT plasmid (Stratagene), pSPORT1 plasmid (Life Technologies), or pINCY (Incyte).
  • Recombinant plasmids were transformed into competent E. coli cells including XL1-Blue, XL1-BlueMRF, or SOLR from Stratagene or DH5α, DH10B, or ElectroMAX DH10B from Life Technologies.
  • Plasmids were recovered from host cells by in vivo excision using the UNIZAP vector system (Stratagene) or by cell lysis. Plasmids were purified using at least one of the following: the Magic or WIZARD Minipreps DNA purification system (Promega); the AGTC Miniprep purification kit (Edge BioSystems, Gaithersburg MD); and the QIAWELL 8, QIAWELL 8 Plus, and QIAWELL 8 Ultra plasmid purification systems or the R.E.A.L. PREP 96 plasmid purification kit (QIAGEN).
  • Plasmids were resuspended in 0.1 ml of distilled water and stored, with or without lyophilization, at 4°C.
  • plasmid DNA was amplified from host cell lysates using direct link PCR in a high-throughput format.
  • Host cell lysis and thermal cycling steps were carried out in a single reaction mixture. Samples were processed and stored in 384-well plates, and the concentration of amplified plasmid DNA was quantified fluorometrically using PICOGREEN dye (Molecular Probes, Inc. (Molecular Probes), Eugene OR) and a FLUOROSKAN II fluorescence scanner (Labsystems Oy, Helsinki, Finland).
  • cDNA sequencing reactions were processed using standard methods or high-throughput instrumentation such as the ABI CATALYST 800 thermal cycler (PE Biosystems) or the PTC-200 thermal cycler (MJ Research) in conjunction with the HYDRA microdispenser (Robbins Scientific Corp., Sunnyvale CA) or the MICROLAB 2200 liquid transfer system (Hamilton).
  • cDNA sequencing reactions were prepared using reagents provided by Amersham Pharmacia Biotech or supplied in ABI sequencing kits such as the ABI PRISM BIGDYE Terminator cycle sequencing ready reaction kit (PE Biosystems).
  • Electrophoretic separation of cDNA sequencing reactions and detection of labeled polynucleotides were carried out using the MEGABACE 1000 DNA sequencing system (Molecular Dynamics); the ABI PRISM 373 or 377 sequencing system (PE Biosystems) in conjunction with standard ABI protocols and base calling software; or other sequence analysis systems known in the art. Reading frames within the cDNA sequences were identified using standard methods (reviewed in Ausubel, 1997, supra. Chapter 1.1). Some of the cDNA sequences were selected for extension using the techniques disclosed in Example VIII.
  • Component sequences from chromatograms were subject to PHRED analysis and assigned a quality score.
  • the sequences having at least a required quality score were subject to various preprocessing editing pathways to eliminate, e.g., low quality 3' ends, vector and linker sequences, polyA tails, Alu repeats, mitochondrial and ribosomal sequences, bacterial contamination sequences, and sequences smaller than 50 base pairs.
  • low-information sequences and repetitive elements e.g., dinucleotide repeats, Alu repeats, etc.
  • sequences were then subject to assembly procedures in which the sequences were assigned to gene bins (bins). Each sequence could only belong to one bin. Sequences in each gene bin were assembled to produce consensus sequences (templates). Subsequent new sequences were added to existing bins using BLASTn (v.1.4 WashU) and CROSSMATCH. Candidate pairs were identified as all BLAST hits having a quality score greater than or equal to 150. Alignments of at least 82% local identity were accepted into the bin. The component sequences from each bin were assembled using a version of PHRAP. Bins with several overlapping component sequences were assembled using DEEP PHRAP.
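The acceptance criteria stated above (BLAST quality score of at least 150, local identity of at least 82%, one bin per sequence) can be illustrated with a minimal sketch. The `Hit` record and the `choose_bin` function are hypothetical names introduced here for illustration; the tie-breaking rule (take the highest-scoring qualifying hit) is an assumption and is not stated in the text.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_BLAST_SCORE = 150       # candidate pairs: BLAST quality score >= 150 (from the text)
MIN_LOCAL_IDENTITY = 82.0   # alignments of at least 82% local identity are accepted


@dataclass
class Hit:
    """Hypothetical record for one BLAST hit of a new sequence against an existing bin."""
    bin_id: str
    blast_score: float
    local_identity: float  # percent


def choose_bin(hits: List[Hit]) -> Optional[str]:
    """Return the bin a new sequence joins, or None if no hit qualifies.

    Assumption: when several bins qualify, the highest-scoring hit wins,
    which keeps each sequence in exactly one bin.
    """
    qualifying = [
        h for h in hits
        if h.blast_score >= MIN_BLAST_SCORE and h.local_identity >= MIN_LOCAL_IDENTITY
    ]
    if not qualifying:
        return None
    return max(qualifying, key=lambda h: h.blast_score).bin_id
```

A sequence for which `choose_bin` returns `None` would presumably seed a new bin, although the text does not say so explicitly.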
  • the orientation of each assembled template was determined based on the number and orientation of its component sequences. Template sequences as disclosed in the sequence listing correspond to sense strand sequences (the "forward" reading frames), to the best determination. The complementary (antisense) strands are inherently disclosed herein.
  • the component sequences which were used to assemble each template consensus sequence are listed in Tables 4A and 4B, along with their positions along the template nucleotide sequences. Bins were compared against each other and those having local similarity of at least 82% were combined and reassembled. Reassembled bins having templates of insufficient overlap (less than 95% local identity) were re-split.
  • Assembled templates were also subject to analysis by STITCHER/EXON MAPPER algorithms which analyze the probabilities of the presence of splice variants, alternatively spliced exons, splice junctions, differential expression of alternative spliced genes across tissue types or disease states, etc. These resulting bins were subject to several rounds of the above assembly procedures.
  • bins were clone joined based upon clone information. If the 5' sequence of one clone was present in one bin and the 3' sequence from the same clone was present in a different bin, it was likely that the two bins actually belonged together in a single bin. The resulting combined bins underwent assembly procedures to regenerate the consensus sequences.
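Clone joining as described above amounts to merging any two bins that hold the 5' and 3' reads of the same clone. The sketch below uses a small union-find structure for this; the function name `clone_join` and the input layout (clone id mapped to a pair of bin ids) are assumptions made for illustration, and the reassembly step that follows the merge is not shown.

```python
from typing import Dict, Tuple


def clone_join(read_bins: Dict[str, Tuple[str, str]]) -> Dict[str, str]:
    """Merge bins linked by a shared clone.

    read_bins maps a clone id to (bin of its 5' read, bin of its 3' read).
    Returns a mapping from every bin id seen to the representative bin id
    of its merged group.
    """
    parent: Dict[str, str] = {}

    def find(b: str) -> str:
        parent.setdefault(b, b)
        while parent[b] != b:
            parent[b] = parent[parent[b]]  # path halving
            b = parent[b]
        return b

    for five_prime_bin, three_prime_bin in read_bins.values():
        root_a, root_b = find(five_prime_bin), find(three_prime_bin)
        if root_a != root_b:
            # Keep the lexicographically smaller id as representative (arbitrary choice).
            parent[max(root_a, root_b)] = min(root_a, root_b)

    return {b: find(b) for b in list(parent)}
```

Because the text says only that such bins "likely" belong together, a real pipeline would still need the subsequent assembly to confirm the join.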
  • the template sequences were further analyzed by translating each template in all three forward reading frames and searching each translation against the Pfam database of hidden Markov model-based protein families and domains using the HMMER software package (available to the public from Washington University School of Medicine, St. Louis MO). Regions of templates which, when translated, contain similarity to Pfam consensus sequences are reported in Table 2, along with descriptions of Pfam protein domains and families. Only those Pfam hits with an E-value of ≤ 1 x 10⁻³ are reported. (See also World Wide Web site http://pfam.wustl.edu/ for detailed descriptions of Pfam protein domains and families.)
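The first step above, translation of a template in all three forward reading frames, can be sketched as follows. This is an illustrative stand-alone example using the standard genetic code; the function names are hypothetical, unrecognized codons are rendered as "X" by assumption, and the subsequent HMMER search against Pfam is not reproduced here.

```python
from itertools import product
from typing import List

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, frame: int) -> str:
    """Translate one forward frame (0, 1, or 2); '*' marks a stop codon."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_forward_frames(seq: str) -> List[str]:
    """Return the translations of all three forward reading frames."""
    return [translate(seq, frame) for frame in range(3)]
```

For example, `three_forward_frames("ATGGCCTAA")` yields `["MA*", "WP", "GL"]`; each translation would then be submitted to the domain search separately.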
  • HMMER analysis as reported in Tables 2 and 3 may support the results of BLAST analysis as reported in Table 1 or may suggest alternative or additional properties of template-encoded polypeptides not previously uncovered by BLAST or other analyses.
  • Template sequences are further analyzed using the bioinformatics tools listed in Table 6, or using sequence analysis software known in the art such as MACDNASIS PRO software (Hitachi Software Engineering, South San Francisco CA) and LASERGENE software (DNASTAR). Template sequences may be further queried against public databases such as the GenBank rodent, mammalian, vertebrate, prokaryote, and eukaryote databases.
  • Northern analysis is a laboratory technique used to detect the presence of a transcript of a gene and involves the hybridization of a labeled nucleotide sequence to a membrane on which RNAs from a particular cell type or tissue have been bound.
  • (See, e.g., Sambrook, supra, ch. 7; Ausubel, 1995, supra, ch. 4 and 16.)
  • Analogous computer techniques applying BLAST were used to search for identical or related molecules in cDNA databases such as GenBank or LIFESEQ (Incyte Genomics). This analysis is much faster than multiple membrane-based hybridizations.
  • the sensitivity of the computer search can be modified to determine whether any particular match is categorized as exact or similar.
  • the basis of the search is the product score, which is defined as: (BLAST score x percent identity) / (5 x minimum {length(Seq. 1), length(Seq. 2)}).
  • the product score takes into account both the degree of similarity between two sequences and the length of the sequence match.
  • the product score is a normalized value between 0 and 100, and is calculated as follows: the BLAST score is multiplied by the percent nucleotide identity and the product is divided by (5 times the length of the shorter of the two sequences).
  • the BLAST score is calculated by assigning a score of +5 for every base that matches in a high-scoring segment pair (HSP), and -4 for every mismatch. Two sequences may share more than one HSP (separated by gaps). If there is more than one HSP, then the pair with the highest BLAST score is used to calculate the product score.
  • the product score represents a balance between fractional overlap and quality in a BLAST alignment.
  • a product score of 100 is produced only for 100% identity over the entire length of the shorter of the two sequences being compared.
  • a product score of 70 is produced either by 100% identity and 70% overlap at one end, or by 88% identity and 100% overlap at the other.
  • a product score of 50 is produced either by 100% identity and 50% overlap at one end, or 79% identity and 100% overlap.
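The product score described above can be checked numerically. The sketch below is a minimal illustration that models a single ungapped HSP by its match and mismatch counts (+5 per match, -4 per mismatch) and applies the stated formula; the function names and the ungapped simplification are assumptions made for the example.

```python
def blast_score(matches: int, mismatches: int) -> int:
    """Score of one HSP: +5 for every matching base, -4 for every mismatch."""
    return 5 * matches - 4 * mismatches


def product_score(matches: int, mismatches: int, shorter_length: int) -> float:
    """BLAST score x percent identity / (5 x length of the shorter sequence)."""
    aligned = matches + mismatches
    if aligned == 0 or shorter_length <= 0:
        raise ValueError("need a non-empty alignment and a positive sequence length")
    percent_identity = 100.0 * matches / aligned
    return blast_score(matches, mismatches) * percent_identity / (5.0 * shorter_length)


# Worked cases for a shorter sequence of 100 bases:
full_match = product_score(100, 0, 100)      # 100% identity, 100% overlap
partial_overlap = product_score(70, 0, 100)  # 100% identity, 70% overlap
lower_identity = product_score(88, 12, 100)  # 88% identity, 100% overlap
```

Under these assumptions the three cases evaluate to 100, 70, and roughly 69, and 79% identity over the full length gives roughly 49, which is consistent with the rounded figures of 70 and 50 quoted in the text.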
  • a tissue distribution profile is determined for each template by compiling the cDNA library tissue classifications of its component cDNA sequences.
  • Each component sequence is derived from a cDNA library constructed from a human tissue.
  • Each human tissue is classified into one of the following categories: cardiovascular system; connective tissue; digestive system; embryonic structures; endocrine system; exocrine glands; genitalia, female; genitalia, male; germ cells; hemic and immune system; liver; musculoskeletal system; nervous system; pancreas; respiratory system; sense organs; skin; stomatognathic system; unclassified/mixed; or urinary tract.
  • Template sequences, component sequences, and cDNA library /tissue information are found in the LIFESEQ GOLD database (Incyte Genomics, Palo Alto CA).
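The profile compilation described above, together with the reporting rules stated for Table 5 (three most frequent categories, only values above 10%), can be sketched as follows. The function name and input layout are hypothetical: the input is simply one tissue category per component sequence, and an empty result is interpreted here as "widely distributed".

```python
from collections import Counter
from typing import Iterable, List, Tuple


def tissue_profile(
    component_categories: Iterable[str],
    top_n: int = 3,
    min_percent: float = 10.0,
) -> List[Tuple[str, float]]:
    """Return up to top_n (category, percent) pairs with percent > min_percent.

    An empty list corresponds to a "widely distributed" template, i.e. no
    tissue category exceeds the reporting threshold.
    """
    counts = Counter(component_categories)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("a template needs at least one component sequence")
    ranked = [(cat, 100.0 * n / total) for cat, n in counts.most_common(top_n)]
    return [(cat, pct) for cat, pct in ranked if pct > min_percent]
```

For a template with six liver, three nervous system, and one skin component sequence, this returns liver at 60% and nervous system at 30%; skin, at exactly 10%, falls below the strict threshold.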
  • Table 5 shows the tissue distribution profile for the templates of the invention. For each template, the three most frequently observed tissue categories are shown in column 3, along with the percentage of component sequences belonging to each category. Only tissue categories with percentage values of > 10% are shown. A tissue distribution of "widely distributed" in column 3 indicates percentage values of ≤ 10% in all tissue categories.
  • VII. Transcript Image Analysis
  • Oligonucleotide primers designed using an mddt of the Sequence Listing are used to extend the nucleic acid sequence.
  • One primer is synthesized to initiate 5' extension of the template, and the other primer, to initiate 3' extension of the template.
  • the initial primers may be designed using OLIGO 4.06 software (National Biosciences, Inc. (National Biosciences), Plymouth MN), or another appropriate program, to be about 22 to 30 nucleotides in length, to have a GC content of about 50% or more, and to anneal to the target sequence at temperatures of about 68 °C to about 72 °C. Any stretch of nucleotides which would result in hairpin structures or primer-primer dimerizations is avoided.
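Two of the design criteria above, primer length and GC content, are simple enough to check directly. The sketch below is illustrative only and is not a substitute for OLIGO 4.06 or similar software: the function names are hypothetical, and the annealing temperature, hairpin, and primer-dimer criteria are deliberately not modeled.

```python
def gc_content(primer: str) -> float:
    """Percent of G and C bases in the primer."""
    if not primer:
        raise ValueError("empty primer")
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def meets_basic_criteria(
    primer: str,
    min_len: int = 22,
    max_len: int = 30,
    min_gc: float = 50.0,
) -> bool:
    """Check length (about 22 to 30 nt) and GC content (about 50% or more)."""
    return min_len <= len(primer) <= max_len and gc_content(primer) >= min_gc
```

A candidate that passes this filter would still need to be screened for annealing temperature and secondary structure before use.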
  • Selected human cDNA libraries are used to extend the sequence. If more than one extension is necessary or desired, additional or nested sets of primers are designed. High fidelity amplification is obtained by PCR using methods well known in the art. PCR is performed in 96-well plates using the PTC-200 thermal cycler (MJ Research).
  • the reaction mix contains DNA template, 200 nmol of each primer, reaction buffer containing Mg2+, (NH4)2SO4, and β-mercaptoethanol, Taq DNA polymerase (Amersham Pharmacia Biotech), ELONGASE enzyme (Life Technologies), and Pfu DNA polymerase (Stratagene), with the following parameters for primer pair PCI A and PCI B: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68°C, 5 min; Step 7: storage at 4°C.
  • the parameters for primer pair T7 and SK+ are as follows: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 57°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6:
  • the plate is scanned in a FLUOROSKAN II (Labsystems Oy) to measure the fluorescence of the sample and to quantify the concentration of DNA.
  • a 5 µl to 10 µl aliquot of the reaction mixture is analyzed by electrophoresis on a 1% agarose mini-gel to determine which reactions are successful in extending the sequence.
  • the extended nucleotides are desalted and concentrated, transferred to 384-well plates, digested with CviJI cholera virus endonuclease (Molecular Biology Research, Madison WI), and sonicated or sheared prior to religation into pUC 18 vector (Amersham Pharmacia Biotech).
  • the digested nucleotides are separated on low concentration (0.6 to 0.8%) agarose gels, fragments are excised, and agar digested with AGAR ACE (Promega).
  • Extended clones are religated using T4 ligase (New England Biolabs, Inc., Beverly MA) into pUC 18 vector
  • the cells are lysed, and DNA is amplified by PCR using Taq DNA polymerase (Amersham Pharmacia Biotech) and Pfu DNA polymerase (Stratagene) with the following parameters: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 72°C, 2 min; Step 5: steps 2, 3, and 4 repeated 29 times; Step 6: 72°C, 5 min; Step 7: storage at 4°C. DNA is quantified by PICOGREEN reagent (Molecular Probes) as described above. Samples with low DNA recoveries are reamplified using the same conditions as described above.
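The cycling parameters above can be written down as data, which makes the total number of cycles and the programmed hold time easy to compute. The sketch below is illustrative: the names are hypothetical, "repeated 29 times" is read as one initial pass through Steps 2 to 4 plus 29 repeats (an assumption), and ramp times and the final 4°C hold are ignored.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Step:
    temp_c: int
    seconds: int


INITIAL_DENATURATION = Step(94, 180)                       # Step 1
CYCLE: Tuple[Step, ...] = (Step(94, 15), Step(60, 60), Step(72, 120))  # Steps 2-4
REPEATS = 29                                               # Step 5
FINAL_EXTENSION = Step(72, 300)                            # Step 6


def total_cycles(repeats: int = REPEATS) -> int:
    """One initial pass through Steps 2-4 plus the stated repeats (assumption)."""
    return 1 + repeats


def programmed_seconds(repeats: int = REPEATS) -> int:
    """Sum of programmed hold times, excluding ramping and the 4 degree C storage step."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return (
        INITIAL_DENATURATION.seconds
        + total_cycles(repeats) * per_cycle
        + FINAL_EXTENSION.seconds
    )
```

Under this reading the program runs 30 cycles with 6330 seconds (105.5 minutes) of programmed hold time.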
  • Samples are diluted with 20% dimethylsulfoxide (1:2, v/v), and sequenced using DYENAMIC energy transfer sequencing primers and the DYENAMIC DIRECT kit (Amersham Pharmacia Biotech) or the ABI PRISM BIGDYE Terminator cycle sequencing ready reaction kit (PE Biosystems).
  • the mddt is used to obtain regulatory sequences (promoters, introns, and enhancers) using the procedure above, oligonucleotides designed for such extension, and an appropriate genomic library.
  • Hybridization probes derived from the mddt of the Sequence Listing are employed for screening cDNAs, mRNAs, or genomic DNA. The labeling of probe nucleotides between 100 and 1000 nucleotides in length is specifically described, but essentially the same procedure may be used with larger cDNA fragments. Probe sequences are labeled at room temperature for 30 minutes using a T4 polynucleotide kinase, γ-32P-ATP, and 0.5X One-Phor-All Plus (Amersham Pharmacia Biotech) buffer and purified using a ProbeQuant G-50 Microcolumn (Amersham Pharmacia Biotech). The probe mixture is diluted to 10⁷ dpm/µg/ml hybridization buffer and used in a typical membrane-based hybridization analysis.
  • the DNA is digested with a restriction endonuclease such as Eco RV and is electrophoresed through a 0.7% agarose gel.
  • the DNA fragments are transferred from the agarose to nylon membrane (NYTRAN Plus, Schleicher & Schuell, Inc., Keene NH) using procedures specified by the manufacturer of the membrane.
  • Prehybridization is carried out for three or more hours at 68 °C, and hybridization is carried out overnight at 68 °C.
  • blots are sequentially washed at room temperature under increasingly stringent conditions, up to 0.1x saline sodium citrate (SSC) and 0.5% sodium dodecyl sulfate. After the blots are placed in a PHOSPHORIMAGER cassette (Molecular Dynamics) or are exposed to autoradiography film, hybridization patterns of standard and experimental lanes are compared. Essentially the same procedure is employed when screening RNA.
  • the cDNA sequences which were used to assemble SEQ ID NO: 1-25 are compared with sequences from the Incyte LIFESEQ database and public domain databases using BLAST and other implementations of the Smith-Waterman algorithm. Sequences from these databases that match SEQ ID NO: 1-25 are assembled into clusters of contiguous and overlapping sequences using assembly algorithms such as PHRAP (Table 6). Radiation hybrid and genetic mapping data available from public resources such as the Stanford Human Genome Center (SHGC), Whitehead Institute for Genome Research (WIGR), and Genethon are used to determine if any of the clustered sequences have been previously mapped.
  • a mapped sequence in a cluster will result in the assignment of all sequences of that cluster, including its particular SEQ ID NO:, to that map location.
  • the genetic map locations of SEQ ID NO: 1-25 are described as ranges, or intervals, of human chromosomes.
  • the map position of an interval, in centiMorgans, is measured relative to the terminus of the chromosome's p-arm.
  • The centiMorgan (cM) is a unit of measurement based on recombination frequencies between chromosomal markers.
  • On average, 1 cM is roughly equivalent to 1 megabase (Mb) of DNA in humans, although this can vary widely due to hot and cold spots of recombination.
  • the cM distances are based on genetic markers mapped by Genethon which provide boundaries for radiation hybrid markers whose sequences were included in each of the clusters.
  • Total RNA is isolated from tissue samples using the guanidinium thiocyanate method and polyA + RNA is purified using the oligo (dT) cellulose method.
  • Each polyA + RNA sample is reverse transcribed using MMLV reverse-transcriptase, 0.05 pg/µl oligo-dT primer (21mer), 1X first strand buffer, 0.03 units/µl RNase inhibitor, 500 µM dATP, 500 µM dGTP, 500 µM dTTP, 40 µM dCTP, 40 µM dCTP-Cy3 (BDS) or dCTP-Cy5 (Amersham Pharmacia Biotech).
  • The reverse transcription reaction is performed in a 25 ml volume containing 200 ng polyA + RNA with GEMBRIGHT kits (Incyte).
  • Specific control polyA + RNAs are synthesized by in vitro transcription from non-coding yeast genomic DNA (W. Lei, unpublished). As quantitative controls, the control RNAs at 0.002 ng, 0.02 ng, 0.2 ng, and 2 ng are diluted into the reverse transcription reaction at ratios of 1:100,000, 1:10,000, 1:1000, and 1:100 (w/w) to sample mRNA, respectively.
  • The control mRNAs are diluted into the reverse transcription reaction at ratios of 1:3, 3:1, 1:10, 10:1, 1:25, and 25:1 (w/w) to sample mRNA differential expression patterns.
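The quantitative control ratios can be checked with simple arithmetic. The sketch below assumes the 200 ng of polyA + RNA per reaction stated above as the sample amount; the variable and function names are illustrative only.

```python
# Sample polyA+ RNA per reverse transcription reaction, as stated in the text.
SAMPLE_MRNA_NG = 200.0

# Control RNA amounts listed in the text, in nanograms.
control_amounts_ng = [0.002, 0.02, 0.2, 2.0]


def sample_to_control_ratio(control_ng: float, sample_ng: float = SAMPLE_MRNA_NG) -> int:
    """Return N such that control:sample is 1:N by weight."""
    return round(sample_ng / control_ng)


ratios = [sample_to_control_ratio(c) for c in control_amounts_ng]
for control_ng, n in zip(control_amounts_ng, ratios):
    print(f"{control_ng} ng control in {SAMPLE_MRNA_NG:g} ng sample = 1:{n:,}")
```

The four computed ratios match the 1:100,000 through 1:100 series given for the quantitative controls.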
  • Each reaction sample (one with Cy3 and another with Cy5 labeling) is treated with 2.5 ml of 0.5M sodium hydroxide and incubated for 20 minutes at 85°C to stop the reaction and degrade the RNA.
  • Probes are purified using two successive CHROMA SPIN 30 gel filtration spin columns (CLONTECH Laboratories, Inc.
  • reaction samples are ethanol precipitated using 1 ml of glycogen (1 mg/ml), 60 ml sodium acetate, and 300 ml of 100% ethanol.
  • The probe is then dried to completion using a SpeedVAC (Savant Instruments Inc., Holbrook NY) and resuspended in 14 µl 5X SSC/0.2% SDS.
  • Sequences of the present invention are used to generate array elements.
  • Each array element is amplified from bacterial cells containing vectors with cloned cDNA inserts.
  • PCR amplification uses primers complementary to the vector sequences flanking the cDNA insert.
  • Array elements are amplified in thirty cycles of PCR from an initial quantity of 1-2 ng to a final quantity greater than 5 µg. Amplified array elements are then purified using SEPHACRYL-400 (Amersham Pharmacia Biotech).
  • Purified array elements are immobilized on polymer-coated glass slides.
  • Glass microscope slides (Corning) are cleaned by ultrasound in 0.1% SDS and acetone, with extensive distilled water washes between and after treatments.
  • Glass slides are etched in 4% hydrofluoric acid (VWR Scientific Products Corporation (VWR), West Chester, PA), washed extensively in distilled water, and coated with 0.05% aminopropyl silane (Sigma) in 95% ethanol. Coated slides are cured in a 110°C oven.
  • Array elements are applied to the coated glass substrate using a procedure described in US Patent No. 5,807,522, incorporated herein by reference.
  • 1 µl of the array element DNA is loaded into the open capillary printing element by a high-speed robotic apparatus.
  • The apparatus then deposits about 5 nl of array element sample per slide.
  • Microarrays are UV-crosslinked using a STRATALINKER UV-crosslinker (Stratagene). Microarrays are washed at room temperature once in 0.2% SDS and three times in distilled water. Non-specific binding sites are blocked by incubation of microarrays in 0.2% casein in phosphate buffered saline (PBS) (Tropix, Inc., Bedford, MA) for 30 minutes at 60° C followed by washes in 0.2% SDS and distilled water as before.
  • Hybridization reactions contain 9 µl of probe mixture consisting of 0.2 µg each of Cy3 and Cy5 labeled cDNA synthesis products in 5X SSC, 0.2% SDS hybridization buffer.
  • The probe mixture is heated to 65°C for 5 minutes and is aliquoted onto the microarray surface and covered with a 1.8 cm² coverslip.
  • The arrays are transferred to a waterproof chamber having a cavity just slightly larger than a microscope slide.
  • The chamber is kept at 100% humidity internally by the addition of 140 µl of 5X SSC in a corner of the chamber.
  • The chamber containing the arrays is incubated for about 6.5 hours at 60°C.
  • The arrays are washed for 10 min at 45°C in a first wash buffer (1X SSC,
  • Detection: Reporter-labeled hybridization complexes are detected with a microscope equipped with an Innova 70 mixed gas 10 W laser (Coherent, Inc., Santa Clara CA) capable of generating spectral lines at 488 nm for excitation of Cy3 and at 632 nm for excitation of Cy5.
  • The excitation laser light is focused on the array using a 20X microscope objective (Nikon, Inc., Melville NY).
  • The slide containing the array is placed on a computer-controlled X-Y stage on the microscope and raster-scanned past the objective.
  • The 1.8 cm x 1.8 cm array used in the present example is scanned with a resolution of 20 micrometers.
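As a quick worked example of the stated scan geometry, the number of scan positions follows directly from the array size and the resolution. The sketch assumes one sample per 20-micrometer step along each axis, which the text does not state explicitly.

```python
ARRAY_SIDE_UM = 18_000   # 1.8 cm expressed in micrometers
RESOLUTION_UM = 20       # scan resolution stated in the text

# Assumption: one sample per resolution step along each axis.
pixels_per_side = ARRAY_SIDE_UM // RESOLUTION_UM
total_pixels = pixels_per_side ** 2

print(pixels_per_side, total_pixels)
```

Under that assumption a single scan yields a 900 x 900 image, or 810,000 intensity values per fluorophore.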
  • A mixed gas multiline laser excites the two fluorophores sequentially. Emitted light is split, based on wavelength, into two photomultiplier tube detectors (PMT R1477, Hamamatsu Photonics Systems, Bridgewater NJ) corresponding to the two fluorophores. Appropriate filters positioned between the array and the photomultiplier tubes are used to filter the signals. The emission maxima of the fluorophores used are 565 nm for Cy3 and 650 nm for Cy5. Each array is typically scanned twice, one scan per fluorophore using the appropriate filters at the laser source, although the apparatus is capable of recording the spectra from both fluorophores simultaneously.
  • The sensitivity of the scans is typically calibrated using the signal intensity generated by a cDNA control species added to the probe mix at a known concentration.
  • A specific location on the array contains a complementary DNA sequence, allowing the intensity of the signal at that location to be correlated with a weight ratio of hybridizing species of 1:100,000.
  • The output of the photomultiplier tube is digitized using a 12-bit RTI-835H analog-to-digital (A/D) conversion board (Analog Devices, Inc., Norwood, MA) installed in an IBM-compatible PC computer.
  • The digitized data are displayed as an image where the signal intensity is mapped using a linear 20-color transformation to a pseudocolor scale ranging from blue (low signal) to red (high signal).
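The linear 20-color mapping of 12-bit data can be illustrated with a short sketch. The binning rule and the blue-to-red RGB ramp below are assumptions chosen for illustration; the text specifies only that the transformation is linear, uses 20 colors, and runs from blue (low) to red (high).

```python
ADC_BITS = 12
ADC_MAX = 2 ** ADC_BITS - 1   # 4095, the largest 12-bit reading
N_COLORS = 20


def pseudocolor_index(adc_value: int) -> int:
    """Map a 12-bit reading linearly onto one of 20 color bins (0..19)."""
    if not 0 <= adc_value <= ADC_MAX:
        raise ValueError("reading outside the 12-bit range")
    return adc_value * N_COLORS // (ADC_MAX + 1)


def pseudocolor_rgb(adc_value: int) -> tuple:
    """Hypothetical ramp: bin 0 is pure blue, bin 19 is pure red."""
    i = pseudocolor_index(adc_value)
    red = round(255 * i / (N_COLORS - 1))
    blue = round(255 * (N_COLORS - 1 - i) / (N_COLORS - 1))
    return (red, 0, blue)
```

With this binning each color covers about 205 consecutive A/D counts, so small intensity differences within a bin are not visible in the displayed image even though they remain in the digitized data.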
  • The data are also analyzed quantitatively. Where two different fluorophores are excited and measured simultaneously, the data are first corrected for optical crosstalk (due to overlapping emission spectra) between the fluorophores using each fluorophore's emission spectrum.
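One common way to correct two-channel crosstalk is linear unmixing, sketched below. This is an assumed model, not necessarily the correction applied here: each detector is taken to record its own fluorophore plus a fixed fraction of the other, and the leak fractions (hypothetical values in the example) would in practice be derived from the emission spectra and filter characteristics.

```python
def correct_crosstalk(measured_cy3: float, measured_cy5: float,
                      cy5_into_cy3: float, cy3_into_cy5: float) -> tuple:
    """Solve the 2x2 linear mixing model for the true channel signals.

    Assumed model:
        measured_cy3 = true_cy3 + cy5_into_cy3 * true_cy5
        measured_cy5 = cy3_into_cy5 * true_cy3 + true_cy5
    """
    det = 1.0 - cy5_into_cy3 * cy3_into_cy5
    if det == 0:
        raise ValueError("channels are not separable with these leak fractions")
    true_cy3 = (measured_cy3 - cy5_into_cy3 * measured_cy5) / det
    true_cy5 = (measured_cy5 - cy3_into_cy5 * measured_cy3) / det
    return true_cy3, true_cy5


# Hypothetical example: true signals of 100 (Cy3) and 50 (Cy5),
# with 2% of Cy5 leaking into the Cy3 channel and 10% of Cy3 into Cy5.
cy3, cy5 = correct_crosstalk(101.0, 60.0, cy5_into_cy3=0.02, cy3_into_cy5=0.10)
```

In the example the measured values of 101 and 60 are restored to approximately 100 and 50.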
  • A grid is superimposed over the fluorescence signal image such that the signal from each spot is centered in each element of the grid.
  • The fluorescence signal within each element is then integrated to obtain a numerical value corresponding to the average intensity of the signal.
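The grid-based quantification can be sketched as follows. This is an illustrative stand-in and does not reproduce the analysis software: it assumes square grid elements of a fixed pixel size aligned to the image origin, and the tiny image is made-up data.

```python
from typing import List


def grid_average_intensities(image: List[List[float]], cell_size: int) -> List[List[float]]:
    """Integrate the signal in each grid element and return its average."""
    n_rows = len(image) // cell_size
    n_cols = len(image[0]) // cell_size
    averages = []
    for grid_row in range(n_rows):
        row_averages = []
        for grid_col in range(n_cols):
            total = 0.0
            # Sum every pixel that falls inside this grid element.
            for r in range(grid_row * cell_size, (grid_row + 1) * cell_size):
                for c in range(grid_col * cell_size, (grid_col + 1) * cell_size):
                    total += image[r][c]
            row_averages.append(total / (cell_size * cell_size))
        averages.append(row_averages)
    return averages


# Made-up 4 x 4 image divided into four 2 x 2 grid elements.
image = [
    [0, 0, 4, 4],
    [0, 8, 4, 4],
    [1, 1, 0, 0],
    [1, 1, 0, 12],
]
spots = grid_average_intensities(image, 2)
```

Each entry of `spots` is the average intensity for one array element, which is the per-spot value carried into downstream comparison of the Cy3 and Cy5 channels.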
  • The software used for signal analysis is the GEMTOOLS gene expression analysis program (Incyte).
  • Sequences complementary to the mddt are used to detect, decrease, or inhibit expression of the naturally occurring nucleotide.
  • The use of oligonucleotides comprising from about 15 to 30 base pairs is typical in the art. However, smaller or larger sequence fragments can also be used.
  • Appropriate oligonucleotides are designed from the mddt using OLIGO 4.06 software (National Biosciences) or other appropriate programs and are synthesized using methods standard in the art or ordered from a commercial supplier.
  • To inhibit transcription, a complementary oligonucleotide is designed from the most unique 5' sequence and used to prevent transcription factor binding to the promoter sequence.
  • To inhibit translation, a complementary oligonucleotide is designed to prevent ribosomal binding and processing of the transcript.
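The core step of designing a complementary oligonucleotide, taking the reverse complement of a chosen target window, can be sketched as below. This is a minimal illustration and not a substitute for dedicated design software: it ignores melting temperature, secondary structure, and uniqueness, and the target sequence is a made-up example.

```python
# Translation table for DNA base complements (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def complementary_oligo(target: str, start: int, length: int = 20) -> str:
    """Return an oligonucleotide complementary to target[start:start+length].

    The default length of 20 falls within the 15 to 30 range noted in the
    text; other lengths are allowed.
    """
    window = target[start:start + length]
    if len(window) < length:
        raise ValueError("window extends past the end of the target")
    return reverse_complement(window)


# Made-up target sequence, for illustration only.
target = "ATGGCCATTGTAATGGGCCGC"
oligo = complementary_oligo(target, start=0, length=15)
```

For the example window `ATGGCCATTGTAATG`, the resulting oligonucleotide reads `CATTACAATGGCCAT` in the 5' to 3' direction.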
  • Expression and purification of MDDT are accomplished using bacterial or virus-based expression systems.
  • cDNA is subcloned into an appropriate vector containing an antibiotic resistance gene and an inducible promoter that directs high levels of cDNA transcription.
  • Examples of such promoters include, but are not limited to, the trp-lac (tac) hybrid promoter and the T5 or T7 bacteriophage promoter in conjunction with the lac operator regulatory element.
  • Recombinant vectors are transformed into suitable bacterial hosts, e.g., BL21(DE3).
  • Antibiotic resistant bacteria express MDDT upon induction with isopropyl beta-D-thiogalactopyranoside (IPTG).
  • Expression of MDDT in eukaryotic cells is achieved by infecting insect or mammalian cell lines with recombinant Autographa californica nuclear polyhedrosis virus (AcMNPV), commonly known as baculovirus.
  • The nonessential polyhedrin gene of baculovirus is replaced with cDNA encoding MDDT by either homologous recombination or bacterial-mediated transposition involving transfer plasmid intermediates. Viral infectivity is maintained and the strong polyhedrin promoter drives high levels of cDNA transcription.
  • Recombinant baculovirus is used to infect Spodoptera frugiperda (Sf9) insect cells in most cases, or human hepatocytes, in some cases. Infection of the latter requires additional genetic modifications to baculovirus. (See, e.g., Engelhard, supra; and Sandig, supra.)
  • MDDT is synthesized as a fusion protein with, e.g., glutathione S-transferase (GST) or a peptide epitope tag, such as FLAG or 6-His, permitting rapid, single-step, affinity-based purification of recombinant fusion protein from crude cell lysates.
  • FLAG is an 8-amino acid peptide.
  • 6-His, a stretch of six consecutive histidine residues, enables purification on metal-chelate resins (QIAGEN). Methods for protein expression and purification are discussed in Ausubel (1995, supra, Chapters 10 and 16). Purified MDDT obtained by these methods can be used directly in the following activity assay.
  • MDDT, or biologically active fragments thereof, are labeled with ¹²⁵I Bolton-Hunter reagent. (See, e.g., Bolton, A.E. and W.M. Hunter (1973) Biochem. J. 133:529-539.)
  • Candidate molecules previously arrayed in the wells of a multi-well plate are incubated with the labeled MDDT, washed, and any wells with labeled MDDT complex are assayed. Data obtained using different concentrations of MDDT are used to calculate values for the number, affinity, and association of MDDT with the candidate molecules.
  • molecules interacting with MDDT are analyzed using the yeast two-hybrid system as described in Fields, S. and O.
  • MDDT may also be used in the PATHCALLING process (CuraGen Corp., New Haven CT) which employs the yeast two-hybrid system in a high-throughput manner to determine all interactions between the proteins encoded by two large libraries of genes (Nandabalan, K. et al. (2000) U.S. Patent No. 6,057,101).
  • MDDT function is assessed by expressing mddt at physiologically elevated levels in mammalian cell culture systems.
  • cDNA is subcloned into a mammalian expression vector containing a strong promoter that drives high levels of cDNA expression.
  • Vectors of choice include pCMV SPORT (Life Technologies) and pCR3.1 (Invitrogen Corporation, Carlsbad CA), both of which contain the cytomegalovirus promoter.
  • 5-10 µg of recombinant vector are transiently transfected into a human cell line, preferably of endothelial or hematopoietic origin, using either liposome formulations or electroporation.
  • 1-2 µg of an additional plasmid containing sequences encoding a marker protein are co-transfected.
  • Expression of a marker protein provides a means to distinguish transfected cells from nontransfected cells and is a reliable predictor of cDNA expression from the recombinant vector.
  • Marker proteins of choice include, e.g., Green Fluorescent Protein (GFP; CLONTECH), CD64, or a CD64-GFP fusion protein.
  • Flow cytometry (FCM) detects and quantifies the uptake of fluorescent molecules that diagnose events preceding or coincident with cell death. These events include changes in nuclear DNA content as measured by staining of DNA with propidium iodide; changes in cell size and granularity as measured by forward light scatter and 90 degree side light scatter; down-regulation of DNA synthesis as measured by decrease in bromodeoxyuridine uptake; alterations in expression of cell surface and intracellular proteins as measured by reactivity with specific antibodies; and alterations in plasma membrane composition as measured by the binding of fluorescein-conjugated Annexin V protein to the cell surface. Methods in flow cytometry are discussed in Ormerod, M. G. (1994) Flow Cytometry, Oxford, New York NY.
  • CD64 and CD64-GFP are expressed on the surface of transfected cells and bind to conserved regions of human immunoglobulin G (IgG).
  • Transfected cells are efficiently separated from nontransfected cells using magnetic beads coated with either human IgG or antibody against CD64 (DYNAL, Inc., Lake Success NY).
  • mRNA can be purified from the cells using methods well known by those of skill in the art. Expression of mRNA encoding MDDT and other genes of interest can be analyzed by northern analysis or microarray techniques.
  • The MDDT amino acid sequence is analyzed using LASERGENE software (DNASTAR) to determine regions of high immunogenicity, and a corresponding peptide is synthesized and used to raise antibodies by means known to those of skill in the art.
  • Methods for selection of appropriate epitopes, such as those near the C-terminus or in hydrophilic regions, are well described in the art. (See, e.g., Ausubel, 1995, supra, Chapter 11.)
  • Peptides 15 residues in length are synthesized using an ABI 431A peptide synthesizer (PE Biosystems) using Fmoc chemistry and coupled to KLH (Sigma) by reaction with N-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS) to increase immunogenicity.
  • Rabbits are immunized with the peptide-KLH complex in complete Freund's adjuvant.
  • Resulting antisera are tested for antipeptide activity by, for example, binding the peptide to plastic, blocking with 1% BSA, reacting with rabbit antisera, washing, and reacting with radio-iodinated goat anti-rabbit IgG.
  • Antisera with antipeptide activity are tested for anti-MDDT activity using protocols well known in the art, including ELISA, RIA, and immunoblotting.
  • XVII Purification of Naturally Occurring MDDT Using Specific Antibodies
  • Naturally occurring or recombinant MDDT is substantially purified by immunoaffinity chromatography using antibodies specific for MDDT.
  • An immunoaffinity column is constructed by covalently coupling anti-MDDT antibody to an activated chromatographic resin, such as CNBr-activated SEPHAROSE (Amersham Pharmacia Biotech). After the coupling, the resin is blocked and washed according to the manufacturer's instructions.
  • Media containing MDDT are passed over the immunoaffinity column, and the column is washed under conditions that allow the preferential absorbance of MDDT (e.g., high ionic strength buffers in the presence of detergent).
  • The column is eluted under conditions that disrupt antibody/MDDT binding (e.g., a buffer of pH 2 to pH 3, or a high concentration of a chaotrope, such as urea or thiocyanate ion), and MDDT is collected.
  • GNAT Acetyltransferase
  • dec 207 335 forward 3 UBA UBA-domain 1.90E-06

Abstract

The present invention provides purified disease detection and treatment molecule polynucleotides (mddt). Also encompassed are the polypeptides (MDDT) encoded by mddt. The invention also provides for the use of mddt, or complements, oligonucleotides, or fragments thereof in diagnostic assays. The invention further provides for vectors and host cells containing mddt for the expression of MDDT. The invention additionally provides for the use of isolated and purified MDDT to induce antibodies and to screen libraries of compounds and the use of anti-MDDT antibodies in diagnostic assays. Also provided are microarrays containing mddt and methods of use.

Description

MOLECULES FOR DISEASE DETECTION AND TREATMENT
This application claims the benefit of U.S. Ser. No. 60/156,565 filed September 28, 1999 and U.S. Ser. No. 60/168,197 filed November 30, 1999.
TECHNICAL FIELD
The present invention relates to molecules for disease detection and treatment and to the use of these sequences in the diagnosis, study, prevention, and treatment of diseases associated with, as well as effects of exogenous compounds on, the expression of molecules for disease detection and treatment.
BACKGROUND OF THE INVENTION
The human genome is comprised of thousands of genes, many encoding gene products that function in the maintenance and growth of the various cells and tissues in the body. Aberrant expression or mutations in these genes and their products are the cause of, or are associated with, a variety of human diseases such as cancer and other cell proliferative disorders. The identification of these genes and their products is the basis of an ever-expanding effort to find markers for early detection of diseases, and targets for their prevention and treatment.
For example, cancer represents a type of cell proliferative disorder that affects nearly every tissue in the body. A wide variety of molecules, either aberrantly expressed or mutated, can be the cause of, or involved with, various cancers because tissue growth involves complex and ordered patterns of cell proliferation, cell differentiation, and apoptosis. Cell proliferation must be regulated to maintain both the number of cells and their spatial organization. This regulation depends upon the appropriate expression of proteins which control cell cycle progression in response to extracellular signals such as growth factors and other mitogens, and intracellular cues such as DNA damage or nutrient starvation. Molecules which directly or indirectly modulate cell cycle progression fall into several categories, including growth factors and their receptors, second messenger and signal transduction proteins, oncogene products, tumor-suppressor proteins, and mitosis-promoting factors. Aberrant expression or mutations in any of these gene products can result in cell proliferative disorders such as cancer. Oncogenes are genes generally derived from normal genes that, through abnormal expression or mutation, can effect the transformation of a normal cell to a malignant one (oncogenesis). Oncoproteins, encoded by oncogenes, can affect cell proliferation in a variety of ways and include growth factors, growth factor receptors, intracellular signal transducers, nuclear transcription factors, and cell-cycle control proteins. In contrast, tumor-suppressor genes are involved in inhibiting cell proliferation. Mutations which cause reduced or loss of function in tumor-suppressor genes result in aberrant cell proliferation and cancer. Thus a wide variety of genes and their products have been found that are associated with cell proliferative disorders such as cancer, but many more may exist that are yet to be discovered.
DNA-based arrays can provide a simple way to explore the expression of a single polymorphic gene or a large number of genes. When the expression of a single gene is explored, DNA-based arrays are employed to detect the expression of specific gene variants. For example, a p53 tumor suppressor gene array is used to determine whether individuals are carrying mutations that predispose them to cancer. A cytochrome p450 gene array is useful to determine whether individuals have one of a number of specific mutations that could result in increased drug metabolism, drug resistance or drug toxicity. DNA-based array technology is especially relevant for the rapid screening of expression of a large number of genes. There is a growing awareness that gene expression is affected in a global fashion. A genetic predisposition, disease or therapeutic treatment may affect, directly or indirectly, the expression of a large number of genes. In some cases the interactions may be expected, such as when the genes are part of the same signaling pathway. In other cases, such as when the genes participate in separate signaling pathways, the interactions may be totally unexpected. Therefore, DNA-based arrays can be used to investigate how genetic predisposition, disease, or therapeutic treatment affects the expression of a large number of genes.
The discovery of new molecules for disease detection and treatment satisfies a need in the art by providing new compositions which are useful in the diagnosis, study, prevention, and treatment of diseases associated with, as well as effects of exogenous compounds on, the expression of molecules for disease detection and treatment.
SUMMARY OF THE INVENTION
The present invention relates to human disease detection and treatment molecule polynucleotides (mddt) as presented in the Sequence Listing. The mddt uniquely identify genes encoding structural, functional, and regulatory disease detection and treatment molecules.
The invention provides an isolated polynucleotide comprising a polynucleotide sequence selected from the group consisting of a) a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; b) a naturally occurring polynucleotide sequence having at least 90% sequence identity to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; c) a polynucleotide sequence complementary to a); d) a polynucleotide sequence complementary to b); and e) an RNA equivalent of a) through d). In one alternative, the polynucleotide comprises a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25. In another alternative, the polynucleotide comprises at least 60 contiguous nucleotides of a polynucleotide sequence selected from the group consisting of a) a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; b) a naturally occurring polynucleotide sequence having at least 90% sequence identity to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; c) a polynucleotide sequence complementary to a); d) a polynucleotide sequence complementary to b); and e) an RNA equivalent of a) through d).
The invention further provides a composition for the detection of expression of disease detection and treatment molecule polynucleotides comprising at least one isolated polynucleotide comprising a polynucleotide sequence selected from the group consisting of a) a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; b) a naturally occurring polynucleotide sequence having at least 90% sequence identity to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; c) a polynucleotide sequence complementary to a); d) a polynucleotide sequence complementary to b); and e) an RNA equivalent of a) through d); and a detectable label.
The invention also provides a method for detecting a target polynucleotide in a sample, said target polynucleotide comprising a polynucleotide sequence selected from the group consisting of a) a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; b) a naturally occurring polynucleotide sequence having at least 90% sequence identity to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; c) a polynucleotide sequence complementary to a); d) a polynucleotide sequence complementary to b); and e) an RNA equivalent of a) through d). The method comprises a) hybridizing the sample with a probe comprising at least 20 contiguous nucleotides comprising a sequence complementary to said target polynucleotide in the sample, and which probe specifically hybridizes to said target polynucleotide, under conditions whereby a hybridization complex is formed between said probe and said target polynucleotide, and b) detecting the presence or absence of said hybridization complex, and, optionally, if present, the amount thereof. In one alternative, the probe comprises at least 30 contiguous nucleotides. In another alternative, the probe comprises at least 60 contiguous nucleotides. The invention further provides a recombinant polynucleotide comprising a promoter sequence operably linked to an isolated polynucleotide comprising a polynucleotide sequence selected from the group consisting of a) a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; b) a naturally occurring polynucleotide sequence having at least 90% sequence identity to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; c) a polynucleotide sequence complementary to a); d) a polynucleotide sequence complementary to b); and e) an RNA equivalent of a) through d). In one alternative, the invention provides a cell transformed with the recombinant polynucleotide.
In another alternative, the invention provides a transgenic organism comprising the recombinant polynucleotide. In a further alternative, the invention provides a method for producing a disease detection and treatment molecule polypeptide, the method comprising a) culturing a cell under conditions suitable for expression of the disease detection and treatment molecule polypeptide, wherein said cell is transformed with the recombinant polynucleotide, and b) recovering the disease detection and treatment molecule polypeptide so expressed.
The invention also provides a purified disease detection and treatment molecule polypeptide (MDDT) encoded by at least one polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25. Additionally, the invention provides an isolated antibody which specifically binds to the disease detection and treatment molecule polypeptide. The invention further provides a method of identifying a test compound which specifically binds to the disease detection and treatment molecule polypeptide, the method comprising the steps of a) providing a test compound; b) combining the disease detection and treatment molecule polypeptide with the test compound for a sufficient time and under suitable conditions for binding; and c) detecting binding of the disease detection and treatment molecule polypeptide to the test compound, thereby identifying the test compound which specifically binds the disease detection and treatment molecule polypeptide. The invention further provides a microarray wherein at least one element of the microarray is an isolated polynucleotide comprising at least 60 contiguous nucleotides of a polynucleotide comprising a polynucleotide sequence selected from the group consisting of a) a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; b) a naturally occurring polynucleotide sequence having at least 90% sequence identity to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; c) a polynucleotide sequence complementary to a); d) a polynucleotide sequence complementary to b); and e) an RNA equivalent of a) through d). The invention also provides a method for generating a transcript image of a sample which contains polynucleotides. 
The method comprises a) labeling the polynucleotides of the sample, b) contacting the elements of the microarray with the labeled polynucleotides of the sample under conditions suitable for the formation of a hybridization complex, and c) quantifying the expression of the polynucleotides in the sample.
Additionally, the invention provides a method for screening a compound for effectiveness in altering expression of a target polynucleotide, wherein said target polynucleotide comprises a polynucleotide sequence selected from the group consisting of a) a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; b) a naturally occurring polynucleotide sequence having at least 90% sequence identity to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; c) a polynucleotide sequence complementary to a); d) a polynucleotide sequence complementary to b); and e) an RNA equivalent of a) through d). The method comprises a) exposing a sample comprising the target polynucleotide to a compound, and b) detecting altered expression of the target polynucleotide.
The invention further provides a method for assessing toxicity of a test compound, said method comprising a) treating a biological sample containing nucleic acids with the test compound; b) hybridizing the nucleic acids of the treated biological sample with a probe comprising at least 20 contiguous nucleotides of a polynucleotide comprising a polynucleotide sequence selected from the group consisting of i) a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; ii) a naturally occurring polynucleotide sequence having at least 90% sequence identity to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; iii) a polynucleotide sequence complementary to i), iv) a polynucleotide sequence complementary to ii), and v) an RNA equivalent of i)-iv). Hybridization occurs under conditions whereby a specific hybridization complex is formed between said probe and a target polynucleotide in the biological sample, said target polynucleotide comprising a polynucleotide sequence selected from the group consisting of i) a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; ii) a naturally occurring polynucleotide sequence having at least 90% sequence identity to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25; iii) a polynucleotide sequence complementary to i), iv) a polynucleotide sequence complementary to ii), and v) an RNA equivalent of i)-iv), and alternatively, the target polynucleotide comprises a fragment of a polynucleotide sequence selected from the group consisting of i-v above; c) quantifying the amount of hybridization complex; and d) comparing the amount of hybridization complex in the treated biological sample with the amount of hybridization complex in an untreated biological sample, wherein a difference in the amount of hybridization complex in the treated biological sample is indicative of toxicity of the test compound.
DESCRIPTION OF THE TABLES
Table 1 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with their GenBank hits (GI Numbers), probability scores, and functional annotations corresponding to the GenBank hits.
Table 2 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with polynucleotide segments of each template sequence as defined by the indicated "start" and "stop" nucleotide positions. The reading frames of the polynucleotide segments and the Pfam hits, Pfam descriptions, and E-values corresponding to the polypeptide domains encoded by the polynucleotide segments are indicated.
Table 3 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with polynucleotide segments of each template sequence as defined by the indicated "start" and "stop" nucleotide positions. The reading frames of the polynucleotide segments are shown, and the polypeptides encoded by the polynucleotide segments constitute either signal peptide (SP) or transmembrane (TM) domains, as indicated.
Table 4A and Table 4B show the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with component sequence identification numbers (component IDs) corresponding to each template. The component sequences, which were used to assemble the template sequences, are defined by the indicated "start" and "stop" nucleotide positions along each template.
Table 5 shows the tissue distribution profiles for the templates of the invention.
Table 6 summarizes the bioinformatics tools which are useful for analysis of the polynucleotides of the present invention. The first column of Table 6 lists analytical tools, programs, and algorithms, the second column provides brief descriptions thereof, the third column presents appropriate references, all of which are incorporated by reference herein in their entirety, and the fourth column presents, where applicable, the scores, probability values, and other parameters used to evaluate the strength of a match between two sequences (the higher the score, the greater the homology between two sequences).
DETAILED DESCRIPTION OF THE INVENTION
Before the nucleic acid sequences and methods are presented, it is to be understood that this invention is not limited to the particular machines, methods, and materials described. Although particular embodiments are described, machines, methods, and materials similar or equivalent to these embodiments may be used to practice the invention. The preferred machines, methods, and materials set forth are not intended to limit the scope of the invention which is limited only by the appended claims.
The singular forms "a", "an", and "the" include plural reference unless the context clearly dictates otherwise. All technical and scientific terms have the meanings commonly understood by one of ordinary skill in the art. All publications are incorporated by reference for the purpose of describing and disclosing the cell lines, vectors, and methodologies which are presented and which might be used in connection with the invention. Nothing in the specification is to be construed as an admission that the invention is not entitled to antedate such disclosure by virtue of prior invention.
Definitions
As used herein, the lower case "mddt" refers to a nucleic acid sequence, while the upper case "MDDT" refers to an amino acid sequence encoded by mddt. A "full-length" mddt refers to a nucleic acid sequence containing the entire coding region of a gene endogenously expressed in human tissue. "Adjuvants" are materials such as Freund's adjuvant, mineral gels (aluminum hydroxide), and surface active substances (lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet hemocyanin, and dinitrophenol) which may be administered to increase a host's immunological response.
"Allele" refers to an alternative form of a nucleic acid sequence. Alleles result from a "mutation," a change or an alternative reading of the genetic code. Any given gene may have none, one, or many allelic forms. Mutations which give rise to alleles include deletions, additions, or substitutions of nucleotides. Each of these changes may occur alone, or in combination with the others, one or more times in a given nucleic acid sequence. The present invention encompasses allelic mddt.
"Amino acid sequence" refers to a peptide, a polypeptide, or a protein of either natural or synthetic origin. The amino acid sequence is not limited to the complete, endogenous amino acid sequence and may be a fragment, epitope, variant, or derivative of a protein expressed by a nucleic acid sequence. "Amplification" refers to the production of additional copies of a sequence and is carried out using polymerase chain reaction (PCR) technologies well known in the art.
"Antibody" refers to intact molecules as well as to fragments thereof, such as Fab, F(ab')2, and Fv fragments, which are capable of binding the epitopic determinant. Antibodies that bind MDDT polypeptides can be prepared using intact polypeptides or using fragments containing small peptides of interest as the immunizing antigen. The polypeptide or peptide used to immunize an animal (e.g., a mouse, a rat, or a rabbit) can be derived from the translation of RNA, or synthesized chemically, and can be conjugated to a carrier protein if desired. Commonly used carriers that are chemically coupled to peptides include bovine serum albumin, thyroglobulin, and keyhole limpet hemocyanin (KLH). The coupled peptide is then used to immunize the animal. "Antisense sequence" refers to a sequence capable of specifically hybridizing to a target sequence. The antisense sequence may include DNA, RNA, or any nucleic acid mimic or analog such as peptide nucleic acid (PNA); oligonucleotides having modified backbone linkages such as phosphorothioates, methylphosphonates, or benzylphosphonates; oligonucleotides having modified sugar groups such as 2'-methoxyethyl sugars or 2'-methoxyethoxy sugars; or oligonucleotides having modified bases such as 5-methyl cytosine, 2'-deoxyuracil, or 7-deaza-2'-deoxyguanosine.
"Antisense sequence" refers to a sequence capable of specifically hybridizing to a target sequence. The antisense sequence can be DNA, RNA, or any nucleic acid mimic or analog.
"Antisense technology" refers to any technology which relies on the specific hybridization of an antisense sequence to a target sequence. A "bin" is a portion of computer memory space used by a computer program for storage of data, and bounded in such a manner that data stored in a bin may be retrieved by the program.
"Biologically active" refers to an amino acid sequence having a structural, regulatory, or biochemical function of a naturally occurring amino acid sequence.
"Clone joining" is a process for combining gene bins based upon the bins' containing sequence information from the same clone. The sequences may assemble into a primary gene transcript as well as one or more splice variants. "Complementary" describes the relationship between two single-stranded nucleic acid sequences that anneal by base-pairing (5'-A-G-T-3' pairs with its complement 3 -T-C-A-5').
A "component sequence" is a nucleic acid sequence selected by a computer program such as PHRED and used to assemble a consensus or template sequence from one or more component sequences.
A "consensus sequence" or "template sequence" is a nucleic acid sequence which has been assembled from overlapping sequences, using a computer program for fragment assembly such as the GELVIEW fragment assembly system (Genetics Computer Group (GCG), Madison WI) or using a relational database management system (RDMS). "Conservative amino acid substitutions" are those substitutions that, when made, least interfere with the properties of the original protein, i.e., the structure and especially the function of the protein is conserved and not significantly changed by such substitutions. The table below shows amino acids which may be substituted for an original amino acid in a protein and which are regarded as conservative substitutions.
Original Residue    Conservative Substitution
Ala                 Gly, Ser
Arg                 His, Lys
Asn                 Asp, Gln, His
Asp                 Asn, Glu
Cys                 Ala, Ser
Gln                 Asn, Glu, His
Glu                 Asp, Gln, His
Gly                 Ala
His                 Asn, Arg, Gln, Glu
Ile                 Leu, Val
Leu                 Ile, Val
Lys                 Arg, Gln, Glu
Met                 Leu, Ile
Phe                 His, Met, Leu, Trp, Tyr
Ser                 Cys, Thr
Thr                 Ser, Val
Trp                 Phe, Tyr
Tyr                 His, Phe, Trp
Val                 Ile, Leu, Thr
Conservative substitutions generally maintain (a) the structure of the polypeptide backbone in the area of the substitution, for example, as a beta sheet or alpha helical conformation, (b) the charge or hydrophobicity of the molecule at the target site, or (c) the bulk of the side chain.
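The substitution table can be treated as a directional lookup, sketched below. The dictionary transcribes the table using the standard three-letter codes (Gln for glutamine, Ile for isoleucine); the names `CONSERVATIVE` and `is_conservative` are assumptions made for this example. Note that the table is not symmetric (for instance, Ala may be replaced by Ser, but Ser is not listed as replaceable by Ala), so the lookup is keyed on the original residue.

```python
# Illustrative sketch: the conservative-substitution table as a lookup.
# Keys are original residues; values are the substitutions listed for them.
CONSERVATIVE = {
    "Ala": {"Gly", "Ser"},
    "Arg": {"His", "Lys"},
    "Asn": {"Asp", "Gln", "His"},
    "Asp": {"Asn", "Glu"},
    "Cys": {"Ala", "Ser"},
    "Gln": {"Asn", "Glu", "His"},
    "Glu": {"Asp", "Gln", "His"},
    "Gly": {"Ala"},
    "His": {"Asn", "Arg", "Gln", "Glu"},
    "Ile": {"Leu", "Val"},
    "Leu": {"Ile", "Val"},
    "Lys": {"Arg", "Gln", "Glu"},
    "Met": {"Leu", "Ile"},
    "Phe": {"His", "Met", "Leu", "Trp", "Tyr"},
    "Ser": {"Cys", "Thr"},
    "Thr": {"Ser", "Val"},
    "Trp": {"Phe", "Tyr"},
    "Tyr": {"His", "Phe", "Trp"},
    "Val": {"Ile", "Leu", "Thr"},
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if the table lists `replacement` as conservative for `original`."""
    return replacement in CONSERVATIVE.get(original, set())


print(is_conservative("Ala", "Ser"))  # True
print(is_conservative("Ser", "Ala"))  # False: the table is directional
```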
"Deletion" refers to a change in either a nucleic or amino acid sequence in which at least one nucleotide or amino acid residue, respectively, is absent. "Derivative" refers to the chemical modification of a nucleic acid sequence, such as by replacement of hydrogen by an alkyl, acyl, amino, hydroxyl, or other group.
The terms "element" and "array element" refer to a polynucleotide, polypeptide, or other chemical compound having a unique and defined position on a microarray. "E-value" refers to the statistical probability that a match between two sequences occurred by chance.
A "fragment" is a unique portion of mddt or MDDT which is identical in sequence to but shorter in length than the parent sequence. A fragment may comprise up to the entire length of the defined sequence, minus one nucleotide/amino acid residue. For example, a fragment may comprise from 10 to 1000 contiguous amino acid residues or nucleotides. A fragment used as a probe, primer, antigen, therapeutic molecule, or for other purposes, may be at least 5, 10, 15, 16, 20, 25, 30, 40, 50, 60, 75, 100, 150, 250 or at least 500 contiguous amino acid residues or nucleotides in length. Fragments may be preferentially selected from certain regions of a molecule. For example, a polypeptide fragment may comprise a certain length of contiguous amino acids selected from the first 250 or 500 amino acids (or first 25% or 50%) of a polypeptide as shown in a certain defined sequence. Clearly these lengths are exemplary, and any length that is supported by the specification, including the Sequence Listing and the figures, may be encompassed by the present embodiments.
A fragment of mddt comprises a region of unique polynucleotide sequence that specifically identifies mddt, for example, as distinct from any other sequence in the same genome. A fragment of mddt is useful, for example, in hybridization and amplification technologies and in analogous methods that distinguish mddt from related polynucleotide sequences. The precise length of a fragment of mddt and the region of mddt to which the fragment corresponds are routinely determinable by one of ordinary skill in the art based on the intended purpose for the fragment.
A fragment of MDDT is encoded by a fragment of mddt. A fragment of MDDT comprises a region of unique amino acid sequence that specifically identifies MDDT. For example, a fragment of MDDT is useful as an immunogenic peptide for the development of antibodies that specifically recognize MDDT. The precise length of a fragment of MDDT and the region of MDDT to which the fragment corresponds are routinely determinable by one of ordinary skill in the art based on the intended purpose for the fragment. A "full length" nucleotide sequence is one containing at least a start site for translation to a protein sequence, followed by an open reading frame and a stop site, and encoding a "full length" polypeptide.
"Hit" refers to a sequence whose annotation will be used to describe a given template. Criteria for selecting the top hit are as follows: if the template has one or more exact nucleic acid matches, the top hit is the exact match with highest percent identity. If the template has no exact matches but has significant protein hits, the top hit is the protein hit with the lowest E-value. If the template has no significant protein hits, but does have significant non-exact nucleotide hits, the top hit is the nucleotide hit with the lowest E-value.
"Homology" refers to sequence similarity either between a reference nucleic acid sequence and at least a fragment of an mddt or between a reference amino acid sequence and a fragment of an MDDT.
"Hybridization" refers to the process by which a strand of nucleotides anneals with a complementary strand through base pairing. Specific hybridization is an indication that two nucleic acid sequences share a high degree of identity. Specific hybridization complexes form under defined annealing conditions, and remain hybridized after the "washing" step. The defined hybridization conditions include the annealing conditions and the washing step(s), the latter of which is particularly important in determining the stringency of the hybridization process, with more stringent conditions allowing less non-specific binding, i.e., binding between pairs of nucleic acid probes that are not perfectly matched. Permissive conditions for annealing of nucleic acid sequences are routinely determinable and may be consistent among hybridization experiments, whereas wash conditions may be varied among experiments to achieve the desired stringency.
Generally, stringency of hybridization is expressed with reference to the temperature under which the wash step is carried out. Generally, such wash temperatures are selected to be about 5°C to 20°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. An equation for calculating Tm and conditions for nucleic acid hybridization is well known and can be found in Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, 2nd ed., vol. 1-3, Cold Spring Harbor Press, Plainview NY; specifically see volume 2, chapter 9.
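The relationship between wash temperature and Tm stated above is simple arithmetic, sketched here. The function name and the example Tm of 80°C are assumptions for illustration; the Tm itself must come from an equation such as those in the cited reference, which this sketch does not reproduce.

```python
from typing import Tuple


def wash_temperature_window(
    tm_celsius: float,
    min_offset: float = 5.0,
    max_offset: float = 20.0,
) -> Tuple[float, float]:
    """Return (lowest, highest) wash temperature in degrees Celsius.

    Implements only the rule of thumb in the text: wash at about
    5 to 20 degrees C below the Tm of the specific sequence.
    """
    return (tm_celsius - max_offset, tm_celsius - min_offset)


# Hypothetical Tm of 80 degrees C, chosen only for the example.
print(wash_temperature_window(80.0))  # (60.0, 75.0)
```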
High stringency conditions for hybridization between polynucleotides of the present invention include wash conditions of 68°C in the presence of about 0.2 x SSC and about 0.1 % SDS, for 1 hour. Alternatively, temperatures of about 65°C, 60°C, or 55°C may be used. SSC concentration may be varied from about 0.2 to 2 x SSC, with SDS being present at about 0.1%. Typically, blocking reagents are used to block non-specific hybridization. Such blocking reagents include, for instance, denatured salmon sperm DNA at about 100-200 μg/ml. Useful variations on these conditions will be readily apparent to those skilled in the art. Hybridization, particularly under high stringency conditions, may be suggestive of evolutionary similarity between the nucleotides. Such similarity is strongly indicative of a similar role for the nucleotides and their resultant proteins. Other parameters, such as temperature, salt concentration, and detergent concentration may be varied to achieve the desired stringency. Denaturants, such as formamide at a concentration of about 35-50% v/v, may also be used under particular circumstances, such as RNA:DNA hybridizations. Appropriate hybridization conditions are routinely determinable by one of ordinary skill in the art.
"Immunogenic" describes the potential for a natural, recombinant, or synthetic peptide, epitope, polypeptide, or protein to induce antibody production in appropriate animals, cells, or cell lines. "Insertion" or "addition" refers to a change in either a nucleic or amino acid sequence in which at least one nucleotide or residue, respectively, is added to the sequence.
"Labeling" refers to the covalent or noncovalent joining of a polynucleotide, polypeptide, or antibody with a reporter molecule capable of producing a detectable or measurable signal.
"Microarray" is any arrangement of nucleic acids, amino acids, antibodies, etc., on a substrate. The substrate may be a solid support such as beads, glass, paper, nitrocellulose, nylon, or an appropriate membrane.
"Linkers" are short stretches of nucleotide sequence which may be added to a vector or an mddt to create restriction endonuclease sites to facilitate cloning. "Polylinkers" are engineered to incorporate multiple restriction enzyme sites and to provide for the use of enzymes which leave 5' or 3' overhangs (e.g., BamHI, EcoRI, and Hindlll) and those which provide blunt ends (e.g., EcoRV, SnaBI, and Stul).
"Naturally occurring" refers to an endogenous polynucleotide or polypeptide that may be isolated from viruses or prokaryotic or eukaryotic cells.
"Nucleic acid sequence" refers to the specific order of nucleotides joined by phosphodiester bonds in a linear, polymeric arrangement. Depending on the number of nucleotides, the nucleic acid sequence can be considered an oligomer, oligonucleotide, or polynucleotide. The nucleic acid can be DNA, RNA, or any nucleic acid analog, such as PNA, may be of genomic or synthetic origin, may be either double-stranded or single-stranded, and can represent either the sense or antisense (complementary) strand. "Oligomer" refers to a nucleic acid sequence of at least about 6 nucleotides and as many as about 60 nucleotides, preferably about 15 to 40 nucleotides, and most preferably between about 20 and 30 nucleotides, that may be used in hybridization or amplification technologies. Oligomers may be used as, e.g., primers for PCR, and are usually chemically synthesized.
"Operably linked" refers to the situation in which a first nucleic acid sequence is placed in a functional relationship with the second nucleic acid sequence. For instance, a promoter is operably linked to a coding sequence if the promoter affects the transcription or expression of the coding sequence. Generally, operably linked DNA sequences may be in close proximity or contiguous and, where necessary to join two protein coding regions, in the same reading frame.
"Peptide nucleic acid" (PNA) refers to a DNA mimic in which nucleotide bases are attached to a pseudopeptide backbone to increase stability. PNAs, also designated antigene agents, can prevent gene expression by targeting complementary messenger RNA. The phrases "percent identity" and "% identity", as applied to polynucleotide sequences, refer to the percentage of residue matches between at least two polynucleotide sequences aligned using a standardized algorithm. Such an algorithm may insert, in a standardized and reproducible way, gaps in the sequences being compared in order to optimize alignment between two sequences, and therefore achieve a more meaningful comparison of the two sequences.
Percent identity between polynucleotide sequences may be determined using the default parameters of the CLUSTAL V algorithm as incorporated into the MEGALIGN version 3.12e sequence alignment program. This program is part of the LASERGENE software package, a suite of molecular biological analysis programs (DNASTAR, Madison WI). CLUSTAL V is described in Higgins, D.G. and Sharp, P.M. (1989) CABIOS 5: 151-153 and in Higgins, D.G. et al. (1992)
CABIOS 8: 189-191. For pairwise alignments of polynucleotide sequences, the default parameters are set as follows: Ktuple=2, gap penalty=5, window=4, and "diagonals saved"=4. The "weighted" residue weight table is selected as the default. Percent identity is reported by CLUSTAL V as the "percent similarity" between aligned polynucleotide sequence pairs. Alternatively, a suite of commonly used and freely available sequence comparison algorithms is provided by the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST) (Altschul, S.F. et al. (1990) J. Mol. Biol. 215:403-410), which is available from several sources, including the NCBI, Bethesda, MD, and on the Internet at http://www.ncbi.nlm.nih.gov/BLAST/. The BLAST software suite includes various sequence analysis programs including "blastn," that is used to determine alignment between a known polynucleotide sequence and other sequences on a variety of databases. Also available is a tool called "BLAST 2 Sequences" that is used for direct pairwise comparison of two nucleotide sequences. "BLAST 2 Sequences" can be accessed and used interactively at http://www.ncbi.nlm.nih.gov/gorf/bl2/. The "BLAST 2 Sequences" tool can be used for both blastn and blastp (discussed below). BLAST programs are commonly used with gap and other parameters set to default settings. For example, to compare two nucleotide sequences, one may use blastn with the "BLAST 2 Sequences" tool Version 2.0.9 (May-07-1999) set at default parameters. Such default parameters may be, for example:
Matrix: BLOSUM62
Reward for match: 1
Penalty for mismatch: -2
Open Gap: 5 and Extension Gap: 2 penalties
Gap x drop-off: 50
Expect: 10
Word Size: 11
Filter: on
Percent identity may be measured over the length of an entire defined sequence, for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for example, over the length of a fragment taken from a larger, defined sequence, for instance, a fragment of at least 20, at least 30, at least 40, at least 50, at least 70, at least 100, or at least 200 contiguous nucleotides. Such lengths are exemplary only, and it is understood that any fragment length supported by the sequences shown herein, in figures or Sequence Listings, may be used to describe a length over which percentage identity may be measured.
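Once two sequences have been aligned, percent identity reduces to counting matching columns. The sketch below illustrates that arithmetic only; it is not the CLUSTAL V or BLAST calculation. It assumes the sequences are already aligned, that gaps are written as "-", and that the denominator is the full alignment length (tools differ on this choice). The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of alignment columns in which both sequences carry the same residue.

    Assumptions of this sketch: inputs are pre-aligned and of equal length,
    gap columns never count as matches, and the denominator is the number
    of alignment columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


# Four matching columns out of six.
print(round(percent_identity("ACGT-A", "ACCTGA"), 1))  # 66.7
```

Restricting the comparison to a fragment, as described above, amounts to slicing both aligned strings to the same column range before calling the function.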
Nucleic acid sequences that do not show a high degree of identity may nevertheless encode similar amino acid sequences due to the degeneracy of the genetic code. It is understood that changes in nucleic acid sequence can be made using this degeneracy to produce multiple nucleic acid sequences that all encode substantially the same protein.
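The effect of degeneracy can be illustrated with a short translation sketch. The codon table below is the standard genetic code written in the conventional T, C, A, G ordering; the function name `translate` and the two example sequences are assumptions made for illustration and are not sequences of the invention.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...); "*" marks a stop codon.
_BASES = "TCAG"
_AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence, reading frame 1, into one-letter amino acids."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Two hypothetical sequences that differ at 5 of 9 positions...
seq_a = "GCTAAATCT"
seq_b = "GCCAAGAGC"
# ...yet encode the same tripeptide (Ala-Lys-Ser).
print(translate(seq_a), translate(seq_b))  # AKS AKS
```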
The phrases "percent identity" and "% identity", as applied to polypeptide sequences, refer to the percentage of residue matches between at least two polypeptide sequences aligned using a standardized algorithm. Methods of polypeptide sequence alignment are well-known. Some alignment methods take into account conservative amino acid substitutions. Such conservative substitutions, explained in more detail above, generally preserve the hydrophobicity and acidity of the substituted residue, thus preserving the structure (and therefore function) of the folded polypeptide.
Percent identity between polypeptide sequences may be determined using the default parameters of the CLUSTAL V algorithm as incorporated into the MEGALIGN version 3.12e sequence alignment program (described and referenced above). For pairwise alignments of polypeptide sequences using CLUSTAL V, the default parameters are set as follows: Ktuple=1, gap penalty=3, window=5, and "diagonals saved"=5. The PAM250 matrix is selected as the default residue weight table. As with polynucleotide alignments, the percent identity is reported by CLUSTAL V as the "percent similarity" between aligned polypeptide sequence pairs. Alternatively the NCBI BLAST software suite may be used. For example, for a pairwise comparison of two polypeptide sequences, one may use the "BLAST 2 Sequences" tool Version 2.0.9 (May-07-1999) with blastp set at default parameters. Such default parameters may be, for example:
Matrix: BLOSUM62
Open Gap: 11 and Extension Gap: 1 penalty
Gap x drop-off: 50
Expect: 10
Word Size: 3
Filter: on
Percent identity may be measured over the length of an entire defined polypeptide sequence, for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for example, over the length of a fragment taken from a larger, defined polypeptide sequence, for instance, a fragment of at least 15, at least 20, at least 30, at least 40, at least 50, at least 70 or at least 150 contiguous residues. Such lengths are exemplary only, and it is understood that any fragment length supported by the sequences shown herein, in figures or Sequence Listings, may be used to describe a length over which percentage identity may be measured.
"Post-translational modification" of an MDDT may involve lipidation, glycosylation, phosphorylation, acetylation, racemization, proteolytic cleavage, and other modifications known in the art. These processes may occur synthetically or biochemically. Biochemical modifications will vary by cell type depending on the enzymatic milieu and the MDDT. "Probe" refers to mddt or fragments thereof, which are used to detect identical, allelic or related nucleic acid sequences. Probes are isolated oligonucleotides or polynucleotides attached to a detectable label or reporter molecule. Typical labels include radioactive isotopes, ligands, chemiluminescent agents, and enzymes. "Primers" are short nucleic acids, usually DNA oligonucleotides, which may be annealed to a target polynucleotide by complementary base-pairing. The primer may then be extended along the target DNA strand by a DNA polymerase enzyme.
Primer pairs can be used for amplification (and identification) of a nucleic acid sequence, e.g., by the polymerase chain reaction (PCR).
Probes and primers as used in the present invention typically comprise at least 15 contiguous nucleotides of a known sequence. In order to enhance specificity, longer probes and primers may also be employed, such as probes and primers that comprise at least 20, 30, 40, 50, 60, 70, 80, 90, 100, or at least 150 consecutive nucleotides of the disclosed nucleic acid sequences. Probes and primers may be considerably longer than these examples, and it is understood that any length supported by the specification, including the figures and Sequence Listing, may be used.
Methods for preparing and using probes and primers are described in the references, for example Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual 2nd ed., vol. 1-3, Cold Spring Harbor Press, Plainview NY; Ausubel et al., 1987, Current Protocols in Molecular Biology, Greene Publ. Assoc. & Wiley-Intersciences, New York NY; Innis et al., 1990, PCR Protocols, A Guide to Methods and Applications, Academic Press, San Diego CA. PCR primer pairs can be derived from a known sequence, for example, by using computer programs intended for that purpose such as Primer (Version 0.5, 1991, Whitehead Institute for Biomedical Research, Cambridge MA). Oligonucleotides for use as primers are selected using software known in the art for such purpose. For example, OLIGO 4.06 software is useful for the selection of PCR primer pairs of up to 100 nucleotides each, and for the analysis of oligonucleotides and larger polynucleotides of up to 5,000 nucleotides from an input polynucleotide sequence of up to 32 kilobases. Similar primer selection programs have incorporated additional features for expanded capabilities. For example, the PrimOU primer selection program (available to the public from the Genome Center at University of Texas South West Medical Center, Dallas TX) is capable of choosing specific primers from megabase sequences and is thus useful for designing primers on a genome-wide scope. The Primer3 primer selection program (available to the public from the Whitehead Institute/MIT Center for Genome Research, Cambridge MA) allows the user to input a "mispriming library," in which sequences to avoid as primer binding sites are user-specified. Primer3 is useful, in particular, for the selection of oligonucleotides for microarrays. (The source code for the latter two primer selection programs may also be obtained from their respective sources and modified to meet the user's specific needs.)
The PrimeGen program (available to the public from the UK Human Genome Mapping Project Resource Centre, Cambridge UK) designs primers based on multiple sequence alignments, thereby allowing selection of primers that hybridize to either the most conserved or least conserved regions of aligned nucleic acid sequences. Hence, this program is useful for identification of both unique and conserved oligonucleotides and polynucleotide fragments. The oligonucleotides and polynucleotide fragments identified by any of the above selection methods are useful in hybridization technologies, for example, as PCR or sequencing primers, microarray elements, or specific probes to identify fully or partially complementary polynucleotides in a sample of nucleic acids. Methods of oligonucleotide selection are not limited to those described above.
"Purified" refers to molecules, either polynucleotides or polypeptides that are isolated or separated from their natural environment and are at least 60% free, preferably at least 75% free, and most preferably at least 90% free from other compounds with which they are naturally associated. A "recombinant nucleic acid" is a sequence that is not naturally occurring or has a sequence that is made by an artificial combination of two or more otherwise separated segments of sequence. This artificial combination is often accomplished by chemical synthesis or, more commonly, by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques such as those described in Sambrook, supra. The term recombinant includes nucleic acids that have been altered solely by addition, substitution, or deletion of a portion of the nucleic acid. Frequently, a recombinant nucleic acid may include a nucleic acid sequence operably linked to a promoter sequence. Such a recombinant nucleic acid may be part of a vector that is used, for example, to transform a cell.
Alternatively, such recombinant nucleic acids may be part of a viral vector, e.g., based on a vaccinia virus, that could be used to vaccinate a mammal wherein the recombinant nucleic acid is expressed, inducing a protective immunological response in the mammal.
"Regulatory element" refers to a nucleic acid sequence from nontranslated regions of a gene, and includes enhancers, promoters, introns, and 3' untranslated regions, which interact with host proteins to carry out or regulate transcription or translation. "Reporter" molecules are chemical or biochemical moieties used for labeling a nucleic acid, an amino acid, or an antibody. They include radionuclides; enzymes; fluorescent, chemiluminescent, or chromogenic agents; substrates; cofactors; inhibitors; magnetic particles; and other moieties known in the art.
An "RNA equivalent," in reference to a DNA sequence, is composed of the same linear sequence of nucleotides as the reference DNA sequence with the exception that all occurrences of the nitrogenous base thymine are replaced with uracil, and the sugar backbone is composed of ribose instead of deoxyribose.
"Sample" is used in its broadest sense. Samples may contain nucleic or amino acids, antibodies, or other materials, and may be derived from any source (e.g., bodily fluids including, but not limited to, saliva, blood, and urine; chromosome(s), organelles, or membranes isolated from a cell; genomic DNA, RNA, or cDNA in solution or bound to a substrate; and cleared cells or tissues or blots or imprints from such cells or tissues).
"Specific binding" or "specifically binding" refers to the interaction between a protein or peptide and its agonist, antibody, antagonist, or other binding partner. The interaction is dependent upon the presence of a particular structure of the protein, e.g., the antigenic determinant or epitope, recognized by the binding molecule. For example, if an antibody is specific for epitope "A," the presence of a polypeptide containing epitope A, or the presence of free unlabeled A, in a reaction containing free labeled A and the antibody will reduce the amount of labeled A that binds to the antibody.
"Substitution" refers to the replacement of at least one nucleotide or amino acid by a different nucleotide or amino acid.
"Substrate" refers to any suitable rigid or semi-rigid support including, e.g., membranes, filters, chips, slides, wafers, fibers, magnetic or nonmagnetic beads, gels, tubing, plates, polymers, microparticles or capillaries. The substrate can have a variety of surface forms, such as wells, trenches, pins, channels and pores, to which polynucleotides or polypeptides are bound. A "transcript image" refers to the collective pattern of gene expression by a particular tissue or cell type under given conditions at a given time.
"Transformation" refers to a process by which exogenous DNA enters a recipient cell. Transformation may occur under natural or artificial conditions using various methods well known in the art. Transformation may rely on any known method for the insertion of foreign nucleic acid sequences into a prokaryotic or eukaryotic host cell. The method is selected based on the host cell being transformed.
"Transformants" include stably transformed cells in which the inserted DNA is capable of replication either as an autonomously replicating plasmid or as part of the host chromosome, as well as cells which transiently express inserted DNA or RNA. A "transgenic organism," as used herein, is any organism, including but not limited to animals and plants, in which one or more of the cells of the organism contains heterologous nucleic acid introduced by way of human intervention, such as by transgenic techniques well known in the art.
The nucleic acid is introduced into the cell, directly or indirectly by introduction into a precursor of the cell, by way of deliberate genetic manipulation, such as by microinjection or by infection with a recombinant virus. The term genetic manipulation does not include classical cross-breeding, or in vitro fertilization, but rather is directed to the introduction of a recombinant DNA molecule. The transgenic organisms contemplated in accordance with the present invention include bacteria, cyanobacteria, fungi, and plants and animals. The isolated DNA of the present invention can be introduced into the host by methods known in the art, for example infection, transfection, transformation or transconjugation. Techniques for transferring the DNA of the present invention into such organisms are widely known and provided in references such as Sambrook et al. (1989), supra.
A "variant" of a particular nucleic acid sequence is defined as a nucleic acid sequence having at least 25% sequence identity to the particular nucleic acid sequence over a certain length of one of the nucleic acid sequences using blastn with the "BLAST 2 Sequences" tool Version 2.0.9 (May-07- 1999) set at default parameters. Such a pair of nucleic acids may show, for example, at least 30%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95% or even at least 98% or greater sequence identity over a certain defined length. The variant may result in "conservative" amino acid changes which do not affect structural and/or chemical properties. A variant may be described as, for example, an "allelic" (as defined above), "splice," "species," or "polymorphic" variant. A splice variant may have significant identity to a reference molecule, but will generally have a greater or lesser number of polynucleotides due to alternate splicing of exons during mRNA processing. The corresponding polypeptide may possess additional functional domains or lack domains that are present in the reference molecule. Species variants are polynucleotide sequences that vary from one species to another. The resulting polypeptides generally will have significant amino acid identity relative to each other. A polymorphic variant is a variation in the polynucleotide sequence of a particular gene between individuals of a given species. Polymorphic variants also may encompass "single nucleotide polymorphisms" (SNPs) in which the polynucleotide sequence varies by one base. The presence of SNPs may be indicative of, for example, a certain population, a disease state, or a propensity for a disease state. In an alternative, variants of the polynucleotides of the present invention may be generated through recombinant methods. One possible method is a DNA shuffling technique such as MOLECULARBREEDING (Maxygen Inc., Santa Clara CA; described in U.S. 
Patent Number 5,837,458; Chang, C.-C. et al. ( 1999) Nat. Biotechnol. 17:793-797; Christians, F.C. et al. (1999) Nat. Biotechnol. 17:259-264; and Crameri, A. et al. (1996) Nat. Biotechnol. 14:315-319) to alter or improve the biological properties of MDDT, such as its biological or enzymatic activity or its ability to bind to other molecules or compounds. DNA shuffling is a process by which a library of gene variants is produced using PCR-mediated recombination of gene fragments. The library is then subjected to selection or screening procedures that identify those gene variants with the desired properties. These preferred variants may then be pooled and further subjected to recursive rounds of DNA shuffling and selection/screening. Thus, genetic diversity is created through "artificial" breeding and rapid molecular evolution. For example, fragments of a single gene containing random point mutations may be recombined, screened, and then reshuffled until the desired properties are optimized. Alternatively, fragments of a given gene may be recombined with fragments of homologous genes in the same gene family, either from the same or different species, thereby maximizing the genetic diversity of multiple naturally occurring genes in a directed and controllable manner.
A "variant" of a particular polypeptide sequence is defined as a polypeptide sequence having at least 40% sequence identity to the particular polypeptide sequence over a certain length of one of the polypeptide sequences using blastp with the "BLAST 2 Sequences" tool Version 2.0.9 (May-07-1999) set at default parameters. Such a pair of polypeptides may show, for example, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 98% or greater sequence identity over a certain defined length of one of the polypeptides.
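The percent-identity thresholds used in the "variant" definitions can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned (gaps written as "-") and counts identical columns. It is not the BLAST 2 Sequences tool and does not reproduce its scoring or local alignment; the function names and the default threshold argument are hypothetical and chosen only for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same residue.

    Assumption: the inputs are pre-aligned and of equal length, with '-' for gaps.
    A gap never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = len(aligned_a)
    if columns == 0:
        return 0.0
    matches = sum(
        1 for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / columns


def is_variant(aligned_a: str, aligned_b: str, threshold: float = 40.0) -> bool:
    """True when identity meets the threshold (e.g., 40% for polypeptides, 25% for nucleic acids)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, is_variant("AAAA", "AATT") evaluates the pair at 50% identity, which meets the 40% polypeptide threshold; passing threshold=25.0 applies the nucleic acid threshold instead.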
THE INVENTION
In a particular embodiment, cDNA sequences derived from human tissues and cell lines were aligned based on nucleotide sequence identity and assembled into "consensus" or "template" sequences which are designated by the template identification numbers (template IDs) in column 2 of Table 1. The sequence identification numbers (SEQ ID NO:s) corresponding to the template IDs are shown in column 1. The template sequences have similarity to GenBank sequences, or "hits," as designated by the GI Numbers in column 3. The statistical probability of each GenBank hit is indicated by a probability score in column 4, and the functional annotation corresponding to each GenBank hit is listed in column 5.
The invention incorporates the nucleic acid sequences of these templates as disclosed in the Sequence Listing and the use of these sequences in the diagnosis and treatment of disease states characterized by defects in disease detection and treatment molecules. The invention further utilizes these sequences in hybridization and amplification technologies, and in particular, in technologies which assess gene expression patterns correlated with specific cells or tissues and their responses in vivo or in vitro to pharmaceutical agents, toxins, and other treatments. In this manner, the sequences of the present invention are used to develop a transcript image for a particular cell or tissue.
Derivation of Nucleic Acid Sequences
cDNA was isolated from libraries constructed using RNA derived from normal and diseased human tissues and cell lines. The human tissues and cell lines used for cDNA library construction were selected from a broad range of sources to provide a diverse population of cDNAs representative of gene transcription throughout the human body. Descriptions of the human tissues and cell lines used for cDNA library construction are provided in the LIFESEQ database (Incyte Genomics, Inc. (Incyte), Palo Alto CA). Human tissues were broadly selected from, for example, cardiovascular, dermatologic, endocrine, gastrointestinal, hematopoietic/immune system, musculoskeletal, neural, reproductive, and urologic sources. Cell lines used for cDNA library construction were derived from, for example, leukemic cells, teratocarcinomas, neuroepitheliomas, cervical carcinoma, lung fibroblasts, and endothelial cells. Such cell lines include, for example, THP-1, Jurkat, HUVEC, hNT2, WI38, HeLa, and other cell lines commonly used and available from public depositories (American Type Culture Collection, Manassas VA).
Prior to mRNA isolation, cell lines were untreated, treated with a pharmaceutical agent such as 5'-aza-2'-deoxycytidine, treated with an activating agent such as lipopolysaccharide in the case of leukocytic cell lines, or, in the case of endothelial cell lines, subjected to shear stress.
Sequencing of the cDNAs
Methods for DNA sequencing are well known in the art. Conventional enzymatic methods employ the Klenow fragment of DNA polymerase I, SEQUENASE DNA polymerase (U.S. Biochemical Corporation, Cleveland OH), Taq polymerase (PE Biosystems, Foster City CA), thermostable T7 polymerase (Amersham Pharmacia Biotech, Inc. (Amersham Pharmacia Biotech), Piscataway NJ), or combinations of polymerases and proofreading exonucleases such as those found in the ELONGASE amplification system (Life Technologies Inc. (Life Technologies), Gaithersburg MD), to extend the nucleic acid sequence from an oligonucleotide primer annealed to the DNA template of interest. Methods have been developed for the use of both single-stranded and double-stranded templates. Chain termination reaction products may be electrophoresed on urea-polyacrylamide gels and detected either by autoradiography (for radioisotope-labeled nucleotides) or by fluorescence (for fluorophore-labeled nucleotides). Automated methods for mechanized reaction preparation, sequencing, and analysis using fluorescence detection methods have been developed. Machines used to prepare cDNAs for sequencing can include the MICROLAB 2200 liquid transfer system (Hamilton Company (Hamilton), Reno NV), Peltier thermal cycler (PTC200; MJ Research, Inc. (MJ Research), Watertown MA), and ABI CATALYST 800 thermal cycler (PE Biosystems). Sequencing can be carried out using, for example, the ABI 373 or 377 (PE Biosystems) or MEGABACE 1000 (Molecular Dynamics, Inc. (Molecular Dynamics), Sunnyvale CA) DNA sequencing systems, or other automated and manual sequencing systems well known in the art. The nucleotide sequences of the Sequence Listing have been prepared by current, state-of-the-art, automated methods and, as such, may contain occasional sequencing errors or unidentified nucleotides. Such unidentified nucleotides are designated by an N.
These infrequent unidentified bases do not represent a hindrance to practicing the invention for those skilled in the art. Several methods employing standard recombinant techniques may be used to correct errors and complete the missing sequence information. (See, e.g., those described in Ausubel, F.M. et al. (1997) Short Protocols in Molecular Biology. John Wiley & Sons, New York NY; and Sambrook, J. et al. (1989) Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, Plainview NY.)
Assembly of cDNA Sequences
Human polynucleotide sequences may be assembled using programs or algorithms well known in the art. Sequences to be assembled are related, wholly or in part, and may be derived from a single or many different transcripts. Assembly of the sequences can be performed using such programs as PHRAP (Phil's Revised Assembly Program) and the GELVIEW fragment assembly system (GCG), or other methods known in the art.
Alternatively, cDNA sequences are used as "component" sequences that are assembled into "template" or "consensus" sequences as follows. Sequence chromatograms are processed, verified, and quality scores are obtained using PHRED. Raw sequences are edited using an editing pathway known as Block 1 (See, e.g., the LIFESEQ Assembled User Guide, Incyte Genomics, Palo Alto, CA). A series of BLAST comparisons is performed and low-information segments and repetitive elements (e.g., dinucleotide repeats, Alu repeats, etc.) are replaced by "n's", or masked, to prevent spurious matches. Mitochondrial and ribosomal RNA sequences are also removed. The processed sequences are then loaded into a relational database management system (RDMS) which assigns edited sequences to existing templates, if available. When additional sequences are added into the RDMS, a process is initiated which modifies existing templates or creates new templates from works in progress (i.e., nonfinal assembled sequences) containing queued sequences or the sequences themselves. After the new sequences have been assigned to templates, the templates can be merged into bins. If multiple templates exist in one bin, the bin can be split and the templates reannotated.
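The masking step described above, in which repetitive elements are replaced by "n's", can be sketched for the dinucleotide-repeat case. This is an illustrative assumption rather than the Block 1 editing pathway: the function name, the regular expression, and the minimum of six repeat units are hypothetical, and Alu repeats or other low-information segments would need a separate library-based comparison not shown here.

```python
import re


def mask_dinucleotide_repeats(seq: str, min_units: int = 6) -> str:
    """Replace runs of a repeated 2-base unit with 'n' characters of equal length.

    Assumption: a run counts as a repeat when the same two bases occur at least
    `min_units` times in a row (min_units must be 2 or more). Sequence length is
    preserved so that coordinates remain valid after masking.
    """
    if min_units < 2:
        raise ValueError("min_units must be at least 2")
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_units - 1), re.IGNORECASE)
    return pattern.sub(lambda m: "n" * len(m.group(0)), seq)
```

Masking with a lowercase "n" keeps the masked span distinguishable from the uppercase "N" used for unidentified nucleotides, although whether the original pipeline made that distinction is not stated in the text.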
Once gene bins have been generated based upon sequence alignments, bins are "clone joined" based upon clone information. Clone joining occurs when the 5' sequence of one clone is present in one bin and the 3' sequence from the same clone is present in a different bin, indicating that the two bins should be merged into a single bin. Only bins which share at least two different clones are merged.
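The clone-joining rule can be sketched as a grouping problem. The example below assumes each bin is represented by the set of clone identifiers that contributed a 5' or 3' sequence to it, and merges any two bins that share at least two different clones. The data layout, the function name, and the choice to apply merges transitively (through a union-find structure) are assumptions made for illustration; the text does not specify how the merging is implemented.

```python
from itertools import combinations
from typing import Dict, List, Set


def clone_join(bins: Dict[str, Set[str]], min_shared: int = 2) -> List[Set[str]]:
    """Group bin IDs so that bins sharing >= min_shared clones end up together."""
    parent = {bin_id: bin_id for bin_id in bins}

    def find(bin_id: str) -> str:
        # Follow parent links to the group representative, compressing the path.
        while parent[bin_id] != bin_id:
            parent[bin_id] = parent[parent[bin_id]]
            bin_id = parent[bin_id]
        return bin_id

    for a, b in combinations(bins, 2):
        if len(bins[a] & bins[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for bin_id in bins:
        groups.setdefault(find(bin_id), set()).add(bin_id)
    # Sort for a deterministic return order.
    return sorted(groups.values(), key=lambda g: sorted(g))
```

With bins {"bin1": {"c1", "c2", "c3"}, "bin2": {"c1", "c2"}, "bin3": {"c3", "c9"}}, bin1 and bin2 share two clones and are grouped, while bin3 shares only one clone with bin1 and stays separate.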
A resultant template sequence may contain either a partial or a full length open reading frame, or all or part of a genetic regulatory element. This variation is due in part to the fact that the full length cDNAs of many genes are several hundred, and sometimes several thousand, bases in length. With current technology, cDNAs comprising the coding regions of large genes cannot be cloned because of vector limitations, incomplete reverse transcription of the mRNA, or incomplete "second strand" synthesis. Template sequences may be extended to include additional contiguous sequences derived from the parent RNA transcript using a variety of methods known to those of skill in the art. Extension may thus be used to achieve the full length coding sequence of a gene.
Analysis of the cDNA Sequences
The cDNA sequences are analyzed using a variety of programs and algorithms which are well known in the art. (See, e.g., Ausubel, 1997, supra, Chapter 1.1; Meyers, R.A. (Ed.) (1995) Molecular Biology and Biotechnology, Wiley VCH, New York NY, pp. 856-853; and Table 6.) These analyses comprise reading frame determinations, e.g., based on triplet codon periodicity for particular organisms (Fickett, J.W. (1982) Nucleic Acids Res. 10:5303-5318); analyses of potential start and stop codons; and homology searches.
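The analysis of potential start and stop codons can be illustrated with a minimal sketch that scans the three forward reading frames for an ATG followed in frame by a stop codon. It does not implement the codon-periodicity method of Fickett, does not examine the reverse strand, and ignores partial open reading frames that run off the end of the sequence; the function name and the min_codons parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2):
    """Return (frame, start, end) tuples for ATG...stop stretches on the forward strand.

    `start` is the 0-based index of the A in ATG and `end` is one past the last
    base of the stop codon, so seq[start:end] is the full open reading frame.
    `min_codons` counts the codons before the stop codon, including the ATG.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs
```

For the sequence "CCATGAAATGGTAACC", the only complete open reading frame lies in frame 2 and spans "ATGAAATGGTAA".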
Computer programs known to those of skill in the art for performing computer-assisted searches for amino acid and nucleic acid sequence similarity, include, for example, Basic Local Alignment Search Tool (BLAST; Altschul, S.F. (1993) J. Mol. Evol. 36:290-300; Altschul, S.F. et al. (1990) J. Mol. Biol. 215:403-410). BLAST is especially useful in determining exact matches and comparing two sequence fragments of arbitrary but equal lengths, whose alignment is locally maximal and for which the alignment score meets or exceeds a threshold or cutoff score set by the user (Karlin, S. et al. (1988) Proc. Natl. Acad. Sci. USA 85:841-845). Using an appropriate search tool (e.g., BLAST or HMM), GenBank, SwissProt, BLOCKS, PFAM and other databases may be searched for sequences containing regions of homology to a query mddt or MDDT of the present invention.
Other approaches to the identification, assembly, storage, and display of nucleotide and polypeptide sequences are provided in "Relational Database for Storing Biomolecule Information," U.S.S.N. 08/947,845, filed October 9, 1997; "Project-Based Full-Length Biomolecular Sequence Database," U.S.S.N. 08/811,758, filed March 6, 1997; and "Relational Database and System for Storing Information Relating to Biomolecular Sequences," U.S.S.N. 09/034,807, filed March 4, 1998, all of which are incorporated by reference herein in their entirety.
Protein hierarchies can be assigned to the putative encoded polypeptide based on, e.g., motif, BLAST, or biological analysis. Methods for assigning these hierarchies are described, for example, in "Database System Employing Protein Function Hierarchies for Viewing Biomolecular Sequence Data," U.S.S.N. 08/812,290, filed March 6, 1997, incorporated herein by reference.
Human Disease Detection and Treatment Molecule Sequences
The mddt of the present invention may be used for a variety of diagnostic and therapeutic purposes. For example, an mddt may be used to diagnose a particular condition, disease, or disorder associated with disease detection and treatment molecules. Such conditions, diseases, and disorders include, but are not limited to, a cell proliferative disorder, such as actinic keratosis, arteriosclerosis, atherosclerosis, bursitis, cirrhosis, hepatitis, mixed connective tissue disease (MCTD), myelofibrosis, paroxysmal nocturnal hemoglobinuria, polycythemia vera, psoriasis, primary thrombocythemia, and cancers including adenocarcinoma, leukemia, lymphoma, melanoma, myeloma, sarcoma, teratocarcinoma, and, in particular, a cancer of the adrenal gland, bladder, bone, bone marrow, brain, breast, cervix, gall bladder, ganglia, gastrointestinal tract, heart, kidney, liver, lung, muscle, ovary, pancreas, parathyroid, penis, prostate, salivary glands, skin, spleen, testis, thymus, thyroid, and uterus; and an autoimmune/inflammatory disorder, such as actinic keratosis, acquired immunodeficiency syndrome (AIDS), Addison's disease, adult respiratory distress syndrome, allergies, ankylosing spondylitis, amyloidosis, anemia, arteriosclerosis, asthma, atherosclerosis, autoimmune hemolytic anemia, autoimmune thyroiditis, bronchitis, bursitis, cholecystitis, cirrhosis, contact dermatitis, Crohn's disease, atopic dermatitis, dermatomyositis, diabetes mellitus, emphysema, erythroblastosis fetalis, erythema nodosum, atrophic gastritis, glomerulonephritis, Goodpasture's syndrome, gout, Graves' disease, Hashimoto's thyroiditis, paroxysmal nocturnal hemoglobinuria, hepatitis, hypereosinophilia, irritable bowel syndrome, episodic lymphopenia with lymphocytotoxins, mixed connective tissue disease (MCTD), multiple sclerosis, myasthenia gravis, myocardial or pericardial inflammation, myelofibrosis, osteoarthritis, osteoporosis, pancreatitis, polycythemia vera, polymyositis, psoriasis, Reiter's 
syndrome, rheumatoid arthritis, scleroderma, Sjögren's syndrome, systemic anaphylaxis, systemic lupus erythematosus, systemic sclerosis, primary thrombocythemia, thrombocytopenic purpura, ulcerative colitis, uveitis, Werner syndrome, complications of cancer, hemodialysis, and extracorporeal circulation, trauma, and hematopoietic cancer including lymphoma, leukemia, and myeloma. The mddt can be used to detect the presence of, or to quantify the amount of, an mddt-related polynucleotide in a sample. This information is then compared to information obtained from appropriate reference samples, and a diagnosis is established. Alternatively, a polynucleotide complementary to a given mddt can inhibit or inactivate a therapeutically relevant gene related to the mddt.
Analysis of mddt Expression Patterns
The expression of mddt may be routinely assessed by hybridization-based methods to determine, for example, the tissue-specificity, disease-specificity, or developmental stage-specificity of mddt expression. For example, the level of expression of mddt may be compared among different cell types or tissues, among diseased and normal cell types or tissues, among cell types or tissues at different developmental stages, or among cell types or tissues undergoing various treatments. This type of analysis is useful, for example, to assess the relative levels of mddt expression in fully or partially differentiated cells or tissues, to determine if changes in mddt expression levels are correlated with the development or progression of specific disease states, and to assess the response of a cell or tissue to a specific therapy, for example, in pharmacological or toxicological studies. Methods for the analysis of mddt expression are based on hybridization and amplification technologies and include membrane-based procedures such as northern blot analysis, high-throughput procedures that utilize, for example, microarrays, and PCR-based procedures.
Hybridization and Genetic Analysis
The mddt, their fragments, or complementary sequences, may be used to identify the presence of and/or to determine the degree of similarity between two (or more) nucleic acid sequences. The mddt may be hybridized to naturally occurring or recombinant nucleic acid sequences under appropriately selected temperatures and salt concentrations. Hybridization with a probe based on the nucleic acid sequence of at least one of the mddt allows for the detection of nucleic acid sequences, including genomic sequences, which are identical or related to the mddt of the Sequence Listing. Probes may be selected from non-conserved or unique regions of at least one of the polynucleotides of SEQ ID NO: 1-25 and tested for their ability to identify or amplify the target nucleic acid sequence using standard protocols.
Polynucleotide sequences that are capable of hybridizing, in particular, to those shown in SEQ ID NO: 1-25 and fragments thereof, can be identified using various conditions of stringency. (See, e.g., Wahl, G.M. and S.L. Berger (1987) Methods Enzymol. 152:399-407; Kimmel, A.R. (1987) Methods Enzymol. 152:507-511.) Hybridization conditions are discussed in "Definitions."
A probe for use in Southern or northern hybridization may be derived from a fragment of an mddt sequence, or its complement, that is up to several hundred nucleotides in length and is either single-stranded or double-stranded. Such probes may be hybridized in solution to biological materials such as plasmids, bacterial, yeast, or human artificial chromosomes, cleared or sectioned tissues, or to artificial substrates containing mddt. Microarrays are particularly suitable for identifying the presence of and detecting the level of expression for multiple genes of interest by examining gene expression correlated with, e.g., various stages of development, treatment with a drug or compound, or disease progression. An array analogous to a dot or slot blot may be used to arrange and link polynucleotides to the surface of a substrate using one or more of the following: mechanical (vacuum), chemical, thermal, or UV bonding procedures. Such an array may contain any number of mddt and may be produced by hand or by using available devices, materials, and machines.
Microarrays may be prepared, used, and analyzed using methods known in the art. (See, e.g., Brennan, T.M. et al. (1995) U.S. Patent No. 5,474,796; Schena, M. et al. (1996) Proc. Natl. Acad. Sci. USA 93:10614-10619; Baldeschweiler et al. (1995) PCT application WO95/251116; Shalon, D. et al. (1995) PCT application WO95/35505; Heller, R.A. et al. (1997) Proc. Natl. Acad. Sci. USA 94:2150-2155; and Heller, M.J. et al. (1997) U.S. Patent No. 5,605,662.)
Probes may be labeled by either PCR or enzymatic techniques using a variety of commercially available reporter molecules. For example, commercial kits are available for radioactive and chemiluminescent labeling (Amersham Pharmacia Biotech) and for alkaline phosphatase labeling (Life Technologies). Alternatively, mddt may be cloned into commercially available vectors for the production of RNA probes. Such probes may be transcribed in the presence of at least one labeled nucleotide (e.g., 32P-ATP, Amersham Pharmacia Biotech).
Additionally, the polynucleotides of SEQ ID NO: 1-25 or suitable fragments thereof can be used to isolate full length cDNA sequences utilizing hybridization and/or amplification procedures well known in the art, e.g., cDNA library screening, PCR amplification, etc. The molecular cloning of such full length cDNA sequences may employ the method of cDNA library screening with probes using the hybridization, stringency, washing, and probing strategies described above and in Ausubel, supra, Chapters 3, 5, and 6. These procedures may also be employed with genomic libraries to isolate genomic sequences of mddt in order to analyze, e.g., regulatory elements.
Genetic Mapping
Gene identification and mapping are important in the investigation and treatment of almost all conditions, diseases, and disorders. Cancer, cardiovascular disease, Alzheimer's disease, arthritis, diabetes, and mental illnesses are of particular interest. Each of these conditions is more complex than the single gene defects of sickle cell anemia or cystic fibrosis, with select groups of genes being predictive of predisposition for a particular condition, disease, or disorder. For example, cardiovascular disease may result from malfunctioning receptor molecules that fail to clear cholesterol from the bloodstream, and diabetes may result when a particular individual's immune system is activated by an infection and attacks the insulin-producing cells of the pancreas. In some studies, Alzheimer's disease has been linked to a gene on chromosome 21; other studies predict a different gene and location. Mapping of disease genes is a complex and reiterative process and generally proceeds from genetic linkage analysis to physical mapping.
As a condition is noted among members of a family, a genetic linkage map traces parts of chromosomes that are inherited in the same pattern as the condition. Statistics link the inheritance of particular conditions to particular regions of chromosomes, as defined by RFLP or other markers. (See, for example, Lander, E. S. and Botstein, D. (1986) Proc. Natl. Acad. Sci. USA 83:7353-7357.) Occasionally, genetic markers and their locations are known from previous studies. More often, however, the markers are simply stretches of DNA that differ among individuals. Examples of genetic linkage maps can be found in various scientific journals or at the Online Mendelian Inheritance in Man (OMIM) World Wide Web site.
In another embodiment of the invention, mddt sequences may be used to generate hybridization probes useful in chromosomal mapping of naturally occurring genomic sequences.
Either coding or noncoding sequences of mddt may be used, and in some instances, noncoding sequences may be preferable over coding sequences. For example, conservation of an mddt coding sequence among members of a multi-gene family may potentially cause undesired cross hybridization during chromosomal mapping. The sequences may be mapped to a particular chromosome, to a specific region of a chromosome, or to artificial chromosome constructions, e.g., human artificial chromosomes (HACs), yeast artificial chromosomes (YACs), bacterial artificial chromosomes (BACs), bacterial P1 constructions, or single chromosome cDNA libraries. (See, e.g., Harrington, J.J. et al. (1997) Nat. Genet. 15:345-355; Price, C.M. (1993) Blood Rev. 7:127-134; and Trask, B.J. (1991) Trends Genet. 7:149-154.)
Fluorescent in situ hybridization (FISH) may be correlated with other physical chromosome mapping techniques and genetic map data. (See, e.g., Meyers, supra, pp. 965-968.) Correlation between the location of mddt on a physical chromosomal map and a specific disorder, or a predisposition to a specific disorder, may help define the region of DNA associated with that disorder. The mddt sequences may also be used to detect polymorphisms that are genetically linked to the inheritance of a particular condition, disease, or disorder.
In situ hybridization of chromosomal preparations and genetic mapping techniques, such as linkage analysis using established chromosomal markers, may be used for extending existing genetic maps. Often the placement of a gene on the chromosome of another mammalian species, such as mouse, may reveal associated markers even if the number or arm of the corresponding human chromosome is not known. These new marker sequences can be mapped to human chromosomes and may provide valuable information to investigators searching for disease genes using positional cloning or other gene discovery techniques. Once a disease or syndrome has been crudely correlated by genetic linkage with a particular genomic region, e.g., ataxia-telangiectasia to 11q22-23, any sequences mapping to that area may represent associated or regulatory genes for further investigation. (See, e.g., Gatti, R.A. et al. (1988) Nature 336:577-580.) The nucleotide sequences of the subject invention may also be used to detect differences in chromosomal architecture due to translocation, inversion, etc., among normal, carrier, or affected individuals.
Once a disease-associated gene is mapped to a chromosomal region, the gene must be cloned in order to identify mutations or other alterations (e.g., translocations or inversions) that may be correlated with disease. This process requires a physical map of the chromosomal region containing the disease-gene of interest along with associated markers. A physical map is necessary for determining the nucleotide sequence of and order of marker genes on a particular chromosomal region. Physical mapping techniques are well known in the art and require the generation of overlapping sets of cloned DNA fragments from a particular organelle, chromosome, or genome. These clones are analyzed to reconstruct and catalog their order. Once the position of a marker is determined, the DNA from that region is obtained by consulting the catalog and selecting clones from that region. The gene of interest is located through positional cloning techniques using hybridization or similar methods.
Diagnostic Uses
The mddt of the present invention may be used to design probes useful in diagnostic assays. Such assays, well known to those skilled in the art, may be used to detect or confirm conditions, disorders, or diseases associated with abnormal levels of mddt expression. Labeled probes developed from mddt sequences are added to a sample under hybridizing conditions of desired stringency. In some instances, mddt, or fragments or oligonucleotides derived from mddt, may be used as primers in amplification steps prior to hybridization. The amount of hybridization complex formed is quantified and compared with standards for that cell or tissue. If mddt expression varies significantly from the standard, the assay indicates the presence of the condition, disorder, or disease. Qualitative or quantitative diagnostic methods may include northern, dot blot, or other membrane or dip-stick based technologies or multiple-sample format technologies such as PCR, enzyme-linked immunosorbent assay (ELISA)-like, pin, or chip-based assays.
The probes described above may also be used to monitor the progress of conditions, disorders, or diseases associated with abnormal levels of mddt expression, or to evaluate the efficacy of a particular therapeutic treatment. The candidate probe may be identified from the mddt that are specific to a given human tissue and have not been observed in GenBank or other genome databases. Such a probe may be used in animal studies, preclinical tests, clinical trials, or in monitoring the treatment of an individual patient. In a typical process, standard expression is established by methods well known in the art for use as a basis of comparison, samples from patients affected by the disorder or disease are combined with the probe to evaluate any deviation from the standard profile, and a therapeutic agent is administered and effects are monitored to generate a treatment profile. Efficacy is evaluated by determining whether the expression progresses toward or returns to the standard normal pattern. Treatment profiles may be generated over a period of several days or several months. Statistical methods well known to those skilled in the art may be used to determine the significance of such therapeutic agents. The polynucleotides are also useful for identifying individuals from minute biological samples, for example, by matching the RFLP pattern of a sample's DNA to that of an individual's DNA. The polynucleotides of the present invention can also be used to determine the actual base-by-base DNA sequence of selected portions of an individual's genome. These sequences can be used to prepare PCR primers for amplifying and isolating such selected DNA, which can then be sequenced. Using this technique, an individual can be identified through a unique set of DNA sequences. Once a unique ID database is established for an individual, positive identification of that individual can be made from extremely small tissue samples.
In a particular aspect, oligonucleotide primers derived from the mddt of the invention may be used to detect single nucleotide polymorphisms (SNPs). SNPs are substitutions, insertions and deletions that are a frequent cause of inherited or acquired genetic disease in humans. Methods of SNP detection include, but are not limited to, single-stranded conformation polymorphism (SSCP) and fluorescent SSCP (fSSCP) methods. In SSCP, oligonucleotide primers derived from mddt are used to amplify DNA using the polymerase chain reaction (PCR). The DNA may be derived, for example, from diseased or normal tissue, biopsy samples, bodily fluids, and the like. SNPs in the DNA cause differences in the secondary and tertiary structures of PCR products in single-stranded form, and these differences are detectable using gel electrophoresis in non-denaturing gels. In fSSCP, the oligonucleotide primers are fluorescently labeled, which allows detection of the amplimers in high-throughput equipment such as DNA sequencing machines. Additionally, sequence database analysis methods, termed in silico SNP (isSNP), are capable of identifying polymorphisms by comparing the sequences of individual overlapping DNA fragments which assemble into a common consensus sequence. These computer-based methods filter out sequence variations due to laboratory preparation of DNA and sequencing errors using statistical models and automated analyses of DNA sequence chromatograms. In the alternative, SNPs may be detected and characterized by mass spectrometry using, for example, the high throughput MASSARRAY system (Sequenom, Inc., San Diego CA). DNA-based identification techniques are critical in forensic technology. DNA sequences taken from very small biological samples such as tissues, e.g., hair or skin, or body fluids, e.g., blood, saliva, semen, etc., can be amplified using, e.g., PCR, to identify individuals. (See, e.g., Erlich, H. (1992) PCR Technology, Freeman and Co., New York, NY).
Similarly, polynucleotides of the present invention can be used as polymoφhic markers. There is also a need for reagents capable of identifying the source of a particular tissue.
Appropriate reagents can comprise, for example, DNA probes or primers prepared from the sequences of the present invention that are specific for particular tissues. Panels of such reagents can identify tissue by species and/or by organ type. In a similar fashion, these reagents can be used to screen tissue cultures for contamination. The polynucleotides of the present invention can also be used as molecular weight markers on nucleic acid gels or Southern blots, as diagnostic probes for the presence of a specific mRNA in a particular cell type, in the creation of subtracted cDNA libraries which aid in the discovery of novel polynucleotides, in selection and synthesis of oligomers for attachment to an array or other support, and as an antigen to elicit an immune response.
Disease Model Systems Using mddt
The mddt of the invention or their mammalian homologs may be "knocked out" in an animal model system using homologous recombination in embryonic stem (ES) cells. Such techniques are well known in the art and are useful for the generation of animal models of human disease. (See, e.g., U.S. Patent Number 5,175,383 and U.S. Patent Number 5,767,337.) For example, mouse ES cells, such as the mouse 129/SvJ cell line, are derived from the early mouse embryo and grown in culture. The ES cells are transformed with a vector containing the gene of interest disrupted by a marker gene, e.g., the neomycin phosphotransferase gene (neo; Capecchi, M.R. (1989) Science 244: 1288-1292). The vector integrates into the corresponding region of the host genome by homologous recombination. Alternatively, homologous recombination takes place using the Cre-loxP system to knockout a gene of interest in a tissue- or developmental stage-specific manner (Marth, J.D. (1996) Clin. Invest. 97: 1999-2002; Wagner, K.U. et al. (1997) Nucleic Acids Res. 25:4323-4330). Transformed ES cells are identified and microinjected into mouse cell blastocysts such as those from the C57BL/6 mouse strain. The blastocysts are surgically transferred to pseudopregnant dams, and the resulting chimeric progeny are genotyped and bred to produce heterozygous or homozygous strains. Transgenic animals thus generated may be tested with potential therapeutic or toxic agents. The mddt of the invention may also be manipulated in vitro in ES cells derived from human blastocysts. Human ES cells have the potential to differentiate into at least eight separate cell lineages including endoderm, mesoderm, and ectodermal cell types. These cell lineages differentiate into, for example, neural cells, hematopoietic lineages, and cardiomyocytes (Thomson, J.A. et al. (1998) Science 282:1145-1147).
The mddt of the invention can also be used to create "knockin" humanized animals (pigs) or transgenic animals (mice or rats) to model human disease. With knockin technology, a region of mddt is injected into animal ES cells, and the injected sequence integrates into the animal cell genome. Transformed cells are injected into blastulae, and the blastulae are implanted as described above. Transgenic progeny or inbred lines are studied and treated with potential pharmaceutical agents to obtain information on treatment of a human disease. Alternatively, a mammal inbred to overexpress mddt, resulting, e.g., in the secretion of MDDT in its milk, may also serve as a convenient source of that protein (Janne, J. et al. (1998) Biotechnol. Annu. Rev. 4:55-74).
Screening Assays
MDDT encoded by polynucleotides of the present invention may be used to screen for molecules that bind to or are bound by the encoded polypeptides. The binding of the polypeptide and the molecule may activate (agonist), increase, inhibit (antagonist), or decrease activity of the polypeptide or the bound molecule. Examples of such molecules include antibodies, oligonucleotides, proteins (e.g., receptors), or small molecules. Preferably, the molecule is closely related to the natural ligand of the polypeptide, e.g., a ligand or fragment thereof, a natural substrate, or a structural or functional mimetic. (See, Coligan et al., (1991) Current Protocols in Immunology 1(2): Chapter 5.) Similarly, the molecule can be closely related to the natural receptor to which the polypeptide binds, or to at least a fragment of the receptor, e.g., the active site. In either case, the molecule can be rationally designed using known techniques. Preferably, the screening for these molecules involves producing appropriate cells which express the polypeptide, either as a secreted protein or on the cell membrane. Preferred cells include cells from mammals, yeast, Drosophila, or E. coli. Cells expressing the polypeptide or cell membrane fractions which contain the expressed polypeptide are then contacted with a test compound and binding, stimulation, or inhibition of activity of either the polypeptide or the molecule is analyzed.
An assay may simply test binding of a candidate compound to the polypeptide, wherein binding is detected by a fluorophore, radioisotope, enzyme conjugate, or other detectable label. Alternatively, the assay may assess binding in the presence of a labeled competitor.
Additionally, the assay can be carried out using cell-free preparations, polypeptide/molecule affixed to a solid support, chemical libraries, or natural product mixtures. The assay may also simply comprise the steps of mixing a candidate compound with a solution containing a polypeptide, measuring polypeptide/molecule activity or binding, and comparing the polypeptide/molecule activity or binding to a standard.
Preferably, an ELISA assay using, e.g., a monoclonal or polyclonal antibody, can measure polypeptide level in a sample. The antibody can measure polypeptide level by either binding, directly or indirectly, to the polypeptide or by competing with the polypeptide for a substrate.
All of the above assays can be used in a diagnostic or prognostic context. The molecules discovered using these assays can be used to treat disease or to bring about a particular result in a patient (e.g., blood vessel growth) by activating or inhibiting the polypeptide/molecule. Moreover, the assays can discover agents which may inhibit or enhance the production of the polypeptide from suitably manipulated cells or tissues.
Transcript Imaging and Toxicological Testing
Another embodiment relates to the use of mddt to develop a transcript image of a tissue or cell type. A transcript image represents the global pattern of gene expression by a particular tissue or cell type. Global gene expression patterns are analyzed by quantifying the number of expressed genes and their relative abundance under given conditions and at a given time. (See Seilhamer et al., "Comparative Gene Transcript Analysis," U.S. Patent Number 5,840,484, expressly incorporated by reference herein.) Thus a transcript image may be generated by hybridizing the polynucleotides of the present invention or their complements to the totality of transcripts or reverse transcripts of a particular tissue or cell type. In one embodiment, the hybridization takes place in high-throughput format, wherein the polynucleotides of the present invention or their complements comprise a subset of a plurality of elements on a microarray. The resultant transcript image would provide a profile of gene activity pertaining to disease detection and treatment molecules.
Transcript images which profile mddt expression may be generated using transcripts isolated from tissues, cell lines, biopsies, or other biological samples. The transcript image may thus reflect mddt expression in vivo, as in the case of a tissue or biopsy sample, or in vitro, as in the case of a cell line.
Transcript images which profile mddt expression may also be used in conjunction with in vitro model systems and preclinical evaluation of pharmaceuticals, as well as toxicological testing of industrial and naturally-occurring environmental compounds. All compounds induce characteristic gene expression patterns, frequently termed molecular fingerprints or toxicant signatures, which are indicative of mechanisms of action and toxicity (Nuwaysir, E. F. et al. (1999) Mol. Carcinog. 24:153-159; Steiner, S. and Anderson, N. L. (2000) Toxicol. Lett. 112-113:467-71, expressly incorporated by reference herein). If a test compound has a signature similar to that of a compound with known toxicity, it is likely to share those toxic properties. These fingerprints or signatures are most useful and refined when they contain expression information from a large number of genes and gene families. Ideally, a genome-wide measurement of expression provides the highest quality signature. Even genes whose expression is not altered by any tested compounds are important as well, as the levels of expression of these genes are used to normalize the rest of the expression data. The normalization procedure is useful for comparison of expression data after treatment with different compounds. While the assignment of gene function to elements of a toxicant signature aids in interpretation of toxicity mechanisms, knowledge of gene function is not necessary for the statistical matching of signatures which leads to prediction of toxicity. (See, for example, Press Release 00-02 from the National Institute of Environmental Health Sciences, released February 29, 2000, available at http://www.niehs.nih.gov/oc/news/toxchip.htm.) Therefore, it is important and desirable in toxicological screening using toxicant signatures to include all expressed gene sequences.
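The two ideas in the preceding paragraph, normalizing expression data against genes whose expression does not change and then statistically matching signatures, can be sketched as follows. The gene names, the values, the use of a mean-of-reference-genes scale factor, and the use of Pearson correlation as the matching statistic are assumptions chosen for illustration; the text does not specify a particular statistic.

```python
from math import sqrt


def normalize(profile, reference_genes):
    """Scale a profile by the mean level of the unchanged reference genes."""
    scale = sum(profile[g] for g in reference_genes) / len(reference_genes)
    return {gene: level / scale for gene, level in profile.items()}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def signature_similarity(test, known, reference_genes):
    """Correlate two normalized signatures over the non-reference genes."""
    t = normalize(test, reference_genes)
    k = normalize(known, reference_genes)
    genes = sorted(g for g in t if g not in reference_genes)
    return pearson([t[g] for g in genes], [k[g] for g in genes])


# Hypothetical signatures; the test sample simply has twice the total signal.
known_toxicant = {"ref": 10.0, "g1": 50.0, "g2": 5.0, "g3": 20.0}
test_compound = {"ref": 20.0, "g1": 100.0, "g2": 10.0, "g3": 40.0}
similarity = signature_similarity(test_compound, known_toxicant, ["ref"])
```

After normalization the two hypothetical signatures coincide, so the similarity is essentially 1, illustrating why the unchanged genes matter when comparing data from different treatments.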
In one embodiment, the toxicity of a test compound is assessed by treating a biological sample containing nucleic acids with the test compound. Nucleic acids that are expressed in the treated biological sample are hybridized with one or more probes specific to the polynucleotides of the present invention, so that transcript levels corresponding to the polynucleotides of the present invention may be quantified. The transcript levels in the treated biological sample are compared with levels in an untreated biological sample. Differences in the transcript levels between the two samples are indicative of a toxic response caused by the test compound in the treated sample.
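The comparison of treated and untreated transcript levels described above can be sketched as a log2 ratio per transcript. The transcript names, the values, and the twofold threshold are hypothetical; the sketch also assumes all measured levels are nonzero.

```python
from math import log2


def changed_transcripts(treated, untreated, min_abs_log2=1.0):
    """Return transcripts whose treated/untreated log2 ratio meets the threshold."""
    changes = {}
    for gene, level in treated.items():
        ratio = log2(level / untreated[gene])
        if abs(ratio) >= min_abs_log2:
            changes[gene] = ratio
    return changes


# Hypothetical transcript levels quantified by probe hybridization.
treated = {"mddt_1": 80.0, "mddt_2": 10.0, "mddt_3": 5.0}
untreated = {"mddt_1": 20.0, "mddt_2": 11.0, "mddt_3": 20.0}
changes = changed_transcripts(treated, untreated)
```

In this example two transcripts differ fourfold between the samples and are flagged, while the third changes only slightly and is ignored.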
Another particular embodiment relates to the use of MDDT encoded by polynucleotides of the present invention to analyze the proteome of a tissue or cell type. The term proteome refers to the global pattern of protein expression in a particular tissue or cell type. Each protein component of a proteome can be subjected individually to further analysis. Proteome expression patterns, or profiles, are analyzed by quantifying the number of expressed proteins and their relative abundance under given conditions and at a given time. A profile of a cell's proteome may thus be generated by separating and analyzing the polypeptides of a particular tissue or cell type. In one embodiment, the separation is achieved using two-dimensional gel electrophoresis, in which proteins from a sample are separated by isoelectric focusing in the first dimension, and then according to molecular weight by sodium dodecyl sulfate slab gel electrophoresis in the second dimension (Steiner and Anderson, supra). The proteins are visualized in the gel as discrete and uniquely positioned spots, typically by staining the gel with an agent such as Coomassie Blue or silver or fluorescent stains. The optical density of each protein spot is generally proportional to the level of the protein in the sample. The optical densities of equivalently positioned protein spots from different samples, for example, from biological samples either treated or untreated with a test compound or therapeutic agent, are compared to identify any changes in protein spot density related to the treatment. The proteins in the spots are partially sequenced using, for example, standard methods employing chemical or enzymatic cleavage followed by mass spectrometry. The identity of the protein in a spot may be determined by comparing its partial sequence, preferably of at least 5 contiguous amino acid residues, to the polypeptide sequences of the present invention. 
In some cases, further sequence data may be obtained for definitive protein identification.
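The identification step described above, comparing a partial sequence of at least 5 contiguous residues to known polypeptide sequences, can be sketched as a substring search. The polypeptide names and sequences below are invented placeholders, not sequences from the Sequence Listing.

```python
def matching_polypeptides(partial_sequence, polypeptides, min_length=5):
    """Return names of polypeptides containing the partial sequence exactly."""
    if len(partial_sequence) < min_length:
        raise ValueError(f"partial sequence must have at least {min_length} residues")
    return [name for name, seq in polypeptides.items() if partial_sequence in seq]


# Hypothetical polypeptide sequences in one-letter amino acid code.
polypeptides = {
    "MDDT-1": "MKTAYIAKQRQISFVK",
    "MDDT-2": "MSDNELLKKAGQW",
}
hits = matching_polypeptides("AKQRQ", polypeptides)
```

A short match can be shared by more than one polypeptide, which is consistent with the text's note that further sequence data may be needed for definitive identification.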
A proteomic profile may also be generated using antibodies specific for MDDT to quantify the levels of MDDT expression. In one embodiment, the antibodies are used as elements on a microarray, and protein expression levels are quantified by exposing the microarray to the sample and detecting the levels of protein bound to each array element (Lueking, A. et al. (1999) Anal. Biochem. 270:103-111; Mendoza, L. G. et al. (1999) Biotechniques 27:778-88). Detection may be performed by a variety of methods known in the art, for example, by reacting the proteins in the sample with a thiol- or amino-reactive fluorescent compound and detecting the amount of fluorescence bound at each array element.
Toxicant signatures at the proteome level are also useful for toxicological screening, and should be analyzed in parallel with toxicant signatures at the transcript level. There is a poor correlation between transcript and protein abundances for some proteins in some tissues (Anderson, N. L. and Seilhamer, J. (1997) Electrophoresis 18:533-537), so proteome toxicant signatures may be useful in the analysis of compounds which do not significantly affect the transcript image, but which alter the proteomic profile. In addition, the analysis of transcripts in body fluids is difficult, due to rapid degradation of mRNA, so proteomic profiling may be more reliable and informative in such cases. In another embodiment, the toxicity of a test compound is assessed by treating a biological sample containing proteins with the test compound. Proteins that are expressed in the treated biological sample are separated so that the amount of each protein can be quantified. The amount of each protein is compared to the amount of the corresponding protein in an untreated biological sample. A difference in the amount of protein between the two samples is indicative of a toxic response to the test compound in the treated sample. Individual proteins are identified by sequencing the amino acid residues of the individual proteins and comparing these partial sequences to the MDDT encoded by polynucleotides of the present invention.
In another embodiment, the toxicity of a test compound is assessed by treating a biological sample containing proteins with the test compound. Proteins from the biological sample are incubated with antibodies specific to the MDDT encoded by polynucleotides of the present invention. The amount of protein recognized by the antibodies is quantified. The amount of protein in the treated biological sample is compared with the amount in an untreated biological sample. A difference in the amount of protein between the two samples is indicative of a toxic response to the test compound in the treated sample.
Transcript images may be used to profile mddt expression in distinct tissue types. This process can be used to determine disease detection and treatment molecule activity in a particular tissue type relative to this activity in a different tissue type. Transcript images may be used to generate a profile of mddt expression characteristic of diseased tissue. Transcript images of tissues before and after treatment may be used for diagnostic purposes, to monitor the progression of disease, and to monitor the efficacy of drug treatments for diseases which affect the activity of disease detection and treatment molecules.
Transcript images of cell lines can be used to assess disease detection and treatment molecule activity and/or to identify cell lines that lack or misregulate this activity. Such cell lines may then be treated with pharmaceutical agents, and a transcript image following treatment may indicate the efficacy of these agents in restoring desired levels of this activity. A similar approach may be used to assess the toxicity of pharmaceutical agents as reflected by undesirable changes in disease detection and treatment molecule activity. Candidate pharmaceutical agents may be evaluated by comparing their associated transcript images with those of pharmaceutical agents of known effectiveness.
Antisense Molecules
The polynucleotides of the present invention are useful in antisense technology. Antisense technology or therapy relies on the modulation of expression of a target protein through the specific binding of an antisense sequence to a target sequence encoding the target protein or directing its expression. (See, e.g., Agrawal, S., ed. (1996) Antisense Therapeutics, Humana Press Inc., Totowa NJ; Alama, A. et al. (1997) Pharmacol. Res. 36(3):171-178; Crooke, S.T. (1997) Adv. Pharmacol. 40:1-49; Sharma, H.W. and R. Narayanan (1995) Bioessays 17(12):1055-1063; and Lavrosky, Y. et al. (1997) Biochem. Mol. Med. 62(1):11-22.) An antisense sequence is a polynucleotide sequence capable of specifically hybridizing to at least a portion of the target sequence. Antisense sequences bind to cellular mRNA and/or genomic DNA, affecting translation and/or transcription. Antisense sequences can be DNA, RNA, or nucleic acid mimics and analogs. (See, e.g., Rossi, J.J. et al. (1991) Antisense Res. Dev. 1(3):285-288; Lee, R. et al. (1998) Biochemistry 37(3):900-1010; Pardridge, W.M. et al. (1995) Proc. Natl. Acad. Sci. USA 92(12):5592-5596; and Nielsen, P. E. and Haaima, G. (1997) Chem. Soc. Rev. 96:73-78.) Typically, the binding which results in modulation of expression occurs through hybridization or binding of complementary base pairs. Antisense sequences can also bind to DNA duplexes through specific interactions in the major groove of the double helix.
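Because antisense binding typically occurs through complementary base pairing, a DNA antisense sequence for a given target can be derived as the reverse complement of that target. The following minimal sketch illustrates this; the target sequence is hypothetical, and only unmodified DNA bases are handled (not RNA or the nucleic acid mimics mentioned above).

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antisense_dna(target):
    """Return the reverse complement of a DNA target, written 5' to 3'."""
    return target.translate(_COMPLEMENT)[::-1]


# Hypothetical six-base portion of a target sequence.
antisense = antisense_dna("ATGGCC")
```

Applying the function twice returns the original sequence, a quick sanity check that the result is a true reverse complement.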
The polynucleotides of the present invention and fragments thereof can be used as antisense sequences to modify the expression of the polypeptide encoded by mddt. The antisense sequences can be produced ex vivo, such as by using any of the ABI nucleic acid synthesizer series (PE Biosystems) or other automated systems known in the art. Antisense sequences can also be produced biologically, such as by transforming an appropriate host cell with an expression vector containing the sequence of interest. (See, e.g., Agrawal, supra.) In therapeutic use, any gene delivery system suitable for introduction of the antisense sequences into appropriate target cells can be used. Antisense sequences can be delivered intracellularly in the form of an expression plasmid which, upon transcription, produces a sequence complementary to at least a portion of the cellular sequence encoding the target protein. (See, e.g., Slater, J.E., et al. (1998) J. Allergy Clin. Immunol. 102(3):469-475; and Scanlon, K.J., et al. (1995) 9(13):1288-1296.) Antisense sequences can also be introduced intracellularly through the use of viral vectors, such as retrovirus and adeno-associated virus vectors. (See, e.g., Miller, A.D. (1990) Blood 76:271; Ausubel, F.M. et al. (1995) Current Protocols in Molecular Biology, John Wiley & Sons, New York NY; Uckert, W. and W. Walther (1994) Pharmacol. Ther. 63(3):323-347.) Other gene delivery mechanisms include liposome-derived systems, artificial viral envelopes, and other systems known in the art. (See, e.g., Rossi, J.J. (1995) Br. Med. Bull. 51(1):217-225; Boado, R.J. et al. (1998) J. Pharm. Sci. 87(11):1308-1315; and Morris, M.C. et al. (1997) Nucleic Acids Res. 25(14):2730-2736.)
Expression
In order to express a biologically active MDDT, the nucleotide sequences encoding MDDT or fragments thereof may be inserted into an appropriate expression vector, i.e., a vector which contains the necessary elements for transcriptional and translational control of the inserted coding sequence in a suitable host. Methods which are well known to those skilled in the art may be used to construct expression vectors containing sequences encoding MDDT and appropriate transcriptional and translational control elements. These methods include in vitro recombinant DNA techniques, synthetic techniques, and in vivo genetic recombination. (See, e.g., Sambrook, supra, Chapters 4, 8, 16, and 17; and Ausubel, supra, Chapters 9, 10, 13, and 16.)
A variety of expression vector/host systems may be utilized to contain and express sequences encoding MDDT. These include, but are not limited to, microorganisms such as bacteria transformed with recombinant bacteriophage, plasmid, or cosmid DNA expression vectors; yeast transformed with yeast expression vectors; insect cell systems infected with viral expression vectors (e.g., baculovirus); plant cell systems transformed with viral expression vectors (e.g., cauliflower mosaic virus, CaMV, or tobacco mosaic virus, TMV) or with bacterial expression vectors (e.g., Ti or pBR322 plasmids); or animal (mammalian) cell systems. (See, e.g., Sambrook, supra; Ausubel, 1995, supra; Van Heeke, G. and S.M. Schuster (1989) J. Biol. Chem. 264:5503-5509; Bitter, G.A. et al. (1987) Methods Enzymol. 153:516-544; Scorer, C.A. et al. (1994) Bio/Technology 12:181-184; Engelhard, E.K. et al. (1994) Proc. Natl. Acad. Sci. USA 91:3224-3227; Sandig, V. et al. (1996) Hum. Gene Ther. 7:1937-1945; Takamatsu, N. (1987) EMBO J. 6:307-311; Coruzzi, G. et al. (1984) EMBO J. 3:1671-1680; Broglie, R. et al. (1984) Science 224:838-843; Winter, J. et al. (1991) Results Probl. Cell Differ. 17:85-105; The McGraw Hill Yearbook of Science and Technology (1992) McGraw Hill, New York NY, pp. 191-196; Logan, J. and T. Shenk (1984) Proc. Natl. Acad. Sci. USA 81:3655-3659; and Harrington, J.J. et al. (1997) Nat. Genet. 15:345-355.) Expression vectors derived from retroviruses, adenoviruses, or herpes or vaccinia viruses, or from various bacterial plasmids, may be used for delivery of nucleotide sequences to the targeted organ, tissue, or cell population. (See, e.g., Di Nicola, M. et al. (1998) Cancer Gen. Ther. 5(6):350-356; Yu, M. et al., (1993) Proc. Natl. Acad. Sci. USA 90(13):6340-6344; Buller, R.M. et al. (1985) Nature 317(6040):813-815; McGregor, D.P. et al. (1994) Mol. Immunol. 31(3):219-226; and Verma, I.M. and N. Somia (1997) Nature 389:239-242.) The invention is not limited by the host cell employed.
For long term production of recombinant proteins in mammalian systems, stable expression of MDDT in cell lines is preferred. For example, sequences encoding MDDT can be transformed into cell lines using expression vectors which may contain viral origins of replication and/or endogenous expression elements and a selectable marker gene on the same or on a separate vector. Any number of selection systems may be used to recover transformed cell lines. (See, e.g., Wigler, M. et al. (1977) Cell 11:223-232; Lowy, I. et al. (1980) Cell 22:817-823; Wigler, M. et al. (1980) Proc. Natl. Acad. Sci. USA 77:3567-3570; Colbere-Garapin, F. et al. (1981) J. Mol. Biol. 150:1-14; Hartman, S.C. and R.C. Mulligan (1988) Proc. Natl. Acad. Sci. USA 85:8047-8051; Rhodes, C.A. (1995) Methods Mol. Biol. 55:121-131.)
Therapeutic Uses of mddt
The mddt of the invention may be used for somatic or germline gene therapy. Gene therapy may be performed to (i) correct a genetic deficiency (e.g., in the cases of severe combined immunodeficiency (SCID)-X1 disease characterized by X-linked inheritance (Cavazzana-Calvo, M. et al. (2000) Science 288:669-672), severe combined immunodeficiency syndrome associated with an inherited adenosine deaminase (ADA) deficiency (Blaese, R.M. et al. (1995) Science 270:475-480; Bordignon, C. et al. (1995) Science 270:470-475), cystic fibrosis (Zabner, J. et al. (1993) Cell 75:207-216; Crystal, R.G. et al. (1995) Hum. Gene Therapy 6:643-666; Crystal, R.G. et al. (1995) Hum. Gene Therapy 6:667-703), thalassemias, familial hypercholesterolemia, and hemophilia resulting from Factor VIII or Factor IX deficiencies (Crystal, R.G. (1995) Science 270:404-410; Verma, I.M. and Somia, N. (1997) Nature 389:239-242)), (ii) express a conditionally lethal gene product (e.g., in the case of cancers which result from unregulated cell proliferation), or (iii) express a protein which affords protection against intracellular parasites (e.g., against human retroviruses, such as human immunodeficiency virus (HIV) (Baltimore, D. (1988) Nature 335:395-396; Poeschla, E. et al. (1996) Proc. Natl. Acad. Sci. USA. 93:11395-11399), hepatitis B or C virus (HBV, HCV); fungal parasites, such as Candida albicans and Paracoccidioides brasiliensis; and protozoan parasites such as Plasmodium falciparum and Trypanosoma cruzi). In the case where a genetic deficiency in mddt expression or regulation causes disease, the expression of mddt from an appropriate population of transduced cells may alleviate the clinical manifestations caused by the genetic deficiency.
In a further embodiment of the invention, diseases or disorders caused by deficiencies in mddt are treated by constructing mammalian expression vectors comprising mddt and introducing these vectors by mechanical means into mddt-deficient cells. Mechanical transfer technologies for use with cells in vivo or ex vivo include (i) direct DNA microinjection into individual cells, (ii) ballistic gold particle delivery, (iii) liposome-mediated transfection, (iv) receptor-mediated gene transfer, and (v) the use of DNA transposons (Morgan, R.A. and Anderson, W.F. (1993) Annu. Rev. Biochem. 62:191-217; Ivics, Z. (1997) Cell 91:501-510; Boulay, J-L. and Recipon, H. (1998) Curr. Opin. Biotechnol. 9:445-450).
Expression vectors that may be effective for the expression of mddt include, but are not limited to, the PCDNA 3.1, EPITAG, PRCCMV2, PREP, PVAX vectors (Invitrogen, Carlsbad CA), PCMV-SCRIPT, PCMV-TAG, PEGSH/PERV (Stratagene, La Jolla CA), and PTET-OFF, PTET-ON, PTRE2, PTRE2-LUC, PTK-HYG (Clontech, Palo Alto CA). The mddt of the invention may be expressed using (i) a constitutively active promoter (e.g., from cytomegalovirus (CMV), Rous sarcoma virus (RSV), SV40 virus, thymidine kinase (TK), or β-actin genes), (ii) an inducible promoter (e.g., the tetracycline-regulated promoter (Gossen, M. and Bujard, H. (1992) Proc. Natl. Acad. Sci. U.S.A. 89:5547-5551; Gossen, M. et al., (1995) Science 268:1766-1769; Rossi, F.M.V. and Blau, H.M. (1998) Curr. Opin. Biotechnol. 9:451-456), commercially available in the T-REX plasmid (Invitrogen); the ecdysone-inducible promoter (available in the plasmids PVGRXR and PIND; Invitrogen); the FK506/rapamycin inducible promoter; or the RU486/mifepristone inducible promoter (Rossi, F.M.V. and Blau, H.M. supra)), or (iii) a tissue-specific promoter or the native promoter of the endogenous gene encoding MDDT from a normal individual. Commercially available liposome transformation kits (e.g., the PERFECT LIPID TRANSFECTION KIT, available from Invitrogen) allow one with ordinary skill in the art to deliver polynucleotides to target cells in culture and require minimal effort to optimize experimental parameters. In the alternative, transformation is performed using the calcium phosphate method (Graham, F.L. and Eb, A.J. (1973) Virology 52:456-467), or by electroporation (Neumann, E. et al. (1982) EMBO J. 1:841-845). The introduction of DNA to primary cells requires modification of these standardized mammalian transfection protocols.
In another embodiment of the invention, diseases or disorders caused by genetic defects with respect to mddt expression are treated by constructing a retrovirus vector consisting of (i) mddt under the control of an independent promoter or the retrovirus long terminal repeat (LTR) promoter, (ii) appropriate RNA packaging signals, and (iii) a Rev-responsive element (RRE) along with additional retrovirus cis-acting RNA sequences and coding sequences required for efficient vector propagation. Retrovirus vectors (e.g., PFB and PFBNEO) are commercially available (Stratagene) and are based on published data (Riviere, I. et al. (1995) Proc. Natl. Acad. Sci. U.S.A. 92:6733-6737), incorporated by reference herein. The vector is propagated in an appropriate vector producing cell line (VPCL) that expresses an envelope gene with a tropism for receptors on the target cells or a promiscuous envelope protein such as VSVg (Armentano, D. et al. (1987) J. Virol. 61:1647-1650; Bender, M.A. et al. (1987) J. Virol. 61:1639-1646; Adam, M.A. and Miller, A.D. (1988) J. Virol. 62:3802-3806; Dull, T. et al. (1998) J. Virol. 72:8463-8471; Zufferey, R. et al. (1998) J. Virol. 72:9873-9880). U.S. Patent Number 5,910,434 to Rigg ("Method for obtaining retrovirus packaging cell lines producing high transducing efficiency retroviral supernatant") discloses a method for obtaining retrovirus packaging cell lines and is hereby incorporated by reference. Propagation of retrovirus vectors, transduction of a population of cells (e.g., CD4+ T-cells), and the return of transduced cells to a patient are procedures well known to persons skilled in the art of gene therapy and have been well documented (Ranga, U. et al. (1997) J. Virol. 71:7020-7029; Bauer, G. et al. (1997) Blood 89:2259-2267; Bonyhadi, M. (1997) J. Virol. 71:4707-4716; Ranga, U. et al. (1998) Proc. Natl. Acad. Sci. U.S.A. 95:1201-1206; Su, L. (1997) Blood 89:2283-2290).
In the alternative, an adenovirus-based gene therapy delivery system is used to deliver mddt to cells which have one or more genetic abnormalities with respect to the expression of mddt. The construction and packaging of adenovirus-based vectors are well known to those with ordinary skill in the art. Replication defective adenovirus vectors have proven to be versatile for importing genes encoding immunoregulatory proteins into intact islets in the pancreas (Csete, M.E. et al. (1995) Transplantation 27:263-268). Potentially useful adenoviral vectors are described in U.S. Patent Number 5,707,618 to Armentano ("Adenovirus vectors for gene therapy"), hereby incorporated by reference. For adenoviral vectors, see also Antinozzi, P.A. et al. (1999) Annu. Rev. Nutr. 19:511-544 and Verma, I.M. and Somia, N. (1997) Nature 389:239-242, both incorporated by reference herein.
In another alternative, a herpes-based gene therapy delivery system is used to deliver mddt to target cells which have one or more genetic abnormalities with respect to the expression of mddt. The use of herpes simplex virus (HSV)-based vectors may be especially valuable for introducing mddt to cells of the central nervous system, for which HSV has a tropism. The construction and packaging of herpes-based vectors are well known to those with ordinary skill in the art. A replication-competent herpes simplex virus (HSV) type 1-based vector has been used to deliver a reporter gene to the eyes of primates (Liu, X. et al. (1999) Exp. Eye Res. 169:385-395). The construction of a HSV-1 virus vector has also been disclosed in detail in U.S. Patent Number 5,804,413 to DeLuca ("Herpes simplex virus strains for gene transfer"), which is hereby incorporated by reference. U.S. Patent Number 5,804,413 teaches the use of recombinant HSV d92 which consists of a genome containing at least one exogenous gene to be transferred to a cell under the control of the appropriate promoter for purposes including human gene therapy.
Also taught by this patent are the construction and use of recombinant HSV strains deleted for ICP4, ICP27 and ICP22. For HSV vectors, see also Goins, W. F. et al. (1999) J. Virol. 73:519-532 and Xu, H. et al., (1994) Dev. Biol. 163:152-161, hereby incorporated by reference. The manipulation of cloned herpesvirus sequences, the generation of recombinant virus following the transfection of multiple plasmids containing different segments of the large herpesvirus genomes, the growth and propagation of herpesvirus, and the infection of cells with herpesvirus are techniques well known to those of ordinary skill in the art.
In another alternative, an alphavirus (positive, single-stranded RNA virus) vector is used to deliver mddt to target cells. The biology of the prototypic alphavirus, Semliki Forest Virus (SFV), has been studied extensively and gene transfer vectors have been based on the SFV genome (Garoff, H. and Li, K-J. (1998) Curr. Opin. Biotech. 9:464-469). During alphavirus RNA replication, a subgenomic RNA is generated that normally encodes the viral capsid proteins. This subgenomic RNA replicates to higher levels than the full-length genomic RNA, resulting in the overproduction of capsid proteins relative to the viral proteins with enzymatic activity (e.g., protease and polymerase).
Similarly, inserting mddt into the alphavirus genome in place of the capsid-coding region results in the production of a large number of mddt RNAs and the synthesis of high levels of MDDT in vector transduced cells. While alphavirus infection is typically associated with cell lysis within a few days, the ability to establish a persistent infection in hamster normal kidney cells (BHK-21) with a variant of Sindbis virus (SIN) indicates that the lytic replication of alphaviruses can be altered to suit the needs of the gene therapy application (Dryga, S.A. et al. (1997) Virology 228:74-83). The wide host range of alphaviruses will allow the introduction of mddt into a variety of cell types. The specific transduction of a subset of cells in a population may require the sorting of cells prior to transduction. The methods of manipulating infectious cDNA clones of alphaviruses, performing alphavirus cDNA and RNA transfections, and performing alphavirus infections, are well known to those with ordinary skill in the art.
Antibodies
Anti-MDDT antibodies may be used to analyze protein expression levels. Such antibodies include, but are not limited to, polyclonal, monoclonal, chimeric, single chain, and Fab fragments. For descriptions of and protocols of antibody technologies, see, e.g., Pound J.D. (1998) Immunochemical Protocols, Humana Press, Totowa, NJ.
The amino acid sequence encoded by the mddt of the Sequence Listing may be analyzed by appropriate software (e.g., LASERGENE NAVIGATOR software, DNASTAR) to determine regions of high immunogenicity. The optimal sequences for immunization are selected from the C-terminus, the N-terminus, and those intervening, hydrophilic regions of the polypeptide which are likely to be exposed to the external environment when the polypeptide is in its natural conformation. Analysis used to select appropriate epitopes is also described by Ausubel (1997, supra, Chapter 11.7). Peptides used for antibody induction do not need to have biological activity; however, they must be antigenic. Peptides used to induce specific antibodies may have an amino acid sequence consisting of at least five amino acids, preferably at least 10 amino acids, and most preferably 15 amino acids. A peptide which mimics an antigenic fragment of the natural polypeptide may be fused with another protein such as keyhole limpet hemocyanin (KLH; Sigma, St. Louis MO) for antibody production. A peptide encompassing an antigenic region may be expressed from an mddt, synthesized as described above, or purified from human cells.
Procedures well known in the art may be used for the production of antibodies. Various hosts including mice, goats, and rabbits, may be immunized by injection with a peptide. Depending on the host species, various adjuvants may be used to increase immunological response.
In one procedure, peptides about 15 residues in length may be synthesized using an ABI 431A peptide synthesizer (PE Biosystems) using fmoc-chemistry and coupled to KLH (Sigma) by reaction with m-maleimidobenzoyl-N-hydroxysuccinimide ester (Ausubel, 1995, supra). Rabbits are immunized with the peptide-KLH complex in complete Freund's adjuvant. The resulting antisera are tested for antipeptide activity by binding the peptide to plastic, blocking with 1% bovine serum albumin (BSA), reacting with rabbit antisera, washing, and reacting with radioiodinated goat anti-rabbit IgG. Antisera with antipeptide activity are tested for anti-MDDT activity using protocols well known in the art, including ELISA, radioimmunoassay (RIA), and immunoblotting.
In another procedure, isolated and purified peptide may be used to immunize mice (about 100 μg of peptide) or rabbits (about 1 mg of peptide). Subsequently, the peptide is radioiodinated and used to screen the immunized animals' B-lymphocytes for production of antipeptide antibodies. Positive cells are then used to produce hybridomas using standard techniques. About 20 mg of peptide is sufficient for labeling and screening several thousand clones. Hybridomas of interest are detected by screening with radioiodinated peptide to identify those fusions producing peptide-specific monoclonal antibody. In a typical protocol, wells of a multi-well plate (FAST, Becton-Dickinson, Palo Alto, CA) are coated with affinity-purified, specific rabbit-anti-mouse (or suitable anti-species IgG) antibodies at 10 mg/ml. The coated wells are blocked with 1% BSA and washed and exposed to supernatants from hybridomas. After incubation, the wells are exposed to radiolabeled peptide at 1 mg/ml. Clones producing antibodies bind a quantity of labeled peptide that is detectable above background. Such clones are expanded and subjected to 2 cycles of cloning. Cloned hybridomas are injected into pristane-treated mice to produce ascites, and monoclonal antibody is purified from the ascitic fluid by affinity chromatography on protein A (Amersham Pharmacia Biotech). Several procedures for the production of monoclonal antibodies, including in vitro production, are described in Pound (supra). Monoclonal antibodies with antipeptide activity are tested for anti-MDDT activity using protocols well known in the art, including ELISA, RIA, and immunoblotting.
Antibody fragments containing specific binding sites for an epitope may also be generated. For example, such fragments include, but are not limited to, the F(ab')2 fragments produced by pepsin digestion of the antibody molecule, and the Fab fragments generated by reducing the disulfide bridges of the F(ab')2 fragments. Alternatively, construction of Fab expression libraries in filamentous bacteriophage allows rapid and easy identification of monoclonal fragments with desired specificity (Pound, supra, Chaps. 45-47). Antibodies generated against polypeptide encoded by mddt can be used to purify and characterize full-length MDDT protein and its activity, binding partners, etc.
Assays Using Antibodies
Anti-MDDT antibodies may be used in assays to quantify the amount of MDDT found in a particular human cell. Such assays include methods utilizing the antibody and a label to detect expression level under normal or disease conditions. The peptides and antibodies of the invention may be used with or without modification or labeled by joining them, either covalently or noncovalently, with a reporter molecule.
Protocols for detecting and measuring protein expression using either polyclonal or monoclonal antibodies are well known in the art. Examples include ELISA, RIA, and fluorescent activated cell sorting (FACS). Such immunoassays typically involve the formation of complexes between the MDDT and its specific antibody and the measurement of such complexes. These and other assays are described in Pound (supra).
Without further elaboration, it is believed that one skilled in the art can, using the preceding description, utilize the present invention to its fullest extent. The following preferred specific embodiments are, therefore, to be construed as merely illustrative, and not limitative of the remainder of the disclosure in any way whatsoever. The disclosures of all patents, applications, and publications mentioned above and below, in particular U.S. Ser. No. 60/156,565 and U.S. Ser. No. 60/168,197, are hereby expressly incorporated by reference.
EXAMPLES
I. Construction of cDNA Libraries
RNA was purchased from CLONTECH Laboratories, Inc. (Palo Alto CA) or isolated from various tissues. Some tissues were homogenized and lysed in guanidinium isothiocyanate, while others were homogenized and lysed in phenol or in a suitable mixture of denaturants, such as TRIZOL (Life Technologies), a monophasic solution of phenol and guanidine isothiocyanate. The resulting lysates were centrifuged over CsCl cushions or extracted with chloroform. RNA was precipitated with either isopropanol or sodium acetate and ethanol, or by other routine methods. Phenol extraction and precipitation of RNA were repeated as necessary to increase RNA purity. In most cases, RNA was treated with DNase. For most libraries, poly(A+) RNA was isolated using oligo d(T)-coupled paramagnetic particles (Promega Corporation (Promega), Madison WI), OLIGOTEX latex particles (QIAGEN, Inc. (QIAGEN), Valencia CA), or an OLIGOTEX mRNA purification kit (QIAGEN). Alternatively, RNA was isolated directly from tissue lysates using other RNA isolation kits, e.g., the POLY(A)PURE mRNA purification kit (Ambion, Inc., Austin TX).
In some cases, Stratagene was provided with RNA and constructed the corresponding cDNA libraries. Otherwise, cDNA was synthesized and cDNA libraries were constructed with the UNIZAP vector system (Stratagene Cloning Systems, Inc. (Stratagene), La Jolla CA) or SUPERSCRIPT plasmid system (Life Technologies), using the recommended procedures or similar methods known in the art. (See, e.g., Ausubel, 1997, supra, Chapters 5.1 through 6.6.) Reverse transcription was initiated using oligo d(T) or random primers. Synthetic oligonucleotide adapters were ligated to double stranded cDNA, and the cDNA was digested with the appropriate restriction enzyme or enzymes. For most libraries, the cDNA was size-selected (300-1000 bp) using SEPHACRYL S1000, SEPHAROSE CL2B, or SEPHAROSE CL4B column chromatography (Amersham Pharmacia Biotech) or preparative agarose gel electrophoresis. cDNAs were ligated into compatible restriction enzyme sites of the polylinker of a suitable plasmid, e.g., PBLUESCRIPT plasmid (Stratagene), pSPORT1 plasmid (Life Technologies), or pINCY (Incyte). Recombinant plasmids were transformed into competent E. coli cells including XL1-Blue, XL1-BlueMRF, or SOLR from Stratagene or DH5α, DH10B, or ElectroMAX DH10B from Life Technologies.
II. Isolation of cDNA Clones
Plasmids were recovered from host cells by in vivo excision using the UNIZAP vector system (Stratagene) or by cell lysis. Plasmids were purified using at least one of the following: the Magic or WIZARD Minipreps DNA purification system (Promega); the AGTC Miniprep purification kit (Edge BioSystems, Gaithersburg MD); and the QIAWELL 8, QIAWELL 8 Plus, and QIAWELL 8 Ultra plasmid purification systems or the R.E.A.L. PREP 96 plasmid purification kit (QIAGEN).
Following precipitation, plasmids were resuspended in 0.1 ml of distilled water and stored, with or without lyophilization, at 4°C.
Alternatively, plasmid DNA was amplified from host cell lysates using direct link PCR in a high-throughput format (Rao, V.B. (1994) Anal. Biochem. 216:1-14). Host cell lysis and thermal cycling steps were carried out in a single reaction mixture. Samples were processed and stored in 384-well plates, and the concentration of amplified plasmid DNA was quantified fluorometrically using PICOGREEN dye (Molecular Probes, Inc. (Molecular Probes), Eugene OR) and a FLUOROSKAN II fluorescence scanner (Labsystems Oy, Helsinki, Finland).
III. Sequencing and Analysis
cDNA sequencing reactions were processed using standard methods or high-throughput instrumentation such as the ABI CATALYST 800 thermal cycler (PE Biosystems) or the PTC-200 thermal cycler (MJ Research) in conjunction with the HYDRA microdispenser (Robbins Scientific Corp., Sunnyvale CA) or the MICROLAB 2200 liquid transfer system (Hamilton). cDNA sequencing reactions were prepared using reagents provided by Amersham Pharmacia Biotech or supplied in ABI sequencing kits such as the ABI PRISM BIGDYE Terminator cycle sequencing ready reaction kit (PE Biosystems). Electrophoretic separation of cDNA sequencing reactions and detection of labeled polynucleotides were carried out using the MEGABACE 1000 DNA sequencing system (Molecular Dynamics); the ABI PRISM 373 or 377 sequencing system (PE Biosystems) in conjunction with standard ABI protocols and base calling software; or other sequence analysis systems known in the art. Reading frames within the cDNA sequences were identified using standard methods (reviewed in Ausubel, 1997, supra, Chapter 1.1). Some of the cDNA sequences were selected for extension using the techniques disclosed in Example VIII.
IV. Assembly and Analysis of Sequences
Component sequences from chromatograms were subject to PHRED analysis and assigned a quality score. The sequences having at least a required quality score were subject to various preprocessing editing pathways to eliminate, e.g., low quality 3' ends, vector and linker sequences, polyA tails, Alu repeats, mitochondrial and ribosomal sequences, bacterial contamination sequences, and sequences smaller than 50 base pairs. In particular, low-information sequences and repetitive elements (e.g., dinucleotide repeats, Alu repeats, etc.) were replaced by "n's", or masked, to prevent spurious matches.
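The masking and length-filtering steps above can be illustrated with a minimal Python sketch. It is not the pipeline actually used: the rule that a dinucleotide repeated six or more times counts as low-information, and the function names, are assumptions made for illustration. Only the 50 base pair minimum length comes from the text.

```python
import re

MIN_LENGTH = 50  # sequences smaller than 50 base pairs are eliminated (from the text)

# Assumed rule for illustration: a 2-base unit repeated 6 or more times in a row
# is treated as a low-information run.
DINUC_REPEAT = re.compile(r"([acgt]{2})\1{5,}", re.IGNORECASE)


def mask_repeats(seq: str) -> str:
    """Replace each dinucleotide-repeat run with an equal-length run of 'n'."""
    return DINUC_REPEAT.sub(lambda m: "n" * len(m.group(0)), seq)


def preprocess(seqs):
    """Drop sequences shorter than MIN_LENGTH, then mask repeats in the rest."""
    return [mask_repeats(s) for s in seqs if len(s) >= MIN_LENGTH]
```

Masking with "n" rather than deleting the run keeps the coordinates of the remaining bases unchanged, so positions still refer to the original read.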
Processed sequences were then subject to assembly procedures in which the sequences were assigned to gene bins (bins). Each sequence could only belong to one bin. Sequences in each gene bin were assembled to produce consensus sequences (templates). Subsequent new sequences were added to existing bins using BLASTn (v.1.4 WashU) and CROSSMATCH. Candidate pairs were identified as all BLAST hits having a quality score greater than or equal to 150. Alignments of at least 82% local identity were accepted into the bin. The component sequences from each bin were assembled using a version of PHRAP. Bins with several overlapping component sequences were assembled using DEEP PHRAP. The orientation (sense or antisense) of each assembled template was determined based on the number and orientation of its component sequences. Template sequences as disclosed in the sequence listing correspond to sense strand sequences (the "forward" reading frames), to the best determination. The complementary (antisense) strands are inherently disclosed herein. The component sequences which were used to assemble each template consensus sequence are listed in Tables 4A and 4B, along with their positions along the template nucleotide sequences. Bins were compared against each other and those having local similarity of at least 82% were combined and reassembled. Reassembled bins having templates of insufficient overlap (less than 95% local identity) were re-split. Assembled templates were also subject to analysis by STITCHER/EXON MAPPER algorithms which analyze the probabilities of the presence of splice variants, alternatively spliced exons, splice junctions, differential expression of alternative spliced genes across tissue types or disease states, etc. These resulting bins were subject to several rounds of the above assembly procedures.
Once gene bins were generated based upon sequence alignments, bins were clone joined based upon clone information. If the 5' sequence of one clone was present in one bin and the 3' sequence from the same clone was present in a different bin, it was likely that the two bins actually belonged together in a single bin. The resulting combined bins underwent assembly procedures to regenerate the consensus sequences.
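Clone joining as described above amounts to merging any two bins that hold reads from the same clone. The following Python sketch shows one way to do this with a union-find structure. The input format (pairs of clone identifier and bin identifier) and the function name are hypothetical, and the sketch merges unconditionally, whereas the text says only that such bins likely belong together.

```python
def clone_join(read_bins):
    """Merge bins that contain reads (e.g., 5' and 3') from the same clone.

    read_bins: iterable of (clone_id, bin_id) pairs, one pair per read.
    Returns a dict mapping every bin_id to the representative of its merged bin.
    """
    parent = {}

    def find(b):
        parent.setdefault(b, b)
        while parent[b] != b:
            parent[b] = parent[parent[b]]  # path halving keeps trees shallow
            b = parent[b]
        return b

    first_bin = {}  # first bin seen for each clone
    for clone_id, bin_id in read_bins:
        find(bin_id)  # register the bin even if it is never merged
        if clone_id in first_bin:
            ra, rb = find(first_bin[clone_id]), find(bin_id)
            if ra != rb:
                parent[max(ra, rb)] = min(ra, rb)  # deterministic representative
        else:
            first_bin[clone_id] = bin_id
    return {b: find(b) for b in list(parent)}
```

After merging, each combined bin would be reassembled to regenerate its consensus sequence, a step that lies outside this sketch.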
The final assembled templates were subsequently annotated using the following procedure. Template sequences were analyzed using BLASTn (v2.0, NCBI) versus gbpri (GenBank version 118). "Hits" were defined as an exact match having from 95% local identity over 200 base pairs through 100% local identity over 100 base pairs, or a homolog match having an E-value, i.e. a probability score, of ≤ 1 x 10⁻⁸. The hits were subject to frameshift FASTx versus GENPEPT (GenBank version 118). (See Table 6.) In this analysis, a homolog match was defined as having an E-value of ≤ 1 x 10⁻⁸. The assembly method used above was described in "System and Methods for Analyzing Biomolecular Sequences," U.S.S.N. 09/276,534, filed March 25, 1999, and the LIFESEQ Gold user manual (Incyte), both incorporated by reference herein.
Following assembly, template sequences were subjected to motif, BLAST, and functional analyses, and categorized in protein hierarchies using methods described in, e.g., "Database System Employing Protein Function Hierarchies for Viewing Biomolecular Sequence Data," U.S.S.N. 08/812,290, filed March 6, 1997; "Relational Database for Storing Biomolecule Information," U.S.S.N. 08/947,845, filed October 9, 1997; "Project-Based Full-Length Biomolecular Sequence Database," U.S.S.N. 08/811,758, filed March 6, 1997; and "Relational Database and System for Storing Information Relating to Biomolecular Sequences," U.S.S.N. 09/034,807, filed March 4, 1998, all of which are incorporated by reference herein.
The template sequences were further analyzed by translating each template in all three forward reading frames and searching each translation against the Pfam database of hidden Markov model-based protein families and domains using the HMMER software package (available to the public from Washington University School of Medicine, St. Louis MO). Regions of templates which, when translated, contain similarity to Pfam consensus sequences are reported in Table 2, along with descriptions of Pfam protein domains and families. Only those Pfam hits with an E-value of ≤ 1 x 10⁻³ are reported. (See also World Wide Web site http://pfam.wustl.edu/ for detailed descriptions of Pfam protein domains and families.)
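Translation of a template in all three forward reading frames can be sketched in Python as follows. The sketch uses the standard genetic code, writes stop codons as "*", and writes any codon containing an ambiguous base (such as a masked "n") as "X". The function name and the "X" convention are assumptions for illustration, and the subsequent HMMER search is not shown.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate_forward_frames(dna: str):
    """Return translations of forward frames 1, 2 and 3 as a list of strings."""
    dna = dna.upper()
    frames = []
    for offset in range(3):
        # Step through complete codons only; a trailing partial codon is ignored.
        codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
        frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames
```

For example, `translate_forward_frames("ATGGCCTAA")` yields one protein string per frame, each of which could then be searched against a profile database.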
Additionally, the template sequences were translated in all three forward reading frames, and each translation was searched against hidden Markov models for signal peptide and transmembrane domains using the HMMER software package. Construction of hidden Markov models and their usage in sequence analysis have been described. (See, for example, Eddy, S.R. (1996) Curr. Opin. Str. Biol. 6:361-365.) Regions of templates which, when translated, contain similarity to signal peptide or transmembrane domain consensus sequences are reported in Table 3. Only those signal peptide or transmembrane hits with a cutoff score of 11 bits or greater are reported. A cutoff score of 11 bits or greater corresponds to at least about 91-94% true-positives in signal peptide prediction, and at least about 75% true-positives in transmembrane domain prediction.
The results of HMMER analysis as reported in Tables 2 and 3 may support the results of BLAST analysis as reported in Table 1 or may suggest alternative or additional properties of template-encoded polypeptides not previously uncovered by BLAST or other analyses.
Template sequences are further analyzed using the bioinformatics tools listed in Table 6, or using sequence analysis software known in the art such as MACDNASIS PRO software (Hitachi Software Engineering, South San Francisco CA) and LASERGENE software (DNASTAR). Template sequences may be further queried against public databases such as the GenBank rodent, mammalian, vertebrate, prokaryote, and eukaryote databases.
V. Analysis of Polynucleotide Expression
Northern analysis is a laboratory technique used to detect the presence of a transcript of a gene and involves the hybridization of a labeled nucleotide sequence to a membrane on which RNAs from a particular cell type or tissue have been bound. (See, e.g., Sambrook, supra, ch. 7; Ausubel, 1995, supra, ch. 4 and 16.) Analogous computer techniques applying BLAST were used to search for identical or related molecules in cDNA databases such as GenBank or LIFESEQ (Incyte Genomics). This analysis is much faster than multiple membrane-based hybridizations. In addition, the sensitivity of the computer search can be modified to determine whether any particular match is categorized as exact or similar. The basis of the search is the product score, which is defined as:
$$\text{Product score} = \frac{\text{BLAST Score} \times \text{Percent Identity}}{5 \times \min\{\text{length(Seq. 1)},\ \text{length(Seq. 2)}\}}$$
The product score takes into account both the degree of similarity between two sequences and the length of the sequence match. The product score is a normalized value between 0 and 100, and is calculated as follows: the BLAST score is multiplied by the percent nucleotide identity and the product is divided by (5 times the length of the shorter of the two sequences). The BLAST score is calculated by assigning a score of +5 for every base that matches in a high-scoring segment pair (HSP), and -4 for every mismatch. Two sequences may share more than one HSP (separated by gaps). If there is more than one HSP, then the pair with the highest BLAST score is used to calculate the product score. The product score represents a balance between fractional overlap and quality in a BLAST alignment. For example, a product score of 100 is produced only for 100% identity over the entire length of the shorter of the two sequences being compared. A product score of 70 is produced either by 100% identity and 70% overlap at one end, or by 88% identity and 100% overlap at the other. A product score of 50 is produced either by 100% identity and 50% overlap at one end, or 79% identity and 100% overlap.
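The product score calculation can be checked with a short Python sketch. It follows the definition given above (+5 per matching base, -4 per mismatch, normalized by 5 times the shorter sequence length). The function names are illustrative, and an ungapped alignment within a single HSP is assumed.

```python
def blast_score(matches: int, mismatches: int) -> int:
    """Score of one ungapped HSP: +5 per matching base, -4 per mismatch."""
    return 5 * matches - 4 * mismatches


def product_score(score: float, percent_identity: float, len1: int, len2: int) -> float:
    """BLAST score x percent identity / (5 x length of the shorter sequence)."""
    return score * percent_identity / (5 * min(len1, len2))


# Worked cases for a shorter sequence of 100 bases:
full = product_score(blast_score(100, 0), 100, 100, 250)     # 100% identity, full overlap
partial = product_score(blast_score(70, 0), 100, 100, 250)   # 100% identity, 70% overlap
mismatched = product_score(blast_score(88, 12), 88, 100, 250)  # 88% identity, full overlap
```

Under these assumptions the first two cases give exactly 100 and 70, and the third gives about 69, which is in line with the approximate value of 70 quoted in the text.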
VI. Tissue Distribution Profiling
A tissue distribution profile is determined for each template by compiling the cDNA library tissue classifications of its component cDNA sequences. Each component sequence is derived from a cDNA library constructed from a human tissue. Each human tissue is classified into one of the following categories: cardiovascular system; connective tissue; digestive system; embryonic structures; endocrine system; exocrine glands; genitalia, female; genitalia, male; germ cells; hemic and immune system; liver; musculoskeletal system; nervous system; pancreas; respiratory system; sense organs; skin; stomatognathic system; unclassified/mixed; or urinary tract. Template sequences, component sequences, and cDNA library/tissue information are found in the LIFESEQ GOLD database (Incyte Genomics, Palo Alto CA).
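Compiling a profile from the tissue categories of a template's component sequences can be sketched in Python as a simple tally. The input (one category label per component sequence) and the function name are assumptions for illustration; the sketch does not reflect how the LIFESEQ GOLD database stores this information.

```python
from collections import Counter


def tissue_profile(categories):
    """Return (category, percentage) pairs, most frequent category first.

    categories: one tissue category label per component sequence of a template.
    """
    counts = Counter(categories)
    total = sum(counts.values())
    return [(cat, 100.0 * n / total) for cat, n in counts.most_common()]
```

A caller could then keep only the leading entries of the returned list, or apply a percentage cutoff, when preparing a summary table.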
Table 5 shows the tissue distribution profile for the templates of the invention. For each template, the three most frequently observed tissue categories are shown in column 3, along with the percentage of component sequences belonging to each category. Only tissue categories with percentage values of > 10% are shown. A tissue distribution of "widely distributed" in column 3 indicates percentage values of <10% in all tissue categories.
VII. Transcript Image Analysis
Transcript images are generated as described in Seilhamer et al., "Comparative Gene Transcript Analysis," U.S. Patent Number 5,840,484, incorporated herein by reference.
VIII. Extension of Polynucleotide Sequences and Isolation of a Full-length cDNA
Oligonucleotide primers designed using an mddt of the Sequence Listing are used to extend the nucleic acid sequence. One primer is synthesized to initiate 5' extension of the template, and the other primer, to initiate 3' extension of the template. The initial primers may be designed using OLIGO 4.06 software (National Biosciences, Inc. (National Biosciences), Plymouth MN), or another appropriate program, to be about 22 to 30 nucleotides in length, to have a GC content of about 50% or more, and to anneal to the target sequence at temperatures of about 68 °C to about 72 °C. Any stretch of nucleotides which would result in hairpin structures and primer-primer dimerizations is avoided. Selected human cDNA libraries are used to extend the sequence. If more than one extension is necessary or desired, additional or nested sets of primers are designed. High fidelity amplification is obtained by PCR using methods well known in the art. PCR is performed in 96-well plates using the PTC-200 thermal cycler (MJ Research). 
The reaction mix contains DNA template, 200 nmol of each primer, reaction buffer containing Mg2+, (NH4)2SO4, and β-mercaptoethanol, Taq DNA polymerase (Amersham Pharmacia Biotech), ELONGASE enzyme (Life Technologies), and Pfu DNA polymerase (Stratagene), with the following parameters for primer pair PCI A and PCI B: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68°C, 5 min; Step 7: storage at 4°C. In the alternative, the parameters for primer pair T7 and SK+ are as follows: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 57°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68°C, 5 min; Step 7: storage at 4°C. The concentration of DNA in each well is determined by dispensing 100 μl PICOGREEN quantitation reagent (0.25% (v/v); Molecular Probes) dissolved in 1X Tris-EDTA (TE) and 0.5 μl of undiluted PCR product into each well of an opaque fluorimeter plate (Corning Incorporated (Corning), Corning NY), allowing the DNA to bind to the reagent. The plate is scanned in a FLUOROSKAN II (Labsystems Oy) to measure the fluorescence of the sample and to quantify the concentration of DNA. A 5 μl to 10 μl aliquot of the reaction mixture is analyzed by electrophoresis on a 1% agarose mini-gel to determine which reactions are successful in extending the sequence.
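Two of the primer design criteria stated at the start of this example, a length of about 22 to 30 nucleotides and a GC content of about 50% or more, can be expressed as a simple Python filter. This is a hedged sketch and not a substitute for OLIGO 4.06 or a similar program: the function names are illustrative, the thresholds are treated as hard limits although the text says "about", and annealing temperature, hairpin, and primer-dimer checks are deliberately left out.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of bases in the primer that are G or C."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def passes_basic_criteria(primer: str, min_len: int = 22, max_len: int = 30,
                          min_gc: float = 0.50) -> bool:
    """Check only the length window and the minimum GC content."""
    # The length test runs first, so an empty string never reaches the division.
    return min_len <= len(primer) <= max_len and gc_fraction(primer) >= min_gc
```

Candidates that pass this filter would still need to be screened for annealing temperature and secondary structure before use.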
The extended nucleotides are desalted and concentrated, transferred to 384-well plates, digested with CviJI chlorella virus endonuclease (Molecular Biology Research, Madison WI), and sonicated or sheared prior to religation into pUC18 vector (Amersham Pharmacia Biotech). For shotgun sequencing, the digested nucleotides are separated on low concentration (0.6 to 0.8%) agarose gels, fragments are excised, and agar digested with AGAR ACE (Promega). Extended clones are religated using T4 ligase (New England Biolabs, Inc., Beverly MA) into pUC18 vector (Amersham Pharmacia Biotech), treated with Pfu DNA polymerase (Stratagene) to fill in restriction site overhangs, and transfected into competent E. coli cells. Transformed cells are selected on antibiotic-containing media, and individual colonies are picked and cultured overnight at 37°C in 384-well plates in LB/2x carbenicillin liquid media.
The cells are lysed, and DNA is amplified by PCR using Taq DNA polymerase (Amersham Pharmacia Biotech) and Pfu DNA polymerase (Stratagene) with the following parameters: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 72°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 29 times; Step 6: 72°C, 5 min; Step 7: storage at 4°C. DNA is quantified by PICOGREEN reagent (Molecular Probes) as described above. Samples with low DNA recoveries are reamplified using the same conditions as described above. Samples are diluted with 20% dimethylsulfoxide (1:2, v/v), and sequenced using DYENAMIC energy transfer sequencing primers and the DYENAMIC DIRECT kit (Amersham Pharmacia Biotech) or the ABI PRISM BIGDYE Terminator cycle sequencing ready reaction kit (PE Biosystems). In like manner, the mddt is used to obtain regulatory sequences (promoters, introns, and enhancers) using the procedure above, oligonucleotides designed for such extension, and an appropriate genomic library.
IX. Labeling of Probes and Southern Hybridization Analyses
Hybridization probes derived from the mddt of the Sequence Listing are employed for screening cDNAs, mRNAs, or genomic DNA. The labeling of probe nucleotides between 100 and 1000 nucleotides in length is specifically described, but essentially the same procedure may be used with larger cDNA fragments. Probe sequences are labeled at room temperature for 30 minutes using a T4 polynucleotide kinase, γ32P-ATP, and 0.5X One-Phor-All Plus (Amersham Pharmacia Biotech) buffer and purified using a ProbeQuant G-50 Microcolumn (Amersham Pharmacia Biotech). The probe mixture is diluted to 10⁷ dpm/μg/ml hybridization buffer and used in a typical membrane-based hybridization analysis.
The DNA is digested with a restriction endonuclease such as Eco RV and is electrophoresed through a 0.7% agarose gel. The DNA fragments are transferred from the agarose to nylon membrane (NYTRAN Plus, Schleicher & Schuell, Inc., Keene NH) using procedures specified by the manufacturer of the membrane. Prehybridization is carried out for three or more hours at 68 °C, and hybridization is carried out overnight at 68 °C. To remove non-specific signals, blots are sequentially washed at room temperature under increasingly stringent conditions, up to 0.1 x saline sodium citrate (SSC) and 0.5% sodium dodecyl sulfate. After the blots are placed in a PHOSPHORIMAGER cassette (Molecular Dynamics) or are exposed to autoradiography film, hybridization patterns of standard and experimental lanes are compared. Essentially the same procedure is employed when screening RNA.
X. Chromosome Mapping of mddt
The cDNA sequences which were used to assemble SEQ ID NO: 1-25 are compared with sequences from the Incyte LIFESEQ database and public domain databases using BLAST and other implementations of the Smith-Waterman algorithm. Sequences from these databases that match SEQ ID NO: 1-25 are assembled into clusters of contiguous and overlapping sequences using assembly algorithms such as PHRAP (Table 6). Radiation hybrid and genetic mapping data available from public resources such as the Stanford Human Genome Center (SHGC), Whitehead Institute for Genome Research (WIGR), and Genethon are used to determine if any of the clustered sequences have been previously mapped. Inclusion of a mapped sequence in a cluster will result in the assignment of all sequences of that cluster, including its particular SEQ ID NO:, to that map location. The genetic map locations of SEQ ID NO: 1-25 are described as ranges, or intervals, of human chromosomes. The map position of an interval, in centiMorgans, is measured relative to the terminus of the chromosome's p-arm. (The centiMorgan (cM) is a unit of measurement based on recombination frequencies between chromosomal markers. On average, 1 cM is roughly equivalent to 1 megabase (Mb) of DNA in humans, although this can vary widely due to hot and cold spots of recombination.) The cM distances are based on genetic markers mapped by Genethon which provide boundaries for radiation hybrid markers whose sequences were included in each of the clusters.
XI. Microarray Analysis
Probe Preparation from Tissue or Cell Samples
Total RNA is isolated from tissue samples using the guanidinium thiocyanate method and polyA+ RNA is purified using the oligo (dT) cellulose method. Each polyA+ RNA sample is reverse transcribed using MMLV reverse-transcriptase, 0.05 pg/μl oligo-dT primer (21mer), 1X first strand buffer, 0.03 units/μl RNase inhibitor, 500 μM dATP, 500 μM dGTP, 500 μM dTTP, 40 μM dCTP, 40 μM dCTP-Cy3 (BDS) or dCTP-Cy5 (Amersham Pharmacia Biotech). The reverse transcription reaction is performed in a 25 ml volume containing 200 ng polyA+ RNA with GEMBRIGHT kits (Incyte). Specific control polyA+ RNAs are synthesized by in vitro transcription from non-coding yeast genomic DNA (W. Lei, unpublished). As quantitative controls, the control RNAs at 0.002 ng, 0.02 ng, 0.2 ng, and 2 ng are diluted into reverse transcription reaction at ratios of 1:100,000, 1:10,000, 1:1000, 1:100 (w/w) to sample mRNA respectively. The control mRNAs are diluted into reverse transcription reaction at ratios of 1:3, 3:1, 1:10, 10:1, 1:25, 25:1 (w/w) to sample mRNA differential expression patterns. After incubation at 37° C for 2 hr, each reaction sample (one with Cy3 and another with Cy5 labeling) is treated with 2.5 ml of 0.5M sodium hydroxide and incubated for 20 minutes at 85° C to stop the reaction and degrade the RNA. Probes are purified using two successive CHROMA SPIN 30 gel filtration spin columns (CLONTECH Laboratories, Inc. (CLONTECH), Palo Alto CA) and after combining, both reaction samples are ethanol precipitated using 1 ml of glycogen (1 mg/ml), 60 ml sodium acetate, and 300 ml of 100% ethanol. The probe is then dried to completion using a SpeedVAC (Savant Instruments Inc., Holbrook NY) and resuspended in 14 μl 5X SSC/0.2% SDS.
Microarray Preparation
Sequences of the present invention are used to generate array elements. Each array element is amplified from bacterial cells containing vectors with cloned cDNA inserts. PCR amplification uses primers complementary to the vector sequences flanking the cDNA insert. Array elements are amplified in thirty cycles of PCR from an initial quantity of 1-2 ng to a final quantity greater than 5 μg. Amplified array elements are then purified using SEPHACRYL-400 (Amersham Pharmacia Biotech).
Purified array elements are immobilized on polymer-coated glass slides. Glass microscope slides (Corning) are cleaned by ultrasound in 0.1% SDS and acetone, with extensive distilled water washes between and after treatments. Glass slides are etched in 4% hydrofluoric acid (VWR Scientific Products Corporation (VWR), West Chester, PA), washed extensively in distilled water, and coated with 0.05% aminopropyl silane (Sigma) in 95% ethanol. Coated slides are cured in a 110°C oven. Array elements are applied to the coated glass substrate using a procedure described in US
Patent No. 5,807,522, incorporated herein by reference. 1 μl of the array element DNA, at an average concentration of 100 ng/μl, is loaded into the open capillary printing element by a high-speed robotic apparatus. The apparatus then deposits about 5 nl of array element sample per slide.
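The loading and deposition figures imply a DNA mass per spot and an upper bound on the number of slides printed from one capillary load. The sketch below works through that arithmetic. It assumes no dead volume or evaporation, which the text does not address, and the variable names are illustrative.

```python
# Values from the text; names are hypothetical.
LOADED_VOLUME_NL = 1000      # 1 ul loaded into the capillary, in nanoliters
DNA_CONC_NG_PER_UL = 100     # average array element concentration
DEPOSIT_NL = 5               # approximate volume deposited per slide

# Mass of DNA delivered in one deposit (ng): concentration x volume.
dna_per_spot_ng = DNA_CONC_NG_PER_UL * DEPOSIT_NL / 1000

# Idealized number of slides one load could serve (ignores dead volume).
max_slides_per_load = LOADED_VOLUME_NL // DEPOSIT_NL

print(dna_per_spot_ng, max_slides_per_load)
```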
Microarrays are UV-crosslinked using a STRATALINKER UV-crosslinker (Stratagene). Microarrays are washed at room temperature once in 0.2% SDS and three times in distilled water. Non-specific binding sites are blocked by incubation of microarrays in 0.2% casein in phosphate buffered saline (PBS) (Tropix, Inc., Bedford, MA) for 30 minutes at 60° C followed by washes in 0.2% SDS and distilled water as before.
Hybridization
Hybridization reactions contain 9 μl of probe mixture consisting of 0.2 μg each of Cy3 and Cy5 labeled cDNA synthesis products in 5X SSC, 0.2% SDS hybridization buffer. The probe mixture is heated to 65° C for 5 minutes and is aliquoted onto the microarray surface and covered with a 1.8 cm² coverslip. The arrays are transferred to a waterproof chamber having a cavity just slightly larger than a microscope slide. The chamber is kept at 100% humidity internally by the addition of 140 μl of 5X SSC in a corner of the chamber. The chamber containing the arrays is incubated for about 6.5 hours at 60° C. The arrays are washed for 10 min at 45° C in a first wash buffer (1X SSC,
0.1% SDS), three times for 10 minutes each at 45° C in a second wash buffer (0.1X SSC), and dried.
Detection
Reporter-labeled hybridization complexes are detected with a microscope equipped with an
Innova 70 mixed gas 10 W laser (Coherent, Inc., Santa Clara CA) capable of generating spectral lines at 488 nm for excitation of Cy3 and at 632 nm for excitation of Cy5. The excitation laser light is focused on the array using a 20X microscope objective (Nikon, Inc., Melville NY). The slide containing the array is placed on a computer-controlled X-Y stage on the microscope and raster-scanned past the objective. The 1.8 cm x 1.8 cm array used in the present example is scanned with a resolution of 20 micrometers.
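The stated array size and scan resolution fix the size of the resulting image. A short sketch of that calculation, assuming square, non-overlapping 20 micrometer pixels (an assumption, since the text gives only the resolution):

```python
ARRAY_SIDE_UM = 18_000   # 1.8 cm expressed in micrometers
PIXEL_UM = 20            # scan resolution from the text

pixels_per_side = ARRAY_SIDE_UM // PIXEL_UM
total_pixels = pixels_per_side ** 2

print(pixels_per_side, total_pixels)
```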
In two separate scans, a mixed gas multiline laser excites the two fluorophores sequentially. Emitted light is split, based on wavelength, into two photomultiplier tube detectors (PMT R1477, Hamamatsu Photonics Systems, Bridgewater NJ) corresponding to the two fluorophores. Appropriate filters positioned between the array and the photomultiplier tubes are used to filter the signals. The emission maxima of the fluorophores used are 565 nm for Cy3 and 650 nm for Cy5. Each array is typically scanned twice, one scan per fluorophore using the appropriate filters at the laser source, although the apparatus is capable of recording the spectra from both fluorophores simultaneously. The sensitivity of the scans is typically calibrated using the signal intensity generated by a cDNA control species added to the probe mix at a known concentration. A specific location on the array contains a complementary DNA sequence, allowing the intensity of the signal at that location to be correlated with a weight ratio of hybridizing species of 1:100,000. When two probes from different sources (e.g., representing test and control cells), each labeled with a different fluorophore, are hybridized to a single array for the purpose of identifying genes that are differentially expressed, the calibration is done by labeling samples of the calibrating cDNA with the two fluorophores and adding identical amounts of each to the hybridization mixture.
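Because identical amounts of the calibrating cDNA are added in both labels, its two signals can be used to balance the channels before comparing test and control intensities. The following is a minimal sketch of one way to do that. The simple ratio correction and all names are assumptions for illustration and are not taken from the text.

```python
def balanced_ratio(cy3, cy5, cal_cy3, cal_cy5):
    """Cy3/Cy5 ratio for one array element, corrected by the calibrator.

    cal_cy3 and cal_cy5 are the signals at the calibrating cDNA location.
    Equal amounts of calibrator were added in each label, so any departure
    of cal_cy3 / cal_cy5 from 1 is treated as channel imbalance.
    """
    balance = cal_cy3 / cal_cy5
    return (cy3 / cy5) / balance


# Hypothetical intensities: a raw 3-fold ratio with a 1.5-fold channel bias.
corrected = balanced_ratio(cy3=3000, cy5=1000, cal_cy3=1200, cal_cy5=800)
print(corrected)
```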
The output of the photomultiplier tube is digitized using a 12-bit RTI-835H analog-to-digital (A/D) conversion board (Analog Devices, Inc., Norwood, MA) installed in an IBM-compatible PC computer. The digitized data are displayed as an image where the signal intensity is mapped using a linear 20-color transformation to a pseudocolor scale ranging from blue (low signal) to red (high signal). The data are also analyzed quantitatively. Where two different fluorophores are excited and measured simultaneously, the data are first corrected for optical crosstalk (due to overlapping emission spectra) between the fluorophores using each fluorophore's emission spectrum.
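The display mapping and the crosstalk correction can be sketched as follows. The 12-bit range and the 20 color levels come from the text. The two-channel linear mixing model and the leak fractions are assumptions made for illustration, since the text says only that each fluorophore's emission spectrum is used.

```python
ADC_MAX = 4095   # largest count from a 12-bit converter
N_COLORS = 20    # number of pseudocolor levels


def pseudocolor_index(count):
    """Map a 12-bit count linearly onto levels 0 (blue) to 19 (red)."""
    if not 0 <= count <= ADC_MAX:
        raise ValueError("count outside 12-bit range")
    return count * N_COLORS // (ADC_MAX + 1)


def correct_crosstalk(m1, m2, leak_2_into_1, leak_1_into_2):
    """Recover true signals (s1, s2) from measured signals (m1, m2).

    Assumed model (hypothetical):
        m1 = s1 + leak_2_into_1 * s2
        m2 = leak_1_into_2 * s1 + s2
    """
    det = 1.0 - leak_2_into_1 * leak_1_into_2
    s1 = (m1 - leak_2_into_1 * m2) / det
    s2 = (m2 - leak_1_into_2 * m1) / det
    return s1, s2


print(pseudocolor_index(0), pseudocolor_index(2048), pseudocolor_index(4095))
print(correct_crosstalk(1025.0, 600.0, 0.05, 0.10))
```

In practice the leak fractions would be measured from single-fluorophore control spots; the values used here are placeholders.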
A grid is superimposed over the fluorescence signal image such that the signal from each spot is centered in each element of the grid. The fluorescence signal within each element is then integrated to obtain a numerical value corresponding to the average intensity of the signal. The software used for signal analysis is the GEMTOOLS gene expression analysis program (Incyte).
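The grid integration step can be illustrated with a small sketch that averages pixel values inside each square grid element. This is not the GEMTOOLS implementation, whose internals are not described here. The square cell size and the function name are assumptions.

```python
def grid_means(image, cell):
    """Average intensity in each cell x cell element of a 2D image.

    image is a list of equal-length rows of pixel values; partial cells
    at the right and bottom edges are ignored.
    """
    n_rows = len(image) // cell
    n_cols = len(image[0]) // cell
    means = []
    for r in range(n_rows):
        row_means = []
        for c in range(n_cols):
            total = 0
            for y in range(r * cell, (r + 1) * cell):
                for x in range(c * cell, (c + 1) * cell):
                    total += image[y][x]
            row_means.append(total / (cell * cell))
        means.append(row_means)
    return means


# Hypothetical 4 x 4 image holding four 2 x 2 spots.
demo = [
    [1, 1, 5, 5],
    [1, 1, 5, 5],
    [0, 4, 2, 2],
    [4, 0, 2, 2],
]
print(grid_means(demo, 2))
```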
XII. Complementary Nucleic Acids
Sequences complementary to the mddt are used to detect, decrease, or inhibit expression of the naturally occurring nucleotide. The use of oligonucleotides comprising from about 15 to 30 base pairs is typical in the art. However, smaller or larger sequence fragments can also be used. Appropriate oligonucleotides are designed from the mddt using OLIGO 4.06 software (National Biosciences) or other appropriate programs and are synthesized using methods standard in the art or ordered from a commercial supplier. To inhibit transcription, a complementary oligonucleotide is designed from the most unique 5' sequence and used to prevent transcription factor binding to the promoter sequence. To inhibit translation, a complementary oligonucleotide is designed to prevent ribosomal binding and processing of the transcript.
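Deriving a complementary oligonucleotide from a chosen target region amounts to taking the reverse complement of that region. A minimal sketch follows. The target sequence and helper names are hypothetical, and real designs would also weigh melting temperature and secondary structure, as design software such as OLIGO does.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def antisense_oligo(target, start, length):
    """Complementary oligo against target[start:start + length] (0-based)."""
    region = target[start:start + length]
    if len(region) < length:
        raise ValueError("region runs past the end of the target")
    return reverse_complement(region)


# Hypothetical 5' region; 15 to 30 bases is the typical length in the text.
target = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
print(antisense_oligo(target, 0, 20))
```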
XIII. Expression of MDDT
Expression and purification of MDDT is accomplished using bacterial or virus-based expression systems. For expression of MDDT in bacteria, cDNA is subcloned into an appropriate vector containing an antibiotic resistance gene and an inducible promoter that directs high levels of cDNA transcription. Examples of such promoters include, but are not limited to, the trp-lac (tac) hybrid promoter and the T5 or T7 bacteriophage promoter in conjunction with the lac operator regulatory element. Recombinant vectors are transformed into suitable bacterial hosts, e.g.,
BL21(DE3). Antibiotic resistant bacteria express MDDT upon induction with isopropyl beta-D-thiogalactopyranoside (IPTG). Expression of MDDT in eukaryotic cells is achieved by infecting insect or mammalian cell lines with recombinant Autographa californica nuclear polyhedrosis virus (AcMNPV), commonly known as baculovirus. The nonessential polyhedrin gene of baculovirus is replaced with cDNA encoding MDDT by either homologous recombination or bacterial-mediated transposition involving transfer plasmid intermediates. Viral infectivity is maintained and the strong polyhedrin promoter drives high levels of cDNA transcription. Recombinant baculovirus is used to infect Spodoptera frugiperda (Sf9) insect cells in most cases, or human hepatocytes, in some cases. Infection of the latter requires additional genetic modifications to baculovirus. (See e.g., Engelhard, supra; and Sandig, supra.)
In most expression systems, MDDT is synthesized as a fusion protein with, e.g., glutathione S-transferase (GST) or a peptide epitope tag, such as FLAG or 6-His, permitting rapid, single-step, affinity-based purification of recombinant fusion protein from crude cell lysates. GST, a 26-kilodalton enzyme from Schistosoma japonicum, enables the purification of fusion proteins on immobilized glutathione under conditions that maintain protein activity and antigenicity (Amersham Pharmacia Biotech). Following purification, the GST moiety can be proteolytically cleaved from MDDT at specifically engineered sites. FLAG, an 8-amino acid peptide, enables immunoaffinity purification using commercially available monoclonal and polyclonal anti-FLAG antibodies (Eastman Kodak Company, Rochester NY). 6-His, a stretch of six consecutive histidine residues, enables purification on metal-chelate resins (QIAGEN). Methods for protein expression and purification are discussed in Ausubel (1995, supra, Chapters 10 and 16). Purified MDDT obtained by these methods can be used directly in the following activity assay.
XIV. Demonstration of MDDT Activity
MDDT, or biologically active fragments thereof, are labeled with 125I Bolton-Hunter reagent. (See, e.g., Bolton, A.E. and W.M. Hunter (1973) Biochem. J. 133:529-539.) Candidate molecules previously arrayed in the wells of a multi-well plate are incubated with the labeled MDDT, washed, and any wells with labeled MDDT complex are assayed. Data obtained using different concentrations of MDDT are used to calculate values for the number, affinity, and association of MDDT with the candidate molecules. Alternatively, molecules interacting with MDDT are analyzed using the yeast two-hybrid system as described in Fields, S. and O. Song (1989) Nature 340:245-246, or using commercially available kits based on the two-hybrid system, such as the MATCHMAKER system (CLONTECH). MDDT may also be used in the PATHCALLING process (CuraGen Corp., New Haven CT) which employs the yeast two-hybrid system in a high-throughput manner to determine all interactions between the proteins encoded by two large libraries of genes (Nandabalan, K. et al. (2000) U.S. Patent No. 6,057,101).
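The text does not name the method used to derive binding parameters from the concentration series. One conventional choice is a Scatchard analysis, sketched below under that assumption for a single class of independent sites. The function name and the synthetic data are illustrative only.

```python
def scatchard_fit(free, bound):
    """Estimate (Kd, Bmax) by a straight-line fit of bound/free vs bound.

    For one class of sites, bound/free = (Bmax - bound) / Kd, so the slope
    is -1/Kd and the intercept is Bmax/Kd.
    """
    x = list(bound)
    y = [b / f for b, f in zip(bound, free)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    kd = -1.0 / slope
    bmax = intercept * kd
    return kd, bmax


# Synthetic, noise-free data generated with Kd = 2.0 and Bmax = 10.0.
free = [0.5, 1.0, 2.0, 4.0, 8.0]
bound = [10.0 * f / (2.0 + f) for f in free]
kd, bmax = scatchard_fit(free, bound)
print(round(kd, 6), round(bmax, 6))
```

With real data, a nonlinear fit to the binding isotherm is usually preferred over this linearization, because the transformation distorts measurement error.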
XV. Functional Assays
MDDT function is assessed by expressing mddt at physiologically elevated levels in mammalian cell culture systems. cDNA is subcloned into a mammalian expression vector containing a strong promoter that drives high levels of cDNA expression. Vectors of choice include pCMV SPORT (Life Technologies) and pCR3.1 (Invitrogen Corporation, Carlsbad CA), both of which contain the cytomegalovirus promoter. 5-10 μg of recombinant vector are transiently transfected into a human cell line, preferably of endothelial or hematopoietic origin, using either liposome formulations or electroporation. 1-2 μg of an additional plasmid containing sequences encoding a marker protein are co-transfected.
Expression of a marker protein provides a means to distinguish transfected cells from nontransfected cells and is a reliable predictor of cDNA expression from the recombinant vector. Marker proteins of choice include, e.g., Green Fluorescent Protein (GFP; CLONTECH), CD64, or a CD64-GFP fusion protein. Flow cytometry (FCM), an automated laser optics-based technique, is used to identify transfected cells expressing GFP or CD64-GFP and to evaluate the apoptotic state of the cells and other cellular properties.
FCM detects and quantifies the uptake of fluorescent molecules that diagnose events preceding or coincident with cell death. These events include changes in nuclear DNA content as measured by staining of DNA with propidium iodide; changes in cell size and granularity as measured by forward light scatter and 90 degree side light scatter; down-regulation of DNA synthesis as measured by decrease in bromodeoxyuridine uptake; alterations in expression of cell surface and intracellular proteins as measured by reactivity with specific antibodies; and alterations in plasma membrane composition as measured by the binding of fluorescein-conjugated Annexin V protein to the cell surface. Methods in flow cytometry are discussed in Ormerod, M. G. (1994) Flow Cytometry, Oxford, New York NY.
The influence of MDDT on gene expression can be assessed using highly purified populations of cells transfected with sequences encoding MDDT and either CD64 or CD64-GFP. CD64 and CD64-GFP are expressed on the surface of transfected cells and bind to conserved regions of human immunoglobulin G (IgG). Transfected cells are efficiently separated from nontransfected cells using magnetic beads coated with either human IgG or antibody against CD64 (DYNAL, Inc., Lake Success NY). mRNA can be purified from the cells using methods well known by those of skill in the art. Expression of mRNA encoding MDDT and other genes of interest can be analyzed by northern analysis or microarray techniques.
XVI. Production of Antibodies
MDDT substantially purified using polyacrylamide gel electrophoresis (PAGE; see, e.g., Harrington, M.G. (1990) Methods Enzymol. 182:488-495), or other purification techniques, is used to immunize rabbits and to produce antibodies using standard protocols.
Alternatively, the MDDT amino acid sequence is analyzed using LASERGENE software (DNASTAR) to determine regions of high immunogenicity, and a corresponding peptide is synthesized and used to raise antibodies by means known to those of skill in the art. Methods for selection of appropriate epitopes, such as those near the C-terminus or in hydrophilic regions, are well described in the art. (See, e.g., Ausubel, 1995, supra, Chapter 11.)
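Selecting a hydrophilic region can be illustrated with a sliding-window hydropathy scan. The sketch below uses the Kyte-Doolittle scale, which is an assumption: the algorithm used by LASERGENE is not described here. The window length is a free parameter and the example sequence is hypothetical.

```python
# Kyte-Doolittle hydropathy values; more negative means more hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(seq, window=15):
    """Return (start, mean hydropathy) for every window in the sequence."""
    scores = []
    for i in range(len(seq) - window + 1):
        segment = seq[i:i + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        scores.append((i, mean))
    return scores


def most_hydrophilic(seq, window=15):
    """Return (start, score) of the window with the lowest mean hydropathy."""
    return min(hydropathy_windows(seq, window), key=lambda item: item[1])


# Hypothetical sequence with a lysine-rich stretch in the middle.
example = "IIIIIKKKKKLLLLL"
print(most_hydrophilic(example, window=5))
```

Hydrophilicity is only one proxy for immunogenicity; surface exposure and flexibility are commonly considered as well.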
Typically, peptides 15 residues in length are synthesized using an ABI 431A peptide synthesizer (PE Biosystems) using Fmoc chemistry and coupled to KLH (Sigma) by reaction with N-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS) to increase immunogenicity. (See, e.g., Ausubel, supra.) Rabbits are immunized with the peptide-KLH complex in complete Freund's adjuvant. Resulting antisera are tested for antipeptide activity by, for example, binding the peptide to plastic, blocking with 1% BSA, reacting with rabbit antisera, washing, and reacting with radio-iodinated goat anti-rabbit IgG. Antisera with antipeptide activity are tested for anti-MDDT activity using protocols well known in the art, including ELISA, RIA, and immunoblotting.
XVII. Purification of Naturally Occurring MDDT Using Specific Antibodies
Naturally occurring or recombinant MDDT is substantially purified by immunoaffinity chromatography using antibodies specific for MDDT. An immunoaffinity column is constructed by covalently coupling anti-MDDT antibody to an activated chromatographic resin, such as CNBr-activated SEPHAROSE (Amersham Pharmacia Biotech). After the coupling, the resin is blocked and washed according to the manufacturer's instructions.
Media containing MDDT are passed over the immunoaffinity column, and the column is washed under conditions that allow the preferential absorbance of MDDT (e.g., high ionic strength buffers in the presence of detergent). The column is eluted under conditions that disrupt antibody/MDDT binding (e.g., a buffer of pH 2 to pH 3, or a high concentration of a chaotrope, such as urea or thiocyanate ion), and MDDT is collected.
All publications and patents mentioned in the above specification are herein incorporated by reference. Various modifications and variations of the described method and system of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the above-described modes for carrying out the invention which are obvious to those skilled in the field of molecular biology or related fields are intended to be within the scope of the following claims.
TABLE 1
SEQ ID NO: Template ID GI Number Probability Score Annotation
16 233624.11.dec g131560 9.00E-31 amyloid precursor protein-binding protein 1
7 246526.2.dec g7542723 1.00E-168 DHHC1 protein (Homo sapiens)
5 345638.1.oct g7406641 2.00E-90 EMeg32 protein (Mus musculus)
18 198840.3.dec g643590 0 Human alternatively spliced mRNA for NACP (precursor of non-A beta component
4 197170.1.oct g4389513 8.00E-45 Human homolog of Mus musculus izL protein (AA 4-1561) (Homo sapiens)
11 040422.12.dec g3341980 4.00E-66 huntingtin-interacting protein HYPA/FBP11
21 349415.4.dec g533523 1.00E-159 MAGE-6 antigen (Homo sapiens)
22 474778.3.dec g2077825 7.00E-62 MNK1 (Homo sapiens)
15 196774.3.dec g6457278 1.00E-59 pre-B lymphocyte protein 3 (Homo sapiens)
14 059263.6.dec g1694682 1.00E-116 Src-like adapter protein (Homo sapiens)
13 012432.5.dec g1314316 2.00E-13 WD-40 motifs; up-regulated by thyroid hormone in tadpoles (Xenopus laevis)
TABLE 2
SEQ ID NO: Template ID Start Stop Frame Pfam Hit Pfam Description E-value
1 348736.2.oct 265 450 forward 1 KRAB PF01352 KRAB box 2.50E-07
2 025119.6.oct 179 367 forward 2 KRAB PF01352 KRAB box 1.80E-28
3 474539.1.oct 2 280 forward 2 PH PF0016 PH (pleckstrin homology) domain 2.10E-08
4 197170.1.oct 194 262 forward 2 zf-C2H2 PF00096 Zinc finger, C2H2 type 3.10E-08
5 345638.1.oct 248 640 forward 2 Acetyltransf PF00583 Acetyltransferase (GNAT) family 0.00033
6 408784.1.dec 207 335 forward 3 UBA UBA-domain 1.90E-06
7 246526.2.dec 570 764 forward 3 zf-DHHC DHHC zinc finger domain 2.60E-34
8 200488.5.dec 89 619 forward 2 Peptidase_C15 Pyroglutamyl 3.30E-04
9 474878.1.dec 1003 1116 forward 1 zf-C3HC4 Zinc finger, C3HC4 type (RING finger) 1.50E-05
10 335916.2.dec 1053 1151 forward 3 ank Ank repeat 1.10E-06
11 040422.12.dec 478 567 forward 1 WW_rsp5_WWP WW domain 2.40E-12
12 977651.2.dec 718 924 forward 1 NifU-like NifU-like domain 3.60E-30
13 012432.5.dec 280 396 forward 1 WD40 WD domain, G-beta repeat 7.00E-05
14 059263.6.dec 645 875 forward 3 SH2 Src homology domain 1.30E-33
15 196774.3.dec 695 949 forward 2 ig Immunoglobulin 2.10E-09
16 233624.11.dec 345 656 forward 3 ThiF_family ThiF family 4.00E-05
16 233624.11.dec 245 730 forward 2 ThiF_family ThiF family 4.90E-04
17 228585.3.dec 927 1250 forward 3 PH PH domain 1.50E-06
17 228585.3.dec 294 833 forward 3 RhoGEF RhoGEF domain 7.00E-39
17 228585.3.dec 21 185 forward 3 SH3 Src homology domain 1.20E-08
18 198840.3.dec 137 502 forward 2 Synuclein Synuclein 2.40E-72
19 082154.5.dec 50 340 forward 2 FCH Fes/CIP4 homology domain 7.60E-05
20 368396.5.dec 3391 3555 forward 1 SH3 Src homology domain 2.40E-21
21 349415.4.dec 2408 3094 forward 2 MAGE MAGE family 1.20E-134
22 474778.3.dec 297 542 forward 3 pkinase Eukaryotic protein kinase domain 6.50E-13
23 330933.5.dec 209 604 forward 2 DAGKc Diacylglycerol kinase catalytic domain (presumed) 4.80E-04
24 998036.2.dec 168 332 forward 3 SH3 Src homology domain 9.60E-20
24 998036.2.dec 956 1126 forward 2 SH3 Src homology domain 2.00E-17
25 999304.1.dec 78 218 forward 3 KRAB KRAB box 2.30E-17
TABLE 3
SEQ ID NO: Template ID Start Stop Frame Domain Type
5 345638.1.oct 1601 1657 forward 2 TM
5 345638.1.oct 243 296 forward 3 TM
7 246526.2.dec 366 419 forward 3 TM
7 246526.2.dec 738 812 forward 3 TM
7 246526.2.dec 738 797 forward 3 TM
7 246526.2.dec 375 452 forward 3 TM
7 246526.2.dec 855 911 forward 3 TM
7 246526.2.dec 849 923 forward 3 TM
7 246526.2.dec 861 938 forward 3 TM
7 246526.2.dec 735 797 forward 3 TM
7 246526.2.dec 855 908 forward 3 TM
7 246526.2.dec 2714 2797 forward 2 TM
9 474878.1.dec 1493 1561 forward 2 SP
9 474878.1.dec 126 194 forward 3 SP
9 474878.1.dec 852 902 forward 3 TM
9 474878.1.dec 2092 2163 forward 1 SP
9 474878.1.dec 1514 1573 forward 2 TM
10 335916.2.dec 579 638 forward 3 SP
10 335916.2.dec 555 638 forward 3 SP
10 335916.2.dec 1306 1389 forward 1 SP
11 040422.12.dec 865 933 forward 1 SP
11 040422.12.dec 945 1001 forward 3 SP
11 040422.12.dec 939 1007 forward 3 SP
11 040422.12.dec 939 1001 forward 3 TM
11 040422.12.dec 939 986 forward 3 SP
11 040422.12.dec 939 1001 forward 3 SP
11 040422.12.dec 945 1055 forward 3 SP
15 196774.3.dec 84 158 forward 3 SP
15 196774.3.dec 111 164 forward 3 TM
15 196774.3.dec 84 146 forward 3 SP
16 233624.11.dec 508 585 forward 1 SP
17 228585.3.dec 2343 2396 forward 3 TM
17 228585.3.dec 4942 4998 forward 1 SP
17 228585.3.dec 4975 5019 forward 1 SP
17 228585.3.dec 5218 5298 forward 1 SP
17 228585.3.dec 1633 1713 forward 1 SP
17 228585.3.dec 4417 4491 forward 1 SP
17 228585.3.dec 4942 5010 forward 1 SP
17 228585.3.dec 4942 5016 forward 1 SP
17 228585.3.dec 4975 5034 forward 1 SP
17 228585.3.dec 4942 5034 forward 1 SP
20 368396.5.dec 597 680 forward 3 SP
20 368396.5.dec 2585 2659 forward 2 SP
20 368396.5.dec 2585 2668 forward 2 SP
20 368396.5.dec 1051 1137 forward 1 SP
20 368396.5.dec 1051 1128 forward 1 SP
20 368396.5.dec 748 813 forward 1 SP
23 330933.5.dec 3492 3551 forward 3 TM
23 330933.5.dec 2174 2239 forward 2 TM
23 330933.5.dec 2627 2677 forward 2 TM
23 330933.5.dec 2502 2552 forward 3 TM
23 330933.5.dec 2940 3026 forward 3 SP
23 330933.5.dec 2592 2651 forward 3 SP
23 330933.5.dec 2502 2549 forward 3 SP
23 330933.5.dec 2502 2567 forward 3 SP
23 330933.5.dec 2502 2555 forward 3 SP
23 330933.5.dec 2502 2561 forward 3 SP
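The Start, Stop, and Frame columns of Table 3 can be cross-checked against each other. The sketch below assumes that Start and Stop are 1-based nucleotide positions on the template and that "forward n" denotes the reading frame beginning at position n. Both are assumptions, although the sampled rows are consistent with them. The helper names are illustrative.

```python
def frame_of(start):
    """Forward reading frame (1, 2, or 3) implied by a 1-based start."""
    return (start - 1) % 3 + 1


def residues(start, stop):
    """Length of the segment in amino acid residues (whole codons only)."""
    span = stop - start + 1
    if span % 3:
        raise ValueError("segment is not a whole number of codons")
    return span // 3


# (SEQ ID NO, start, stop, frame, type) for a few rows of Table 3.
rows = [
    (5, 1601, 1657, 2, "TM"),
    (9, 1493, 1561, 2, "SP"),
    (15, 111, 164, 3, "TM"),
    (20, 1051, 1137, 1, "SP"),
]

checks = [(frame_of(s) == f, residues(s, e)) for _, s, e, f, _ in rows]
for row, (frame_ok, n_res) in zip(rows, checks):
    print(row[0], row[4], "frame consistent:", frame_ok, "residues:", n_res)
```

A check of this kind is a convenient way to catch transcription errors in the coordinate columns.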
TABLE 4
SEQ ID NO: Template ID Component ID Start Stop
5 345638.1.oct 3524936H1 1221 1489
5 345638.1.oct 5395774T1 1288 1796
5 345638.1.oct 2452828F6 1197 1623
5 345638.1.oct 1439670T6 1289 1695
5 345638.1.oct 2452828H1 1197 1431
5 345638.1.oct 5539601H2 1215 1439
5 345638.1.oct 2270342R6 1338 1690
5 345638.1.oct 2270342H1 1338 1591
5 345638.1.oct 2270342T6 1338 1642
5 345638.1.oct g4175458 1372 1736
5 345638.1.oct g1154300 1380 1548
5 345638.1.oct g2241560 1399 1691
5 345638.1.oct g2318255 1412 1690
5 345638.1.oct 2671159T6 1467 1891
5 345638.1.oct g703628 1476 1696
5 345638.1.oct 5588741H1 1523 1721
5 345638.1.oct 2968103H2 1624 1926
5 345638.1.oct 2825566H1 1634 1935
5 345638.1.oct g765459 1656 1851
5 345638.1.oct g982043 1667 1843
5 345638.1.oct g3872249 1790 2138
5 345638.1.oct g3934859 1821 2138
5 345638.1.oct 2160009F6 1824 2138
5 345638.1.oct 2160081H1 1824 2069
5 345638.1.oct 2842652H1 1862 2090
5 345638.1.oct 1980041R6 1865 2138
5 345638.1.oct 1980041H1 1865 2112
5 345638.1.oct 3642137H1 215
5 345638.1.oct 1439670H1 263
5 345638.1.oct 1438620H1 250
5 345638.1.oct 1439670F6 470
5 345638.1.oct 1438620F1 417
5 345638.1.oct 4509739H1 49 297
5 345638.1.oct 3361128H1 68 328
5 345638.1.oct 4977040H1 69 322
5 345638.1.oct 3471990H1 69 309
5 345638.1.oct 5863955H1 71 314
5 345638.1.oct 3074876H1 72 344
5 345638.1.oct 3764710H1 82 291
5 345638.1.oct 269902H1 104 450
5 345638.1.oct g4332349 182 516
5 345638.1.oct 1953604H1 248 464
5 345638.1.oct 5988669H1 271 460
5 345638.1.oct 4970243H1 275 547
5 345638.1.oct 3519477H1 295 462
5 345638.1.oct 4970569H1 343 602
5 345638.1.oct 4598609H1 405 587
5 345638.1.oct 2671159H1 485 729
5 345638.1.oct 2671159F6 485 927
5 345638.1.oct 2671152H1 485 729
7 246526.2.dec 1861916H1 1419 1695
7 246526.2.dec g574012 1420 1616
7 246526.2.dec 4591933H1 1448 1710
7 246526.2.dec 1861162T6 1467 1863
7 246526.2.dec 1861162F6 1474 1901
7 246526.2.dec 1861162H1 1475 1798
7 246526.2.dec 3856851H1 1478 1761
7 246526.2.dec 5597738H1 1495 1704
7 246526.2.dec 5919136H1 1509 1777
7 246526.2.dec 2616733H1 1510 1748
7 246526.2.dec g3228879 1517 1913
7 246526.2.dec 1363803F1 1544 1994
7 246526.2.dec 1363803H1 1544 1791
7 246526.2.dec g1379338 1572 1905
7 246526.2.dec g2341495 1580 1906
7 246526.2.dec 4854205H1 1588 1848
7 246526.2.dec 358373H1 1590 1808
7 246526.2.dec 4793209H1 1600 1887
7 246526.2.dec g1009757 1605 1748
7 246526.2.dec 4836901H1 1607 1888
7 246526.2.dec 6603577H1 1631 2157
7 246526.2.dec 5294946H1 1648 1893
7 246526.2.dec 2289413H1 1662 1880
7 246526.2.dec 2749412H1 1676 1915
7 246526.2.dec 5114945H1 1689 1960
7 246526.2.dec 4223825H1 1696 1996
7 246526.2.dec 4220586H1 1698 1962
7 246526.2.dec 1611734H1 1712 1923
7 246526.2.dec g847365 1729 2060
7 246526.2.dec g844344 1734 2069
7 246526.2.dec g783315 1734 1983
7 246526.2.dec 6321704H1 1734 1933
7 246526.2.dec 4161027H1 1756 2045
7 246526.2.dec 658192H1 1760 2002
7 246526.2.dec g2027049 1762 2050
7 246526.2.dec 1931913T6 1774 1848
7 246526.2.dec 1482438H1 1781 1980
7 246526.2.dec 1647267F6 1781 2251
7 246526.2.dec 1647343H1 1781 2022
7 246526.2.dec 5853125H1 1791 2045
7 246526.2.dec g1231286 1793 1908
7 246526.2.dec 2425258H1 1799 2041
7 246526.2.dec 3719040H1 1806 2061
7 246526.2.dec 1494991H1 1819 2038
7 246526.2.dec g1243109 1824 2193
7 246526.2.dec g890161 1843 2149
7 246526.2.dec 4583873H1 1853 1995
7 246526.2.dec 2129527H1 1862 2131
7 246526.2.dec 4654582H1 1862 2124
7 246526.2.dec g893529 1861 2146
Oi Oi en cn en en en en en cn cn ho ho ho ho ro ro to io ro ro ro ro ro ro
Ω Ω. Ω Ω Ω. Ω. Ω. Ω. Ω α Ω Ω Ω. Ω. Ω. Ω. Ω. Ω. Ω α α Ω Ω Ω Ω Ω Ω d Ω. Ω. Ω Ω Ω Ω " Ω Ω Ω. Ω. Ω. Ω. Ω. Ω α φ φ φ φ φ φ φ φ φ φ φ φ φ φ φ φ φ φ φ φ φ φ φ φ φ φ φ φ Φ φ α CL CL Φ Φ Φ Φ Φ Φ Φ O O O O O O O O O O O O O O O O O O o o o o o o C) O C) O o O O O O O o Φ φ
C) o o
ro ro ro ro ro ro ro ro ro ro ro ro ro ro cΛ vl O O O O O O O O O -i Oj cjn cn e cji e en en cn cn cn i Oi ^ ^ β ^ O _i 45* 45* 45* 00 O 5*. 4c5o* 4
O -O O OO OO vl vJ Co O O O •O O 00 vl O Cn 4 co co o ro ro o n o cn en oo oi Conj Oooi eion e—n ' tOi 4θ5* 44 445 ro5* 4ro5* 4—5* ' 4—5* ' Ω en — o o oo en o oo ^. -o o oi O -^ 00 vl — 45- •o £-. — ■ en en vi en oo o oo e o vi ^ o vi ^ cn S cn co vi cn o co ^-
— ' — ' I IO IO — ■ — . —. —. —. —. o o o ro ro oo o o oo oo o o en oj o oo vι cn o o cn co
O ϋi O tO CM- IO M IO - ■ IO
CO m
©
O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O z O
^ 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45. 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45. 4i» 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45. 45. 45. 4i. 45. 45*
VI VI VI Vl VI VJ Vl Vl Vl Vl Vl VI VJ VI Vl VI Vl Vl Vj Vl VJ VI VI VI VI VJ Vl Vl Vl Vj Vl Vj Vl Vl Vl VI Vl VI VI VI VI VI Vl VI VI VI VJ VJ VJ Vj 45. 45. 45. 45* 45* 45* 45* 45* ^ 45* 45* ii. 45* 45. 45. 45* 45* ^. 45* ^. i. 45i 45* 45* 45* 45* 45* 45* 45* 45* i-, ^ J5* 45* 4-* Ji* J-* J-* ^ ^ 4-. 45* 45* 45* 45* 45* 45* 4-* 45* 45* ω c» CB CB (B CB θJ C» c» oo α) ca c» ω oo oo oo c» c» cxι cB C» c» CB Cιo cB C» c» vj vj vl vl vl vj vl vl vl vj vl vl vl vl vl vl vl vj vj vj vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vj vl vl vl vl vl vl vl vl vl vj vl vl vi vl -π C» C» C» C» C» C» C» C» 0O O0 OO O0 O0 C» O0 C» C» C» C» C» C» C» 00 0O O0 OO O0 O0 O0 00 O0 O0 C» O0 O0 0O rø
ΩΩΩΩ.Ω.Ω.ΩΩΩΩ.Ω.ΩΩ.αΩ.ΩΩΩΩΩΩΩΩΩΩΩ.Ω.Ω.Ω.ΩΩΩΩΩΩΩΩΩΩΩΩΩ.Ω.Ω.Ω.Ω Ω Ω. Ω φ
Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ F=j
00000 00 00000 00000000000000 Φ0 Φ0 Φ0 Φ0 Φ0 Φ0 Φ0 Φ00Φ Φ0 Φ0 Φ0 Φ0 Φ0 Φ0 Φ0 Φ0 Φ00Φ Φ0 Φ0 O O O
ro ro ro ro ro ro ro ro ro ro ro ro io ro 0 0 0 0 0 0 0 0 0 0 0 0 0 0 O O O O O O O O O O 0 0 0 0 0 00 00 00 00 00 00 00 00 00 00 00 00 vl vl vl vl vl vj vj vj + co co co co to ro ro — — ' O o o o O O 00 vl vl O O 45* 4^ 45* co co — 1 — . — ■ — . o o o o en cn 45* co ro — ■ 0 0 0 00 00 00 0- o en to Ω vi co — ■ — ' -O J5> U O U M OO -O 00 j- Oi - 4- rO O i- M O- CO U 00 vl O f-. 00 IO O O Cn O IO — ' Co O O O OO O O O vl £* vl Cn 45. 00 ^-
to io Nj ro iO to ro io io ro ro ro io io io io NJ io io io i to io io io to io io io io io ro — . to ro ro ro io ro ro ro ro ro ro ro ro ro io — > CΛ
45* IO ^-. f-. £-. 45* f-. 45* 45. 45* Co i-, 4i. 45. tO CO CO tO IO OJ OJ i* i-, 45. — " tO tO CO — . Co —- — - OJ O — — ' Co Co - • — ' O O O O O O Co O O O ^f ι o ιo ro ro ιo to ro oj to oo ιo ιo fo oo o oo ^. oo vι oo to — - 10 45. 0 — ' to o oo oo co o oo o io oo oo ro — - oo to — ' Oo ^. c oo oo £* vj ϋι i ϋι α o- oi M M M θι oι oi M θ M n o <) CB n o fc CD & M ω ω > M M ^ ω O (n ι o- C>' J- θ M -' θ' θ' θ J- 3O
CΛ m
©
O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O z o
45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45. 45* 45. 45* 45* 45* 45* 45* 45. 45. 45* 45. 45. 45* 45* 45* 45. 45* 45* 45* 45. 45. 45. 45* 45* 45* 45. 45* 45* 45* 45. 45. 45. 45. 45. 45* 45* 45* 45* 45*
VI VI VI VI vj vj vl vj vl vι vι VI vl VI J J VI VI vj vl VI VI VI VI VI VI VI VI VI vι vι vι vι vj VI VI VI VI Vl vl vJ vl vl vl vl vl vl vl vl VI
45* 45* 45* 45* J5* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45. 45. 45. 45. 45. 45. 45. 45* 45* 45* 45. 45* 45* 45. 45. 45. 45* 45* f-. 45. 45. 45. 45. 45. 45. 45. 45. 45. 45. 45. 45. 45. 45. 45* φ
CX> CX> 00 0O O0 00 OO 00 00 00 0O O0 00 00 00 00 00 α0 00 00 0O O0 O0 O0 O0 O0 C0 00 00 0O O0 C» C» C» OO C» αo 00 O0 α3 C» C» ∞ vl vj vl vj vj vj vl vj vl vl vj vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vj
C» C» C» C» C» 00 C» C» C» C» C» C» C» O0 C» C» C» C» C» C» CXι C» OT C» Cβ C» C»
ΩΩ.Ω.ΩΩ.ΩΩΩ.ΩΩ.Ω.Ω. Ω. Ω. Ω. Ω Ω Ω Ω Ω Ω Ω Ω Ω Ω. Ω. Ω. Ω. Ω. Ω Ω Ω Ω Ω Ω Ω Ω Ω Ω Ω Ω Ω. Ω. Ω. Ω. Ω Ω Ω. Ω CD φ φ φ φ φ φ φ φ φ φ φ φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ 0 0 0 0 0 0 0 0 0 0 0 0 O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O 0 O O Ό
45. OO OO OJ OJ OJ I — • lo to to io ro ro ro to io ro ro ro ro ro io ro ro ro io io ro ro ro io cΛ — ' O 00 00 O O — ' 00 oo vι e> e> en 45* 45* co 4> co co co ro ro o ro — _. o. —IrOo. o " o o vi ro ro ro oo — O. —Oo. Cc—nn. —lrOo. Ir—O —IOo. —— —. ■ —O o. —O o -- Ovvli Ovvlι Ovvlι OOc OOo OOe> OOo OOo OCenn Oi*.* Ω Ωy
45* o cn o oj oo ro vi 45* ϋι oo ro ro cn ro ^. S r o Oo —. —Oo. Oo — O
VI ro E. o en oι oι ro oo vi oo o o — ' O — ■ — ' S. to — ' O o c o Tl-
o O' oi J- CD t. M - - co — ■ — ■ co co ro 45. -' io ιo oo ιo ιo 45. ιo ro Nj NJ M io ro rθ N-> ιo ro ro ro ro ro ro ro to ιo to tθ fθ i ιo to to cΛ
0J O Cn O O O O O 45. O O O O vl — . .C O1 VI 00 O 01 VI VI IO 45. 45. 4 to5. oro 4io5. 4io5. to0o 4ro5* 4—5* - 4ro5. CvOi iio*. i—*. ' rO0 4io5* tco* —45* ■ I—O ' OioJ OvJi OtoJ Of*J ^ro. *ro* *to. rio* θJ-
— ' en o cn co vi cn o ιo vι o c> vι θ 45. o — ' Oo oι oι ro ro 45. — ' vi oo vι ro o o o en o en 45. cn o oo oι θι co co o o o ro en o c> c> o
CΛ m
© o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o D z o
45* 45* 45* 45. 45* 45* 45* 45. 45. 4-* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45. 45* 45. 45* 45* 45* 45* 45* 45* 45* 45. 45. 45. 45* 45* 45. 45. 45. 45. vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vl vj vj
45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45. ^. 5* 4. * 45* 45* 45* 45* 45* 45* 45* 45* i* 45* 45. 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45* 45. 45* 45. 45* 45* 45* 45* 45* 45* 45* 45* 45. oo oo oo co oo cD CB Oo αi cs oo o) C» C» O0 O0 C» C» C» C» C» 0O C» 00 C» C» 00 00 O0 CD αo θ0 C» ∞ ∞ C» C» <» C» C» 00 00 0O O0 C0 O0 O0 O0 ω _
VJ vJ V| VJ VI VJ VJ VI VI VI Vl VI Vl Vl vl VI VI VI VI VI V| vl vl Vl Vl Vl Vl VI Vl VI VI VI Vl Vl Vl Vl Vl Vl Vl Vl Vl Vl Vl Vl Vl Vl vl VI VI V| -π CB C» (» CB C» CB C» C» C» CO C» C» C» CB CB CB C» C» C» C» CB C» C» C» α
Ω. Ω. Ω. Ω Ω Ω. Ω Ω. Ω. Ω. CL Ω Ω Ω Ω Ω Ω. Ω. Ω. Ω. Ω Ω Ω. Ω. Ω. Ω. Ω. Ω Ω Ω. Ω Ω Ω. Ω Ω. Ω. Ω. Ω. Ω. Ω. Ω. Ω. Ω. Ω. Ω Ω. Ω. Ω. Ω. Ω. Φ
Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Φ Fj
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 O O O O O O O O O O O O w
OJ to to ro ro — • — • 0 0 0 0 0 0 0 0 0 0 O O O O O O OO OO OO vl vl O O O O e> O O O Cn en ^. 45* 45* 45. --? o 00 vl vl — > o o o o cn 4^. 45. Co — ■ oo vi o o o ro — ' — — - o o oo vι vι cn o o o o ιo co to vi vi vi vι &. co oo ιo en — ' O VI S Ω cn — ■ ro o — ■ o oo vl o o o o CO VI — " O O O O C0 O 00 00 00 45* ro o oι vι — ■ — " o o oo ro co en cn eo co vi o o o o io vi cn o o q.
en o vι en 45* 45* cn 45* cjι 45* 45* oj oo oo 45* ro co cn co co ro ro co ro ro ro ro co — • _, 00 O 00 ,. , 00 00 O 00 ,~, 00 vl 00 vl O -5
45* 00 VI CO to vi io en co vi — - io cn .fc. o o — - o co — . .fe. 0 45. 0 0 vi 45. 0 0 ^ o O O — - vl -J CO IO O vl O to 00 O
45* O vl vl — < o o vi co ro o — - ro eo co vi ιo £*. oo oo o oo o oo o oo — - lo oo o CO vl 0 — *. x. o o o O j — - ro vi to o — o -g
CΛ m
© oooooooooooooooooooooooooo o o o o o o o o o o o o o o o o o o o o o o o o π z o eo co CO eo CO CO CO CO CO co CO CO CO CO CO co CO co co CO CO eo 0J co eo co 45* 45. 45. 45* 45. 45. 45. 45. 45* 45* 45. 45. 45. 45* 45. 45. 45* 45. 45. 45* 45. 45. 45. 45*
CO CO CO CO 00 CO Co Co Co CO 00 CO CO co CO CO CO CO CO 00 Co CO CO CO CO CO vl vι VI vι vι VI vl VI VI vι vl vι VJ vl vl vl vι vi vι VI Vl VI vi vl
Ol en Ol en en Oi cn Ol cn Ol Ol Ol cn Ol en Ol Oi Oi cn Ol Oi en en Oi en cn 45* 45. 45* 45. 45* 45* 45* 45* 45. 45. 45. 45. 45. 45* 45. 45* 45. 45. 45* 45* 45. 45. 45. 45* φ
O o o o o O o O o o o o O O o o o o o o oo oo oo oo oo oo 00 00 oo oo oo oo oo 00 oo oo oo oo oo oo 00 00 00 oo
VI VI VI VI VI VI VI VI VI VI VI VI VI VI VJ vl VI vl VI VI VI VI VI vi o o o o o o o o o O O o O o O o o o o o o o o 9° 9° 9° 9° 9° 9° po oo 9° po po 9° 9° 9° oo 00 o io to ho ho ho ho ho ho ho ho ho to i 9° 00 to to ho ho o ho h o ho o ho ho 9° 00 9° 00 00 ro o 9° Ω
Ω Ω. L Ω Ω CL Ω. Ω. Ω Ω L Ω Ω CL Ω. Ω. Ω. d Ω. Ω. Ω. , Ω d Ω Ω Ω. Ω. Ω. d Ω. Ω. Ω. Ω. Ω. Ω. Ω. Ω. Ω Ω Ω CL Ω. Ω. Ω. Ω. φ
Φ Φ φ φ φ Φ φ Φ Φ φ φ φ Φ φ Φ φ φ φ Φ φ φ φ Φ Φ φ Φ Φ φ φ φ φ φ φ φ φ Φ φ Φ φ φ Φ φ Φ Φ Φ Φ Φ Φ Φ Φ
O O O O O O O O O O O O O o O o o o O O o o O O o o o o o o O o o o o O O O O O O O O O O O O O O O Ό
to -- -- — - O O O O O O vl vl vl vl 0 4^. oo co OJ co ro ro ro ro ro ro ro ro ro ro ro ro ro ro ro — . —. —. — ■ ro ro —- —- — — - 45.45* 45* 4 CO CO CO CO CO 7 c c - eή ι eή - < _£ ^ - P _ ^ - ^ - en o -■ en o vi vi 45. £* 45. oo 45* _1 ro ro ro ro ro i vi vi co ro — - O O Oo oo vi vi o oo o S 00 vl vl 45* 45*
O OO CO OO OJ OJ CO O o o en o o oo oo vi O CO VI VI 00 —' —- — ' Cn v co ∞ ro ro co cn o vi ro vi o oo oo co 45* co o O Ol O vl O =5-
45* co ro cn ιo co ro ιo o co o ro αo o oo cn θ θ vi cn .^ o co .fe .fe ro ro ro ro ro ro to ro ro io ro ro ro IO — . —. —. —. —. —. — - CΛ
J, 4- 45, i* i 5. C 4i 4i i. :-* ^ ( i o oo o o o o o cn ^- i oo o— - vioi c—^ - eono ovoi coo con or o* 4—5* . OOJ- 4-5*' W" U}, 0c^ Oi Ic^ NCAI Wft g4- OgO SUI Of' Oo MOO C- _ . cn en en vi o oo o en . _ o cn oo vi co .fe. o ro O o-- . c "- . n o "_ c" _ _ n f-. oo ro £-. i-. cn vi co oo oTi
TABLE 4
SEQ ID NO: Template ID Component ID Start Stop
10 335916.2.dec 1627014H1 1488 1720
10 335916.2.dec g1939354 1506 1766
10 335916.2.dec g2107851 1506 1858
10 335916.2.dec 912981H1 1615 1747
10 335916.2.dec 2111286H1 1616 1863
10 335916.2.dec 3790008H1 1693 1809
10 335916.2.dec 1214420H1 1700 1936
10 335916.2.dec 3535232H1 1798 2072
10 335916.2.dec 3257037H1 1810 2064
10 335916.2.dec 3210776H1 1820 2024
11 040422.12.dec 3343947H1 1 210
11 040422.12.dec 3343947F6 1 398
11 040422.12.dec 4183830H1 23 207
11 040422.12.dec 4792750H1 25 295
11 040422.12.dec 3159520H1 27 304
11 040422.12.dec 3296383H1 28 279
11 040422.12.dec 5197324H1 29 284
11 040422.12.dec 5197324F6 29 299
11 040422.12.dec g3341989 42 1400
11 040422.12.dec 5978581H1 51 292
11 040422.12.dec 3898429H1 52 272
11 040422.12.dec 5605234H1 53 276
11 040422.12.dec 5302780H1 53 291
11 040422.12.dec 3592605H1 64 359
11 040422.12.dec 3593031H1 64 368
11 040422.12.dec g1727841 70 483
11 040422.12.dec 6552493H1 112 701
11 040422.12.dec 6557908H1 112 591
11 040422.12.dec 4051117H1 224 509
11 040422.12.dec 3293317H1 477 729
11 040422.12.dec 2928983H1 499 798
11 040422.12.dec 3032868H1 515 808
11 040422.12.dec 5570320H1 666 831
11 040422.12.dec 4418383H1 691 897
11 040422.12.dec 4747915H1 718 987
11 040422.12.dec g3096317 720 1177
11 040422.12.dec 3343947T6 725 1356
11 040422.12.dec 4371455H1 742 1023
11 040422.12.dec 1978317T6 801 1359
11 040422.12.dec 1978317R6 811 1196
11 040422.12.dec 1978317H1 811 1110
11 040422.12.dec g3037965 922 1400
11 040422.12.dec 482585T6 937 1380
11 040422.12.dec 5029812H1 937 1150
11 040422.12.dec 658005H1 937 1115
11 040422.12.dec 1610157T1 937 995
11 040422.12.dec 1610157T6 941 1360
11 040422.12.dec g1046767 974 1300
11 040422.12.dec g5113655 983 1401
11 040422.12.dec g2161987 985 1403
TABLE 4
Template ID Component ID Start Stop
059263.6.dec 2151592H1 281 553
059263.6.dec g814279 319 720
059263.6.dec 5373649H1 333 554
059263.6.dec 4708510H1 183 301
059263.6.dec 5925904H1 242 453
059263.6.dec g1173538 242 1318
059263.6.dec g389367 252 666
059263.6.dec g1950140 255 667
059263.6.dec g615808 1853 2109
059263.6.dec 3937956H1 1859 2071
059263.6.dec 2906383H1 1823 2109
059263.6.dec 3421560H1 1827 2083
059263.6.dec g3076896 1841 2109
059263.6.dec 5434173H1 1768 1998
059263.6.dec g2324590 1793 2109
059263.6.dec g317469 1795 2109
059263.6.dec 3843223H1 1805 2084
059263.6.dec g2269635 1756 2110
059263.6.dec g2388765 1761 2109
059263.6.dec g2214360 1767 2109
059263.6.dec 445961T6 1716 2068
059263.6.dec 5079421H1 1738 1847
059263.6.dec 4944041H1 1744 2021
059263.6.dec 1453911F6 1748 2019
059263.6.dec 3846369H1 1673 1909
059263.6.dec g828896 1675 2109
059263.6.dec g3870013 1678 2109
059263.6.dec 3002426T6 1683 2069
059263.6.dec g3307105 1688 2111
059263.6.dec g389366 1701 2109
059263.6.dec g3834908 1690 2108
196774.3.dec 4198864H1 366 645
196774.3.dec 6543639H1 1 536
196774.3.dec 5467282H1 349 610
196774.3.dec 5467289H1 349 605
196774.3.dec 6545364H1 383 952
196774.3.dec 3124504H1 596 882
196774.3.dec 2858708T6 720 1100
196774.3.dec 1656694T6 752 1086
233624.11.dec 2578538F6 1 480
233624.11.dec 2578538H1 1 187
233624.11.dec 4624394H1 26 149
233624.11.dec 2478423H1 54 282
233624.11.dec g1999348 59 188
233624.11.dec 2136789F6 288 634
233624.11.dec 2136789H1 288 511
233624.11.dec 5350727H1 399 563
233624.11.dec 5350889H1 399 524
233624.11.dec 3639605H1 581 871
233624.11.dec 3765020H1 612 906
TABLE 4
SEQ ID NO: Template ID Component ID Start Stop
17 228585.3.dec 1740045R6 1839 2332
17 228585.3.dec 1739439H1 1839 2088
17 228585.3.dec 1740045H1 1839 2053
17 228585.3.dec 5849788H1 1839 1981
17 228585.3.dec 5374048H1 1849 2100
17 228585.3.dec 4723812H1 1855 2126
17 228585.3.dec 2288963H1 1877 2123
17 228585.3.dec 1373365H1 1895 2124
17 228585.3.dec 1595644F6 1900 2332
17 228585.3.dec 4341071H1 1900 2206
17 228585.3.dec 1595644H1 1900 2114
17 228585.3.dec 2245012H1 1903 2111
17 228585.3.dec 532797T6 1927 2525
17 228585.3.dec 1400814H1 1953 2236
17 228585.3.dec 3945768H1 1974 2247
17 228585.3.dec 6307190H1 1986 2542
17 228585.3.dec 4313085H1 1998 2282
17 228585.3.dec 1595644T6 2005 2530
17 228585.3.dec 1712978T6 2037 2530
17 228585.3.dec 1412604T6 2065 2548
17 228585.3.dec 620296T6 2077 2525
17 228585.3.dec 1942348R6 2079 2588
17 228585.3.dec 4312619H1 2078 2365
17 228585.3.dec 2123967T6 2092 2532
17 228585.3.dec 663135T6 2103 2524
17 228585.3.dec g5689560 2116 5914
17 228585.3.dec g4685449 2116 2557
17 228585.3.dec g4984720 2116 2535
17 228585.3.dec 2570503H1 3742 3978
17 228585.3.dec 4722814H1 3806 3916
17 228585.3.dec 1949846H1 2251 2496
17 228585.3.dec 1949815H1 2251 2496
17 228585.3.dec 5904662H1 2255 2551
17 228585.3.dec g819594 2266 2622
17 228585.3.dec 4768762H1 2279 2556
17 228585.3.dec g517574 2128 2616
17 228585.3.dec 6360619H1 2122 2294
17 228585.3.dec 2400848H1 2127 2343
17 228585.3.dec 620296H1 2127 2334
17 228585.3.dec 4311031H1 2127 2317
17 228585.3.dec 5902549H1 2142 2436
17 228585.3.dec 1942348H1 2127 2347
17 228585.3.dec 5659857H1 2131 2307
17 228585.3.dec 5614086H1 2142 2422
17 228585.3.dec 5898857H1 2142 2416
17 228585.3.dec 5898671H1 2142 2410
17 228585.3.dec 1673835T6 2149 2524
17 228585.3.dec 5139434H1 2145 2412
17 228585.3.dec 6131287H1 2180 2445
17 228585.3.dec 5679165H1 3595 3673
ιo ro ιo ιo to t M ro ro ro ro ro ro .ι n. κ, 1 ^n f . 0. frι — ■ ro 00 co vj 45* 45* C CO U U CO CO CO C C U U 4* fe. co to to ιo fe. to ro cΛ
O O 45* 45* 00 O to O θ vι o o o o cn 45. ro ro 45* — ' tO vl OO vl ^ l t Oi O CO IO OO O Oo ro vi o co O Oo Oi o — ' vi co r vi o - ro o -■ ft _ VI VI 45. VI VI OJ OJ — ■ — ' IO OO O fe. O vl fe. 00 ^. — ' Ol O fe 00 o 00 00 vl O ro 4. i- oo K) oo cn o co j- - ' Co o o 0 45. 0 — ' 0J VJ O 3
© lo io io io to io ro io ro io io ro io io io io io Nj ro ro ro io io io ro iO NJ to io io fo ro M ro r^ co co co co o co co ω co co o o co co ω ω oj co o eo ω eo ω ω oo oj o oj co co co o ω
o αo oo oo vi o o cπ oι cn cn cn oι .fe .fe co oo oo ro ro ro — ■ — . —■ —. — ' ^.. ,„ <- _, _, o .fe io io vi o ro vi vi io ro ro — OJ M O J. J. M M O C M U CO I ^ ^ S S M M O O' I- CO . vl vl vl vl O O O O O O O
O — ' 45* 4^ vi to O Oi Oi Oi Oi O O OO Oo vl vl vl — 0 4* C O O O c O ^J > r vi cn en co —. —. —. —. —■ Ω
45* 0 0 0 00 — 00 O Ol
J- -O M O O 00 00 o o (-. o o vi vi o o o o -fe cn co cn -fe ro ro co ro co ro co cn co ro — co co ro
—' ro vj cn o 45. vj CO 00 O U C> - ' Co sl ^ OO U t. 00 - ' Vl O O O O tO O tO O O CO O vl vl — ' O O —' — 45* ro eo Oi O CO —• vl Oi o O Ol vl CO vl 45. o oo o ro — O O O O CO i-, 45. — — IO 00 O
m © lo io ro ro to ro to io io io io to ro ro ro ro M io io io ro ro ro NJ io ro ro ro io ro ro ro ro ro ro ro ro ro ro ro ro ; u u co co co co co co co co u co co u u u co u u co co u u co co co co co co co co co u u co co co co u co u co co co co co u co co co O o
Cύ 00 OJ co co co _, O co co co co
O O o o o o o o o
CO co eo co co o 3
OJ co OJ co co c8o"°s- cn cn en cn en oi
Ω Ω. Ω Ω Ω.
_, _, _. _. _ _ . — — . -4 —. — . —. — —. — . _ _ —. —. —. —. —. —. _ —. —. —. —. _. —. —. —. —. —. —. —. _. —. —. —. —. —. —. _. —. — . _. _. _.. o o o oo c» oo oo cD θ o o o o o c^ oi c^ cn oι oi cn 4^ .fe .fe .fe 4^ co co co co co co co co co ιo to ro ro ιo ro — ■ —■ —■ — - o o o o o ^- o o o o o o oo o oo oo vi vi o en oo co co — • — . o o oo oo oi oi ro o o o o o 4> 45. 4i. — ' Oi ro ro o o o o o — < — - cn n n * io y oo o o oo o cn 4-* c^ ^ M OJ N3 0 0 0 0 0 o en o oo ro — ' oo o oo en — ' lo to o K — ■ — ' θo ^. fe. fe. fe. co oo to o — - o o o o o co =5-
lo io io ro ro ro ro ro ro ro ro ro ro ro — ■ — . —. —. — . to ιo to to to ro ro ro ro 45* ro ro eo ιo oo vi vi o vi OO OO vl vl O vl vl vl O vl OO vl o oo en o o vi cn ^ vι o ^. 45* ro oj co cn ro co ιo vi o — ' O o o vi oo vi vi en co vi o co co o o vi O O Oi OO OO — ' VI O IO O — ' — ' E Ό OI O M I U S OI - ' Oo ro oo vi ro vi o ro vi O i ro ro ro ro ro en oi co oo — ' θ ^ ro f-. ιo to oo oo vi en en vi cn ro oo vi en o ro oo fe vι o vι en o en ro o — ' Cn co eo eo co o vi — . VI O
© lo ro ro ro ro ro ro i io ro io ro ND ro io ro ro ro ro ro ro io io io ro ro iO M ro ro to ro ro r^ u u co (o co u u u u u u u (o u u u u u u u u co u u u (o co ω co co u c u u ω co u (o ω u u u u (o c c u u o
ro ro ro ro ro ro ro co ro ro ro ro — ■ — ■ — . —■ —. — . ro ro ro ro ro — ■ — ' —. —. —. — ' — ■ — > CO OJ OJ CO OO OO CO CO CO CO — • —. —• —. —• —■ — > — > — - vi o o cn oi -fe co o oo oo — ■ — > oo oo oo vι vι vι ro ro ro ro — ' O o c co vi ro — ' oo to cn — • co o cn vi co — ■ — ' o o o o cn -fe o _ o_ o_ o o r .o . — ' O On cOn ovl Oo vOi oCO eCno 4—5. ' O45. Co Cιon CroO rIOo —— ' ' oVo1 oVoI oOo o45o. Co0o o0o0 oVoI oC0 —o ' vi o ro o co ro o o cn co oo vi — ' .fe oo cn o vi 00 C O 45. C 0J — - vl vl O Oo Cn — ' — ' 0 0 45* 45* 10 10 0 — ' — ' 45. 0 0 45* 00 0 0
CO I M M M I ω CO CO U I IO IO NJ IO M M IO I M M I IO ^ W M I I -' M W CO Co ω ω CO CO CO CO CO CO M IO I IO I I IO ^ M Cn ro vi oo oo co vi — ' to o o co o o to ro o o ro fe. o f-, ro ro oo ιo — ' ro o oo ro ro o vi oo o o cn vi en o ro ro — ' to — ' to ro ro c» 45. ^-
O OO O CO IO — ' J-* IO OO O O vl OO vl Oι CO IO vl 4> vl O O vl Co .fe. 45* — ' Vl lO CO O O ia. CO vl — ■ — ' M O - ' — ' Vl vl OO O Oo O
-' t θ c oo c> -' n θ M C)3 fe M Cn ω J- co co ^ θ' θ xι » fe ) α oi cn C> co co M > n n α co o> θi w i M >o ω O
© lo io to io iO M io ro ro ro ro Nj ro ro io w M M iO M io ro ro ro ro ro ro ro ro ro io ro ro ro io i^ .fe 45* 45* 45* 45. .fe .fe 45. 45. 4^ 4-* 45* 45* 45* 45* 45. 45. 45. .fe 45* 45. 45» 45» 4-. 45* 45* 4^ o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o CO Co CO CO Co OJ CO CO CO CO OJ co Oo co co Co 1 o o o o o o o o o o o o o o o o o o o o o co co co co co Co CO CO CO CO CO CO CO CO co CO
00 o o o o o o o o o oo 00 00 00 00 00 00 00 00 00 00 oo 00 00 o 00 o00 o 00 o00 00 00 00 00 OO 00 00 00 00 00 00 00 00 00 00 o o o o o O o o O o o o o o o o φ o o o o o o O O o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o O o o O o o o o
00 00 CO o o o co Oo co CO 00 CO OJ OJ co CO CO co co co 00 00 OJ OJ Co co CO co 00 co Co co co co CO co CO CO CO CO co co co co co co OJ co co co CO CO co 3 o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o OJ CO CO OJ co CO co OJ OJ OJ OJ co co co 00 co Ό to fo ro f io fo io fo fo io io io io fo fo f fo fo fo io o fo f fo fo fo fo fo io fo fo fo fo en en en en Oi Ol Oi Oi Ol en en en Ol en cn en Ω a a a Ω a Ω Ω Ω Ω Ω a a a a a a a a a a a a a Ω CL a a a a a a Ω Ω Ω Ω Ω Ω Ω Ω Ω Ω Ω a a a Ω Ώ a a Ω. φ φ φ Φ φ φ φ Φ φ φ φ φ φ φ φ φ φ φ φ φ Φ φ φ φ φ Φ φ φ φ Φ Φ φ φ φ φ φ φ φ φ φ Φ φ φ Φ φ φ Φ φ Φ φ Φ
O O O O o o O o o o o o o o o o o o o O o o o o o o o o o o o o o o o O O O o O O o o o o o o o o o D
o o o o oi £* 45. co co oj oj oj co ro ro ro oo r-) (-) o o o o o oo oo o o co to ro — • ro ro — . —. —. —. —. —. —. —. —. — . co co — ■ — - co o vi o oi co o en vι o cn en 45. 45. o co oj Qi )v' ^ O θo cn cn cn 45* 45* 45* co o ro o ro — ■ _, c ro vi ' oO Ω? cn o o oo — ' Oo vi o cn — ' — ' vi vi oo o o o ^r co o cn ro ro ro ro cn ro cn ro o co IOo O Ov1i Ov1ι Ov1ι cOn oO oO Oo oO1 oO1 o45* o—o ' oi-. — IO 00 IO IO -' — ' 00 00 C0 tO O 00 fe. t t0 O =5-
TABLE 4
SEQ ID NO: Template ID Component ID Start Stop
24 998036.2.dec 3723695H1 765 949
25 999304.1.dec 2327457T6 1 364
25 999304.1.dec 2327449H1 4 248
25 999304.1.dec 2327457R6 13 402
25 999304.1.dec 6537441H1 147 499
25 999304.1.dec 5108773H1 196 254
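As an illustration of how the Start and Stop columns of Table 4 might be read, the following Python sketch merges the component spans listed for template 999304.1.dec (SEQ ID NO: 25) and reports how much of the template they cover. It assumes that Start and Stop are 1-based, inclusive positions on the template; the function names `merge_spans` and `covered_length` are hypothetical and do not come from the specification.

```python
# Component rows for template 999304.1.dec (SEQ ID NO: 25), copied from Table 4.
# Assumption: Start and Stop are 1-based, inclusive template positions.
COMPONENTS = [
    ("2327457T6", 1, 364),
    ("2327449H1", 4, 248),
    ("2327457R6", 13, 402),
    ("6537441H1", 147, 499),
    ("5108773H1", 196, 254),
]


def merge_spans(spans):
    """Merge overlapping or adjacent inclusive (start, stop) spans."""
    merged = []
    for start, stop in sorted(spans):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or abuts the previous span, so extend that span.
            merged[-1] = (merged[-1][0], max(merged[-1][1], stop))
        else:
            merged.append((start, stop))
    return merged


def covered_length(spans):
    """Count template positions covered by at least one span."""
    return sum(stop - start + 1 for start, stop in merge_spans(spans))


spans = [(start, stop) for _, start, stop in COMPONENTS]
print(merge_spans(spans))
print(covered_length(spans))
```

For the five rows shown, the spans merge into a single interval running from position 1 to position 499.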
TABLE 5
SEQ ID NO: Template ID Tissue Distribution
1 348736.2.oct Cardiovascular System - 32%, Exocrine Glands - 29%, Hemic and Immune System - 29%
2 025119.6.oct Unclassified/Mixed - 37%, Germ Cells - 31%
3 474539.1.oct Embryonic Structures - 44%, Hemic and Immune System - 26%, Male Genitalia - 11%, Digestive System - 11%
4 197170.1.oct Unclassified/Mixed - 48%, Pancreas - 10%, Digestive System - 10%
5 345638.1.oct Liver - 17%
6 408784.1.dec Hemic and Immune System - 57%, Female Genitalia - 21%, Male Genitalia - 14%
7 246526.2.dec Germ Cells - 11%
8 200488.5.dec Endocrine System - 100%
10 335916.2.dec Male Genitalia - 44%, Cardiovascular System - 25%, Exocrine Glands - 25%
11 040422.12.dec Urinary Tract - 100%
12 977651.2.dec widely distributed
14 059263.6.dec Hemic and Immune System - 69%, Respiratory System - 23%
15 196774.3.dec Hemic and Immune System - 100%
16 233624.11.dec Digestive System - 100%
17 228585.3.dec Nervous System - 34%, Germ Cells - 11%
19 082154.5.dec Cardiovascular System - 33%, Nervous System - 25%, Female Genitalia - 25%
20 368396.5.dec Unclassified/Mixed - 28%, Hemic and Immune System - 23%
21 349415.4.dec Skin - 28%, Musculoskeletal System - 25%, Exocrine Glands - 13%, Hemic and Immune System - 13%
23 330933.5.dec Digestive System - 100%
24 998036.2.dec Exocrine Glands - 25%, Hemic and Immune System - 25%, Nervous System - 24%
25 999304.1.dec Digestive System - 50%, Female Genitalia - 30%, Male Genitalia - 20%
TABLE 6
Program | Description | Reference | Parameter Threshold
ABI FACTURA | A program that removes vector sequences and masks ambiguous bases in nucleic acid sequences. | PE Biosystems, Foster City, CA. |
ABI PARACEL FDF | A Fast Data Finder useful in comparing and annotating amino acid or nucleic acid sequences. | PE Biosystems, Foster City, CA; | Mismatch <50%
ABI AutoAssembler | A program that assembles nucleic acid sequences. | PE Biosystems, Foster City, CA. |
BLAST | A Basic Local Alignment Search Tool useful in sequence similarity search for amino acid and nucleic acid sequences. BLAST includes five functions: blastp, blastn, blastx, tblastn, and tblastx. | Altschul, S.F. et al. (1990) J. Mol. Biol. 215:403-410; Altschul, S.F. et al. (1997) Nucleic Acids Res. 25:3389-3402. | ESTs: Probability value = 1.0E-8 or less; Full Length sequences: Probability value = 1.0E-10 or less
FASTA | A Pearson and Lipman algorithm that searches for similarity between a query sequence and a group of sequences of the same type. FASTA comprises at least five functions: fasta, tfasta, fastx, tfastx, and ssearch. | Pearson, W.R. and D.J. Lipman (1988) Proc. Natl. Acad. Sci. USA 85:2444-2448; Pearson, W.R. (1990) Methods Enzymol. 183:63-98; and Smith, T.F. and M.S. Waterman (1981) Adv. Appl. Math. 2:482-489. | ESTs: fasta E value = 1.06E-6; Assembled ESTs: fasta Identity = 95% or greater and Match length = 200 bases or greater; fastx E value = 1.0E-8 or less; Full Length sequences: fastx score = 100 or greater
BLIMPS | A BLocks IMProved Searcher that matches a sequence against those in BLOCKS, PRINTS, DOMO, PRODOM, and PFAM databases to search for gene families, sequence homology, and structural fingerprint regions. | Henikoff, S. and J.G. Henikoff (1991) Nucleic Acids Res. 19:6565-6572; Henikoff, J.G. and S. Henikoff (1996) Methods Enzymol. 266:88-105; and Attwood, T.K. et al. (1997) J. Chem. Inf. Comput. Sci. 37:417-424. | Score = 1000 or greater; Ratio of Score/Strength = 0.75 or larger; and, if applicable, Probability value = 1.0E-3 or less
HMMER | An algorithm for searching a query sequence against hidden Markov model (HMM)-based databases of protein family consensus sequences, such as PFAM. | Krogh, A. et al. (1994) J. Mol. Biol. 235:1501-1531; Sonnhammer, E.L.L. et al. (1988) Nucleic Acids Res. 26:320-322. | Score = 10-50 bits for PFAM hits, depending on individual protein families
ProfileScan | An algorithm that searches for structural and sequence motifs in protein sequences that match sequence patterns defined in Prosite. | Gribskov, M. et al. (1988) CABIOS 4:61-66; Gribskov, M. et al. (1989) Methods Enzymol. 183:146-159; Bairoch, A. et al. (1997) Nucleic Acids Res. 25:217-221. | Normalized quality score ≥ GCG-specified "HIGH" value for that particular Prosite motif. Generally, score = 1.4-2.1.
Phred | A base-calling algorithm that examines automated sequencer traces with high sensitivity and probability. | Ewing, B. et al. (1998) Genome Res. 8:175-185; Ewing, B. and P. Green (1998) Genome Res. 8:186-194. |
Phrap | A Phil's Revised Assembly Program including SWAT and CrossMatch, programs based on efficient implementation of the Smith-Waterman algorithm, useful in searching sequence homology and assembling DNA sequences. | Smith, T.F. and M.S. Waterman (1981) Adv. Appl. Math. 2:482-489; Smith, T.F. and M.S. Waterman (1981) J. Mol. Biol. 147:195-197; and Green, P., University of Washington, Seattle, WA. | Score = 120 or greater; Match length = 56 or greater
Consed | A graphical tool for viewing and editing Phrap assemblies. | Gordon, D. et al. (1998) Genome Res. 8:195-202. |
SPScan | A weight matrix analysis program that scans protein sequences for the presence of secretory signal peptides. | Nielson, H. et al. (1997) Protein Engineering 10:1-6; Claverie, J.M. and S. Audic (1997) CABIOS 12:431-439. | Score = 3.5 or greater
Motifs | A program that searches amino acid sequences for patterns that matched those defined in Prosite. | Bairoch, A. et al. (1997) Nucleic Acids Res. 25:217-221; Wisconsin Package Program Manual, version 9, page M51-59, Genetics Computer Group, Madison, WI. |
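The parameter thresholds in Table 6 can be read as simple pass/fail filters on search hits. The Python sketch below applies the BLAST probability cutoffs and the FASTA cutoffs for assembled ESTs as listed in the table. The function and constant names are hypothetical, and the sketch assumes that the hit statistics (probability value, percent identity, match length) have already been obtained from the respective programs; it does not call BLAST or FASTA.

```python
# Cutoffs transcribed from the BLAST and FASTA rows of Table 6.
# All names below are illustrative and are not taken from the specification.
BLAST_MAX_PROBABILITY = {
    "est": 1.0e-8,           # ESTs: probability value of 1.0E-8 or less
    "full_length": 1.0e-10,  # full-length sequences: 1.0E-10 or less
}
FASTA_MIN_IDENTITY_PERCENT = 95.0  # assembled ESTs: identity of 95% or greater
FASTA_MIN_MATCH_LENGTH = 200       # assembled ESTs: match length of 200 bases or greater


def passes_blast(sequence_kind, probability):
    """Return True if a BLAST hit meets the cutoff for its sequence kind."""
    return probability <= BLAST_MAX_PROBABILITY[sequence_kind]


def passes_fasta_assembled_est(identity_percent, match_length):
    """Return True if a fasta hit on an assembled EST meets both cutoffs."""
    return (
        identity_percent >= FASTA_MIN_IDENTITY_PERCENT
        and match_length >= FASTA_MIN_MATCH_LENGTH
    )


print(passes_blast("est", 1e-9))
print(passes_blast("full_length", 1e-9))
print(passes_fasta_assembled_est(96.5, 250))
```

A hit with a probability value of 1e-9 satisfies the EST cutoff but not the stricter full-length cutoff, which is why the two sequence kinds are kept as separate entries.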

Claims

CLAIMS
What is claimed is:
1. An isolated polynucleotide comprising a polynucleotide sequence selected from the group consisting of: a) a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25, b) a naturally occurring polynucleotide sequence having at least 90% sequence identity to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25, c) a polynucleotide sequence complementary to a), d) a polynucleotide sequence complementary to b), and e) an RNA equivalent of a) through d).
2. An isolated polynucleotide of claim 1, comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-25.
3. An isolated polynucleotide comprising at least 60 contiguous nucleotides of a polynucleotide of claim 1.
4. A composition for the detection of expression of disease detection and treatment molecule polynucleotides comprising at least one of the polynucleotides of claim 1 and a detectable label.
5. A method for detecting a target polynucleotide in a sample, said target polynucleotide having a sequence of a polynucleotide of claim 1, the method comprising: a) amplifying said target polynucleotide or fragment thereof using polymerase chain reaction amplification, and b) detecting the presence or absence of said amplified target polynucleotide or fragment thereof, and, optionally, if present, the amount thereof.
6. A method for detecting a target polynucleotide in a sample, said target polynucleotide comprising a sequence of a polynucleotide of claim 1, the method comprising: a) hybridizing the sample with a probe comprising at least 20 contiguous nucleotides comprising a sequence complementary to said target polynucleotide in the sample, and which probe specifically hybridizes to said target polynucleotide, under conditions whereby a hybridization complex is formed between said probe and said target polynucleotide or fragments thereof, and b) detecting the presence or absence of said hybridization complex, and, optionally, if present, the amount thereof.
7. A method of claim 5, wherein the probe comprises at least 30 contiguous nucleotides.
8. A method of claim 5, wherein the probe comprises at least 60 contiguous nucleotides.
9. A recombinant polynucleotide comprising a promoter sequence operably linked to a polynucleotide of claim 1.
10. A cell transformed with a recombinant polynucleotide of claim 9.
11. A transgenic organism comprising a recombinant polynucleotide of claim 9.
12. A method for producing a disease detection and treatment molecule polypeptide, the method comprising: a) culturing a cell under conditions suitable for expression of the disease detection and treatment molecule polypeptide, wherein said cell is transformed with a recombinant polynucleotide of claim 9, and b) recovering the disease detection and treatment molecule polypeptide so expressed.
13. A purified disease detection and treatment molecule polypeptide encoded by at least one of the polynucleotides of claim 2.
14. An isolated antibody which specifically binds to a disease detection and treatment molecule polypeptide of claim 13.
15. A method of identifying a test compound which specifically binds to the disease detection and treatment molecule polypeptide of claim 13, the method comprising the steps of: a) providing a test compound; b) combining the disease detection and treatment molecule polypeptide with the test compound for a sufficient time and under suitable conditions for binding; and c) detecting binding of the disease detection and treatment molecule polypeptide to the test compound, thereby identifying the test compound which specifically binds the disease detection and treatment molecule polypeptide.
16. A microarray wherein at least one element of the microarray is a polynucleotide of claim 3.
17. A method for generating a transcript image of a sample which contains polynucleotides, the method comprising the steps of: a) labeling the polynucleotides of the sample, b) contacting the elements of the microarray of claim 16 with the labeled polynucleotides of the sample under conditions suitable for the formation of a hybridization complex, and c) quantifying the expression of the polynucleotides in the sample.
18. A method for screening a compound for effectiveness in altering expression of a target polynucleotide, wherein said target polynucleotide comprises a polynucleotide sequence of claim 1, the method comprising: a) exposing a sample comprising the target polynucleotide to a compound, under conditions suitable for the expression of the target polynucleotide, b) detecting altered expression of the target polynucleotide, and c) comparing the expression of the target polynucleotide in the presence of varying amounts of the compound and in the absence of the compound.
19. A method for assessing toxicity of a test compound, said method comprising: a) treating a biological sample containing nucleic acids with the test compound; b) hybridizing the nucleic acids of the treated biological sample with a probe comprising at least 20 contiguous nucleotides of a polynucleotide of claim 1 under conditions whereby a specific hybridization complex is formed between said probe and a target polynucleotide in the biological sample, said target polynucleotide comprising a polynucleotide sequence of a polynucleotide of claim 1 or fragment thereof; c) quantifying the amount of hybridization complex; and d) comparing the amount of hybridization complex in the treated biological sample with the amount of hybridization complex in an untreated biological sample, wherein a difference in the amount of hybridization complex in the treated biological sample is indicative of toxicity of the test compound.
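
The sequence relationships named in claims 1 and 3 (complementary sequence, RNA equivalent, percent sequence identity, and a run of contiguous nucleotides) can be illustrated with a short Python sketch. This is a simplified reading for illustration only: the claims do not prescribe an algorithm, all function names are hypothetical, and percent identity is computed here over two sequences that are assumed to be already aligned to equal length, with "-" marking gaps.

```python
# Illustrative helpers only; names and the identity definition are assumptions.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def rna_equivalent(dna):
    """Return the same sequence with thymine (T) replaced by uracil (U)."""
    return dna.replace("T", "U").replace("t", "u")


def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding the same non-gap residue."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    matches = sum(a == b and a != "-" for a, b in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


def has_contiguous_run(candidate, reference, min_length=60):
    """True if some min_length-long window of candidate occurs in reference."""
    for i in range(len(candidate) - min_length + 1):
        if candidate[i:i + min_length] in reference:
            return True
    return False


print(reverse_complement("ATGC"))
print(rna_equivalent("ATGC"))
print(percent_identity("ACGTACGTAC", "ACGTACGTAT"))
```

Under this simplified definition, two aligned 10-base sequences that differ at one position are 90% identical, which corresponds to the boundary stated in claim 1 b). A real comparison would first align the sequences with a tool such as those listed in Table 6.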
EP00965340A 1999-09-28 2000-09-22 Molecules for disease detection and treatment Withdrawn EP1222258A2 (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US15656599P 1999-09-28 1999-09-28
US156565 1999-09-28
US16819799P 1999-11-30 1999-11-30
US168197 1999-11-30
PCT/US2000/026085 WO2001023538A2 (en) 1999-09-28 2000-09-22 Molecules for disease detection and treatment

Publications (1)

Publication Number Publication Date
EP1222258A2 true EP1222258A2 (en) 2002-07-17

Family

ID=26853314

Family Applications (1)

Application Number Title Priority Date Filing Date
EP00965340A Withdrawn EP1222258A2 (en) 1999-09-28 2000-09-22 Molecules for disease detection and treatment

Country Status (4)

Country Link
EP (1) EP1222258A2 (en)
AU (1) AU7607200A (en)
CA (1) CA2385791A1 (en)
WO (1) WO2001023538A2 (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6458939B1 (en) * 1996-03-15 2002-10-01 Millennium Pharmaceuticals, Inc. Compositions and methods for the diagnosis, prevention, and treatment of neoplastic cell growth and proliferation
EP0948531A1 (en) * 1996-12-11 1999-10-13 Chiron Corporation Secreted human proteins
JP2001526546A (en) * 1997-06-11 2001-12-18 アボツト・ラボラトリーズ Reagents and methods useful for detecting lung disease
WO1999002724A2 (en) * 1997-07-11 1999-01-21 Mount Sinai Hospital Corporation Methods for identifying genes expressed in selected lineages, and a novel genes identified using the methods
WO1999046293A1 (en) * 1998-03-12 1999-09-16 Shanghai Second Medical University A zinc finger protein derived from hematopoietic cells
WO2000000513A1 (en) * 1998-06-29 2000-01-06 Shanghai Second Medical University The human gene krab-np (npaawe05)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0123538A3 *

Also Published As

Publication number Publication date
AU7607200A (en) 2001-04-30
WO2001023538A2 (en) 2001-04-05
WO2001023538A3 (en) 2002-05-02
CA2385791A1 (en) 2001-04-05

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20020417

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

AX Request for extension of the european patent

Free format text: AL;LT;LV;MK;RO;SI

RIN1 Information on inventor provided before grant (corrected)

Inventor name: BANVILLE, STEVEN, C.

Inventor name: SPIRO, PETER, A.

Inventor name: BRATCHER; SHAWN, R.

Inventor name: DUFOUR; GERARD, E.

Inventor name: COHEN, HOWARD, J.

Inventor name: HODGSON, DAVID, M.

Inventor name: LINCOLN, STEPHEN, E.

Inventor name: RUSSO, FRANK, D.

17Q First examination report despatched

Effective date: 20030704

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20041105