
AU749606B2 - Characterization of the yeast transcriptome - Google Patents

Info

Publication number
AU749606B2
Authority
AU
Australia
Prior art keywords
cell
open reading
reading frame
norf
genes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired
Application number
AU59280/98A
Other versions
AU749606C (en
AU5928098A (en
Inventor
Kenneth W. Kinzler
Victor E. Velculescu
Bert Vogelstein
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Johns Hopkins University
Original Assignee
Johns Hopkins University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Johns Hopkins University
Publication of AU5928098A
Application granted
Publication of AU749606B2
Publication of AU749606C
Anticipated expiration
Expired

Classifications

    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/37 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from fungi
    • C07K14/39 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from fungi from yeasts
    • C07K14/395 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from fungi from yeasts from Saccharomyces

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Mycology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biochemistry (AREA)
  • Biophysics (AREA)
  • Gastroenterology & Hepatology (AREA)
  • Genetics & Genomics (AREA)
  • Medicinal Chemistry (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Microbiology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Medicines Containing Material From Animals Or Micro-Organisms (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)

Description

WO 98/32847 PCT/US98/01216
CHARACTERIZATION OF THE YEAST TRANSCRIPTOME
TECHNICAL FIELD OF THE INVENTION
This invention is related to the characterization of the expressed genes of the yeast genome. More particularly, it is related to the identification and use of previously unrecognized genes.
BACKGROUND OF THE INVENTION
It is by now axiomatic that the phenotype of an organism is largely determined by the genes expressed within it. These expressed genes can be represented by a "transcriptome", conveying the identity of each expressed gene and its level of expression for a defined population of cells. Unlike the genome, which is essentially a static entity, the transcriptome can be modulated by both external and internal factors. The transcriptome thereby serves as a dynamic link between an organism's genome and its physical characteristics.
The transcriptome as defined above has not been characterized in any eukaryotic or prokaryotic organism, largely because of technological limitations. However, some general features of gene expression patterns were elucidated two decades ago through RNA-DNA hybridization measurements (Bishop et al., 1974; Hereford and Rosbash, 1977). In many organisms, it was thus found that at least three classes of transcripts could be identified, with either high, medium, or low levels of expression, and the number of transcripts per cell was estimated (Lewin, 1980). These data of course provided little information about the specific genes that were members of each class. Data on the expression levels of individual genes have accumulated as new genes were discovered. However, in only a few instances have the absolute levels of expression of particular genes been measured and compared to other genes in the same cell type.
Description of any cell's transcriptome would therefore provide new information useful for understanding numerous aspects of cell biology and biochemistry.
SUMMARY OF THE INVENTION
It is an object of the present invention to provide genes which are involved in cell cycle progression.
It is another object of the present invention to provide methods of using the genes to affect the cell cycle.
It is an object of the present invention to provide methods for screening candidate antifungal drugs.
Another object of the invention is to provide a method for obtaining human homologs of the yeast genes which are involved in cell cycle progression.
Another object of the invention is to provide probes for ascertaining phase in the cell cycle of a cell.
These and other objects of the invention are achieved by providing the art with one or more of the embodiments described below. According to one embodiment of the invention an isolated DNA molecule is provided. It comprises a yeast gene which is involved in cell cycle progression selected from the group of NORF genes identified in Table 3 or 4.
According to another embodiment of the invention a method of using yeast genes is provided. The method is for affecting the cell cycle of a cell.
The method comprises the step of: administering to a cell an isolated DNA molecule comprising a yeast gene which is involved in cell cycle progression selected from the differentially expressed genes identified in Tables 1, 2, 3 and 4.
In yet another embodiment of the invention a method for screening candidate antifungal drugs is provided. The method comprises the steps of: contacting a test substance with a yeast cell; monitoring expression of a yeast gene which is involved in cell cycle progression selected from the group of yeast genes identified in Tables 1, 2, 3 and 4, wherein a test substance which modifies the expression of the yeast gene is a candidate antifungal drug.
In still another embodiment of the invention a method for identifying human genes which are involved in cell cycle progression is provided. The method comprises the step of: hybridizing a probe comprising at least 14 contiguous nucleotides of a yeast gene which is differentially expressed between at least two phases selected from the group consisting of log phase, S phase, and G2/M phase, wherein the yeast gene is identified in Table 1, 2, 3, or 4.
Also provided by the present invention are isolated DNA molecules, which comprise probes for ascertaining phase in the cell cycle of a cell, wherein the probe comprises at least 14 contiguous nucleotides of a NORF gene as identified in Table 3 or 4.
These and other embodiments of the invention which will be apparent to those of skill in the art upon reading the detailed disclosure provided below, make available to the art hitherto unrecognized genes, and information about the expression of genes globally at the organismal level. We provide the first description of a transcriptome, determined in S. cerevisiae cells. This organism was chosen because it is widely used to clarify the biochemical and physiologic parameters underlying eukaryotic cellular functions and because it is the only eukaryote in which the entire genome has been defined at the nucleotide level (Goffeau, et al., 1996).
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1. Schematic of SAGE Method and Genome Analysis.
In applying SAGE to the analysis of yeast gene expression patterns, the 3' most NlaIII site was used to define a unique position in each transcript and to provide a site for ligation of a linker with a BsmFI site. The type IIS enzyme BsmFI, which cleaves a defined distance from its non-palindromic recognition site, was then used to generate a 15 bp SAGE tag (designated by the black arrows), which includes the NlaIII site. Automated sequencing of concatenated SAGE tags allowed the routine identification of about a thousand tags per sequencing gel. Once sequenced, the abundance of each SAGE tag was calculated, and each tag was used to search the entire yeast genome to identify its corresponding gene. The lower panel shows a small region of Chromosome 15. Gray arrows indicate all potential SAGE tags (NlaIII sites) and black arrows indicate 3' most SAGE tags. The total number of tags observed for each potential tag is indicated above (+ strand) or below (- strand) the tag. As expected, the observed SAGE tags were associated with the 3' end of expressed genes.
Figure 2. Sampling of Yeast Gene Expression.
Analysis of increasing amounts of ascertained tags reveals a plateau in the number of unique expressed genes. Triangles represent genes with known functions, squares represent genes predicted on the basis of sequence information, and circles represent total genes.
Figure 3. Virtual Rot.
(A) Abundance Classes in the Yeast Transcriptome. The transcript abundance is plotted in reverse order on the abscissa, whereas the fraction of total transcripts with at least that abundance is plotted on the ordinate. The dotted lines identify the three components of the curve, 1, 2, and 3. This is analogous to a Rot curve derived from reassociation kinetics where the product of initial RNA concentration and time is plotted on the abscissa, and the percent of labeled cDNA that hybridizes to excess mRNA is plotted on the ordinate.
(B) Comparison of Virtual Rot and Rot Components. Transitions and data from virtual Rot components were calculated from the data in Figure 3A, while data for Rot components were obtained from Hereford and Rosbash, 1977.
Figure 4. Chromosomal Expression Map for S. cerevisiae. Individual yeast genes were positioned on each chromosome according to their open reading frame (ORF) start coordinates. Abundance levels of tags corresponding to each gene are displayed on the vertical axis, with transcription from the (+) strand indicated above the abscissa and that from the (-) strand indicated below.
Yellow bands at ends of the expanded chromosome represent telomeric regions that are undertranscribed (see text for details).
Figure 5. Northern Blot Analysis of Representative Genes. TDH2/3, TEF1/2, and NORF1 are expressed relatively equally in all three states (lane 1, G2/M arrested; lane 2, S phase arrested; lane 3, log phase), while RNR4, RNR2, and NORF5 are highly expressed in S phase arrested cells. The expression level observed by SAGE (number of tags) is noted below each lane and was highly correlated with quantitation of the Northern blot by PhosphorImager analysis (r = 0.97).
Table Legends
Table 1. Highly Expressed Genes
Tag represents the 10 bp SAGE tag adjacent to the NlaIII site; Gene represents the gene or genes corresponding to a particular tag (multiple genes that match unique tags are from related families, with an average identity of [...]); Locus and Description denote the locus name and functional description of each ORF, respectively; Copies/cell represents the abundance of each transcript in the SAGE library, assuming 15,000 total transcripts per cell and 60,633 ascertained transcripts.
Table 2. Expression of Putative Coding Sequences
Table columns are the same as for Table 1.
Table 3. Expression of NORF genes
SAGE Tag, Locus, and Copies/cell are the same as for Table 1; Chr and Tag Pos denote the chromosome and position of each tag; ORF Size denotes the size of the ORF corresponding to the indicated tag. In each case, the tag was located within or less than 250 bp 3' of the NORF.
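By way of illustration only, the Copies/cell column described in the Table 1 legend can be read as a proportional scaling of tag counts. The following Python sketch assumes that reading; the function name and parameter names are hypothetical, and the tabulated values may have been rounded or derived somewhat differently.

```python
# Illustrative sketch: scale an observed tag count to an estimated
# number of transcript copies per cell, using the two figures quoted
# in the Table 1 legend. The function name is hypothetical.

TRANSCRIPTS_PER_CELL = 15_000   # estimate attributed to Hereford and Rosbash, 1977
TOTAL_TAGS = 60_633             # total ascertained transcripts (all three libraries)


def copies_per_cell(tag_count, total_tags=TOTAL_TAGS,
                    transcripts_per_cell=TRANSCRIPTS_PER_CELL):
    """Estimate copies per cell as tag_count / total_tags * transcripts_per_cell."""
    return tag_count * transcripts_per_cell / total_tags


if __name__ == "__main__":
    for count in (1, 31, 255):
        print(count, round(copies_per_cell(count), 2))
```

Under this assumption a tag observed 31 times corresponds to between 7 and 8 copies per cell.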
DETAILED DESCRIPTION
It is a discovery of the present invention that certain hitherto unknown genes (the NORFs) exist and are expressed in yeast. These genes, as well as other previously identified and previously postulated genes, can be used to study, monitor, and affect the phase of the cell cycle. The present invention provides information on which genes are differentially expressed during the cell cycle.
Differentially expressed genes can be used as markers of phases of the cell cycle. They can also be used to effect a change in the phase of the cell cycle.
In addition, they can be used to screen for drugs which affect the cell cycle, by affecting expression of the genes. Human homologs of these eukaryotic genes are also presumed to exist, and can be identified using the yeast genes as probes or primers to identify the human homologs.
New genes termed NORFs (not previously assigned open reading frames) have been found. They are uniquely identified by their SAGE tags.
In addition their entire nucleotide sequence is known and publicly available.
In general, these were not previously identified as genes due to their small size. However, they have now been found to be expressed.
Differentially expressed yeast genes are those whose expression varies by a statistically significant difference (to greater than 95% confidence level) within different growth phases, particularly log phase, S phase, and G2/M.
Preferably the difference is greater than 10%, 25%, 50%, or 100%. The genes which have been found to have such differential expression characteristics are: NORF Nos. 1, 2, 4, 5, 6, 17, 25, 27, TEF1/TEF2, ENO2, ADH1, ADH2, PGK1, CUP1A/CUP1B, PYK1, YKL056C, YMR116C, YEL033W, YOR182C, YCR013C, ribonucleotide reductase 2 and 4, and YJR085C.
The DNA molecules according to the invention can be genomic or cDNA. Preferably they are isolated free of other cellular components such as membrane components, proteins, and lipids. They can be made by a cell and isolated, or synthesized using PCR or an automatic synthesizer. Any technique for obtaining a DNA of known sequence may be used. Methods for purifying and isolating DNA are routine and are known in the art.
To administer yeast genes to cells, any DNA delivery techniques known in the art may be used, without limitation. These include liposomes, transfection, transduction, transformation, viral infection, and electroporation.
Vectors for particular purposes and characteristics can be selected by the skilled artisan for their known properties. Cells which can be used as gene recipients are yeast and other fungi, mammalian cells, including humans, and bacterial cells.
Antifungal drugs can be identified using yeast cells as described herein.
Expression of a differentially expressed gene can be monitored by any means known in the art. When a test substance affects the expression of such a differentially expressed gene, it is a candidate drug for affecting the growth properties of fungi, and may be useful as an antifungal agent.
Because differentially expressed genes are likely to be involved in cell cycle progression, it is likely that these genes are conserved among species.
The differentially expressed genes identified by the present invention can be used to identify homologs in humans and other mammals. Means for identifying homologous genes among different species are well known in the art. Briefly, stringency of hybridization can be reduced so that imperfectly matching sequences hybridize. This can be in the context of inter alia Southern blots, Northern blots, colony hybridization or PCR. Any hybridization technique which is known in the art can be used.
Probes according to the present invention are isolated DNA molecules which have at least 10, and preferably at least 12, 14, 16, 18, 20, or contiguous nucleotides of a particular NORF gene or other differentially expressed gene. The probes may or may not be labeled. They may be used as primers for PCR or for Southern or Northern blots. Preferably the probes are anchored to a solid support. More preferably they are present on an array so that multiple probes can simultaneously hybridize to a single biological sample. The probes can be spotted onto the array or synthesized in situ on the array. See Lockhart et. al., Nature Biotechnology, Vol. 14, December 1996, "Expression monitoring by hybridization to high-density oligonucleotide arrays." A single array can contain more than 100, 500 or even 1,000 different probes in discrete locations.
The above disclosure generally describes the present invention. A more complete understanding can be obtained by reference to the following specific examples which are provided herein for purposes of illustration only, and are not intended to limit the scope of the invention.
EXAMPLE
Summary
We have analyzed the set of genes expressed from the yeast genome, herein called the transcriptome, using serial analysis of gene expression (SAGE).
Analysis of 60,633 transcripts revealed 4,665 genes, with expression levels ranging from 0.3 to over 200 transcripts per cell. Of these genes, 1,981 had known functions, while 2,684 were previously uncharacterized. Integration of positional information with gene expression data allowed the generation of chromosomal expression maps, identifying physical regions of transcriptional activity, and identified genes that had not been predicted by sequence information alone. These studies provide insight into global patterns of gene expression in yeast and demonstrate the feasibility of genome-wide expression studies in eukaryotes.
Results
Characteristics and Rationale of SAGE Approach
Several methods have recently been described for the high throughput evaluation of gene expression (Nguyen et al., 1995; Schena et al., 1995; Velculescu et al., 1995). We used SAGE (Serial Analysis of Gene Expression) because it can provide quantitative gene expression data without the prerequisite of a hybridization probe for each transcript. The SAGE technology is based on two basic principles (Figure 1). First, a short sequence tag (9-11 bp) contains sufficient information to uniquely identify a transcript, provided that it is derived from a defined location within that transcript. Second, many transcript tags can be concatenated into a single molecule and then sequenced, revealing the identity of multiple tags simultaneously. The expression pattern of any population of transcripts can be quantitatively evaluated by determining the abundance of individual tags and identifying the gene corresponding to each tag.
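The first principle can be illustrated with a minimal Python sketch, offered as an assumption-laden example rather than as the procedure actually used. It takes the tag to be the 10 bases immediately 3' of the 3'-most NlaIII site (CATG) in a transcript sequence, and tallies tags with a counter; the function names and the example sequence are hypothetical.

```python
from collections import Counter

NLAIII_SITE = "CATG"   # recognition sequence of NlaIII
TAG_LENGTH = 10        # bases retained 3' of the site (assumed for this sketch)


def extract_sage_tag(transcript, tag_length=TAG_LENGTH):
    """Return the bases following the 3'-most CATG, or None if unavailable."""
    seq = transcript.upper()
    pos = seq.rfind(NLAIII_SITE)          # rfind locates the last (3'-most) site
    if pos == -1:
        return None                       # transcript lacks an NlaIII site
    start = pos + len(NLAIII_SITE)
    tag = seq[start:start + tag_length]
    # Discard sites too close to the 3' end to yield a full-length tag.
    return tag if len(tag) == tag_length else None


def count_tags(transcripts):
    """Tally tag abundance over a collection of transcript sequences."""
    tags = (extract_sage_tag(t) for t in transcripts)
    return Counter(t for t in tags if t is not None)


if __name__ == "__main__":
    example = "AAACATGTTTTTCATGACGTACGTACGG"   # hypothetical sequence
    print(extract_sage_tag(example))
    print(count_tags([example, example, "AAAA"]))
```

Transcripts without a CATG return no tag, which mirrors the limitation that transcripts lacking an NlaIII site are not detected.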
Genome-wide expression
In order to maximize representation of genes involved in normal growth and cell-cycle progression, SAGE libraries were generated from yeast cells in three states: log phase, S phase arrested and G2/M phase arrested. In total, SAGE tags corresponding to 60,633 total transcripts were identified (including 20,184 from log phase, 20,034 from S phase arrested, and 20,415 from G2/M phase arrested cells). Of these tags, 56,291 tags precisely matched the yeast genome, 88 tags matched the mitochondrial genome, and 91 tags matched the 2 micron plasmid.
The number of SAGE tags required to define a yeast transcriptome depends on the confidence level desired for detecting low abundance mRNA molecules. Assuming the previously derived estimate of 15,000 mRNA molecules per cell (Hereford and Rosbash, 1977), 20,000 tags would represent a 1.3 fold coverage even for mRNA molecules present at a single copy per cell, and would provide a 72% probability of detecting such transcripts (as determined by Monte Carlo simulations). Analysis of 20,184 tags from log phase cells identified 3,298 unique genes. As an independent confirmation of mRNA copy number per cell, we compared the expression level of SUP44/RPS4, one of the few genes whose absolute mRNA levels have been reliably determined by quantitative hybridization experiments (Iyer and Struhl, 1996), with expression levels determined by SAGE.
SUP44/RPS4 was measured by hybridization at 75 ± 10 copies/cell (Iyer and Struhl, 1996), in good accord with the SAGE data of 63 copies/cell, suggesting that the estimate of 15,000 mRNA molecules per cell was reasonably accurate. Analysis of SAGE tags from S phase arrested and G2/M phase arrested cells revealed similar expression levels for this gene (range 52 to 55 copies/cell), as well as for the vast majority of expressed genes. As less than 1% of the genes were expressed at dramatically different levels among these three states (see below), SAGE tags obtained from all libraries were combined and used to analyze global patterns of gene expression.
Analysis of ascertained tags at increasing increments revealed that the number of unique transcripts plateaued at ~60,000 tags (Figure 2). This suggested that generation of further SAGE tags would yield few additional genes, consistent with the fact that sixty thousand transcripts represented a four-fold redundancy for genes expressed as low as 1 transcript per cell.
Likewise, Monte Carlo simulations indicated that analysis of 60,000 tags would identify at least one tag for a given transcript 97% of the time if its expression level was one copy per cell.
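The coverage figures above can be approximated with a simple closed-form calculation. The Python sketch below is an illustration under stated assumptions, not the Monte Carlo procedure referred to in the text: it treats each tag as an independent draw from a pool of 15,000 transcripts per cell, so its results are close to, but not identical with, the simulated values of 72% and 97%. Function names are hypothetical.

```python
# Illustrative sketch: fold coverage and probability of observing at
# least one tag for a transcript of given abundance, assuming every
# sequenced tag is an independent draw from the transcript pool.

TRANSCRIPTS_PER_CELL = 15_000


def fold_coverage(n_tags, copies=1.0, pool=TRANSCRIPTS_PER_CELL):
    """Expected number of tags observed for a transcript at `copies` per cell."""
    return n_tags * copies / pool


def detection_probability(n_tags, copies=1.0, pool=TRANSCRIPTS_PER_CELL):
    """P(at least one tag) = 1 - (1 - copies/pool) ** n_tags."""
    p_single = copies / pool
    return 1.0 - (1.0 - p_single) ** n_tags


if __name__ == "__main__":
    for n in (20_000, 60_000):
        print(n, round(fold_coverage(n), 2), round(detection_probability(n), 3))
```

For 20,000 and 60,000 tags the expected coverage is about 1.3 fold and 4 fold respectively, matching the redundancy figures quoted in the text.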
The 56,291 tags that precisely matched the yeast genome represented 4,665 different genes. This number is in agreement with the estimate of 3,000 to 4,000 expressed genes obtained by RNA-DNA reassociation kinetics (Hereford and Rosbash, 1977). These expressed genes included 85% of the genes with characterized functions (1,981 of 2,340), and 76% of the total genes predicted from analysis of the yeast genome (4,665 of 6,121). These numbers are consistent with a relatively complete sampling of the yeast transcriptome given the limited number of physiological states examined and the large number of genes predicted solely on the basis of genomic sequence analysis.
The transcript expression per gene was observed to vary from 0.3 to over 200 copies per cell. Analysis of the distribution of gene expression levels revealed several abundance classes that were similar to those observed in previous studies using reassociation kinetics. A "virtual Rot" of the genes observed by SAGE (Figure 3A) identified three main components of the transcriptome with abundances ranging over three orders of magnitude. A Rot curve derived from RNA-cDNA reassociation kinetics also contained three main components distributed over a similar range of abundances (Hereford and Rosbash, 1977). Although the kinetics of reassociation of a particular class of RNA and cDNA may be affected by numerous experimental variables, there were striking similarities between Rot and virtual Rot analyses (Figure 3B). Because Rot analysis may not detect all transcripts of low abundance (Lewin, 1980), it is not surprising that SAGE revealed both a larger total number of expressed genes and a higher fraction of the transcriptome belonging to the low abundance transcript class.
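A "virtual Rot" curve of the kind described can be sketched as a cumulative distribution over abundance. The Python example below is illustrative only; it assumes the curve is built by sorting genes from highest to lowest abundance and accumulating the fraction of all transcripts contributed by genes with at least that abundance. The function name and the gene data are hypothetical.

```python
# Illustrative sketch of a "virtual Rot" curve: for each abundance
# level (highest first), the fraction of total transcripts contributed
# by genes expressed at that level or higher.

def virtual_rot(copies_by_gene):
    """Return a list of (abundance, cumulative_fraction) points."""
    total = sum(copies_by_gene.values())
    points = []
    running = 0
    for abundance in sorted(set(copies_by_gene.values()), reverse=True):
        # Add the transcripts contributed by all genes at this abundance.
        running += sum(c for c in copies_by_gene.values() if c == abundance)
        points.append((abundance, running / total))
    return points


if __name__ == "__main__":
    # Hypothetical abundances (copies per cell), not data from the tables.
    example = {"geneA": 100, "geneB": 10, "geneC": 10, "geneD": 1}
    for abundance, fraction in virtual_rot(example):
        print(abundance, round(fraction, 3))
```

Plotting abundance in reverse order on the abscissa against the cumulative fraction on the ordinate gives a curve in which changes of slope mark the transitions between abundance classes.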
Integration of Expression Information with the Genomic Map
The SAGE expression data could be integrated with existing positional information to generate chromosomal expression maps (Figure 4). These maps were generated using the sequence of the yeast genome and the position coordinates of ORFs obtained from the Stanford Yeast Genome Database.
Although there were a few genes that were noted to be physically proximal and have similarly high levels of expression, there did not appear to be any clusters of particularly high or low expression on any chromosome. Genes like histones H3 and H4, which are known to have coregulated divergent promoters and are immediately adjacent on chromosome 14 (Smith and Murray, 1983), had very similar expression levels (5 and 6 copies per cell, respectively). The distribution of transcripts among the chromosomes suggested that overall transcription was evenly dispersed, with total transcript levels being roughly linearly related to chromosome size (r² = 0.85, data not shown). However, regions within 10 kb of telomeres appeared to be uniformly undertranscribed, containing on average 3.2 tags per gene as compared with 12.4 tags per gene for non-telomeric regions (Figure 4). This is consistent with the previously described observations of "telomeric silencing" in yeast (Gottschling et al., 1990). Recent studies have reported telomeric position effects as far as 4 kb from telomere ends (Renauld et al., 1993).
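The comparison of telomeric and non-telomeric regions can be illustrated with a short Python sketch. It assumes each gene is described by a chromosome name, an ORF start coordinate, and a tag count, and that a gene counts as telomeric when its start lies within 10 kb of either chromosome end. The function name, record layout, and example values are hypothetical.

```python
# Illustrative sketch: mean tags per gene for genes within `window`
# bases of a chromosome end versus all other genes.

def mean_tags_by_region(genes, chrom_lengths, window=10_000):
    """genes: iterable of (chromosome, orf_start, tag_count) records.

    Returns (mean_telomeric, mean_internal); a mean is NaN if the
    corresponding group is empty.
    """
    telomeric, internal = [], []
    for chrom, start, tags in genes:
        length = chrom_lengths[chrom]
        near_end = start <= window or start > length - window
        (telomeric if near_end else internal).append(tags)

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(telomeric), mean(internal)


if __name__ == "__main__":
    # Hypothetical chromosome and genes, not data from Figure 4.
    lengths = {"chrX": 100_000}
    records = [("chrX", 5_000, 2), ("chrX", 50_000, 10),
               ("chrX", 95_000, 4), ("chrX", 60_000, 14)]
    print(mean_tags_by_region(records, lengths))
```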
Gene Expression Patterns
Table 1 lists the 30 most highly expressed genes, all of which are expressed at greater than 60 mRNA copies per cell. As expected, these genes mostly correspond to well characterized enzymes involved in energy metabolism and protein synthesis and were expressed at similar levels in all three growth states (Examples in Figure 5). Some of these genes, including ENO2 (McAlister and Holland, 1982), PDC1 (Schmitt et al., 1983), PGK1 (Chambers et al., 1989), PYK1 (Nishizawa et al., 1989), and ADH1 (Denis et al., 1983), are known to be dramatically induced in the glucose-rich growth conditions used in this study. In contrast, glucose repressible genes such as the GAL1/GAL7/GAL10 cluster (St John and Davis, 1979) and GAL3 (Bajwa et al., 1988) were observed to be expressed at very low levels (0.3 or fewer copies per cell). As expected for the yeast strain used in this study, mating type a specific genes, such as the a factor genes (MFA1, MFA2) (Michaelis and Herskowitz, 1988) and alpha factor receptor (STE2) (Burkholder and Hartwell, 1985), were all observed to be expressed at significant levels (range 2 to 10 copies per cell), while mating type alpha specific genes (MFα1, MFα2, STE3) (Hagen et al., 1986; Kurjan and Herskowitz, 1982; Singh et al., 1983) were observed to be expressed at very low levels (... copies/cell).
Three of the highly expressed genes in Table 1 had not been previously characterized. One contained an ORF with predicted ribosomal function, previously identified only by genomic sequence analysis. Analyses of all SAGE data suggested that there were 2,684 such genes corresponding to uncharacterized ORFs which were transcribed at detectable levels. The most abundant of these transcripts were observed more than 30 times, corresponding to at least 8 transcripts per cell (Table 2). The other two highly expressed uncharacterized genes corresponded to ORFs not predicted by analysis of the yeast genome sequence (NORF = Nonannotated ORF).
Analyses of SAGE data suggested that there were approximately 160 NORF genes transcribed at detectable levels. The 30 most abundant of these transcripts were observed at least 9 times (Table 3 and examples in Figure 5). Interestingly, one of the NORF genes (NORF5) was only expressed in S phase arrested cells and corresponded to the transcript whose abundance varied the most in the three states analyzed (49 fold, Figure 5). Comparison of S phase arrested cells to the other states also identified greater than 9 fold elevation of the RNR2 and RNR4 transcripts (Figure 5). Induction of these ribonucleotide reductase genes is likely to be due to the hydroxyurea treatment used to arrest cells in S phase (Elledge and Davis, 1989).
Likewise, comparison of G2/M arrested cells identified elevation of RBL2 and dynein light chain, both microtubule associated proteins (Archer et al., 1995; Dick et al., 1996). As with the RNR inductions, these elevated levels seem likely to be related to the nocodazole treatment used to arrest cells in the G2/M phase. While there were many relatively small differences between the states (for example, NORF1, Figure 5), overall comparison of the three states revealed surprisingly few dramatic differences; there were only 29 transcripts whose abundance varied more than 10 fold among the three different states analyzed.
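A comparison of this kind can be sketched as a fold-change filter over per-state tag counts. The Python example below is an illustration only. It assumes the counts for the three states have already been normalized to libraries of equal size, and it replaces zero counts with a floor of one tag so that the ratio is defined; both choices, along with the function names and example counts, are assumptions of the sketch and not taken from the disclosure.

```python
# Illustrative sketch: find transcripts whose tag counts vary by more
# than a threshold fold among the states compared.

def max_fold_change(counts, floor=1.0):
    """Ratio of the largest to the smallest count, with zeros raised to `floor`."""
    adjusted = [max(c, floor) for c in counts]
    return max(adjusted) / min(adjusted)


def varying_transcripts(table, threshold=10.0):
    """Return tags whose maximum fold change exceeds `threshold`.

    table maps a tag to its counts in (log phase, S phase, G2/M).
    """
    return [tag for tag, counts in table.items()
            if max_fold_change(counts) > threshold]


if __name__ == "__main__":
    # Hypothetical counts, not values from Tables 1 to 4.
    example = {"tag1": (20, 22, 19), "tag2": (0, 49, 0)}
    print(varying_transcripts(example))
```

A statistical test on the underlying counts would be needed to establish significance at the 95% confidence level mentioned earlier; this sketch only ranks by fold change.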
Discussion
Analysis of a yeast transcriptome affords a unique view of the RNA components defining cellular life. We observed gene expression levels to vary over three orders of magnitude, with the transcripts involved in energy metabolism and protein synthesis the most highly expressed. Key transcripts, such as those encoding enzymes required for DNA replication (POL1 and POL3), kinetochore proteins (NDC10 and SKP1), and many other interesting proteins, were present at 1 or fewer copies per cell on average. These abundances are consistent with previous qualitative data from reassociation kinetics which suggested that the largest number of expressed genes was present at 1 or 2 copies per cell. These observations indicate that low transcript copy numbers are sufficient for gene expression in yeast, and suggest that yeast possess a mechanism for rigid control of RNA abundance.
The synthesis of chromosomal expression maps presents a cataloging of the expression level of genes, organized by their genomic positions. It is not surprising that gene expression is well balanced throughout the 16 chromosomes of S. cerevisiae. As most genes have independent regulatory elements, it would have been surprising to find a large number of physically adjacent genes that had similar high levels of expression. Of the few genes that were known to have coregulated divergent promoters, like the H3/H4 pair, SAGE data confirmed concordant levels of expression. For areas like telomere ends that are known to be transcriptionally suppressed, SAGE data corroborated low levels of expression. Other expected expression patterns such as high levels of glucose induced glycolytic enzymes, low levels of glucose repressed GAL genes, expression of mating type a specific genes, and low levels of expression of mating type alpha genes, were observed. Finally, identification of tags corresponding to NORF genes suggests that there is a significant number of small proteins encoded by the yeast genome that were undetected by the criteria used for systematic sequence analysis. The yeast genome sequence has been annotated for all ORFs larger than 300 bp (encoding proteins of 100 amino acids or greater). Genes encoding proteins below this cutoff are therefore commonly unannotated. This class of genes might also be underrepresented in mutational collections because of the small target size for mutagenesis, and given their small size, may encode proteins with novel functions. The systematic knockout of these NORF genes will therefore be of great interest.
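To make the annotation cutoff concrete, the following Python sketch lists open reading frames that fall below 100 codons and would therefore escape a 300 bp annotation threshold. It is a simplified illustration: it scans only the forward strand, treats the first ATG after a stop as the start, and uses a hypothetical function name and example sequence; it is not the procedure by which the NORF genes were identified.

```python
# Illustrative sketch: find short ORFs (ATG ... stop) on the forward
# strand that encode fewer than 100 codons.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_small_orfs(seq, min_codons=1, max_codons=99):
    """Return (start, end, n_codons) for each ORF within the size limits.

    start is the index of the A in ATG; end is one past the stop codon;
    n_codons counts coding codons and excludes the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a new reading frame
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if min_codons <= n_codons <= max_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None                   # close the frame at the stop
    return orfs


if __name__ == "__main__":
    print(find_small_orfs("CCATGAAATTTTAGCC"))   # hypothetical sequence
```

In practice such a scan would be combined with the tag positions, since each NORF tag was located within, or less than 250 bp 3' of, the corresponding ORF.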
Comparison of gene expression patterns from altered physiologic states can provide insight into genes that are important in a variety of processes.
Comparison of transcriptomes from a variety of physiologic states should provide a minimum set of genes whose expression is required for normal vegetative growth, and another set composed of genes that will be expressed only in response to specific environmental stimuli, or during specialized processes. For example, recent work has defined a minimal set of 250 genes required for prokaryotic cellular life (Mushegian and Koonin, 1996).
Examination of the yeast genome readily identified homologous genes for 196 of these, over 90% of which were observed to be expressed in the SAGE analysis. Detailed analyses of yeast transcriptomes, as well as transcriptomes from other organisms, should ultimately allow the generation of a minimal set of genes required for eukaryotic life.
Like other genome-wide analyses, SAGE analysis of yeast transcriptomes has several potential limitations. First, a small number of transcripts would be expected to lack an NlaIII site and therefore would not be detected by our analysis. Second, our analysis was limited to transcripts found at least as frequently as 0.3 copies per cell. Transcripts expressed in only a minute fraction of the cell cycle, or transcripts expressed in only a fraction of the cell population, would not be reliably detected by our analysis.
Finally, mRNA sequence data are practically unavailable for yeast, and consequently, some SAGE tags cannot be unambiguously matched to corresponding genes. Tags which were derived from overlapping genes, or from genes which have unusually long 3' untranslated regions, may be misassigned.
Increased availability of 3' UTR sequences in yeast mRNA molecules should help to resolve the ambiguities.
Despite these potential limitations, it is clear that the analyses described here furnish both global and local pictures of gene expression, precisely defined at the nucleotide level. These data, like the sequence of the yeast genome itself, provide simple, basic information integral to the interpretation of many experiments in the future. The availability of mRNA sequence information from EST sequencing, as well as from various genome projects, will soon allow definition of transcriptomes from a variety of organisms, including human. The data recorded here suggest that a reasonably complete picture of a human cell transcriptome will require only about 10-20 fold more tags than evaluated here, a number well within the practical realm achievable with a small number of automated sequencers. Global expression patterns in higher eukaryotes are expected, in general, to be similar to those reported here for S. cerevisiae. However, the analysis of the transcriptome in different cells and from different individuals should yield a wealth of information regarding gene function in normal, developmental, and disease states.
Experimental Procedures
Yeast cell culture
The source of transcripts for all experiments was S. cerevisiae strain YPH499 (MATa ura3-52 lys2-801 ade2-101 leu2-Δ1 his3-Δ200 trp1-Δ63) (Sikorski and Hieter, 1989). Logarithmically growing cells were obtained by growing yeast cells to early log phase (3 x 10^6 cells/ml) in YPD (Rose et al., 1990) rich medium (YPD supplemented with 6 mM uracil, 4.8 mM adenine and 24 mM tryptophan) at 30°C. For arrest in the G1/S phase of the cell cycle, hydroxyurea (0.1 M) was added to early log phase cells, and the culture was incubated for an additional 3.5 hours at 30°C. For arrest in the G2/M phase of the cell cycle, nocodazole (15 µg/ml) was added to early log phase cells and the culture was incubated for an additional 100 minutes at 30°C. Harvested cells were washed once with water prior to freezing at -70°C. The growth states of the harvested cells were confirmed by microscopic and flow cytometric analyses (Basrai et al., 1996).
RNA isolation and Northern Blot Analysis
Total yeast RNA was prepared using the hot phenol method as described (Leeds et al., 1991). mRNA was obtained using the MessageMaker Kit (Gibco/BRL) following the manufacturer's protocol. Northern blot analysis was performed as described (El-Deiry et al., 1993), using probes PCR amplified from yeast genomic DNA.
SAGE protocol
The SAGE method was performed as previously described (Velculescu et al., 1995), with exceptions noted below. PolyA RNA was converted to double-stranded cDNA with a BRL synthesis kit using the manufacturer's protocol except for the inclusion of primer biotin-5'-T-3'. The cDNA was cleaved with NlaIII (Anchoring Enzyme). As NlaIII sites were observed to occur once every 309 base pairs in three arbitrarily chosen yeast chromosomes (1, 10), 95% of yeast transcripts were predicted to be detectable with a NlaIII-based SAGE approach. After capture of the 3' cDNA fragments on streptavidin coated magnetic beads (Dynal), the bound cDNA was divided into two pools, and one of the following linkers containing recognition sites for BsmFI was ligated to each pool:
Linker 1, 5'-TTTGGATTTGCTGGTGCAGTACAACTAGGCTTAATAGGGACATG-3' (SEQ ID NO:1), 5'-TCCCTATTAAGCCTAGTTGTACTGCACCAGCAAATCC[amino mod. C7]-3' (SEQ ID NO:2);
Linker 2, 5'-TTTCTGCTCGAATTCAAGCTTCTAACGATGTACGGGGACATG-3' (SEQ ID NO:3), 5'-TCCCCGTACATCGTTAGAAGCTTGAATTCGAGCAG[amino mod. C7]-3' (SEQ ID NO:4).
As BsmFI (Tagging Enzyme) cleaves 14 bp away from its recognition site, and the NlaIII site overlaps the BsmFI site by 1 bp, a 15 bp SAGE tag was released with BsmFI. SAGE tag overhangs were filled in with Klenow, and tags from the two pools were combined and ligated to each other. The ligation product was diluted and then amplified with PCR for 28 cycles with 5'-GGATTTGCTGGTGCAGTACA-3' (SEQ ID NO:5) and 5'-CTGCTCGAATTCAAGCTTCT-3' (SEQ ID NO:6) as primers. The PCR product was analyzed by polyacrylamide gel electrophoresis (PAGE), and the PCR product containing two tags ligated tail to tail (ditag) was excised. The PCR product was then cleaved with NlaIII, and the band containing the ditags was excised and self-ligated. After ligation, the concatenated products were separated by PAGE and products between 500 bp and 2 kb were excised.
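The site geometry described for the linkers can be checked in a few lines. The sketch below is illustrative only: it confirms that the top strand of each linker ends in the BsmFI recognition sequence (GGGAC) sharing one base with the NlaIII site (CATG), and then mimics tag release in silico by taking the 3'-most CATG of a cDNA plus the 10 bases that follow it. The helper name `sage_tag` and the toy cDNA are assumptions made for the example and are not part of the SAGE software.

```python
# Top strands of the two linkers as listed in the protocol (5' to 3').
LINKER_1 = "TTTGGATTTGCTGGTGCAGTACAACTAGGCTTAATAGGGACATG"
LINKER_2 = "TTTCTGCTCGAATTCAAGCTTCTAACGATGTACGGGGACATG"

BSMFI_SITE = "GGGAC"    # BsmFI (tagging enzyme) recognition sequence
NLAIII_SITE = "CATG"    # NlaIII (anchoring enzyme) recognition sequence

# The two sites share one base (the C), so each linker should end in GGGACATG.
OVERLAPPED = BSMFI_SITE + NLAIII_SITE[1:]


def sage_tag(cdna, tag_len=10):
    """Return the tag_len bases 3' of the last CATG in a cDNA, or None when
    there is no site or fewer than tag_len bases follow it."""
    seq = cdna.upper()
    site = seq.rfind(NLAIII_SITE)
    if site == -1:
        return None
    start = site + len(NLAIII_SITE)
    tag = seq[start:start + tag_len]
    return tag if len(tag) == tag_len else None


for name, linker in (("Linker 1", LINKER_1), ("Linker 2", LINKER_2)):
    print(name, "ends in", OVERLAPPED + ":", linker.endswith(OVERLAPPED))

# Hypothetical cDNA with two NlaIII sites; only the 3'-most one yields the tag.
toy_cdna = "ATGGCTCATGTTTGACCCATGGGTGTTAACGAAGCTTAAAAAA"
print("tag:", sage_tag(toy_cdna))
```

Taking only the 3'-most site mirrors the capture of 3' cDNA fragments on streptavidin beads; a transcript with no CATG returns `None`, which corresponds to the class of transcripts that the method cannot detect.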
These products were cloned into the SphI site of pZero (Invitrogen).
Colonies were screened for inserts by PCR with M13 forward and M13 reverse sequences located outside the cloning site as primers.
PCR products from selected clones were sequenced with the TaqFS DyePrimer kits (Perkin Elmer) and analyzed using a 377 ABI automated sequencer (Perkin Elmer), following the manufacturer's protocol. Each successful sequencing reaction identified an average of 26 tags; given a sequencing reaction success rate, this corresponded to an average of about 850 tags per sequencing gel.
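Each sequenced insert is a concatemer of ditags punctuated by NlaIII sites, so tags can be recovered from a read by splitting on CATG. The following is a minimal sketch and not the SAGE program group itself: the function names, the 10-base tag taken from each end of a ditag, and the 20 to 24 base window for an acceptable ditag are assumptions chosen for illustration. Repeated ditags are collapsed so that each contributes once, in the spirit of the PCR-bias correction applied in the data analysis.

```python
from collections import Counter

ANCHOR = "CATG"      # NlaIII site separating ditags in a concatemer
TAG_LEN = 10         # assumed number of bases kept per tag
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_ditags(read, min_len=20, max_len=24):
    """Split a concatemer read on the anchor site and keep plausible ditags.
    The first and last pieces are flanking sequence and are discarded."""
    pieces = read.upper().split(ANCHOR)
    return [p for p in pieces[1:-1] if min_len <= len(p) <= max_len]


def count_tags(reads):
    """Count tags across reads, using each distinct ditag only once."""
    unique_ditags = set()
    for read in reads:
        for ditag in split_ditags(read):
            # A ditag and its reverse complement are the same molecule.
            unique_ditags.add(min(ditag, revcomp(ditag)))
    counts = Counter()
    for ditag in unique_ditags:
        counts[ditag[:TAG_LEN]] += 1              # tag read from the left end
        counts[revcomp(ditag[-TAG_LEN:])] += 1    # tag read from the right end
    return counts


# Toy read: two distinct ditags, the first of which is sequenced twice.
t1, t2, t3, t4 = "GGTGTTAACG", "AGACAAACTG", "TACCACTCCT", "GGTTTCGGTT"
d1 = t1 + revcomp(t2)
d2 = t3 + revcomp(t4)
read = "TTT" + ANCHOR + d1 + ANCHOR + d2 + ANCHOR + d1 + ANCHOR + "AAA"
tag_counts = count_tags([read])
for tag, n in sorted(tag_counts.items()):
    print(tag, n)
```

Because the duplicated ditag is collapsed, each of the four tags in the toy read is counted once even though one ditag appears twice in the read.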
SAGE data analysis
Sequence files were analyzed by means of the SAGE program group (Velculescu et al., 1995), which identifies the anchoring enzyme site with the proper spacing, extracts the two intervening tags, and records them in a database. The 68,691 tags obtained contained 62,965 tags from unique ditags and 5,726 tags from repeated ditags. The latter were counted only once to eliminate potential PCR bias of the quantitation, as described (Velculescu et al., 1995). Of 62,965 tags, 2,332 tags corresponded to linker sequences and were excluded from further analysis. Of the remaining tags, 4,342 tags could not be assigned, and were likely due to sequencing errors (in the tags or in the yeast genomic sequence). If all of these were due to tag sequencing errors, this corresponds to a sequencing error rate of about 0.7% per base pair (for a 10 bp tag), not far from what we would have expected under our automated sequencing conditions. However, some unassigned tags had a much higher than expected frequency of A's as the last five base pairs of the tag (5 of the 52 most abundant unassigned tags), suggesting that these tags were derived from transcripts containing anchoring enzyme sites within several base pairs of their polyA tails. Given the frequency of NlaIII sites in the genome (one in 309 base pairs), approximately 3% of transcripts were predicted to contain NlaIII sites within 10 bp of their polyA tails.
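The tag accounting and the two derived rates in this paragraph can be reproduced with a short calculation. This is a worked check under stated assumptions rather than the authors' own computation: it assumes 10 bases per tag and independent errors at each base, and the variable names are hypothetical.

```python
# Counts reported in the text.
total_tags = 68_691
tags_from_unique_ditags = 62_965
tags_from_repeated_ditags = 5_726
linker_tags = 2_332
unassigned_tags = 4_342

TAG_LEN = 10           # assumed bases per tag
NLAIII_SPACING = 309   # one NlaIII site per 309 bp, as reported

# The two ditag classes should add up to the total.
assert tags_from_unique_ditags + tags_from_repeated_ditags == total_tags

analyzed_tags = tags_from_unique_ditags - linker_tags
per_tag_error = unassigned_tags / analyzed_tags

# If errors are independent per base, P(tag correct) = (1 - e) ** TAG_LEN.
per_base_error = 1.0 - (1.0 - per_tag_error) ** (1.0 / TAG_LEN)

# Chance that a site falls in the last 10 bp before the polyA tail.
near_polya = 10 / NLAIII_SPACING

print(f"tags analyzed after linker removal: {analyzed_tags}")
print(f"unassigned fraction per tag: {per_tag_error:.3%}")
print(f"implied error rate per base: {per_base_error:.3%}")
print(f"transcripts with a site within 10 bp of polyA: {near_polya:.1%}")
```

The implied per-base rate comes out slightly above 0.7%, and the polyA-proximal fraction slightly above 3%, consistent with the rounded figures quoted in the text.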
As very sparse data are available for yeast mRNA sequences and efforts to date have not been able to identify a highly conserved polyadenylation signal (Irniger and Braus, 1994; Zaret and Sherman, 1982), we used 14 bp of SAGE tags (i.e., the NlaIII site plus the adjacent 10 bp) to search the yeast genome directly (yeast genome sequence obtained from the Stanford yeast genome ftp site (genome-ftp.stanford.edu) on August 7, 1996). Because only coding regions are annotated in the yeast genome, and SAGE tags can be derived from 3' untranslated regions of genes, a SAGE tag was considered to correspond to a particular gene if it matched the ORF or the region 500 bp 3' of the ORF (locus names, gene names and ORF chromosomal coordinates were obtained from the Stanford yeast genome ftp site, and ORF descriptions were obtained from the MIPS www site (http://www.mips.biochem.mpg.de/) on August 14, 1996). ORFs were considered genes with known functions if they were associated with a three letter gene name, while ORFs without such designations were considered uncharacterized.
As expected, SAGE tags matched transcribed portions of the genome in a highly non-random fashion, with 88% matching ORFs or their adjacent 3' regions in the correct orientation (chi-squared P value <10' 0). In instances when more than one tag matched a particular ORF in the correct orientation, the abundance was calculated to be the sum of the matched tags (for Figure 2, Figure 3, and Figure ). Tags that matched ORFs in the incorrect orientation were not used in abundance calculations. In instances when a tag matched more than one region of the genome (for example an ORF and a non-ORF region), only the matched ORF was considered. In some cases the base of the tag could also be used to resolve ambiguities. For Figure 4, only tags that matched the genome once were used.
For the identification of NORF genes, only tags were considered that matched portions of the genome that were further than 500 bp 3' of a previously identified ORF, and were observed at least two times in the SAGE libraries.
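The assignment rules above (ORF or 500 bp 3' region, correct orientation only, ORF hits preferred over non-ORF hits, NORF candidates seen at least twice) can be expressed compactly. The sketch below is one possible reading of those rules, not the procedure actually used: the `Orf` and `Hit` records, the function names, and the toy coordinates are all hypothetical, and it additionally assumes that a tag lying inside an ORF window on the opposite strand is not treated as a NORF candidate.

```python
from collections import defaultdict
from dataclasses import dataclass

DOWNSTREAM_BP = 500    # 3' region searched beyond each annotated ORF
MIN_NORF_COUNT = 2     # NORF candidates must be observed at least twice


@dataclass(frozen=True)
class Orf:
    name: str
    chrom: int
    start: int     # lowest coordinate, regardless of strand
    end: int       # highest coordinate, regardless of strand
    strand: str    # "+" or "-"


@dataclass(frozen=True)
class Hit:
    chrom: int
    pos: int
    strand: str    # strand on which CATG + tag reads 5' to 3'


def window(orf):
    """ORF span extended by DOWNSTREAM_BP on its 3' side."""
    if orf.strand == "+":
        return orf.start, orf.end + DOWNSTREAM_BP
    return orf.start - DOWNSTREAM_BP, orf.end


def orfs_at(hit, orfs, same_strand):
    """Names of ORFs whose extended window contains the hit."""
    names = set()
    for orf in orfs:
        lo, hi = window(orf)
        if orf.chrom != hit.chrom or not lo <= hit.pos <= hi:
            continue
        if same_strand and orf.strand != hit.strand:
            continue
        names.add(orf.name)
    return names


def assign(tag_counts, tag_hits, orfs):
    """Sum sense-strand tag counts per ORF and collect NORF candidates."""
    abundance = defaultdict(int)
    norf_candidates = {}
    for tag, count in tag_counts.items():
        hits = tag_hits.get(tag, [])
        sense = set().union(*(orfs_at(h, orfs, True) for h in hits))
        anywhere = set().union(*(orfs_at(h, orfs, False) for h in hits))
        if sense:
            # A tag shared by several ORFs is credited to the joined name.
            abundance["/".join(sorted(sense))] += count
        elif hits and not anywhere and count >= MIN_NORF_COUNT:
            norf_candidates[tag] = count
    return dict(abundance), norf_candidates


# Toy annotation and hits (made-up names and coordinates).
orfs = [Orf("YFG1", 1, 1000, 2000, "+"), Orf("YFG2", 1, 5000, 6000, "-")]
tag_counts = {"AAAAAAAAAA": 5, "CCCCCCCCCC": 3, "GGGGGGGGGG": 2,
              "TTTTTTTTTT": 4, "ACGTACGTAC": 2, "TGCATGCATG": 1,
              "GATTACAGAT": 1}
tag_hits = {
    "AAAAAAAAAA": [Hit(1, 1500, "+")],                      # inside YFG1
    "CCCCCCCCCC": [Hit(1, 2300, "+")],                      # 3' region of YFG1
    "GGGGGGGGGG": [Hit(1, 4700, "-")],                      # 3' region of YFG2
    "TTTTTTTTTT": [Hit(1, 1500, "-")],                      # antisense to YFG1
    "ACGTACGTAC": [Hit(1, 3000, "+")],                      # intergenic, seen twice
    "TGCATGCATG": [Hit(1, 3500, "+")],                      # intergenic, seen once
    "GATTACAGAT": [Hit(1, 1200, "+"), Hit(1, 3200, "+")],   # ORF hit preferred
}
abundance, norf_candidates = assign(tag_counts, tag_hits, orfs)
print(abundance)
print(norf_candidates)
```

In the toy data the three sense-strand tags for YFG1 are summed, the antisense tag is ignored, and only the intergenic tag observed twice is reported as a NORF candidate.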
Table 1. Highly expressed genes
GGTGTTAACG  TDH2 / TDH3        YJR009C / YGR192C                      glyceraldehyde-3-phosphate dehydrogenase 2 / 3
AGACAAACTG  TEF1 / TEF2        YPR080W / YBR118W                      cytosolic elongation factor eEF-1 alpha-A chain
TACCACTCCT  ENO2               YHR174W                                2-phosphoglycerate dehydratase
GGTTTCGGTT  RPLA, A2, A3, b0E  YDL081C / YOL039W / YDL130W / YLR340W  acidic ribosomal protein all / P2.beta / L44prime / LO
TTGCCAGTCT  PDC1               YLR044C                                pyruvate decarboxylase isozyme 1
GGTGAAAACG  ADH1 / ADH2        YOL086C / YMR303C                      alcohol dehydrogenase I / II
ATCGCCGCTC  GPM1               YKL152C                                phosphoglycerate mutase
GGTGCTAAGA  FBA1               YKL060C                                fructose-bisphosphate aldolase II
TTAGTTTCTA  RPL47A             YDL184C                                ribosomal protein
TCTCTACTGG  PGK                YCR012W                                phosphoglycerate kinase
GGTTTTGGTT  RPLA4              YDR382W                                acidic ribosomal protein
GGTCCAGCTT  SSM1A / SSM1B      YPL220W / YGL135W                      ribosomal protein
AATCCAGTTG  RPL5A / RPL5B      YIL018W / YFR031AC                     ribosomal protein
TTCGTTCACT  -                  NORF1                                  nonannotated ORF
AACAGACCAG  RPL16A / RPL16B    YPR102C / YGR085C                      ribosomal protein
CTGCTCTGGG  CUP1A / CUP1B      YHR053C / YHR055C                      metallothionein
GCAATACTAC  -                  YOR293W / YMR230W                      ribosomal protein S10 / similarity to ribosomal protein S10
GCTCTCCCCC  -                  NORF2                                  nonannotated ORF
AAAGACAGAG  RPS31A             YGR027C                                ribosomal protein
TGTCGTGGTG  RPL2A / RPL2B      YBR031W / YDR012W                      ribosomal protein
CCAAGGGTAr  RPS28A             YGR118W                                ribosomal protein
TCTCCAGAAG  RPL35B             YDR500C                                ribosomal protein
GTTTTCT     PYK1               YAL038W                                pyruvate kinase
ATCACTGGTG  RPL9A / RPL9B      YGL147C / YNL067W                      ribosomal protein L9
ATGAAGGTTC  RPL27A             YHR010W                                ribosomal protein L27
GTAGAGCCGG  RPS21              YOL040C                                ribosomal protein
GGTACTGATG  RPL43A             YDL075W                                ribosomal protein L31
CCAGATTTGT  NAB1A / NAB1B      YGR214W / YLR048W                      40S ribosomal protein p40 homolog A
GTGCCGTCCA  URP1A              YBR191W                                ribosomal protein L21
CAAAACCCAA  RPS18EB            YML026C                                ribosomal protein S1

Table 2. Putative coding sequences
TTGAACTACC  YKL056C   strong similarity to human IgE-dependent histamine-releasing factor (21K tumor protein)
TTCGGGTCAC  YDR276C   strong similarity to Hordeum vulgare blt101 protein
CCAGATATGA  YIL093C   hypothetical protein
TTTAAAATGG  YMR116C   similarity to N.crassa CPC2 protein
GGTGTCGTTG  YBR078W   strong similarity to sporulation specific Sps2p
TACTCTTCGC  YEL033W   hypothetical protein
TGTAATTAAA  YOR182C   homology to human ubiquitin-like protein/ribosomal protein
GGAGATCTTG  YCR013C   weak similarity to M.leprae B1496_F1_41 protein
TCAAGAAGTT  YER056AC  strong similarity to ribosomal protein L34
AAAAACTTTG  YIL051C   strong similarity to YER057c
AAGTTGAACA  YPR043W   ribosomal protein L37
GGGTGCGGGT  YDR032C   strong similarity to YCR004c and S.pombe obr1
TGACTCTTTG  YLR390W   hypothetical protein
GGTCAATGGC  YJR105W   hypothetical protein
TAAGAATTCT  YJL158C   member of the Pir1p/Hsp150p/Pir3p family
TCAATTATGT  YDR033W   strong similarity to putative heat shock protein YRO2
ACGGCCAAGA  YBR162C   similarity to YJL171p
TTGGGCTAGT  YJL171C   similarity to YBR162c
CCTTCCAGGT  YJR085C   hypothetical protein
CCTCTCTTGT  YOR310C   homology to SIK1 protein
CCCAAAACTT  YEL018W   weak similarity to
AACAAGTACT  YGL037C   similarity to E.coli hypothetical 23K protein
AACAATAAAA  YER072W   similarity to YFL004w
CAAAAGACCG  YML056C   homology to human IMP dehydrogenase I
GGTTTTGAT   YOR182C   homology to human ubiquitin-like protein/ribosomal protein
CAATCCATTT  YBR106W   hypothetical protein
TTTTGGGTCT  YMR318C   putative alcohol-dehydrogenase
AACTGTCCAT  YDR429C   similarity to nuclear RNA binding proteins
CCAAGGTTAA  YAR002AC  strong similarity to YGL002w
GGTTTTTGAA  YOR273C   putative resistance protein

Table 3. NORF genes
TTCGTTCACT NORF1 94 4 1489450 198
GCTCTCCCCC NORF2 73 16 75633 243
TGTACGCATT NORF3 16 15 301251 189
TTTTATTATC NORF4 15 6 223182 177
CTTCTCTTTT NORF5 12 13 158973 204
TTTCCTATAA NORF6 11 13 511754 252
TCTAGTCGCC NORF7 10 12 669659 192
ATCGTTTTAT NORF8 8 15 877140 174
GGCCAATGGT NORF9 8 4 1202289 267
ACCCTGTCAT NORF10 7 2 418633 255
AAAAGATCAT NORF11 7 4 1489453 87
CAGAAAATGG NORF12 6 8 115655 279
TGACATCTT NORF13 6 16 883669 183
TAGACATCTA NORF14 6 2 491117 141
TGCCCTGGCC NORF15 5 5 166452 216
GGTTTTGGCG NORF16 4 3 24169 291
CCATACAGGT NORF17 4 12 673851 114
CCAAATCAAA NORF18 3 4 229494 258
AAGCGGTACT NORF19 3 9 47889 399
AACGCTTTTC NORF20 3 2 351456 198
GAGGATAGAG NORF21 3 2 356201 240
CAATGAACCG NORF22 3 16 75541 243
TCTTATATA NORF23 3 1 73363
CGCCTCCAGT NORF24 3 7 485774 108
TACGTAAGTT NORF25 3 10 156139 81
GATTTAAACT NORF26 3 15 254749 93
GCGCCTCCMAA NORF27 2 5 42622 222
CAATGGCCCA NORF28 2 13 511751 78
TTGAGGAACG NORF29 2 3 154681 264
GCTAAGAACC NORF30 2 4 302607 204

Table 4. Additional NORFs
GGCGCAATTT 4 1108395
TAAGTGATGA 7 593382
TTGTTGAATT 10 608373
GAAGCAGTAA 3 155607
ACATATGTTA 4 916112
CCCTACACGG 6 223289
GTAATTGGAC 10 392099
ATCAGACAAA 14 687272
TTATGAAAGA 15 81263
ATTCG1TCTA 15 841970
AGCAGGAGTT 16 188350
TTCTATTAGG 2 418749
TGGATTTCAG 4 1224930
CAGATATAAT 5 52488
CTGTTTTGGG 11 374761
CATTTTTAGT 11 508212
TTGAAAAGAT 13 104160
TAAGCCCATC 13 251273
AGCGTCCTCA 15 832420
TTTAGTTAAT 2 477623
ATGGTAGCCA 3 56961
AATTAGACTA 3 162589
AGTGACTCTT 4 1490879
GGACTATAAG 5 251266
ACTTTTTCAG 10 159213
GTCATATAGT 13 158765
CAACAAAGTG 13 171166
GTGGGAAAGG 13 804600
TACTTTATAT 16 366449
AATACCAGCG 3 175540
GCCTTGTATA 4 372624
GGTACATTCA 5 67152
GATTTCTCTG 5 187462
TAGTTGCTCC 7 317108
GTAAGAAATC 7 836202
CTTGGGCTAT 8 107992
AAATGGTGAT 11 558686
ATCATTTGGG 12 199358
CTGAACTTTA 12 283720
CCAGAAGGAG 13 652873
CCGGTTACTA 15 803663
CGATGAGAAG 15 1004369
AAACCGTCCC 16 199141
TCATTCATAC 2 164728
TATCTTTG 4 169784
TTAGAATAAT 4 603508
GTACGCTGTG 5 118089
TATATTAATT 6 64228
GTTCTTGCCT 7 939579 1
ATATAGCTGC 10 181144 1
CCAAAAAAAA 11 91785 1
GAACTCCACA 11 94125 1
CCTTCACTGC 11 374172 1
CACATCATAA 11 625896 1
GAAGTATTGA 12 603999 1
TGCGCGTATA 13 206410 1
GGGTAGTACT 13 671730 1
TAGTTTTGTC 15 33475 1
CAATTCCTAC 1 172182 0.8
TTTGATTTGA 2 46431 0.8
GGCTCTGGTT 2 414510 0.8
CAGAAATAGC 2 565130 0.8
CTGTTATTTTr 2 616054 0.8
CGAAGTCAAA 2 680605 0.8
CTCTAGATAA 3 171584 0.8
AGTCAAAATG 4 192750 0.8
GCGAGTTTAG 4 691301 0.8
GCTCCAATAG 4 1131020 0.8
TTTATTTGAG 4 1237501 0.8
GTTATATTGA 4 1401803 0.8
TGGGTTGAAG 5 251266 0.8
ATTTTATTTG 5 447729 0.8
ATCATAAAAA 5 548612 0.8
TTATATAAAA 6 223182 0.8
CTACTTCTGC 8 34653 0.8
ATAAGACAGT 10 227802 0.8
TTCATAAGTT 10 471894 0.8
TAAATCTGAG 11 145617 0.8
CTGGTAGAAA 11 151174 0.8
CACGTACACA 11 403208 0.8
CCAAGATCAA 11 425882 0.8
AGCTTGTTCC 12 234966 0.8
CACATTCGTT 12 759953 0.8
CTTACATATA 12 789781 0.8
TCTATAGCAA 13 228936 0.8
CCTTTCTGAA 13 297985 0.8
CCTTTAGAAT 13 777999 0.8
AATTAACACC 13 842122 0.8
GCGCAGGGGC 14 440984 0.8
TGTTTATAAA 14 661710 0.8
AAAAGTCATT 15 32081 0.8
TTCGTAAACT 15 680625 0.8
ITITTITGGAGT 15 888343 0.8
AGGCATCTTG 16 250284 0.8
AAATCAAAAC 16 453890 0.8
AATTGACGAA 16 560169 0.8
TTGATGATTT 16 582360 0.8
CCTGTTTTTG 16 643476 0.8
TTTTTAAAAA 1 101436
AAGTTTGATC 1 199848
AGCACCTATG 2 46913
TGATTTATCC 2 418946
ACTGCATCTG 2 680860
CAAGTTAGGA 2 744770
ATACCCAATT 3 29939
AACTGTAT 3 30056
GCGGCGGGTG 3 41645
AAAATTGTTC 3 57108
TCAAGTACTC 3 157855
AACTGTATGC 3 223882
CTATCGGCCA 3 278840
ACAAGCCCAA 3 289917
GTACAGGGCT 4 93873
AAGATCATCG 4 254851
GAACTCCTGG 4 340891
GAACGAGAAG 4 371850
TTTTTAATAC 4 372058
TCTCCAGTTG 4 381712
AATACGTTAC 4 471791
ACGATTGGCT 4 509158
TGTTTATAAG 4 521709
CGTTTTCGTC 4 538839
TCGAACCTCT 4 578702
TCCACACACA 4 930972
CCGTGCGTGC 4 1324367
TTTCTTCAAC 5 116099
CCAAGTCTCG 5 159320
AGAGCGAATT 5 207517
TGTAGATTAT 5 280465
AAAAGTAGTT 5 286387
ACTTGGTATG 5 422942
TTAATGTTAT 5 544523
TACACGCGCG 5 544555
GGTCACTCCT 6 62983
AAGTGATGAA 6 76141
TTTATCTTGT 6 130327
AGTGATTGTT 6 256223
GCTTTGTTGT 7 72577
TCATTGATTC 7 110590
TTCACCGGAA 7 323655
ACTATTCTGT 7 423957
GGGCCAACCC 7 433787
AAAATATCTT 7 559397
TAGTAGTAAC 7 622201
AAGCGCACAA 7 735909
TCGCTGTTTT 7 800300
TGTATTTTTG 7 836202
CTAAACAAAG 7 836587
TAGGAAGAAA 7 905046
GGAAAAATTA 7 958839
TTTGGATAGT 7 974754
CGTTTGTGTA 8 202655
AGAAAAAAAC 8 386651
TAAAGTCCAG 8 518998
TAAGCAGATT 8 529129
ATGAGCATT 9 97114
AGGTGCAAAA 9 229077
TAACAAAGAG 10 628227
CAATTGGCAA 10 721781
ACTCCCTGTA 11 93528
CTCTATTGAT 11 144281
GCTTTCCTTT 11 146665
ACCGCAAAGA 11 231872
CTTGTTCAAA 12 230972
AATGTGCTGT 12 320426
GCAGATAGCG 12 341324
TCTGACTTAG 12 368780
CCCGGATGTT 12 433912
GTAACGATTG 12 449917
GAATAACGAA 12 673851
ACTGCTATTT 12 712476
GTTCTCTAGC 12 712712
CATCACCATC 12 794710
TTGCACTTCT 12 806833
ACTGTTTATG 12 867350
TTGCTATATA 12 1017911
TACATTCTAA 13 95707
CTCTTAGTTG 13 158970
ACGAACACTT 13 278341
TGCGCAAGTC 13 283795
TTTTTCTTAA 13 363037
CAAATGCATT 13 390802
CAAATTGTGT 13 395599
GCAATACTAT 13 826521
AGTGACGATG 14 60143
TACTGGTTTA 14 118854
GTTTGACCTA 14 335512
AGCGTTTGAT 14 478481
CTCTGTTGCG 14 728251
AAATTCAAAA 15 35952
TTTGCTTGGT 15 242742
AGTTTTCCTG 15 304813
TTTAAAGATA 15 331453
AAGGAGACAC 15 448624
CTATATATCA 15 544530
GATGGAATAG 15 571210
TCGAGTCGAA 15 758202
AAAAAAGAAA 15 882567
TTTCCAGAAT 15 969884
TGGACAATGT 15 970607
GGAATTAAGA 15 979894
ACTATATGTT 16 582230
GATATATOAT 16 589647
AGAATTGATT 16 744406
CACTGTCTCC 16 824649

References
Archer, J. Vega, L. and Solomon, F. (1995). Rbl2p, a yeast protein that binds to beta-tubulin and participates in microtubule function in vivo. Cell 82, 425-434.
Bajwa, Torchia, T. and Hopper, J. E. (1988). Yeast regulatory gene GAL3: carbon regulation; UASGal elements in common with GAL1, GAL2, GAL7, GAL10, GAL80, and MEL1; encoded protein strikingly similar to yeast and Escherichia coli galactokinases. Mol Cell Biol 8, 3439-3447.
Basrai, M. Kingsbury, Koshland, Spencer, and Hieter, P. (1996). Faithful chromosome transmission requires Spt4p, a putative regulator of chromatin structure in Saccharomyces cerevisiae. Mol Cell Biol 16, 2838-2847.
Bishop, J. Morton, J. Rosbash, and Richardson, M. (1974). Three abundance classes in HeLa cell messenger RNA. Nature 250, 199-204.
Burkholder, A. and Hartwell, L. H. (1985). The yeast alpha-factor receptor: structural properties deduced from the sequence of the STE2 gene. Nucleic Acids Res 13, 8463-8475.
Chambers, Tsang, J. Stanway, Kingsman, A. and Kingsman, S. M. (1989). Transcriptional control of the Saccharomyces cerevisiae PGK gene by RAP1. Mol Cell Biol 9, 5516-5524.
Denis, C. Ferguson, and Young, E. T. (1983). mRNA levels for the fermentative alcohol dehydrogenase of Saccharomyces cerevisiae decrease upon growth on a nonfermentable carbon source. J Biol Chem 258, 1165-1171.
Dick, Surana, and Chia, W. (1996). Molecular and genetic characterization of SLC1, a putative Saccharomyces cerevisiae homolog of the metazoan cytoplasmic dynein light chain 1. Mol Gen Genet 251, 38-43.
El-Deiry, W. Tokino, Velculescu, V. Levy, D. Parsons, R., Trent, J. Lin, Mercer, W. Kinzler, K. and Vogelstein, B. (1993). WAF1, a potential mediator of p53 tumor suppression. Cell 75, 817-825.
Elledge, S. and Davis, R. W. (1989). DNA damage induction of ribonucleotide reductase. Mol Cell Biol 9, 4932-4940.
Goffeau, Barrell, Bussey, Davis, Dujon, Feldmann, Galibert, Hoheisel, Jacq, Johnston, Louis, Mewes, Murakami, Philippsen, Tettelin, and Oliver, S.G. (1996). Life with 6000 genes. Science 274, 546-567.
Gottschling, D. Aparicio, O. Billington, B. and Zakian, V. A. (1990). Position effect at S. cerevisiae telomeres: reversible repression of Pol II transcription. Cell 63, 751-762.
Hagen, D. McCaffrey, and Sprague, G. Jr. (1986). Evidence the yeast STE3 gene encodes a receptor for the peptide pheromone a factor: gene sequence and implications for the structure of the presumed receptor. Proc Natl Acad Sci U S A 83, 1418-1422.
Hereford, L. and Rosbash, M. (1977). Number and distribution of polyadenylated RNA sequences in yeast. Cell 10, 453-462.
Irniger, and Braus, G. H. (1994). Saturation mutagenesis of a polyadenylation signal reveals a hexanucleotide element essential for mRNA 3' end formation in Saccharomyces cerevisiae. Proc Natl Acad Sci U S A 91, 257-261.
Iyer, and Struhl, K. (1996). Absolute mRNA levels and transcriptional initiation rates in Saccharomyces cerevisiae. Proc Natl Acad Sci U S A 93, 5208-5212.
Kurjan, and Herskowitz, I. (1982). Structure of a yeast pheromone gene (MF alpha): a putative alpha-factor precursor contains four tandem copies of mature alpha-factor. Cell 30, 933-943.
Leeds, Peltz, S. Jacobson, and Culbertson, M. R. (1991). The product of the yeast UPF1 gene is required for rapid turnover of mRNAs containing a premature translational termination codon. Genes Dev 5, 2303-2314.
Lewin, B. (1980). Gene Expression 2, (New York, New York: John Wiley and Sons), pp. 694-727.
McAlister, and Holland, M. J. (1982). Targeted deletion of a yeast enolase structural gene. Identification and isolation of yeast enolase isozymes. J Biol Chem 257, 7181-7188.
Michaelis, and Herskowitz, I. (1988). The a-factor pheromone of Saccharomyces cerevisiae is essential for mating. Mol Cell Biol 8, 1309-1318.
Mushegian, A. and Koonin, E. V. (1996). A minimal gene set for cellular life derived by comparison of complete bacterial genomes. Proc Natl Acad Sci U S A 93, 10268-10273.
Nguyen, Rocha, Granjeaud, Baldit, Bernard, Naquet, P., and Jordan, B. R. (1995). Differential gene expression in the murine thymus assayed by quantitative hybridization of arrayed cDNA clones. Genomics 29, 207-216.
Nishizawa, Araki, and Teranishi, Y. (1989). Identification of an upstream activating sequence and an upstream repressible sequence of the pyruvate kinase gene of the yeast Saccharomyces cerevisiae. Mol Cell Biol 9, 442-451.
Renauld, Aparicio, O. Zierath, P. Billington, B. Chhablani, S. and Gottschling, D. E. (1993). Silent domains are assembled continuously from the telomere and are defined by promoter distance and strength, and by SIR3 dosage. Genes Dev 7, 1133-1145.
Rose, M. Winston, and P. Hieter. (1990). Methods in Yeast Genetics. (Cold Spring Harbor, New York: Cold Spring Harbor Laboratory Press), pp. 177.
Schena, Shalon, Davis, R. and Brown, P. O. (1995). Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467-470.
Schmitt, H. Ciriacy, and Zimmermann, F. K. (1983). The synthesis of yeast pyruvate decarboxylase is regulated by large variations in the messenger RNA level. Mol Gen Genet 192, 247-252.
Sikorski, R and Hieter, P. (1989). A system of shuttle vectors and yeast host strains designed for efficient manipulation of DNA in Saccharomyces cerevisiae. Genetics 122, 19-27.
Singh, Chen, E. Lugovoy, J. Chang, C. Hitzeman, R. and Seeburg, P. H. (1983). Saccharomyces cerevisiae contains two discrete genes coding for the alpha-factor pheromone. Nucleic Acids Res 11, 4049-4063.
Smith, M. and Murray, K. (1983). Yeast H3 and H4 histone messenger RNAs are transcribed from two non-allelic gene sets. J Mol Biol 169, 641-661.
St John, T. and Davis, R. W. (1979). Isolation of galactose-inducible DNA sequences from Saccharomyces cerevisiae by differential plaque filter hybridization. Cell 16, 443-452.
Velculescu, V. Zhang, Vogelstein, and Kinzler, K. W. (1995). Serial analysis of gene expression. Science 270, 484-487.
Zaret, K. and Sherman, F. (1982). DNA sequence required for efficient transcription termination in yeast. Cell 28, 563-573.
Where the terms "comprise", "comprises", "comprised" or "comprising" are used in this specification, they are to be interpreted as specifying the presence of the stated features, integers, steps or components referred to, but not to preclude the presence or addition of one or more other feature, integer, step, component or group thereof.

Claims (34)

1. An isolated cDNA molecule comprising an open reading frame of a Saccharomyces cerevisiae which is involved in cell cycle progression selected from the group consisting of NORF No. 1, 2, 5, 6, 17, and 25.
2. A method of using yeast genes to affect the cell cycle, comprising the step of: administering to a cell an isolated DNA molecule comprising an open reading frame of a Saccharomyces cerevisiae which is involved in cell cycle progression selected from the group consisting of NORF No. 1, 2, 6, 17 and 25.
3. The method of claim 2, wherein the cell is a yeast cell.
4. The method of claim 2, wherein the cell is a fungal cell.
5. The method of claim 2, wherein the cell is a mammalian cell.
6. A method for screening candidate antifungal drugs, comprising the steps of: contacting a test substance with a yeast cell; monitoring expression of an open reading frame of a Saccharomyces cerevisiae which is involved in cell cycle progression selected from the group consisting of NORF No. 1, 2, 5, 6, 17 and 25, wherein a test substance which modifies the expression of the open reading frame of a Saccharomyces cerevisiae is a candidate antifungal drug.
7. A method for identifying human genes which are involved in cell cycle progression, comprising the step of: hybridizing a probe comprising at least 10 contiguous nucleotides of an open reading frame of a Saccharomyces cerevisiae which is differentially expressed between at least two phases selected from the group consisting of log phase, S phase, and G2/M phase, wherein the Saccharomyces cerevisiae open reading frame is selected from the group consisting of NORF No. 1, 2, 5, 6, 17 and 25.
8. A probe for ascertaining phase in the cell cycle of a cell, wherein the probe comprises at least 14 contiguous nucleotides of a NORF open reading frame selected from the group consisting of NORF No. 1, 2, 17 and 25.
9. The method of claim 6 wherein said step of monitoring expression is performed using nucleic acid molecules which are immobilised on a solid support.
10. The method of claim 9, wherein the nucleic acid molecules are in an array.
11. The method of claim 6 wherein a probe which comprises a portion of said Saccharomyces cerevisiae open reading frame is in an array on a solid support.
12. An array of probes on a solid support wherein at least one probe comprises at least 14 contiguous nucleotides of a NORF open reading frame selected from the group consisting of NORF No. 1, 2, 5, 6, 17 and 25.
13. The array of claim 12 which comprises at least 100 probes of distinct sequence.
14. The array of claim 12 which comprises at least 500 probes of distinct sequence.
15. The array of claim 12 which comprises at least 1000 probes of distinct sequence.
16. An isolated cDNA molecule comprising an open reading frame of a Saccharomyces cerevisiae which is involved in cell cycle progression selected from the group consisting of NORF No. 1, 2, 5, 6, 17 and 25, substantially as herein defined with reference to at least one of the accompanying Examples and/or Figures.
17. A method of using yeast genes to affect the cell cycle, comprising the step of: administering to a cell an isolated DNA molecule comprising an open reading frame of a Saccharomyces cerevisiae which is involved in cell cycle progression selected from the group consisting of NORF No. 1, 2, 6, 17 and 25, substantially as herein defined with reference to at least one of the accompanying Examples and/or Figures.
18. A method for screening candidate antifungal drugs, comprising the steps of: contacting a test substance with a yeast cell; monitoring expression of an open reading frame of a Saccharomyces cerevisiae which is involved in cell cycle progression selected from the group consisting of NORF No. 1, 2, 5, 6, 17 and 25, wherein a test substance which modifies the expression of the open reading frame of a Saccharomyces cerevisiae is a candidate antifungal drug, substantially as herein defined with reference to at least one of the accompanying Examples and/or Figures.
19. A method for identifying human genes which are involved in cell cycle progression, comprising the step of: hybridizing a probe comprising at least 10 contiguous nucleotides of an open reading frame of a Saccharomyces cerevisiae which is differentially expressed between at least two phases selected from the group consisting of log phase, S phase, and G2/M phase, wherein the Saccharomyces cerevisiae open reading frame is selected from the group consisting of NORF No. 1, 2, 5, 6, 17 and 25, substantially as herein defined with reference to at least one of the accompanying Examples and/or Figures.
20. A probe for ascertaining phase in the cell cycle of a cell, wherein the probe comprises at least 14 contiguous nucleotides of a NORF open reading frame selected from the group consisting of NORF No. 1, 2, 6, 17 and 25, substantially as herein defined with reference to at least one of the accompanying Examples and/or Figures.
21. An array of probes on a solid support wherein at least one probe comprises at least 14 contiguous nucleotides of a NORF open reading frame selected from the group consisting of NORF No. 1, 2, 5, 6, 17 and 25 substantially as herein defined with reference to at least one of the accompanying Examples and/or Figures.
22. An isolated cDNA molecule consisting of the open reading frame of NORF No. 27. *23. A method of using yeast genes to affect the cell cycle, comprising the step of: administering to a cell an isolated DNA molecule consisting of the open reading frame of NORF No. 27. 38
24. The method of claim 23, wherein the cell is a yeast cell.
25. The method of claim 23, wherein the cell is a fungal cell.
26. The method of claim 23, wherein the cell is a mammalian cell.
27. A method for screening candidate antifungal drugs, comprising the steps of: contacting a test substance with a yeast cell; monitoring expression of the open reading frame of NORF No. 27, wherein a test substance which modifies the expression of the open reading frame of a Saccharomyces cerevisiae is a candidate antifungal drug.
28. A method for identifying human genes which are involved in cell S* cycle progression, comprising the step of: S*hybridizing a probe comprising at least 10 contiguous nucleotides of an open reading frame of a Savccharomyces cerevisiae which is differentially expressed between at least two phases selected from the group consisting of log phase, S phase, and G2/M phase, wherein the accharomyces cerevisiae open reading frame is the open reading frame of NORF No. 27.
29. A probe for ascertaining phase in the cell cycle of a cell, wherein the probe comprises at least 14 contiguous nucleotides of the open reading frame of NORF No. 27. The method of claim 27 wherein said step of monitoring expression is performed using nucleic acid molecules which are immobilised on a solid support. o 31. The method of claim 30, wherein the nucleic acid molecules are in an array. *I
32. The method of claim 27 wherein a probe which comprises a portion of said Saccharomyces cerevisiae open reading frame is in an array on a solid support.
33. An array of probes on a solid support wherein at least one probe comprises at least 14 contiguous nucleotides of the open reading frame of NORF No. 27.
34. The array of claim 33 which comprises at least 100 probes of distinct sequence. The array of claim 33 which comprises at least 500 probes of distinct sequence.
36. The array of claim 33 which comprises at least 1000 probes of distinct sequence.
37. An isolated cDNA molecule consisting of the open reading frame of NORF No. 27, substantially as herein defined with reference to at least one of the accompanying Examples and/or Figures.
38. A method of using yeast genes to affect the cell cycle, comprising the step of: administering to a cell an isolated DNA molecule consisting of the open reading frame of NORF No. 27, substantially as herein defined with reference to at least one of the accompanying Examples and/or Figures.
39. A method for screening candidate antifungal drugs, comprising the steps of: contacting a test substance with a yeast cell; monitoring expression of the open reading frame of NORF No. 27, wherein a test substance which modifies the expression of the open reading frame is a candidate antifungal drug, substantially as herein defined with reference to at least one of the accompanying Examples and/or Figures.
40. A method for identifying human genes which are involved in cell cycle progression, comprising the step of: hybridizing a probe comprising at least 10 contiguous nucleotides of the open reading frame of NORF No. 27, substantially as herein defined with reference to at least one of the accompanying Examples and/or Figures.
41. A probe for ascertaining phase in the cell cycle of a cell, wherein the probe comprises at least 14 contiguous nucleotides of the open reading frame of NORF No. 27, substantially as herein defined with reference to at least one of the accompanying Examples and/or Figures.
42. An array of probes on a solid support wherein at least one probe comprises at least 14 contiguous nucleotides of the open reading frame of NORF No. 27, substantially as herein defined with reference to at least one of the accompanying Examples and/or Figures.
DATED this 4th day of May 2002
THE JOHNS HOPKINS UNIVERSITY SCHOOL OF MEDICINE
By their Patent Attorneys
CALLINAN LAWRIE
AU59280/98A 1997-01-23 1998-01-22 Characterization of the yeast transcriptome Expired AU749606C (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US3591797P 1997-01-23 1997-01-23
US60/035917 1997-01-23
PCT/US1998/001216 WO1998032847A2 (en) 1997-01-23 1998-01-22 Characterization of the yeast transcriptome

Publications (3)

Publication Number Publication Date
AU5928098A (en) 1998-08-18
AU749606B2 (en) 2002-06-27
AU749606C (en) 2007-05-17

Family

ID=21885540

Family Applications (1)

Application Number Title Priority Date Filing Date
AU59280/98A Expired AU749606C (en) 1997-01-23 1998-01-22 Characterization of the yeast transcriptome

Country Status (5)

Country Link
EP (1) EP0970202A2 (en)
JP (1) JP2001509017A (en)
AU (1) AU749606C (en)
CA (1) CA2278645A1 (en)
WO (1) WO1998032847A2 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7504493B2 (en) 1997-01-23 2009-03-17 The Johns Hopkins University Characterization of the yeast transcriptome
AU5485600A (en) * 1999-06-16 2001-01-02 Johns Hopkins University, The Characterization of the yeast transcriptome
FR2821087B1 (en) * 2001-02-16 2004-01-02 Centre Nat Rech Scient PROCESS FOR QUALITATIVE AND QUANTITATIVE ANALYSIS OF A POPULATION OF NUCLEIC ACIDS CONTAINED IN A SAMPLE
DE10160660A1 (en) 2001-12-11 2003-06-18 Bayer Cropscience Ag Polypeptides to identify fungicidally active compounds
US20060147926A1 (en) 2002-11-25 2006-07-06 Emmert-Buck Michael R Method and apparatus for performing multiple simultaneous manipulations of biomolecules in a two-dimensional array
ATE469172T1 (en) * 2004-07-23 2010-06-15 Ge Healthcare Uk Ltd CELL CYCLE PHASE MARKERS
CN108348556A (en) * 2015-11-02 2018-07-31 欧瑞3恩公司 Cell-cycle arrest improves the efficiency for generating induced multi-potent stem cell

Also Published As

Publication number Publication date
EP0970202A2 (en) 2000-01-12
AU749606C (en) 2007-05-17
JP2001509017A (en) 2001-07-10
CA2278645A1 (en) 1998-07-30
WO1998032847A3 (en) 1998-11-26
WO1998032847A2 (en) 1998-07-30
AU5928098A (en) 1998-08-18

Similar Documents

Publication Publication Date Title
Velculescu et al. Characterization of the yeast transcriptome
US7504493B2 (en) Characterization of the yeast transcriptome
Brewster et al. An osmosensing signal transduction pathway in yeast
Meyerson et al. A family of human cdc2‐related protein kinases.
Ruby et al. Four yeast spliceosomal proteins (PRP5, PRP9, PRP11, and PRP21) interact to promote U2 snRNP binding to pre-mRNA.
Inada et al. One-step affinity purification of the yeast ribosome and its associated proteins and mRNAs
EP0692025B1 (en) Yeast cells engineered to produce pheromone system protein surrogates, and uses therefor
Garcia-Barrio et al. GCD10, a translational repressor of GCN4, is the RNA-binding subunit of eukaryotic translation initiation factor-3.
Kasten et al. Identification of the Saccharomyces cerevisiae genes STB1–STB5 encoding Sin3p binding proteins
WO2000077214A2 (en) Characterization of the yeast transcriptome
AU749606B2 (en) Characterization of the yeast transcriptome
Mosrin et al. The RPC31 gene of Saccharomyces cerevisiae encodes a subunit of RNA polymerase C (III) with an acidic tail
US20030073163A1 (en) Libraries of expressible gene sequences
Vollmer et al. [15] High expression cloning, purification, and assay of Ypt—GTPase-activating proteins
Naitou et al. Expression profiles of transcripts from 126 open reading frames in the entire chromosome VI of Saccharomyces cerevisiae by systematic northern analyses
Kang et al. Ordered differential display from Cryphonectria parasitica
Boles Yeast as a model system for studying glucose transport
AU5735499A (en) Drug targets in candida albicans
JP2002525073A (en) C. Albicans-derived essential gene and method for screening antifungal substance using the gene
KR100239143B1 (en) Transcription mediator protein (Med6p), its mutant protein (med6p) and genes encoding the mutant protein and strains transformed by wild type and mutant genes
Pereira Aromatic amino acid biosynthesis in Candida albicans
AU2003212061A1 (en) Method for screening antimycotic substances using essential genes from S. Cerevisiae (2)
Hall et al. An HMG Protein, Hmo1, Associates with
Si Eucaryotic Translation Initiation Factor 6 (eIF6) and 60S Ribosome Biogenesis
Poon Genetic and biochemical analyses of the yeast TATA-binding protein

Legal Events

Date Code Title Description
FGA Letters patent sealed or granted (standard patent)