WO1997010267A1 - HUMAN SMCY cDNA AND RELATED PRODUCTS

Info

Publication number: WO1997010267A1
Application number: PCT/US1996/014547
Authority: WO
Inventors: Marijo G. Kent; Alexander I. Agulnik
Original assignee: Promega Corporation
Priority date: 1995-09-14
Filing date: 1996-09-13
Publication date: 1997-03-20
Also published as: EP0859790A1; AU7156896A; CA2238694A1; AU715449B2

Abstract

Disclosed herein are human SMCY cDNA, related products, and methods of making and using such products. The nucleotide sequence of a full-length form of the human SMCY cDNA of this invention is specifically disclosed, as are probes and primers to that sequence, and a protein encoded by that sequence. Also disclosed herein are examples demonstrating the use of the products of this invention in conducting a variety of different research studies, including a study of embryonic development in various different organisms, and a study characterizing the H-Y antigen in humans.

Description

HUMAN SMCY cDNA AND RELATED PRODUCTS

This application claims the benefit of U.S. Provisional Application No. 60/003,744, filed 09/14/95, and U.S. Provisional Application No. 60/012,973, filed 03/07/96.

FIELD OF THE INVENTION

The present invention relates to the fields of molecular, clinical and evolutionary biology. Specifically, the present invention relates to SMCY cDNA, genomic DNA, and proteins and their use in the analysis of SMCY derived histocompatibility antigens, and in the detection of evolutionary conservation and early embryonic expression of the SMCY gene.

BACKGROUND OF THE INVENTION

The Y chromosome of mammals is known to play a crucial role in male development. Numerous genetic factors and/or functions have been localized to the Y chromosome, including the expression and regulation of male specific minor histocompatibility antigens such as H-Y antigen, the serologically detected male Sdma antigen, spermatogenesis factors such as Spy, and the primary testis determinant (Sry) (Tiepolo et al. (1976) Hum. Genet. 34:119; McLaren et al. (1988) Proc. Natl. Acad. Sci. 85:6442; Roberts et al. (1988) Proc. Natl. Acad. Sci. 85:6446; Koopman et al. (1991) Nature 351:117; Ma et al. (1992) Hum. Mol. Genet. 1(1):29).

Although many of the factors listed above have been known for some time, the identification and isolation of the genes encoding these proteins have remained elusive. One particular example is that of the male specific minor histocompatibility antigen H-Y. The H-Y antigen was first identified in 1960 when it was observed that within an inbred mouse strain, male-to-female skin transplants were rejected but male-to-male or female-to-male skin grafts usually succeeded (Eichwald et al. (1955) Transplant. Bull. 2:148; Billingham et al. (1960) J. Immunol. 85:14). A similar human antigen was identified in 1977 (Goulmy et al. (1977) Nature 266:544), and was eventually found to be ubiquitously expressed in human tissues (de Bueger et al. (1992) J. Immunol. 149:1788; van der Harst et al. (1994) Blood 83:1060; Voogt et al. (1990) Lancet 335:131). The same H-Y related human antigen was also found to be involved in immune responses during various organ transplantations, blood transfusions, and pregnancy (for review see E. Goulmy (1988) Transplantation Reviews, Volume 2, pg. 29).

One group of researchers recently isolated a cDNA sequence encoding a mouse gene (mouse Smcy) which maps to the Sxr* region of the mouse Y chromosome. This region of the mouse chromosome is known to be involved in the expression and/or regulation of the H-Y antigen and spermatogenesis factors. (Agulnik et al. (1994) Hum. Mol. Genet. 3:873). Further evidence of the importance of this gene was shown by analysis of its expression. The mouse Smcy gene was found to be transcribed in all male mouse tissue, as well as in mouse preimplantation embryos, indicating a possible "housekeeping role" for the protein. (Id.)

The X chromosome homolog of the SMCY gene has also been isolated recently from mouse (Smcx) and human (SMCX or XE169) (Agulnik et al., supra; Agulnik et al. (1994) Hum. Mol. Genet. 3:879; Wu et al. (1994) Hum. Mol. Genet. 3:153). These X homologs show a high degree of homology with the mouse Smcy gene at the amino acid level. (Id.)

The mouse Smcy cDNA sequence has been used to construct primers, which in turn have been used to amplify short stretches of homologous genomic DNA from human, mouse, and horse subjects, using the polymerase chain reaction (PCR). (Agulnik et al. (1994) Hum. Mol. Genet. 3:873.) Comparison of the sequences of the resulting amplified fragments of genomic DNA to the known sequence of the corresponding regions of mouse Smcy cDNA showed greater than 93% homology at the amino acid level. Further, analysis of the expression of the SMCY homolog in humans showed that, as in mouse, the gene is widely expressed in male tissues. This evidence of conservation with a high degree of homology in several species indicates that the SMCY protein likely performs important functions in a variety of different organisms, including in humans. Id. at 876.

Although, as noted above, the sequences or partial sequences of the Smcy and Smcx genes of several different species have been reported in the literature over the last several years, it has not been clear how useful any of that sequence information would be in evaluating the function, control and expression of other SMC homologs. One recent study, in particular, indicated the utility of such information is very limited. In that study, an evolutionary conservation study, a mouse Smcx probe failed to detect an Smcy homolog in cattle and rabbit (Agulnik et al. (1994) Hum. Mol. Genet. 3:879). The authors of that study concluded that the Y copy of the gene was probably no longer required and that the X copy, which escapes X inactivation in all species studied, had taken over its function. (Id.) This last study strongly indicates that mouse Smcx sequence information, in particular, would not be suitable for use in studying Smcx or Smcy homologs in other organisms. Graves, J.A.M. (1995) BioEssays 17(4):311; Jones et al. (1995) Zygote 3:133.

Although the importance of the SMCY gene in humans and in many other organisms was appreciated even before the fragment composition comparison study cited immediately above, the human SMCY gene has not been previously isolated and sequenced or otherwise characterized. Without such characterization, studies of human SMCY gene expression, mutation, and control can only be accomplished with great difficulty. Therefore, there is a need to isolate the human SMCY gene and to determine the sequence of that gene, as well as to construct probes and primers complementary to that gene for use in embryonic development and other research studies. Once known, the human SMCY gene sequence could be used to study how alterations in the sequence of that gene affect male fertility, to produce antibodies to human SMCY protein, to design possible male contraceptives, to evaluate evolutionary conservation, and to determine the sex of tissues.

SUMMARY OF THE INVENTION

One aspect of the present invention is cDNA encoding a human SMCY protein. This first aspect of the present invention is, specifically, human SMCY cDNA, comprising a translation region encoding the amino acid sequence of a human SMCY protein. The translation region of the human SMCY cDNA of the present invention most preferably consists of nucleotides 276 to 4893 of SEQ ID NO:1.

A second aspect of the present invention is an oligonucleotide primer, consisting of a sequence of nucleotides homologous or complementary to the most preferred human SMCY cDNA of this invention, having the complete nucleotide sequence of SEQ ID NO:1.

A third aspect of the present invention is a recombinant human SMCY protein produced by, and isolated from, a host organism containing the human SMCY cDNA of this invention. The human SMCY cDNA is preferably introduced into the host cell by transfection or transformation. The recombinant protein of this invention is most preferably a protein consisting of the amino acid sequence of SEQ ID NO:2.

A fourth aspect of the present invention is an oligonucleotide probe to human SMCY cDNA consisting of a sequence of nucleotides homologous or complementary to SEQ ID NO:1.

A fifth aspect of the present invention is a method of isolating an oligonucleotide probe to human SMCY genomic DNA, comprising: a. providing at least one primer consisting of a sequence of nucleotides homologous or complementary to SEQ ID NO:1; b. providing isolated human genomic DNA; and c. using at least one primer to amplify the isolated genomic DNA, thereby producing an amplified oligonucleotide probe.

A sixth aspect of the present invention, related to the fourth aspect above, is an oligonucleotide probe to human SMCY genomic DNA produced according to that last method.

In an additional aspect, the present invention is a method of detecting SMCY homologs comprising: a. providing a nucleic acid sample; b. providing a pair of oligonucleotide primers designed to flank a region of interest in the nucleic acid sample, wherein at least one of the primers consists of a sequence of nucleotides homologous or complementary to SEQ ID NO:1; c. amplifying the nucleic acid sample with said primer pair, resulting in an amplified oligonucleotide; and d. detecting the amplified primers complementary to regions of human SMCY genomic or cDNA which are upstream and/or downstream of regions of interest.

It is anticipated that the SMCY cDNA, isolated gene, probes, primers, protein, and methods of this invention could be used for a wide variety of different purposes, including, but not limited to, the analysis of SMCY derived histocompatibility antigens, the detection of evolutionary conservation, the study of early embryonic expression of the SMCY gene, and studies related to mitosis such as studies of spermatogenesis and tumorigenesis. The present invention has specific use in the fields of evolutionary and developmental biology. Evolutionary biologists look for ways in which to examine the evolutionary relationship between different species of animals. The SMCY sequence of the present invention provides a new tool for such studies, both by allowing more detailed and accurate comparison of the sequences of SMC homologs, whether presently known or isolated in the future, and by providing a superior probe to those currently available for assaying the DNA of diverse species for SMC homologs. Developmental biologists are interested in determining the mechanism by which cellular growth and differentiation occur during embryogenesis. This involves, in part, the elucidation of which genes are active at the various stages of embryonic development and what impact the expression of the genes has on the overall development of the embryo. The human SMCY cDNA sequence and/or isolated genomic DNA of the present invention can be used to ascertain the developmental stage at which SMC homologs are activated in embryos of diverse species.

Further objects, features, and advantages of the invention will be apparent from the following detailed description of the invention and the illustrative figures.

BRIEF DESCRIPTION OF THE FIGURES

Figure 1: A map showing the overlap of the human SMCY cDNA clones that were isolated and sequenced, depicted in relation to one another and to the SMCY mRNA.

Figure 2: A representative map of a λgt10 clone containing an SMCY cDNA insert, including the location and orientation of primers A1, A2, C1, and C2.

Figure 3A: Southern blot of human DNA digested with EcoRI and probed with human SMCY. The arrow indicates the male specific SMCY DNA bands.

Figure 3B: Southern blot of cattle DNA digested with Sac I and Pst I and probed with human SMCY. The arrow indicates the male specific Smcy DNA bands.

Figure 3C: Southern blot of rabbit, horse and Monodelphis DNA digested with Sac I and Pst I and probed with human SMCY. The arrow indicates the male specific Smcy DNA bands.

Figure 4A: Comparison of partial rhesus Smcy cDNA and human SMCY cDNA sequences, with homologous sequences shaded.

Figure 4B: Comparison of partial rhesus and human SMCY amino acid sequences, with homologous sequences shaded.

Figure 5: Photograph of an ethidium bromide stained electrophoresis gel of DNA from rhesus 4, 8, and 16 cell-stage embryos amplified with a human SMCY primer pair and a rhesus Smcx primer pair.

Figure 6: Elution profiles of HPLC fractionated peptides extracted from HLA-B7 molecules showing which fractions contained reconstituted H-Y epitope: (A) after a single HPLC elution, or (B) after a second HPLC elution.

Figure 7: Plot of results of mass spectrometry and ⁵¹Cr release assay of the peak eluent fraction from the second HPLC column shown in Figure 6B.

Figure 8: Collision-activated dissociation (CAD) spectrum of peptide 1171 from fraction 14 (Figure 7B) after converting the R residue to ornithine.

Figure 9: Plot of the results of a ⁵¹Cr release study of reconstitution of the H-Y epitope on incubation of T2-B7 cells with a synthetic peptide of SMCY, or of SMCX.

Figure 10: Plot of percent inhibition of the binding of an iodinated endogenous B7 peptide to purified HLA-B7 by various synthetic peptides, including an SMCY synthetic peptide.

DETAILED DESCRIPTION

The following definitions are intended to assist in providing a clear and consistent understanding of the scope and detail of the terms used.

The term "gene" as used herein refers to a DNA sequence that codes for a bioactive protein or its precursor. The protein or precursor can be encoded by the full length gene sequence or any portion thereof so long as the bioactivity of the protein product is maintained.

The term "SMC homolog" as used herein refers to Smcx and Smcy genes and their corresponding mRNAs from any species of animal. The term "SMC locus" as used herein refers to a particular segment of an SMC homolog.

The term "upstream" as used herein refers to a nucleic acid sequence proceeding in the opposite direction from expression.

The term "downstream" is used herein to refer to a nucleic acid sequence proceeding in the same direction as expression.

The term "oligonucleotide" as used herein refers to a molecule composed of two or more deoxyribonucleotides or ribonucleotides which is either chemically synthesized or generated by restriction digestion of a DNA molecule.

The term "primer" as used herein refers to an oligonucleotide that, when annealed to a complementary DNA or RNA under appropriate conditions, can serve as an initiation point for synthesis of a copy (primer extension product) of the DNA or RNA. Appropriate conditions include the presence of all four nucleotide triphosphates, a DNA polymerase, and buffer conditions suited to the performance of the particular polymerase chosen for the synthesis.

It will be understood that the term "primer," as used herein below, may refer to more than one primer. For example, in the case where the exact sequence of the end(s) of a locus to be amplified is not known, a plurality of primers will be synthesized and used that correspond to some or all of the possible sequences at the priming site of the fragment.

The term "primer pair" as used herein refers to primers which hybridize to opposing strands at opposite ends, and upstream, of an SMC locus.

The term "PCR" as used herein refers to the polymerase chain reaction technique in which cycles of denaturation, primer pair annealing, and extension with DNA polymerase are used to amplify the number of copies of a target nucleic acid. The polymerase chain reaction process for amplifying nucleic acids is described in U.S. Patent Nos. 4,683,195 and 4,683,202, which are incorporated herein by reference for a detailed description of the process.
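The way a primer pair delimits the amplified region can be illustrated with a minimal in-silico sketch. It assumes exact primer matches and a single-stranded template written 5' to 3'; the function names, the toy template, and the toy primers are hypothetical and are not sequences from this disclosure.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def in_silico_pcr(template: str, forward: str, reverse: str):
    """Return the predicted amplicon, or None if either primer has no exact site.

    `template` is the top strand written 5'->3'. `forward` matches the top
    strand directly; `reverse` anneals to the top strand, so its reverse
    complement is the text that appears in `template`.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    # Search only downstream of the forward primer site.
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Hypothetical toy template and primers (assumptions for illustration only).
toy_template = "TTGACCATGGCTAGCTAGGATCCGATTACAGGCTTAA"
toy_forward = "ACCATGG"
toy_reverse = "AAGCCTG"  # its reverse complement, CAGGCTT, lies near the 3' end
amplicon = in_silico_pcr(toy_template, toy_forward, toy_reverse)
print(amplicon)
```

Real primers tolerate mismatches and real templates are double stranded, so this sketch is only a way to check which stretch a given primer pair would bracket.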

Once the sequence of the human SMCY cDNA of the present invention is disclosed herein, it will be possible to obtain the gene, primers, and probes of the present invention by standard methods used in genetic engineering. For examples of methods suitable for use in making the products of the present invention, see J. Sambrook, E.F. Fritsch, & T. Maniatis, Molecular Cloning, 2nd ed., Cold Spring Harbor Laboratory (1989), incorporated herein by reference. Below is a summary of the preferred methods for making the human SMCY cDNA and the other products of this invention.

The sequence of the human SMCY cDNA of the present invention is preferably assembled from a set of clones selected and isolated from at least one human male cDNA library, preferably from a human male lymphocyte and/or testis cDNA library. The library used to assemble the human SMCY cDNA of this invention is preferably produced as follows. RNA is extracted from an appropriate biological material, preferably from human male lymphocyte or testis tissue, by any one of several methods well known in the art. (See for example Chirgwin et al. Biochemistry (1979) 18:5294 and Chomczynski et al. Anal. Biochem. (1987) 162:156). RNA suitable for use in constructing the libraries used to make the cDNA of this invention can also be extracted using any one of a number of commercially available kits for this purpose, including the RNAgents® Total RNA Isolation System (Promega Corporation, Madison, Wisconsin). The most preferred methods of extracting RNA for use in constructing the present libraries involve the steps of homogenizing the tissue in a denaturing solution containing guanidine, extracting the homogenate with phenol/chloroform, and then precipitating the total RNA from the resulting aqueous phase with isopropanol. Isolation of the mRNA fraction of the total RNA pool is then accomplished by passing the RNA over an oligo dT cellulose column or by binding to other such dT or dU containing material (Aviv et al. Proc. Natl. Acad. Sci. (1972) 69:1408). Other oligo dT support material suitable for use in isolating the mRNA fraction includes the streptavidin paramagnetic particles provided in commercial mRNA isolation kits, such as the PolyATract® System (Promega Corp., Madison, Wisconsin, USA). The mRNA retained on the dT or dU support is then eluted and concentrated by ethanol precipitation prior to use.

Any one of a number of known methods for generating double stranded cDNA equivalents of the mRNA isolated as described above is suitable for use in producing the library used to assemble the full human SMCY cDNA of the present invention. The most preferred such method is that of Gubler and Hoffman, in which an oligo-dT containing primer is annealed to the mRNA and used to prime synthesis of the first strand of the cDNA by reverse transcriptase. Gubler, U. and Hoffman, B.J. (1983) Gene 25:283. The second strand is subsequently synthesized by strand replacement using: (1) RNase H to partially digest the mRNA thereby leaving gaps, (2) a DNA polymerase to synthesize the second strand using the first cDNA strand as template and the remaining mRNA fragments as primers, and (3) DNA ligase to ligate all the newly synthesized second strand cDNA fragments together resulting in double stranded cDNA molecules. Id.

A cDNA library is produced by inserting the double stranded cDNA molecules into a suitable cloning vector, such as λgt10, by any one of a number of known cloning methods. Suitable cloning methods for use in making the present cDNA library include direct blunt end ligation or ligation using linkers, for example EcoRI linkers attached to the ends of the cDNA inserts to facilitate ligation of each insert into the EcoRI restriction enzyme cut site of the vector. Packaging of the recombinant vectors into bacteriophage can be accomplished by one of several methods described in Sambrook et al. at sections 2.95-2.107.

The cDNA library thus produced is then screened for clones containing human SMCY cDNA. Any one of a number of known methods is suitable for such screening. However, either of the following two methods is most preferred. The first such preferred method uses plaque hybridization to screen the cDNA library. In this method aliquots of the cDNA library are mixed with an appropriate bacterial host, plated at low density, and grown on agar plates. Once the bacteria are confluent, the plaques are lifted onto a nitrocellulose or a similar membrane filter and the filters are hybridized with one or more SMCY or SMCX DNA probes that have been labeled with a radioisotope or fluorescent reagent (Sambrook et al., sections 2.108-2.119). Plaques that hybridize with the probe are then serially isolated to single plaque purity and their cDNA inserts are analyzed. The inserts of λgt10 clones containing cDNAs of interest are then subcloned into plasmid vectors well known in the art such as pGEM® (Promega Corp.) or Bluescript® (Stratagene, La Jolla, California, USA).

The other preferred method for obtaining specific cDNAs of interest from a cDNA library involves amplification of the desired inserts by PCR using one or more sequence specific primers and reverse transcriptase. See, e.g., Kawasaki, E.S., "Amplification of RNA," PCR Protocols: A Guide to Methods & Applications (Acad. Press, pub. 1990), pp. 21-27; Rappolee et al. (5 Aug. 1988) Science 241:708; Chelly et al. (30 June 1988) Nature 333:858; Brenner et al. (1989) Biotechniques 7(10):1096; Block, Will (19 March 1991) Biochem. 30(11):2735, all of which are incorporated herein by reference. In this method, known as RT-PCR, a primer pair is constructed that will hybridize to opposite strands of the cDNA insert of interest such that their 3' ends are in closest proximity to each other. Alternatively, one of the primers is constructed to hybridize to the cDNA insert of interest and the other is constructed to hybridize to the vector DNA flanking the cDNA insert of interest. The primer pair is used to PCR amplify the selected cDNA inserts thus generating a subset of cDNAs specifically containing the inserts of interest. As above, these amplified cDNAs can then be subcloned into plasmid vectors by the methods described above or by use of commercially available "T-vectors" such as the pGEM®-T Vector System (Promega Corp.) or the original TA Cloning® Kit (Invitrogen®, San Diego, California, USA).

The DNA sequence of the cDNA clones selected in the library screening procedures described above can readily be determined by isolating DNA from the selected clones, and analyzing the insert sequence using any standard DNA sequencing method. Suitable standard sequencing methods for use in determining the cDNA insert sequences to assemble the sequence of the human SMCY cDNA of the present invention include the Sanger method of dideoxy sequencing (Sanger et al., J. Mol. Biol. (1975) 94:441 and Sanger et al., Proc. Natl. Acad. Sci. (1977) 74:5463), and the Maxam and Gilbert chemical cleavage method (Maxam et al., Proc. Natl. Acad. Sci. (1977) 74:560). See also Sambrook et al., Section 13.

Once the sequence of the insert portion of each cDNA clone is determined, the sequences are aligned with one another at regions of overlap to generate a composite cDNA sequence. This assembly process is preferably done by comparing the individual candidate clones to one another. Any missing cDNA sequences in the resulting composite map are resolved by additional library screening and sequencing until one obtains a contiguous sequence which includes at least the entire translation region of the protein.
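The alignment of clone sequences at their regions of overlap can be sketched in a few lines. This is a minimal illustration, assuming the reads are already ordered 5' to 3' and overlap exactly; the function names and the toy insert sequences are hypothetical and are not the clone sequences of this invention.

```python
def merge_by_overlap(left: str, right: str, min_overlap: int = 8):
    """Merge two reads when a suffix of `left` equals a prefix of `right`.

    Returns the merged sequence, or None when no exact overlap of at least
    `min_overlap` bases exists. The longest overlap is tried first.
    """
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    return None


def assemble(ordered_reads, min_overlap: int = 8):
    """Chain reads that are already ordered 5'->3' into one composite sequence."""
    composite = ordered_reads[0]
    for read in ordered_reads[1:]:
        merged = merge_by_overlap(composite, read, min_overlap)
        if merged is None:
            # Corresponds to a gap that needs further screening and sequencing.
            raise ValueError("gap in coverage between adjacent reads")
        composite = merged
    return composite


# Hypothetical toy inserts (assumptions for illustration only).
clone_a = "ATGGCGTACCTGAAGGTC"
clone_b = "CTGAAGGTCCATTTGACA"
clone_c = "CATTTGACAGGATAA"
composite = assemble([clone_a, clone_b, clone_c])
print(composite)
```

Sequencing errors, chimeric inserts, and insertions or deletions would all defeat exact matching, so a practical assembly would use an alignment tool that tolerates mismatches.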

Another aspect of the present invention is probes to the human SMCY gene sequence. The probes of this invention include any single or double stranded human SMCY DNA or RNA sequence of sufficient size to hybridize to a human SMCY cDNA sequence. The probes may be obtained by any of a number of methods well known in the art including, but not limited to, the isolation of restriction enzyme fragments, synthesis of oligonucleotides, and the synthesis of single stranded DNA or RNA (Sambrook et al., Sections 5, 11, and 10, respectively). The probes of this invention can be labeled and used to probe DNA or RNA samples from various sources to determine whether homologs of human SMCY are present.

The probes of this invention can be labeled by any of a number of procedures well known in the art so that on hybridization to a DNA or RNA sequence having homology with the probe, they can be detected. Labeling procedures, including the use of ³²P or biotin, as well as hybridization techniques and protocols are described in detail in Sambrook et al., Sections 1 and 7-11, and Ausubel et al., Current Protocols in Molecular Biology (1988), among others.

Yet another aspect of the present invention is an oligonucleotide primer having a human SMCY cDNA sequence which is complementary to a region of DNA or RNA adjacent to and upstream of a corresponding SMC locus of interest. Pairs of human SMCY primers of this invention which are complementary to opposing strands of DNA or RNA flanking a locus of interest can be used to amplify the DNA or RNA of that locus in an amplification reaction, such as the polymerase chain reaction (PCR). Thus, the primers of this invention provide a means to identify the presence or absence of an SMC homolog. The primer pairs of this invention are particularly useful in instances where one has only extremely small quantities of starting material (RNA or DNA) rendering standard probing techniques such as that described above impractical or impossible.

The primers of this invention are preferably single stranded, but may also be double stranded. The size of any particular primer of this invention depends on many factors, including the method being used and the source of the primer. However, in all instances the primer must be long enough to form a stable hybrid with the DNA or RNA substrate of interest. The primer must also have sufficient complementarity with the DNA or RNA being primed to form a stable hybrid. Exact complementarity is not required.
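Whether a primer is long enough to form a stable hybrid is often judged by a rough melting temperature estimate. The sketch below applies the widely used Wallace rule of thumb (2°C per A or T, 4°C per G or C), which is an approximation for short oligonucleotides and is not a method stated in this disclosure. The sequences are primers S102 and S103 as given in Example 1; the function names are hypothetical.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of bases in the primer that are G or C."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in deg C: 2*(A+T) + 4*(G+C).

    This rule of thumb is an assumption of this sketch; it ignores salt,
    primer concentration, and nearest-neighbor effects.
    """
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


# Primer sequences as listed in Example 1 of this document.
s102 = "CCTAACATCCAGGCTCTCAA"
s103 = "AGAAGCTCTGACTAAGGCACAA"

for name, seq in (("S102", s102), ("S103", s103)):
    print(name, len(seq), round(gc_fraction(seq), 2), wallace_tm(seq))
```

Under this approximation both primers fall a few degrees above the 55°C annealing step used in Example 1, which is the usual relationship between estimated melting temperature and annealing temperature.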

The primers of this invention may be prepared by any of several methods well known in the art, most preferably by automated synthesis. See, e.g., Mullis, K.B. and Faloona, F.A. "Specific Synthesis of DNA in vitro via a Polymerase-Catalyzed Chain Reaction," Methods in Enzymology (Acad. Press, publ. 1987), pp. 335-350; Thein et al. "The use of synthetic oligonucleotides as specific hybridization probes in the diagnosis of genetic disorders," Human Genetic Disease Analysis: A Practical Approach (IRL Press, Herndon, VA, 1993), pp. 21-33, both of which are incorporated herein by reference.

Another aspect of the present invention is the isolated gene encoding SMCY protein. The isolated gene of this invention is preferably obtained using the full length SMCY cDNA sequence, the SMCY cDNA fragments used to assemble that full length sequence, or probes or primers complementary to one or more regions of the full length cDNA sequence. This aspect of the present invention uses known techniques to isolate the gene of this invention, such as the genomic DNA isolation techniques described in Sambrook et al., supra, incorporated by reference herein.

An additional aspect of the present invention is a recombinant form of human SMCY protein. The protein of this invention is preferably produced by cloning the human SMCY cDNA of this invention into an appropriate protein expression vector. The sequence of the recombinant human SMCY protein of this invention is preferably determined by analyzing the sequence of the human SMCY cDNA for potential open reading frames and comparing the possible amino acid sequences of those open reading frames with any specific peptide sequence known. The open reading frame containing matches to the known peptide sequence is likely the correct amino acid sequence. See, e.g., Wu, J. et al. (1994) Hum. Mol. Genetics 3(a):153, incorporated herein by reference.
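The search for potential open reading frames can be illustrated with a minimal sketch that scans the three forward reading frames of a cDNA for the longest stretch running from an ATG to the next in-frame stop codon. The function name and the toy sequence are hypothetical; comparing each candidate against a known peptide sequence, as described above, would be a separate step.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(cdna: str):
    """Return (start, end, orf) for the longest ATG-to-stop ORF, or None.

    `start` and `end` are 1-based inclusive positions on the given strand,
    and `end` includes the stop codon. Only the three forward frames are
    scanned, which is an assumption suited to a cDNA read 5'->3'.
    """
    cdna = cdna.upper()
    best = None
    for frame in range(3):
        orf_start = None
        for pos in range(frame, len(cdna) - 2, 3):
            codon = cdna[pos:pos + 3]
            if orf_start is None and codon == "ATG":
                orf_start = pos
            elif orf_start is not None and codon in STOP_CODONS:
                length = pos + 3 - orf_start
                if best is None or length > best[1] - best[0] + 1:
                    best = (orf_start + 1, pos + 3, cdna[orf_start:pos + 3])
                orf_start = None
    return best


# Hypothetical toy sequence (assumption for illustration only).
toy_cdna = "GGCATGGCTGAAACCAAGTAGCCATGTTTTGA"
print(longest_orf(toy_cdna))
```

On the toy input the sketch finds two candidate frames and keeps the longer one; on a real cDNA the untranslated regions on either side of the reported coordinates would correspond to the 5' and 3' UTRs.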

The examples below illustrate the isolation and characterization of λgt10 clones with SMCY cDNA inserts, and the use of the resulting insert sequence information to assemble a contiguous human SMCY cDNA sequence encoding a human SMCY protein. The examples also illustrate the use of the sequence information to produce human SMCY DNA or RNA primers or probes to regions of interest in SMCY genes. Finally, the examples below illustrate the use of the human SMCY cDNA sequence to identify and analyze human histocompatibility antigens such as the H-Y antigen.

EXAMPLE 1 - ISOLATION OF HUMAN SMCY cDNA CLONES

Two cDNA libraries were used for the isolation of human SMCY cDNA. The first was constructed by oligo-dT priming/reverse transcribing mRNA from male lymphocytes and cloning the products into λgt10. The second, an oligo-dT/random primed human testis λgt10 cDNA library, was obtained from Clontech Labs (Cat. #HL1162a).

Isolation of human SMCY cDNA was initiated by screening the lymphocyte cDNA library by plaque hybridization with a 190 bp human SMCY genomic DNA fragment, a fragment previously described in Agulnik et al. (1994) Hum. Mol. Genet. 3:873. Library filters containing a total of 2 x 10⁶ clones were hybridized with the random primed probe overnight at 60°C followed by washing with 0.1X SSC/0.1% SDS for 30 minutes. Filters were exposed to Kodak XAR film overnight at -80°C and positive colonies were isolated.

The plaque hybridization screening described above yielded two partial SMCY cDNA clones, H1 and H2 (see Figure 1), the inserts of which were subcloned into pBluescript KSII (Stratagene, La Jolla, California, USA). H2 was found to contain 1410 bp of the SMCY cDNA sequence, including 44 bp of the 5' untranslated region, and 1366 bp of the protein coding region of the mRNA. Further, the sequence of H2 was determined to be chimeric, containing 1400 bases of unrelated DNA at its 3' end. H1, which overlaps H2, contains SMCY cDNA sequence more 3' than H2 but lacks the full coding sequence for the SMCY protein. H1 also contains a deletion and several insertions in the middle of the SMCY sequence as determined by comparison with the human SMCX sequence. Wu, J. et al., supra.

The insert regions of additional SMCY cDNA clones with sequences in the 3' direction of the target mRNA sequence from those contained in H1 and H2 were obtained by amplifying the lymphocyte and testis cDNA libraries prepared as described above, with RT-PCR and with primers complementary to SMCY cDNA and to λgt10 DNA, as follows. The insert regions of the first such clones (clones 8 and 9) were determined by amplifying phage DNA from the libraries using RT-PCR, with primers complementary to the downstream region of the SMCY cDNA sequence region of the H1 clone (primers S100 and S101) and with primers complementary to λgt10 vector DNA adjacent to and flanking the insert region of each clone. The insert region of the next clone (clone 1) was similarly amplified from the same libraries using RT-PCR and primers complementary to the downstream region of the SMCY cDNA sequence from clone 8 (primers S102 and S103), and λgt10 primers. The insert region of the last clone containing SMCY cDNA sequence (clone A17) downstream of the SMCY sequence of clone H1 was similarly amplified from the libraries, using the same λgt10 primers used above, and primers complementary to the downstream region of the SMCY cDNA insert sequence from clone 1 (primers S104 and S105). The above procedure resulted in the isolation of clone 1 from the lymphocyte library, clone 8 from the testis library, clone 9 from the testis library, and clone A17 from the lymphocyte library. Below is a representative detailed example of how the insert sequence of each such clone described above was first amplified and analyzed.

The sequence of clone 8 was used to screen for clone 1, as follows. SMCY primers S102 (5'-CCTAACATCCAGGCTCTCAA-3') [SEQ ID NO:3] and S103 (5'-AGAAGCTCTGACTAAGGCACAA-3') [SEQ ID NO:4], complementary to the sequence near the 3' end of clone 8, were synthesized. Primer S102 (nucleotides 3285-3304 of the insert region of clone 8) sits just 5' of primer S103 (nucleotides 3305-3326) in the SMCY sequence. In addition, two sets of λgt10 vector primers were synthesized: external primers C1 (5'-CCACCTTTTGAGCAAGTTCAG-3') and C2 (5'-GAGGTGGCTTATGAGTATTTG-3') and internal primers A1 (5'-AGCCTGGTTAAGTCCAAGCTG-3') and A2 (5'-CTTCCAGGGTAAAAAGCAAAAG-3'). After a first round of PCR amplification with primers S102 and C1 or C2, the amplified products obtained were reamplified with primers S103 and A1 or A2. PCR conditions were: 94°C for 2 minutes, then 30 cycles of 94°C for 30 seconds, 55°C for 1 minute, and 72°C for 3 minutes, followed by a 5 minute extension at 72°C. The resulting PCR fragments were electrophoresed on a 2.5% agarose gel, purified using the Wizard™ PCR Preps DNA Purification System (Promega) and subcloned into pGEM®-T vector (Promega) according to the manufacturer's protocols.

SMCY cDNA sequences 5' (i.e., upstream) of those contained in clone H2 were isolated by RT-PCR with SMCY primers S32 and S33, complementary to sequences at the 5' end of H2, yielding clone S20 from the testis library. Clone S20 contains an additional 231 bp of 5' untranslated SMCY sequence.
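The stated positions of the nested primers S102 and S103 can be checked against the Sequence Listing. The following is a minimal sketch, assuming that the clone 8 coordinates quoted above coincide with the numbering of SEQ ID NO:1; the fragment is copied from the rows of the listing ending at nucleotides 3317 and 3365, and the names `FRAGMENT`, `FRAGMENT_START` and `locate` are illustrative only.

```python
# Nucleotides 3270-3365 of SEQ ID NO:1, copied from the Sequence Listing
# (the two rows ending at 3317 and 3365). Assumption: clone 8 coordinates
# quoted in the text follow the same numbering as SEQ ID NO:1.
FRAGMENT_START = 3270
FRAGMENT = (
    "ATCCCTGTTCACCTGCCTAACATCCAGGCTCTCAAAGAAGCTCTGACT"
    "AAGGCACAAGCTTGGATTGCTGATGTGGATGAGATCCAAAATGGTGAC"
)

# Primer sequences as given in the text (both read in the sense direction).
PRIMERS = {
    "S102": "CCTAACATCCAGGCTCTCAA",
    "S103": "AGAAGCTCTGACTAAGGCACAA",
}


def locate(primer: str) -> tuple[int, int]:
    """Return the 1-based (start, end) coordinates of a sense-strand primer."""
    offset = FRAGMENT.find(primer)
    if offset < 0:
        raise ValueError("primer not found in fragment")
    start = FRAGMENT_START + offset
    return start, start + len(primer) - 1


positions = {name: locate(seq) for name, seq in PRIMERS.items()}
for name, (start, end) in positions.items():
    print(f"{name}: nucleotides {start}-{end}")
```

Under that assumption, S103 begins at the nucleotide immediately following the end of S102, which is the arrangement expected for a nested second-round primer.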

Figure 1 provides a composite map of the various human SMCY cDNA clones that were isolated and sequenced in this example. The maps of each individual clone described above are depicted in relation to one another and to the human SMCY mRNA. The solid blocks in each map represent the protein coding region of the mRNA, while the open blocks represent 3' or 5' untranslated regions (UTRs). The slashed portion of clone H2 represents sequence unrelated to SMCY. The hatched portion of clone H1 represents positions of deletions and insertions in the human SMCY sequence. Arrows indicate the position of human SMCY cDNA specific primers used to amplify and subclone λgt10 human SMCY cDNA inserts with RT-PCR, as described above. The arrows point in the 5' to 3' direction. Extension of each of the SMCY primers proceeded in the direction of the arrow used to describe the primer in Figure 1.

Figure 2 provides a representative map of λgt10 clones with human SMCY cDNA inserts. The map depicts two principal regions of a representative clone: the λgt10 vector sequence, shown as a shaded region, and the SMCY cDNA insert, shown as an unshaded region. The arrows labeled A1, A2, C1, and C2 represent primers to the λgt10 vector sequence flanking the cloning site of the cDNA inserts. The arrows point in the 5' to 3' direction. These primers were paired with SMCY specific primers in RT-PCR reactions to amplify and subclone sections of λgt10 clones containing SMCY inserts, as described above.

EXAMPLE 2 - HUMAN SMCY cDNA AND PROTEIN SEQUENCE ANALYSIS

The contiguous 5466 bp sequence of the human SMCY cDNA was determined by combining the sequences of the individual clones at regions of overlap, as depicted in Figure 1. The 5466 bp of sequence is made up of: (1) a 4617 bp protein translation region, (2) 275 bp of 5' untranslated sequence, and (3) 574 bp of 3' untranslated sequence. The 4617 bp translation region codes for a human SMCY protein 1539 amino acids in length. The 5466 bp human SMCY cDNA sequence obtained (SEQ ID NO:1) is shown in the Sequence Listing below with the decoded amino acid sequence of the human SMCY protein. All DNA sequencing was done by the dideoxy termination method of Sanger.
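The region lengths quoted above can be cross-checked with simple arithmetic. This is only an illustrative sketch; the variable names are not part of the disclosure, and the coding-region coordinates are derived from the stated lengths rather than read from the listing.

```python
# Region lengths as stated in Example 2 (base pairs).
FIVE_PRIME_UTR = 275
CODING = 4617
THREE_PRIME_UTR = 574

# Total cDNA length implied by the three regions.
total = FIVE_PRIME_UTR + CODING + THREE_PRIME_UTR

# Number of whole codons in the translation region.
codons, remainder = divmod(CODING, 3)

# Derived 1-based coordinates of the translation region within the cDNA.
first_coding_nt = FIVE_PRIME_UTR + 1
last_coding_nt = FIVE_PRIME_UTR + CODING

print(f"total length: {total} bp")
print(f"codons: {codons} (remainder {remainder})")
print(f"translation region: nucleotides {first_coding_nt}-{last_coding_nt}")
```

The sum agrees with the 5466 bp total, and 4617 bp divides evenly into the 1539 codons stated for the SMCY protein.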

EXAMPLE 3 - EVOLUTIONARY CONSERVATION STUDY USING HUMAN SMCY cDNA

DNA was extracted from the blood and/or liver of male and female human, cattle, horse, rabbit and Monodelphis (South American opossum) using standard procedures as previously referenced. The DNA was then digested to completion with the indicated restriction enzymes, electrophoresed on a 0.8% agarose gel, Southern blotted, and hybridized with random primed ³²P labeled human SMCY cDNA fragments at 65°C overnight. The Southern blot was washed under stringent conditions (two 15 minute washes in 2X SSC/0.1% SDS at RT; two 15 minute washes in 2X SSC/0.1% SDS at 55°C; two 15 minute washes in 2X SSC/0.1% SDS at RT for the cattle, rabbit, horse and Monodelphis blots; two 15 minute washes in 0.1X SSC/0.1% SDS at 55°C for the human blot) and autoradiographed.

Referring to Figures 3A, 3B and 3C, the Southern blot shows that SMCY is conserved in the male in all species screened. Previously, using mouse Smcx as a probe, Smcy was demonstrated to be conserved in metatheria (wallaby: 150 million years diverged from eutherians) and in some (human, horse, pig, dog and mouse) but not all (cattle and rabbit) eutherian species (Agulnik et al. (1994) Hum. Mol. Gen. 3(6):879). In these latter two species it was proposed that the Y copy of the gene was no longer required and that the X copy, which escapes X inactivation in all species studied, had taken over its function. In contrast, our results show that SMCY is conserved in Monodelphis, which is 80 million years diverged from Australian marsupials, and in all eutherians probed, including cattle (Figure 3B) and rabbit (Figure 3C). This difference from the earlier result is not surprising, given that mouse Smcx is only about 65% homologous to SMCY in humans and in seemingly diverse species such as horse. Similarly, SMCY in primates shows more homology at the nucleotide level to SMCY in all domestic species amplified using human primers than to SMCY in rodents, including Chinese hamster.

EXAMPLE 4 - USE OF HUMAN SMCY cDNA PRIMERS TO AMPLIFY AND STUDY PRIMATE SMCY

The human SMCY cDNA sequence of Example 2 (SEQ ID NO:1) was used to construct a primer pair, Rh1/S104 (5'-CCTCCAGACCTGGACAGAATT-3') and Rh2/S109 (5'-GTGGTCTGTGGAAGGTGTCA-3'), flanking an SMCY locus that codes for the 3' (carboxyl-terminal) region of the human SMCY protein (see SEQ ID NO:1). This region was chosen for amplification because comparison of the SMCY cDNA sequences from different species has shown that the 3' ends of the cDNAs are the most divergent from one another. Using primers Rh1/S104 and Rh2/S109 at 100 pM concentration, 100 ng of male and female Rhesus DNA was amplified by PCR at low stringency (95°C for 2 minutes, then 35 cycles of 1 minute at 94°C, 1 minute at 61°C and 1 minute at 72°C, followed by holding at 4°C). The 359 bp male specific genomic DNA amplified product was gel purified using the Wizard™ PCR Preps DNA Purification System (Promega), cloned into pGEM-T and sequenced using the techniques described above. As a control and for comparison, human and Rhesus RNA were also amplified, cloned and sequenced. The 264 bp Rhesus SMCY cDNA sequence obtained is shown in Figure 4A along with the human SMCY cDNA sequence. The decoded amino acid sequences for the cDNA sequences in Figure 4A are shown in Figure 4B. At the nucleotide level, there is 92% homology. At the amino acid level, there is 82% homology.
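The percentages quoted above are read here as percent identity over an alignment, which is an assumption, since the text does not state how homology was scored. A minimal sketch of that calculation follows. The two 25-nucleotide strings are hypothetical toy sequences chosen only so that the arithmetic is easy to follow; they are not the sequences of Figure 4A.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned positions that are identical.

    Both sequences must already be aligned and of equal length
    (no gap handling is attempted in this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Hypothetical toy sequences: 25 positions, differing at positions 3 and 20.
toy_a = "ATGGAACCGGGGTGTGACGAGTTCC"
toy_b = "ATAGAACCGGGGTGTGACGGGTTCC"

print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

The same function applies to amino acid sequences written in one-letter code, which is how a nucleotide-level figure and a protein-level figure can differ for the same region.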

EXAMPLE 5 - USE OF HUMAN SMCY cDNA PRIMERS IN AN EARLY EMBRYONIC DEVELOPMENT STUDY IN PRIMATES

To establish the onset of expression of Smcy in primates, Rhesus embryos were produced by superovulation, oocyte collection, in vitro fertilization, and embryo culture, and then collected. A total of 16 embryos were arrested in development at the 4, 8, and 16 cell-stage. A single cell was removed from each of the 16 embryos and sexed using human SRY primers amplifying the conserved "box" region of the gene (Sinclair et al. (1990) Nature 346:216). Poly(A)⁺ RNA was then isolated from the remaining embryo cells by a modification of the PolyATtract® protocol (Promega) and amplified by RT-PCR with the human SMCY primer pair Rh1/S104 and Rh2/S109, described in Example 4 above, and the SMCX primer pair PRH1 (5'-AGAGTGGGGGCAGGGGTTAGTGTA-3') and PRH2 (5'-GGCCATCACCATTCCAGAGACAA-3'). Poly(A)⁺ RNA from Rhesus blood was also amplified as a control. The amplified products were electrophoresed on a 2.5% agarose gel and stained with ethidium bromide.

As can be seen in Figure 5, female and male embryos showed expression of Smcx (120 bp amplified product) at all three stages; however, Smcy expression (264 bp amplified product) was seen only in the male 8 and 16 cell-stage embryos, indicating that expression of Smcy begins between the 4 and 8 cell-stage. Although there are few data on early embryonic messages in primates, models suggest that only essential genes are expressed coincident with the maternal/embryonic transition. Our data in primates, combined with the observation that Smcy is expressed in two-cell stage mouse embryos, demonstrate that Smcy/x are two of the few early transcripts in embryos. Since Smcx escapes X-inactivation, two copies of the Smc gene may be necessary for normal function.
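The inference about the onset of expression can be written out as a small presence/absence calculation. This is a sketch only: the calls below restate the male-embryo results described for Figure 5, and the function name `onset_window` is illustrative.

```python
# Cell stages examined, in developmental order.
STAGES = [4, 8, 16]

# Presence (True) or absence (False) of the amplified product in male
# embryos at each stage, as described in the text for Figure 5.
observed = {
    "Smcx": {4: True, 8: True, 16: True},
    "Smcy": {4: False, 8: True, 16: True},
}


def onset_window(calls, stages):
    """Return (last negative stage, first positive stage).

    The first element is None when the transcript is already present at
    the earliest stage sampled; the second is None when it is never seen.
    """
    previous = None
    for stage in stages:
        if calls[stage]:
            return previous, stage
        previous = stage
    return previous, None


for transcript, calls in observed.items():
    print(transcript, onset_window(calls, STAGES))
```

For Smcy the window is bounded by the 4 and 8 cell-stage, while for Smcx the earliest sampled stage is already positive, so its onset cannot be bounded from below by these data.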

EXAMPLE 6 - USE OF HUMAN SMCY PROTEIN SEQUENCE TO ANALYZE HUMAN H-Y ANTIGEN

Endogenously processed H-Y peptides were isolated as follows. Class I MHC HLA-B7 molecules were purified by affinity chromatography from the H-Y positive B lymphoblastoid cell line JY (Turner et al. (1975) J. Biol. Chem. 250:4512; Parham et al. (1977) J. Biol. Chem. 252:7555). The associated peptides were extracted in acid and separated from high molecular weight material by ultrafiltration as previously described (Hunt et al. (1991) Science 255:1261; Huczko et al. (1993) J. Immunol. 151:2572), and subsequently fractionated by reverse-phase high-performance liquid chromatography (HPLC) (Cox et al. (1994) Science 264:716). Aliquots of each fraction were incubated with HLA-B7 positive, H-Y negative T2-B7 target cells in order to assay for the ability to reconstitute the epitope recognized by an HLA-B7 restricted, H-Y specific cytotoxic T lymphocyte (CTL) clone, 5W4 (Huczko et al. (1993) J. Immunol. 151:2572). A single peak of reconstituting activity was observed (Fig. 6A, fractions 28 and 29), which was rechromatographed using a different organic modifier. Although a single peak of reconstituting activity was also observed from this separation (Fig. 6B, fractions 14, 15 and 16), it still contained more than 100 distinct peptide species, as assessed by electrospray ionization tandem mass spectrometry.

The specific conditions used to reconstitute the H-Y epitope with the HPLC fractionated peptides extracted from HLA-B7 molecules, described generally above, were as follows. In the first HPLC fractionation step, HLA-B7 molecules were immunoaffinity purified from 2x10¹⁰ H-Y positive JY cells. Peptides were eluted from B7 molecules with 10% acetic acid, pH 2.2, filtered through a 10 kD cut-off filter and fractionated on a C18 reverse phase column. Buffer A was 0.1% heptafluorobutyric acid (HFBA); buffer B was 0.1% HFBA in acetonitrile.
The gradient consisted of 100% buffer A (0 to 20 min), 0 to 12% buffer B (20 to 25 min), and 12 to 50% buffer B (25 to 80 min) at a flow rate of 200 μl/min. Sixty fractions of 200 μl each were collected from 20 to 80 min.
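The gradient and fraction collection scheme can be turned into a small lookup relating fraction number to elution time and buffer B percentage. This is a sketch under stated assumptions: the flow rate is taken as 200 μl/min (which is what 60 fractions of 200 μl collected over 60 minutes implies), fractions are assumed to be numbered from 1 starting at 20 min, and each gradient segment is assumed to be linear. The names used are illustrative.

```python
# Gradient breakpoints as (time in min, percent buffer B), from the text.
GRADIENT = [(0.0, 0.0), (20.0, 0.0), (25.0, 12.0), (80.0, 50.0)]

FLOW_UL_PER_MIN = 200.0      # assumed: 200 microlitres per minute
FRACTION_VOLUME_UL = 200.0   # stated fraction volume
COLLECTION_START_MIN = 20.0  # fractions collected from 20 to 80 min
COLLECTION_END_MIN = 80.0

MINUTES_PER_FRACTION = FRACTION_VOLUME_UL / FLOW_UL_PER_MIN
N_FRACTIONS = int((COLLECTION_END_MIN - COLLECTION_START_MIN) / MINUTES_PER_FRACTION)


def percent_b(t: float) -> float:
    """Percent buffer B at time t (min), by linear interpolation."""
    if not GRADIENT[0][0] <= t <= GRADIENT[-1][0]:
        raise ValueError("time outside the gradient programme")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    raise AssertionError("unreachable for a contiguous gradient table")


def fraction_window(n: int) -> tuple[float, float]:
    """Start and end time (min) of fraction n, numbering from 1."""
    if not 1 <= n <= N_FRACTIONS:
        raise ValueError("fraction number out of range")
    start = COLLECTION_START_MIN + (n - 1) * MINUTES_PER_FRACTION
    return start, start + MINUTES_PER_FRACTION


for n in (28, 29):
    start, end = fraction_window(n)
    print(f"fraction {n}: {start:.0f}-{end:.0f} min, "
          f"{percent_b(start):.1f}-{percent_b(end):.1f}% B")
```

On these assumptions the active fractions 28 and 29 correspond to roughly 47 to 49 minutes into the run.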

Fractions 28 and 29 from the separation shown in Figure 6A were rechromatographed with the same acetonitrile gradient, but using trifluoroacetic acid (TFA) instead of HFBA as the organic modifier. The elution profile from this second HPLC column is shown in Figure 6B. For both panels A and B of Figure 6, 3% of each peptide fraction was preincubated with 1,000 ⁵¹Cr-labeled T2-B7 cells at room temperature for 2 hours. The CTL clones were then added at an effector to target ratio of 10 to 1, and further incubated at 37°C for 4 hours. Background lysis of T2-B7 by the CTL clones in the absence of any peptides was -3% in (A) and -4% in (B); positive control lysis of JY was 75% in (A) and 74% in (B).

To identify the active H-Y peptide in the resulting mixture eluted from the second HPLC purification above, fraction 14 was chromatographed with an on-line microcapillary column effluent splitter as previously described (Cox et al. (1994) Science 264:716; den Haan et al. (1995) Science 255:1261). One-fifth of the effluent was deposited into 100 μl of culture media in microtiter plate wells for analysis with CTLs as in Fig. 6. The remaining four-fifths of the material was directed into the electrospray ionization source, and mass spectra of the peptides deposited in each well were recorded on a triple-quadrupole mass spectrometer (Finnigan-MAT, San Jose, California). The results obtained from each well assayed were plotted as either H-Y epitope reconstitution activity, measured as percent specific lysis (♦), or as abundance of peptide 1171, measured as ion current at m/z 391 (■). A copy of the plot of these results is reproduced in Figure 7. The amount of the H-Y sensitizing activity in each well was correlated to signals observed in the mass spectrum, and therefore to the abundance of different peptide species. By comparing the profile of H-Y activity and the ion abundance data in Figure 7, we were able to identify an (M + 3H)⁺³ ion at a mass-to-charge ratio (m/z) of 391 (neutral molecular mass = 1171), whose abundance correlated with the amount of H-Y epitope reconstituting activity. Further confirmation of the importance of peptide 1171 was provided by the demonstration that a peptide with an identical mass and collision-activated dissociation (CAD) spectrum was also present in HLA-B7 associated peptides extracted from a second H-Y positive B lymphoblastoid line, DM, but absent from a spontaneous H-Y antigen loss variant of this cell, DM(-).
Assignment of a complete amino acid sequence to the 1171 peptide from the CAD mass spectrum recorded at the 20 fmol level proved difficult due to the absence of high mass fragment ions containing the amine terminus (b-type ions). A series of singly and/or doubly charged fragment ions containing the carboxyl terminus (y-type ions) identified the C-terminal residue as either L or I and the first six amino acids as SPSVDK. The difference in molecular mass between this partial sequence and that of the full length peptide suggested the presence of four additional residues, for a total length of 11. Since the candidate peptide existed exclusively in the gas phase as an (M + 3H)⁺³ ion, and underwent mass shifts of 42 and 84 Daltons (Da) on conversion to the corresponding methyl ester and acetylated derivative, respectively, two of the remaining residues were assigned as R and either D or E. Only two combinations of four residues (AREA and GRDV) meet the above criteria and satisfy the missing mass of 427 Da. CAD spectra recorded on synthetic peptides suggested that R could not be located at either position 7 or 10. Databases were searched for proteins containing peptides with these characteristics, and an amino acid sequence (SPAVDKAQAEL) consistent at 9 out of 11 positions was found in residues 963-973 of the protein encoded by XE169 or SMCX (Wu et al. (1994) Hum. Mol. Genet. 3:153). Subsequent analysis of the SMCY sequence (Fig. 3) revealed the amino acid sequence (SPSVDKARAEL: residues 950-960) that is consistent at 11 out of 11 positions and has the expected mass of 1171 Da.
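The mass reasoning above can be retraced numerically. The sketch below uses standard monoisotopic residue masses, which are not given in the text, and assumes a matching tolerance of 0.5 Da around the nominal missing mass of 427 Da; both are assumptions made for illustration. It computes the neutral mass and triply protonated m/z of the SMCY and SMCX peptides, and enumerates four-residue compositions that contain R and an acidic residue and fit the missing mass.

```python
from itertools import combinations_with_replacement

# Standard monoisotopic residue masses (Da). "L" stands for both Leu and
# Ile, which have the same mass and cannot be distinguished here.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of a linear, unmodified peptide."""
    return sum(MONO[residue] for residue in seq) + WATER


def mz(mass: float, charge: int) -> float:
    """m/z of the (M + nH) ion with n = charge."""
    return (mass + charge * PROTON) / charge


smcy_mass = peptide_mass("SPSVDKARAEL")
smcx_mass = peptide_mass("SPAVDKAQAEL")
smcy_mz3 = mz(smcy_mass, 3)

# Four unassigned residues: nominal missing mass from the text, with an
# assumed +/- 0.5 Da window, at least one R and at least one of D or E.
GAP_NOMINAL = 427.0
TOLERANCE = 0.5
candidates = [
    combo
    for combo in combinations_with_replacement(sorted(MONO), 4)
    if abs(sum(MONO[r] for r in combo) - GAP_NOMINAL) <= TOLERANCE
    and "R" in combo
    and any(r in combo for r in "DE")
]

print(f"SMCY peptide: {smcy_mass:.2f} Da, (M + 3H) m/z {smcy_mz3:.2f}")
print(f"SMCX peptide: {smcx_mass:.2f} Da")
print("compositions:", ["".join(c) for c in candidates])
```

With these inputs the SMCY peptide comes out near 1171.6 Da with a triply charged ion near m/z 391.5, in line with the nominal values quoted above. The only compositions found are A, A, E, R and D, G, R, V, the same residue sets as the AREA and GRDV combinations named in the text. The SMCX peptide has a different mass.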

Given these results, peptide 1171 was further analyzed by CAD mass spectrometry after conversion of the R residue to ornithine. Material from second dimension HPLC fraction 14 shown in Fig. 6B was treated with 70% hydrazine hydrate for 1 hour. The CAD mass spectrum was recorded on the (M + 2H)⁺² ion at m/z 566. The results, shown in Figure 8, confirmed that the sequence of peptide 1171 was identical to that found in the predicted SMCY amino acid sequence.
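The m/z value quoted for the ornithine derivative can be checked in the same way. The sketch below assumes standard monoisotopic residue masses, including 114.07931 Da for an ornithine residue, and assumes that the single arginine is the only residue altered by the hydrazine treatment; none of these values is stated in the text.

```python
# Standard monoisotopic residue masses (Da) for the residues involved.
MONO = {
    "S": 87.03203, "P": 97.05276, "V": 99.06841, "D": 115.02694,
    "K": 128.09496, "A": 71.03711, "R": 156.10111, "E": 129.04259,
    "L": 113.08406,
    "Orn": 114.07931,  # ornithine residue (assumed standard value)
}
WATER = 18.01056
PROTON = 1.00728


def peptide_mass(residues) -> float:
    """Neutral monoisotopic mass from a list of residue symbols."""
    return sum(MONO[r] for r in residues) + WATER


def mz(mass: float, charge: int) -> float:
    """m/z of the (M + nH) ion with n = charge."""
    return (mass + charge * PROTON) / charge


native = list("SPSVDKARAEL")
# Assumption: only the arginine is converted, giving ornithine.
converted = ["Orn" if r == "R" else r for r in native]

native_mass = peptide_mass(native)
converted_mass = peptide_mass(converted)
converted_mz2 = mz(converted_mass, 2)

print(f"mass change on conversion: {converted_mass - native_mass:.3f} Da")
print(f"(M + 2H) ion of the ornithine peptide: m/z {converted_mz2:.2f}")
```

The doubly protonated ornithine peptide comes out near m/z 565.8, consistent with the ion at m/z 566 on which the CAD spectrum was recorded.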

To confirm this finding, peptides corresponding to the SMCY (SPSVDKARAEL) and SMCX (SPAVDKAQAEL) sequences were synthesized and purified to homogeneity by reverse phase-HPLC on a Vydac C4 column. Purity was established on an analytical RP column and the quantity of each peptide was confirmed by comparing the area of the peak with that of a standard peptide. The identity of the peptides was confirmed by mass spectrometry. These peptides were then used in experiments testing for reconstitution of the H-Y epitope. ⁵¹Cr release was assayed at an effector to target ratio of 10 to 1 on T2-B7 cells that had been incubated in the presence of differing concentrations of the SMCY synthetic peptide SPSVDKARAEL (♦) or the SMCX synthetic peptide SPAVDKAQAEL (■), using the concentrations of each such peptide indicated in the plot of the results of this assay, in Figure 9.

The results show that the 11 residue SMCY sequence sensitized T2-B7 cells for recognition by the H-Y specific CTL clone. Half-maximal lysis was achieved at a peptide concentration of 10 pM (Fig. 9). The corresponding peptide derived from the SMCX sequence was also able to sensitize T2-B7 cells for recognition; however, comparable levels of killing were achieved only by using a 10,000-fold higher peptide concentration. Binding studies showed that the concentration of the SMCY peptide that inhibited the binding of an iodinated standard peptide to purified HLA-B7 by 50% (IC₅₀) was 34 nM, while the IC₅₀ for the SMCX peptide was 140 nM (Fig. 10). Thus, the significant difference in the ability of the SMCY and SMCX peptides to sensitize targets for T cell recognition is almost entirely due to the fine specificity of the T cell receptor, rather than to differences in MHC binding affinities.
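The argument that MHC binding accounts for only a small part of the difference in sensitization can be made explicit with the figures quoted above. This is a back-of-the-envelope sketch, not an analysis from the original work; the variable names are illustrative, and dividing one fold-difference by the other assumes the two effects combine multiplicatively.

```python
import math

# Values quoted in the text.
SENSITIZATION_FOLD = 10_000.0   # SMCX needs this much more peptide than SMCY
IC50_SMCY_NM = 34.0             # HLA-B7 binding, SMCY peptide
IC50_SMCX_NM = 140.0            # HLA-B7 binding, SMCX peptide

# Fold difference in MHC binding between the two peptides.
binding_fold = IC50_SMCX_NM / IC50_SMCY_NM

# Portion of the sensitization difference left unexplained by binding,
# assuming the two effects multiply.
residual_fold = SENSITIZATION_FOLD / binding_fold

# Share of the difference attributable to binding on a log scale.
binding_share = math.log10(binding_fold) / math.log10(SENSITIZATION_FOLD)

print(f"binding difference: {binding_fold:.1f}-fold")
print(f"residual difference: about {residual_fold:.0f}-fold")
print(f"share explained by binding (log scale): {binding_share:.0%}")
```

A roughly 4-fold difference in binding leaves a difference of more than three orders of magnitude to be explained by recognition at the T cell receptor.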

Based on this information, we conclude that the peptide epitope representing the HLA-B7 restricted H-Y antigen is derived from the protein encoded by SMCY.

Finally, the identity of the 11 amino acid segment of human SMCY protein identified above was further confirmed in a functional assay comparing the binding inhibition properties of a synthetic peptide constructed with that same amino acid sequence to other peptides. Specifically, HPLC-purified test peptides were assayed for the ability to inhibit the binding of the iodinated endogenous B7 peptide APRTYVLLL to purified HLA-B7 as previously described (Ruppert et al. (1993) Cell 74:929; Chen et al. (1994) J. Immunol. 152:2874; Sette et al. (1994) J. Immunol. 153:5586). Figure 10 shows the results of this assay, expressed as percent inhibition measured for each peptide assayed plotted against peptide concentration. The following peptides were assayed herein: the SMCY peptide SPSVDKARAEL (♦); the SMCX peptide SPAVDKAQAEL (■); APRTLVLLL, an endogenous peptide bound to HLA-B7; and LLDVPTAAV (X), an endogenous peptide bound to HLA-A2.1, included as a negative control.

The assay results summarized above all used the human SMCY protein sequence derived from the human SMCY cDNA sequence information obtained in Examples 1 and 2, above, as summarized in the protein and DNA sequence depicted in SEQ ID NO:1 below, to identify and confirm the sequence of human H-Y antigen. In confirming the common identity of the sequence and functional characteristics of human H-Y antigen and the 11 amino acid synthetic peptide synthesized using the human SMCY protein sequence from Figure 3 (SEQ ID NO:1), this example also strongly indicates that we correctly deduced the human SMCY amino acid sequence from the human SMCY cDNA sequence of Figure 3.

EXAMPLE 7 (HYPOTHETICAL) - PRODUCTION OF ANTIBODIES TO THE HUMAN SMCY PROTEIN SEQUENCE

Human specific antibodies, some of which may be H-Y antigen specific, are raised against a human SMCY protein, such as a human SMCY protein identified by SEQ ID NO:2 as described in Example 1.

EXAMPLE 8 (HYPOTHETICAL) - USE OF ANTIBODIES TO THE HUMAN SMCY PROTEIN SEQUENCE IN IMMUNOLOGICAL TESTING

The antibodies identified in Example 7 are used in an immunological testing and screening system. Information is obtained which has applications in clinical research, diagnostics and therapeutics.

Specifically, an adult and prenatal male infertility diagnostic test is designed.

EXAMPLE 9 (HYPOTHETICAL) - USE OF ANTIBODIES TO THE HUMAN SMCY PROTEIN SEQUENCE IN SEX SELECTION AND SCREENING FOR HUMANS

The antibodies identified in Example 7 are used in a sex selection and screening system for humans.

EXAMPLE 10 (HYPOTHETICAL) - USE OF ANTIBODIES TO THE HUMAN SMCY PROTEIN SEQUENCE IN SEX SELECTION AND SCREENING FOR NON-HUMAN SPECIES

The antibodies identified in Example 7 are used in a sex selection and screening system for non-human species.

EXAMPLE 11 (HYPOTHETICAL) - USE OF PRIMERS OF HUMAN SMCY cDNA IN EMBRYONIC EXPRESSION STUDIES

An SMCY primer pair is synthesized using an oligonucleotide synthesizer. Each primer in the pair consists of a sequence of nucleic acid included in or complementary to a human SMCY cDNA sequence, such as SEQ ID NO:1. The sequence of each oligonucleotide primer is chosen such that the two primers are complementary to and flank a region of interest of RNA from a subject organism.

Embryos from the subject organism are harvested at various stages of development, i.e., 2 cell stage, 4 cell stage, 8 cell stage, etc.

RNA is then isolated from the individual embryos and amplified by RT-PCR with an SMCY primer pair. After a sufficient number of cycles, any amplified DNA product is electrophoresed on a polyacrylamide gel, the gel is stained, and the gel is photographed or autoradiographed to reveal the presence or absence of amplified product.

By analyzing the amplified product, one can determine the stage at which embryos of the subject organism begin expressing the SMC sequence encoded by the RNA region amplified and tested.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: Promega Corporation

(ii) TITLE OF INVENTION: Human SMCY cDNA and Related Products

(iii) NUMBER OF SEQUENCES:

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: Promega Corporation

(B) STREET: 2800 Woods Hollow Road

(C) CITY: Madison

(D) STATE: Wisconsin

(E) COUNTRY: U.S.A.

(F) ZIP: 53711-5399

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Diskette - 3.5 inch, 1.44 Mb

(B) COMPUTER: IBM compatible PC

(C) OPERATING SYSTEM: DOS, version 6.0

(D) SOFTWARE: WordPerfect 5.1 (DOS text format)

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER:

(B) FILING DATE:

(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: 60/003,744

(B) FILING DATE: 9/14/95

(A) APPLICATION NUMBER: 60/012,973

(B) FILING DATE: 03/07/96

(2) INFORMATION FOR SEQ ID NO:1

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5,476 base pairs

(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single

(D) TOPOLOGY: Linear

(ii) MOLECULE TYPE: cDNA to mRNA

(iii) HYPOTHETICAL: yes

(iv) ANTI-SENSE: no

(vi) ORIGINAL SOURCE:

(A) ORGANISM: HOMO SAPIENS

(F) TISSUE TYPE: TESTIS AND LYMPHOCYTE

(vii) IMMEDIATE SOURCE:

(A) LIBRARY: cDNA

(ix) FEATURE:

(A) NAME/KEY: Human SMCY cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

GGACGGCCAT ACTATTTTTA TCTTGCTTTT TCGTTCTGTC 40

GCAGTACTGT TTAATATGAG TCCAGCGACG GCTCTGTGAC 80

TGTTTTCCTC TGGTAAAATC GCTCTTGCGT CCTCAGCGTT 120

TATCTCAGGT GCGGAAGGTC TCACAGGTTT GGAAATAGCG 160

CCGGAAAAAT CGATCCGCGG AGTGAGACGG CTCGTACCAC 200

ACTGCAGGGC CCGGAGGTCA AGATGGTGGC TGTAAAACTA 240

GGATCCCTGA CGATTGCTTA GCATTAAGGC CCGAC 275

ATG GAA CCG GGG TGT GAC 293

Met Glu Pro Gly Cys Asp

1 5

GAG TTC CTG CCG CCA CCG GAG TGC CCG GTT TTT GAG CCT AGC TGG GCT 341 Glu Phe Leu Pro Pro Pro Glu Cys Pro Val Phe Glu Pro Ser Trp Ala 10 15 20

GAA TTC CAA GAC CCG CTT GGC TAC ATT GCG AAA ATA AGG CCC ATA GCA 389 Glu Phe Gln Asp Pro Leu Gly Tyr Ile Ala Lys Ile Arg Pro Ile Ala 25 30 35

GAG AAG TCT GGC ATC TGC AAA ATC CGC CCA CCC GCG GAT TGG CAG CCT 437 Glu Lys Ser Gly Ile Cys Lys Ile Arg Pro Pro Ala Asp Trp Gln Pro 40 45 50

CCT TTT GCA GTA GAA GTT GAC AAT TTC AGA TTT ACT CCT CGC GTC CAA 485 Pro Phe Ala Val Glu Val Asp Asn Phe Arg Phe Thr Pro Arg Val Gln 55 60 65 70

AGG CTA AAT GAA CTG GAG GCC CAA ACT AGA GTG AAA TTG AAC TAT TTG 533 Arg Leu Asn Glu Leu Glu Ala Gln Thr Arg Val Lys Leu Asn Tyr Leu 75 80 85

GAT CAG ATT GCA AAA TTC TGG GAA ATT CAA GGC TCC TCT TTA AAG ATT 581 Asp Gln Ile Ala Lys Phe Trp Glu Ile Gln Gly Ser Ser Leu Lys Ile 90 95 100

CCC AAT GTG GAG CGG AAG ATC TTG GAC CTC TAC AGC CTT AGT AAG ATT 629 Pro Asn Val Glu Arg Lys Ile Leu Asp Leu Tyr Ser Leu Ser Lys Ile 105 110 115

GTG ATT GAG GAA GGT GGC TAT SAA GCC ATC TGC AAG GAT CGT CGG TGG 677 Val Ile Glu Glu Gly Gly Tyr Glu Ala Ile Cys Lys Asp Arg Arg Trp 120 125 130

GCT CGA GTT GCC CAG CGT CTC CAC TAC CCA CCA GGC AAA AAC ATT GGC 725 Ala Arg Val Ala Gln Arg Leu His Tyr Pro Pro Gly Lys Asn Ile Gly 135 140 145 150

TCC CTG CTA CGA TCA CAT TAC GAA CGC ATT ATT TAC CCC TAT GAA ATG 773 Ser Leu Leu Arg Ser His Tyr Glu Arg Ile Ile Tyr Pro Tyr Glu Met 155 160 165

TTT CAG TCT GGA GCC AAC CAT GTG CAA TGT AAC ACA CAC CCG TTT GAC 821 Phe Gln Ser Gly Ala Asn His Val Gln Cys Asn Thr His Pro Phe Asp 170 175 180

AAT GAG GTA AAA GAT AAG GAA TAC AAG CCC CAC AGC ATC CCC CTT AGA 869 Asn Glu Val Lys Asp Lys Glu Tyr Lys Pro His Ser Ile Pro Leu Arg 185 190 195

CAG TCT GTG CAG CCT TCA AAG TTC AGC AGC TAC AGT CGA CGG GCA AAA 917 Gln Ser Val Gln Pro Ser Lys Phe Ser Ser Tyr Ser Arg Arg Ala Lys 200 205 210

AGG CTA CAG CCT GAT CCA GAG CCT ACA GAG GAG GAC ATT GAG AAG CAT 965 Arg Leu Gln Pro Asp Pro Glu Pro Thr Glu Glu Asp Ile Glu Lys His 215 220 225 230

CCA GAG CTA AAG AAG TTA CAG ATA TAT GGG CCA GGT CCC AAA ATG ATG 1013 Pro Glu Leu Lys Lys Leu Gln Ile Tyr Gly Pro Gly Pro Lys Met Met 235 240 245

GGC TTG GGC CTT ATG GCT AAG GAT AAG GAT AAG ACT GTG CAT AAG AAA 1061 Gly Leu Gly Leu Met Ala Lys Asp Lys Asp Lys Thr Val His Lys Lys 250 255 260

GTC ACA TGC CCC CCA ACT GTT ACG GTG AAG GAT GAG CAA AGT GGA GGT 1109 Val Thr Cys Pro Pro Thr Val Thr Val Lys Asp Glu Gln Ser Gly Gly 265 270 275

GGG AAC GTG TCA TCA ACA TTG CTC AAG CAG CAC TTG AGC CTA GAG CCC 1157 Gly Asn Val Ser Ser Thr Leu Leu Lys Gln His Leu Ser Leu Glu Pro 280 285 290

TGC ACT AAG ACA ACC ATG CAA CTT CGA AAG AAT CAC AGC AGT GCC CAG 1205 Cys Thr Lys Thr Thr Met Gln Leu Arg Lys Asn His Ser Ser Ala Gln 295 300 305 310

TTT ATT GAC TCA TAT ATT TGC CAA GTA TGC TCC CGT GGG GAT GAA GAT 1253 Phe Ile Asp Ser Tyr Ile Cys Gln Val Cys Ser Arg Gly Asp Glu Asp 315 320 325

AAT AAG CTT CTT TTC TGT GAT GGC TGT GAT GAC AAT TAC CAC ATC TTC 1301 Asn Lys Leu Leu Phe Cys Asp Gly Cys Asp Asp Asn Tyr His Ile Phe 330 335 340

TGC TTG TTA CCA CCC CTT CCT GAA ATC CCC AGA GGC ATC TGG AGG TGC 1349 Cys Leu Leu Pro Pro Leu Pro Glu Ile Pro Arg Gly Ile Trp Arg Cys 345 350 355

CCA AAA TGT ATC TTG GCG GAG TGT AAA CAG CCT CCT GAA GCT TTT GGA 1397 Pro Lys Cys Ile Leu Ala Glu Cys Lys Gln Pro Pro Glu Ala Phe Gly 360 365 370

TTT GAA CAG GCT ACC CAG GAG TAC AGT TTG CAG AGT TTT GGT GAA ATG 1445 Phe Glu Gln Ala Thr Gln Glu Tyr Ser Leu Gln Ser Phe Gly Glu Met 375 380 385 390

GCT GAT TCC TTC AAG TCC GAC TAC TTC AAC ATG CCT GTA CAT ATG GTG 1493 Ala Asp Ser Phe Lys Ser Asp Tyr Phe Asn Met Pro Val His Met Val 395 400 405

CCT ACA GAA CTT GTA GAG AAG GAA TTC TGG AGG CTG GTG AGC AGC ATT 1541 Pro Thr Glu Leu Val Glu Lys Glu Phe Trp Arg Leu Val Ser Ser Ile 410 415 420

GAG GAA GAC GTG ACA GTT GAA TAT GGA GCT GAT ATT CAT TCC AAA GAA 1589 Glu Glu Asp Val Thr Val Glu Tyr Gly Ala Asp Ile His Ser Lys Glu 425 430 435

TTT GGC AGT GGC TTT CCT GTC AGC AAT AGC AAA CAA AAC TTA TCT CCT 1637 Phe Gly Ser Gly Phe Pro Val Ser Asn Ser Lys Gln Asn Leu Ser Pro 440 445 450

GAG GAG AAG GAG TAT GCG ACC AGT GGT TGG AAC CTG AAT GTG ATG CCA 1685

Glu Glu Lys Glu Tyr Ala Thr Ser Gly Trp Asn Leu Asn Val Met Pro

455 460 465 470

GTG CTA GAT CAG TCT GTT CTC TGT CAC ATC AAT GCA GAC ATC TCA GGC 1733

Val Leu Asp Gln Ser Val Leu Cys His Ile Asn Ala Asp Ile Ser Gly 475 480 485

ATG AAG GTG CCC TGG CTG TAC GTG GGC ATG GTT TTC TCA GCA TTT TGT 1781

Met Lys Val Pro Trp Leu Tyr Val Gly Met Val Phe Ser Ala Phe Cys 490 495 500

TGG CAT ATT GAG GAT CAC TGG AGT TAC TCT ATT AAC TAT CTG CAT TGG 1829 Trp His Ile Glu Asp His Trp Ser Tyr Ser Ile Asn Tyr Leu His Trp 505 510 515

GGT GAG CCG AAG ACC TGG TAT GGT GTA CCC TCC CTG GCA GCA GAG CAT 1877

Gly Glu Pro Lys Thr Trp Tyr Gly Val Pro Ser Leu Ala Ala Glu His

520 525 530

TTG GAG GAG GTG ATG AAG ATG CTG ACA CCT GAG CTG TTT GAT AGC CAG 1925

Leu Glu Glu Val Met Lys Met Leu Thr Pro Glu Leu Phe Asp Ser Gln

535 540 545 550

CCT GAT CTC CTA CAC CAG CTT GTC ACT CTC ATG AAT CCC AAC ACT TTG 1973

Pro Asp Leu Leu His Gln Leu Val Thr Leu Met Asn Pro Asn Thr Leu

555 560 565

ATG TCC CAT GGT GTG CCA GTT GTC CGC ACA AAC CAG TGT GCA GGG GAG 2021

Met Ser His Gly Val Pro Val Val Arg Thr Asn Gln Cys Ala Gly Glu

570 575 580

TTT GTC ATC ACT TTT CCT CGT GCT TAC CAC AGT GGT TTT AAC CAA GGC 2069

Phe Val Ile Thr Phe Pro Arg Ala Tyr His Ser Gly Phe Asn Gln Gly

585 590 595

TAC AAT TTT GCT GAA GCT GTC AAC TTT TGT ACT GCT GAC TGG CTA CCT 2117

Tyr Asn Phe Ala Glu Ala Val Asn Phe Cys Thr Ala Asp Trp Leu Pro 600 605 610

GCT GGA CGC CAG TGC ATT GAA CAC tAC CGC CGG CTC CGG CGC TAT TGT 2165 Ala Gly Arg Gln Cys Ile Glu His Tyr Arg Arg Leu Arg Arg Tyr Cys 615 620 625 630

GTC TTC TCC CAC GAG GAG CTC ATC TGC AAG ATG GCT GCC TTC CCA GAG 2213 Val Phe Ser His Glu Glu Leu Ile Cys Lys Met Ala Ala Phe Pro Glu 635 640 645

ACG TTG GAT CTC AAT CTA GCA GTA GCT GTG CAC AAG GAG ATG TTC ATT 2261 Thr Leu Asp Leu Asn Leu Ala Val Ala Val His Lys Glu Met Phe Ile 650 655 660

ATG GTT CAG GAG GAG CGA CGT CTA CGA AAG GCC CTT TtG GAG AAG GGC 2309 Met Val Gln Glu Glu Arg Arg Leu Arg Lys Ala Leu Leu Glu Lys Gly 665 670 675

GTC ACG GAG GCT GAG CGA GAG GCT TTT GAG CTG CTC CCA GAT GAT GAA 2357 Val Thr Glu Ala Glu Arg Glu Ala Phe Glu Leu Leu Pro Asp Asp Glu 680 685 690

CGC CAG TGC ATC AAG TGC AAG ACC ACG TGC TTC TTG TCA GCC CTG GCC 2405 Arg Gln Cys Ile Lys Cys Lys Thr Thr Cys Phe Leu Ser Ala Leu Ala 695 700 705 710

TGC TAC GAC TGC CCA GAT GGC CTT GTA TGC CTT TCC CAC ATC AAT GAC 2453 Cys Tyr Asp Cys Pro Asp Gly Leu Val Cys Leu Ser His Ile Asn Asp 715 720 725

CTC TGC AAG TGC TCT AGT AGC CGA CAG TAC CTC CGG TAT CGG TAC ACC 2501 Leu Cys Lys Cys Ser Ser Ser Arg Gln Tyr Leu Arg Tyr Arg Tyr Thr 730 735 740

TTG GAT GAG CTC CCC ACC ATG CTG CAT AAA CTG AAG ATT CGG GCT GAG 2549 Leu Asp Glu Leu Pro Thr Met Leu His Lys Leu Lys Ile Arg Ala Glu 745 750 755

TCT TTT GAC ACC TGG GCC AAC AAA GTG CGA GTG GCC TTG GAG GTG GAG 2597 Ser Phe Asp Thr Trp Ala Asn Lys Val Arg Val Ala Leu Glu Val Glu 760 765 770

GAT GGC CGT AAA CGC AGC TTT GAA GAG CTA AGG GCA CTG GAG TCT GAG 2645 Asp Gly Arg Lys Arg Ser Phe Glu Glu Leu Arg Ala Leu Glu Ser Glu 775 780 785 790

GCT CGT GAG AGG AGG TTT CCT AAT AGT GAG CTG CTT CAG CGA CTG AAG 2693 Ala Arg Glu Arg Arg Phe Pro Asn Ser Glu Leu Leu Gln Arg Leu Lys 795 800 805

AAC TGC CTG AGT GAG GTG GAG GCT TGT ATT GCT CAA GTC CTG GGG CTG 2741 Asn Cys Leu Ser Glu Val Glu Ala Cys Ile Ala Gln Val Leu Gly Leu 810 815 820

GTC AGT GGT CAG GTG GCC AGG ATG GAC ACT CCA CAG CTG ACT TTG ACT 2789

Val Ser Gly Gln Val Ala Arg Met Asp Thr Pro Gln Leu Thr Leu Thr 825 830 835

GAA CTC CGG GTC CTT CTT GAG CAG ATG GGC AGC CTG CCC TGC GCC ATG 2837

Glu Leu Arg Val Leu Leu Glu Gln Met Gly Ser Leu Pro Cys Ala Met 840 845 850

CAT CAG ATT GGG GAT GTC AAG GAT GTC CTG GAA CAG GTG GAG GCC TAT 2885

His Gln Ile Gly Asp Val Lys Asp Val Leu Glu Gln Val Glu Ala Tyr

855 860 865 870

CAA GCT GAG GCT CGT GAG GCT CTG GCC ACA CTG CCC TCT AGT CCA GGG 2933

Gln Ala Glu Ala Arg Glu Ala Leu Ala Thr Leu Pro Ser Ser Pro Gly 875 880 885

CTA TTG CGG TCC CTG TTG GAG AGG GGG CAG CAG CTG GGT GTA GAG GTG 2981

Leu Leu Arg Ser Leu Leu Glu Arg Gly Gln Gln Leu Gly Val Glu Val 890 895 900

CCT GAA GCC CAT CAG CTT CAG CAG CAG GTG GAG CAG GCG CAA TGG CTA 3029 Pro Glu Ala His Gln Leu Gln Gln Gln Val Glu Gln Ala Gln Trp Leu 905 910 915

GAT GAA GTG AAG CAg gCC CTG GCC CCT TCT GCT CAC AGG GGC TCT CTG 3077

Asp Glu Val Lys Gln Ala Leu Ala Pro Ser Ala His Arg Gly Ser Leu 920 925 930

GTC ATC ATG CAG GGG CTT TTG GTT ATG GGT GCC AAG ATA GCC TCC AGC 3125

Val Ile Met Gln Gly Leu Leu Val Met Gly Ala Lys Ile Ala Ser Ser 935 940 945 950

CCT TCT GTG GAC AAG GCC CGG GCT GAG CTG CAA GAA CTA CTG ACC ATT 3173 Pro Ser Val Asp Lys Ala Arg Ala Glu Leu Gln Glu Leu Leu Thr Ile 955 960 965

GCA GAG CGC TGG GAA GAA AAG GCT CAT TTC TGC CTG GAG GCC AGG CAG 3221 Ala Glu Arg Trp Glu Glu Lys Ala His Phe Cys Leu Glu Ala Arg Gln 970 975 980

AAG CAT CCA CCA GCC ACA TTG GAA GCC ATA ATT CGT GAG ACA GAA AAC 3269 Lys His Pro Pro Ala Thr Leu Glu Ala Ile Ile Arg Glu Thr Glu Asn 985 990 995

ATC CCT GTT CAC CTG CCT AAC ATC CAG GCT CTC AAA GAA GCT CTG ACT 3317 Ile Pro Val His Leu Pro Asn Ile Gln Ala Leu Lys Glu Ala Leu Thr 1000 1005 1010

AAG GCA CAA GCT TGG ATT GCT GAT GTG GAT GAG ATC CAA AAT GGT GAC 3365 Lys Ala Gln Ala Trp Ile Ala Asp Val Asp Glu Ile Gln Asn Gly Asp 1015 1020 1025 1030

CAC TAC CCC TGT CTA GAT GAC TTG GAG GGC CTG GTG GCT GTG GGC CGG 3413 His Tyr Pro Cys Leu Asp Asp Leu Glu Gly Leu Val Ala Val Gly Arg 1035 1040 1045

GAC CTG CCT GTG GGG CTG GAA GAG CTG AGA CAG CTA GAG CTG CAG GTA 3461 Asp Leu Pro Val Gly Leu Glu Glu Leu Arg Gln Leu Glu Leu Gln Val 1050 1055 1060

TTG ACA GCA CAT TCC TGG AGA GAG AAG GCC TCC AAG ACC TTT CTC AAG 3509

Leu Thr Ala His Ser Trp Arg Glu Lys Ala Ser Lys Thr Phe Leu Lys 1065 1070 1075

AAG AAT TCT TGC TAC ACA CTG CTT GAG GTG CTT TGC CCG TGT GCA GAC 3557

Lys Asn Ser Cys Tyr Thr Leu Leu Glu Val Leu Cys Pro Cys Ala Asp 1080 1085 1090

GCT GGC TCA GAC AGC ACC AAG CGT AGC CGG TGG ATG GAG AAG GCG CTG 3605

Ala Gly Ser Asp Ser Thr Lys Arg Ser Arg Trp Met Glu Lys Ala Leu 1095 1100 1105 1110

GGG TTG TAC CAG TGT GAC ACA GAG CTG CTG GGG CTG TCT GCA CAG GAC 3653

Gly Leu Tyr Gln Cys Asp Thr Glu Leu Leu Gly Leu Ser Ala Gln Asp 1115 1120 1125

CTC AGA GAC CCA GGC TCT GTG ATT GTG GCC TTC AAG GAA GGG GAA CAG 3701

Leu Arg Asp Pro Gly Ser Val Ile Val Ala Phe Lys Glu Gly Glu Gln 1130 1135 1140

AAG GAG AAG GAG GGT ATC CTG CAG CTG CGT CGC ACC AAC TCA GCC AAG 3749

Lys Glu Lys Glu Gly Ile Leu Gln Leu Arg Arg Thr Asn Ser Ala Lys 1145 1150 1155

CCC AGT CCA CTG GCA CCA TCC CTC ATG GCC TCT TCT CCA ACT TCT ATC 3797

Pro Ser Pro Leu Ala Pro Ser Leu Met Ala Ser Ser Pro Thr Ser Ile 1160 1165 1170

TGT GTG TGT GGG CAG GTG CCA GCT GGG GTG GGA CTT CTG CAG TGT GAC 3845

Cys Val Cys Gly Gln Val Pro Ala Gly Val Gly Leu Leu Gln Cys Asp 1175 1180 1185 1190

CTG TGT CAG GAC TGG TTC CAT GGG CAG TGT GTG TCA GTG CCC CAT CTC 3893

Leu Cys Gln Asp Trp Phe His Gly Gln Cys Val Ser Val Pro His Leu 1195 1200 1205

CTC ACC TCT CCA AAG CCC AGT CTC ACT TCA TCT CCA CTG CTA GCC TGG 3941

Leu Thr Ser Pro Lys Pro Ser Leu Thr Ser Ser Pro Leu Leu Ala Trp 1210 1215 1220

TGG GAA TGG GAC ACA AAA TTC CTG TGT CCA CTG TGT ATG CGC TCA CGA 3989

Trp Glu Trp Asp Thr Lys Phe Leu Cys Pro Leu Cys Met Arg Ser Arg 1225 1230 1235

CGG CCA CGC CTA GAG ACA ATC CTA GCC TTG CTG GTT GCC CTG CAG AGG 4037

Arg Pro Arg Leu Glu Thr Ile Leu Ala Leu Leu Val Ala Leu Gln Arg 1240 1245 1250

CTG CCC GTG CGG CTG CCT GAG GGT GAG GCC CTT CAG TGT CTC ACA GAG 4085

Leu Pro Val Arg Leu Pro Glu Gly Glu Ala Leu Gln Cys Leu Thr Glu 1255 1260 1265 1270

AGG GCC ATT GGC TGG CAA GAC CGT GCC AGA AAG GCT CTG GCC TTT GAA 4133

Arg Ala Ile Gly Trp Gln Asp Arg Ala Arg Lys Ala Leu Ala Phe Glu 1275 1280 1285

GAT GTG ACT GCT CTG TTG CGA CAG CTG GCT GAG CTT CGC CAA CAG CTA 4181

Asp Val Thr Ala Leu Leu Arg Gln Leu Ala Glu Leu Arg Gln Gln Leu 1290 1295 1300

CAG GCC AAA CCC AGA CCA GAG GAG GCC TCA GTC TAC ACT TCA GCC ACT 4229

Gln Ala Lys Pro Arg Pro Glu Glu Ala Ser Val Tyr Thr Ser Ala Thr 1305 1310 1315

GCC TGT GAC CCT ATC AGA GAA GGC AGT GGC AAC AAT ATT TCT AAG GTC 4277

Ala Cys Asp Pro Ile Arg Glu Gly Ser Gly Asn Asn Ile Ser Lys Val 1320 1325 1330

CAA GGG CTG CTG GAG AAT GGA GAC AGT GTG ACC AGT CCT GAG AAC ATG 4325

Gln Gly Leu Leu Glu Asn Gly Asp Ser Val Thr Ser Pro Glu Asn Met 1335 1340 1345 1350

GCT CCA GGA AAG GGC TCT GAC CTG GAG CTA CTG TCC TCG CTG TTG CCG 4373

Ala Pro Gly Lys Gly Ser Asp Leu Glu Leu Leu Ser Ser Leu Leu Pro 1355 1360 1365

CAG TTG ACT GGC CCT GTG TTG GAG CTG CCT GAG GCA ATC CGG GCT CCC 4421

Gln Leu Thr Gly Pro Val Leu Glu Leu Pro Glu Ala Ile Arg Ala Pro 1370 1375 1380

CTG GAG GAG CTC ATG ATG GAA GGG ggC CTG CTT GAG GTG ACC CTG GAT 4469

Leu Glu Glu Leu Met Met Glu Gly Gly Leu Leu Glu Val Thr Leu Asp 1385 1390 1395

GAG AAC CAC AGC ATC TGG CAG CTG CTG CAG GCT GGA CAG CCT CCA GAC 4517

Glu Asn His Ser Ile Trp Gln Leu Leu Gln Ala Gly Gln Pro Pro Asp 1400 1405 1410

CTG GAC AGA ATT CGC ACA CTT CTG GAG CTG GAA AAA TTT GAA CAT CAA 4565

Leu Asp Arg Ile Arg Thr Leu Leu Glu Leu Glu Lys Phe Glu His Gln 1415 1420 1425 1430

GGG AGT CGG ACA AGG AGC CGG GCT CTG GAG AGG CGA CGG CGG CGG CAG 4613

Gly Ser Arg Thr Arg Ser Arg Ala Leu Glu Arg Arg Arg Arg Arg Gln 1435 1440 1445

AAG GTG GAT CAG GGT AGA AAC GTT GAG AAT CTT GTT CAA CAG GAG CTT 4661

Lys Val Asp Gln Gly Arg Asn Val Glu Asn Leu Val Gln Gln Glu Leu 1450 1455 1460

CAG TCA AAA AGG GCT CGG AGC TCA GGG ATT ATG TCT CAG GTG GGC CGA 4709

Gln Ser Lys Arg Ala Arg Ser Ser Gly Ile Met Ser Gln Val Gly Arg 1465 1470 1475

GAA GAA GAA CAT TAT CAG GAG AAA GCA GAC CGT GAA AAT ATG TTC CTG 4757

Glu Glu Glu His Tyr Gln Glu Lys Ala Asp Arg Glu Asn Met Phe Leu 1480 1485 1490

ACA CCT TCC ACA GAC CAC AGC CCT TTC TTG AAA GGA AAC CAA AAT AGC 4805

Thr Pro Ser Thr Asp His Ser Pro Phe Leu Lys Gly Asn Gln Asn Ser 1495 1500 1505 1510

TTA CAA CAC AAG GAT TCA GGC TCT TCA GCT GCT TGT CCT TCT TTA ATG 4853

Leu Gln His Lys Asp Ser Gly Ser Ser Ala Ala Cys Pro Ser Leu Met 1515 1520 1525

CCT TTG CTA CAA CTC TCC TAC TCT GAT GAG CAA CAG TTG TGACAGTGGC 4902

Pro Leu Leu Gln Leu Ser Tyr Ser Asp Glu Gln Gln Leu 1530 1535

ACCAAAGGTC ATTTGTGGTT GTTTTTGTTT GTTTGTTTCT TAAATCCTAC TATCTCCTGG 4962

CCTGGACCTC AGAAGGAGCT TTTTGCCTAT CTATAATTTT TCACTGCCAA TTTTTGATAT 5022

CCTCTCTCCT AGAGTTACTG TTAAAAGGTT GGTTCGTAAA GTCCACACCC CGATGCTCAG 5082

AAGTGTCTTG CCAGCAACAT TCCTGCTAGC ATACAGGAGT GATTTCCTAA ACCAGTTTCA 5142

TTCTAGTCTG AATAGGGACA AACAAATCTT GAGGAAGCCC AAGTGCGTAC CTTTATTTTT 5202

GCCCCCACCA CCCTCTTTCT GTACTTCAAT TTTTGTTTGT TTTTTGTTTT TTTGTCCCTG 5262

TCATAAAATA TTTTGGTGCT TCAAAACTTG TACCTTCATT GTACATCCTT TTCtTTTCTC 5322

CCCTTGGGTC TTATTATAAA AGAAGACAAT GTACGTTGTA ATTACCAAAA AGAATAGGGA 5382

AAAACAAGAA TTTCATGACT CTACCTGTGG TCTATCTTTA ATTTCATTTC TTTTGTTAAA 5442

AATAAAACAA TGAGTATGTT TGGgaaaaaa aaaa 5476
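The three-letter residues printed beneath each codon row of SEQ ID NO:1 can be checked mechanically against the standard genetic code. The following Python sketch is offered for illustration only and is not part of the original disclosure; the function name `translate_row`, the table layout, and the use of "Ter" for a stop codon are assumptions introduced here. It translates one whitespace-separated codon row, such as the row ending at nucleotide 2837.

```python
from itertools import product

# Standard genetic code, enumerated with bases in T, C, A, G order
# (third codon position varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# One-letter to three-letter residue names; "Ter" marks a stop codon
# (a labeling choice made for this sketch).
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def translate_row(row: str) -> str:
    """Translate a row of space-separated DNA codons to three-letter residues."""
    codons = row.upper().split()
    return " ".join(THREE_LETTER[CODON_TABLE[codon]] for codon in codons)


if __name__ == "__main__":
    # Codon row of SEQ ID NO:1 ending at nucleotide 2837 (residues 835-850).
    row = "GAA CTC CGG GTC CTT CTT GAG CAG ATG GGC AGC CTG CCC TGC GCC ATG"
    print(translate_row(row))
```

Lowercase bases, as they appear in a few rows of the listing, are upper-cased before lookup, so a row such as "CAg gCC" translates the same way as "CAG GCC".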

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1539 amino acid residues

(B) TYPE: Amino Acid

(C) STRANDEDNESS: Single

(D) TOPOLOGY: Linear

(ii) MOLECULE TYPE: Protein

(iii) HYPOTHETICAL: yes

(ix) FEATURE:

(A) NAME/KEY: Human SMCY Protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Met Glu Pro Gly Cys Asp 1 5

Glu Phe Leu Pro Pro Pro Glu Cys Pro Val Phe Glu Pro Ser Trp Ala 10 15 20

Glu Phe Gln Asp Pro Leu Gly Tyr Ile Ala Lys Ile Arg Pro Ile Ala 25 30 35

Glu Lys Ser Gly Ile Cys Lys Ile Arg Pro Pro Ala Asp Trp Gln Pro 40 45 50

Pro Phe Ala Val Glu Val Asp Asn Phe Arg Phe Thr Pro Arg Val Gln 55 60 65 70

Arg Leu Asn Glu Leu Glu Ala Gln Thr Arg Val Lys Leu Asn Tyr Leu 75 80 85

Asp Gln Ile Ala Lys Phe Trp Glu Ile Gln Gly Ser Ser Leu Lys Ile 90 95 100

Pro Asn Val Glu Arg Lys Ile Leu Asp Leu Tyr Ser Leu Ser Lys Ile 105 110 115

Val Ile Glu Glu Gly Gly Tyr Glu Ala Ile Cys Lys Asp Arg Arg Trp 120 125 130

Ala Arg Val Ala Gln Arg Leu His Tyr Pro Pro Gly Lys Asn Ile Gly 135 140 145 150

Ser Leu Leu Arg Ser His Tyr Glu Arg Ile Ile Tyr Pro Tyr Glu Met 155 160 165

Phe Gln Ser Gly Ala Asn His Val Gln Cys Asn Thr His Pro Phe Asp 170 175 180

Asn Glu Val Lys Asp Lys Glu Tyr Lys Pro His Ser Ile Pro Leu Arg 185 190 195

Gln Ser Val Gln Pro Ser Lys Phe Ser Ser Tyr Ser Arg Arg Ala Lys 200 205 210

Arg Leu Gln Pro Asp Pro Glu Pro Thr Glu Glu Asp Ile Glu Lys His 215 220 225 230

Pro Glu Leu Lys Lys Leu Gln Ile Tyr Gly Pro Gly Pro Lys Met Met 235 240 245

Gly Leu Gly Leu Met Ala Lys Asp Lys Asp Lys Thr Val His Lys Lys 250 255 260

Val Thr Cys Pro Pro Thr Val Thr Val Lys Asp Glu Gln Ser Gly Gly 265 270 275

Gly Asn Val Ser Ser Thr Leu Leu Lys Gln His Leu Ser Leu Glu Pro 280 285 290

Cys Thr Lys Thr Thr Met Gln Leu Arg Lys Asn His Ser Ser Ala Gln 295 300 305 310

Phe Ile Asp Ser Tyr Ile Cys Gln Val Cys Ser Arg Gly Asp Glu Asp 315 320 325

Asn Lys Leu Leu Phe Cys Asp Gly Cys Asp Asp Asn Tyr His Ile Phe 330 335 340

Cys Leu Leu Pro Pro Leu Pro Glu Ile Pro Arg Gly Ile Trp Arg Cys 345 350 355

Pro Lys Cys Ile Leu Ala Glu Cys Lys Gln Pro Pro Glu Ala Phe Gly 360 365 370

Phe Glu Gln Ala Thr Gln Glu Tyr Ser Leu Gln Ser Phe Gly Glu Met 375 380 385 390

Ala Asp Ser Phe Lys Ser Asp Tyr Phe Asn Met Pro Val His Met Val 395 400 405

Pro Thr Glu Leu Val Glu Lys Glu Phe Trp Arg Leu Val Ser Ser Ile 410 415 420

Glu Glu Asp Val Thr Val Glu Tyr Gly Ala Asp Ile His Ser Lys Glu 425 430 435

Phe Gly Ser Gly Phe Pro Val Ser Asn Ser Lys Gln Asn Leu Ser Pro 440 445 450

Glu Glu Lys Glu Tyr Ala Thr Ser Gly Trp Asn Leu Asn Val Met Pro 455 460 465 470

Val Leu Asp Gln Ser Val Leu Cys His Ile Asn Ala Asp Ile Ser Gly 475 480 485

Met Lys Val Pro Trp Leu Tyr Val Gly Met Val Phe Ser Ala Phe Cys 490 495 500

Trp His Ile Glu Asp His Trp Ser Tyr Ser Ile Asn Tyr Leu His Trp 505 510 515

Gly Glu Pro Lys Thr Trp Tyr Gly Val Pro Ser Leu Ala Ala Glu His 520 525 530

Leu Glu Glu Val Met Lys Met Leu Thr Pro Glu Leu Phe Asp Ser Gln 535 540 545 550

Pro Asp Leu Leu His Gln Leu Val Thr Leu Met Asn Pro Asn Thr Leu 555 560 565

Met Ser His Gly Val Pro Val Val Arg Thr Asn Gln Cys Ala Gly Glu 570 575 580

Phe Val Ile Thr Phe Pro Arg Ala Tyr His Ser Gly Phe Asn Gln Gly 585 590 595

Tyr Asn Phe Ala Glu Ala Val Asn Phe Cys Thr Ala Asp Trp Leu Pro 600 605 610

Ala Gly Arg Gln Cys Ile Glu His Tyr Arg Arg Leu Arg Arg Tyr Cys 615 620 625 630

Val Phe Ser His Glu Glu Leu Ile Cys Lys Met Ala Ala Phe Pro Glu 635 640 645

Thr Leu Asp Leu Asn Leu Ala Val Ala Val His Lys Glu Met Phe Ile 650 655 660

Met Val Gln Glu Glu Arg Arg Leu Arg Lys Ala Leu Leu Glu Lys Gly 665 670 675

Val Thr Glu Ala Glu Arg Glu Ala Phe Glu Leu Leu Pro Asp Asp Glu 680 685 690

Arg Gln Cys Ile Lys Cys Lys Thr Thr Cys Phe Leu Ser Ala Leu Ala 695 700 705 710

Cys Tyr Asp Cys Pro Asp Gly Leu Val Cys Leu Ser His Ile Asn Asp 715 720 725

Leu Cys Lys Cys Ser Ser Ser Arg Gln Tyr Leu Arg Tyr Arg Tyr Thr 730 735 740

Leu Asp Glu Leu Pro Thr Met Leu His Lys Leu Lys Ile Arg Ala Glu 745 750 755

Ser Phe Asp Thr Trp Ala Asn Lys Val Arg Val Ala Leu Glu Val Glu 760 765 770

Asp Gly Arg Lys Arg Ser Phe Glu Glu Leu Arg Ala Leu Glu Ser Glu 775 780 785 790

Ala Arg Glu Arg Arg Phe Pro Asn Ser Glu Leu Leu Gln Arg Leu Lys 795 800 805

Asn Cys Leu Ser Glu Val Glu Ala Cys Ile Ala Gln Val Leu Gly Leu 810 815 820

Val Ser Gly Gln Val Ala Arg Met Asp Thr Pro Gln Leu Thr Leu Thr 825 830 835

Glu Leu Arg Val Leu Leu Glu Gln Met Gly Ser Leu Pro Cys Ala Met 840 845 850

His Gln Ile Gly Asp Val Lys Asp Val Leu Glu Gln Val Glu Ala Tyr 855 860 865 870

Gln Ala Glu Ala Arg Glu Ala Leu Ala Thr Leu Pro Ser Ser Pro Gly 875 880 885

Leu Leu Arg Ser Leu Leu Glu Arg Gly Gln Gln Leu Gly Val Glu Val 890 895 900

Pro Glu Ala His Gln Leu Gln Gln Gln Val Glu Gln Ala Gln Trp Leu 905 910 915

Asp Glu Val Lys Gln Ala Leu Ala Pro Ser Ala His Arg Gly Ser Leu 920 925 930

Val Ile Met Gln Gly Leu Leu Val Met Gly Ala Lys Ile Ala Ser Ser 935 940 945 950

Pro Ser Val Asp Lys Ala Arg Ala Glu Leu Gln Glu Leu Leu Thr Ile 955 960 965

Ala Glu Arg Trp Glu Glu Lys Ala His Phe Cys Leu Glu Ala Arg Gln 970 975 980

Lys His Pro Pro Ala Thr Leu Glu Ala Ile Ile Arg Glu Thr Glu Asn 985 990 995

Ile Pro Val His Leu Pro Asn Ile Gln Ala Leu Lys Glu Ala Leu Thr 1000 1005 1010

Lys Ala Gln Ala Trp Ile Ala Asp Val Asp Glu Ile Gln Asn Gly Asp 1015 1020 1025 1030

His Tyr Pro Cys Leu Asp Asp Leu Glu Gly Leu Val Ala Val Gly Arg 1035 1040 1045

Asp Leu Pro Val Gly Leu Glu Glu Leu Arg Gln Leu Glu Leu Gln Val 1050 1055 1060

Leu Thr Ala His Ser Trp Arg Glu Lys Ala Ser Lys Thr Phe Leu Lys 1065 1070 1075

Lys Asn Ser Cys Tyr Thr Leu Leu Glu Val Leu Cys Pro Cys Ala Asp 1080 1085 1090

Ala Gly Ser Asp Ser Thr Lys Arg Ser Arg Trp Met Glu Lys Ala Leu 1095 1100 1105 1110

Gly Leu Tyr Gln Cys Asp Thr Glu Leu Leu Gly Leu Ser Ala Gln Asp 1115 1120 1125

Leu Arg Asp Pro Gly Ser Val Ile Val Ala Phe Lys Glu Gly Glu Gln 1130 1135 1140

Lys Glu Lys Glu Gly Ile Leu Gln Leu Arg Arg Thr Asn Ser Ala Lys 1145 1150 1155

Pro Ser Pro Leu Ala Pro Ser Leu Met Ala Ser Ser Pro Thr Ser Ile 1160 1165 1170

Cys Val Cys Gly Gln Val Pro Ala Gly Val Gly Leu Leu Gln Cys Asp 1175 1180 1185 1190

Leu Cys Gln Asp Trp Phe His Gly Gln Cys Val Ser Val Pro His Leu 1195 1200 1205

Leu Thr Ser Pro Lys Pro Ser Leu Thr Ser Ser Pro Leu Leu Ala Trp 1210 1215 1220

Trp Glu Trp Asp Thr Lys Phe Leu Cys Pro Leu Cys Met Arg Ser Arg 1225 1230 1235

Arg Pro Arg Leu Glu Thr Ile Leu Ala Leu Leu Val Ala Leu Gln Arg 1240 1245 1250

Leu Pro Val Arg Leu Pro Glu Gly Glu Ala Leu Gln Cys Leu Thr Glu 1255 1260 1265 1270

Arg Ala Ile Gly Trp Gln Asp Arg Ala Arg Lys Ala Leu Ala Phe Glu 1275 1280 1285

Asp Val Thr Ala Leu Leu Arg Gln Leu Ala Glu Leu Arg Gln Gln Leu 1290 1295 1300

Gln Ala Lys Pro Arg Pro Glu Glu Ala Ser Val Tyr Thr Ser Ala Thr 1305 1310 1315

Ala Cys Asp Pro Ile Arg Glu Gly Ser Gly Asn Asn Ile Ser Lys Val 1320 1325 1330

Gln Gly Leu Leu Glu Asn Gly Asp Ser Val Thr Ser Pro Glu Asn Met 1335 1340 1345 1350

Ala Pro Gly Lys Gly Ser Asp Leu Glu Leu Leu Ser Ser Leu Leu Pro 1355 1360 1365

Gln Leu Thr Gly Pro Val Leu Glu Leu Pro Glu Ala Ile Arg Ala Pro 1370 1375 1380

Leu Glu Glu Leu Met Met Glu Gly Gly Leu Leu Glu Val Thr Leu Asp 1385 1390 1395

Glu Asn His Ser Ile Trp Gln Leu Leu Gln Ala Gly Gln Pro Pro Asp 1400 1405 1410

Leu Asp Arg Ile Arg Thr Leu Leu Glu Leu Glu Lys Phe Glu His Gln 1415 1420 1425 1430

Gly Ser Arg Thr Arg Ser Arg Ala Leu Glu Arg Arg Arg Arg Arg Gln 1435 1440 1445

Lys Val Asp Gln Gly Arg Asn Val Glu Asn Leu Val Gln Gln Glu Leu 1450 1455 1460

Gln Ser Lys Arg Ala Arg Ser Ser Gly Ile Met Ser Gln Val Gly Arg 1465 1470 1475

Glu Glu Glu His Tyr Gln Glu Lys Ala Asp Arg Glu Asn Met Phe Leu 1480 1485 1490

Thr Pro Ser Thr Asp His Ser Pro Phe Leu Lys Gly Asn Gln Asn Ser 1495 1500 1505 1510

Leu Gln His Lys Asp Ser Gly Ser Ser Ala Ala Cys Pro Ser Leu Met 1515 1520 1525

Pro Leu Leu Gln Leu Ser Tyr Ser Asp Glu Gln Gln Leu 1530 1535

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single

(D) TOPOLOGY: Linear

(ii) MOLECULE TYPE: DNA primer to cDNA

(A) DESCRIPTION: Primer homologous to bases 3285-3304 of the human SMCY cDNA sequence identified by SEQ ID NO:1, above.

(iii) HYPOTHETICAL: no

(iv) ANTI-SENSE: no

(ix) FEATURE:

(A) NAME/KEY: Primer S102

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

CCTAACATCC AGGCTCTCAA 20

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single

(D) TOPOLOGY: Linear

(ii) MOLECULE TYPE: DNA primer to cDNA

(A) DESCRIPTION: Primer homologous to bases 3305-3326 of the human SMCY cDNA sequence identified by SEQ ID NO:1, above.

(iii) HYPOTHETICAL: no

(iv) ANTI-SENSE: no

(ix) FEATURE:

(A) NAME/KEY: Primer S103

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

AGAAGCTCTG ACTAAGGCAC AA 22
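The base ranges stated for primers S102 and S103 can be checked against the two codon rows of SEQ ID NO:1 that span nucleotides 3270 to 3365. The Python sketch below is illustrative only and not part of the original disclosure; the names `locate` and `reverse_complement` and the constants are assumptions introduced here. It reports the 1-based, inclusive coordinates at which each primer matches the sense strand, and also derives the complementary-strand form of a primer.

```python
# Nucleotides 3270-3365 of SEQ ID NO:1, copied from the two codon rows
# ending at 3317 and 3365 (spaces removed).
FRAGMENT_START = 3270
FRAGMENT = (
    "ATCCCTGTTCACCTGCCTAACATCCAGGCTCTCAAAGAAGCTCTGACT"
    "AAGGCACAAGCTTGGATTGCTGATGTGGATGAGATCCAAAATGGTGAC"
)

# Primer sequences from SEQ ID NO:3 and SEQ ID NO:4 (spaces removed).
PRIMERS = {
    "S102": "CCTAACATCCAGGCTCTCAA",
    "S103": "AGAAGCTCTGACTAAGGCACAA",
}

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def locate(primer, fragment=FRAGMENT, start=FRAGMENT_START):
    """Return (first, last) 1-based inclusive coordinates, or None if absent."""
    index = fragment.find(primer)
    if index < 0:
        return None
    first = start + index
    return first, first + len(primer) - 1


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), locate(seq))
    print(reverse_complement(PRIMERS["S102"]))
```

Under these assumptions the two primers match the sense strand at adjacent positions, in agreement with the base ranges given in their descriptions.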

Claims

CLAIMS

We claim:

1. A human SMCY cDNA, comprising a translation region encoding a human SMCY protein.

2. The human SMCY cDNA of claim 1, wherein the translation region consists of nucleotides 276 to 4893 of SEQ ID NO:1.

3. The human SMCY cDNA of claim 1, further comprising an untranslated sequence upstream of the translation region encoding the human SMCY protein.

4. The human SMCY cDNA of claim 3, wherein the untranslated sequence upstream of the translation region consists of nucleotides 1 to 275 of SEQ ID NO:1.

5. The human SMCY cDNA of claim 1, further comprising an untranslated sequence downstream of the translation region encoding the human SMCY protein.

6. The human SMCY cDNA of claim 5, wherein the untranslated sequence downstream of the translation region consists of nucleotides 4893 to 5466 of SEQ ID NO:1.

7. An oligonucleotide primer to human SMCY cDNA, wherein the primer consists of a sequence of nucleotides homologous or complementary to SEQ ID NO:1.

8. The oligonucleotide primer of claim 7, wherein the sequence of nucleotides is selected from the group consisting of SEQ ID NO:3 and SEQ ID NO:4.

9. A recombinant human SMCY protein having a molecular weight of approximately 1 .5 kD, wherein the protein is produced by a host organism containing recombinant human SMCY cDNA.

10. The recombinant human SMCY protein of claim 9, wherein the protein consists of an amino acid sequence as shown in SEQ ID NO:2.

11. An oligonucleotide probe to human SMCY cDNA, wherein the probe consists of a sequence of nucleotides homologous or complementary to SEQ ID NO:1.

12. A method of producing an oligonucleotide probe to human SMCY genomic DNA, comprising: a. providing at least one primer consisting of a sequence of nucleotides homologous or complementary to SEQ ID NO:1; b. providing isolated human genomic DNA; c. using the primer to amplify the isolated genomic DNA, thereby producing an amplified oligonucleotide probe.

13. An oligonucleotide probe to human SMCY genomic DNA, produced by: a. providing at least one primer consisting of a sequence of nucleotides homologous or complementary to SEQ ID NO:1; b. providing isolated human genomic DNA; c. using the primer to amplify the isolated genomic DNA.

14. A method of detecting SMCY homologs comprising: a. constructing a primer pair, wherein at least one primer of said primer pair consists of a sequence of nucleotides homologous or complementary to SEQ ID NO:1; b. amplifying a nucleic acid sample with said primer pair, resulting in an oligonucleotide of an amplified fragment of SMCY DNA; and c. detecting the oligonucleotide.

15. The method of claim 14, wherein the nucleic acid sample is DNA.

16. The method of claim 15, wherein the nucleic acid sample of DNA is obtained from a λgt10 cDNA library.

17. The method of claim 14, wherein the nucleic acid sample is RNA.

18. The method of claim 17, wherein the nucleic acid sample of RNA is poly(A)⁺ RNA isolated from embryos.