WO2000058489A9

WO2000058489A9 - Production of syringyl lignin in gymnosperms

Info

Publication number: WO2000058489A9
Application number: PCT/US2000/008083
Authority: WO
Inventors: Vincent L Chiang; Daniel T Carraway
Original assignee: Int Paper Co; Vincent L Chiang; Daniel T Carraway
Priority date: 1999-03-26
Filing date: 2000-03-24
Publication date: 2001-11-29
Also published as: WO2000058489A2; AU4034100A; WO2000058489A3

Abstract

The present invention relates to a method for producing syringyl lignin in gymnosperms. The production of syringyl lignin in gymnosperms is accomplished by genetically transforming a gymnosperm genome, which does not normally contain genes which code for enzymes necessary for production of syringyl lignin, with DNA which codes for enzymes found in angiosperms associated with production of syringyl lignin. The expression of the inserted DNA is mediated using host promoter regions in the gymnosperm. In addition, genetic sequences which code for gymnosperm lignin anti-sense mRNA may be incorporated into the gymnosperm genome in order to suppress the formation of the less preferred forms of lignin in the gymnosperm such as guaiacyl lignin.

Description

PRODUCTION OF SYRINGYL LIGNIN IN GYMNOSPERMS

This application is a continuation-in-part application of U.S. Application Ser. No. 08/991,677 filed December 16, 1997, which claims the benefit of the filing date, under 35 U.S.C. § 119, of the U.S. Provisional Application Ser. No. 60/033,381, filed December 16, 1996, both applications being incorporated by reference herein.

Field of the Invention

The invention relates to the molecular modification of gymnosperms in order to cause the production of syringyl units during lignin biosynthesis and to production and propagation of gymnosperms containing syringyl lignin. More specifically, the invention relates to the use of coniferyl aldehyde 5-hydroxylase (and polynucleotides encoding it) for the modification of lignin biosynthesis in gymnosperms.

Background of the Invention

Lignin is a major part of the supportive structure of most woody plants including angiosperm and gymnosperm trees which in turn are the principal sources of fiber for making paper and cellulosic products. In order to liberate fibers from wood structure in a manner suitable for making many grades of paper, it is necessary to remove much of the lignin from the fiber/lignin network. Lignin is removed from wood chips by treatment of the chips in an alkaline solution at elevated temperatures and pressure in an initial step of papermaking processes. The rate of removal of lignin from wood of different tree species varies depending upon lignin structure. Three different lignin structures have been identified in trees: p-hydroxyphenyl, guaiacyl and syringyl, which are illustrated in Fig. 1.

Angiosperm species, such as Liquidambar styraciflua L. [sweetgum], have lignin composed of a mixture of guaiacyl and syringyl monomer units. In contrast, gymnosperm species such as Pinus taeda L. [loblolly pine] have lignin which is devoid of syringyl monomer units. Generally speaking, the rate of delignification in a pulping process is directly proportional to the amount of syringyl lignin present in the wood. The higher delignification rates associated with species having a greater proportion of syringyl lignin result in more efficient pulp mill operations since the mills make better use of energy and capital investment and the environmental impact is lessened due to a decrease in chemicals used for delignification.

It is therefore an object of the invention to provide gymnosperm species which are easier to delignify in pulping processes.

Another object of the invention is to provide gymnosperm species such as loblolly pine which contain syringyl lignin.

An additional object of the invention is to provide a method for modifying genes involved in lignin biosynthesis in gymnosperm species so that production of syringyl lignin is increased while production of guaiacyl lignin is suppressed.

Still another object of the invention is to produce whole gymnosperm plants containing genes which increase production of syringyl lignin and repress production of guaiacyl lignin.

Yet another object of the invention is to identify, isolate and/or clone those genes in angiosperms responsible for production of syringyl lignin.

A further object of the invention is to provide, in gymnosperms, genes which produce syringyl lignin.

Another object of the invention is to provide a method for making an expression cassette insertable into a gymnosperm cell for the purpose of inducing formation of syringyl lignin in a gymnosperm plant derived from the cell.

A further object of the invention is to provide for identification and isolation of a polynucleotide encoding a coniferyl aldehyde 5-hydroxylase from an angiosperm species, and for the use of such polynucleotide to alter the lignin biosynthesis in a gymnosperm.

Summary of the Invention

With regard to the above and other objects, the invention provides a method for inducing production of syringyl lignin in gymnosperms and to gymnosperms which contain syringyl lignin for improved delignification in the production of pulp for papermaking and other applications. In accordance with one of its aspects, the invention involves cloning an angiosperm DNA sequence which codes for enzymes involved in production of syringyl lignin monomer units, fusing the angiosperm DNA sequence to a lignin promoter region to form an expression cassette, and inserting the expression cassette into a gymnosperm genome.

Enzymes required for production of syringyl lignin in an angiosperm are obtained by deducing an amino acid sequence of the enzyme, extrapolating an mRNA sequence from the amino acid sequence, constructing a probe for the corresponding DNA sequence and cloning the DNA sequence which codes for the desired enzyme. A promoter region specific to a gymnosperm lignin biosynthesis gene is identified by constructing a probe for a gymnosperm lignin biosynthesis gene, sequencing the 5' flanking region of the DNA which encodes the gymnosperm lignin biosynthesis gene to locate a promoter sequence, and then cloning that sequence. An expression cassette is constructed by fusing the angiosperm syringyl lignin DNA sequence to the gymnosperm promoter DNA sequence. Alternatively, the angiosperm syringyl lignin DNA is fused to a constitutive promoter to form an expression cassette. The expression cassette is inserted into the gymnosperm genome to transform the gymnosperm genome. Cells containing the transformed genome are selected and used to produce a transformed gymnosperm plant containing syringyl lignin.

In accordance with the invention, the angiosperm genomic sequences bi-OMT, 4CL, P450-1 and P450-2 have been determined and isolated as associated with production of syringyl lignin in sweetgum, and lignin promoter regions for the gymnosperm loblolly pine have been determined to be the 5' flanking regions for the 4CL1B, 4CL3B and PAL gymnosperm lignin genes. Expression cassettes containing sequences of selected genes from sweetgum have been inserted into loblolly pine embryogenic cells and presence of sweetgum genomic sequences associated with production of syringyl lignin has been confirmed in daughter cells of the resulting loblolly pine embryogenic cells.

The invention therefore enables production of gymnosperms such as loblolly pine containing polynucleotides which code the enzymes involved in the production of syringyl lignin, to thereby produce in such species syringyl lignin in the wood structure for enhanced pulpability.

Brief Description of the Drawings

The above and other aspects of the invention will now be further described in the following detailed specification considered in conjunction with the following drawings in which:

Fig. 1 illustrates a generalized pathway for lignin synthesis;

Fig. 2A-2E (collectively referred to herein as Fig. 2) illustrate a bifunctional-O-methyl transferase (bi-OMT) nucleotide and amino acid sequences involved in the production of syringyl lignin in an angiosperm (SEQ ID NOS: 5 and 6);

Fig. 3A-3G (collectively referred to herein as Fig. 3) illustrate a 4-coumarate CoA ligase (4CL) nucleotide and amino acid sequences involved in the production of syringyl lignin in an angiosperm (SEQ ID NOS: 7 and 8);

Fig. 4A-4G (collectively referred to herein as Fig. 4) illustrate a P450-1 nucleotide and amino acid sequences involved in the production of syringyl lignin in an angiosperm (SEQ ID NOS: 1 and 2);

Fig. 5A-5G (collectively referred to herein as Fig. 5) illustrate a P450-2 nucleotide and amino acid sequences involved in the production of syringyl lignin in an angiosperm (SEQ ID NOS: 3 and 4);

Fig. 6 illustrates a nucleotide sequence of the 5' flanking region of the loblolly pine 4CL1B gene showing the location of regulatory elements for lignin biosynthesis (SEQ ID NO: 10);

Fig. 7A-7B (collectively referred to herein as Fig. 7) illustrate a nucleotide sequence of the 5' flanking region of the loblolly pine 4CL3B gene showing the location of regulatory elements for lignin biosynthesis (SEQ ID NO: 11);

Fig. 8A-8B (collectively referred to herein as Fig. 8) illustrate a nucleotide sequence of the 5' flanking region of the loblolly pine PAL gene showing the location of regulatory elements for lignin biosynthesis (SEQ ID NO: 9); and

Fig. 9 illustrates a PCR confirmation of the sweetgum P450-1 gene sequence in transgenic loblolly pine cells.

Detailed Description of the Invention

All patents, patent applications and references cited in this specification are hereby incorporated herein by reference in their entirety. In case of any inconsistency, the present disclosure governs.

Definitions

The term "promoter" refers to a DNA sequence in the 5' flanking region of a given gene which is involved in recognition and binding of RNA polymerase and other transcriptional proteins and is required to initiate DNA transcription in cells.

The term "constitutive promoter" refers to a promoter which activates transcription of a desired coding sequence, and is commonly used in creation of an expression cassette designed for preliminary experiments relative to testing of gene function. An example of a constitutive promoter is 35S CaMV, available from Clontech, Palo Alto, CA.

The term "polynucleotide" is intended to include double or single stranded genomic and cDNA, RNA, any synthetic and genetically manipulated polynucleotide, and both sense and anti-sense strands together or individually (although only the sense or anti-sense strand may be represented herein). This includes single- and double-stranded molecules, i.e., DNA-DNA, DNA-RNA and RNA-RNA hybrids, as well as "protein nucleic acids" (PNA) formed by conjugating bases to an amino acid backbone. This also includes nucleic acids containing modified bases, for example thio-uracil, thio-guanine and fluoro-uracil. An "isolated" nucleic acid or polynucleotide as used herein refers to a component that is removed from its original environment (for example, its natural environment if it is naturally occurring). An isolated nucleic acid or polypeptide may contain less than about 50%, preferably less than about 75%, and most preferably less than about 90%, of the cellular components with which it was originally associated. A polynucleotide amplified using PCR so that it is sufficiently and easily distinguishable (on a gel, for example) from the rest of the cellular components is considered "isolated". The polynucleotides and polypeptides of the invention may be "substantially pure," i.e., having the highest degree of purity that can be achieved using purification techniques known in the art.

The term "hybridization" refers to a process in which a strand of nucleic acid joins with a complementary strand through base pairing.

Polynucleotides are "hybridizable" to each other when at least one strand of one polynucleotide can anneal to a strand of another polynucleotide under defined stringency conditions. Hybridization requires that the two polynucleotides contain substantially complementary sequences; depending on the stringency of hybridization, however, mismatches may be tolerated. Typically, hybridization of two sequences at high stringency (such as, for example, in an aqueous solution of 0.5X SSC at 65 °C) requires that the sequences exhibit some high degree of complementarity over their entire sequence. Conditions of intermediate stringency (such as, for example, an aqueous solution of 2X SSC at 65 °C) and low stringency (such as, for example, an aqueous solution of 2X SSC at 55 °C), require correspondingly less overall complementarity between the hybridizing sequences. (1X SSC is 0.15 M NaCl, 0.015 M Na citrate.) As used herein, the above solutions and temperatures refer to the probe-washing stage of the hybridization procedure. The term "a polynucleotide that hybridizes under stringent (low, intermediate) conditions" is intended to encompass both single and double-stranded polynucleotides although only one strand will hybridize to the complementary strand of another polynucleotide.

The term "% identity" refers to the percentage of the nucleotides/amino acids of one polynucleotide/polypeptide that are identical to the nucleotides/amino acids of another sequence of polynucleotide/polypeptide as identified by program GAP from Genetics Computer Group Wisconsin package (version 9.0) (Madison, WI). GAP uses the algorithm of Needleman and Wunsch (J Mol Biol 48: 443-453, 1970) to find the alignment of two complete sequences that maximizes the number of matches and minimizes the number of gaps. When parameters required to run the above algorithm are not specified, the default values offered by the program are contemplated. The following parameters are used by the GCG program GAP as default values (for polynucleotides): gap creation penalty: 50; gap extension penalty: 3; scoring matrix: nwsgapdna.cmp (local data file).

The "% similarity" between two polypeptide sequences is a function of the number of similar positions shared by two sequences on the basis of the scoring matrix used divided by the number of positions compared and then multiplied by 100 as identified by program GAP from Genetics Computer Group Wisconsin package (version 9.0) (Madison, WI). GAP uses the algorithm of Needleman and Wunsch (J Mol Biol 48: 443-453, 1970) to find the alignment of two complete sequences that maximizes the number of matches and minimizes the number of gaps. When parameters required to run the above algorithm are not specified, the default values offered by the program are contemplated. The following parameters are used by the GCG program GAP as default values (for polypeptides): gap creation penalty: 12; gap extension penalty: 4; scoring matrix: Blosum62.cmp (local data file).
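The "% identity" calculation described above can be illustrated with a minimal sketch of Needleman-Wunsch global alignment. This is not the GCG GAP program: the sketch assumes a single linear gap penalty and simple match/mismatch scores (GAP uses separate gap creation and gap extension penalties and a scoring matrix), and it assumes that identity is counted over all alignment columns, including gapped ones. The function names and score values are hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Globally align two sequences; returns the two gapped strings.

    Simplified scoring (assumption): one linear gap penalty instead of
    GAP's separate gap creation and gap extension penalties.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical columns divided by alignment length (an assumed convention)."""
    al_a, al_b = needleman_wunsch(a, b)
    if not al_a:
        return 0.0
    identical = sum(x == y and x != "-" for x, y in zip(al_a, al_b))
    return 100.0 * identical / len(al_a)


print(needleman_wunsch("ACGT", "AGT"))  # ('ACGT', 'A-GT')
print(percent_identity("ACGT", "AGT"))  # 75.0
```

With real sequences, the thresholds recited in this specification (for example, at least about 75% identity) would be evaluated with GAP and its default parameters rather than with this simplified scoring.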

The term "expression cassette" refers to a double stranded DNA sequence which contains both a promoter and a protein coding sequence such that expression of a given protein is achieved upon insertion of the expression cassette into a cell.

The term "plant" includes whole plants and portions of plants, including plant organs (e.g. roots, stems, leaves, etc.).

The term "angiosperm" refers to plants which produce seeds encased in an ovary. A specific example of an angiosperm is Liquidambar styraciflua (L.) [sweetgum]. The angiosperm sweetgum produces syringyl lignin.

The term "gymnosperm" refers to plants which produce naked seeds, that is, seeds which are not encased in an ovary. A specific example of a gymnosperm is Pinus taeda (L.) [loblolly pine]. The gymnosperm loblolly pine does not produce syringyl lignin.

The abbreviation "CAld5H" refers to coniferyl aldehyde 5-hydroxylase.

The phrase "a polynucleotide having the property of encoding a CAld5H" refers to a polynucleotide that encodes a protein that mediates the conversion of coniferyl aldehyde into 5-hydroxyconiferyl aldehyde.

Description of the Invention

In accordance with the invention, a method is provided for modifying a gymnosperm genome, such as the genome of a loblolly pine, so that syringyl lignin will be produced in the resulting plant, thereby enabling cellulosic fibers of the same to be more easily separated from lignin in a pulping process. In general, this is accomplished by fusing one or more angiosperm DNA sequences (referred to at times herein as the "ASL DNA sequences"), which are involved in production of syringyl lignin, to a gymnosperm lignin promoter region (referred to at times herein as the "GL promoter region") specific to genes involved in gymnosperm lignin biosynthesis to form a gymnosperm syringyl lignin expression cassette (referred to at times herein as the "GSL expression cassette"). Alternatively, the one or more ASL DNA sequences are fused to one or more constitutive promoters to form a GSL expression cassette.

The GSL expression cassette preferably also includes at least one selectable marker gene which enables transformed cells to be differentiated from untransformed cells. The GSL expression cassette containing selectable marker genes is inserted into the gymnosperm genome and transformed cells are identified and selected, from which whole gymnosperm plants may be produced which exhibit production of syringyl lignin.
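As a purely illustrative aid, the parts of a GSL expression cassette named above (a promoter, an ASL coding sequence, and optionally a selectable marker) can be modeled as a small record. The class name, field names and the short placeholder sequences below are hypothetical and are not taken from the sequence listing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExpressionCassette:
    """Hypothetical record of a promoter fused upstream of a coding sequence."""
    promoter_name: str           # e.g. a GL promoter or a constitutive promoter
    promoter_seq: str            # placeholder bases, not a real promoter
    coding_name: str             # e.g. an ASL DNA sequence such as P450-2
    coding_seq: str              # placeholder bases, not a real gene
    marker_name: Optional[str] = None  # selectable marker, if any

    def sequence(self) -> str:
        # The promoter sits 5' of the coding sequence in the fused cassette.
        return self.promoter_seq + self.coding_seq

    def has_selectable_marker(self) -> bool:
        return self.marker_name is not None


cassette = ExpressionCassette(
    promoter_name="4CL1B 5' flanking region",
    promoter_seq="TATAAA",       # toy stand-in
    coding_name="P450-2",
    coding_seq="ATGGCC",         # toy stand-in
    marker_name="hypothetical marker",
)
print(cassette.sequence())               # TATAAAATGGCC
print(cassette.has_selectable_marker())  # True
```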

In one embodiment of the invention, the ASL DNA sequence of the invention is a polynucleotide encoding a coniferyl aldehyde 5-hydroxylase (hereinafter "CAld5H," also referred to herein as "P450-2") or a functional fragment thereof, wherein the polynucleotide originates from an angiosperm, for example, a sweetgum, Eucalyptus, Acacia, and Populus spp. The polynucleotide encoding the sweetgum coniferyl aldehyde 5-hydroxylase may have the sequence of nucleotides 74-1606 of SEQ ID NO:3. However, any polynucleotide encoding the coniferyl aldehyde 5-hydroxylase having the amino acid sequence represented in Figure 5 (SEQ ID NO:4) is within the scope of the invention. These polynucleotides are referred to herein as "sequence conservative variants" of SEQ ID NO:3.
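The coordinate range "nucleotides 74-1606" is 1-based and inclusive, so it spans 1606 - 74 + 1 = 1533 nucleotides, or 511 triplets. The following sketch shows how such a range maps onto a zero-based string slice. The function name is hypothetical, and the placeholder string of "N" characters stands in for SEQ ID NO:3, which is not reproduced here.

```python
def coding_region(seq: str, start: int = 74, end: int = 1606) -> str:
    """Return the 1-based, inclusive range [start, end] of a sequence."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("coordinates fall outside the sequence")
    # 1-based inclusive [start, end] corresponds to the slice [start-1:end].
    return seq[start - 1:end]


placeholder = "N" * 1700           # stand-in for the full cDNA, not real data
region = coding_region(placeholder)
print(len(region))                 # 1533
print(len(region) // 3)            # 511
```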

Further within the scope of the invention are polynucleotides that hybridize under the conditions of low, medium or high stringency to the polynucleotide comprising the nucleotides 74-1606 of SEQ ID NO:3. In certain embodiments of the invention, these hybridizable polynucleotides are at least about 500 bp long, and more preferably at least about 1000 bp long. In one embodiment of the invention, the hybridizable polynucleotide is about the same length as the polynucleotide to which it hybridizes. The hybridizable polynucleotides may have the property of encoding a CAld5H or a functional portion thereof.

Also within the scope of the invention are polynucleotides that have at least about 75%, preferably at least about 85%, and most preferably at least about 95% identity to the polynucleotide having the sequence of nucleotides 74-1606 of SEQ ID NO:3. These polynucleotides may have the property of encoding a CAld5H.

Polypeptides that have at least about 85%, preferably at least about 90%, and most preferably at least about 95% similarity to the polypeptide of SEQ ID NO:4 are also within the scope of the invention. These polypeptides may have the property of a CAld5H. Polynucleotides encoding these polypeptides are also within the scope of the invention.

In a further embodiment, polypeptides comprising the amino acid sequence of the N-terminal region of the protein having the sequence of SEQ ID NO:4 are also within the scope of the invention. These polypeptides have the function of CAld5H. Polynucleotides encoding these polypeptides are also within the scope of the invention.

The invention further provides for a method of isolating a polynucleotide encoding a coniferyl aldehyde 5-hydroxylase or a fragment thereof from any angiosperm species. The method comprises using a fragment of SEQ ID NO:3 to identify and isolate a polynucleotide encoding a coniferyl aldehyde 5-hydroxylase or a fragment thereof from an angiosperm species. Screening methods to be used in combination with the fragments of SEQ ID NO:3 are well known to persons of skill in the art, and are described, for example in Sambrook, J., Fritsch, E. F. and Maniatis, T. (1989), Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, or PCR Protocols, A Guide to Methods and Applications, Innis, M.A., et al., Eds., Academic Press, Inc., San Diego, CA (1990).

In one embodiment, the method of isolating a polynucleotide encoding a coniferyl aldehyde 5-hydroxylase or a fragment thereof from any angiosperm species comprises screening a genomic or a cDNA library of that species with a probe, which is a fragment of the coding region represented by nucleotides 74-1606 of SEQ ID NO:3. The probe may be from about 400 to about 500 nucleotides long, preferably from about 900 to about 1000 nucleotides long, and most preferably from about 1400 to about 1500 nucleotides long. Any portion of the coding region represented by nucleotides 74-1606 of SEQ ID NO:3 may be used. The probes may be labeled as is well known in the art and used to screen a library of interest. Any screening method known in the art, including hybridization of filter lifts with a radioactively-labeled probe, is within the scope of the invention.

In another embodiment, the method of isolating a polynucleotide encoding a coniferyl aldehyde 5-hydroxylase or a fragment thereof from any angiosperm species comprises amplifying, in a PCR reaction, a fragment of a genomic DNA or cDNA using as primers portions of the coding region represented by nucleotides 74-1606 of SEQ ID NO:3. Based on this sequence, a person of skill in the art may construct a number of primer pairs (each pair having a forward and a reverse primer). The primer may be from about 15-17 nucleotides long, preferably from about 17-19 nucleotides long, and most preferably from about 21-23 nucleotides long. Any portion of the coding region represented by nucleotides 74-1606 of SEQ ID NO:3 may be used.
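A forward primer reads the same as the sense strand at the 5' end of the target, while a reverse primer is the reverse complement of the sense strand at the 3' end of the target. The sketch below illustrates that relationship only; it performs no melting-temperature or specificity checks, the function names are hypothetical, and the 20-base template is a toy stand-in rather than part of SEQ ID NO:3.

```python
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse, to read the opposite strand 5'->3'."""
    return seq.upper().translate(_DNA_COMPLEMENT)[::-1]


def primer_pair(template: str, fwd_start: int, rev_end: int, length: int = 21):
    """Pick a forward and reverse primer flanking a target on the sense strand.

    fwd_start and rev_end are 1-based positions on the sense strand marking
    the first and last base of the amplified fragment.
    """
    if fwd_start < 1 or rev_end > len(template) or rev_end - fwd_start + 1 < 2 * length:
        raise ValueError("target is out of range or too short for two primers")
    forward = template[fwd_start - 1:fwd_start - 1 + length]
    reverse = reverse_complement(template[rev_end - length:rev_end])
    return forward, reverse


toy_template = "ATGCATTTGGCCAAGGTCAG"   # hypothetical 20-base sense strand
print(primer_pair(toy_template, 1, 20, length=5))  # ('ATGCA', 'CTGAC')
```

The default length of 21 reflects the "most preferably" range of about 21-23 nucleotides given above; any other length in the recited ranges could be passed instead.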

In another embodiment, an angiosperm CAld5H protein, such as sweetgum protein, may be used, according to the methods well known in the art, to prepare polyclonal and monoclonal antibodies, which in turn may be used to screen an expression library of the angiosperm of interest.

Polynucleotides encoding CAld5H or a fragment thereof that are isolated according to the above described methods are within the scope of the invention and may be referred to as ASL DNA sequences. Thus, these polynucleotides may be used for the preparation of expression cassettes and thereby for the introduction into the cell of a gymnosperm. The cells containing isolated polynucleotides encoding angiosperm CAld5H of the invention (or a functional portion thereof) may then be regenerated into plants having the property of syringyl lignin biosynthesis.

The invention further relates to the angiosperm CAld5H enzyme, which has the property of catalyzing the 5-hydroxylation of coniferyl aldehyde to produce 5-hydroxyconiferyl aldehyde. In one embodiment, the CAld5H enzyme of the invention has at least a 50 fold greater specificity constant (kcat/Km) for coniferyl aldehyde than for ferulic acid, preferably at least 100 fold greater specificity constant, and most preferably at least 140 fold greater specificity constant, when measured in the assay according to Example 8. The term "specificity constant" is well known in the art.
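The fold difference in specificity constant referred to above is the ratio of kcat/Km for one substrate to kcat/Km for the other. The sketch below shows the arithmetic only. The kinetic values are hypothetical numbers chosen for easy arithmetic; they are not the measurements of Example 8, and the function names are illustrative.

```python
def specificity_constant(kcat: float, km: float) -> float:
    """kcat/Km for one enzyme-substrate pair (units follow the inputs)."""
    if km <= 0:
        raise ValueError("Km must be positive")
    return kcat / km


def fold_preference(kcat_a: float, km_a: float, kcat_b: float, km_b: float) -> float:
    """How many times larger kcat/Km is for substrate A than for substrate B."""
    return specificity_constant(kcat_a, km_a) / specificity_constant(kcat_b, km_b)


# Hypothetical values: substrate A plays the role of coniferyl aldehyde,
# substrate B the role of ferulic acid.
fold = fold_preference(kcat_a=8.0, km_a=2.0, kcat_b=1.0, km_b=16.0)
print(fold)           # 64.0
print(fold >= 50)     # True: meets the "at least 50 fold" embodiment
print(fold >= 100)    # False: does not meet the "at least 100 fold" embodiment
```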

Preferably, the CAld5H enzyme of the invention catalyzes the first step in the biosynthetic pathway of syringyl lignin. This was established by showing that 5-hydroxyconiferyl aldehyde, the CAld5H reaction product, was efficiently converted into sinapyl aldehyde in the presence of the Escherichia coli-expressed sweetgum caffeate O-methyltransferase (bi-OMT).

A method for altering lignin biosynthesis in a gymnosperm cell or a plant by introducing a polynucleotide encoding CAld5H or a functional portion thereof into the cell or the plant is also within the scope of the invention. Any method for introducing the polynucleotide of the invention into a cell or a plant known to a person of skill in the art, including the micro-projectile bombardment described in the Examples, may be used. A method for initiating syringyl lignin biosynthesis in a gymnosperm cell or a plant by expressing an angiosperm CAld5H or a functional portion thereof in a gymnosperm cell or plant is also within the scope of the invention. Any method for introducing the polynucleotide of the invention into a cell or a plant known to a person of skill in the art, including the micro-projectile bombardment described in the Examples, may be used. The initial step of "introducing" is then followed by the expression of the angiosperm CAld5H or a functional portion thereof.

The invention also provides for a method of mediating a conversion of coniferyl aldehyde to 5-hydroxyconiferyl aldehyde by contacting coniferyl aldehyde with the angiosperm CAld5H enzyme of the invention so that 5-hydroxylation of the coniferyl aldehyde occurs, such as described in Example 8.

To suppress production of less preferred forms of lignin in gymnosperms, such as guaiacyl lignin, genes from the gymnosperm associated with production of these less preferred forms of lignin are identified and isolated, and the DNA sequence coding for anti-sense mRNA (referred to at times herein as the "GL anti-sense sequence") for these genes is produced. The DNA sequence coding for anti-sense mRNA is then incorporated into the gymnosperm genome; when expressed, the anti-sense mRNA binds to the less preferred guaiacyl gymnosperm lignin mRNA, inactivating it.
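The anti-sense approach relies on base pairing: an anti-sense RNA is the reverse complement of the target (sense) mRNA, so the two strands can anneal in antiparallel orientation. The following sketch shows that relationship for a toy RNA string. The function names and the six-base example are hypothetical and do not correspond to any gymnosperm lignin gene.

```python
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_rna(sense_mrna: str) -> str:
    """Return the anti-sense RNA (5'->3') for a sense mRNA given 5'->3'."""
    return sense_mrna.upper().translate(_RNA_COMPLEMENT)[::-1]


def is_fully_complementary(sense: str, antisense: str) -> bool:
    """True if the strands pair base-for-base in antiparallel orientation."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    if len(sense) != len(antisense):
        return False
    return all((s, a) in pairs for s, a in zip(sense.upper(), antisense.upper()[::-1]))


toy_sense = "AUGGCU"                 # hypothetical mRNA fragment
toy_anti = antisense_rna(toy_sense)
print(toy_anti)                                      # AGCCAU
print(is_fully_complementary(toy_sense, toy_anti))   # True
```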

Further features of these and various other steps and procedures associated with practice of the invention will now be described in more detail beginning with identification and isolation of ASL DNA sequences of interest for use in inducing production of syringyl lignin in a gymnosperm.

I. Determination of DNA Sequence For Genes Associated with Production of Syringyl Lignin

The general biosynthetic pathway for production of lignin has been postulated as shown in Fig. 1. From Fig. 1, it can be seen that the genes encoding 4-coumarate CoA ligase (referred to herein as 4CL or CCL), O-methyl transferase (referred to herein as OMT) and F5H (which is from the class of P450 genes) may play key roles in production of syringyl lignin in some plant species, but their specific contributions and mechanisms remain to be positively established. It is suspected that the CCL, OMT and F5H genes may have specific equivalents in a specific angiosperm, such as sweetgum. Accordingly, one aim of the present invention is to identify, sequence and clone specific genes of interest from an angiosperm such as sweetgum which are involved in production of syringyl lignin and to then introduce those genes into the genome of a gymnosperm, such as loblolly pine, to induce production of syringyl lignin. The term "gene" is used herein broadly; thus, it refers not only to the coding sequence with the upstream promoter, but also to the coding sequence alone.

Genes of interest may be identified in various ways, depending on how much information about the gene is already known. Genes believed to be associated with production of syringyl lignin have already been sequenced from a few angiosperm species, viz, CCL and OMT.

DNA sequences of the various CCL and OMT genes are compared to each other to determine if there are conserved regions. Once the conserved regions of the DNA sequences are identified, primers homologous to the conserved sequences are synthesized. Reverse transcription of the DNA-free total RNA which was purified from sweetgum xylem tissue, followed by double PCR using gene-specific primers, enables production of probes for the CCL and OMT genes. A sweetgum cDNA library is constructed in a host, such as lambda ZAPII, available from Stratagene, La Jolla, CA, using poly(A)+ RNA isolated from sweetgum xylem, according to the methods described by Bugos et al. (1995 BioTechniques 19:734-737). The above mentioned probes are used to assay the sweetgum cDNA library to locate cDNA which codes for enzymes involved in production of syringyl lignin. Once a syringyl lignin sequence is located, it is then cloned and sequenced according to known methods which are familiar to those of ordinary skill.

In accordance with the invention, two sweetgum syringyl lignin genes have been determined using the above-described technique. These genes have been designated 4CL and bi-OMT. The sequence obtained for the sweetgum syringyl lignin gene, designated bi-OMT, is illustrated in Fig. 2 (SEQ ID NOS: 5 and 6). The sequence obtained for the sweetgum syringyl lignin gene, designated 4CL, is illustrated in Fig. 3 (SEQ ID NOS: 7 and 8).

An alternative procedure was employed to identify the F5H equivalent genes in sweetgum. Because the DNA sequences for similar P450 genes from other plant species were known, probes for the P450 genes were designed based on the conserved regions found by comparing the known sequences for similar P450 genes. The known P450 sequences used for comparison include all plant P450 genes in the GenBank database. Primers were designed based on two highly conserved regions which are common to all known plant P450 genes. The primers were then used in a PCR reaction with the sweetgum cDNA library as a template. Once P450-like fragments were located, they were amplified using standard PCR techniques, cloned into a pBluescript vector available from Stratagene of La Jolla, CA and transformed into a DH5 E. coli strain available from Gibco BRL of Gaithersburg, MD.
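The idea of designing primers from regions that are conserved across known genes can be illustrated with a simple scan over sequences that have already been aligned to equal length. This is only a sketch under stated assumptions: it requires perfect conservation in every column of a window and treats gap characters as non-conserved, whereas practical primer design tolerates some degeneracy. The function name and the three toy sequences are hypothetical.

```python
from typing import List, Tuple


def conserved_windows(aligned: List[str], window: int) -> List[Tuple[int, str]]:
    """List (0-based start, sequence) for windows identical in all sequences."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be pre-aligned to the same length")
    # A column is conserved when every sequence has the same non-gap base.
    conserved = [
        aligned[0][i] != "-" and len({s[i] for s in aligned}) == 1
        for i in range(length)
    ]
    return [
        (start, aligned[0][start:start + window])
        for start in range(length - window + 1)
        if all(conserved[start:start + window])
    ]


toy_alignment = [
    "ATGCCGTTAA",
    "ATGCCGATAA",
    "ATGCCGCTAA",
]
print(conserved_windows(toy_alignment, window=4))
# [(0, 'ATGC'), (1, 'TGCC'), (2, 'GCCG')]
```

In the toy alignment only column 6 varies, so every 4-base window within columns 0-5 qualifies, while the conserved stretch at columns 7-9 is too short.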

After E. coli colonies were tested in order to determine that they contained the P450-like DNA fragments, the fragments were sequenced. Several P450-like sequences were located in sweetgum using the above described technique. One P450-like sequence was sufficiently different from other known P450 sequences to indicate that it represented a new P450 gene family. This potentially new P450 cDNA fragment was used as a probe to screen two full length clones from the sweetgum xylem cDNA. These putative hydroxylase clones were designated P450-1 and P450-2. The sequences obtained for P450-1 and P450-2 are illustrated in Fig. 4 (SEQ ID NOS: 1 and 2) and Fig. 5 (SEQ ID NOS: 3 and 4), respectively.

II. Identification of GL Gene Promoter Regions

In order to locate gymnosperm lignin promoter regions, probes are developed to locate lignin genes. After the gymnosperm lignin gene is located, the portion of DNA upstream from the gene is sequenced, preferably using the GenomeWalker Kit, available from Clontech. The portion of DNA upstream from the lignin gene will generally contain the gymnosperm lignin promoter region.
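Once a gene's start position within a genomic clone is known, the candidate promoter region is simply the stretch of sequence immediately 5' of that position. The sketch below extracts such a 5' flanking region from a string. The function name, the toy genomic sequence and the flank lengths are hypothetical, and the gene is assumed to lie on the strand given.

```python
def five_prime_flank(genomic: str, gene_start: int, flank_len: int) -> str:
    """Return up to flank_len bases immediately upstream of the gene.

    gene_start is the 1-based position of the gene's first base on the
    given strand; fewer bases are returned if the clone ends sooner.
    """
    if not (1 <= gene_start <= len(genomic)) or flank_len < 0:
        raise ValueError("gene_start or flank_len is out of range")
    begin = max(0, gene_start - 1 - flank_len)
    return genomic[begin:gene_start - 1]


toy_genomic = "TTTTATAAAAGGATGCCC"   # hypothetical clone; gene begins at the ATG
gene_start = toy_genomic.index("ATG") + 1
print(gene_start)                                    # 13
print(five_prime_flank(toy_genomic, gene_start, 6))  # AAAAGG
```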

Gymnosperm genes of interest include CCL-like genes and PAL-like genes, which are believed to be involved in the production of lignin in gymnosperms. Preferred probe sequences are developed based on previously sequenced genes, which are available from GenBank. The preferred GenBank accession numbers for the CCL-like genes include U39404 and U39405. A preferred GenBank accession number for a PAL-like gene is U39792. Probes for such genes are constructed according to methods familiar to those of ordinary skill in the art. A genomic DNA library is constructed and DNA fragments which code for gymnosperm lignin genes are then identified using the above mentioned probes. A preferred DNA library is obtained from the gymnosperm, Pinus taeda (L.) [Loblolly Pine], and a preferred host of the genomic library is Lambda DASH II, available from Stratagene of La Jolla, CA.

Once the DNA fragments which code for the gymnosperm lignin genes are located, the genomic region upstream from the gymnosperm lignin gene (the 5' flanking region) is identified. This region contains the GL promoter. Three promoter regions were located from gymnosperm lignin biosynthesis genes. The first is the 5' flanking region of the loblolly pine 4CL1B gene, shown in Fig. 6 (SEQ ID NO: 10). The second is the 5' flanking region of the loblolly pine gene 4CL3B, shown in Fig. 7 (SEQ ID NO: 11). The third is the 5' flanking region of the loblolly pine gene PAL, shown in Fig. 8 (SEQ ID NO: 9).

III. Fusing the GL Promoter Region to the ASL DNA Sequence

The next step of the process is to fuse the GL promoter region to the ASL DNA sequence to make a GSL expression cassette for insertion into the genome of a gymnosperm. This may be accomplished by standard techniques. In a preferred method, the GL promoter region is first cloned into a suitable vector. Preferred vectors are pGEM7Z, available from Promega, Madison, WI, and SK, available from Stratagene, La Jolla, CA. After the promoter sequence is cloned into the vector, it is then released with suitable restriction enzymes. The ASL DNA sequence is released with the same restriction enzyme(s) and purified. The GL promoter region sequence and the ASL DNA sequence are then ligated, such as with T4 DNA ligase, available from Promega, to form the GSL expression cassette. Fusion of the GL and ASL DNA sequence is confirmed by restriction enzyme digestion and DNA sequencing. After confirmation of GL promoter-ASL DNA fusion, the GSL expression cassette is released from the original vector with suitable restriction enzymes and used in construction of vectors for plant transformation.

IV. Fusing the ASL DNA Sequence to a Constitutive Promoter Region

In an alternative embodiment, a standard constitutive promoter may be fused with the ASL DNA sequence to make a GSL expression cassette. For example, a standard constitutive promoter may be fused with P450-1 to form an expression cassette for insertion of P450-1 sequences into a gymnosperm genome. In addition, a standard constitutive promoter may be fused with P450-2 to form an expression cassette for insertion of P450-2 into a gymnosperm genome. A constitutive promoter for use in the invention is the double 35S promoter. In the preferred practice of the invention using constitutive promoters, a suitable vector, such as pBI221, is digested by XbaI and HindIII to release the 35S promoter. At the same time the vector pHygro, available from International Paper, is digested by XbaI and HindIII to release the double 35S promoter. The double 35S promoter is ligated to the previously digested pBI221 vector to produce a new pBI221 with the double 35S promoter. This new pBI221 is digested with SacI and SmaI to release the GUS fragment. The vector is next treated with T4 DNA polymerase to produce blunt ends and the vector is self-ligated. This vector is then further digested with BamHI and XbaI, available from Promega. After the pBI221 vector containing the constitutive promoter region has been prepared, lignin gene sequences are prepared for insertion into the pBI221 vector.

The coding regions of sweetgum P450-1 or P450-2 are amplified by PCR using primers with restriction sites incorporated in the 5' and 3' ends. In one example, an XbaI site was incorporated at the 5' end and a BamHI site was incorporated at the 3' end of the sweetgum P450-1 or P450-2 genes. After PCR, the P450-1 and P450-2 genes were separately cloned into a TA vector available from Invitrogen, Carlsbad, CA. The TA vectors containing the P450-1 and P450-2 genes, respectively, were digested by XbaI and BamHI to release the P450-1 or P450-2 sequences.
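The addition of restriction sites to the ends of PCR primers can be illustrated with a minimal sketch. The XbaI (TCTAGA) and BamHI (GGATCC) recognition sequences are standard; everything else here is an assumption made for illustration only, including the function names, the two-base "GC" clamp, the annealing length and the toy coding sequence, none of which is taken from this disclosure.

```python
XBAI_SITE = "TCTAGA"   # XbaI recognition sequence
BAMHI_SITE = "GGATCC"  # BamHI recognition sequence
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tailed_primers(cds: str, anneal_len: int = 18, clamp: str = "GC"):
    """Build a forward primer with an XbaI tail and a reverse primer with a BamHI tail.

    Raises ValueError if either site occurs inside the coding sequence, because
    the later XbaI/BamHI digest would then cut the insert itself.
    """
    cds = cds.upper()
    for site in (XBAI_SITE, BAMHI_SITE):
        if site in cds:
            raise ValueError(f"internal {site} site found in coding sequence")
    forward = clamp + XBAI_SITE + cds[:anneal_len]
    reverse = clamp + BAMHI_SITE + reverse_complement(cds[-anneal_len:])
    return forward, reverse


# Hypothetical toy coding sequence, far shorter than a real P450 gene.
toy_cds = "ATGGCCAAATTTGGGCCCTGA"
fwd, rev = tailed_primers(toy_cds, anneal_len=6)
```

Checking for internal sites before choosing the enzymes is the design point: the same digest that releases the fragment from the TA vector must leave the coding region intact.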

The p35SS vector, described above, and the isolated sweetgum P450-1 or P450-2 fragments were then ligated to make GSL expression cassettes containing the constitutive promoter.

V. Inserting the Expression Cassette into the Gymnosperm Genome

There are a number of methods by which the GSL expression cassette may be inserted into a target gymnosperm cell. One method of inserting the expression cassette into the gymnosperm is by micro-projectile bombardment of gymnosperm cells. For example, embryogenic tissue cultures of loblolly pine may be initiated from immature zygotic embryos. Tissue is maintained in an undifferentiated state on semi-solid proliferation medium. For transformation, embryogenic tissue is suspended in liquid proliferation medium. Cells are then sieved, preferably through a 40 mesh screen, to separate small, densely cytoplasmic cells from large vacuolar cells.

After separation, a portion of the liquid cell suspension fraction is vacuum deposited onto filter paper and placed on semi-solid proliferation medium. The prepared gymnosperm target cells are then grown for several days on filter paper discs in a petri dish.

A 1:1 mixture of plasmid DNA containing the selectable marker expression cassette and plasmid DNA containing the P450-1 expression cassette may be precipitated with gold to form microprojectiles. The microprojectiles are rinsed in absolute ethanol and aliquots are dried onto a suitable macrocarrier such as the macrocarrier available from BioRad in Hercules, CA. Prior to bombardment, embryogenic tissue is preferably desiccated under a sterile laminar-flow hood. The desiccated tissue is transferred to semi-solid proliferation medium. The prepared microprojectiles are accelerated from the macrocarrier into the desiccated target cells using a suitable apparatus such as a BioRad PDS-1000/HE particle gun. In a preferred method, each plate is bombarded once, rotated 180 degrees, and bombarded a second time. Preferred bombardment parameters are 1350 psi rupture disc pressure, 6 mm distance from the rupture disc to macrocarrier (gap distance), 1 cm macrocarrier travel distance, and 10 cm distance from macrocarrier stopping screen to culture plate (microcarrier travel distance). Tissue is then transferred to semi-solid proliferation medium containing a selection agent, such as hygromycin B, for two days after bombardment.

Other methods of inserting the GSL expression cassette include use of silicon carbide whiskers, transformed protoplasts, Agrobacterium vectors and electroporation.

VI. Identifying Transformed Cells

In general, insertion of the GSL expression cassette will be carried out in a mass of cells, and it will be necessary to determine which cells harbor the recombinant DNA molecule containing the GSL expression cassette. Transformed cells are first identified by their ability to grow vigorously on a medium containing an antibiotic which is toxic to non-transformed cells. Preferred antibiotics are kanamycin and hygromycin B. Cells which grow vigorously on antibiotic-containing medium are further tested for the presence of portions of the plasmid vector, of the syringyl lignin genes in the GSL expression cassette (e.g. the angiosperm bi-OMT, 4CL, P450-1 or P450-2 gene), or of other fragments in the GSL expression cassette. Specific methods which can be used to test for the presence of portions of the GSL expression cassette include Southern blotting with a labeled complementary probe or PCR amplification with specific complementary primers. In yet another approach, an expressed syringyl lignin enzyme can be detected by Western blotting with a specific antibody, or by assaying for a functional property such as the appearance of functional enzymatic activity.

VII. Production of a Gymnosperm Plant from the Transformed Gymnosperm Cell

Once transformed embryogenic cells of the gymnosperm have been identified, isolated and multiplied, they may be grown into plants. It is expected that all plants resulting from transformed cells will contain the GSL expression cassette in all their cells, and that wood in the secondary growth stage of the mature plant will be characterized by the presence of syringyl lignin. Transgenic embryogenic cells are allowed to replicate and develop into somatic embryos, which are then converted into somatic seedlings.

VIII. Identification, Production and Insertion of a GL mRNA Anti-Sense Sequence

In addition to adding ASL DNA sequences, anti-sense sequences may be incorporated into a gymnosperm genome, via GSL expression cassettes, in order to suppress formation of the less preferred native gymnosperm lignin. To this end, the gymnosperm lignin gene is first located and sequenced in order to determine its nucleotide sequence. Methods for locating and sequencing amino acids which have been previously discussed may be employed. For example, if the gymnosperm lignin gene has already been purified, standard sequencing methods may be employed to determine the DNA nucleic acid sequence.

If the gymnosperm lignin gene has not been purified and functionally similar DNA or mRNA sequences from similar species are known, those sequences may be compared to identify highly conserved regions and this information used as a basis for the construction of a probe. A gymnosperm cDNA or genomic library can be probed with the above mentioned sequences to locate the gymnosperm lignin cDNA or genomic DNA. Once the gymnosperm lignin DNA is located, it may be sequenced using standard sequencing methods.

After the DNA sequence has been obtained for a gymnosperm lignin sequence, the complementary anti-sense strand is constructed and incorporated into an expression cassette. For example, the GL mRNA anti-sense sequence may be fused to a promoter region to form an expression cassette as described above. In a preferred method, the GL mRNA anti-sense sequence is incorporated into the previously discussed GSL expression cassette which is inserted into the gymnosperm genome as described above.
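Constructing the complementary anti-sense strand amounts to taking the reverse complement of the sense sequence. The sketch below illustrates that operation only; the function names and the six-base toy fragment are hypothetical, and the code assumes an unambiguous DNA alphabet (A, C, G, T).

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_strand(sense_dna: str) -> str:
    """Return the reverse complement of a sense-strand DNA sequence, read 5' to 3'."""
    return sense_dna.upper().translate(_COMPLEMENT)[::-1]


def transcribe(dna: str) -> str:
    """Write a DNA sequence as RNA by replacing T with U."""
    return dna.upper().replace("T", "U")


# Hypothetical toy fragment standing in for part of a gymnosperm lignin gene.
sense_fragment = "ATGGCC"
antisense_fragment = antisense_strand(sense_fragment)  # strand placed behind the promoter
antisense_rna = transcribe(antisense_fragment)         # transcript able to pair with the mRNA
```

Applying `antisense_strand` twice returns the original sequence, which is a quick sanity check that the insert is oriented as intended.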

IX. Inclusion of Cytochrome P450 Reductase (CPR) to Enhance Biosynthesis of Syringyl Lignin in Gymnosperms

In the absence of external cofactors such as NADPH (an electron donor in reductive biosyntheses), certain angiosperm lignin genes such as the P450 genes may remain inactive or not achieve full or desired activity after insertion into the genome of a gymnosperm. Inactivity or insufficient activity can be determined by testing the resulting plant which contains the P450 genes for the presence of syringyl lignin in secondary growth. It is known that cytochrome P450 reductase (CPR) may be involved in promoting certain reductive biochemical reactions, and may activate the desired expression of genes in many plants. Accordingly, if it is desired to enhance the expression of the angiosperm syringyl lignin genes in the gymnosperm, CPR may be inserted in the gymnosperm genome. In order to express CPR, the DNA sequence of the enzyme is ligated to a constitutive promoter or, for a specific species such as loblolly pine, xylem-specific lignin promoters such as PAL, 4CL1B or 4CL3B to form an expression cassette. The expression cassette may then be inserted into the gymnosperm genome by various methods as described above.

X. Examples

The following non-limiting examples illustrate further aspects of the invention. In these examples, the angiosperm is Liquidambar styraciflua (L.) [sweetgum] and the gymnosperm is Pinus taeda (L.) [loblolly pine]. The nomenclature for the genes referred to in the examples is as follows:

Example 1 - Isolating and Sequencing bi-OMT and 4CL Genes from an Angiosperm

A cDNA library for sweetgum was constructed in Lambda ZAPII, available from Stratagene, La Jolla, CA, using poly(A)⁺ RNA isolated from sweetgum xylem tissue. Probes for bi-OMT and 4CL were obtained through reverse transcription of their mRNAs followed by double PCR using gene-specific primers which were designed based on the OMT and CCL cDNA sequences obtained from similar genes cloned from other species.

Four primers were used for amplifying OMT fragments. One was an oligo-dT primer. One was a bi-OMT primer (which was used to clone gene fragments through modified differential display technique, as described below in Example 2) and the other two were degenerate primers, which were based on the conserved sequences of all known OMTs. The two degenerate primers were derived based on the following amino acid sequences:

5'-Gly Gly Met Ala Thr Tyr Cys Cys Ala Thr Thr Tyr Ala Ala Cys Ala Ala Gly Gly Cys-3' (primer #22) and 3'-Ala Ala Ala Gly Ala Gly Ala Gly Asn Ala Cys Asn Asn Ala Asn Asn Ala Asn Gly Ala-5' (primer #23).

A 900 bp PCR product was produced when oligo-dT primer and primer #22 were used, and a 550 bp fragment was produced when primer numbers 22 and 23 were used.

Three primers were used for amplifying 4CL fragments. They were derived from the following amino acid sequences:

5'-Thr Thr Gly Gly Ala Thr Cys Cys Gly Gly Ile Ala Cys Ile Ala Cys Ile Gly Gly Ile Tyr Thr Ile Cys Cys Ile Ala Ala Arg Gly Gly-3' (primer RIS)

5'-Thr Thr Gly Gly Ala Thr Cys Cys Gly Thr Ile Gly Thr Ile Gly Cys Ile Cys Ala Arg Cys Ala Arg Gly Thr Ile Gly Ala Tyr Gly Gly-3' (primer HIS) and

3'-Cys Cys Ile Cys Thr Tyr Thr Ala Asp Ala Cys Arg Thr Ala Asp Gly Cys Ile Cys Cys Ala Gly Cys Thr Gly Thr Ala-5' (primer R2A)

RIS and HIS were both sense primers. Primer R2A was an anti-sense primer. A 650 bp fragment was produced if RIS and R2A primers were used and a 550 bp fragment was produced when primers HIS and R2A were used. The sequences of these three primers were derived from conserved sequences for plant CCLs.
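The primer listings above are written with three-letter amino acid abbreviations. One possible reading, offered here purely as an assumption and not as a statement of what the applicants intended, is that each token stands for the nucleotide symbol that shares its one-letter code (Gly for G, Ala for A, Thr for T, Cys for C, with Met, Tyr, Asn, Arg, Asp and Ser as the degenerate symbols M, Y, N, R, D and S, and Ile as I, often used for inosine). The sketch below simply performs that token-by-token substitution; the function name is hypothetical.

```python
# One-letter codes of the abbreviations that occur in the primer listings.
# "He" is accepted as a presumed scanning variant of "Ile" (an assumption).
THREE_TO_ONE = {
    "Gly": "G", "Ala": "A", "Thr": "T", "Cys": "C",
    "Met": "M", "Tyr": "Y", "Asn": "N", "Arg": "R",
    "Asp": "D", "Ser": "S", "Ile": "I", "He": "I",
}


def tokens_to_symbols(listing: str) -> str:
    """Replace each whitespace-separated three-letter token by its one-letter code."""
    return "".join(THREE_TO_ONE[token] for token in listing.split())


primer_22 = tokens_to_symbols(
    "Gly Gly Met Ala Thr Tyr Cys Cys Ala Thr Thr Tyr Ala Ala Cys Ala Ala Gly Gly Cys"
)
```

Under this reading primer #22 becomes a 20-symbol degenerate oligonucleotide, which is at least consistent with its use as a PCR primer; the reading should be checked against the sequence listing before it is relied on.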

The reverse transcription-double PCR cloning technique used for these examples consisted of adding 10 μg of DNA-free total RNA in 25 μl DEPC-treated water to a microfuge tube. Next, the following solutions were added:

a. 5x Reverse transcript buffer 8.0 μl
b. 0.1 M DTT 4.0 μl
c. 10 mM dNTP 2.0 μl
d. 100 μM oligo-dT primers 8.0 μl
e. RNasin 2.0 μl
f. Superscript II 1.0 μl

After mixing, the tube was incubated at a temperature of 42 °C for one (1) hour, followed by incubation at 70 °C for fifteen (15) minutes. Forty (40) μl of 1 N NaOH was added and the tube was further incubated at 68 °C for twenty (20) minutes. After the incubation periods, 80 μl of 1 N HCl was added to the reaction mixture. At the same time, 17 μl NaOAc, 5 μl glycogen and 768 μl of 100% ethanol were added and the reaction mixture was maintained at -80 °C for 15 minutes in order to precipitate the cDNA. The precipitated cDNA was centrifuged at high speed at 4 °C for 15 minutes. The resulting pellet was washed with 70% ethanol, dried at room temperature, and then dissolved in 20 μl of water.

The foregoing procedure produced purified cDNA which was used as a template to carry out first round PCR using primers #22 and oligo-dT for cloning OMT cDNA and primers RIS and R2A for cloning 4CL cDNA. For the first round PCR, a master mix of 50 μl for each reaction was prepared. Each 50 μl mixture contained:

a. 10x buffer 5 μl
b. 25 mM MgCl₂ 5 μl
c. 100 μM sense primer 1 μl (primer #22 for OMT and primer RIS for CCL)
d. 100 μM anti-sense primer 1 μl (oligo-dT primer for OMT and R2A for CCL)
e. 10 mM dNTP 1 μl
f. Taq DNA polymerase 0.5 μl

Of this master mix, 48 μl was added into a PCR tube containing 2 μl of cDNA for PCR. The tube was heated to 95 °C for 45 seconds, 52 °C for one minute and 72 °C for two minutes. This temperature cycle was repeated for 40 cycles and the mixture was then held at 72 °C for 10 minutes.
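The programmed length of this thermal profile follows directly from the step times given above. The sketch below adds them up; it counts only the programmed hold times and ignores the time the cycler spends ramping between temperatures, which depends on the instrument. The function name is hypothetical.

```python
def program_seconds(step_seconds, cycles, final_hold_seconds):
    """Total programmed time: cycles x (sum of step times) + final hold."""
    return cycles * sum(step_seconds) + final_hold_seconds


# 95 C for 45 s, 52 C for 60 s, 72 C for 120 s, 40 cycles, then 72 C for 10 min.
first_round_s = program_seconds([45, 60, 120], cycles=40, final_hold_seconds=600)
first_round_min = first_round_s / 60
```

With these values the programmed time comes to 9600 seconds, or 160 minutes per round, so the two rounds of the double PCR occupy more than five hours of cycler time before ramping is counted.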

The cDNA fragments obtained from the first round of PCR were used as templates to perform the second round of PCR using primers 22 and 23 for cloning bi-OMT cDNA and primers HIS and R2A for cloning 4CL cDNA. The second-round PCR conditions were the same as those of the first round.

The desired cDNA fragment was then sub-cloned and sequenced. After the second round of PCR, the product with the predicted size was excised from the gel and ligated into a pUC19 vector, available from Clontech, of Palo Alto, CA, and then transformed into DH5α, an E. coli strain available from Gibco BRL, of Gaithersburg, MD. After the inserts had been checked for correct size, the colonies were isolated and plasmids were sequenced using a Sequenase kit available from USB, of Cleveland, OH. The sequences are shown in Fig. 2 (SEQ ID NO: 5 and 6) and Fig. 3 (SEQ ID NO: 7 and 8).

Example 2 - Alternative Isolation Method of Angiosperm bi-OMT gene

As previously mentioned, one bi-OMT clone was produced via modified differential display technique. This method is another type of reverse transcription-PCR, in which DNA-free total RNA was reverse transcribed using oligo-dT primers with a single base pair anchor to form cDNA. The oligo-dT primers used for reverse transcription of mRNA to synthesize cDNA were: T11A: TTTTTTTTTTTA, T11C: TTTTTTTTTTTC, and T11G: TTTTTTTTTTTG.
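The three anchored primers follow a simple pattern: a run of eleven T residues followed by one anchor base other than T. A minimal sketch of that pattern is given below; the function name and its parameters are hypothetical and serve only to make the naming scheme explicit.

```python
def anchored_oligo_dt(t_count: int = 11, anchors: str = "ACG") -> dict:
    """Map primer names such as 'T11A' to a poly-T run plus one 3' anchor base."""
    return {f"T{t_count}{base}": "T" * t_count + base for base in anchors}


primers = anchored_oligo_dt()
for name, sequence in primers.items():
    print(name, sequence)
```

T is left out of the anchor set because a T anchor would be indistinguishable from the poly-T run itself; each remaining anchor selects the subset of mRNAs whose last base before the poly(A) tail pairs with it.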

These cDNAs were then used as templates for radioactive PCR which was conducted in the presence of the same oligo-dT primers as listed above, a bi-OMT gene-specific primer and ³⁵S-dATP. The OMT gene-specific primer was derived from the following amino acid sequence: 5'-Cys Cys Asn Gly Gly Asn Gly Gly Ser Ala Arg Gly Ala-3'.

The following PCR reaction solutions were combined in a microfuge tube:

a. H₂O 9.2 μl
b. Taq Buffer 2.0 μl
c. dNTP (25 μM) 1.6 μl
d. Primers (5 μM) 2 μl, for each primer
e. ³⁵S-dATP 1 μl
f. Taq. pol. 0.2 μl
g. cDNA 2.0 μl
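The final concentrations in this reaction can be worked out from the listed volumes with the usual dilution relation C1 x V1 = C2 x V2. The sketch below assumes that two primers are present in each tube (one anchored oligo-dT primer and the gene-specific primer), which the listing implies but does not state outright; the variable and function names are hypothetical.

```python
# Volumes in microlitres, as listed; two primers at 2 ul each are assumed.
volumes_ul = {
    "water": 9.2,
    "taq_buffer": 2.0,
    "dntp": 1.6,
    "oligo_dt_primer": 2.0,
    "gene_specific_primer": 2.0,
    "s35_datp": 1.0,
    "taq_polymerase": 0.2,
    "cdna": 2.0,
}

total_ul = sum(volumes_ul.values())


def final_concentration(stock_conc: float, added_ul: float, total: float) -> float:
    """Dilution relation C2 = C1 * V1 / V2, in the units of stock_conc."""
    return stock_conc * added_ul / total


dntp_final_um = final_concentration(25.0, volumes_ul["dntp"], total_ul)
primer_final_um = final_concentration(5.0, volumes_ul["oligo_dt_primer"], total_ul)
```

Under that assumption the listed volumes add up to a 20 μl reaction, with roughly 2 μM dNTP and 0.5 μM of each primer.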

The tube was heated to a temperature of 94° C and held for 45 seconds, then at 37° C for 2 minutes and then 72 °C for 45 seconds for forty cycles, followed by a final reaction at 72 °C for 5 minutes.

The amplified products were fractionated on a denaturing polyacrylamide sequencing gel and autoradiography was used to identify and excise the fragments with a predicted size. The designed OMT gene-specific primer had a sequence conserved in a region toward the 3'-end of the OMT cDNA sequence. This primer, together with oligo-dT, amplified an OMT cDNA fragment of about 300 bp.

Three oligo-dTs with a single base pair of A, C or G, respectively, were used to pair with the OMT gene-specific primer. Eight potential OMT cDNA fragments with predicted sizes of about 300 bp were excised from the gels after several independent PCR rounds using different combinations of oligo-dT and OMT gene-specific oligo-nucleotides as primers.

The OMT cDNA fragments were then re-amplified. A Southern blot analysis was performed for the resulting cDNAs using a 360 base-pair, ³²P radio-isotope labeled, aspen OMT cDNA 3'-end fragment as a probe to identify the cDNA fragments having a strong hybridization signal, under low stringency conditions. Eight fragments were identified.

Out of these eight cDNA fragments, three were selected based on their high hybridization signal for sub-cloning and sequencing. One clone, LsOMT3'-1 (where the "Ls" prefix indicates that the clone was derived from the Liquidambar styraciflua (L.) genome), was confirmed to encode bi-OMT based on its high homology to other lignin-specific plant OMTs at both nucleotide and amino acid sequence levels.

A cDNA library was constructed in Lambda ZAP II, available from Stratagene, of La Jolla, CA, using 5 μg poly(A)⁺ RNA isolated from sweetgum xylem tissue. The primary library consisting of approximately 0.7 x 10⁶ independent recombinants was amplified and approximately 10⁵ plaque-forming-units (pfu) were screened using a homologous 550 base-pair probe. The hybridized filter was washed at high stringency (0.25 x SSC, 0.1% SDS, 65 °C) conditions. The colony containing the bi-OMT fragment identified by the probe was eluted and the bi-OMT fragment was produced. The sequences as illustrated in Fig. 2 (SEQ ID NO: 5 and 6) were obtained.

Example 3 - Isolating and Producing the DNA which codes for the Angiosperm P450-1 Gene

In order to find putative P450 cDNA fragments as probes for cDNA library screening, a highly degenerated sense primer based on the amino acid sequence of 5'-Glu, Glu, Phe, Arg, Pro, Glu, Arg-3' was designed based on the conserved regions found in some plant P450 proteins. This conserved domain was located upstream of another highly conserved region in P450 proteins, which had an amino acid sequence of 5'-Phe Gly Xaa Gly Xaa Xaa Cys Xaa Gly-3'. This primer was synthesized with the incorporation of an XhoI restriction site to give a 26-base-pair oligomer nucleotide sequence of 5'-ATG TGC AGT TTT TTT TTT TTT TTT TT-3'.

This primer and the oligo-dT-XhoI primer were then used to perform PCR reactions with the sweetgum cDNA library as a template. The cDNA library was constructed in Lambda ZAPII, available from Stratagene, of La Jolla, CA, using poly(A)⁺ RNA isolated from sweetgum xylem tissue. Amplified fragments of 300 to 600 bp were obtained. Because the designed primer was located upstream of the highly conserved P450 domain, this design distinguished whether the PCR products were P450 gene fragments depending on whether they contained the highly conserved amino acid domain. All the fragments obtained from the PCR reaction were then cloned into a pUC19 vector, available from New England Biolabs, Beverly, MA, and transformed into a DH5α E. coli strain, available from Gibco BRL, of Gaithersburg, MD.

Twenty-four positive colonies were obtained and sequenced. Sequence analysis indicated four groupings within the twenty-four colonies. One was C4H, one was an unknown P450 gene, and two did not belong to P450 genes. Homologies of P450 genes in different species are usually more than 80%. Because the homologies between the P450 gene families found here were around 40%, the sequence analysis indicated that a new P450 gene family was sequenced. Moreover, since this P450 cDNA was isolated from xylem tissue, it was highly probable that this P450 gene was P450-1.

The novel sweetgum P450 cDNA fragment was used as a probe to screen for a full length cDNA encoding P450-1. Once the P450-1 gene was located it was sequenced. The length of the P450-1 cDNA is 1707 bp and it contains 45 bp of 5' non-coding region and 135 bp of 3' non-coding region. The deduced amino acid sequence also indicates that this P450 cDNA has a hydrophobic core at the N-terminus, which could be regarded as a leader sequence for co-translational targeting to membranes during protein synthesis. At the C-terminal region, there is a heme binding domain that is characteristic of all P450 genes. The P450-1 sequence, as illustrated in Fig. 4 (SEQ ID NO: 1 and 2), was produced according to the above described methods.
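The three lengths reported for the P450-1 cDNA can be checked for internal consistency with simple arithmetic. The sketch below assumes that the 1707 bp figure covers only the 5' non-coding region, the coding region and the 3' non-coding region, and that the coding region includes the stop codon; neither assumption is stated in the text, and the function name is hypothetical.

```python
def coding_region(total_bp: int, utr5_bp: int, utr3_bp: int):
    """Return (coding length in bp, number of codons) for a cDNA."""
    coding_bp = total_bp - utr5_bp - utr3_bp
    if coding_bp % 3 != 0:
        raise ValueError("coding length is not a whole number of codons")
    return coding_bp, coding_bp // 3


coding_bp, codons = coding_region(total_bp=1707, utr5_bp=45, utr3_bp=135)
```

Under these assumptions the remaining 1527 bp divide evenly into 509 codons, so the reported lengths are at least mutually consistent with an uninterrupted reading frame.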

Example 4 - Isolating and Producing the DNA which codes for the Angiosperm P450-2 Gene

By using a similar strategy of synthesizing PCR primers from the published literature for hydroxylase genes in plants, another full length P450 cDNA has been isolated. This cloned cDNA, designated P450-2, contains 1883 bp and encodes an open reading frame of 511 amino acids. Although the amino acid similarity shared between Arabidopsis putative F5H (Meyers et al. 1996, PNAS, 93:6869-6874) and the P450-2 sweetgum clone is about 75%, P450-2 has a completely different biochemical function from F5H, as described in Example 8.

To confirm the function of the P450-2 gene, it was expressed in E. coli, strain DH5α, via pQE vector preparation, according to directions available with the kit. A CO-Fe²⁺ binding assay was also performed to confirm the expression of P450-2 as a functional P450 gene (Omura & Sato 1964, J. of Biochemistry 239: 2370-2378; Babriac et al. 1991, Archives of Biochemistry and Biophysics 288:302-309). The CO-Fe²⁺ binding assay showed a peak at 450 nm, which indicates that P450-2 has been overexpressed as a functional P450 gene.

The P450-2 protein was further purified for production of antibodies in rabbits, and antibodies have been successfully produced. In addition, Western blots show that this antibody is specific to the membrane fraction of sweetgum and aspen xylem extract. When the P450-2 antibody was added to a reaction mixture containing aspen xylem tissue, enzyme inhibition studies showed that the activity of FA5H in aspen was reduced more than 60%, a further indication that P450-2 performs a P450-like function. Recombinant P450-2 protein co-expressed with Arabidopsis CPR protein in a baculovirus expression system hydroxylated ferulic acid (specific activity: 7.3 pKat/mg protein), cinnamic acid (specific activity: 25 pKat/mg protein), and p-coumaric acid (specific activity: 3.8 pKat/mg protein). In subsequent experiments, recombinant P450-2 protein was also co-expressed with Arabidopsis CPR protein in a yeast expression system as described in Example 8. The P450-2, which may be referred to as CAld5H, specifically mediates the biosynthesis of syringyl lignin in plants, as shown in Fig. 1. Fig. 5 (SEQ ID NO: 3 and 4) illustrates the P450-2 sequence.
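The three specific activities reported for the baculovirus-expressed protein are easier to compare when expressed relative to the highest value. The sketch below does only that normalization; the variable and function names are hypothetical, and no values beyond the three reported figures are used.

```python
# Specific activities in pKat per mg protein, as reported in the text.
specific_activity = {
    "ferulic acid": 7.3,
    "cinnamic acid": 25.0,
    "p-coumaric acid": 3.8,
}


def relative_activity(activities: dict) -> dict:
    """Express each activity as a fraction of the highest one."""
    highest = max(activities.values())
    return {substrate: value / highest for substrate, value in activities.items()}


relative = relative_activity(specific_activity)
for substrate, fraction in relative.items():
    print(f"{substrate}: {fraction:.0%} of the highest activity")
```

On this scale ferulic acid and p-coumaric acid come out at about 29% and 15% of the cinnamic acid value, respectively.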

Example 5 - Identifying Gymnosperm Promoter Regions

In order to identify gymnosperm promoter regions, sequences from loblolly pine PAL and 4CL1B and 4CL3B lignin genes were used as primers to screen the loblolly pine genomic library, using the GenomeWalker Kit. The loblolly pine PAL primer sequence was obtained from GenBank, reference number U39792. The loblolly pine 4CL1B primer sequences were also obtained from the gene bank, reference numbers U39404 and U39405. The loblolly pine genomic library was constructed in Lambda DashII, available from Stratagene, of La Jolla, CA. 3 x 10⁶ phage plaques from the genomic library of loblolly pine were screened using both the above mentioned PAL cDNA and 4CL (PCR clone) fragments as probes. Five 4CL clones were obtained after screening. Lambda DNAs of two of the five 4CL clones obtained after screening were isolated and digested by EcoRV, PstI, SalI and XbaI for Southern analysis. Southern analysis using 4CL fragments as probes indicated that both clones for the 4CL gene were identical. Results from further mapping showed that none of the original five 4CL clones contained promoter regions. When tested, the PAL clones obtained from the screening also did not contain promoter regions.

In a second attempt to clone the promoter regions associated with the PAL and 4CL genes, a Universal GenomeWalker(TM) kit, available from Clontech, was used. In the process, total DNA from loblolly pine was digested by several restriction enzymes and ligated to the adaptors provided with the kit to form libraries. Two gene-specific primers for each gene were designed (GSP1 and 2). After two rounds of PCR using these primers and the adaptor primers of the kit, several fragments were amplified from each library. A 1.6 kb fragment and a 0.6 kb fragment for the PAL gene and a 2.3 kb fragment (4CL1B) and a 0.7 kb fragment (4CL3B) for the 4CL gene were cloned, sequenced and found to contain promoter regions for all three genes. See Fig. 6 (SEQ ID NO: 10), 7 (SEQ ID NO: 11) and 8 (SEQ ID NO: 9).

Example 6 - Fusing the ASL DNA Sequence to a Constitutive Promoter Region and Inserting the Expression Cassette into a Gymnosperm Genome

As a first step, an ASL DNA sequence, P450-1, was fused with a constitutive promoter region according to the methods described in the above Section IV to form a P450-1 expression cassette. A second ASL DNA sequence, P450-2, was then fused with a constitutive promoter in the same manner to form a P450-2 expression cassette. The P450-1 expression cassette was inserted into the gymnosperm genome by micro-projectile bombardment. Embryogenic tissue cultures of loblolly pine were initiated from immature zygotic embryos. The tissue was maintained in an undifferentiated state on semi-solid proliferation medium, according to methods described by Newton et al. TAES Technical Publication "Somatic Embryogenesis in Slash Pine", 1995 and Keinonen-Mettala et al. 1996, Scand. J. For. Res. 11: 242-250.

After separation, 5 ml of the liquid cell suspension fraction which passes through the 40 mesh screen was vacuum deposited onto filter paper and placed on semi-solid proliferation medium. The prepared gymnosperm target cells were then grown for 2 days on filter paper discs placed on semi-solid proliferation medium in a petri dish. These target cells were then bombarded with plasmid DNA containing the P450-1 expression cassette and an expression cassette containing a selectable marker gene encoding the enzyme which confers resistance to the antibiotic hygromycin B. A 1:1 mixture of selectable marker expression cassette and plasmid DNA containing the P450-1 expression cassette was precipitated with gold (1.5-3.0 microns) as described by Sanford et al. (1992). The DNA-coated microprojectiles were rinsed in absolute ethanol and aliquots of 10 μl (5 μg DNA/3 mg gold) were dried onto a macrocarrier, such as those available from BioRad (Hercules, CA).

Prior to bombardment, embryogenic tissue was desiccated under a sterile laminar-flow hood for 5 minutes. The desiccated tissue was transferred to semi-solid proliferation medium. The microprojectiles were accelerated into desiccated target cells using a BioRad PDS-1000/HE particle gun. Each plate was bombarded once, rotated 180 degrees, and bombarded a second time. Preferred bombardment parameters were 1350 psi rupture disc pressure, 6 mm distance from the rupture disc to macrocarrier (gap distance), 1 cm macrocarrier travel distance, and 10 cm distance from macrocarrier stopping screen to culture plate (microcarrier travel distance). Tissue was then transferred to semi-solid proliferation medium containing hygromycin B for two days after bombardment.

The P450-2 expression cassette was inserted into the gymnosperm genome according to the same procedures.

Example 7 - Selecting Transformed Target Cells

After insertion of the P450-1 expression cassette and the selectable marker expression cassette into the gymnosperm target cells as described in Example 6, transformed cells were selected by exposure to an antibiotic that causes mortality of any cells not containing the GSL expression cassette. Forty independent cell lines were established from cultures co-bombarded with an expression cassette containing a hygromycin resistance gene construct and the P450-2 construct. These cell lines include lines Y2, Y17, Y7 and O4, as discussed in more detail below.

PCR techniques were then used to verify that the P450-1 gene had been successfully integrated into the genomes of the established cell lines. Genomic DNA was extracted using the Plant DNAeasy kit, available from Qiagen, Valencia, CA, and 200 ng DNA from each cell line was used for each PCR reaction. Two P450-1 specific primers were designed to perform a PCR reaction with a 600 bp PCR product size. The primers were: LsP450-iml-S primer: ATGGCTTTCCTTCTAATACCCATCTC, and LsP450-iml-A primer: GGGTGTAATGGACGAGCAAGGACTTG.
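Basic properties of the two verification primers, such as length and GC content, can be computed directly from the sequences given above. The sketch below is only a convenience calculation; the labels "first" and "second" simply follow the order in which the primers are listed, and the function name is hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C residues in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Sequences copied from the text, in the order listed.
verification_primers = {
    "first": "ATGGCTTTCCTTCTAATACCCATCTC",
    "second": "GGGTGTAATGGACGAGCAAGGACTTG",
}

for label, seq in verification_primers.items():
    print(f"{label}: {len(seq)} nt, GC = {gc_fraction(seq):.0%}")
```

Comparable lengths and GC contents in a primer pair are generally desirable because the two primers then anneal under similar conditions, which fits the single 55 °C annealing step used in the program described for this example.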

Each PCR reaction (100 μl) consisted of 75 μl H₂O, 1 μl MgCl₂ (25 mM), 10 μl PCR buffer, 1 μl 10 mM dNTPs, and 10 μl DNA. 100 μl oil was layered on the top of each reaction mix. Hot start PCR was done as follows: the PCR reaction was incubated at 95 °C for 7 minutes and 1 μl each of both LsP450-iml-S and LsP450-iml-A primers (100 μM stock) and 1 μl of Taq polymerase were added through the oil in each reaction. The PCR program used was 95 °C for 1.5 minutes, 55 °C for 45 seconds and 72 °C for 2 minutes, repeated for 40 cycles, followed by extension at 72 °C for 10 minutes.
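The volumes given for the hot start reaction can be tallied to confirm that the components added before and after the initial 95 °C incubation together make up the stated 100 μl, and to obtain the final primer concentration. The sketch below is a bookkeeping aid only; the dictionary keys and variable names are hypothetical, and the oil overlay is left out because it does not mix with the aqueous reaction.

```python
# Volumes in microlitres, as given in the text.
before_hot_start = {"water": 75, "mgcl2_25mM": 1, "pcr_buffer": 10, "dntp_10mM": 1, "dna": 10}
after_hot_start = {"sense_primer": 1, "antisense_primer": 1, "taq_polymerase": 1}

initial_ul = sum(before_hot_start.values())
total_ul = initial_ul + sum(after_hot_start.values())

# Final concentration of each primer from a 100 uM stock: C2 = C1 * V1 / V2.
primer_final_um = 100 * after_hot_start["sense_primer"] / total_ul
```

The tube holds 97 μl during the initial incubation and reaches 100 μl once the primers and polymerase are added, giving each primer a final concentration of 1 μM.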

The above PCR products were employed to determine if gymnosperm cells contained the angiosperm lignin gene sequences. With reference to Fig. 9, PCR amplification was performed using template DNA from cells which grew vigorously on hygromycin B-containing medium. The PCR products were electrophoresed in an agarose gel containing 9 lanes. Lanes 1-4 contained PCR amplification products of the sweetgum P450-1 gene from a non-transformed control and transgenic loblolly pine cell lines. Lane 1 contained the non-transformed control PT52. Lane 2 contained transgenic line Y2. Lane 3 contained transgenic line Y17 and Lane 4 contained the plasmid which contains the expression cassette pSSLs450-l-im-s. Lanes 2 through 4 all contain an amplified fragment of about 600 bp, indicating that the P450-1 gene has been successfully inserted into transgenic cell lines Y2 and Y17.

Lane 5 contained a DNA size marker PhiX174/HaeIII (BRL). The top four bands in this lane indicate molecular sizes of 1353, 1078, 872 and 603 bp. Lanes 6-9 contained PCR amplification products of the hygromycin B gene from a non-transformed control and transgenic loblolly pine cell lines. Lane 6 contained the non-transformed control line referred to as PT52. Lane 7 contained transgenic line Y7. Lane 8 contained transgenic line O4. Lane 9 contained the plasmid which includes the expression cassette containing the gene encoding the enzyme which confers resistance to the antibiotic hygromycin B. Lanes 7-9 all show an amplified fragment of about 1000 bp, indicating that the hygromycin gene has been successfully inserted into transgenic lines Y7 and O4.

These PCR results confirmed the presence of the P450-1 and hygromycin resistance genes in transformed loblolly pine cell cultures. The results obtained from the PCR verification of 4 cell lines, and similar tests with the remaining 36 cell lines, confirm stable integration of the P450-1 gene and the hygromycin B gene in 25% of the 40 cell lines.

In addition, loblolly pine embryogenic cells which have been co-bombarded with the P450-2 and hygromycin B expression cassettes are growing vigorously on hygromycin selection medium, indicating that the P450-2 expression cassette was successfully integrated into the gymnosperm genome.

Example 8: Determination of Enzyme Activity for CAld5H

To determine the enzyme activity of sweetgum CAld5H having the amino acid sequence of SEQ ID NO:4, the following two tests were used: (1) an enzyme assay to determine the relative activity of CAld5H on various substrates, and (2) an HPLC-Mass Spectrometry (MS) analysis to verify the reaction products.

Expression of Sweetgum CAld5H in Yeast and Bi-OMT in E. coli

First, the INVSc1 host strain of the yeast Saccharomyces cerevisiae (Invitrogen, Carlsbad, CA) was engineered by mutation of the ADE2 gene (Stotz, A. & Linder, P. 1990, Gene 95, 91-98) so that adenine could be used as a selection marker. This mutated yeast strain was designated INVSc2. Arabidopsis CPR cDNA (EST clone G8A6, Arabidopsis Biological Resource Center (ABRC), Ohio State University, Columbus, OH) driven by the GAL promoter was then integrated into the INVSc2 genome by homologous recombination, giving rise to the INVSc2(CPR) strain. CAld5H cDNA driven by the GAL promoter was placed in the autonomously replicating vector pYAL, created by cloning the ADE2 gene into the pYX243 vector (Novagen R & D Systems, Madison, WI), and selected using adenine and leucine as the markers. This CAld5H expression vector (pYAL-CAld5H) was transferred into INVSc2(CPR) to create the INVSc2(CPR)/pYAL-CAld5H yeast strain for co-expressing CPR and CAld5H cDNAs. The expression and preparation of microsomal fractions from INVSc2(CPR)/pYAL-CAld5H cells and from control cells transformed with pYAL alone (INVSc2(CPR)/pYAL) were carried out as described (Pompon, D., Louerat, B., Bronine, A. & Urban, P. 1996, Methods Enzymol. 272, 51-64). P450 was measured from the reduced-CO difference spectrum (Omura, T. & Sato, R. 1964, J. Biol. Chem. 239, 2370-2378). Microsomal NADPH-cytochrome c reductase activity was determined as described (Yasukochi, Y. & Masters, B.S.S. 1976, J. Biol. Chem. 251, 5337-5344). Protein concentrations were determined using the Bradford dye-binding reagent (Bio-Rad, Hercules, CA) with BSA as the standard. Microsomal fractions containing CAld5H recombinant protein were used for the enzyme assay. E. coli-expressed bi-OMT was prepared according to Hu et al. (1998, Proc. Natl. Acad. Sci. USA 95, 5407-5412).

Synthesized Substrates for Enzyme Assay

5-Hydroxyferulic acid, feruloyl-CoA and 5-hydroxyferuloyl-CoA thioesters were synthesized as described (Li et al. 1997, Proc. Natl. Acad. Sci. USA 94, 5461-5466). 5-Hydroxyconiferyl aldehyde was synthesized from 5-hydroxyvanillin by first condensing it with monoethyl malonate to give ethyl 5-hydroxyferulate, which was ethoxyethylated with ethyl vinyl ether and DL-10-camphorsulfonic acid in CH₂Cl₂ to yield ethyl 5-hydroxyferulate diethoxyethyl ether. This ether was reduced by diisobutylaluminum hydride in CH₂Cl₂ to give 5-hydroxyconiferyl alcohol diethoxyethyl ether, followed by oxidation with activated MnO₂ in CH₂Cl₂ to afford 5-hydroxyconiferyl aldehyde diethoxyethyl ether, of which the ethoxyethyl groups were hydrolyzed by HCl in acetone to produce 5-hydroxyconiferyl aldehyde. Its structure was confirmed by ¹H- and ¹³C-NMR, C,H-correlation spectroscopy (CH-COSY), heteronuclear multiple bond connectivity (HMBC), and MS.

CAld5H and Bi-OMT Enzyme Activity Assay

For CAld5H activity, 500 μl of reaction mixture (saturated with oxygen) containing 50 mM NaH₂PO₄ (pH 7.5), 1 mM β-mercaptoethanol, 200 nM P450 from transformed yeast cells or 720 μg microsomal proteins from xylem, 0.5 mM substrate, and 1 mM NADPH was incubated at 30 °C for 15 min, followed by the addition of 20 μl 6N HCl to terminate the reaction and 1 μg sinapic acid as internal standard. For kinetic analyses, the reaction time was 5 min with 15 nM P450 from transformed yeast and varying concentrations of coniferyl aldehyde (1 to 32 μM) or ferulic acid (100 to 3200 μM) to measure the K_m, V_max and k_cat. To measure the K_i for coniferyl aldehyde, hydroxylation of ferulate (100 to 3200 μM) was assayed in the presence of coniferyl aldehyde at various concentrations (0.25 to 5 μM). Bi-OMT activity was assayed according to Li et al. (Li, L., Popko, J.L., Zhang, X.-H., Osakabe, K., Tsai, C.J., Joshi, C.P. & Chiang, V.L. 1997, Proc. Natl. Acad. Sci. USA 94, 5461-5466), except that non-radioactive S-adenosyl-L-methionine was used, with 1 μg o-coumarate as internal standard.

HPLC-UV and HPLC-MS Characterization of Enzyme Reaction Products

The ethyl acetate-extracted and dried reaction mixtures were dissolved in 30 μl of HPLC mobile phase (20% acetonitrile in 10 mM formic acid, pH 2.7). Samples of 15 μl were injected automatically onto a Supelcosil LC-ABZ column (15 cm x 4.6 mm x 5 μm, Supelco, Bellefonte, PA), and compounds were separated isocratically at 40 °C and a flow rate of 1 ml/min with an HP 1100 LC system and detected by an HP 1100 diode array detector and an HP 1100 LC-MSD with an Atmospheric Pressure Ionization (API)-electrospray (ES) source in negative ion mode (Hewlett Packard, Palo Alto, CA). The reaction products were identified and quantified by comparison to authentic standards. 5-Hydroxyconiferyl aldehyde; UV (HPLC mobile phase) λ_max I 244 nm, λ_max II 344 nm, λ_min 273 nm; MS (150 V) m/z (%), 194 (5.9), 193 ([M-H]⁻, 55), 178 (100), 150 (18.1); Retention time (Rt) 4.41 min/UV, 4.47 min/MS. 5-Hydroxyferulate; UV, λ_max I 236 nm, λ_max II 322 nm, λ_min 263 nm; MS (150 V) m/z (%), 210 (9.8), 209 ([M-H]⁻, 71.9), 194 (100), 150 (79.3); Rt 3.79 min/UV, 3.85 min/MS. Sinapyl aldehyde; UV, λ_max I 244 nm, λ_max II 344 nm, λ_min 275 nm; MS (150 V) m/z (%), 208 (7.9), 207 ([M-H]⁻, 44), 192 (77), 177 (100), 149 (13); Rt 6.64 min/UV, 6.70 min/MS. Sinapate; UV, λ_max I 238 nm, λ_max II 324 nm, λ_min 264 nm; MS (150 V) m/z (%), 224 (11.3), 223 ([M-H]⁻, 100), 208 (44.5), 193 (59); Rt 5.92 min/UV, 5.98 min/MS.
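As a consistency check on the negative-ion assignments above, the [M-H]⁻ values can be recomputed from nominal masses. The Python sketch below is illustrative: the molecular formulas are standard for these compounds but are not stated in the text, and the names used in the code are hypothetical.

```python
# Nominal (integer) atomic masses.
NOMINAL_MASS = {"C": 12, "H": 1, "O": 16}

# Assumed molecular formulas (not given in the text above).
FORMULAS = {
    "5-hydroxyconiferyl aldehyde": {"C": 10, "H": 10, "O": 4},
    "5-hydroxyferulate": {"C": 10, "H": 10, "O": 5},
    "sinapyl aldehyde": {"C": 11, "H": 12, "O": 4},
    "sinapate": {"C": 11, "H": 12, "O": 5},
}

# [M-H]- values and fragment m/z values as reported in the text.
REPORTED_M_MINUS_H = {
    "5-hydroxyconiferyl aldehyde": 193,
    "5-hydroxyferulate": 209,
    "sinapyl aldehyde": 207,
    "sinapate": 223,
}
REPORTED_FRAGMENTS = {
    "5-hydroxyconiferyl aldehyde": [178, 150],
    "5-hydroxyferulate": [194, 150],
    "sinapyl aldehyde": [192, 177, 149],
    "sinapate": [208, 193],
}


def nominal_mass(formula):
    """Sum of nominal atomic masses for a formula given as element counts."""
    return sum(NOMINAL_MASS[element] * count for element, count in formula.items())


def deprotonated_mz(formula):
    """m/z of the singly charged [M-H]- ion: one hydrogen fewer than M."""
    return nominal_mass(formula) - NOMINAL_MASS["H"]


for name, formula in FORMULAS.items():
    mz = deprotonated_mz(formula)
    # Assumption: a fragment 15 units lower would correspond to methyl loss.
    methyl_loss_listed = (mz - 15) in REPORTED_FRAGMENTS[name]
    print(name, mz, mz == REPORTED_M_MINUS_H[name], methyl_loss_listed)
```

Under these assumed formulas the computed values match the four reported [M-H]⁻ ions, and each spectrum also lists a fragment 15 units lower.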

Measurement of Kinetic Constants

K_m and V_max values were determined from Lineweaver-Burk plots, and k_cat values by dividing V_max by the enzyme concentration, based on three to four independent assays. K_i was derived from a Dixon plot. Specificity constant = k_cat/K_m.
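The plotting procedures named above can be sketched numerically. The following Python example is a minimal illustration, not the procedure actually used: the function names are hypothetical, the rate data are synthetic and noise-free (generated from the constants reported in this example, with an assumed ferulate concentration of 400 μM), and the Dixon treatment assumes purely noncompetitive inhibition measured at a single substrate concentration.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def lineweaver_burk(substrate, rates):
    """Fit 1/v = (Km/Vmax) * (1/[S]) + 1/Vmax and return (Km, Vmax)."""
    slope, intercept = linear_fit([1.0 / s for s in substrate],
                                  [1.0 / v for v in rates])
    vmax = 1.0 / intercept
    return slope * vmax, vmax


def dixon_ki(inhibitor, rates):
    """Fit 1/v against [I]; the line crosses the x-axis at -Ki
    (valid for noncompetitive inhibition)."""
    slope, intercept = linear_fit(inhibitor, [1.0 / v for v in rates])
    return intercept / slope


# Synthetic Michaelis-Menten data (hypothetical, for illustration only).
ENZYME_NM = 15.0                      # P450 concentration in the kinetic assay
KM_UM, KCAT_PER_MIN = 2.77, 4.31      # constants used to generate the data
VMAX = KCAT_PER_MIN * ENZYME_NM       # nM product per minute
substrate_uM = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
rates = [VMAX * s / (KM_UM + s) for s in substrate_uM]

km_fit, vmax_fit = lineweaver_burk(substrate_uM, rates)
kcat_fit = vmax_fit / ENZYME_NM       # k_cat = V_max / [E]

# Synthetic noncompetitive-inhibition data at one ferulate concentration.
FERULATE_UM, KM_FER_UM, KI_UM = 400.0, 286.0, 0.59
VMAX_FER = 3.1 * ENZYME_NM
inhibitor_uM = [0.25, 0.5, 1.0, 2.0, 5.0]
inhibited = [VMAX_FER * FERULATE_UM / ((KM_FER_UM + FERULATE_UM) * (1 + i / KI_UM))
             for i in inhibitor_uM]
ki_fit = dixon_ki(inhibitor_uM, inhibited)

print(f"Km = {km_fit:.2f} uM, kcat = {kcat_fit:.2f} /min, Ki = {ki_fit:.2f} uM")
```

With real, noisy measurements the reciprocal transform weights low-substrate points heavily, so a direct nonlinear fit of the Michaelis-Menten equation would usually be compared against the plot-derived values.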

Results of Enzyme Assay and Kinetics

Based on the enzyme activity, it was found that CAld5H is highly specific towards coniferyl aldehyde as the substrate, an activity which has not previously been reported. Based on the kinetic analyses, the K_m and k_cat of CAld5H for coniferyl aldehyde are 2.77 μM and 4.31/min, respectively, whereas these values for ferulic acid are 286 μM and 3.1/min, respectively. The specificity constant (k_cat/K_m) values indicate that coniferyl aldehyde 5-hydroxylation is approximately 140 times more efficient than ferulic acid 5-hydroxylation. More significantly, when coniferyl aldehyde and ferulate were present together, coniferyl aldehyde noncompetitively inhibited (K_i = 0.59 μM) the 5-hydroxylation of ferulate, thereby eliminating the entire reaction sequence from ferulic acid to sinapic acid. In contrast, ferulate had no effect on coniferyl aldehyde 5-hydroxylation. Thus, CAld5H is a novel enzyme that is completely different from ferulic acid 5-hydroxylase (F5H). CAld5H exhibits a unique function, specifically mediating the biosynthesis of syringyl lignin.
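The efficiency comparison follows directly from the reported constants. The short Python sketch below reproduces the arithmetic; the function and variable names are hypothetical.

```python
def specificity_constant(kcat_per_min, km_uM):
    """Specificity constant k_cat/K_m, here in units of 1/(uM * min)."""
    return kcat_per_min / km_uM


# Kinetic constants reported above for CAld5H.
coniferyl_aldehyde = specificity_constant(4.31, 2.77)
ferulic_acid = specificity_constant(3.1, 286.0)

# Ratio of the two specificity constants.
fold_difference = coniferyl_aldehyde / ferulic_acid

print(f"coniferyl aldehyde: {coniferyl_aldehyde:.3f} per uM per min")
print(f"ferulic acid:       {ferulic_acid:.4f} per uM per min")
print(f"fold difference:    {fold_difference:.1f}")
```

The ratio comes to roughly 143.5, in line with the approximately 140-fold difference stated above.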

Although various embodiments and features of the invention have been described in the foregoing detailed description, those of ordinary skill will recognize the invention is capable of numerous modifications, rearrangements and substitutions without departing from the scope of the invention as set forth in the appended claims. For example, in the case where the lignin DNA sequence is transcribed and translated to produce a functional syringyl lignin gene, those of ordinary skill will recognize that because of codon degeneracy a number of polynucleotide sequences will encode the same gene. These variants are intended to be covered by the DNA sequences disclosed and claimed herein. In addition, the sequences claimed herein include those sequences which encode a gene having substantial functional identity with those claimed. Thus, in the case of syringyl lignin genes, for example, the DNA sequences include variant polynucleotide sequences encoding polypeptides which have substantial identity with the amino acid sequence of syringyl lignin and which show syringyl lignin activity in gymnosperms. The polynucleotides that hybridize under conditions of low, medium or high stringency to the isolated polynucleotides described herein are also within the scope of the invention. Furthermore, the invention also relates to polynucleotides and polypeptides having a certain % identity and/or % similarity to isolated polynucleotides/polypeptides (e.g. bi-OMT, 4CL) of the invention as described herein in more detail for CAld5H.
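The scale of the codon degeneracy referred to above can be illustrated with a short calculation. The Python sketch below assumes the standard genetic code and counts how many distinct coding sequences specify the first six residues of SEQ ID NO:4 (Met Asp Ser Ser Leu His); the table and function names are hypothetical.

```python
# Number of synonymous codons per amino acid in the standard genetic code.
CODONS_PER_RESIDUE = {
    "Met": 1, "Trp": 1,
    "Phe": 2, "Tyr": 2, "His": 2, "Gln": 2, "Asn": 2,
    "Lys": 2, "Asp": 2, "Glu": 2, "Cys": 2,
    "Ile": 3,
    "Val": 4, "Pro": 4, "Thr": 4, "Ala": 4, "Gly": 4,
    "Leu": 6, "Ser": 6, "Arg": 6,
}


def count_encodings(peptide):
    """Number of distinct coding sequences for a peptide written as
    space-separated three-letter residue codes."""
    total = 1
    for residue in peptide.split():
        total *= CODONS_PER_RESIDUE[residue]
    return total


# The 20 amino acids account for all 61 sense codons.
sense_codons = sum(CODONS_PER_RESIDUE.values())

n_terminus = "Met Asp Ser Ser Leu His"
variants = count_encodings(n_terminus)

print(f"{sense_codons} sense codons")
print(f"{variants} coding sequences for '{n_terminus}'")
```

Six residues already admit 864 synonymous coding sequences, and the count grows multiplicatively with peptide length.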

SEQUENCE LISTING

<110> Chiang, Vincent L; Carraway, Daniel T; Smeltzer, Richard H; International Paper Company

<120> Production of Syringyl Lignin in Gymnosperms

<130> Syringyl Lignin

<140> Not Yet Assigned

<141> Filed Concurrently Herewith

<150> US 08/991,677

<151> 1997-12-16

<150> US 60/033,381

<151> 1996-12-16

<160> 11

<170> PatentIn Ver. 2.0

<210> 1

<211> 1708

<212> DNA

<213> Liquidambar styraciflua

<220>

<221> CDS

<222> (48) .. (1571)

<400> 1

cggcacgagg aaaccctaaa actcacctct cttacccttt ctcttca atg gct ttc 56
Met Ala Phe
1

ctt cta ata ccc atc tca ata atc ttc atc gtc tta gct tac cag ctc 104
Leu Leu Ile Pro Ile Ser Ile Ile Phe Ile Val Leu Ala Tyr Gln Leu
5 10 15

tat caa cgg ctc aga ttt aag ctc cca ccc ggc cca cgt cca tgg ccg 152
Tyr Gln Arg Leu Arg Phe Lys Leu Pro Pro Gly Pro Arg Pro Trp Pro
20 25 30 35

atc gtc gga aac ctt tac gac ata aaa ccg gtg agg ttc cgg tgt ttc 200
Ile Val Gly Asn Leu Tyr Asp Ile Lys Pro Val Arg Phe Arg Cys Phe
40 45 50

gcc gag tgg tca caa gcg tac ggt ccg atc ata tcg gtg tgg ttc ggt 248
Ala Glu Trp Ser Gln Ala Tyr Gly Pro Ile Ile Ser Val Trp Phe Gly
55 60 65

tca acg ttg aat gtg atc gta tcg aat tcg gaa ttg gct aag gaa gtg 296
Ser Thr Leu Asn Val Ile Val Ser Asn Ser Glu Leu Ala Lys Glu Val
70 75 80

ctc aag gaa aaa gat caa caa ttg gct gat agg cat agg agt aga tca 344
Leu Lys Glu Lys Asp Gln Gln Leu Ala Asp Arg His Arg Ser Arg Ser
85 90 95

gct gcc aaa ttt agc agg gat ggg cag gac ctt ata tgg gct gat tat 392
Ala Ala Lys Phe Ser Arg Asp Gly Gln Asp Leu Ile Trp Ala Asp Tyr
100 105 110 115

gga cct cac tat gtg aag gtt aca aag gtt tgt acc ctc gag ctt ttt 440
Gly Pro His Tyr Val Lys Val Thr Lys Val Cys Thr Leu Glu Leu Phe
120 125 130

act cca aag cgg ctt gaa gct ctt aga ccc att aga gaa gat gaa gtt 488
Thr Pro Lys Arg Leu Glu Ala Leu Arg Pro Ile Arg Glu Asp Glu Val
135 140 145

aca gcc atg gtt gag tcc att ttt aat gac act gcg aat cct gaa aat 536
Thr Ala Met Val Glu Ser Ile Phe Asn Asp Thr Ala Asn Pro Glu Asn
150 155 160

tat ggg aag agt atg ctg gtg aag aag tat ttg gga gca gta gca ttc 584
Tyr Gly Lys Ser Met Leu Val Lys Lys Tyr Leu Gly Ala Val Ala Phe
165 170 175

aac aac att aca aga ctc gca ttt gga aag cga ttc gtg aat tca gag 632
Asn Asn Ile Thr Arg Leu Ala Phe Gly Lys Arg Phe Val Asn Ser Glu
180 185 190 195

ggt gta atg gac gag caa gga ctt gaa ttt aag gaa att gtg gcc aat 680
Gly Val Met Asp Glu Gln Gly Leu Glu Phe Lys Glu Ile Val Ala Asn
200 205 210

gga ctc aag ctt ggt gcc tca ctt gca atg gct gag cac att cct tgg 728
Gly Leu Lys Leu Gly Ala Ser Leu Ala Met Ala Glu His Ile Pro Trp
215 220 225

ctc cgt tgg atg ttc cca ctt gag gaa ggg gcc ttt gcc aag cat ggg 776
Leu Arg Trp Met Phe Pro Leu Glu Glu Gly Ala Phe Ala Lys His Gly
230 235 240

gca cgt agg gac cga ctt acc aga gct atc atg gaa gag cac aca ata 824
Ala Arg Arg Asp Arg Leu Thr Arg Ala Ile Met Glu Glu His Thr Ile
245 250 255

gcc cgt aaa aag agt ggt gga gcc caa caa cat ttc gtg gat gca ttg 872
Ala Arg Lys Lys Ser Gly Gly Ala Gln Gln His Phe Val Asp Ala Leu
260 265 270 275

ctc acc cta caa gag aaa tat gac ctt agc gag gac act att att ggg 920
Leu Thr Leu Gln Glu Lys Tyr Asp Leu Ser Glu Asp Thr Ile Ile Gly
280 285 290

ctc ctt tgg gat atg atc act gca ggc atg gac aca acc gca atc tct 968
Leu Leu Trp Asp Met Ile Thr Ala Gly Met Asp Thr Thr Ala Ile Ser
295 300 305

gtc gaa tgg gcc atg gcc gag tta att aag aac cca agg gtg caa caa 1016
Val Glu Trp Ala Met Ala Glu Leu Ile Lys Asn Pro Arg Val Gln Gln
310 315 320

aaa gct caa gag gag cta gac aat gta ctt ggg tcc gaa cgt gtc ctg 1064
Lys Ala Gln Glu Glu Leu Asp Asn Val Leu Gly Ser Glu Arg Val Leu
325 330 335

acc gaa ttg gac ttc tca agc ctc cct tat cta caa tgt gta gcc aag 1112
Thr Glu Leu Asp Phe Ser Ser Leu Pro Tyr Leu Gln Cys Val Ala Lys
340 345 350 355

gag gca cta agg ctg cac cct cca aca cca cta atg ctc cct cat cgc 1160
Glu Ala Leu Arg Leu His Pro Pro Thr Pro Leu Met Leu Pro His Arg
360 365 370

gcc aat gcc aac gtc aaa att ggt ggc tac gac atc cct aag gga tca 1208
Ala Asn Ala Asn Val Lys Ile Gly Gly Tyr Asp Ile Pro Lys Gly Ser
375 380 385

aat gtt cat gta aat gtc tgg gcc gtg gct cgt gat cca gca gtg tgg 1256
Asn Val His Val Asn Val Trp Ala Val Ala Arg Asp Pro Ala Val Trp
390 395 400

cgt gac cca cta gag ttt cga ccg gaa cgg ttc tct gaa gac gat gtc 1304
Arg Asp Pro Leu Glu Phe Arg Pro Glu Arg Phe Ser Glu Asp Asp Val
405 410 415

gac atg aaa ggt cac gat tat agg cta ctg ccg ttt ggt gca ggg agg 1352
Asp Met Lys Gly His Asp Tyr Arg Leu Leu Pro Phe Gly Ala Gly Arg
420 425 430 435

cgt gtt tgc ccc ggt gca caa ctt ggc atc aat ttg gtc aca tcc atg 1400
Arg Val Cys Pro Gly Ala Gln Leu Gly Ile Asn Leu Val Thr Ser Met
440 445 450

atg ggt cac cta ttg cac cat ttc cat tgg agc cct cct aaa ggt gta 1448
Met Gly His Leu Leu His His Phe Tyr Trp Ser Pro Pro Lys Gly Val
455 460 465

aaa cca gag gag att gac atg tca gag aat cca gga ttg gtc acc tac 1496
Lys Pro Glu Glu Ile Asp Met Ser Glu Asn Pro Gly Leu Val Thr Tyr
470 475 480

atg cga acc ccg gtg caa gct gtt ccc act cca agg ctg cct gct cac 1544
Met Arg Thr Pro Val Gln Ala Val Pro Thr Pro Arg Leu Pro Ala His
485 490 495

ttg tac aaa cgt gta gct gtg gat atg taattcttag tttgttatta 1591
Leu Tyr Lys Arg Val Ala Val Asp Met
500 505

ttcatgctct taaggttttg gactttgaac ttatgatgag atttgtaaaa ttccaagtga 1651

tcaaatgaag aaaagaccaa ataaaaaggc ttgacgattt aaaaaaaaaa aaaaaaa 1708

<210> 2

<211> 508

<212> PRT

<213> Liquidambar styraciflua

<400> 2

Met Ala Phe Leu Leu Ile Pro Ile Ser Ile Ile Phe Ile Val Leu Ala
1 5 10 15

Tyr Gln Leu Tyr Gln Arg Leu Arg Phe Lys Leu Pro Pro Gly Pro Arg
20 25 30

Pro Trp Pro Ile Val Gly Asn Leu Tyr Asp Ile Lys Pro Val Arg Phe
35 40 45

Arg Cys Phe Ala Glu Trp Ser Gln Ala Tyr Gly Pro Ile Ile Ser Val
50 55 60

Trp Phe Gly Ser Thr Leu Asn Val Ile Val Ser Asn Ser Glu Leu Ala
65 70 75 80

Lys Glu Val Leu Lys Glu Lys Asp Gln Gln Leu Ala Asp Arg His Arg
85 90 95

Ser Arg Ser Ala Ala Lys Phe Ser Arg Asp Gly Gln Asp Leu Ile Trp
100 105 110

Ala Asp Tyr Gly Pro His Tyr Val Lys Val Thr Lys Val Cys Thr Leu
115 120 125

Glu Leu Phe Thr Pro Lys Arg Leu Glu Ala Leu Arg Pro Ile Arg Glu
130 135 140

Asp Glu Val Thr Ala Met Val Glu Ser Ile Phe Asn Asp Thr Ala Asn
145 150 155 160

Pro Glu Asn Tyr Gly Lys Ser Met Leu Val Lys Lys Tyr Leu Gly Ala
165 170 175

Val Ala Phe Asn Asn Ile Thr Arg Leu Ala Phe Gly Lys Arg Phe Val
180 185 190

Asn Ser Glu Gly Val Met Asp Glu Gln Gly Leu Glu Phe Lys Glu Ile
195 200 205

Val Ala Asn Gly Leu Lys Leu Gly Ala Ser Leu Ala Met Ala Glu His
210 215 220

Ile Pro Trp Leu Arg Trp Met Phe Pro Leu Glu Glu Gly Ala Phe Ala
225 230 235 240

Lys His Gly Ala Arg Arg Asp Arg Leu Thr Arg Ala Ile Met Glu Glu
245 250 255

His Thr Ile Ala Arg Lys Lys Ser Gly Gly Ala Gln Gln His Phe Val
260 265 270

Asp Ala Leu Leu Thr Leu Gln Glu Lys Tyr Asp Leu Ser Glu Asp Thr
275 280 285

Ile Ile Gly Leu Leu Trp Asp Met Ile Thr Ala Gly Met Asp Thr Thr
290 295 300

Ala Ile Ser Val Glu Trp Ala Met Ala Glu Leu Ile Lys Asn Pro Arg
305 310 315 320

Val Gln Gln Lys Ala Gln Glu Glu Leu Asp Asn Val Leu Gly Ser Glu
325 330 335

Arg Val Leu Thr Glu Leu Asp Phe Ser Ser Leu Pro Tyr Leu Gln Cys
340 345 350

Val Ala Lys Glu Ala Leu Arg Leu His Pro Pro Thr Pro Leu Met Leu
355 360 365

Pro His Arg Ala Asn Ala Asn Val Lys Ile Gly Gly Tyr Asp Ile Pro
370 375 380

Lys Gly Ser Asn Val His Val Asn Val Trp Ala Val Ala Arg Asp Pro
385 390 395 400

Ala Val Trp Arg Asp Pro Leu Glu Phe Arg Pro Glu Arg Phe Ser Glu
405 410 415

Asp Asp Val Asp Met Lys Gly His Asp Tyr Arg Leu Leu Pro Phe Gly
420 425 430

Ala Gly Arg Arg Val Cys Pro Gly Ala Gln Leu Gly Ile Asn Leu Val
435 440 445

Thr Ser Met Met Gly His Leu Leu His His Phe Tyr Trp Ser Pro Pro
450 455 460

Lys Gly Val Lys Pro Glu Glu Ile Asp Met Ser Glu Asn Pro Gly Leu
465 470 475 480

Val Thr Tyr Met Arg Thr Pro Val Gln Ala Val Pro Thr Pro Arg Leu
485 490 495

Pro Ala His Leu Tyr Lys Arg Val Ala Val Asp Met
500 505

<210> 3

<211> 1883

<212> DNA

<213> Liquidambar styraciflua

<220>

<221> CDS

<222> (74) .. (1606)

<400> 3

tgcaaacctg cacaaacaaa gagagagaag aagaaaaagg aagagaggag agagagagag 60

agagagagaa gcc atg gat tct tct ctt cat gaa gcc ttg caa cca cta 109
Met Asp Ser Ser Leu His Glu Ala Leu Gln Pro Leu
1 5 10

ccc atg acg ctg ttc ttc att ata cct ttg cta ctc tta ttg ggc cta 157
Pro Met Thr Leu Phe Phe Ile Ile Pro Leu Leu Leu Leu Leu Gly Leu
15 20 25

gta tct cgg ctt cgc cag aga cta cca tac cca cca ggc cca aaa ggc 205
Val Ser Arg Leu Arg Gln Arg Leu Pro Tyr Pro Pro Gly Pro Lys Gly
30 35 40

tta ccg gtg atc gga aac atg ctc atg atg gat caa ctc act cac cga 253
Leu Pro Val Ile Gly Asn Met Leu Met Met Asp Gln Leu Thr His Arg
45 50 55 60

gga ctc gcc aaa ctc gcc aaa caa tac ggc ggt cta ttc cac ctc aag 301
Gly Leu Ala Lys Leu Ala Lys Gln Tyr Gly Gly Leu Phe His Leu Lys
65 70 75

atg gga ttc tta cac atg gtg gcc gtt tcc aca ccc gac atg gct cgc 349
Met Gly Phe Leu His Met Val Ala Val Ser Thr Pro Asp Met Ala Arg
80 85 90

caa gtc ctt caa gtc caa gac aac atc ttc tcg aac cgg cca gcc acc 397
Gln Val Leu Gln Val Gln Asp Asn Ile Phe Ser Asn Arg Pro Ala Thr
95 100 105

ata gcc atc agc tac ctc acc tat gac cga gcc gac atg gcc ttc gct 445
Ile Ala Ile Ser Tyr Leu Thr Tyr Asp Arg Ala Asp Met Ala Phe Ala
110 115 120

cac tac ggc ccg ttt tgg cgt cag atg cgt aaa ctc tgc gtc atg aaa 493
His Tyr Gly Pro Phe Trp Arg Gln Met Arg Lys Leu Cys Val Met Lys
125 130 135 140

tta ttt agc cgg aaa cga gcc gag tcg tgg gag tcg gtc cga gac gag 541
Leu Phe Ser Arg Lys Arg Ala Glu Ser Trp Glu Ser Val Arg Asp Glu
145 150 155

gtc gac tcg gca gta cga gtg gtc gcg tcc aat att ggg tcg acg gtg 589
Val Asp Ser Ala Val Arg Val Val Ala Ser Asn Ile Gly Ser Thr Val
160 165 170

aat atc ggc gag ctg gtt ttt gct ctg acg aag aat att act tac agg 637
Asn Ile Gly Glu Leu Val Phe Ala Leu Thr Lys Asn Ile Thr Tyr Arg
175 180 185

gcg gct ttt ggg acg atc tcg cat gag gac cag gac gag ttc gtg gcc 685
Ala Ala Phe Gly Thr Ile Ser His Glu Asp Gln Asp Glu Phe Val Ala
190 195 200

ata ctg caa gag ttt tcg cag ctg ttt ggt gct ttt aat ata gct gat 733
Ile Leu Gln Glu Phe Ser Gln Leu Phe Gly Ala Phe Asn Ile Ala Asp
205 210 215 220

ttt atc cct tgg ctc aaa tgg gtt cct cag ggg att aac gtc agg ctc 781
Phe Ile Pro Trp Leu Lys Trp Val Pro Gln Gly Ile Asn Val Arg Leu
225 230 235

aac aag gca cga ggg gcg ctt gat ggg ttt att gac aag atc atc gac 829
Asn Lys Ala Arg Gly Ala Leu Asp Gly Phe Ile Asp Lys Ile Ile Asp
240 245 250

gat cat ata cag aag ggg agt aaa aac tcg gag gag gtt gat act gat 877
Asp His Ile Gln Lys Gly Ser Lys Asn Ser Glu Glu Val Asp Thr Asp
255 260 265

atg gta gat gat tta ctt gct ttt tac ggt gag gaa gcc aaa gta agc 925
Met Val Asp Asp Leu Leu Ala Phe Tyr Gly Glu Glu Ala Lys Val Ser
270 275 280

gaa tct gac gat ctt caa aat tcc atc aaa ctc acc aaa gac aac atc 973
Glu Ser Asp Asp Leu Gln Asn Ser Ile Lys Leu Thr Lys Asp Asn Ile
285 290 295 300

aaa gct atc atg gac gta atg ttt gga ggg acc gaa acg gtg gcg tcc 1021
Lys Ala Ile Met Asp Val Met Phe Gly Gly Thr Glu Thr Val Ala Ser
305 310 315

gcg att gaa tgg gcc atg acg gag ctg atg aaa agc cca gaa gat cta 1069
Ala Ile Glu Trp Ala Met Thr Glu Leu Met Lys Ser Pro Glu Asp Leu
320 325 330

aag aag gtc caa caa gaa ctc gcc gtg gtg gtg ggt ctt gac cgg cga 1117
Lys Lys Val Gln Gln Glu Leu Ala Val Val Val Gly Leu Asp Arg Arg
335 340 345

gtc gaa gag aaa gac ttc gag aag ctc acc tac ttg aaa tgc gta ctg 1165
Val Glu Glu Lys Asp Phe Glu Lys Leu Thr Tyr Leu Lys Cys Val Leu
350 355 360

aag gaa gtc ctt cgc ctc cac cca ccc atc cca ctc ctc ctc cac gag 1213
Lys Glu Val Leu Arg Leu His Pro Pro Ile Pro Leu Leu Leu His Glu
365 370 375 380

act gcc gag gac gcc gag gtc ggc ggc tac tac att ccg gcg aaa tcg 1261
Thr Ala Glu Asp Ala Glu Val Gly Gly Tyr Tyr Ile Pro Ala Lys Ser
385 390 395

cgg gtg atg atc aac gcg tgc gcc atc ggc cgg gac aag aac tcg tgg 1309
Arg Val Met Ile Asn Ala Cys Ala Ile Gly Arg Asp Lys Asn Ser Trp
400 405 410

gcc gac cca gat acg ttt agg ccc tcc agg ttt ctc aaa gac ggt gtg 1357
Ala Asp Pro Asp Thr Phe Arg Pro Ser Arg Phe Leu Lys Asp Gly Val
415 420 425

ccc gat ttc aaa ggg aac aac ttc gag ttc atc cca ttc ggg tca ggt 1405
Pro Asp Phe Lys Gly Asn Asn Phe Glu Phe Ile Pro Phe Gly Ser Gly
430 435 440

cgt cgg tct tgc ccc ggt atg caa ctc gga ctc tac gcg cta gag acg 1453
Arg Arg Ser Cys Pro Gly Met Gln Leu Gly Leu Tyr Ala Leu Glu Thr
445 450 455 460

act gtg gct cac ctc ctt cac tgt ttc acg tgg gag ttg ccg gac ggg 1501
Thr Val Ala His Leu Leu His Cys Phe Thr Trp Glu Leu Pro Asp Gly
465 470 475

atg aaa ccg agt gaa ctc gag atg aat gat gtg ttt gga ctc acc gcg 1549
Met Lys Pro Ser Glu Leu Glu Met Asn Asp Val Phe Gly Leu Thr Ala
480 485 490

cca aga gcg att cga ctc acc gcc gtg ccg agt cca cgc ctt ctc tgt 1597
Pro Arg Ala Ile Arg Leu Thr Ala Val Pro Ser Pro Arg Leu Leu Cys
495 500 505

cct ctc tat tgatcgaatg attgggggag ctttgtggag gggcttttat 1646
Pro Leu Tyr
510

ggagactcta tatatagatg ggaagtgaaa caacgacagg tgaatgcttg gatttttggt 1706

atatattggg gagggagggg aaaaaaaaaa taatgaaagg aaagaaaaga gagaatttga 1766

atttctcttc ctctgtggat aaaagcctcg tttttaattg tttttatgtg gagatatttg 1826

tgtttgttta tttttatctc tttttttgca ataacactca aaaataaaaa aaaaaaa 1883

<210> 4

<211> 511

<212> PRT

<213> Liquidambar styraciflua

<400> 4

Met Asp Ser Ser Leu His Glu Ala Leu Gln Pro Leu Pro Met Thr Leu
1 5 10 15

Phe Phe Ile Ile Pro Leu Leu Leu Leu Leu Gly Leu Val Ser Arg Leu
20 25 30

Arg Gln Arg Leu Pro Tyr Pro Pro Gly Pro Lys Gly Leu Pro Val Ile
35 40 45

Gly Asn Met Leu Met Met Asp Gln Leu Thr His Arg Gly Leu Ala Lys
50 55 60

Leu Ala Lys Gln Tyr Gly Gly Leu Phe His Leu Lys Met Gly Phe Leu
65 70 75 80

His Met Val Ala Val Ser Thr Pro Asp Met Ala Arg Gln Val Leu Gln
85 90 95

Val Gln Asp Asn Ile Phe Ser Asn Arg Pro Ala Thr Ile Ala Ile Ser
100 105 110

Tyr Leu Thr Tyr Asp Arg Ala Asp Met Ala Phe Ala His Tyr Gly Pro
115 120 125

Phe Trp Arg Gln Met Arg Lys Leu Cys Val Met Lys Leu Phe Ser Arg
130 135 140

Lys Arg Ala Glu Ser Trp Glu Ser Val Arg Asp Glu Val Asp Ser Ala
145 150 155 160

Val Arg Val Val Ala Ser Asn Ile Gly Ser Thr Val Asn Ile Gly Glu
165 170 175

Leu Val Phe Ala Leu Thr Lys Asn Ile Thr Tyr Arg Ala Ala Phe Gly
180 185 190

Thr Ile Ser His Glu Asp Gln Asp Glu Phe Val Ala Ile Leu Gln Glu
195 200 205

Phe Ser Gln Leu Phe Gly Ala Phe Asn Ile Ala Asp Phe Ile Pro Trp
210 215 220

Leu Lys Trp Val Pro Gln Gly Ile Asn Val Arg Leu Asn Lys Ala Arg
225 230 235 240

Gly Ala Leu Asp Gly Phe Ile Asp Lys Ile Ile Asp Asp His Ile Gln
245 250 255

Lys Gly Ser Lys Asn Ser Glu Glu Val Asp Thr Asp Met Val Asp Asp
260 265 270

Leu Leu Ala Phe Tyr Gly Glu Glu Ala Lys Val Ser Glu Ser Asp Asp
275 280 285

Leu Gln Asn Ser Ile Lys Leu Thr Lys Asp Asn Ile Lys Ala Ile Met
290 295 300

Asp Val Met Phe Gly Gly Thr Glu Thr Val Ala Ser Ala Ile Glu Trp
305 310 315 320

Ala Met Thr Glu Leu Met Lys Ser Pro Glu Asp Leu Lys Lys Val Gln
325 330 335

Gln Glu Leu Ala Val Val Val Gly Leu Asp Arg Arg Val Glu Glu Lys
340 345 350

Asp Phe Glu Lys Leu Thr Tyr Leu Lys Cys Val Leu Lys Glu Val Leu
355 360 365

Arg Leu His Pro Pro Ile Pro Leu Leu Leu His Glu Thr Ala Glu Asp
370 375 380

Ala Glu Val Gly Gly Tyr Tyr Ile Pro Ala Lys Ser Arg Val Met Ile
385 390 395 400

Asn Ala Cys Ala Ile Gly Arg Asp Lys Asn Ser Trp Ala Asp Pro Asp
405 410 415

Thr Phe Arg Pro Ser Arg Phe Leu Lys Asp Gly Val Pro Asp Phe Lys
420 425 430

Gly Asn Asn Phe Glu Phe Ile Pro Phe Gly Ser Gly Arg Arg Ser Cys
435 440 445

Pro Gly Met Gln Leu Gly Leu Tyr Ala Leu Glu Thr Thr Val Ala His
450 455 460

Leu Leu His Cys Phe Thr Trp Glu Leu Pro Asp Gly Met Lys Pro Ser
465 470 475 480

Glu Leu Glu Met Asn Asp Val Phe Gly Leu Thr Ala Pro Arg Ala Ile
485 490 495

Arg Leu Thr Ala Val Pro Ser Pro Arg Leu Leu Cys Pro Leu Tyr
500 505 510

<210> 5

<211> 1380

<212> DNA

<213> Liquidambar styraciflua

<220>

<221> CDS

<222> (67) .. (1170)

<400> 5

cggcacgagc cctacctcct ttcttggaaa aatttcccca ttcgatcaca atccgggcct 60

caaaaa atg gga tca aca agc gaa acg aag atg agc ccg agt gaa gca 108
Met Gly Ser Thr Ser Glu Thr Lys Met Ser Pro Ser Glu Ala
1 5 10

gca gca gca gaa gaa gaa gca ttc gta ttc gct atg caa tta acc agt 156
Ala Ala Ala Glu Glu Glu Ala Phe Val Phe Ala Met Gln Leu Thr Ser
15 20 25 30

gct tca gtt ctt ccc atg gtc cta aaa tca gcc ata gag ctc gac gtc 204
Ala Ser Val Leu Pro Met Val Leu Lys Ser Ala Ile Glu Leu Asp Val
35 40 45

tta gaa atc atg gct aaa gct ggt cca ggt gcg cac ata tcc aca tct 252
Leu Glu Ile Met Ala Lys Ala Gly Pro Gly Ala His Ile Ser Thr Ser
50 55 60

gac ata gcc tct aag ctg ccc aca aag aat cca gat gca gcc gtc atg 300
Asp Ile Ala Ser Lys Leu Pro Thr Lys Asn Pro Asp Ala Ala Val Met
65 70 75

ctt gac cgt atg ctc cgc ctc ttg gct agc tac tct gtt cta acg tgc 348
Leu Asp Arg Met Leu Arg Leu Leu Ala Ser Tyr Ser Val Leu Thr Cys
80 85 90

tct ctc cgc acc ctc cct gac ggc aag atc gag agg ctt tac ggc ctt 396
Ser Leu Arg Thr Leu Pro Asp Gly Lys Ile Glu Arg Leu Tyr Gly Leu
95 100 105 110

gca ccc gtt tgt aaa ttc ttg acc aga aac gat gat gga gtc tcc ata 444
Ala Pro Val Cys Lys Phe Leu Thr Arg Asn Asp Asp Gly Val Ser Ile
115 120 125

gcc gct ctg tct ctc atg aat caa gac aag gtc ctc atg gag agc tgg 492
Ala Ala Leu Ser Leu Met Asn Gln Asp Lys Val Leu Met Glu Ser Trp
130 135 140

tac cac ttg acc gag gca gtt ctt gaa ggt gga att cca ttt aac aag 540
Tyr His Leu Thr Glu Ala Val Leu Glu Gly Gly Ile Pro Phe Asn Lys
145 150 155

gcc tat gga atg aca gca ttt gag tac cat ggc acc gat ccc aga ttc 588
Ala Tyr Gly Met Thr Ala Phe Glu Tyr His Gly Thr Asp Pro Arg Phe
160 165 170

aac aca gtt ttc aac aat gga atg tcc aat cat tcg acc att acc atg 636
Asn Thr Val Phe Asn Asn Gly Met Ser Asn His Ser Thr Ile Thr Met
175 180 185 190

aag aaa atc ctt gag act tac aaa ggg ttc gag gga ctt gga tct gtg 684
Lys Lys Ile Leu Glu Thr Tyr Lys Gly Phe Glu Gly Leu Gly Ser Val
195 200 205

gtt gat gtt ggt ggt ggc act ggt gcc cac ctt aac atg att atc gct 732
Val Asp Val Gly Gly Gly Thr Gly Ala His Leu Asn Met Ile Ile Ala
210 215 220

aaa tac ccc atg atc aag ggc att aac ttc gac ttg cct cat gtt att 780
Lys Tyr Pro Met Ile Lys Gly Ile Asn Phe Asp Leu Pro His Val Ile
225 230 235

gag gag gct ccc tcc tat cct ggt gtg gag cat gtt ggt gga gat atg 828
Glu Glu Ala Pro Ser Tyr Pro Gly Val Glu His Val Gly Gly Asp Met
240 245 250

ttt gtt agt gtt cca aaa gga gat gcc att ttc atg aag tgg ata tgt 876
Phe Val Ser Val Pro Lys Gly Asp Ala Ile Phe Met Lys Trp Ile Cys
255 260 265 270

cat gat tgg agc gat gaa cac tgc ttg aag ttt ttg aag aaa tgt tat 924
His Asp Trp Ser Asp Glu His Cys Leu Lys Phe Leu Lys Lys Cys Tyr
275 280 285

gaa gca ctt cca acc aat ggg aag gtg atc ctt gct gaa tgc atc ctc 972
Glu Ala Leu Pro Thr Asn Gly Lys Val Ile Leu Ala Glu Cys Ile Leu
290 295 300

ccc gtg gcg cca gac gca agc ctc ccc act aag gca gtg gtc cat att 1020
Pro Val Ala Pro Asp Ala Ser Leu Pro Thr Lys Ala Val Val His Ile
305 310 315

gat gtc atc atg ttg gct cat aac cca ggt ggg aaa gag aga act gag 1068
Asp Val Ile Met Leu Ala His Asn Pro Gly Gly Lys Glu Arg Thr Glu
320 325 330

aag gag ttt gag gcc ttg gcc aag ggg gct gga ttt gaa ggt ttc cga 1116
Lys Glu Phe Glu Ala Leu Ala Lys Gly Ala Gly Phe Glu Gly Phe Arg
335 340 345 350

gta gta gcc tcg tgc gct tac aat aca tgg atc atc gaa ttt ttg aag 1164
Val Val Ala Ser Cys Ala Tyr Asn Thr Trp Ile Ile Glu Phe Leu Lys
355 360 365

aag att tgagtcctta ctcggctttg agtacataat accaactcct tttggttttc 1220
Lys Ile

gagattgtga ttgtgattgt gattgtctct ctttcgcagt tggccttatg atataatgta 1280

tcgttaactc gatcacagaa gtgcaaaaga cagtgaatgt acactgcttt ataaaataaa 1340

aattttaaga ttttgattca tgtaaaaaaa aaaaaaaaaa 1380

<210> 6

<211> 368

<212> PRT

<213> Liquidambar styraciflua

<400> 6

Met Gly Ser Thr Ser Glu Thr Lys Met Ser Pro Ser Glu Ala Ala Ala
1 5 10 15

Ala Glu Glu Glu Ala Phe Val Phe Ala Met Gln Leu Thr Ser Ala Ser
20 25 30

Val Leu Pro Met Val Leu Lys Ser Ala Ile Glu Leu Asp Val Leu Glu
35 40 45

Ile Met Ala Lys Ala Gly Pro Gly Ala His Ile Ser Thr Ser Asp Ile
50 55 60

Ala Ser Lys Leu Pro Thr Lys Asn Pro Asp Ala Ala Val Met Leu Asp
65 70 75 80

Arg Met Leu Arg Leu Leu Ala Ser Tyr Ser Val Leu Thr Cys Ser Leu
85 90 95

Arg Thr Leu Pro Asp Gly Lys Ile Glu Arg Leu Tyr Gly Leu Ala Pro
100 105 110

Val Cys Lys Phe Leu Thr Arg Asn Asp Asp Gly Val Ser Ile Ala Ala
115 120 125

Leu Ser Leu Met Asn Gln Asp Lys Val Leu Met Glu Ser Trp Tyr His
130 135 140

Leu Thr Glu Ala Val Leu Glu Gly Gly Ile Pro Phe Asn Lys Ala Tyr
145 150 155 160

Gly Met Thr Ala Phe Glu Tyr His Gly Thr Asp Pro Arg Phe Asn Thr
165 170 175

Val Phe Asn Asn Gly Met Ser Asn His Ser Thr Ile Thr Met Lys Lys
180 185 190

Ile Leu Glu Thr Tyr Lys Gly Phe Glu Gly Leu Gly Ser Val Val Asp
195 200 205

Val Gly Gly Gly Thr Gly Ala His Leu Asn Met Ile Ile Ala Lys Tyr
210 215 220

Pro Met Ile Lys Gly Ile Asn Phe Asp Leu Pro His Val Ile Glu Glu
225 230 235 240

Ala Pro Ser Tyr Pro Gly Val Glu His Val Gly Gly Asp Met Phe Val
245 250 255

Ser Val Pro Lys Gly Asp Ala Ile Phe Met Lys Trp Ile Cys His Asp
260 265 270

Trp Ser Asp Glu His Cys Leu Lys Phe Leu Lys Lys Cys Tyr Glu Ala
275 280 285

Leu Pro Thr Asn Gly Lys Val Ile Leu Ala Glu Cys Ile Leu Pro Val
290 295 300

Ala Pro Asp Ala Ser Leu Pro Thr Lys Ala Val Val His Ile Asp Val
305 310 315 320

Ile Met Leu Ala His Asn Pro Gly Gly Lys Glu Arg Thr Glu Lys Glu
325 330 335

Phe Glu Ala Leu Ala Lys Gly Ala Gly Phe Glu Gly Phe Arg Val Val
340 345 350

Ala Ser Cys Ala Tyr Asn Thr Trp Ile Ile Glu Phe Leu Lys Lys Ile
355 360 365

<210> 7

<211> 2025

<212> DNA

<213> Liquidambar styraciflua

<220>

<221> CDS

<222> (60) .. (1679)

<400> 7 eggcacgage teattttcca cttctggttt gatctctgea attcttceat eagtcecta 59 atg gag ace caa aca aaa caa gaa gaa ate ata tat egg teg aaa etc 107

Met Glu Thr Gin Thr Lys Gin Glu Glu He He Tyr Arg Ser Lys Leu

1 5 10 15

ccc gat ate tac ate ccc aaa cac etc cct tta cat teg tat tgt ttc 155

Pro Asp He Tyr He Pro Lys His Leu Pro Leu His Ser Tyr Cys Phe

20 25 30

gag aac ate tea cag ttc ggc tec cgc ccc tgt ctg ate aat ggc gca 203

Glu Asn He Ser Gin Phe Gly Ser Arg Pro Cys Leu He Asn Gly Ala

35 40 45

acg ggc aag tat tac aca tat get gag gtt gag etc att gcg cgc aag 251

Thr Gly Lys Tyr Tyr Thr Tyr Ala Glu Val Glu Leu He Ala Arg Lys 50 55 60

gtc gca tec ggc etc aac aaa etc ggc gtt cga caa ggt gac ate ate 299 Val Ala Ser Gly Leu Asn Lys Leu Gly Val Arg Gin Gly Asp He He 65 70 75 80

atg ctt ttg eta ccc aac teg ccg gag ttc gtg ttt tea att etc ggc 347 Met Leu Leu Leu Pro Asn Ser Pro Glu Phe Val Phe Ser He Leu Gly

85 90 95

gca tec tac cgc ggg get gcc gcc ace gcc gca aac ccg ttt tat ace 395 Ala Ser Tyr Arg Gly Ala Ala Ala Thr Ala Ala Asn Pro Phe Tyr Thr 100 105 110

cct gcc gag ate agg aag caa gcc aaa ace tec aac gcc agg ctt att 443 Pro Ala Glu He Arg Lys Gin Ala Lys Thr Ser Asn Ala Arg Leu He 115 120 125

ate aca cat gcc tgt tac tat gag aaa gtg aag gac ttg gtg gaa gag 491 He Thr His Ala Cys Tyr Tyr Glu Lys Val Lys Asp Leu Val Glu Glu 130 135 140 aac gtt gcc aag ate ata tgt ata gac tea ccc ccg gac ggt tgt ttg 539

Asn Val Ala Lys He He Cys He Asp Ser Pro Pro Asp Gly Cys Leu 145 150 155 160

cac ttc tcg gag ctg agt gag gcg gac gag aac gac atg ccc aat gta 587

His Phe Ser Glu Leu Ser Glu Ala Asp Glu Asn Asp Met Pro Asn Val

165 170 175

gag att gac ccc gat gat gtg gtg gcg ctg ccg tac tcg tca ggg acg 635

Glu Ile Asp Pro Asp Asp Val Val Ala Leu Pro Tyr Ser Ser Gly Thr

180 185 190

acg ggt tta cca aag ggg gtg atg cta aca cac aag gga caa gtg acg 683

Thr Gly Leu Pro Lys Gly Val Met Leu Thr His Lys Gly Gln Val Thr

195 200 205

agt gtg gcg caa cag gtg gac gga gag aat ccg aac ctg tat ata cat 731

Ser Val Ala Gln Gln Val Asp Gly Glu Asn Pro Asn Leu Tyr Ile His

210 215 220

agc gag gac gtg gtt ctg tgc gtg ttg cct ctg ttt cac atc tac tcg 779

Ser Glu Asp Val Val Leu Cys Val Leu Pro Leu Phe His Ile Tyr Ser

225 230 235 240

atg aac gtc atg ttt tgc ggg tta cga gtt ggt gcg gcg att ctg att 827

Met Asn Val Met Phe Cys Gly Leu Arg Val Gly Ala Ala Ile Leu Ile

245 250 255

atg cag aaa ttt gaa ata tat ggg ttg tta gag ctg gtc aga agt aca 875

Met Gln Lys Phe Glu Ile Tyr Gly Leu Leu Glu Leu Val Arg Ser Thr

260 265 270

ggt gac cat cat gcc tat cgt aca ccc atc gta ttg gca atc tcc aag 923

Gly Asp His His Ala Tyr Arg Thr Pro Ile Val Leu Ala Ile Ser Lys

275 280 285

act ccg gat ctt cac aac tat gat gtg tcc tcc att cgg act gtc atg 971

Thr Pro Asp Leu His Asn Tyr Asp Val Ser Ser Ile Arg Thr Val Met

290 295 300

tca ggt gcg gct cct ctg ggc aag gaa ctt gaa gat tct gtc aga gct 1019

Ser Gly Ala Ala Pro Leu Gly Lys Glu Leu Glu Asp Ser Val Arg Ala

305 310 315 320

aag ttt ccc acc gcc aaa ctt ggt cag gga tat gga atg acg gag gca 1067

Lys Phe Pro Thr Ala Lys Leu Gly Gln Gly Tyr Gly Met Thr Glu Ala

325 330 335

ggg ccc gtg cta gcg atg tgt ttg gca ttt gcc aag gaa ggg ttt gaa 1115

Gly Pro Val Leu Ala Met Cys Leu Ala Phe Ala Lys Glu Gly Phe Glu

340 345 350

ata aaa tcg ggg gca tct gga act gtt tta agg aac gca cag atg aag 1163

Ile Lys Ser Gly Ala Ser Gly Thr Val Leu Arg Asn Ala Gln Met Lys

355 360 365

att gtg gac cct gaa acc ggt gtc act ctc cct cga aac caa ccc gga 1211

Ile Val Asp Pro Glu Thr Gly Val Thr Leu Pro Arg Asn Gln Pro Gly

370 375 380

gag att tgc att aga gga gac caa atc atg aaa ggt tat ctt aat gat 1259

Glu Ile Cys Ile Arg Gly Asp Gln Ile Met Lys Gly Tyr Leu Asn Asp

385 390 395 400

cct gag gcg acg gag aga acc ata gac aag gaa ggt tgg tta cac aca 1307

Pro Glu Ala Thr Glu Arg Thr Ile Asp Lys Glu Gly Trp Leu His Thr

405 410 415

ggt gat gtg ggc tac atc gac gat gac act gag ctc ttc att gtt gat 1355

Gly Asp Val Gly Tyr Ile Asp Asp Asp Thr Glu Leu Phe Ile Val Asp

420 425 430

cgg ttg aag gaa ctg atc aaa tac aaa ggg ttt cag gtg gca ccc gct 1403

Arg Leu Lys Glu Leu Ile Lys Tyr Lys Gly Phe Gln Val Ala Pro Ala

435 440 445

gag ctt gag gcc atg ctc ctc aac cat ccc aac atc tct gat gct gcc 1451

Glu Leu Glu Ala Met Leu Leu Asn His Pro Asn Ile Ser Asp Ala Ala

450 455 460

gtc gtc cca atg aaa gac gat gaa gct gga gag ctc cct gtg gcg ttt 1499

Val Val Pro Met Lys Asp Asp Glu Ala Gly Glu Leu Pro Val Ala Phe

465 470 475 480

gtt gta aga tca gat ggt tct cag ata tcc gag gct gaa atc agg caa 1547

Val Val Arg Ser Asp Gly Ser Gln Ile Ser Glu Ala Glu Ile Arg Gln

485 490 495

tac atc gca aaa cag gtg gtt ttt tat aaa aga ata cat cgc gta ttt 1595

Tyr Ile Ala Lys Gln Val Val Phe Tyr Lys Arg Ile His Arg Val Phe

500 505 510

ttc gtc gaa gcc att cct aaa gcg ccc tct ggc aaa atc ttg cgg aag 1643

Phe Val Glu Ala Ile Pro Lys Ala Pro Ser Gly Lys Ile Leu Arg Lys

515 520 525

gac ctg aga gcc aaa ttg gcg tct ggt ctt ccc aat taattctcat 1689

Asp Leu Arg Ala Lys Leu Ala Ser Gly Leu Pro Asn

530 535 540

tcgctaccct cctttctctt atcatacgcc aacacgaacg aagaggctca attaaacgct 1749

gctcattcga agcggctcaa ttaaagctgc tcattcatgt ccaccgagtg ggcagcctgt 1809

cttgttggga tgttctttca tttgattcag ctgtgagaag ccagaccctc attatttatt 1869

gtgaaattca caagaatgtc tgtaaatcga tgttgtgagt gatgggtttc aaaacacttt 1929

tgacattgtt tacgttgtat ttcctgctgt tgaaaataac tactttgtat gacttttatt 1989

tgggaagata acctttcaaa aaaaaaaaaa aaaaaa 2025
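The feature table for SEQ ID NO:7 can be cross-checked with simple arithmetic: the CDS location (60)..(1679) spans 1620 nucleotides, or 540 codons, which agrees with the 540-residue length stated for SEQ ID NO:8. The following is a minimal Python sketch of that check, assuming the standard genetic code; the helper names `cds_codon_count` and `translate` are illustrative only and do not appear in the listing, and the input string is the first 16 codons of the coding region (positions 60-107).

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered t, c, a, g.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def cds_codon_count(start: int, end: int) -> int:
    """Number of codons in a CDS given 1-based inclusive coordinates."""
    length = end - start + 1
    if length % 3:
        raise ValueError("CDS length is not a multiple of three")
    return length // 3


def translate(dna: str) -> str:
    """Translate lowercase DNA (whitespace ignored) into one-letter residues."""
    seq = "".join(dna.split())
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


# Positions 60-107 of SEQ ID NO:7 (first 16 codons of the CDS).
first_codons = "atg gag acc caa aca aaa caa gaa gaa atc ata tat cgg tcg aaa ctc"

print(cds_codon_count(60, 1679))   # 540
print(translate(first_codons))     # METQTKQEEIIYRSKL
```

The one-letter output corresponds to Met Glu Thr Gln Thr Lys Gln Glu Glu Ile Ile Tyr Arg Ser Lys Leu, the first sixteen residues listed for SEQ ID NO:8.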

<210> 8

<211> 540

<212> PRT

<213> Liquidambar styraciflua

<400> 8

Met Glu Thr Gln Thr Lys Gln Glu Glu Ile Ile Tyr Arg Ser Lys Leu

1 5 10 15

Pro Asp Ile Tyr Ile Pro Lys His Leu Pro Leu His Ser Tyr Cys Phe

20 25 30

Glu Asn Ile Ser Gln Phe Gly Ser Arg Pro Cys Leu Ile Asn Gly Ala

35 40 45

Thr Gly Lys Tyr Tyr Thr Tyr Ala Glu Val Glu Leu Ile Ala Arg Lys

50 55 60

Val Ala Ser Gly Leu Asn Lys Leu Gly Val Arg Gln Gly Asp Ile Ile

65 70 75 80

Met Leu Leu Leu Pro Asn Ser Pro Glu Phe Val Phe Ser Ile Leu Gly

85 90 95

Ala Ser Tyr Arg Gly Ala Ala Ala Thr Ala Ala Asn Pro Phe Tyr Thr

100 105 110

Pro Ala Glu Ile Arg Lys Gln Ala Lys Thr Ser Asn Ala Arg Leu Ile

115 120 125

Ile Thr His Ala Cys Tyr Tyr Glu Lys Val Lys Asp Leu Val Glu Glu

130 135 140

Asn Val Ala Lys Ile Ile Cys Ile Asp Ser Pro Pro Asp Gly Cys Leu

145 150 155 160

His Phe Ser Glu Leu Ser Glu Ala Asp Glu Asn Asp Met Pro Asn Val

165 170 175

Glu Ile Asp Pro Asp Asp Val Val Ala Leu Pro Tyr Ser Ser Gly Thr

180 185 190

Thr Gly Leu Pro Lys Gly Val Met Leu Thr His Lys Gly Gln Val Thr

195 200 205

Ser Val Ala Gln Gln Val Asp Gly Glu Asn Pro Asn Leu Tyr Ile His

210 215 220

Ser Glu Asp Val Val Leu Cys Val Leu Pro Leu Phe His Ile Tyr Ser

225 230 235 240

Met Asn Val Met Phe Cys Gly Leu Arg Val Gly Ala Ala Ile Leu Ile

245 250 255

Met Gln Lys Phe Glu Ile Tyr Gly Leu Leu Glu Leu Val Arg Ser Thr

260 265 270

Gly Asp His His Ala Tyr Arg Thr Pro Ile Val Leu Ala Ile Ser Lys

275 280 285

Thr Pro Asp Leu His Asn Tyr Asp Val Ser Ser Ile Arg Thr Val Met

290 295 300

Ser Gly Ala Ala Pro Leu Gly Lys Glu Leu Glu Asp Ser Val Arg Ala

305 310 315 320

Lys Phe Pro Thr Ala Lys Leu Gly Gln Gly Tyr Gly Met Thr Glu Ala

325 330 335

Gly Pro Val Leu Ala Met Cys Leu Ala Phe Ala Lys Glu Gly Phe Glu

340 345 350

Ile Lys Ser Gly Ala Ser Gly Thr Val Leu Arg Asn Ala Gln Met Lys

355 360 365

Ile Val Asp Pro Glu Thr Gly Val Thr Leu Pro Arg Asn Gln Pro Gly

370 375 380

Glu Ile Cys Ile Arg Gly Asp Gln Ile Met Lys Gly Tyr Leu Asn Asp

385 390 395 400

Pro Glu Ala Thr Glu Arg Thr Ile Asp Lys Glu Gly Trp Leu His Thr

405 410 415

Gly Asp Val Gly Tyr Ile Asp Asp Asp Thr Glu Leu Phe Ile Val Asp

420 425 430

Arg Leu Lys Glu Leu Ile Lys Tyr Lys Gly Phe Gln Val Ala Pro Ala

435 440 445

Glu Leu Glu Ala Met Leu Leu Asn His Pro Asn Ile Ser Asp Ala Ala

450 455 460

Val Val Pro Met Lys Asp Asp Glu Ala Gly Glu Leu Pro Val Ala Phe

465 470 475 480

Val Val Arg Ser Asp Gly Ser Gln Ile Ser Glu Ala Glu Ile Arg Gln

485 490 495

Tyr Ile Ala Lys Gln Val Val Phe Tyr Lys Arg Ile His Arg Val Phe

500 505 510

Phe Val Glu Ala Ile Pro Lys Ala Pro Ser Gly Lys Ile Leu Arg Lys

515 520 525

Asp Leu Arg Ala Lys Leu Ala Ser Gly Leu Pro Asn

530 535 540

<210> 9

<211> 1544

<212> DNA

<213> Pinus taeda

<400> 9

aaagataata tatgtgtatg cctactacta cacattgttt tgaagtgtgt aaacatagtg 60

caacactagg aggactcaca atgagcactt gttgacatga aactagctaa atgcccaaca 120

atattagtga aagctagtta aactaacccc tttgactttc aagatgatat atttatatcc 180

ctactacgtc ttcctctttt tgtctttctc ttgtgattaa accttccttg aaacaattct 240

caaatgtaaa attaaacctt gaaacttgta gagaccaaac ttccctagga gaaaccacat 300

ttatgacaac atatatacac caacccattg catactataa tattggaatt acctgcagcg 360

aacgaaagaa acgctgtctc accaactcgt gcactacatc ccgaaactta accttcccct 420

gatacagatt gaagagccga aaaaagcgtg catccaaatt tctggtatgg tgaggagccg 480

aaaaacgcgt gcgcctaatt tttttgagat gggccggaaa ataatgcgtg catctaaatt 540

ttcacgtgtc gcgtattggc gaggttgcgc tgaatgtgat cctgtgcgtg agccacattc 600

attccattgg ttgacccgcc ggtaccgcga ggaccgtggg gtctcacaga tacgcggatg 660

gtggatcagc actgagaaga ttagatgatg accaggcggg catttgaagt aaaaacttgg 720

gggtggttgg caagtacgcg acaaagaggg gtagtgcgca aggaagcgag ttggatgcaa 780

ataatattac aaagtgggtt ggtgggcatg agcatcaacc agaatgatgt tgttgctggt 840

tccgtgcaaa ttctgaccag tagtttgaac aatactaccc aacttgtttt tggtaaaaca 900

tgaagtgggt aaggagaatt gaacttacgt ctcatggtaa agggcaaggg caaatgactt 960

aacacatacc tttaactaat aaaaataccc ctaacaaata cgaaaacgaa tgagttatca 1020

cagaccttca actaataaga tagccatcag acccacatct cctgactgac caaaaacaaa 1080

tgacttcaac caactaagat acccatcaaa gctaacccac aacccaattc ctcacttccc 1140

cttaccagac caaccaagca gacctacgcc attaactact ttaggacgtg ggaattgggg 1200

gtgccaccgt tgaagaatgg cactcagggt tggtaatccc tccacgtgta tgtagcagtc 1260

gtttggtgga gacggcgtgt ttgaatgtcc accttccagt ttggagaaca aggaaattgg 1320

gcttatatta ggcctggatc tcttgtttca gagcaggagt agttcaggac aggaactagc 1380

attcaagaat tcaattgccc tgccctgctc tgctctgctt tgctcaactt attgatccct 1440

gctctggttt gttcaatttc ttgacccctg ctgggttctg ctctggtttg cacactttct 1500

cgattatata agtcattttg gatccttgca aggaagagaa tatg 1544

<210> 10

<211> 659

<212> DNA

<213> Pinus taeda

<400> 10

aaacaccaat ttaatgggat ttcagatttg tatcccatgc tattggctaa ggcatttttc 60

ttattgtaat ctaaccaatt ctaatttcca ccctggtgtg aactgactga caaatgcggt 120

ccgaaaacag cgaatgaaat gtctgggtga tcggtcaaac aagcggtggg cgagagagcg 180

cgggtgttgg cctagccggg atgggggtag gtagacggcg tattaccggc gagttgtccg 240

aatggagttt tcggggtagg tagtaacgta gacgtcaatg gaaaaagtca taatctccgt 300

caaaaatcca accgctcctt cacatcgcag agttggtggc cacgggaccc tccacccact 360

cactcaatcg atcgcctgcc gtggttgccc attattcaac catacgccac ttgactcttc 420

accaacaatt ccaggccggc tttctataca atgtactgca caggaaaatc caatataaaa 480

agccggcctc tgcttccttc tcagtagccc ccagctcatt caattcttcc cactgcaggc 540

tacatttgtc agacacgttt tccgccattt ttcgcctgtt tctgcggaga atttgatcag 600

gttcggattg ggattgaatc aattgaaagg tttttatttt cagtatttcg atcgccatg 659

<210> 11

<211> 2251

<212> DNA

<213> Pinus taeda

<400> 11

ggccgggtgg tgacatttat tcataaattc atctcaaaac aagaaggatt tacaaaaata 60

aaagaaaaca aaattttcat ctttaacata attataattg tgttcacaaa attcaaactt 120

aaacccttaa tataaagaat ttctttcaac aatacacttt aatcacaact tcttcaatca 180

caacctcctc caacaaaatt aaaatagatt aataaataaa taaacttaac tatttaaaaa 240

aaaatattat acaaaattta ttaaaacttc aaaataaaca aactttttat acaaaattca 300

tcaaaacttt aaaataaagc taaacactga aaatgtgagt acatttaaaa ggacgctgat 360

cacaaaaatt ttgaaaacat aaacaaactt gaaactctac cttttaagaa tgagtttgtc 420

gtctcattaa ctcattagtt ttatagttcg aatccaatta acgtatcttt tattttatgg 480

aataagggtg ttttaataag tgattttggg atttttttag taatttattt gtgatatgtt 540

atggagtttt taaaaatata tatatatata tatatttttg ggttgagttt acttaaaatt 600

tggaaaaggt tggtaagaac tataaattga gttgtgaatg agtgttttat ggatttttta 660

agatgttaaa tttatatatg taattaaaat tttattttga ataacaaaaa ttataattgg 720

ataaaaaatt gttttgttaa atttagagta aaaatttcaa aatctaaaat aattaaacac 780

tattattttt aaaaaatttg ttggtaaatt ttatcttata tttaagttaa aatttagaaa 840

aaattaattt taaattaata aacttttgaa gtcaaatatt ccaaatattt tccaaaatat 900

taaatctatt ttgcattcaa aatacaattt aaataataaa acttcatgga atagattaac 960

caatttgtat aaaaaccaaa aatctcaaat aaaatttaaa ttacaaaaca ttatcaacat 1020

tatgatttca agaaagacaa taaccagttt ccaataaaat aaaaaacctc atggcccgta 1080

attaagatct cattaattaa ttcttatttt ttaatttttt tacatagaaa atatctttat 1140

attgtatcca agaaatatag aatgttctcg tccagggact attaatctcc aaacaagttt 1200

caaaatcatt acattaaagc tcatcatgtc atttgtggat tggaaattat attgtataag 1260

agaaatatag aatgttctcg tctagggact attaatttcc aaacaaattt caaaatcatt 1320

acattaaagc tcatcatgtc atttgtggat tggaaattag acaaaaaaaa tcccaaatat 1380

ttctctcaat ctcccaaaat atagttcgaa ctccatattt ttggaaattg agaatttttt 1440

tacccaataa tatatttttt tatacatttt agagattttc cagacatatt tgctctggga 1500

tttattggaa tgaaggttga gttataaact ttcagtaatc caagtatctt cggtttttga 1560

agatactaaa tccattatat aataaaaaca cattttaaac accaatttaa tgggatttca 1620

gatttgtatc ccatgctatt ggctaaggca tttttcttat tgtaatctaa ccaattctaa 1680

tttccaccct ggtgtgaact gactgacaaa tgcggtccga aaacagcgaa tgaaatgtct 1740

gggtgatcgg tcaaacaagc ggtgggcgag agagcgcggg tgttggccta gccgggatgg 1800

gggtaggtag acggcgtatt accggcgagt tgtccgaatg gagttttcgg ggtaggtagt 1860

aacgtagacg tcaatggaaa aagtcataat ctccgtcaaa aatccaaccg ctccttcaca 1920

tcgcagagtt ggtggccacg ggaccctcca cccactcact cgatcgcctg ccgtggttgc 1980

ccattattca accatacgcc acttgactct tcaccaacaa ttccaggccg gctttctata 2040

caatgtactg cacaggaaaa tccaatataa aaagccggcc tctgcttcct tctcagtagc 2100

ccccagctca ttcaattctt cccactgcag gctacatttg tcagacacgt tttccgccat 2160

ttttcgcctg tttctgcgga gaatttgatc aggttcggat tgggattgaa tcaattgaaa 2220

ggtttttatt ttcagtattt cgatcgccat g 2251

Claims

What is Claimed Is:

1. A method for altering lignin biosynthesis in a gymnosperm plant comprising expressing an angiosperm coniferyl aldehyde 5-hydroxylase (CAld5H) in a gymnosperm plant, wherein said CAld5H catalyzes 5-hydroxylation of coniferyl aldehyde in said gymnosperm plant.

2. The method of claim 1, wherein said gymnosperm plant is a plant cell, a plant organ or an entire plant.

3. The method of claim 1, wherein said angiosperm is a sweetgum.

4. The method of claim 1, wherein said CAld5H is encoded by a polynucleotide comprising the nucleotides 74-1606 of SEQ ID NO:3.

5. The method of claim 1, further comprising the step of incorporating a polynucleotide encoding said angiosperm CAld5H into said gymnosperm plant.

6. The method of claim 5, wherein upstream to said polynucleotide is a promoter region selected from the group consisting of the 5' flanking region of phenylalanine ammonia lyase (PAL), the 5' flanking region of 4-coumarate CoA ligase 1B (4CL1B), and the 5' flanking region of 4-coumarate CoA ligase 3B (4CL3B).

7. The method of claim 1, wherein said CAld5H is encoded by a polynucleotide comprising the nucleotides 74-1606 of SEQ ID NO:3 and said gymnosperm is a loblolly pine.

8. The method of claim 1, wherein said CAld5H has at least about 50-fold greater specific activity for coniferyl aldehyde than for ferulic acid.

9. The method of claim 1, wherein said CAld5H has at least about 140-fold greater specific activity for coniferyl aldehyde than for ferulic acid.

10. A method for initiating syringyl lignin biosynthesis in a gymnosperm plant comprising expressing an angiosperm coniferyl aldehyde 5-hydroxylase (CAld5H) in a gymnosperm plant, wherein said CAld5H catalyzes conversion of coniferyl aldehyde to 5-hydroxyconiferyl aldehyde in said gymnosperm and wherein said conversion represents the first step in the formation of syringyl monolignols.

11. The method of claim 10, wherein said gymnosperm plant is a plant cell, a plant organ or an entire plant.

12. The method of claim 10, wherein said angiosperm is a sweetgum.

13. The method of claim 10, wherein said CAld5H is encoded by a polynucleotide comprising the nucleotides 74-1606 of SEQ ID NO:3.

14. The method of claim 10, further comprising the step of incorporating a polynucleotide encoding said angiosperm CAld5H into said gymnosperm plant.

15. The method of claim 14, wherein upstream to said polynucleotide is a promoter region selected from the group consisting of the 5' flanking region of phenylalanine ammonia lyase (PAL), the 5' flanking region of 4-coumarate CoA ligase 1B (4CL1B), and the 5' flanking region of 4-coumarate CoA ligase 3B (4CL3B).

16. The method of claim 10, wherein said CAld5H is encoded by a polynucleotide comprising the nucleotides 74-1606 of SEQ ID NO:3 and said gymnosperm is a loblolly pine.

17. The method of claim 10, wherein said CAld5H has at least about 50-fold greater specific activity for coniferyl aldehyde than for ferulic acid.

18. The method of claim 10, wherein said CAld5H has at least about 140-fold greater specific activity for coniferyl aldehyde than for ferulic acid.

19. A method of mediating a conversion of coniferyl aldehyde to 5-hydroxyconiferyl aldehyde comprising contacting an angiosperm coniferyl aldehyde 5-hydroxylase (CAld5H) with coniferyl aldehyde to produce 5-hydroxyconiferyl aldehyde.

20. The method of claim 19, wherein said CAld5H comprises the amino acid sequence represented by SEQ ID NO:4.

21. The method of claim 19, wherein said CAld5H has at least about 50-fold greater specific activity for coniferyl aldehyde than for ferulic acid.

22. The method of claim 19, wherein said CAld5H has at least about 140-fold greater specific activity for coniferyl aldehyde than for ferulic acid.

23. The method of claim 19, wherein said angiosperm CAld5H is from a sweetgum.

24. An isolated polynucleotide comprising a sequence encoding a sweetgum coniferyl aldehyde 5-hydroxylase (CAld5H).

25. An isolated polynucleotide comprising the nucleotide sequence as represented by the nucleotides 74-1606 of SEQ ID NO:3.

26. An isolated polynucleotide that hybridizes under stringent conditions to the polynucleotide of claim 25.

27. The isolated polynucleotide of claim 26, wherein said polynucleotide is at least about 1000 nucleotides long.

28. The isolated polynucleotide of claim 26, wherein said polynucleotide is at least about 500 nucleotides long.

29. An isolated polynucleotide encoding a polypeptide having at least 85% similarity to the polypeptide of SEQ ID NO:4.

30. An isolated polynucleotide having at least 75% identity to the polynucleotide of claim 25.

31. A sequence conservative variant of the polynucleotide of claim 25.

32. A method for identifying a polynucleotide encoding an angiosperm coniferyl aldehyde 5-hydroxylase (CAld5H) or a portion thereof comprising the step of using (i) a polynucleotide comprising the nucleotides of SEQ ID NO:3 or a fragment thereof, or (ii) a protein having the amino acid sequence of SEQ ID NO:4 or a fragment thereof, to identify a polynucleotide encoding said angiosperm CAld5H or a portion thereof.

33. The method of claim 32, wherein said polynucleotide comprising the nucleotides of SEQ ID NO:3 or a fragment thereof is a probe or a PCR primer.

34. The method of claim 32, wherein said step of using the protein having the amino acid sequence of SEQ ID NO:4 or a fragment thereof comprises preparation of antibodies to said protein or fragment thereof, said method further comprising the step of screening an angiosperm expression library with said antibodies.

35. An isolated polynucleotide comprising a nucleotide sequence encoding an angiosperm coniferyl aldehyde 5-hydroxylase (CAld5H) identified according to the method of claim 32.
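
The claims above rely on three kinds of numeric criteria: 1-based inclusive nucleotide ranges (for example, nucleotides 74-1606 of SEQ ID NO:3), percent-identity thresholds (claim 30), and fold-selectivity thresholds between two specific activities (claims 8, 9, 17, 18, 21 and 22). The Python sketch below illustrates how each could be evaluated. It is only an illustration under stated assumptions: the helper names are hypothetical, the claims do not specify an alignment method, so identity is computed here over a pre-aligned, ungapped pair of equal-length sequences, and the toy inputs are invented rather than taken from the sequence listing.

```python
def subsequence(seq: str, start: int, end: int) -> str:
    """Return positions start..end (1-based, inclusive), as numbered in the claims."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("coordinates fall outside the sequence")
    return seq[start - 1:end]


def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions in two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def fold_selectivity(activity_preferred: float, activity_other: float) -> float:
    """Ratio of two specific activities measured in the same units."""
    if activity_other <= 0:
        raise ValueError("reference activity must be positive")
    return activity_preferred / activity_other


# Length of the claimed range, nucleotides 74-1606: 1606 - 74 + 1 = 1533 (511 codons).
claimed_length = 1606 - 74 + 1

# Toy data (hypothetical, not taken from the sequence listing).
toy = "acgtacgtac"
print(subsequence(toy, 3, 6))                      # gtac
print(percent_identity("acgtacgt", "acgtacga"))    # 87.5
print(fold_selectivity(140.0, 1.0) >= 140)         # True
print(claimed_length, claimed_length // 3)         # 1533 511
```

In practice a percent-identity figure depends on the alignment algorithm and gap treatment chosen, so the ungapped comparison here should be read as the simplest possible case.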