WO2024141641A2

WO2024141641A2 - Secretion signals

Info

Publication number: WO2024141641A2
Application number: PCT/EP2023/087985
Authority: WO
Inventors: Lucia Nancy COCONI LINARES; Yentil DE VLEESCHOUWER; Marie-Laure Juliette ERFFELINCK; Anton Alain An HEYMAN; Alrik Pieter Los; Deniz Güver MALAT
Original assignee: Biotalys NV
Current assignee: Biotalys NV
Priority date: 2022-12-30
Filing date: 2023-12-29
Publication date: 2024-07-04
Anticipated expiration: 2025-06-30
Also published as: EP4642916A2; WO2024141641A3

Abstract

The present invention relates to a microbial host cell for expressing a gene of interest, for example a VHH, wherein the gene of interest is fused to a secretion signal sequence comprising a signal peptide encoding sequence. The gene of interest fused to the secretion signal sequence is further comprised in an expression cassette of which one or more are integrated in the microbial host cell. The present invention further provides nucleic acids encoding secretion signal sequences and peptides encoded by said nucleic acids. The present invention further provides expression cassettes comprising said nucleic acids and a promoter operably linked thereto and vectors comprising said nucleic acids or said expression cassettes. Finally, the present invention relates to a microbial host cell comprising said nucleic acids, expression cassettes or vectors, a method for producing a protein and a protein comprising the peptide encoded by said nucleic acids.

Description

Secretion signals

Field of the invention

The present invention relates to the field of biotechnology, specifically to the field of recombinant protein expression. More specifically, the present invention relates to cells modified to express higher yields of recombinant protein or a protein encoded by a gene of interest.

Background

Yeasts in general and Komagataella phaffii (K. phaffii; synonym: Pichia pastoris) in particular are popular expression systems for the secretion of recombinant proteins. The initial and crucial step in secretion is the translocation of the recombinant protein into the endoplasmic reticulum (ER). This process is directed by a secretion signal fused to the recombinant protein. The signal sequence specifies either a co-translational or post-translational targeting route to the ER on the conventional secretion pathway (Ng et al. The Journal of cell biology. 1996. 134 (2), 269-78). The most commonly used secretion signal in K. phaffii is the Saccharomyces cerevisiae α-mating factor pre-pro peptide (α-MF) (Lin-Cereghino et al. Gene. 2013. 519, 311-7). This secretion signal mediates post-translational translocation in S. cerevisiae and most likely in K. phaffii too (Fitzgerald & Glick. Microb Cell Fact. 2014. 13, 125; Ng et al. The Journal of cell biology. 1996. 134 (2), 269-78). Other secretion signals are continually added to the repertoire and tested with different recombinant proteins.

Today, mammalian antibodies have become the dominant product class within the biopharmaceutical market (Ecker et al. MAbs. 2015. 7, 9-14). Antibodies are known to be co-translationally translocated in their native environment (Feige et al. Trends Biochem Sci. 2010. 35, 189-89). A trend toward development of smaller antigen-binding fragments (e.g. Fab, scFv and VHH) is also evident (Nelson & Reichert. Nat Biotechnol. 2009. 27, 331-7; Walsh. Nat Biotechnol. 2014. 32, 992-1000). Recently, the use of VHH in agriculture as biological crop protection products has increased the need to produce VHH at a scale that surpasses the production levels usually required in the biopharmaceutical market.

As the production of many mammalian proteins in yeast and at high quantities may require different and more efficient ways or cellular pathways to be followed by the recombinant protein for production and secretion, the α-MF secretion signal could be suboptimal, and it may be preferable to use alternative secretion signals (Ng et al. The Journal of cell biology. 1996. 134 (2), 269-78). In fact, the secretion signal α-MF has already been reported to cause a bottleneck in translocation (Fitzgerald & Glick. Microb Cell Fact. 2014. 13, 125; Zahri et al. Microbiology. 2018.). WO2018165589 and WO2018165594 disclose a recombinant secretion signal comprising an α-MF pro-peptide originating from Saccharomyces cerevisiae and a signal peptide other than α-MF secretion peptide originating from Saccharomyces cerevisiae.

Consequently, there is still a need for secretion signals that increase the secretion of various proteins such as VHH. Moreover, there still exists a need to drive the production levels of VHH even higher and to the limit with which the yeast secretion pathway can process secretion of recombinant proteins such as VHH. The technical problem therefore is to comply with this need.

Summary of the invention

The present invention relates to a microbial host cell comprising one or more copies of an expression cassette comprising a promoter capable of promoting expression of a gene of interest, and the gene of interest, wherein the gene of interest is fused to a secretion signal sequence which comprises a signal peptide encoding sequence, and a terminator, and wherein the one or more copies of the expression cassette are integrated into the genome of the microbial host cell. In some embodiments the microbial host cell comprises two or more copies of said expression cassette.

The present invention further relates to the use of a microbial host cell, comprising one or more copies of an expression cassette comprising a promoter capable of promoting expression of a gene of interest, and the gene of interest, wherein the gene of interest is fused to a secretion signal sequence which comprises a signal peptide encoding sequence, and a terminator, and wherein the one or more copies of the expression cassette are integrated into the genome of the microbial host cell, for manufacturing a protein, where the protein is encoded by the gene of interest.

The present invention further relates to a nucleic acid comprising a secretion peptide-encoding sequence wherein the signal peptide-encoding sequence is the nucleotide sequence according to the nucleotide sequence of SEQ ID NO: 101, a nucleotide sequence with at least 90% identity to SEQ ID NO: 101, a nucleotide sequence encoding a signal peptide having the amino acid sequence of SEQ ID NO: 118, or a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 118.

The present invention further relates to a nucleic acid comprising a secretion peptide-encoding sequence wherein the signal peptide-encoding sequence is the nucleotide sequence according to the nucleotide sequence of SEQ ID NO: 102, a nucleotide sequence with at least 90% identity to SEQ ID NO: 102, a nucleotide sequence encoding a signal peptide having the amino acid sequence of SEQ ID NO: 119, or a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 119.

The present invention further relates to a nucleic acid comprising a secretion peptide-encoding sequence wherein the signal peptide-encoding sequence is the nucleotide sequence according to the nucleotide sequence of SEQ ID NO: 103, a nucleotide sequence with at least 90% identity to SEQ ID NO: 103, a nucleotide sequence encoding a signal peptide having the amino acid sequence of SEQ ID NO: 120, or a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 120.

The present invention further relates to a nucleic acid comprising a secretion peptide-encoding sequence wherein the signal peptide-encoding sequence is the nucleotide sequence according to the nucleotide sequence of SEQ ID NO: 106, a nucleotide sequence with at least 90% identity to SEQ ID NO: 106, a nucleotide sequence encoding a signal peptide having the amino acid sequence of SEQ ID NO: 123, or a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 123.

The present invention further relates to the use of a nucleic acid of the invention as or in a secretion signal sequence.

The present invention further relates to an expression cassette comprising a nucleic acid of the invention, and a promoter operably linked to the nucleic acid, and optionally a gene of interest. The present invention further relates to a vector comprising a nucleic acid of the invention or an expression cassette of the invention, or a vector comprising said nucleic acid and a promoter operably linked to said nucleic acid and optionally a gene of interest.

The present invention further relates to a method for producing a protein, the method comprising culturing a microbial host cell of the invention, or a microbial host cell comprising an expression cassette or vector of the invention, under conditions to express the gene of interest, wherein the gene of interest encodes the protein, and optionally isolating the protein, and optionally purifying the protein, and optionally modifying the protein, and optionally formulating the protein. The present invention further relates to a protein produced by said method.

The present invention further relates to a peptide comprising the amino acid sequence provided in SEQ ID NO: 118, or an amino acid sequence with at least 90% identity to SEQ ID NO: 118, a peptide comprising the amino acid sequence provided in SEQ ID NO: 119, or an amino acid sequence with at least 90% identity to SEQ ID NO: 119, a peptide comprising the amino acid sequence provided in SEQ ID NO: 120, or an amino acid sequence with at least 90% identity to SEQ ID NO: 120, or a peptide comprising the amino acid sequence provided in SEQ ID NO: 123, or an amino acid sequence with at least 90% identity to SEQ ID NO: 123. The present invention further relates to a protein comprising said peptide, for example a recombinant fusion protein comprising the peptide. The present invention further relates to the use of said peptide as or in a secretion signal.

The present invention further relates to a microbial host cell comprising a nucleic acid of the invention, an expression cassette of the invention, a vector of the invention, or a peptide of the invention.

Description of the sequence listing

Description of the figures

Figure 1: Sets out the expression levels according to a Bradford coloration assay of VHH-X fused to different secretion signal sequences.

Figure 2: Sets out the expression levels according to a Bradford coloration assay of VHH-Y fused to different secretion signal sequences.

Figure 3: Sets out the expression levels according to a Bradford coloration assay of VHH-Z fused to different secretion signal sequences.

Detailed description of the invention

Reference to any prior art in this specification is not, and should not be taken as, an acknowledgment or any form of suggestion that this prior art forms part of the common general knowledge in any country.

All documents cited in the present specification are hereby incorporated by reference in their entirety. Unless otherwise defined, all terms used in disclosing the invention, including technical and scientific terms, have the meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The present invention will be described with respect to particular embodiments but the invention is not limited thereto but only by the claims. Any reference signs in the claims shall not be construed as limiting the scope.

Where the term “comprising” is used in the present description and claims, it does not exclude other elements or steps.

Where an indefinite or definite article is used when referring to a singular noun e.g. “a” or “an”, “the”, this includes a plural of that noun unless something else is specifically stated.

The term “about” as used herein when referring to a measurable value such as a parameter, an amount, a temporal duration, and the like, is meant to encompass variations of +/-10% or less, preferably +/-5% or less, more preferably +/-1% or less, and still more preferably +/-0.1% or less of and from the specified value, insofar as such variations are appropriate to perform in the disclosed invention. It is to be understood that the value to which the modifier “about” refers is itself also specifically, and preferably, disclosed.

The following terms or definitions are provided solely to aid in the understanding of the invention. Unless specifically defined herein, all terms used herein have the same meaning as they would to one skilled in the art of the present invention. Practitioners are particularly directed to Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Press, Plainview, New York (1989); and Ausubel et al., Current Protocols in Molecular Biology (Supplement 47), John Wiley & Sons, New York (1999), for definitions and terms of the art.

The definitions provided herein should not be construed to have a scope less than understood by a person of ordinary skill in the art. Unless indicated otherwise, all methods, steps, techniques and manipulations that are not specifically described in detail can be performed and have been performed in a manner known per se, as will be clear to the skilled person. Reference is for example again made to the standard handbooks, to the general background art referred to above and to the further references cited therein.

Microbial host cells

The present invention provides a microbial host cell comprising: one or more copies of an expression cassette comprising a promoter capable of promoting expression of a gene of interest, and the gene of interest, wherein the gene of interest is fused to a secretion signal sequence which comprises a signal peptide encoding sequence, and where the expression cassette further comprises a terminator, and wherein the one or more copies of the expression cassette are integrated into the genome of the microbial host cell. Preferably, the one or more copies of the expression cassette are integrated into the chromosomal DNA of the microbial host cell.

A microbial host cell is defined as a single-celled organism which can be used in a fermentation process or in cell culture to produce a protein encoded by a gene of interest. The microbial host cell may be a prokaryotic cell or a eukaryotic cell. Preferably, the microbial host cell is selected from the kingdom Fungi. In particular, the fungus may be a yeast. The yeast may be selected from the group consisting of Pichia (also known as and herein referred to as Komagataella), Candida, Torulopsis, Arxula, Hansenula, Yarrowia, Kluyveromyces and Saccharomyces. More preferably, the microbial host cell may be from the Pichia genus (also known as Komagataella), such as P. pastoris (herein referred to as Komagataella phaffii), P. farinosa, P. anomala, P. heedii, P. guilliermondii, P. kluyveri, P. membranifaciens, P. norvegensis, P. ohmeri, P. methanolica and P. subpelliculosa. Most preferably, the microbial host cell may be Komagataella phaffii.

Expression cassettes

The microbial host cell of the present invention comprises one or more copies of an expression cassette comprising a promoter capable of promoting expression of a gene of interest.

The term “expression cassette” as used herein refers to a distinct functional unit of nucleotide sequence (i.e. DNA) comprising a regulatory sequence such as a promoter capable of expressing a gene of interest. The expression cassette may comprise a gene of interest or alternatively may be provided in a form suitable for insertion of a gene of interest into the expression cassette (e.g., the expression cassette may comprise a multiple cloning site for insertion of the gene of interest). Typically, translation of a gene of interest in an expression cassette according to this invention is initiated by a start codon, which initiates translation by the ribosomal translation machinery of the microbial cell. Where the 5’-end of the gene of interest is fused to a secretion signal sequence, the start codon is situated at the 5’-end of said secretion signal sequence. Typically, a gene of interest in an expression cassette according to this invention is terminated with a stop codon. The expression cassette according to the invention may further include a terminator sequence, halting expression progression at the end of the expression cassette. An expression cassette may comprise additional regulatory and other sequences such as signal sequences, introns, IRES sequences, ribosomal binding sites etc.

The expression cassettes may be provided as part of a vector. One or more expression cassettes, and optionally further parts of said vectors, may be integrated into the genome of the microbial host cell. The expression cassette may be first amplified by PCR using said vectors as a DNA template whereafter said PCR products can be used for transforming two or more different expression cassettes to the host cell. Alternatively, the expression cassette is synthesized or assembled without the need for it to be included into a vector. The microbial host cell of the present invention may thus have the PCR products containing the expression cassettes integrated into its genome (partially or completely). It might also be that some of the transformed vectors are integrated (partially or completely) into the genome of said host cells, whereas other of said transformed vectors are present as plasmids within the cytosol of said host cell.

The phrases “under the control of”, “capable of promoting expression”, “promote the expression of” and “operably linked” as used herein are interchangeable. Thus, a promoter that is “capable of promoting expression of a gene of interest” is a promoter that is operably linked to the gene of interest and which, under suitable conditions, promotes the expression of the gene of interest in and by the microbial host cell. In some embodiments, the gene of interest may be under the control of a constitutive promoter, or the gene of interest may be under the control of an inducible promoter. When under the control of an inducible promoter, methods of the invention may comprise a step of inducing expression of the gene of interest by the microbial host cell. The promoter and the gene of interest are then said to be operably linked.

The phrase “fused to” as used herein means that two or more distinct nucleic acid sequences are organized so that the amino acid sequences encoded by the two or more distinct nucleic acid sequences are expressed as one single polypeptide chain. For example, expression of a secretion signal sequence fused to a gene of interest will result in a single polypeptide chain comprising the secretion signal and the protein of interest (encoded by the gene of interest). Of course, the single polypeptide chain can thereafter be further processed and for example cleaved, such as the cleavage of a secretion signal from the protein of interest during secretion. The single polypeptide chain may further comprise an additional amino acid sequence between the secretion signal and the protein of interest. Thus the secretion signal sequence may be fused to the gene of interest directly (no additional sequence in between) or indirectly (an additional amino acid sequence in between), provided that the protein of interest and the secretion signal are expressed in a single polypeptide chain.
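As an illustration of direct and indirect fusion at the nucleotide level, the following minimal Python sketch translates a fused open reading frame into a single polypeptide chain. The sequences `SIGNAL`, `GENE` and `LINKER` are hypothetical toy sequences chosen for the example and are not sequences of the sequence listing; the helper names `fuse` and `translate` are likewise assumptions, and the standard genetic code is assumed.

```python
# Standard genetic code, built from the conventional T, C, A, G base ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def translate(dna):
    """Translate a DNA coding sequence from its first base up to the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon: translation ends here
            break
        protein.append(amino_acid)
    return "".join(protein)


def fuse(signal_seq, gene, linker=""):
    """Fuse a secretion signal sequence to a gene, directly or via an in-frame linker."""
    if len(signal_seq) % 3 != 0 or len(linker) % 3 != 0:
        raise ValueError("signal sequence and linker must keep the reading frame")
    return signal_seq + linker + gene


# Hypothetical toy sequences (illustration only).
SIGNAL = "ATGGCTTTG"  # start codon followed by two codons: M A L
GENE = "AAAGGTTAA"    # two codons and a stop codon: K G *
LINKER = "GGTTCT"     # additional sequence between signal and gene: G S

direct = translate(fuse(SIGNAL, GENE))            # one chain, no sequence in between
indirect = translate(fuse(SIGNAL, GENE, LINKER))  # one chain, linker in between
print(direct, indirect)
```

In both cases the output is one continuous amino acid chain, which is the defining property of a fusion in the sense used above; the sketch does not model any later cleavage of the secretion signal.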

The phrase “fused to” in the context of a protein or “fusion protein” as used herein means that two or more distinct proteins or peptides (such as a signal sequence) are produced as one single polypeptide chain. The two proteins are then said to be fused. For example, a signal sequence may be fused to a protein of interest. In general, a fused protein results from the expression of two or more nucleic acid sequences that are expressed as a single polypeptide chain as described above. An example of a fusion protein is a precursor protein where the signal sequence is fused to the N-terminus of the protein of interest. In some embodiments, the expression cassette comprises, in a 5' to 3' order, a promoter, a start codon, a secretion signal sequence, optionally an N-terminal tag, a gene of interest encoding a protein of interest, optionally a C-terminal tag, a stop codon and a terminator sequence. A “start codon” as used herein refers to the first codon of a messenger RNA (mRNA) transcript translated by a ribosome or the first codon of a DNA sequence encoding the mRNA. The most common start codon is AUG (i.e., ATG in the corresponding DNA sequence). Optionally, the start codon is preceded by a 5' untranslated region (5' UTR).

A “stop codon” as used herein may refer to a nucleotide sequence within a messenger RNA (mRNA) molecule that signals a halt to protein synthesis, or the corresponding DNA sequence. Possible stop codons are UAG, UAA, and UGA (i.e., TAG, TAA, TGA in the corresponding DNA sequence).
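The 5' to 3' order of cassette elements and the start and stop codon definitions can be sketched as a small assembly helper. This is a minimal sketch under stated assumptions: the function names `coding_region` and `assemble_cassette`, the default stop codon and all sequences passed to them are hypothetical and for illustration only, and the secretion signal sequence is assumed to be supplied without its own start codon.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # DNA equivalents of UAA, UAG, UGA


def coding_region(signal_seq, gene, n_tag="", c_tag="", stop="TAA"):
    """Build the translated part: start, signal, optional N-tag, gene, optional C-tag, stop."""
    if stop not in STOP_CODONS:
        raise ValueError("not a stop codon: " + stop)
    body = (signal_seq + n_tag + gene + c_tag).upper()
    if len(body) % 3 != 0:
        raise ValueError("parts do not keep the reading frame")
    codons = [body[i:i + 3] for i in range(0, len(body), 3)]
    if any(codon in STOP_CODONS for codon in codons):
        raise ValueError("premature in-frame stop codon")
    return START_CODON + body + stop


def assemble_cassette(promoter, terminator, **coding_parts):
    """Join promoter, coding region and terminator in 5' to 3' order."""
    return promoter.upper() + coding_region(**coding_parts) + terminator.upper()


# Hypothetical placeholder sequences (illustration only).
cassette = assemble_cassette("ccc", "ttt", signal_seq="GCT", gene="AAA")
print(cassette)
```

The checks only guard the reading frame and premature stop codons; real promoters, tags and terminators would of course be far longer than the placeholders used here.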

The phrase “gene of interest” as used herein refers to a sequence of nucleotides that encodes a protein of interest, e.g., a recombinant protein (e.g., a VHH) to be produced in a microbial host cell. The gene of interest may comprise intron and exon sequence or sequences where the genetic information in the intron sequence or sequences may not necessarily be present in the final protein product. When referring to a gene of interest, this does not include additional sequences such as secretion signal sequences or tags; although these are fused to the gene of interest and may be present in the single amino acid sequence of the corresponding protein, they are not construed as being part of the gene of interest. Hence, the protein encoded by the gene of interest is the protein of interest defined by the amino acid sequence without any additional tags or secretion signals.

Secretion signal sequences

The inventors have found that the choice of a secretion signal sequence for the expression and secretion of a protein encoded by a gene of interest and produced by a microbial host cell is critical to increasing the production and/or yield of said protein. The inventors have found that the production and/or yield of a protein can be improved when using a microbial host cell comprising one or more copies of an expression cassette comprising a promoter capable of promoting expression of a gene of interest, and the gene of interest, wherein the gene of interest is fused to a secretion signal sequence which comprises a signal peptide encoding sequence, and a terminator, and wherein the one or more copies of the expression cassette are integrated into the genome of the microbial host cell. The current invention therefore relates to said microbial host cell and the use of the microbial host cell for manufacturing a protein, wherein the protein is encoded by the gene of interest. The invention further relates to a nucleic acid comprising a signal peptide encoding sequence comprised in a secretion signal sequence, the use of said nucleic acid as a secretion signal sequence, an expression cassette comprising said nucleic acid and a vector comprising said nucleic acid or said expression cassette. The invention further relates to a method for producing a protein and the protein produced by said method. The present invention further relates to a peptide (i.e., a signal peptide) and a protein comprising said peptide and the use of said peptide as a secretion signal. The invention finally relates to a microbial host cell comprising said nucleic acid, said expression cassette, said vector, or said peptide.

A “secretion signal sequence” refers to a nucleic acid sequence that comprises a signal peptide encoding sequence. A secretion signal sequence may also further comprise a “pro-sequence”, which is a nucleic acid sequence encoding a “pro-peptide” (also known as a “carrier peptide”). The pro-sequence is preferably fused to the 3’ end of the signal peptide encoding sequence. Similarly, the pro-peptide is preferably fused to the C-terminus of the signal peptide. A secretion signal sequence may be fused to a gene of interest. A secretion signal sequence is preferably fused to the 5’-end of the gene of interest. Preferably, the secretion signal sequence is not fused to the gene of interest in nature, i.e., preferably the fusion of the secretion signal sequence and the gene of interest is a recombinant fusion. A secretion signal sequence encodes a secretion signal. The term “secretion signal” (as opposed to “secretion signal sequence”) refers to a peptide comprising a signal peptide (encoded by the “signal peptide encoding sequence”) and optionally a pro-peptide (encoded by the “pro-sequence”). Where the secretion signal sequence is fused to the 5’-end of the gene of interest, the secretion signal will be fused to the N-terminal end of the protein encoded by the gene of interest. Upon expression of a secretion signal sequence fused to a gene of interest in a microbial host cell, the resulting fusion protein comprising the secretion signal and the protein of interest will be guided by the secretion signal to be secreted from the cell. Put differently, the secretion signal when fused to a protein encoded by a gene of interest will signal the internal protein expression and secretion pathways of the microbial cell to secrete the protein encoded by the gene of interest into the surrounding environment. The surrounding environment may be a fermentation broth or culture media. 
In some embodiments, the protein encoded by the gene of interest may be isolated or purified from the fermentation broth or culture media.
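The relationship between signal peptide, optional pro-peptide and protein of interest can be sketched as a small data structure. This is a minimal sketch, not a model of the cellular machinery: the class name `SecretionSignal`, its methods and the amino acid strings are hypothetical, and removal of the secretion signal is simplified to stripping an N-terminal prefix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecretionSignal:
    """A secretion signal: a signal peptide plus an optional pro-peptide at its C-terminus."""
    signal_peptide: str
    pro_peptide: str = ""

    @property
    def sequence(self):
        # The pro-peptide, when present, follows the signal peptide.
        return self.signal_peptide + self.pro_peptide

    def precursor(self, protein_of_interest):
        """Fusion protein with the secretion signal at the N-terminal end."""
        return self.sequence + protein_of_interest

    def mature(self, precursor):
        """Protein of interest after removal of the secretion signal (simplified)."""
        if not precursor.startswith(self.sequence):
            raise ValueError("precursor does not start with this secretion signal")
        return precursor[len(self.sequence):]


# Hypothetical toy peptides (illustration only).
pre_pro = SecretionSignal(signal_peptide="MKL", pro_peptide="APV")
pre_only = SecretionSignal(signal_peptide="MKL")
print(pre_pro.precursor("QVQ"), pre_only.precursor("QVQ"))
```

Keeping the pro-peptide optional mirrors the statement that a signal peptide alone may be sufficient for secretion.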

The secretion signal sequence comprises a signal peptide encoding sequence, also referred to as a “pre-sequence”, which encodes a signal peptide. A signal peptide encoding sequence may be sufficient (i.e., without a pro-sequence) for secretion of a protein of interest when expressed in a microbial host cell as a fusion protein. A typical example of a signal peptide encoding sequence is the pre-sequence of the Saccharomyces α-mating factor (α-MF) of SEQ ID NO: 114, where the corresponding signal peptide is identified by the amino acid sequence according to SEQ ID NO: 131. Other non-limiting examples of signal peptide encoding sequences disclosed herein are provided in SEQ ID Nos: 99 to 113 and 134, where the corresponding signal peptides are identified by the amino acid sequences according to SEQ ID Nos: 116 to 130 and 135.

In some embodiments of the invention the signal peptide-encoding sequence is from a yeast. In some embodiments of the invention the signal peptide-encoding sequence is from a filamentous fungus. In some more preferred embodiments of the invention the signal peptide-encoding sequence is from a Komagataella species. In other more preferred embodiments of the invention the signal peptide-encoding sequence is from a Saccharomyces species. In other more preferred embodiments of the invention the signal peptide-encoding sequence is from a Hansenula species. In other more preferred embodiments of the invention the signal peptide-encoding sequence is from a Fusarium species. In other more preferred embodiments of the invention the signal peptide-encoding sequence is from a Trichoderma species. In other more preferred embodiments of the invention the signal peptide-encoding sequence is from a Myceliophthora species. In other more preferred embodiments of the invention the signal peptide-encoding sequence is from an Aspergillus species. In some more preferred embodiments of the invention the signal peptide-encoding sequence is from Komagataella phaffii. In other more preferred embodiments of the invention the signal peptide-encoding sequence is from Saccharomyces cerevisiae. In other more preferred embodiments of the invention the signal peptide-encoding sequence is from Hansenula polymorpha. In other more preferred embodiments of the invention the signal peptide-encoding sequence is from Fusarium solani. In other more preferred embodiments of the invention the signal peptide-encoding sequence is from Trichoderma reesei. In other more preferred embodiments of the invention the signal peptide-encoding sequence is from Myceliophthora thermophila or Myceliophthora heterothallica. In other more preferred embodiments of the invention the signal peptide-encoding sequence is from Aspergillus niger.

In some embodiments of the invention the signal peptide-encoding sequence is selected from any of (a) the nucleotide sequence of any one of SEQ ID Nos: 99 to 113 or 134, (b) a nucleotide sequence with at least 90% identity to any one of SEQ ID Nos: 99 to 113 or 134, (c) a nucleotide sequence encoding a signal peptide having the amino acid sequence of any one of SEQ ID Nos: 116 to 130 or 135, or (d) a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to any one of SEQ ID Nos: 116 to 130 or 135.
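To illustrate what an “at least 90% identity” criterion can mean in practice, the following minimal Python sketch computes percent identity over a global pairwise alignment. The passage above does not specify how identity is calculated, so the method shown is an assumption: a basic Needleman-Wunsch alignment with hypothetical scoring parameters (`match=1`, `mismatch=-1`, `gap=-1`) and identity defined as identical columns divided by alignment length. The toy sequences are not sequences of the sequence listing.

```python
def percent_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Percent identity over a global (Needleman-Wunsch) alignment of two sequences."""
    a, b = a.upper(), b.upper()
    n, m = len(a), len(b)
    if n == 0 and m == 0:
        raise ValueError("both sequences are empty")

    # Fill the dynamic-programming score matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back one optimal alignment, counting columns and identical positions.
    i, j = n, m
    identical = 0
    columns = 0
    while i > 0 or j > 0:
        columns += 1
        if i > 0 and j > 0:
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            if score[i][j] == diag:
                if a[i - 1] == b[j - 1]:
                    identical += 1
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
    return 100.0 * identical / columns


def meets_identity_threshold(candidate, reference, threshold=90.0):
    """True when the candidate has at least `threshold` percent identity to the reference."""
    return percent_identity(candidate, reference) >= threshold


# Hypothetical toy sequences: one substitution in ten positions.
print(percent_identity("AAAAAAAAAA", "AAAAAAAAAT"))
```

The same function works for nucleotide and amino acid strings. Other identity definitions (for example, dividing by the length of the shorter sequence, or using a substitution matrix with affine gap penalties) can give different percentages for the same pair of sequences.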

In a preferred embodiment of the invention the signal peptide-encoding sequence is from the Komagataella phaffii Epx1 gene. In a preferred embodiment of the invention the signal peptide-encoding sequence is any of (a) the nucleotide sequence of SEQ ID NO: 99, (b) a nucleotide sequence with at least 90% identity to SEQ ID NO: 99, (c) a nucleotide sequence encoding a signal peptide having the amino acid sequence of SEQ ID NO: 116, or (d) a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 116. In a preferred embodiment of the invention the signal peptide-encoding sequence is a variant of the Komagataella phaffii Epx1 gene. In a preferred embodiment of the invention the signal peptide-encoding sequence is any of (a) the nucleotide sequence of SEQ ID NO: 100, (b) a nucleotide sequence with at least 90% identity to SEQ ID NO: 100, (c) a nucleotide sequence encoding a signal peptide having the amino acid sequence of SEQ ID NO: 117, or (d) a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 117.

The inventors have designed new artificial signal peptides for directing secretion of recombinant proteins. Therefore, in a more preferred embodiment of the invention the signal peptide-encoding sequence encodes an artificial signal peptide. In a more preferred embodiment of the invention the signal peptide-encoding sequence is any of (a) the nucleotide sequence of SEQ ID NO: 101, (b) a nucleotide sequence with at least 90% identity to SEQ ID NO: 101, (c) a nucleotide sequence encoding a signal peptide having the amino acid sequence of SEQ ID NO: 118, or (d) a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 118. In another more preferred embodiment of the invention the signal peptide-encoding sequence is any of (a) the nucleotide sequence of SEQ ID NO: 102, (b) a nucleotide sequence with at least 90% identity to SEQ ID NO: 102, (c) a nucleotide sequence encoding a signal peptide having the amino acid sequence of SEQ ID NO: 119, or (d) a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 119.

In a preferred embodiment of the invention the signal peptide-encoding sequence is from the Hansenula polymorpha Pep4 gene. In a preferred embodiment of the invention the signal peptide-encoding sequence is any of (a) the nucleotide sequence of SEQ ID NO: 103, (b) a nucleotide sequence with at least 90% identity to SEQ ID NO: 103, (c) a nucleotide sequence encoding a signal peptide having the amino acid sequence of SEQ ID NO: 120, or (d) a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 120.

In a preferred embodiment of the invention the signal peptide-encoding sequence is from the Saccharomyces cerevisiae Pep4 gene. In a preferred embodiment of the invention the signal peptide- encoding sequence is any of (a) the nucleotide sequence of SEQ ID NO: 104, (b) a nucleotide sequence with at least 90% identity to SEQ ID NO: 104, (c) a nucleotide sequence encoding a signal peptide having the amino acid sequence of SEQ ID NO: 121 , or (d) a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 121 .

In a preferred embodiment of the invention the signal peptide-encoding sequence is from the Saccharomyces cerevisiae Scw10 gene. In a preferred embodiment of the invention the signal peptide- encoding sequence is any of (a) the nucleotide sequence of SEQ ID NO: 105, (b) a nucleotide sequence with at least 90% identity to SEQ ID NO: 105, (c) a nucleotide sequence encoding a signal peptide having the amino acid sequence of SEQ ID NO: 122, or (d) a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 122.

In a preferred embodiment of the invention the signal peptide-encoding sequence is from the Komagataella phaffii Gcw14 gene. In a preferred embodiment of the invention the signal peptide-encoding sequence is any of (a) the nucleotide sequence of SEQ ID NO: 106, (b) a nucleotide sequence with at least 90% identity to SEQ ID NO: 106, (c) a nucleotide sequence encoding a signal peptide having the amino acid sequence of SEQ ID NO: 123, or (d) a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 123.

In a preferred embodiment of the invention the signal peptide-encoding sequence is from the Komagataella phaffii Cwp11 gene. In a preferred embodiment of the invention the signal peptide-encoding sequence is any of (a) the nucleotide sequence of SEQ ID NO: 107, (b) a nucleotide sequence with at least 90% identity to SEQ ID NO: 107, (c) a nucleotide sequence encoding a signal peptide having the amino acid sequence of SEQ ID NO: 124, or (d) a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 124.

In a preferred embodiment of the invention the signal peptide-encoding sequence is from the Saccharomyces cerevisiae Fre2 gene. In a preferred embodiment of the invention the signal peptide-encoding sequence is any of (a) the nucleotide sequence of SEQ ID NO: 108, (b) a nucleotide sequence with at least 90% identity to SEQ ID NO: 108, (c) a nucleotide sequence encoding a signal peptide having the amino acid sequence of SEQ ID NO: 125, or (d) a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 125.

In a preferred embodiment of the invention the signal peptide-encoding sequence is from the Saccharomyces cerevisiae killer toxin gene. In a preferred embodiment of the invention the signal peptide-encoding sequence is any of (a) the nucleotide sequence of SEQ ID NO: 109, (b) a nucleotide sequence with at least 90% identity to SEQ ID NO: 109, (c) a nucleotide sequence encoding a signal peptide having the amino acid sequence of SEQ ID NO: 126, or (d) a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 126.

In a preferred embodiment of the invention the signal peptide-encoding sequence is from the Fusarium solani alpha/beta hydrolase gene. In a preferred embodiment of the invention the signal peptide-encoding sequence is any of (a) the nucleotide sequence of SEQ ID NO: 110, (b) a nucleotide sequence with at least 90% identity to SEQ ID NO: 110, (c) a nucleotide sequence encoding a signal peptide having the amino acid sequence of SEQ ID NO: 127, or (d) a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 127.

In a preferred embodiment of the invention the signal peptide-encoding sequence is from the Saccharomyces cerevisiae Dan4 gene. In a preferred embodiment of the invention the signal peptide-encoding sequence is any of (a) the nucleotide sequence of SEQ ID NO: 111, (b) a nucleotide sequence with at least 90% identity to SEQ ID NO: 111, (c) a nucleotide sequence encoding a signal peptide having the amino acid sequence of SEQ ID NO: 128, or (d) a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 128.

In a preferred embodiment of the invention the signal peptide-encoding sequence is from the Trichoderma reesei hydrophobin gene. In a preferred embodiment of the invention the signal peptide-encoding sequence is any of (a) the nucleotide sequence of SEQ ID NO: 112, (b) a nucleotide sequence with at least 90% identity to SEQ ID NO: 112, (c) a nucleotide sequence encoding a signal peptide having the amino acid sequence of SEQ ID NO: 129, or (d) a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 129.

In a preferred embodiment of the invention the signal peptide-encoding sequence is from the Pichia pastoris Flo10 gene. In a preferred embodiment of the invention the signal peptide-encoding sequence is any of (a) the nucleotide sequence of SEQ ID NO: 113, (b) a nucleotide sequence with at least 90% identity to SEQ ID NO: 113, (c) a nucleotide sequence encoding a signal peptide having the amino acid sequence of SEQ ID NO: 130, or (d) a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 130.

In a preferred embodiment of the invention the signal peptide-encoding sequence is from the Pichia pastoris Dse4 gene. In a preferred embodiment of the invention the signal peptide-encoding sequence is any of (a) the nucleotide sequence of SEQ ID NO: 134, (b) a nucleotide sequence with at least 90% identity to SEQ ID NO: 134, (c) a nucleotide sequence encoding a signal peptide having the amino acid sequence of SEQ ID NO: 135, or (d) a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 135.
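For orientation, the pairing of nucleotide and amino acid sequence identifiers recited in the embodiments above can be collected in a small lookup table. The following Python sketch is purely illustrative: the dictionary name, the tuple layout and the helper function are assumptions made for this example, and SEQ ID NO: 102 is labelled artificial only because it is recited in the same paragraph as SEQ ID NO: 101.

```python
# Nucleotide SEQ ID NO -> (amino acid SEQ ID NO, source as recited above).
# The table only restates the pairings listed in the embodiments; the name
# SIGNAL_PEPTIDES and the helper below are illustrative, not part of the source.
SIGNAL_PEPTIDES = {
    99: (116, "Komagataella phaffii Epx1"),
    100: (117, "Komagataella phaffii Epx1 variant"),
    101: (118, "artificial signal peptide"),
    102: (119, "artificial signal peptide"),
    103: (120, "Hansenula polymorpha Pep4"),
    104: (121, "Saccharomyces cerevisiae Pep4"),
    105: (122, "Saccharomyces cerevisiae Scw10"),
    106: (123, "Komagataella phaffii Gcw14"),
    107: (124, "Komagataella phaffii Cwp11"),
    108: (125, "Saccharomyces cerevisiae Fre2"),
    109: (126, "Saccharomyces cerevisiae killer toxin"),
    110: (127, "Fusarium solani alpha/beta hydrolase"),
    111: (128, "Saccharomyces cerevisiae Dan4"),
    112: (129, "Trichoderma reesei hydrophobin"),
    113: (130, "Pichia pastoris Flo10"),
    134: (135, "Pichia pastoris Dse4"),
}


def peptide_seq_id(nucleotide_seq_id: int) -> int:
    """Return the amino acid SEQ ID NO paired with a nucleotide SEQ ID NO."""
    return SIGNAL_PEPTIDES[nucleotide_seq_id][0]
```

For example, `peptide_seq_id(106)` looks up the signal peptide paired with the Komagataella phaffii Gcw14 signal peptide-encoding sequence.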

In some embodiments, where the secretion signal sequence comprises only a signal peptide-encoding sequence (i.e., no pro-sequence), the secretion signal sequence and the signal peptide-encoding sequence may be identical. In some embodiments, where the secretion signal comprises only a signal peptide (i.e., no pro-peptide), the secretion signal and the signal peptide may be identical.

In another embodiment the secretion signal sequence comprises a signal peptide-encoding sequence and a pro-sequence. A pro-sequence encodes a pro-peptide, sometimes also referred to as “carrier peptide”. Where a pro-sequence is present in a secretion signal sequence, said pro-sequence is situated between the signal peptide-encoding sequence and the gene of interest. A pro-peptide may further facilitate the secretion of a protein encoded by a gene of interest fused to a secretion signal comprising said pro-peptide. A secretion signal comprising a pro-peptide may be more efficient in secreting a protein produced by a gene of interest when fused thereto. In some cases, there is no difference between a secretion signal with or without a pro-peptide in the efficiency of secreting a protein produced by a gene of interest when fused thereto. In other cases, a secretion signal without a pro-peptide is more efficient in secreting a protein produced by a gene of interest when fused thereto.

Where a secretion signal sequence comprises a signal peptide-encoding sequence and a pro-sequence, the secretion signal sequence may also be referred to as a pre-pro sequence encoding a pre-pro protein.

An example of a pro-sequence is the pro-sequence from the a-MF from Saccharomyces cerevisiae as identified in SEQ ID NO: 115 encoding the pro-peptide as identified in SEQ ID NO: 132. An example of a pre-pro protein (i.e., secretion signal) composed of the signal peptide and the pro-peptide of the Saccharomyces cerevisiae a-MF, i.e., the full a-MF of Saccharomyces cerevisiae, is given in SEQ ID NO: 137 and the corresponding nucleotide sequence is given in SEQ ID NO: 136.

In some embodiments, the secretion signal is cleaved from the protein encoded by the gene of interest before, at or during the secretion process of the microbial host cell. Where the secretion signal is cleaved from the protein encoded by the gene of interest, the secreted protein encoded by the gene of interest may not contain any residual secretion signal amino acids at its N-terminus. In some embodiments, a fraction of proteins encoded by the gene of interest that are secreted still contain the secretion signal fused to the N-terminus. In some embodiments, a fraction of proteins encoded by the gene of interest that are secreted still contain one or more amino acids of the secretion signal fused to the N-terminus.

In some embodiments, where the secretion signal sequence does not comprise a pro-sequence, the signal peptide-encoding sequence may be directly fused to the 5’-end of the gene of interest. In other embodiments additional sequences may be included between the signal peptide-encoding sequence and the gene of interest, such as, for example but not limited thereto, sequences encoding protease cleavage sites, such as a Kex2 cleavage site, and/or sequences encoding a peptide tag that can be used for purification or detection of the protein of interest, such as a His6, c-myc, FLAG, C-tag, 3xFLAG, His5, His10, HA, T7, strep, HSV, and/or E-tag.

In some embodiments where the secretion signal sequence further comprises a pro-sequence, the pro-sequence may be directly fused to the 5’-end of the gene of interest. In other embodiments additional sequences may be included between the pro-sequence and the gene of interest, such as, for example but not limited thereto, sequences encoding protease cleavage sites, such as a Kex2 cleavage site, and/or sequences encoding a peptide tag that can be used for purification or detection of the protein of interest, such as a His6, c-myc, FLAG, C-tag, 3xFLAG, His5, His10, HA, T7, strep, HSV, and/or E-tag.

Thus the expression cassette of the present invention may comprise, from the 5’-end to the 3’-end, (i) a promoter capable of promoting expression of a gene of interest, (ii) a secretion signal sequence comprising, from the 5’-end to the 3’-end, (a) a signal peptide-encoding sequence and optionally (b) a pro-sequence, where the secretion signal sequence is fused to (iii) the gene of interest, and the expression cassette further comprises at its 3’-end (iv) a terminator.
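The 5’-to-3’ order of elements described above can be sketched as a simple data structure. This is a minimal Python illustration under stated assumptions: the class name, field names and the plain string concatenation are hypothetical and chosen for this example only, and the strings stand in for real nucleotide sequences.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExpressionCassette:
    """Illustrative model of the cassette layout; all names are hypothetical."""
    promoter: str                 # (i) promoter
    signal_peptide_seq: str       # (ii)(a) signal peptide-encoding sequence
    gene_of_interest: str         # (iii) gene of interest
    terminator: str               # (iv) terminator
    pro_sequence: Optional[str] = None  # (ii)(b) optional pro-sequence

    def secretion_signal_sequence(self) -> str:
        # Signal peptide-encoding sequence first, then the pro-sequence if present.
        return self.signal_peptide_seq + (self.pro_sequence or "")

    def assemble(self) -> str:
        # Concatenate the elements in 5'-to-3' order.
        return "".join([
            self.promoter,
            self.secretion_signal_sequence(),
            self.gene_of_interest,
            self.terminator,
        ])
```

Without a pro-sequence, `secretion_signal_sequence()` returns the signal peptide-encoding sequence unchanged, which mirrors the case in which the secretion signal sequence and the signal peptide-encoding sequence are identical.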

Nucleic acids and expression cassettes containing said nucleic acids

The inventors have found that changing the secretion signal sequence, and more specifically the signal peptide-encoding sequence, from the canonically used a-MF signal peptide-encoding sequence can drastically improve the expression, secretion and/or production of a gene of interest, particularly when the gene of interest is an immunoglobulin variable domain or VHH. The inventors have, by screening many secretion signals, found secretion signals which show significant increases in the expression of a protein encoded by a gene of interest, in particular a VHH. Therefore, the current disclosure provides nucleic acids comprising a signal peptide-encoding sequence. Said nucleic acids may be used as secretion signal sequences. The nucleic acids may be comprised in an expression cassette. Said expression cassette may be present in one or more copies in a microbial host cell. When provided in a microbial host cell and fused to a gene of interest, said nucleic acids may significantly increase the production and/or yield of the protein encoded by said gene of interest. In particular, said nucleic acids can be provided in one or more, or two or more, copies. Surprisingly, the inventors have found that when providing two or more copies of an expression cassette comprising a nucleic acid of the invention, the production and/or yield of a protein expressed by a gene of interest can be improved even further.

As such the current invention relates to a microbial host cell comprising one or more expression cassettes, where the expression cassette comprises a gene of interest and where the gene of interest is fused, preferably at its N-terminus, to a secretion signal sequence comprising a signal peptide-encoding sequence selected from any of (a) the nucleotide sequence of any one of SEQ ID NOs: 99 to 113 or 134, (b) a nucleotide sequence with at least 90% identity to any one of SEQ ID NOs: 99 to 113 or 134, (c) a nucleotide sequence encoding a signal peptide having the amino acid sequence of any one of SEQ ID NOs: 116 to 130 or 135, or (d) a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to any one of SEQ ID NOs: 116 to 130 or 135; and where the gene of interest is in a preferred embodiment a VHH and where in a preferred embodiment the secretion signal sequence further comprises a pro-sequence, where the pro-sequence may be the pro-sequence from the a-MF from Saccharomyces cerevisiae as identified in SEQ ID NO: 115 encoding the pro-peptide as identified in SEQ ID NO: 132.
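The "at least 90% identity" criterion recited above can be illustrated with a short calculation. The Python sketch below is not the definition used in this specification: it assumes the two sequences have already been aligned to equal length with "-" as the gap character, and it assumes that identity is the number of matching positions divided by the number of alignment columns. The function names and the 90.0 default are chosen for this example only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumption: identity = matching columns / all alignment columns * 100.
    Other conventions (e.g. dividing by the shorter sequence length) exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns in which both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no alignment columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 90.0) -> bool:
    """True if the aligned sequences share at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Under these assumptions, two aligned ten-residue sequences differing at one position share 90% identity and would satisfy the threshold.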

In their efforts to search for the ideal signal peptide-encoding sequence, the inventors have furthermore identified unique nucleic acids comprising a signal peptide-encoding sequence. As such, the current invention provides a nucleic acid comprising a signal peptide-encoding sequence, wherein the signal peptide-encoding sequence is the signal peptide-encoding sequence according to the nucleotide sequence of SEQ ID NO: 101, or a nucleotide sequence with at least 90% identity to SEQ ID NO: 101, a nucleotide sequence encoding the signal peptide according to the amino acid sequence of SEQ ID NO: 118, or a nucleotide sequence encoding an amino acid sequence with at least 90% identity to SEQ ID NO: 118; or the signal peptide-encoding sequence according to the nucleotide sequence of SEQ ID NO: 102, or a nucleotide sequence with at least 90% identity to SEQ ID NO: 102, a nucleotide sequence encoding the signal peptide according to the amino acid sequence of SEQ ID NO: 119, or a nucleotide sequence encoding an amino acid sequence with at least 90% identity to SEQ ID NO: 119; or the signal peptide-encoding sequence according to the nucleotide sequence of SEQ ID NO: 103, or a nucleotide sequence with at least 90% identity to SEQ ID NO: 103, a nucleotide sequence encoding the signal peptide according to the amino acid sequence of SEQ ID NO: 120, or a nucleotide sequence encoding an amino acid sequence with at least 90% identity to SEQ ID NO: 120; or the signal peptide-encoding sequence according to the nucleotide sequence of SEQ ID NO: 106, or a nucleotide sequence with at least 90% identity to SEQ ID NO: 106, a nucleotide sequence encoding the signal peptide according to the amino acid sequence of SEQ ID NO: 123, or a nucleotide sequence encoding an amino acid sequence with at least 90% identity to SEQ ID NO: 123. The nucleic acid may further comprise a pro-sequence, such as a Saccharomyces a-mating factor pro-sequence according to the nucleotide sequence of SEQ ID NO: 115, encoding a pro-peptide such as the Saccharomyces a-mating factor pro-peptide according to the amino acid sequence of SEQ ID NO: 132.

A nucleic acid of the invention may be used as a secretion signal sequence by fusing it to a gene of interest, for example, but not limited to, a VHH. Therefore, the nucleic acid of the invention can be further comprised in an expression cassette, wherein the expression cassette further comprises a promoter operably linked to the nucleic acid. In some embodiments, the expression cassette further comprises a gene of interest wherein the signal peptide-encoding sequence is fused to the gene of interest. Where the nucleic acid is comprised in an expression cassette and where the expression cassette comprises a gene of interest fused to the signal peptide-encoding sequence and where the expression cassette further comprises a promoter operably linked to the gene of interest, the expression cassette can be used in a microbial host cell to express the gene of interest, whereby the signal peptide encoded by the nucleic acid directs secretion of the protein encoded by the gene of interest into the surrounding environment (e.g., culture broth). The expression cassette may further comprise a start codon, a stop codon, a terminator sequence, or additional regulatory and other sequences such as signal sequences, introns, IRES sequences, ribosomal binding sites, etc. Where the expression cassette comprises a promoter, the promoter may be selected from the group consisting of CAT1, AOX1, GAP, AOD, AOX2, ADH1, CAM1, DAK1, DAS1, DAS2, ENO1, FDH1, FLD1, FMD, GPM1, GPM2, HSP82, ICL1, ILV5, KAR2, KEX2, MOX, OLE1, PET9, PEX5, PEX8, PMP20, PGK1, PHO89/NSP, SSA4, SUT2, TEF1, THI11, TPI1, YPT1, GTH1, GCW14, and GUT1. In some embodiments the expression cassette comprises an AOX1 promoter.

The nucleic acids and the expression cassettes of the present invention may be comprised in a vector. The terms “vector”, “plasmid” and “episomal vector” are used interchangeably herein and refer to nucleic acids, often circular, capable of replicating autonomously inside a microbial host cell, i.e., without being integrated into the microbial host cell’s genome. In some embodiments the vector may be an integrative vector that can integrate into the cell’s genome. In some embodiments the vector may be linearized prior to being integrated into the cell’s genome.

The invention also provides a composition comprising the nucleic acid of the invention, the expression cassette of the invention or the vector of the invention.

Peptides

The inventors have found that changing the secretion signal, and more specifically the signal peptide, from the canonically used a-MF signal peptide can drastically improve the expression, secretion and/or production of a gene of interest, particularly when the gene of interest is an immunoglobulin variable domain or VHH.

In their efforts to search for optimal peptides that may serve as a secretion signal, the inventors have furthermore identified unique peptides that may serve as a secretion signal. Therefore, the invention further relates to use of said peptides as or in a secretion signal. The peptide of the current invention may comprise the amino acid sequence of SEQ ID NO: 118, or an amino acid sequence with at least 90% identity to SEQ ID NO: 118; or the amino acid sequence of SEQ ID NO: 119, or an amino acid sequence with at least 90% identity to SEQ ID NO: 119; or the amino acid sequence of SEQ ID NO: 120, or an amino acid sequence with at least 90% identity to SEQ ID NO: 120; or the amino acid sequence of SEQ ID NO: 123, or an amino acid sequence with at least 90% identity to SEQ ID NO: 123. The peptide may further comprise a pro-peptide, such as the Saccharomyces a-mating factor pro-peptide according to the amino acid sequence of SEQ ID NO: 132. The peptides may be isolated and/or recombinant. The current invention further relates to a protein comprising said peptide. The protein may be a recombinant protein, such as a recombinant fusion protein comprising the peptide fused to a protein encoded by a gene of interest. The recombinant protein may comprise the peptide of the invention at its N-terminus. The invention also provides a composition comprising the peptide of the invention or a protein comprising the peptide of the invention.

Precursor protein

The invention further relates to precursor proteins where the precursor protein comprises a signal sequence and a protein of interest, where the signal sequence is fused to the N-terminus of the protein of interest. A precursor protein may be a recombinant protein. The term “precursor protein” refers to the preliminary or temporary nature of the precursor protein, that is to say, the fusion of the signal sequence to the N-terminus of the protein of interest is present mainly, and often only, during the production and secretion of the precursor protein in the microbial host cell, since the signal sequence will be cleaved from the protein of interest. In other words, the precursor protein exists as a precursor protein until the N-terminal signal sequence is cleaved, releasing the protein of interest. In some cases the signal sequence will be incorrectly cleaved or not cleaved at all, whereby remnants of the precursor protein may still be present in a final product comprising the protein of interest.
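The relationship between a precursor protein, the released protein of interest and any residual N-terminal signal residues can be sketched with simple string operations. The following Python example is only an illustration: the function names are hypothetical, and the sequences used with it are arbitrary placeholder strings, not sequences of the invention.

```python
def mature_protein(precursor: str, signal_sequence: str) -> str:
    """Remove a correctly cleaved N-terminal signal sequence from a precursor."""
    if not precursor.startswith(signal_sequence):
        raise ValueError("precursor does not start with the signal sequence")
    return precursor[len(signal_sequence):]


def n_terminal_remnant(product: str, protein_of_interest: str) -> str:
    """Return residues left at the N-terminus of a secreted product.

    An empty string means complete cleavage; a non-empty string models
    incorrect cleavage (part or all of the signal sequence remains).
    """
    if not product.endswith(protein_of_interest):
        raise ValueError("product does not end with the protein of interest")
    return product[: len(product) - len(protein_of_interest)]


# Placeholder strings for illustration only (not real sequences).
SIGNAL = "MKLSTA"
PROTEIN = "QVQLVE"
PRECURSOR = SIGNAL + PROTEIN
```

With these placeholders, `mature_protein(PRECURSOR, SIGNAL)` returns the protein of interest, while `n_terminal_remnant("TA" + PROTEIN, PROTEIN)` returns the two residual residues of an incompletely cleaved product.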

In one embodiment, the precursor polypeptide comprises a signal sequence fused to a protein of interest, where the signal sequence comprises a signal peptide having an amino acid sequence according to any one of SEQ ID NOs: 116 to 130 or 135 or a signal peptide having an amino acid sequence with at least 90% identity to any one of SEQ ID NOs: 116 to 130 or 135. In some embodiments the precursor polypeptide comprises a signal sequence fused to a protein of interest, where the signal sequence further comprises a pro-sequence, preferably the Saccharomyces alpha mating factor pro-sequence. In a more preferred embodiment the precursor polypeptide comprises a signal sequence fused to a VHH, where the signal sequence comprises a signal peptide having an amino acid sequence according to any one of SEQ ID NOs: 116 to 130 or 135 or a signal peptide having an amino acid sequence with at least 90% identity to any one of SEQ ID NOs: 116 to 130 or 135. In some embodiments the precursor polypeptide comprises a signal sequence fused to a VHH, where the signal sequence further comprises a pro-sequence, preferably the Saccharomyces alpha mating factor pro-sequence.

Multiple copies

The inventors have surprisingly found that by increasing the number of copies of the expression cassette of the invention integrated into the genome of the microbial host cell, the production and/or yield of the protein expressed by the gene of interest may be significantly improved. The inventors have observed that this improvement in production and/or yield is surprisingly larger for certain secretion signal sequences (i.e., the nucleic acids of the invention). The inventors have found that, when multiple copies of an expression cassette are integrated in the genome of a host cell, significant increases in the production and/or yield of a protein encoded by a gene of interest can be achieved when using the secretion signal sequences of the invention, as opposed to the canonically used a-mating factor (a-MF) of Saccharomyces cerevisiae. More specifically, when introducing two or more copies of an expression cassette comprising a secretion signal sequence according to the invention fused to a gene of interest into the genome of a microbial host cell, the production and/or yield of the protein encoded by the gene of interest will be improved in comparison with the same or a similar microbial host cell using the a-MF secretion signal sequence (e.g., the same host cell with the same expression cassette and the same number of integration sites, except using the a-MF secretion signal sequence).

Therefore, the current invention relates to a microbial host cell comprising one or more copies of an expression cassette. The expression cassette comprises a promoter capable of promoting expression of a gene of interest, and the gene of interest, wherein the gene of interest is fused to a secretion signal sequence which comprises a signal peptide-encoding sequence, and wherein the expression cassette further comprises a terminator and wherein the one or more copies of the expression cassette are integrated into the genome of the microbial host cell. In a more preferred embodiment, the microbial host cell comprises two or more copies of the expression cassette. In other preferred embodiments, the microbial host cell comprises at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 30, at least 40, at least 50, at least 60, or at least 70 copies of the expression cassette.

The expression cassette integrated into the genome of the cell in one or more copies, as per the embodiments above, comprises a secretion signal sequence which comprises a signal peptide-encoding sequence. Suitable secretion signal sequences, including where two, or three, or four, or five, or 10 or more copies of the expression cassette are integrated, are discussed herein above in the section entitled “Secretion signal sequences”.

Methods of producing microbial host cells

Also provided herein are methods of producing a microbial host cell capable of expressing a gene of interest. The method may comprise the steps of providing a microbial host cell; and integrating into the genome of the microbial host cell one or more copies of an expression cassette, where the expression cassette comprises a promoter capable of promoting expression of a gene of interest, and the gene of interest, wherein the gene of interest is fused to a secretion signal sequence which comprises a signal peptide encoding sequence, and a terminator.

The expression cassette may be introduced into the microbial host cell according to any suitable method known to the skilled person. For example, the nucleic acid constructs may be introduced by transformation, for example chemical transformation, heat-shock based transformation, electroporation, biolistic transformation, or particle-based transformation. In some embodiments, the transformation is chemical-based transformation, for example comprising the use of lithium, calcium phosphate, cationic polymers, liposomes (lipofection) or dendrimers. In some embodiments, the transformation is non-chemical-based transformation, for example electroporation, sonoporation, optical transformation, or protoplast fusion transformation. In some embodiments, the transformation is particle-based transformation, for example comprising the use of a gene gun or glass beads, magnetofection (or magnet-assisted transformation), impalefection (comprising the use of elongated nanostructures that are used to impale the cell to be transformed), or particle bombardment. Other methods of transformation include nucleofection or viral-based transformation, also referred to as transduction.

The expression cassette may be provided for transformation in the form of a double-stranded DNA product derived from a PCR. The expression cassette may be provided on an integrative plasmid. The expression cassette may be present on a vector that is linearized prior to being transformed. The expression cassette may be in the form of a double-stranded DNA product derived from a PCR, where the double-stranded DNA product further comprises a selectable marker. The expression cassette may be comprised on an integrative plasmid, where the integrative plasmid further comprises a selectable marker. The expression cassette may be present on a vector that is linearized prior to being transformed, where the linearized vector further comprises a selectable marker. The selectable marker may be flanked by site-specific recombination sites (such as FRT sites), and the gene encoding the corresponding recombinase (such as the FLP flippase recombinase) may also be included next to the selectable marker, so that the selectable marker, together with the gene encoding the recombinase, can be removed from the cell by inducing the flippase gene, for example by inducing an inducible promoter driving the flippase recombinase.

A “selectable marker” or “selection marker” or “selection cassette” is a gene introduced into a cell that confers a trait suitable for artificial selection, i.e., the cell receiving the selectable marker is capable of growing on or in a growth medium whose composition (the presence or absence of a given substance) prevents the growth of, or kills, cells lacking the selectable marker. Selectable markers are often antibiotic-resistance genes. Examples include the bleoR gene encoding the phleomycin resistance protein conferring resistance against the antibiotic phleomycin or zeocin, the hygB gene encoding the Hygromycin B resistance protein conferring resistance against the antibiotic Hygromycin B, the bsr gene conferring resistance against the antibiotic blasticidin, or the nat gene conferring resistance against the antibiotic nourseothricin.

A selectable marker often comes with a constitutive promoter so that the corresponding gene is expressed. Additionally, a marker can be equipped with a terminator to prevent readthrough from said promoter. For example, a commonly used selectable marker cassette is constructed of the Ble gene conferring Zeocin resistance, together with the pILV5 promoter and the AOX1 terminator.

A selectable marker can be an antibiotic resistance marker but also an auxotrophic marker, where the presence of the selectable marker allows the cell to grow in the absence of an essential nutrient. Examples of auxotrophic markers include the ARG1, ARG2, ARG3, ARG4, HIS1, HIS2, HIS4, HIS5, HIS6, and URA3 genes. A benefit of an antibiotic resistance marker over an auxotrophic marker is that it does not require the laborious construction of auxotrophic strains.

Selectable marker genes may be transformed at the same time as the expression cassettes to enable the selection of successfully transformed host cells. The selectable marker may be provided in trans, where the selectable marker is present on a different recombinant DNA construct than the expression cassette. In that case, the microbial host cells that were transformed with the selectable marker also have a very high likelihood of having been transformed with one or more expression cassettes. Techniques and methods for selecting successfully transformed host cells are well known in the art.

Where the expression cassettes are introduced into the microbial host cell, the methods may comprise screening a plurality of transformed microbial host cells to identify a host cell that produces the protein of interest, for example produces the protein of interest at a high production and/or yield.

Some embodiments of the invention begin with microbial host cells that already comprise at least one expression cassette. In such embodiments, a second round of transformation may further increase the copy number of the expression cassette in the genome of the microbial host cell.

Any suitable method may be employed to determine the expression yield of the transformed host cell, for example SDS-PAGE, a spectroscopic analytical procedure (such as the Bradford protein assay) or the use of a protein characterization system (e.g. LabChip GXII, Perkin Elmer).

Methods of producing a protein of interest

The present invention provides methods for the production of a protein encoded by a gene of interest. In some embodiments, the method comprises: culturing the microbial host cell according to the invention, or a microbial host cell comprising the expression cassette or vector of the invention, under conditions to express the gene of interest, wherein the gene of interest encodes the protein, optionally isolating the protein, optionally purifying the protein, optionally modifying the protein, and optionally formulating the protein.

The present invention also provides the use of the microbial host cell of the invention for manufacturing a protein, wherein the protein is encoded by the gene of interest.

The present invention also provides a protein obtained by such methods and uses.

The protein of interest may be formulated, for example into an agrochemical or pharmaceutical composition.

“Culturing”, “cultivation”, “cell culture”, “fermentation”, “fermenting” or “microbial fermentation” as used herein means the use of a microbial host cell to produce a protein, such as a polypeptide, at an industrial scale, laboratory scale or during scale-up experiments. It includes suspending the microbial cell in a broth or growth medium, providing sufficient nutrients including but not limited to one or more suitable carbon sources (including glucose, sucrose, fructose, lactose, avicel®, xylose, galactose, ethanol, methanol, or more complex carbon sources such as molasses or wort), nitrogen sources (such as yeast extract, peptone or beef extract), trace elements (such as iron, copper, magnesium, manganese or calcium), amino acids or salts (such as sodium chloride, magnesium chloride or sodium sulfate) or a suitable buffer (such as phosphate buffer, succinate buffer, HEPES buffer, MOPS buffer or Tris buffer). Optionally it includes one or more inducing agents driving expression of the protein of interest or a protein involved in the production of the protein of interest (such as lactose, IPTG, ethanol, methanol, sophorose or sophorolipids). It can also further involve the agitation of the culture media, for example via stirring or purging, to allow for adequate mixing and aeration. It can further involve different operational strategies such as batch cultivation, fed-batch cultivation, semi-continuous cultivation or continuous cultivation and different starvation or induction regimes according to the requirements of the microbial cell and to allow for an efficient production of the protein of interest or a protein involved in the production of the protein. Alternatively, the microbial cell is grown on a solid substrate in an operational strategy commonly known as solid state fermentation.

Fermentation broth, culture media or cell culture media as used herein can mean the entirety of liquid or solid material of a fermentation or culture at any time during or after that fermentation or culture, including the liquid or solid material that results after optional steps taken to isolate the protein. As such, the fermentation broth or culture media as defined herein includes the surroundings of the protein after isolation of the protein, during storage and/or during use as an agrochemical or pharmaceutical composition. Fermentation broth is also referred to herein as a culture medium or cell culture medium.

“Isolating the protein” is an optional step or series of steps taking the cell culture media or fermentation broth as an input and increasing the amount of the protein relative to the amount of culture media or fermentation broth. Isolating the protein may alternatively or additionally comprise obtaining or removing the protein from the culture media or fermentation broth. Isolating the protein can involve the use of one or multiple combinations of techniques well known in the art, such as precipitation, centrifugation, sedimentation, filtration, diafiltration, affinity purification, size exclusion chromatography and/or ion exchange chromatography. Isolating the protein of interest may be followed by formulation of the protein of interest into an agrochemical or pharmaceutical composition.

The term “yield” as used herein refers to the amount of a protein produced. When using the term “improved” or “increased” or a similar term when referring to “yield”, it is meant that the protein produced by the modified microbial host cell of the invention capable of producing a protein is increased in quantity, quality, stability and/or concentration either in the fermentation broth or cell culture media, as a purified or partially purified protein, during storage and/or during use as an agrochemical or pharmaceutical composition. The increase in yield is compared to the yield of protein of interest produced by a parent microbial host cell (or a microbial host cell having had fewer classes or species of expression cassette introduced into it, for example having only had one class or species of expression cassette introduced). In some embodiments, the yield is increased by at least about 1%, at least about 2%, at least about 3%, at least about 4%, at least about 5%, at least about 6%, at least about 7%, at least about 8%, at least about 9%, at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 100%, at least about 110%, at least about 120%, at least about 130%, at least about 140%, at least about 150%, at least about 160%, at least about 170%, at least about 180%, at least about 190%, at least about 200%, at least about 210%, at least about 220%, at least about 230%, at least about 240%, at least about 250%, at least about 260%, at least about 270%, at least about 280%, at least about 290%, at least about 300%, at least about 500%, at least about 1000% or at least about 1500% when a microbial host cell according to the invention or a microbial host cell comprising the expression cassette according to the invention is used to produce the protein, compared to a similarly or identically produced microbial host cell comprising one or more expression cassettes comprising a canonical a-MF secretion signal sequence.

In some embodiments, the method of the invention increases protein production by at least 20%, for example by at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90% or at least 100% (e.g. by at least a factor of 2) as compared to a similarly or identically produced microbial host cell comprising one or more expression cassettes comprising a canonical a-MF secretion signal sequence.
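The percentage thresholds above can be read as a relative increase over the yield of the reference host cell carrying the canonical a-MF secretion signal. The following is a minimal sketch of that arithmetic only, assuming that yields are expressed as titres in the same units and measured under the same conditions; the function names and the example titres are hypothetical and do not come from the present description.

```python
def percent_increase(test_yield: float, reference_yield: float) -> float:
    """Relative increase of test_yield over reference_yield, in percent.

    Both yields must be in the same units (for example g/L); the
    reference is the host cell with the canonical a-MF secretion signal.
    """
    if reference_yield <= 0:
        raise ValueError("reference yield must be positive")
    return (test_yield - reference_yield) / reference_yield * 100.0


def meets_threshold(test_yield: float, reference_yield: float,
                    threshold_percent: float) -> bool:
    """True if the increase is at least threshold_percent."""
    return percent_increase(test_yield, reference_yield) >= threshold_percent


# Hypothetical titres, invented purely for illustration.
reference_titre = 1.0  # g/L, a-MF reference strain
test_titre = 2.0       # g/L, strain under evaluation

# A doubling of the titre corresponds to an increase of 100%.
print(percent_increase(test_titre, reference_titre))
print(meets_threshold(test_titre, reference_titre, 20.0))
```

Under this reading, an increase of at least 100% is the same as at least a factor of 2 in yield, which matches the parenthetical in the preceding paragraph.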

Further definitions

A “promoter” or “promoter sequence” as used herein refers to a nucleotide sequence that is preferably recognized by a polypeptide, for example a regulator of transcription, or at the very least allows the correct formation of an RNA polymerase complex in such a way that expression of a protein of interest, of which the polynucleotide coding for the protein of interest is located downstream (3') of the promoter sequence as is well known in the art, is established. Expression may be established in a continuous manner (for example in the case of a constitutive promoter) or during conditions suitable for expression (for example in the case of an inducible promoter and in conditions in which expression is induced), so as to produce the protein of interest. The promoters are generally promoters that are functional in fungi, for example in yeast. These promoters can be but are not limited to CAT1, AOX1, GAP, AOD, AOX2, ADH1, CAM1, DAK1, DAS1, DAS2, ENO1, FDH1, FLD1, FMD, GPM1, GPM2, HSP82, ICL1, ILV5, KAR2, KEX2, MOX, OLE1, PET9, PEX5, PEX8, PMP20, PGK1, PHO89/NSP, SSA4, SUT2, TEF1, THI11, TPI1, YPT1, GTH1, GCW14, and GUT1. Other suitable promoters include alcA, amyB, bli-3, bphA, catR, cbh1, cbh2, cel5a, cel12a, cre1, exylA, gas, glaA, gla1, mir1, niiA, qa-2, Smxyl, tcu-1, thiA, vvd, xyl1, xylP, xyn1, xyn2, xyn3, zeaR, cDNA1, eno1, gpd1, pdc1, and pki1. In preferred embodiments, the promoter is a methanol-inducible promoter.

“Agrochemical”, “agrochemically” or “agrochemically suitable” as used herein, means suitable for use in the agrochemical industry (including agriculture, horticulture, floriculture and home and garden uses), but also products intended for non-crop related uses such as public health/pest control operator uses to control undesirable insects and rodents, household uses, such as household fungicides and insecticides and agents, for protecting plants or parts of plants, crops, bulbs, tubers, fruits (e.g. from harmful organisms, diseases or pests); for controlling, preferably promoting or increasing, the growth of plants; and/or for promoting the yield of plants, crops or the parts of plants that are harvested (e.g. its fruits, flowers, seeds etc.). Examples of such substances will be clear to the skilled person and may for example include proteins that are active as insecticides (e.g. contact insecticides or systemic insecticides, including insecticides for household use), herbicides (e.g. contact herbicides or systemic herbicides, including herbicides for household use), fungicides (e.g. contact fungicides or systemic fungicides, including fungicides for household use), nematicides (e.g. contact nematicides or systemic nematicides, including nematicides for household use) and other pesticides or biocides (for example agents for killing insects or snails); as well as fertilizers; growth regulators such as plant hormones; micro-nutrients, safeners, pheromones; repellants; insect baits; and/or active principles that are used to modulate (i.e. increase, decrease, inhibit, enhance and/or trigger) gene expression (and/or other biological or biochemical processes) in or by the targeted plant (e.g. the plant to be protected or the plant to be controlled), such as nucleic acids (e.g., single stranded or double stranded RNA, as for example used in the context of RNAi technology) and other factors, proteins, chemicals, etc. known per se for this purpose, etc. 
Examples of such agrochemicals will be clear to the skilled person; and for example include, without limitation: glyphosate, paraquat, metolachlor, acetochlor, mesotrione, 2,4-D, atrazine, glufosinate, sulfosate, fenoxaprop, pendimethalin, picloram, trifluralin, bromoxynil, clodinafop, fluroxypyr, nicosulfuron, bensulfuron, imazethapyr, dicamba, imidacloprid, thiamethoxam, fipronil, chlorpyrifos, deltamethrin, lambda-cyhalothrin, endosulfan, methamidophos, carbofuran, clothianidin, cypermethrin, abamectin, diflufenican, spinosad, indoxacarb, bifenthrin, tefluthrin, azoxystrobin, tebuconazole, mancozeb, cyazofamid, fluazinam, pyraclostrobin, epoxiconazole, chlorothalonil, copper fungicides, trifloxystrobin, prothioconazole, difenoconazole, carbendazim, propiconazole, thiophanate, sulphur, boscalid and other known agrochemicals or any suitable combination(s) thereof.

An “agrochemical composition”, as used herein, means a composition for agrochemical use, as further defined, comprising at least one active substance, optionally with one or more additives (for example one or more additives favoring optimal dispersion, atomization, deposition, leaf wetting, distribution, retention and/or uptake of agrochemicals). It will become clear from the further description herein that an agrochemical composition as used herein includes biological control agents or biological pesticides (including but not limited to biological biocidal, biostatic, fungistatic and fungicidal agents) and these terms will be interchangeably used in the present application. Accordingly, an agrochemical composition as used herein includes compositions comprising at least one biological molecule as an active ingredient, substance or principle for controlling pests in plants or in other agro-related settings (such as, for example, in soil). Non-limiting examples of biological molecules being used as active principles in the agrochemical compositions disclosed herein are proteins (including antibodies and fragments thereof, such as but not limited to heavy chain variable domain fragments of antibodies, including VHHs), nucleic acid sequences, (poly-) saccharides, lipids, vitamins, hormones, glycolipids, sterols, and glycerolipids. As a non-limiting example, the additives in the agrochemical compositions disclosed herein may include but are not limited to excipients, diluents, solvents, adjuvants, surfactants, wetting agents, spreading agents, oils, stickers, thickeners, penetrants, buffering agents, acidifiers, anti-settling agents, anti-freeze agents, photoprotectors, defoaming agents, biocides and/or drift control agents. The protein of interest may be formulated with one or more such components when preparing an agrochemical composition.
For example, the protein of interest may be formulated with one or more additives, for example one or more agrochemically acceptable excipients.

A “Pharmaceutical composition”, “pharmaceutically” or “pharmaceutically suitable” as used herein means a composition for medical use. For example, the composition may be suitable for injection or infusion which can include sterile aqueous solutions or dispersions or sterile powders comprising the active ingredient which are adapted for the extemporaneous preparation of sterile injectable or infusible solutions or dispersions, optionally encapsulated in liposomes. In all cases, the ultimate dosage form must be sterile, fluid, and stable under the conditions of manufacture and storage. The liquid carrier or vehicle can be a solvent or liquid dispersion medium comprising, for example, water, ethanol, a polyol (for example, glycerol, propylene glycol, liquid polyethylene glycols, and the like), vegetable oils, nontoxic glyceryl esters, and suitable mixtures thereof. The proper fluidity can be maintained, for example, by the formation of liposomes, by the maintenance of the required particle size in the case of dispersions or by the use of surfactants. The prevention of the action of microorganisms can be brought about by various antibacterial and antifungal agents, for example, parabens, chlorobutanol, phenol, sorbic acid, thimerosal, and the like. In many cases, it will be preferable to include isotonic agents, for example, sugars, buffers or sodium chloride. Prolonged absorption of the injectable compositions can be brought about by the use in the compositions of agents delaying absorption, for example, aluminum monostearate and gelatin. The protein of interest may be formulated with one or more such components when preparing a pharmaceutical composition. For example, the protein of interest may be formulated with one or more additives, for example one or more pharmaceutically acceptable excipients.

In certain embodiments, the microbial host cell may produce an increased or enhanced level of a protein (such as the VHH) as taught herein relative to (i.e., compared with) a similarly or identically produced microbial host cell comprising the canonical a-MF secretion signal sequence (instead of a nucleic acid of the invention), when measured under substantially the same conditions.

With “capable of expressing a protein of interest” it is meant that the microbial host cell is modified in such a way that it contains the genetic information of a protein of interest that is under control of a promoter sequence that drives the expression of said protein either in a continuous manner or during conditions suitable for expression. For example, in some embodiments, the microbial host cell may comprise a gene of interest coding for the protein.

The protein encoded by the gene of interest may therefore be a recombinant or heterologous protein, since it may not be encoded by the wild-type genome of the microbial host cell.

As used herein, the term “homology” denotes at least secondary structural similarity between two macromolecules, particularly between two polypeptides or polynucleotides, from same or different taxons, wherein said similarity is due to shared ancestry. Hence, the term “homologues” denotes so-related macromolecules having said secondary and optionally tertiary structural similarity. For comparing two or more nucleotide sequences, the '(percentage of) sequence identity' between a first nucleotide sequence and a second nucleotide sequence may be calculated using methods known by the person skilled in the art, e.g. by dividing the number of nucleotides in the first nucleotide sequence that are identical to the nucleotides at the corresponding positions in the second nucleotide sequence by the total number of nucleotides in the first nucleotide sequence and multiplying by 100% or by using a known computer algorithm for sequence alignment such as NCBI Blast. In determining the degree of sequence identity between two amino acid sequences, the skilled person may take into account so-called 'conservative' amino acid substitutions, which can generally be described as amino acid substitutions in which an amino acid residue is replaced with another amino acid residue of similar chemical structure and which has little or essentially no influence on the function, activity or other biological properties of the polypeptide. Possible conservative amino acid substitutions will be clear to the person skilled in the art. Amino acid sequences and nucleic acid sequences are said to be “exactly the same” if they have 100% sequence identity over their entire length.
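The position-by-position calculation of sequence identity described above can be illustrated with a minimal sketch. It assumes that the two nucleotide sequences are already in register, so that "corresponding positions" are simply equal indices, and it performs no alignment and no gap handling; an alignment tool such as NCBI Blast would be used where the sequences are not in register. The function name and the example sequences are hypothetical.

```python
def percent_identity(first: str, second: str) -> float:
    """Percentage of positions in `first` that are identical in `second`.

    Assumption: both sequences are in register, so position i of the
    first sequence corresponds to position i of the second. Positions of
    `first` beyond the end of `second` count as non-identical.
    The denominator is the length of the first sequence.
    """
    if not first:
        raise ValueError("first sequence must not be empty")
    # zip() stops at the shorter sequence; unmatched positions of
    # `first` therefore contribute nothing to the match count.
    matches = sum(
        1 for a, b in zip(first.upper(), second.upper()) if a == b
    )
    return matches / len(first) * 100.0


# Hypothetical sequences: 7 of the 8 positions are identical.
print(percent_identity("ATGCATGC", "ATGCATGA"))  # 87.5
```

Because the denominator is the length of the first sequence, the value can differ depending on which sequence is taken as the first one when the two sequences have different lengths.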

In certain embodiments, the nucleic acid sequences may be integrated into the genome of the microbial host cell. This may result in the constitutive expression of the protein of interest. In other embodiments, the nucleic acid sequence may be transiently expressed by the microbial host cell.

“Biostatic (effect)” or “biostatic use”, as used herein, includes any effect or use of an active substance (optionally comprised in a biostatic, biocidal, fungicidal or fungistatic composition as defined herein) for controlling, modulating or interfering with the harmful activity of a pest, such as a plant pest or a plant pathogen, including but not limited to inhibiting the growth or activity of the pest, altering the behaviour of the pest, and repelling the pest in or on plants, plant parts or in other agro-related settings, such as for example for household uses or in soil.

“Biocidal (effect)” or “biocidal use”, as used herein, includes any effect or use of an active substance (optionally comprised in a biocidal or fungicidal composition as defined herein) for killing the pest in or on plants, plant parts or in other agro-related settings, such as for example for household uses or in soil.

“Anti-fungal” activity or effect refers to fungistatic and/or fungicidal activity or effect. “Fungistatic (effect)” or “Fungistatic use” or “fungistatic activity”, as used herein, includes any effect or use of an active substance (optionally comprised in a fungicidal or fungistatic composition as defined herein) for controlling, modulating or interfering with the harmful activity of a fungus, including but not limited to inhibiting the growth or activity of the fungus, altering the behaviour of the fungus, and repelling the fungus in or on plants, plant parts or in other agro-related settings, such as for example for household uses or in soil.

“Fungicidal (effect)” or “Fungicidal use” or “fungicidal activity”, as used herein, includes any effect or use of an active substance (optionally comprised in a fungicidal composition as defined herein) for killing the fungus in or on plants, plant parts or in other agro-related settings, such as for example for household uses or in soil.

“Pesticidal activity” or “biocidal activity”, as used interchangeably herein, means to interfere with the harmful activity of a pest, including but not limited to killing the pest.

“Biostatic activity”, as used herein, means to interfere with the harmful activity of a pest, including but not limited to inhibiting the growth or activity of the pest, altering the behaviour of the pest, or repelling the pest.

Pesticidal, biocidal, or biostatic activity of an active ingredient, substance or principle or a composition or agent comprising a pesticidal, biocidal, or biostatic active ingredient, substance or principle, can be expressed as the minimum inhibitory activity (MIC) of an agent (expressed in units of concentration such as e.g. mg/mL), without however being restricted thereto.

“Fungicidal activity”, as used herein, means to interfere with the harmful activity of a fungus, including but not limited to killing the fungus.

“Fungistatic activity”, as used herein, means to interfere with the harmful activity of a fungus, including but not limited to inhibiting the growth or activity of the fungus, altering the behaviour of the fungus, and repelling the fungus.

Fungicidal or fungistatic activity of an active ingredient, substance or principle or a composition or agent comprising a pesticidal, biocidal, or biostatic active ingredient, substance or principle, can be expressed as the minimum inhibitory activity (MIC) of an agent (expressed in units of concentration such as e.g. mg/mL), without however being restricted thereto.

Protein manufactured by the microbial host cell of the invention

The current invention provides microbial host cells, use of said microbial host cells for the manufacturing of a protein encoded by a gene of interest, expression cassettes comprising a gene of interest, and methods for manufacturing a protein encoded by a gene of interest. In some embodiments the protein encoded by the gene of interest is a bioactive protein.

Bioactive proteins may have the effect of actively killing microbial organisms such as bacteria or fungi. Additionally, bioactive proteins may have the effect of actively killing insects. In some instances, the effect of the bioactive protein is that it inhibits or stops the growth of the microbial organism or insect. In some instances, the bioactive protein can inhibit essential communication systems and in so doing disrupt the successful propagation of microbial organisms or insects. Examples of the latter would be inhibition of quorum sensing in bacteria or pheromone signaling in insects. In other examples, the bioactive protein can prevent the microbial organism or insect from exerting its pathogenicity traits without necessarily killing or impairing the microbial organism or insect. As such, a bioactive protein may be fungistatic or fungicidal, bacteriostatic or bactericidal, insecticidal or insectistatic, or have pathogenicity inhibiting properties.

In some embodiments the bioactive protein may be a small peptide with anti-microbial properties such as an antimicrobial peptide or AMP. AMPs usually have a length in the range of 10 to 50 amino acids. AMPs are commonly anionic or cationic and can be subdivided into 4 classes: (i) anionic peptides which are rich in glutamic and aspartic acids, (ii) linear cationic α-helical peptides, (iii) cationic peptides enriched for specific amino acids, such as proline, arginine, phenylalanine, glycine or tryptophan, and (iv) anionic/cationic peptides forming disulfide bonds. More specific examples are plant-derived AMPs with antimicrobial or antiviral activities, such as peptides composed of at least two helical domains connected by a linker/turn, for example a plant-derived amphipathic helix or two helices engineered into a helix-turn-helix (HTH) format in which homologous or heterogeneous helices are connected by a peptide linker. Examples are described in WO2021202476, WO2020072535, WO2020176224 or WO2003000863.

Non-limiting examples of bioactive proteins that can be produced in a microbial fermentation reaction and are suitable for being formulated in an agrochemical or pharmaceutical composition may be the well-known Bt toxins, e.g., a Cry protein, a Cyt protein, or a Vip protein, or a δ-endotoxin (e.g., Crystal (Cry) toxins and/or cytolytic (Cyt) toxins); vegetative insecticidal proteins (Vips); secreted insecticidal proteins (Sips); or Bin-like toxins. “Vip” or “VIP” or “Vegetative Insecticidal Proteins” refer to proteins discovered from screening the supernatant of vegetatively grown strains of Bt for possible insecticidal activity. Vips have little or no similarity to Cry proteins. Of particular use and preference for use with this document are what have been called VIP3 or Vip3 proteins, which have Lepidopteran activity. Vips are thought to have a similar mode of action as Bt Cry peptides. Further examples may be polypeptides derived from spider venom, such as venom from funnel-web spiders, such as agatoxins or diguetoxins, more specifically a Mu-diguetoxin-Dc1a variant polypeptide or a U1-agatoxin-Ta1b variant polypeptide. Other examples are polypeptides derived from sea anemone, such as Av3 toxins. Examples are described in WO2022067214, WO2021216621 or WO2022212777.

In preferred embodiments, the bioactive protein that can be produced in a microbial fermentation reaction and is suitable for being formulated in an agrochemical or pharmaceutical composition is an antibody or a functional fragment thereof, a carbohydrate-binding domain, a heavy chain antibody or a functional fragment thereof, a single domain antibody, a heavy chain variable domain of an antibody or a functional fragment thereof, a heavy chain variable domain of a heavy chain antibody or a functional fragment thereof, a variable domain of camelid heavy chain antibody (VHH) or a functional fragment thereof, a variable domain of a new antigen receptor, a variable domain of shark new antigen receptor (vNAR) or a functional fragment thereof, a minibody, a nanobody, a nanoantibody, an affibody, an alphabody, a designed ankyrin-repeat domain, an anticalin, a knottin or an engineered CH2 domain.

In a more preferred embodiment the bioactive protein may comprise at least one camelized heavy chain variable domain of a conventional four-chain antibody (camelized VH), or a functional fragment thereof, or at least one heavy chain variable domain of a heavy chain antibody (VHH), which is naturally devoid of light chains, or a functional fragment thereof, such as but not limited to a heavy chain variable domain of a camelid heavy chain antibody (camelid VHH) or a functional fragment thereof. In such embodiments, the at least one heavy chain variable domain of an antibody, or a functional fragment thereof, does not have an amino acid sequence that is exactly the same as (i.e. as in a degree of sequence identity of 100% with) the amino acid sequence of a naturally occurring VH domain, such as the amino acid sequence of a naturally occurring VH domain from a mammal, and in particular from a human being.

In more specific embodiments, the VHH may be a VHH that binds a specific lipid fraction of the cell membrane of a fungal spore. Such VHHs may exhibit fungicidal activity through retardation of growth and/or lysis and explosion of spores, thus preventing mycelium formation. The VHH may therefore have fungicidal or fungistatic activity.

In some embodiments, the VHH may be a VHH that is capable of binding to a lipid-containing fraction of the plasma membrane of a fungus (for example Botrytis cinerea or other fungus). Said lipid-containing fraction may be obtainable by chromatography. For example, said lipid-containing fraction may be obtainable by a method comprising: fractionating hyphae of a fungus (for example Botrytis cinerea or other fungus) by total lipid extract thin-layer chromatography and selecting the fraction with a Retention Factor (Rf) higher than the ceramide fraction and lower than the non-polar phospholipids fraction.

The VHH may be generally capable of binding to a fungus. Such VHHs can thereby cause retardation of growth of a spore of the said fungus and/or lysis of a spore of the said fungus. That is to say, binding of the VHH to a fungus results in retardation of growth of a spore of the said fungus and/or lysis of a spore of the said fungus.

The VHHs may (specifically) bind to a membrane of a fungus or a component of a membrane of a fungus. In some embodiments, the VHHs do not (specifically) bind to a cell wall or a component of a cell wall of a fungus. For example, in some embodiments, the VHHs do not (specifically) bind to a glucosylceramide of a fungus.

The VHHs may be capable of (specifically) binding to a lipid-containing fraction of the plasma membrane of a fungus, such as for example a lipid-containing fraction of Botrytis cinerea or other fungus. Said lipid-containing fraction (of Botrytis cinerea or otherwise) may be obtainable by chromatography. The chromatography may be performed on a crude lipid extract (also referred to herein as a total lipid extract, or TLE) obtained from fungal hyphae and/or conidia. The chromatography may be, for example, thin-layer chromatography or normal-phase flash chromatography. The chromatography (for example thin-layer chromatography) may be performed on a substrate, for example a glass plate coated with silica gel. The chromatography may be performed using a chloroform/methanol mixture (for example 85/15% v/v) as the eluent.

For example, said lipid-containing fraction may be obtainable by a method comprising: fractionating hyphae and/or conidia of a fungus (for example Botrytis cinerea or other fungus) by total lipid extract thin-layer chromatography and selecting the fraction with a Retention Factor (Rf) higher than the ceramide fraction and lower than the non-polar phospholipids fraction.

In a more specific embodiment, the lipid-containing fraction may be obtainable by a method comprising: fractionating hyphae and/or conidia of a fungus (for example Botrytis cinerea or other fungus) by total lipid extract thin-layer chromatography on a silica-coated glass slide using a chloroform/methanol mixture (for example 85/15% v/v) as the eluent and selecting the fraction with a Retention Factor (Rf) higher than the ceramide fraction and lower than the non-polar phospholipids fraction.
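The selection criterion above can be illustrated with a minimal sketch. It assumes the conventional definition of the Retention Factor for thin-layer chromatography, namely the distance migrated by a band divided by the distance migrated by the solvent front, both measured from the origin. The class, the function names and all migration distances are hypothetical and serve only to show the comparison against the ceramide and non-polar phospholipid reference bands.

```python
from dataclasses import dataclass


@dataclass
class Band:
    name: str
    migration_mm: float  # distance from the origin to the band centre


def retention_factor(migration_mm: float, solvent_front_mm: float) -> float:
    """Rf = band migration distance / solvent front migration distance."""
    if solvent_front_mm <= 0:
        raise ValueError("solvent front distance must be positive")
    if not 0 <= migration_mm <= solvent_front_mm:
        raise ValueError("band must lie between origin and solvent front")
    return migration_mm / solvent_front_mm


def select_bands(bands, solvent_front_mm, ceramide_mm, phospholipid_mm):
    """Keep bands with Rf above the ceramide reference band and below
    the non-polar phospholipid reference band."""
    lower = retention_factor(ceramide_mm, solvent_front_mm)
    upper = retention_factor(phospholipid_mm, solvent_front_mm)
    return [
        band for band in bands
        if lower < retention_factor(band.migration_mm, solvent_front_mm) < upper
    ]


# Hypothetical plate readings in millimetres, invented for illustration.
bands = [Band("band A", 20.0), Band("band B", 45.0), Band("band C", 70.0)]
selected = select_bands(bands, solvent_front_mm=80.0,
                        ceramide_mm=30.0, phospholipid_mm=60.0)
print([band.name for band in selected])
```

With these invented readings only the middle band falls between the two reference bands; real reference positions would be read from the same plate as the sample.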

Alternatively, the fraction may be obtained using normal-phase flash chromatography. In such a method, the method may comprise: fractionating hyphae and/or conidia of a fungus (for example Botrytis cinerea or other fungus) by total lipid extract normal-phase flash chromatography, and selecting the fraction with a Retention Factor (Rf) higher than the ceramide fraction and lower than the non-polar phospholipids fraction.

In a more specific embodiment, the lipid-containing fraction may be obtainable by a method comprising: fractionating hyphae and/or conidia of a fungus (for example Botrytis cinerea or other fungus) by total lipid extract normal-phase flash chromatography comprising dissolving the TLE in dichloromethane (CH2Cl2) and MeOH and using CH2Cl2/MeOH (for example 85/15%, v/v) as the eluent, followed by filtration of the fractions through a filter.

In a more specific embodiment, the lipid-containing fraction may be obtainable by a method comprising: fractionating hyphae and/or conidia of a fungus (for example Botrytis cinerea or other fungus) by total lipid extract normal-phase flash chromatography comprising dissolving the TLE in dichloromethane (CH2Cl2) and MeOH, loading the TLE on to a phase flash cartridge (for example a flash cartridge with 15 µm particles), running the column with CH2Cl2/MeOH (85/15%, v/v) as the eluent, filtering the fractions through a filter (for example a 0.45 µm syringe filter with a nylon membrane) and drying the fractions.

The fractions from the chromatography may be processed prior to testing of binding of the VHH to the fraction or of interaction with the fraction. For example, liposomes comprising the fractions may be prepared. Such a method may comprise the use of thin-film hydration. For example, in such a method, liposomes may be prepared using thin-film hydration with the addition of 1,6-diphenyl-1,3,5-hexatriene (DPH). Binding and/or disruption of the membranes by binding of the VHH may be measured by a change in fluorescence before and after polypeptide binding (or by reference to a suitable control).

Accordingly, in some embodiments, the VHH may (specifically) bind to a lipid-containing chromatographic fraction of the plasma membrane of a fungus, optionally wherein the lipid-containing chromatographic fraction is prepared into liposomes prior to testing the binding of the polypeptide thereto.

Binding of the VHH to a lipid-containing fraction of a fungus may be confirmed by any suitable method, for example bio-layer interferometry. Specific interactions with the lipid-containing fractions may be tested. For example, it may be determined if the polypeptide is able to disrupt the lipid fraction when the fraction is prepared into liposomes, for example using thin-film hydration.

In methods involving chromatography, an extraction step may be performed prior to the step of chromatography. For example, fungal hyphae and/or conidia may be subjected to an extraction step to provide a crude lipid extract or total lipid extract on which the chromatography is performed. For example, in some embodiments, fungal hyphae and/or conidia (for example fungal hyphae and/or conidia of Fusarium oxysporum or Botrytis cinerea) may be extracted at room temperature, for example using chloroform:methanol at 2:1 and 1:2 (v/v) ratios. Extracts so prepared may be combined and dried to provide a crude lipid extract or TLE.

Accordingly, in some embodiments, the VHH may be capable of (specifically) binding to a lipid-containing fraction of the plasma membrane of a fungus (such as Fusarium oxysporum or Botrytis cinerea), wherein the lipid-containing fraction of the plasma membrane of the fungus is obtained or obtainable by chromatography. The chromatography may be normal-phase flash chromatography or thin-layer chromatography. Binding of the VHH to the lipid-containing fraction may be determined by bio-layer interferometry. In some embodiments, the chromatography step may be performed on a crude lipid fraction obtained or obtainable by a method comprising extracting lipids from fungal hyphae and/or conidia from a fungal sample. The extraction step may use chloroform:methanol at 2:1 and 1:2 (v/v) ratios to provide two extracts, which are then combined.

In methods relating to thin-layer chromatography, the chromatography may comprise the steps of: fractionating hyphae of the fungus by total lipid extract thin-layer chromatography and selecting the fraction with a Retention Factor (Rf) higher than the ceramide fraction and lower than the non-polar phospholipids fraction.

In some methods relating to thin-layer chromatography, the chromatography may comprise the steps of: fractionating hyphae and/or conidia of a fungus (for example Botrytis cinerea or other fungus) by total lipid extract thin-layer chromatography on a silica-coated glass slide using a chloroform/methanol mixture (for example 85/15% v/v) as the eluent and selecting the fraction with a Retention Factor (Rf) higher than the ceramide fraction and lower than the non-polar phospholipids fraction.

In methods relating to normal-phase flash chromatography, the chromatography may comprise the steps of: fractionating hyphae and/or conidia of a fungus (for example Botrytis cinerea or other fungus) by total lipid extract normal-phase flash chromatography, and selecting the fraction with a Retention Factor (Rf) higher than the ceramide fraction and lower than the non-polar phospholipids fraction.

In some methods relating to normal-phase flash chromatography, the chromatography may comprise the steps of: fractionating hyphae and/or conidia of a fungus (for example Botrytis cinerea or other fungus) by total lipid extract normal-phase flash chromatography comprising dissolving the TLE in dichloromethane (CH2Cl2) and MeOH and using CH2Cl2/MeOH (for example 85/15%, v/v) as the eluent, followed by filtration of the fractions through a filter.

In some methods relating to normal-phase flash chromatography, the chromatography may comprise the steps of: fractionating hyphae and/or conidia of a fungus (for example Botrytis cinerea or other fungus) by total lipid extract normal-phase flash chromatography comprising dissolving the TLE in dichloromethane (CH2Cl2) and MeOH, loading the TLE on to a phase flash cartridge (for example a flash cartridge with 15 µm particles), running the column with CH2Cl2/MeOH (85/15%, v/v) as the eluent, filtering the fractions through a filter (for example a 0.45 µm syringe filter with a nylon membrane) and drying the fractions.
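
By way of illustration only, the Rf-based selection of a fraction described above may be sketched as follows. The sketch assumes the conventional definition of the Retention Factor (distance migrated by a band divided by the distance migrated by the solvent front). The band names, migration distances and reference Rf values are hypothetical and are not taken from the present disclosure.

```python
from dataclasses import dataclass


@dataclass
class Band:
    name: str
    distance_mm: float  # distance migrated from the origin (hypothetical values)


def retention_factor(band_distance_mm: float, solvent_front_mm: float) -> float:
    """Rf = distance travelled by the band / distance travelled by the solvent front."""
    if solvent_front_mm <= 0:
        raise ValueError("solvent front distance must be positive")
    return band_distance_mm / solvent_front_mm


def select_fractions(bands, solvent_front_mm, ceramide_rf, nonpolar_pl_rf):
    """Return the names of bands whose Rf lies strictly between the two reference Rf values."""
    selected = []
    for band in bands:
        rf = retention_factor(band.distance_mm, solvent_front_mm)
        if ceramide_rf < rf < nonpolar_pl_rf:
            selected.append(band.name)
    return selected


# Hypothetical plate: all numbers below are invented for illustration.
bands = [Band("F1", 12.0), Band("F2", 30.0), Band("F3", 55.0)]
picked = select_fractions(bands, solvent_front_mm=80.0,
                          ceramide_rf=0.25, nonpolar_pl_rf=0.60)
print(picked)
```

In practice the reference Rf values would be read from ceramide and non-polar phospholipid standards run on the same plate.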

In some embodiments, the protein encoded by the gene of interest is VHH-1, VHH-2 or VHH-3. For example, in some embodiments, the protein encoded by the gene of interest is a VHH comprising or consisting of a sequence selected from the group consisting of SEQ ID NOs: 1, 2, 6, 10, 14 and 15.

In some embodiments, the protein encoded by the gene of interest is a VHH comprising: a CDR1 comprising or consisting of a sequence selected from the group consisting of SEQ ID NOs: 3, 7 and 11; a CDR2 comprising or consisting of a sequence selected from the group consisting of SEQ ID NOs: 4, 8 and 12; and a CDR3 comprising or consisting of a sequence selected from the group consisting of SEQ ID NOs: 5, 9 and 13.

In some embodiments, the protein encoded by the gene of interest is a VHH comprising: a CDR1 comprising or consisting of the sequence of SEQ ID NO: 3, a CDR2 comprising or consisting of the sequence of SEQ ID NO: 4 and a CDR3 comprising or consisting of the sequence of SEQ ID NO: 5; a CDR1 comprising or consisting of the sequence of SEQ ID NO: 7, a CDR2 comprising or consisting of the sequence of SEQ ID NO: 8 and a CDR3 comprising or consisting of the sequence of SEQ ID NO: 9; or a CDR1 comprising or consisting of the sequence of SEQ ID NO: 11, a CDR2 comprising or consisting of the sequence of SEQ ID NO: 12 and a CDR3 comprising or consisting of the sequence of SEQ ID NO: 13.

In some embodiments, the protein encoded by the gene of interest is a VHH comprising a CDR1 comprising or consisting of the sequence of SEQ ID NO: 3, a CDR2 comprising or consisting of the sequence of SEQ ID NO: 4 and a CDR3 comprising or consisting of the sequence of SEQ ID NO: 5.

In some embodiments, the protein encoded by the gene of interest is a VHH comprising SEQ ID NO: 1 or SEQ ID NO: 2.

In some embodiments, the protein encoded by the gene of interest is a VHH comprising any of SEQ ID NOs: 1, 2, 6, 10, or 14 to 99.

In some embodiments, the protein encoded by the gene of interest comprises a VHH disclosed in WO2014/177595 or WO2014/191146, the entire contents of which are incorporated herein by reference. More specifically, the protein encoded by the gene of interest may comprise a VHH comprising an amino acid sequence chosen from the group consisting of SEQ ID NOs: 1 to 84 from WO2014/177595 or WO2014/191146, which correspond to SEQ ID NOs: 16-98 and 133 of the present application.

In some embodiments, the protein encoded by the gene of interest is a VHH comprising (a) the amino acid sequence provided in any one of SEQ ID NOs: 1, 2, 6, 10, 14 to 98 or 133, or (b) an amino acid sequence that is at least 80%, preferably at least 90%, identical to any one of SEQ ID NOs: 1, 2, 6, 10, 14 to 98 or 133.
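
As a minimal sketch of how an identity threshold such as "at least 80%" or "at least 90%" might be checked, the following computes percent identity over two sequences that have already been aligned to equal length (gaps written as "-"). This is only one of several possible identity conventions and is an assumption made for illustration; the two short sequences are invented toy strings and are not sequences of the sequence listing.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns, ignoring columns that are gaps in both sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A column counts as a match only if both residues are identical and not a gap.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True if the percent identity is at least the given threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy example: 9 of 10 aligned positions are identical.
reference = "QVQLQESGGG"
variant = "QVQLQASGGG"
print(percent_identity(reference, variant))
print(meets_threshold(reference, variant, threshold=90.0))
```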

In some embodiments, the VHHs are fused to a carrier peptide.

The protein encoded by the gene of interest may be a monoclonal antibody or portion thereof. As used herein, the term "monoclonal antibody" refers to an antibody composition having a homogeneous antibody population. The term is not limited regarding the species or source of the antibody, nor is it intended to be limited by the manner in which it is made. The term encompasses whole immunoglobulins as well as fragments such as Fab, F(ab)2, Fv, and others that retain the antigen binding function of the antibody. Monoclonal antibodies of any mammalian species can be used in this invention. In practice, however, the antibodies will typically be of rat or murine origin because of the availability of rat or murine cell lines for use in making the required hybrid cell lines or hybridomas to produce monoclonal antibodies. As used herein, the term "polyclonal antibody" refers to an antibody composition having a heterogeneous antibody population. Polyclonal antibodies are often derived from the pooled serum from immunized animals or from selected humans.

“Heavy chain variable domain of an antibody or a functional fragment thereof” (also indicated hereafter as VHH), as used herein, means (i) the variable domain of the heavy chain of a heavy chain antibody, which is naturally devoid of light chains, including but not limited to the variable domain of the heavy chain of heavy chain antibodies of camelids or sharks or (ii) the variable domain of the heavy chain of a conventional four-chain antibody (also indicated hereafter as VH), including but not limited to a camelized (as further defined herein) variable domain of the heavy chain of a conventional four-chain antibody (also indicated hereafter as camelized VH).

As used herein, the terms "complementarity determining region" or "CDR" within the context of antibodies refer to variable regions of either the H (heavy) or the L (light) chains (also abbreviated as VH and VL, respectively) and contain the amino acid sequences capable of specifically binding to antigenic targets. These CDR regions account for the basic specificity of the antibody for a particular antigenic determinant structure. Such regions are also referred to as "hypervariable regions." The CDRs represent non-contiguous stretches of amino acids within the variable regions but, regardless of species, the positional locations of these critical amino acid sequences within the variable heavy and light chain regions have been found to have similar locations within the amino acid sequences of the variable chains. The variable heavy and light chains of all canonical antibodies each have 3 CDR regions, each non-contiguous with the others (termed L1, L2, L3, H1, H2, H3) for the respective light (L) and heavy (H) chains.

As further described hereinbelow, the amino acid sequence and structure of a heavy chain variable domain of an antibody can be considered, without however being limited thereto, to be comprised of four framework regions or “FR's”, which are referred to in the art and hereinbelow as “framework region 1” or “FR1”; as “framework region 2” or “FR2”; as “framework region 3” or “FR3”; and as “framework region 4” or “FR4”, respectively, which framework regions are interrupted by three complementarity determining regions or “CDR's”, which are referred to in the art as “complementarity determining region 1” or “CDR1”; as “complementarity determining region 2” or “CDR2”; and as “complementarity determining region 3” or “CDR3”, respectively.

As also further described hereinbelow, the total number of amino acid residues in a heavy chain variable domain of an antibody (including a VHH or a VH) can be in the region of 110-130, is preferably 112-115, and is most preferably 113. It should however be noted that parts, fragments or analogs of a heavy chain variable domain of an antibody are not particularly limited as to their length and/or size, as long as such parts, fragments or analogs retain (at least part of) the functional activity, such as the pesticidal, biocidal, biostatic, insecticidal, insectistatic, fungicidal or fungistatic activity (as defined herein) and/or retain (at least part of) the binding specificity of the original heavy chain variable domain of an antibody from which these parts, fragments or analogs are derived. Parts, fragments or analogs retaining (at least part of) the functional activity, such as the pesticidal, biocidal, biostatic, fungicidal or fungistatic activity (as defined herein) and/or retaining (at least part of) the binding specificity of the original heavy chain variable domain of an antibody from which these parts, fragments or analogs are derived are also further referred to herein as “functional fragments” of a heavy chain variable domain.

A method for numbering the amino acid residues of heavy chain variable domains is the method described by Chothia et al. (Nature 342, 877-883 (1989)), the so-called “AbM definition” and the so-called “contact definition”. Herein, this is the numbering system adopted.

Alternatively, the amino acid residues of a variable domain of a heavy chain variable domain of an antibody (including a VHH or a VH) may be numbered according to the general numbering for heavy chain variable domains given by Kabat et al. (“Sequence of proteins of immunological interest”, US Public Health Services, NIH Bethesda, Md., Publication No. 91), as applied to VHH domains from Camelids in the article of Riechmann and Muyldermans, referred to above (see for example FIG. 2 of said reference). For a general description of heavy chain antibodies and the variable domains thereof, reference is inter alia made to the following references, which are mentioned as general background art: WO 94/04678, WO 95/04079 and WO 96/34103 of the Vrije Universiteit Brussel; WO 94/25591, WO 99/37681, WO 00/40968, WO 00/43507, WO 00/65057, WO 01/40310, WO 01/44301, EP 1134231 and WO 02/48193 of Unilever; WO 97/49805, WO 01/21817, WO 03/035694, WO 03/054016 and WO 03/055527 of the Vlaams Instituut voor Biotechnologie (VIB); WO 03/050531 of Algonomics N.V. and Ablynx NV; WO 01/90190 by the National Research Council of Canada; WO 03/025020 (=EP 1 433 793) by the Institute of Antibodies; as well as WO 04/041867, WO 04/041862, WO 04/041865, WO 04/041863, WO 04/062551 by Ablynx; Hamers-Casterman et al., Nature 1993 Jun. 3; 363 (6428): 446-8.

As described herein, the protein encoded by the gene of interest may be a heavy chain single variable domain. Generally, it should be noted that the term “heavy chain single variable domain” as used herein in its broadest sense is not limited to a specific biological source or to a specific method of preparation. For example, a heavy chain single variable domain can be obtained (1) by isolating the VHH domain of a naturally occurring heavy chain antibody; (2) by isolating the VH domain of a naturally occurring four-chain antibody; (3) by expression of a nucleotide sequence encoding a naturally occurring VHH domain; (4) by expression of a nucleotide sequence encoding a naturally occurring VH domain; (5) by “camelization” (as described below) of a naturally occurring VH domain from any animal species, in particular a species of mammal, such as from a human being, or by expression of a nucleic acid encoding such a camelized VH domain; (6) by “camelization” of a “domain antibody” or “Dab” as described by Ward et al (supra), or by expression of a nucleic acid encoding such a camelized VH domain; (7) using synthetic or semi-synthetic techniques for preparing proteins, polypeptides or other amino acid sequences; (8) by preparing a nucleic acid encoding a VHH or a VH using techniques for nucleic acid synthesis, followed by expression of the nucleic acid thus obtained; and/or (9) by any combination of the foregoing. Suitable methods and techniques for performing the foregoing will be clear to the skilled person based on the disclosure herein.

However, according to a specific embodiment, the heavy chain variable domains as disclosed herein do not have an amino acid sequence that is exactly the same as (i.e. having a degree of sequence identity of 100% with) the amino acid sequence of a naturally occurring VH domain, such as the amino acid sequence of a naturally occurring VH domain from a mammal, and in particular from a human being.

The present invention will now be illustrated by way of the following non-limiting Examples.

Examples

Example 1: cloning of gene of interest into Komagataella phaffii and expression screening

Cloning of VHH-1 (a VHH having the amino acid sequence of SEQ ID NO: 1) fused to different secretion signal sequences was performed using Golden Gate assembly, exploiting the Type IIS restriction enzyme BsmBI (New England Biolabs NEBR0739L) using the specifications as provided by the manufacturer. Cloning was performed in a pUC19 derived vector suitable for amplification in standard E. coli strains such as One Shot™ TOP10 Chemically Competent E. coli (Thermo Scientific) or NEB® 5-alpha Competent E. coli (New England Biolabs). The pUC19 derived vector contains an AOX1 promoter, a cloning site compatible with BsmBI to receive a gene of interest fused to a secretion signal sequence and the AOX1 terminator. Here the secretion signal sequence and gene of interest (VHH-1) were ordered and/or amplified by PCR including the BsmBI restriction sites and the corresponding overhangs to clone the secretion signal sequence fused to VHH-1 by ligation into the pUC19 derived vector. This was repeated for each tested secretion signal sequence. The signal sequences used for this example consist of the signal peptide as indicated in Tables 1 and 2 and the pro-peptide of aMF. The obtained plasmids were then used as a template for a standard PCR reaction using a high-fidelity polymerase, here Q5® High-Fidelity DNA Polymerase (New England Biolabs), followed by a PCR purification using for example the GeneJET PCR Purification Kit (Thermo Fisher Scientific) in order to produce double stranded DNA expression constructs for transformation into electrocompetent Komagataella phaffii cells. Electrocompetent cells were prepared according to the protocol as set out by Joan Lin-Cereghino et al. (2005) Condensed protocol for competent cell preparation and transformation of the methylotrophic yeast Pichia pastoris. Biotechniques: 38, 1:44-48. 
Transformation of expression constructs to electrocompetent cells was done in a 2 mm electroporation cuvette using an electroporator (MicroPulser Electroporator, Biorad, cat n° 1652100) according to the manufacturer’s instructions for Komagataella phaffii. Note that due to the high frequency of random integration events of expression constructs in Komagataella phaffii, multiple copies of the expression cassette can be integrated randomly into the genome (Jan-Philipp Schwarzhans et al. (2016). Non-canonical integration events in Pichia pastoris encountered during standard transformation analysed with genome sequencing. Scientific Reports. 6: 38952). Therefore, multiple individual Komagataella phaffii transformants were screened for expression levels of VHH-1 in order to select those clones showing the most promising expression levels as determined by standard SDS-PAGE analysis, using a LabChip GXII (PerkinElmer) or Bradford protein assay (Thermo Scientific™ 23236) according to the manufacturer’s instructions. Although Bradford analysis provides a measure of total protein and not of the individual VHH or protein of interest, it can still be used as a first screen to assess which colonies produced and secreted the highest amount of the protein of interest, since the background levels of secreted protein are in general similar for all wells with a similar OD600 value and, furthermore, these background levels are significantly lower than the levels of secreted VHH or protein of interest. Titer screening was performed in microscale fermentation conditions (fermentation was conducted in 96-deep well plates) using standard methanol inducing conditions essentially as described by Maria Weidner et al. (2010). Expression of Recombinant Proteins in the Methylotrophic Yeast Pichia pastoris. J Vis Exp. 36: 1862 and Thomas Vogl et al. (2020). Orthologous promoters from related methylotrophic yeasts surpass expression of endogenous promoters of Pichia pastoris. AMB Express. 10: 38. 
Alternatively, transformants in which DNA constructs were successfully integrated can be identified by PCR, using specific primers for the expression cassettes, prior to confirming expression levels as described above. The results of each of the 96-deep well plate fermentation reactions are provided in Table 1 and are expressed relative to the expression levels obtained using the a-MF, following exactly the same procedure as for the constructs containing the signal peptides listed in Table 1.
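
Golden Gate assembly with a Type IIS enzyme generally relies on the recognition sites being present only where intended, typically flanking each part. As a hedged illustration, the sketch below scans a DNA sequence on both strands for the BsmBI recognition sequence (CGTCTC, reverse complement GAGACG). The example fragment, including its overhangs, is an invented toy sequence and is not a sequence of the present application; the function names are hypothetical.

```python
BSMBI_SITE = "CGTCTC"  # BsmBI recognition sequence (top strand, 5'->3')


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_bsmbi_sites(seq: str):
    """Return (0-based start position, strand) for every BsmBI recognition site."""
    seq = seq.upper()
    rc_site = reverse_complement(BSMBI_SITE)  # "GAGACG"
    hits = []
    for i in range(len(seq) - len(BSMBI_SITE) + 1):
        window = seq[i:i + len(BSMBI_SITE)]
        if window == BSMBI_SITE:
            hits.append((i, "+"))
        elif window == rc_site:
            hits.append((i, "-"))
    return hits


# Toy part: site + 1 spacer nt + 4-nt overhang ... 4-nt overhang + 1 spacer nt + inverted site.
fragment = "CGTCTC" + "A" + "AATG" + "GCTGCA" + "GCTT" + "T" + "GAGACG"
print(find_bsmbi_sites(fragment))

# A coding sequence intended as an insert should ideally return no internal sites.
print(find_bsmbi_sites("ATGGCTGCA"))
```

A part that returns additional internal hits would usually need those sites removed (for example by synonymous codon changes) before assembly.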

Signal peptides 3 (SEQ ID NO: 118) and 4 (SEQ ID NO: 119) are artificial constructs designed by the inventors to find alternative or improved signal peptides and were shown to provide improved expression over a-MF as secretion signal (Table 1).

Table 1: VHH expression levels of strains transformed with expression constructs varying in signal peptide sequences relative to constructs comprising the a-MF signal peptide, in 96-deep well plates.

Example 2: expression levels in fed-batch fermentation

Further confirmation of expression levels was performed by fed-batch fermentation using an Ambr® 250 bioreactor (Sartorius) operated according to the manufacturer’s instructions. Media and conditions for growing Pichia pastoris with methanol induction were taken from Maria Weidner et al. (2010). Expression of Recombinant Proteins in the Methylotrophic Yeast Pichia pastoris. J Vis Exp. 36: 1862 and Thomas Vogl et al. (2020). Orthologous promoters from related methylotrophic yeasts surpass expression of endogenous promoters of Pichia pastoris. AMB Express. 10: 38, further complemented with teachings specific for running fermentation reactions from, for example, Wan-Cang Liu et al. (2019). Fed-batch high-cell-density fermentation strategies for Pichia pastoris growth and production. Critical Reviews in Biotechnology. 39, 2: 258-271. Determination of VHH concentration was performed by protein A affinity high performance liquid chromatography (PA-HPLC) or reverse phase high performance liquid chromatography (RP-HPLC). The strains used to generate the data shown in Table 2 were constructed as described in Example 1.

Table 2: VHH expression levels in gram per liter of strains transformed with expression constructs varying in signal peptide sequences growing in fed-batch fermentations and relative to constructs comprising the a-MF signal peptide.

Example 3: Expression of three additional VHH and effect of secretion signal sequence on production levels.

As a further example, three additional VHH encoding sequences were fused at their 5’ end to different secretion signal sequences. Expression cassettes were constructed with an AOX1 promoter and an AOX1 terminator. Each expression cassette for each different VHH comprised a different secretion signal sequence comprising a signal peptide sequence (more specifically and as indicated in the corresponding Figures 1 to 3, any one of signal peptide 1 according to SEQ ID NO: 99, signal peptide 3 according to SEQ ID NO: 101, signal peptide 4 according to SEQ ID NO: 102, signal peptide 6 according to SEQ ID NO: 104, signal peptide 10 according to SEQ ID NO: 108, or signal peptide 16 according to SEQ ID NO: 134). Each secretion signal sequence also comprised the pro-peptide sequence of Saccharomyces cerevisiae aMF according to SEQ ID NO: 115. Expression cassettes were constructed and cloned into Pichia pastoris according to the procedure set out in Example 1. Several hundred clones were picked and inoculated into 96-deep-well plates and methanol induced, and, after cell removal, total protein content was measured using Bradford absorbance. The results are displayed in Figures 1 to 3 for each separate VHH (VHH-X, VHH-Y, VHH-W). Figures 1 to 3 show adjusted Bradford absorbance values across all 30 plates, grouped by BioBrick and plate. Each point shows the adjusted absorbance from a single well. The Tukey-style box and whiskers plots show the distribution of the data: the box shows the inner two quartiles, with the median indicated by the solid horizontal line through the box; whiskers extend to 5 times the interquartile range; points beyond the whiskers are extreme outliers of the distribution. Mean values for each distribution are plotted as solid black dots. 
The first four bricks on the left are the controls included in each plate (4 each, 16 total per plate); the remaining bars are the screening data for each BioBrick, with BioBricks sorted by the median of the top 8 wells (top 10%), from highest to lowest. This median value is plotted as a triangle for each screening plate. The dotted horizontal line at y = 1 shows the end of the linear range for a Bradford measurement (0.1, 1). The y-axis is transformed using a Box-Cox transformation with λ = -0.5, which normalizes the control distributions and makes them homoskedastic (variance independent of mean). The standardization eliminates any significant plate-to-plate variability.
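
For illustration, the Box-Cox transformation with λ = -0.5 and the "median of the top 8 wells" statistic mentioned above may be sketched as follows. The sketch uses the standard one-parameter Box-Cox form, y = (x^λ - 1)/λ for λ ≠ 0 and y = ln(x) for λ = 0; the absorbance values are invented, and the function names are hypothetical rather than those used to prepare the figures.

```python
import math
from statistics import median


def box_cox(x: float, lam: float) -> float:
    """One-parameter Box-Cox transform; defined for strictly positive x."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def top_n_median(values, n: int = 8) -> float:
    """Median of the n highest values (e.g. the top 8 wells of a plate)."""
    top = sorted(values, reverse=True)[:n]
    return median(top)


# Invented absorbance readings for one hypothetical plate (20 wells shown).
absorbances = [0.05 * k for k in range(1, 21)]
transformed = [box_cox(a, lam=-0.5) for a in absorbances]

print(top_n_median(absorbances))   # statistic used to rank constructs
print(box_cox(4.0, lam=-0.5))      # (4**-0.5 - 1) / -0.5
```

Because the transform is monotonically increasing for positive values, ranking wells before or after the transformation gives the same order.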

The results in Figures 1 to 3 clearly demonstrate that transformation of the host cell with an expression construct comprising signal peptide 1, 6, 10 or 16 (and in some cases peptides 3 and 4) consistently resulted in clones having far greater VHH expression levels than cells transformed with constructs comprising the canonical Saccharomyces cerevisiae a-MF signal peptide sequence.

The results of Figures 1 to 3 furthermore show the importance of selecting the outliers of transformed Pichia pastoris strains showing the highest expression levels to achieve optimal expression of a VHH or other gene of interest.

Example 4: Influence of signal peptides and VHH copy numbers on protein production.

A clone selection was performed to identify strains producing high, medium, and low levels of VHH for each construct as indicated in Table 3. Typically a large number of transformed clones (at least one 96-well plate per construct) are screened for expression levels using Bradford protein quantification as described in Example 1. Subsequently, a number of high producing clones, intermediate producing clones and low producing clones for each VHH were first streaked to obtain single colonies and expression levels were then validated in a second 96-deep-well plate fermentation. From this validation, 16 clones per VHH demonstrating stability by producing approximately the same amount as in the initial screen were chosen for further analysis. In order to obtain an estimated copy number for low, middle and high producing clones, qPCR was performed for each of the selected clones. To this end, cell lysis and protein removal from overnight yeast cultures were carried out using the reagents provided in the MasterPure Yeast DNA Purification Kit (Lucigen), following the kit protocol, except for the addition of 12 times the recommended amount of RNAse A. Genomic DNA (gDNA) purification was conducted using the Monarch® Genomic DNA Purification Kit (NEB), employing the kit protocol for binding buffer, columns, and wash buffer. The quality of the extracted gDNA was assessed by agarose gel electrophoresis and measurement with a Nanodrop spectrophotometer.

For the qPCR analysis, primers were designed using Primer3 software (https://bioinfo.ut.ee/primer3-0.4.0/). The qPCR assays were executed with Sso Advanced SYBR Green master mix (Bio-rad) according to the manufacturer's instructions, using the CFX-Opus 96 Real-Time PCR equipment (Bio-rad). In each assay, a standard curve was generated through a dilution series (1:5) of 5 for each gene target. The efficiency and R² per target were calculated using the CFX Maestro Software (Biorad). All samples and controls were run in duplicate, and qPCR data were normalized using the ARG4 gene as the endogenous control (housekeeping gene). For copy number calculation, the average Ct value of the duplicates was employed using primers binding to the AOX1 promoter. The VHH copy number of each strain was determined as the ratio of the total VHH copy number to the total genomic DNA copy number, which could be represented by the total copy number of AOX1 because P. pastoris contains only one copy of AOX1 within its genomic DNA. Relative copy numbers were calculated using the 2^-ΔΔCt method, with wild type as reference.
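
The 2^-ΔΔCt calculation referred to above may be sketched as follows. This is an illustrative sketch only: it assumes approximately 100% amplification efficiency for both targets, takes the wild-type strain as the calibrator and ARG4 as the endogenous control, and uses invented Ct values. The function and variable names are hypothetical.

```python
def average_ct(replicates):
    """Mean Ct of technical replicates (duplicates in the protocol above)."""
    if not replicates:
        raise ValueError("at least one Ct value is required")
    return sum(replicates) / len(replicates)


def relative_copy_number(sample_target_ct, sample_ref_ct,
                         calibrator_target_ct, calibrator_ref_ct):
    """Relative quantity by the 2^-ddCt method (assumes ~100% PCR efficiency)."""
    delta_ct_sample = sample_target_ct - sample_ref_ct              # dCt of the clone
    delta_ct_calibrator = calibrator_target_ct - calibrator_ref_ct  # dCt of wild type
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Invented Ct values: target = AOX1 promoter amplicon, reference = ARG4.
clone_target = average_ct([17.9, 18.1])
clone_ref = average_ct([20.0, 20.0])
wt_target = average_ct([20.0, 20.0])
wt_ref = average_ct([20.0, 20.0])

ratio = relative_copy_number(clone_target, clone_ref, wt_target, wt_ref)
print(round(ratio, 2))
```

With these invented values the target amplicon is about four times as abundant in the clone as in the wild-type calibrator; how such a ratio is converted into a cassette copy number depends on how many copies of the target the calibrator itself carries.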

Table 3: VHH Copy Number (CN) determination of strains transformed with expression constructs having signal peptide 1, signal peptide 10 and the pre-pro protein from aMF fused to VHH-1 or VHH-W. Fold change was determined using the Bradford protein assay and set out relative to the pre-pro protein aMF low producers. Copy number (CN) represents an estimate of the total number of VHH construct copies present in each selected clone.

In the screening, signal peptides 1 and 10 yielded several high producers, whereas constructs with the pre-pro protein from a-MF failed to show significant variation in expression levels. As indicated by the copy numbers, for signal peptides 1 and 10, high copy numbers correlate with higher expression levels. A similar observation could not be made for the canonical pre-pro protein from a-MF. Although a 7-copy strain was found, this did not lead to considerable increases in expression levels. It must be noted here that strains with a-MF pre-pro protein fused to a VHH with even higher copy numbers can be created just like with signal peptides 1 and 10, but since higher copy numbers do not equate to higher expression levels for the a-MF pre-pro protein, such multi-copy events are not easily identifiable. For signal peptides 1 and 10, high expression levels do correlate with high copy-number strains, with expression levels doubling in a microtiter plate fermentation setup for strains having multiple copies.

It therefore seems that the maximum increase in production levels of a protein of interest such as a VHH, can be achieved when multiple copies of the expression constructs are integrated into the genome of the host cell. This increase in production level (from higher copy number) was surprisingly much larger for signal peptides 1 and 10 relative to the canonical a-MF signal peptide sequence. The canonical secretion signal sequence with the signal peptide (pre) and carrier (pro) peptides from Saccharomyces cerevisiae aMF, fails to yield outliers having significantly increased protein of interest production (such as increased VHH production) since no improvement of production can be made by increasing the copy numbers of expression cassettes comprising the canonical a-MF secretion signal sequence.

Statements (features) and embodiments of the methods and compositions as disclosed herein are set out herebelow. Each of the statements and embodiments as disclosed by the invention so defined may be combined with any other statement and/or embodiment unless clearly indicated to the contrary. In particular, any feature indicated as being preferred or advantageous may be combined with any other feature or features indicated as being preferred or advantageous.

Embodiments

The present invention provides at least the following numbered statements of invention:

1. A microbial host cell comprising: a. one or more copies of an expression cassette comprising i. a promoter capable of promoting expression of a gene of interest, and ii. the gene of interest, wherein the gene of interest is fused to a secretion signal sequence which comprises a signal peptide encoding sequence, and iii. a terminator, and b. wherein the one or more copies of the expression cassette are integrated into the genome of the microbial host cell.

2. The microbial host cell of claim 1, wherein the microbial host cell comprises two or more copies of the expression cassette.

3. The microbial host cell of statement 1, wherein the microbial host cell comprises at least 3 or more copies, at least 4 or more copies, at least 5 or more copies, at least 6 or more copies, at least 7 or more copies, at least 8 or more copies, at least 9 or more copies, at least 10 or more copies, at least 11 or more copies, at least 12 or more copies, at least 13 or more copies, at least 14 or more copies, at least 15 or more copies, at least 16 or more copies, at least 17 or more copies, at least 18 or more copies, at least 19 or more copies, at least 20 or more copies, at least 30 or more copies, at least 40 or more copies, at least 50 or more copies, at least 60 or more copies or at least 70 or more copies of the expression cassette.

4. The microbial host cell of any of the preceding statements, wherein the signal peptide-encoding sequence is derived from a Komagataella strain, a Saccharomyces strain, a Fusarium strain, or a Trichoderma strain.

5. The microbial host cell of any of statements 1 to 4, wherein the signal peptide-encoding sequence is selected from any of a. the nucleotide sequence of any one of SEQ ID NOs: 99 to 113 or 134, b. a nucleotide sequence with at least 90% identity to any one of SEQ ID NOs: 99 to 113 or 134, c. a nucleotide sequence encoding a signal peptide having the amino acid sequence of any one of SEQ ID NOs: 116 to 130 or 135, or d. a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to any one of SEQ ID NOs: 116 to 130 or 135.

6. The microbial host cell of any of statements 1 to 4, wherein the signal peptide-encoding sequence is selected from any of a. the nucleotide sequence provided in SEQ ID NO: 99, b. a nucleotide sequence with at least 90% identity to SEQ ID NO: 99, c. a nucleotide sequence encoding a signal peptide having the amino acid sequence provided in SEQ ID NO: 116, or d. a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 116.

7. The microbial host cell of any of statements 1 to 4, wherein the signal peptide-encoding sequence is selected from any of a. the nucleotide sequence provided in SEQ ID NO: 100, b. a nucleotide sequence with at least 90% identity to SEQ ID NO: 100, c. a nucleotide sequence encoding a signal peptide having the amino acid sequence provided in SEQ ID NO: 117, or d. a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 117.

8. The microbial host cell of any of statements 1 to 4, wherein the signal peptide-encoding sequence is selected from any of a. the nucleotide sequence provided in SEQ ID NO: 101, b. a nucleotide sequence with at least 90% identity to SEQ ID NO: 101, c. a nucleotide sequence encoding a signal peptide having the amino acid sequence provided in SEQ ID NO: 118, or d. a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 118.

9. The microbial host cell of any of statements 1 to 4, wherein the signal peptide-encoding sequence is selected from any of a. the nucleotide sequence provided in SEQ ID NO: 102, b. a nucleotide sequence with at least 90% identity to SEQ ID NO: 102, c. a nucleotide sequence encoding a signal peptide having the amino acid sequence provided in SEQ ID NO: 119, or d. a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 119.

10. The microbial host cell of any of statements 1 to 4, wherein the signal peptide-encoding sequence is selected from any of a. the nucleotide sequence provided in SEQ ID NO: 103, b. a nucleotide sequence with at least 90% identity to SEQ ID NO: 103, c. a nucleotide sequence encoding a signal peptide having the amino acid sequence provided in SEQ ID NO: 120, or d. a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 120.

11. The microbial host cell of any of statements 1 to 4, wherein the signal peptide-encoding sequence is selected from any of a. the nucleotide sequence provided in SEQ ID NO: 104, b. a nucleotide sequence with at least 90% identity to SEQ ID NO: 104, c. a nucleotide sequence encoding a signal peptide having the amino acid sequence provided in SEQ ID NO: 121, or d. a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 121.

12. The microbial host cell of any of statements 1 to 4, wherein the signal peptide-encoding sequence is selected from any of a. the nucleotide sequence provided in SEQ ID NO: 105, b. a nucleotide sequence with at least 90% identity to SEQ ID NO: 105, c. a nucleotide sequence encoding a signal peptide having the amino acid sequence provided in SEQ ID NO: 122, or d. a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 122.

13. The microbial host cell of any of statements 1 to 4, wherein the signal peptide-encoding sequence is selected from any of a. the nucleotide sequence provided in SEQ ID NO: 106, b. a nucleotide sequence with at least 90% identity to SEQ ID NO: 106, c. a nucleotide sequence encoding a signal peptide having the amino acid sequence provided in SEQ ID NO: 123, or d. a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 123.

14. The microbial host cell of any of statements 1 to 4, wherein the signal peptide-encoding sequence is selected from any of a. the nucleotide sequence provided in SEQ ID NO: 107, b. a nucleotide sequence with at least 90% identity to SEQ ID NO: 107, c. a nucleotide sequence encoding a signal peptide having the amino acid sequence provided in SEQ ID NO: 124, or d. a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 124.

15. The microbial host cell of any of statements 1 to 4, wherein the signal peptide-encoding sequence is selected from any of a. the nucleotide sequence provided in SEQ ID NO: 108, b. a nucleotide sequence with at least 90% identity to SEQ ID NO: 108, c. a nucleotide sequence encoding a signal peptide having the amino acid sequence provided in SEQ ID NO: 125, or d. a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 125.

16. The microbial host cell of any of statements 1 to 4, wherein the signal peptide-encoding sequence is selected from any of a. the nucleotide sequence provided in SEQ ID NO: 109, b. a nucleotide sequence with at least 90% identity to SEQ ID NO: 109, c. a nucleotide sequence encoding a signal peptide having the amino acid sequence provided in SEQ ID NO: 126, or d. a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 126.

17. The microbial host cell of any of statements 1 to 4, wherein the signal peptide-encoding sequence is selected from any of a. the nucleotide sequence provided in SEQ ID NO: 110, b. a nucleotide sequence with at least 90% identity to SEQ ID NO: 110, c. a nucleotide sequence encoding a signal peptide having the amino acid sequence provided in SEQ ID NO: 127, or d. a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 127.

18. The microbial host cell of any of statements 1 to 4, wherein the signal peptide-encoding sequence is selected from any of a. the nucleotide sequence provided in SEQ ID NO: 111, b. a nucleotide sequence with at least 90% identity to SEQ ID NO: 111, c. a nucleotide sequence encoding a signal peptide having the amino acid sequence provided in SEQ ID NO: 128, or d. a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 128.

19. The microbial host cell of any of statements 1 to 4, wherein the signal peptide-encoding sequence is selected from any of a. the nucleotide sequence provided in SEQ ID NO: 112, b. a nucleotide sequence with at least 90% identity to SEQ ID NO: 112, c. a nucleotide sequence encoding a signal peptide having the amino acid sequence provided in SEQ ID NO: 129, or d. a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 129.

20. The microbial host cell of any of statements 1 to 4, wherein the signal peptide-encoding sequence is selected from any of a. the nucleotide sequence provided in SEQ ID NO: 113, b. a nucleotide sequence with at least 90% identity to SEQ ID NO: 113, c. a nucleotide sequence encoding a signal peptide having the amino acid sequence provided in SEQ ID NO: 130, or d. a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 130.

21. The microbial host cell of any of statements 1 to 4, wherein the signal peptide-encoding sequence is selected from any of a. the nucleotide sequence provided in SEQ ID NO: 134, b. a nucleotide sequence with at least 90% identity to SEQ ID NO: 134, c. a nucleotide sequence encoding a signal peptide having the amino acid sequence provided in SEQ ID NO: 135, or d. a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 135.

22. The microbial host cell of any of the preceding statements, wherein the gene of interest encodes an antibody, an antibody fragment or a VHH.

23. The microbial host cell of any of statements 1 to 20, wherein the gene of interest encodes a VHH.

24. The microbial host cell of any of statements 1 to 20, wherein the gene of interest encodes a toxin such as a Bacillus thuringiensis (Bt) toxin, a crystal (Cry) toxin, a cytolytic (Cyt) toxin, a vegetative insecticidal protein (Vip), a secreted insecticidal protein (Sip), a Bin-like toxin or a spider toxin such as an agatoxin or a diguetoxin.

25. The microbial host cell of any of statements 1 to 20, wherein the gene of interest encodes an antimicrobial peptide.

26. The microbial host cell of any of the preceding statements, wherein the secretion signal sequence further comprises a pro-sequence.

27. The microbial host cell of statement 26, wherein the pro-sequence is the Saccharomyces a-mating factor pro-sequence.

28. The microbial host cell of any of statements 1 to 26, wherein the secretion signal sequence is fused to the gene of interest without the presence of a pro-sequence between the signal peptide encoding sequence and the gene of interest.

29. The microbial host cell of any of the preceding statements, wherein the microbial host cell is a yeast cell.

30. The microbial host cell of statement 29, wherein the yeast is Komagataella phaffii.

31. Use of the microbial host cell of any of the preceding statements for manufacturing a protein, wherein the protein is encoded by the gene of interest.

32. A nucleic acid comprising a signal peptide-encoding sequence, and wherein the signal peptide-encoding sequence is a. the nucleotide sequence of SEQ ID NO: 101, b. a nucleotide sequence with at least 90% identity to SEQ ID NO: 101, c. a nucleotide sequence encoding a signal peptide having the amino acid sequence of SEQ ID NO: 118, or d. a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 118.

33. A nucleic acid comprising a signal peptide-encoding sequence, and wherein the signal peptide-encoding sequence is a. the nucleotide sequence of SEQ ID NO: 102, b. a nucleotide sequence with at least 90% identity to SEQ ID NO: 102, c. a nucleotide sequence encoding a signal peptide having the amino acid sequence of SEQ ID NO: 119, or d. a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 119.

34. A nucleic acid comprising a signal peptide-encoding sequence, and wherein the signal peptide-encoding sequence is a. the nucleotide sequence of SEQ ID NO: 103, b. a nucleotide sequence with at least 90% identity to SEQ ID NO: 103, c. a nucleotide sequence encoding a signal peptide having the amino acid sequence of SEQ ID NO: 120, or d. a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 120.

35. A nucleic acid comprising a signal peptide-encoding sequence, and wherein the signal peptide-encoding sequence is a. the nucleotide sequence of SEQ ID NO: 106, b. a nucleotide sequence with at least 90% identity to SEQ ID NO: 106, c. a nucleotide sequence encoding a signal peptide having the amino acid sequence of SEQ ID NO: 123, or d. a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 123.

36. The nucleic acid of any of the statements 32 to 35 where the nucleic acid further comprises a pro-sequence.

37. The nucleic acid of statement 36, where the pro-sequence is a Saccharomyces a-mating factor pro-sequence.

38. The nucleic acid of any one of statements 32 to 37, wherein the nucleic acid is isolated and/or recombinant.

39. A peptide encoded by the nucleic acid of any one of statements 32 to 38.

40. An expression cassette comprising the nucleic acid of any of statements 32 to 38 and a promoter operably linked to the nucleic acid.

41. The microbial host cell of any of statements 1 to 30 or the expression cassette of statement 40, where the promoter is selected from the group consisting of CAT1, AOX1, GAP, AOD, AOX2, ADH1, CAM1, DAK1, DAS1, DAS2, ENO1, FDH1, FLD1, FMD, GPM1, GPM2, HSP82, ICL1, ILV5, KAR2, KEX2, MOX, OLE1, PET9, PEX5, PEX8, PMP20, PGK1, PHO89/NSP, SSA4, SUT2, TEF1, THI11, TPI1, YPT1, GTH1, GCW14, and GUT1.

42. The expression cassette of any one of statements 40 to 41, wherein the expression cassette further comprises a gene of interest, wherein the signal peptide-encoding sequence is fused to the gene of interest.

43. The expression cassette of statement 42, wherein the gene of interest encodes an antibody, an antibody fragment or a VHH.

44. The expression cassette of statement 42, wherein the gene of interest encodes a VHH.

45. The expression cassette of statement 42, wherein the gene of interest encodes a toxin such as a Bacillus thuringiensis (Bt) toxin, a crystal (Cry) toxin, a cytolytic (Cyt) toxin, a vegetative insecticidal protein (Vip), a secreted insecticidal protein (Sip), a Bin-like toxin or a spider toxin such as an agatoxin or a diguetoxin.

46. The expression cassette of statement 42, wherein the gene of interest encodes an antimicrobial peptide.

47. The expression cassette of any one of statements 40 to 46, wherein the expression cassette is isolated and/or recombinant.

48. A vector comprising the nucleic acid of any one of statements 32 to 38 or the expression cassette of any one of statements 40 to 47.

49. A method for producing a protein, the method comprising a. culturing the microbial host cell of any one of statements 1 to 30, or a microbial host cell comprising the expression cassette of any one of statements 40 to 47 or the vector of statement 48, under conditions to express the gene of interest, wherein the gene of interest encodes the protein, b. optionally isolating the protein, c. optionally purifying the protein, d. optionally modifying the protein, and e. optionally formulating the protein.

50. A protein produced by the method of statement 49.

51. A peptide comprising the amino acid sequence provided in SEQ ID NO: 118, or an amino acid sequence with at least 90% identity to SEQ ID NO: 118.

52. A peptide comprising the amino acid sequence provided in SEQ ID NO: 119, or an amino acid sequence with at least 90% identity to SEQ ID NO: 119.

53. A peptide comprising the amino acid sequence provided in SEQ ID NO: 120, or an amino acid sequence with at least 90% identity to SEQ ID NO: 120.

54. A peptide comprising the amino acid sequence provided in SEQ ID NO: 123, or an amino acid sequence with at least 90% identity to SEQ ID NO: 123.

55. The peptide of any one of statements 51 to 54, further comprising a pro-peptide.

56. The peptide of statement 55, wherein the pro-peptide is the Saccharomyces a-mating factor pro-peptide.

57. A microbial host cell comprising the nucleic acid of any one of statements 32 to 38, the expression cassette of any one of statements 40 to 47, the vector of statement 48 or the peptide of any one of statements 50 to 54.

58. A protein comprising the peptide of any one of statements 50 to 54.

59. Use of the nucleic acid of any one of statements 32 to 38 as or in a secretion signal sequence.

60. Use of the peptide of any one of statements 51 to 54 as or in a secretion signal.

61. A precursor protein comprising a secretion signal fused to a protein of interest, where the secretion signal comprises a signal peptide having the amino acid sequence of any one of SEQ ID NOs: 116 to 130 or 135, or a signal peptide having an amino acid sequence with at least 90% identity to any one of SEQ ID NOs: 116 to 130 or 135.

62. The precursor protein according to statement 61, where the secretion signal further comprises a pro-sequence.

63. The precursor protein of statement 61 or 61, where the pro-sequence is a Saccharomyces a-mating factor pro-sequence.

64. The precursor protein of any one of statements 61 to 63, where the protein of interest is an antibody, an antibody fragment or a VHH.

65. The precursor protein of any one of statements 61 to 64, wherein the protein of interest is a VHH.

66. The precursor protein of any one of statements 61 to 63, where the VHH is a VHH selected from an amino acid sequence according to any one of SEQ ID NOs: 1, 2, 6, 10 or 14 to 98.

67. The precursor protein of any one of statements 61 to 63, wherein the protein of interest is a toxin such as a Bacillus thuringiensis (Bt) toxin, a crystal (Cry) toxin, a cytolytic (Cyt) toxin, a vegetative insecticidal protein (Vip), a secreted insecticidal protein (Sip), a Bin-like toxin or a spider toxin such as an agatoxin or a diguetoxin.

68. A precursor protein comprising or consisting of (1) a secretion signal consisting of (a) a signal peptide having the amino acid sequence of SEQ ID NO: 116 and (b) a Saccharomyces a-mating factor pro-sequence, and (2) a VHH; where the secretion signal is fused to the N-terminus of the VHH.

69. A precursor protein comprising or consisting of (1) a secretion signal consisting of (a) a signal peptide having the amino acid sequence of SEQ ID NO: 117 and (b) a Saccharomyces a-mating factor pro-sequence, and (2) a VHH; where the secretion signal is fused to the N-terminus of the VHH.

70. A precursor protein comprising or consisting of (1) a secretion signal consisting of (a) a signal peptide having the amino acid sequence of SEQ ID NO: 118 and (b) a Saccharomyces a-mating factor pro-sequence, and (2) a VHH; where the secretion signal is fused to the N-terminus of the VHH.

71. A precursor protein comprising or consisting of (1) a secretion signal consisting of (a) a signal peptide having the amino acid sequence of SEQ ID NO: 119 and (b) a Saccharomyces a-mating factor pro-sequence, and (2) a VHH; where the secretion signal is fused to the N-terminus of the VHH.

72. A precursor protein comprising or consisting of (1) a secretion signal consisting of (a) a signal peptide having the amino acid sequence of SEQ ID NO: 120 and (b) a Saccharomyces a-mating factor pro-sequence, and (2) a VHH; where the secretion signal is fused to the N-terminus of the VHH.

73. A precursor protein comprising or consisting of (1) a secretion signal consisting of (a) a signal peptide having the amino acid sequence of SEQ ID NO: 121 and (b) a Saccharomyces a-mating factor pro-sequence, and (2) a VHH; where the secretion signal is fused to the N-terminus of the VHH.

74. A precursor protein comprising or consisting of (1) a secretion signal consisting of (a) a signal peptide having the amino acid sequence of SEQ ID NO: 122 and (b) a Saccharomyces a-mating factor pro-sequence, and (2) a VHH; where the secretion signal is fused to the N-terminus of the VHH.

75. A precursor protein comprising or consisting of (1) a secretion signal consisting of (a) a signal peptide having the amino acid sequence of SEQ ID NO: 123 and (b) a Saccharomyces a-mating factor pro-sequence, and (2) a VHH; where the secretion signal is fused to the N-terminus of the VHH.

76. A precursor protein comprising or consisting of (1) a secretion signal consisting of (a) a signal peptide having the amino acid sequence of SEQ ID NO: 124 and (b) a Saccharomyces a-mating factor pro-sequence, and (2) a VHH; where the secretion signal is fused to the N-terminus of the VHH.

77. A precursor protein consisting of (1) a secretion signal consisting of (a) a signal peptide having the amino acid sequence of SEQ ID NO: 125 and (b) a Saccharomyces a-mating factor pro-sequence, and (2) a VHH; where the secretion signal is fused to the N-terminus of the VHH.

78. A precursor protein comprising or consisting of (1) a secretion signal consisting of (a) a signal peptide having the amino acid sequence of SEQ ID NO: 126 and (b) a Saccharomyces a-mating factor pro-sequence, and (2) a VHH; where the secretion signal is fused to the N-terminus of the VHH.

79. A precursor protein comprising or consisting of (1) a secretion signal consisting of (a) a signal peptide having the amino acid sequence of SEQ ID NO: 127 and (b) a Saccharomyces a-mating factor pro-sequence, and (2) a VHH; where the secretion signal is fused to the N-terminus of the VHH.

80. A precursor protein comprising or consisting of (1) a secretion signal consisting of (a) a signal peptide having the amino acid sequence of SEQ ID NO: 128 and (b) a Saccharomyces a-mating factor pro-sequence, and (2) a VHH; where the secretion signal is fused to the N-terminus of the VHH.

81. A precursor protein comprising or consisting of (1) a secretion signal consisting of (a) a signal peptide having the amino acid sequence of SEQ ID NO: 129 and (b) a Saccharomyces a-mating factor pro-sequence, and (2) a VHH; where the secretion signal is fused to the N-terminus of the VHH.

82. A precursor protein comprising or consisting of (1) a secretion signal consisting of (a) a signal peptide having the amino acid sequence of SEQ ID NO: 130 and (b) a Saccharomyces a-mating factor pro-sequence, and (2) a VHH; where the secretion signal is fused to the N-terminus of the VHH.

83. A precursor protein comprising or consisting of (1) a secretion signal consisting of (a) a signal peptide having the amino acid sequence of SEQ ID NO: 135 and (b) a Saccharomyces a-mating factor pro-sequence, and (2) a VHH; where the secretion signal is fused to the N-terminus of the VHH.

84. The precursor protein of any one of statements 68 to 83, wherein the Saccharomyces a-mating factor pro-sequence is fused to the C-terminus of the signal peptide.

85. The precursor protein of any one of statements 68 to 84, wherein the Saccharomyces a-mating factor pro-sequence comprises the amino acid sequence of SEQ ID NO: 115.

86. A nucleic acid encoding the precursor protein of any one of statements 61 to 85.

87. An expression cassette comprising the nucleic acid of statement 86.

88. A vector comprising the nucleic acid of statement 86 or the expression cassette of statement 87.

89. A microbial host cell comprising the precursor protein of any one of statements 61 to 85, comprising the nucleic acid of statement 86, the expression cassette of statement 87 or the vector of statement 88.

90. A microbial host cell comprising: a. one or more copies of an expression cassette comprising i. a promoter capable of promoting expression of a nucleotide sequence encoding a precursor protein according to any one of statements 61 to 85, and ii. a nucleotide sequence encoding a precursor protein according to any one of statements 61 to 85, and iii. a terminator, and b. wherein the one or more copies of the expression cassette are integrated into the genome of the microbial host cell.

91. The microbial host cell of statement 90, wherein the microbial host cell comprises two or more copies of the expression cassette.

92. The microbial host cell of statement 84, wherein the microbial host cell comprises at least 3 or more copies, at least 4 or more copies, at least 5 or more copies, at least 6 or more copies, at least 7 or more copies, at least 8 or more copies, at least 9 or more copies, at least 10 or more copies, at least 11 or more copies, at least 12 or more copies, at least 13 or more copies, at least 14 or more copies, at least 15 or more copies, at least 16 or more copies, at least 17 or more copies, at least 18 or more copies, at least 19 or more copies, at least 20 or more copies, at least 30 or more copies, at least 40 or more copies, at least 50 or more copies, at least 60 or more copies or at least 70 or more copies of the expression cassette.

93. A nucleic acid comprising a gene of interest and a secretion signal sequence fused to the 5’ end of the gene of interest, wherein i. the secretion signal sequence comprises a signal peptide encoding sequence and a pro-sequence, ii. the signal peptide encoding sequence encodes a signal peptide having the amino acid sequence provided in any one of SEQ ID NOs: 116 to 130 or 135, or an amino acid sequence which is at least 90% identical to an amino acid sequence provided in any one of SEQ ID NOs: 116 to 130 or 135, iii. the pro-sequence encodes a pro-peptide having the amino acid sequence provided in SEQ ID NO: 115, or an amino acid sequence which is at least 90% identical to the amino acid sequence provided in SEQ ID NO: 115, and iv. the pro-sequence is fused to the 3’ end of the signal peptide encoding sequence.

94. The nucleic acid of statement 93, wherein the signal peptide encoding sequence encodes a signal peptide having the amino acid sequence provided in any one of SEQ ID NOs: 116, 121, 125 or 135, or an amino acid sequence which is at least 90% identical to an amino acid sequence provided in any one of SEQ ID NOs: 116, 121, 125 or 135.

95. The nucleic acid of statement 93, wherein the signal peptide encoding sequence encodes a signal peptide having the amino acid sequence provided in SEQ ID NO: 116, or an amino acid sequence which is at least 90% identical to the amino acid sequence provided in SEQ ID NO: 116.

96. The nucleic acid of statement 93, wherein the signal peptide encoding sequence encodes a signal peptide having the amino acid sequence provided in SEQ ID NO: 121, or an amino acid sequence which is at least 90% identical to the amino acid sequence provided in SEQ ID NO: 121.

97. The nucleic acid of statement 93, wherein the signal peptide encoding sequence encodes a signal peptide having the amino acid sequence provided in SEQ ID NO: 125, or an amino acid sequence which is at least 90% identical to the amino acid sequence provided in SEQ ID NO: 125.

98. The nucleic acid of statement 93, wherein the signal peptide encoding sequence encodes a signal peptide having the amino acid sequence provided in SEQ ID NO: 135, or an amino acid sequence which is at least 90% identical to the amino acid sequence provided in SEQ ID NO: 135.

99. The nucleic acid of any one of statements 93 to 98, wherein the gene of interest encodes a VHH.

100. An expression cassette comprising the nucleic acid of any one of statements 93 to 99.

101. A vector comprising the nucleic acid of any one of statements 93 to 99 or the expression cassette of statement 100.

102. A microbial host cell comprising the nucleic acid of any one of statements 93 to 99, the expression cassette of statement 100 or the vector of statement 101.

103. The microbial host cell of statement 102, wherein two or more copies of the expression cassette of statement 100 are integrated into the genome of the microbial host cell.

104. The microbial host cell of statement 102, wherein 3 or more copies, 4 or more copies, 5 or more copies, 6 or more copies, 7 or more copies, 8 or more copies, 9 or more copies, 10 or more copies, 11 or more copies, 12 or more copies, 13 or more copies, 14 or more copies, 15 or more copies, 16 or more copies, 17 or more copies, 18 or more copies, 19 or more copies, 20 or more copies, 30 or more copies, 40 or more copies, 50 or more copies, 60 or more copies or 70 or more copies of the expression cassette of statement 100 are integrated into the genome of the microbial host cell.

105. A protein encoded by the nucleic acid of any one of statements 93 to 99.

106. The protein of any one of statements 61 to 85 or 105, the nucleic acid of any one of statements 93 to 99, the vector of statement 88 or 101, or the microbial host cell of any one of statements 89 to 92 or 102 to 104, which is isolated and/or recombinant.
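By way of illustration only, the recurring criterion of "at least 90% identity" in the statements above can be sketched in Python as follows. This minimal sketch assumes that the two sequences have already been aligned to equal length, with "-" marking gaps; it is not the identity definition or alignment algorithm of the present disclosure, which governs. The function names and the example sequences are hypothetical and do not correspond to any SEQ ID NO.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned, equal-length sequences.

    Columns in which both sequences carry a gap ('-') are ignored;
    a gap opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns where at least one sequence has a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned positions to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a, aligned_b, threshold=90.0):
    """Return True if the percent identity is at least `threshold`."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # HYPOTHETICAL sequences, invented for illustration only.
    reference = "MKLSTAAVLL"
    variant = "MKLSTAAVLI"
    print(percent_identity(reference, variant))
    print(meets_identity_threshold(reference, variant))
```

In practice, the alignment step (for example, a global pairwise alignment) determines which positions are compared, so the value obtained depends on the alignment method and parameters used.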

Claims

• wherein the gene of interest is fused to a secretion signal sequence which comprises a signal peptide encoding sequence, and

2. The microbial host cell of claim 1, wherein the microbial host cell comprises two or more copies of the expression cassette, optionally wherein the microbial host cell comprises at least 3 or more copies, at least 4 or more copies, at least 5 or more copies, at least 6 or more copies, at least 7 or more copies, at least 8 or more copies, at least 9 or more copies, at least 10 or more copies, at least 11 or more copies, at least 12 or more copies, at least 13 or more copies, at least 14 or more copies, at least 15 or more copies, at least 16 or more copies, at least 17 or more copies, at least 18 or more copies, at least 19 or more copies, at least 20 or more copies, at least 30 or more copies, at least 40 or more copies, at least 50 or more copies, at least 60 or more copies or at least 70 or more copies of the expression cassette.

3. The microbial host cell of claim 1 or claim 2, wherein the signal peptide-encoding sequence is derived from a Komagataella strain, a Saccharomyces strain, a Fusarium strain, or a Trichoderma strain.

4. The microbial host cell of any preceding claim, wherein the signal peptide-encoding sequence is selected from any of a. the nucleotide sequence of any one of SEQ ID NOs: 99 to 1 13 or 134, b. a nucleotide sequence with at least 90% identity to any one of SEQ ID NOs: 99 to 1 13 or 134, c. a nucleotide sequence encoding a signal peptide having the amino acid sequence of any one of SEQ ID NOs: 1 16 to 130 or 135, or d. a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to any one of SEQ ID NOs: 1 16 to 130 or 135.

5. The microbial host cell of any of claims 1 to 4, wherein the signal peptide-encoding sequence is selected from any of a. the nucleotide sequence of any one of SEQ ID NOs: 99, 104, 108 and 134, b. a nucleotide sequence with at least 90% identity to any one of SEQ ID NOs: 99, 104, 108 and 134, c. a nucleotide sequence encoding a signal peptide having the amino acid sequence of any one of SEQ ID NOs: 1 16, 121 , 125 or 135, or d. a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to any one of SEQ ID NOs: 1 16, 121 , 125 or 135.

6. The microbial host cell of any of claims 1 to 4, wherein the signal peptide-encoding sequence is selected from any of a. the nucleotide sequence provided in SEQ ID NO: 99 or 108, b. a nucleotide sequence with at least 90% identity to SEQ ID NO: 99 or 108, c. a nucleotide sequence encoding a signal peptide having the amino acid sequence provided in SEQ ID NO: 1 16 or 125, or d. a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 1 16 or 125.

7. The microbial host cell of any of claims 1 to 4, wherein the signal peptide-encoding sequence is selected from any of a. the nucleotide sequence provided in SEQ ID NO: 99, b. a nucleotide sequence with at least 90% identity to SEQ ID NO: 99, c. a nucleotide sequence encoding a signal peptide having the amino acid sequence provided in SEQ ID NO: 116, or d. a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 116.

8. The microbial host cell of any of claims 1 to 4, wherein the signal peptide-encoding sequence is selected from any of a. the nucleotide sequence provided in SEQ ID NO: 108, b. a nucleotide sequence with at least 90% identity to SEQ ID NO: 108, c. a nucleotide sequence encoding a signal peptide having the amino acid sequence provided in SEQ ID NO: 125, or d. a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to SEQ ID NO: 125.

9. The microbial host cell of any of the preceding claims, wherein the gene of interest encodes a. an antibody, an antibody fragment or a VHH; or b. a toxin such as a Bacillus thuringiensis (Bt) toxin, a crystal (Cry) toxin, a cytolytic (Cyt) toxin, a vegetative insecticidal protein (Vip), a secreted insecticidal protein (Sip), a Bin-like toxin or a spider toxin such as an agatoxin or a diguetoxin; or c. an antimicrobial peptide.

10. The microbial host cell of any of the preceding claims, wherein the gene of interest encodes a VHH.

11. The microbial host cell of any of the preceding claims, wherein the secretion signal sequence further comprises a pro-sequence, optionally wherein the pro-sequence is the Saccharomyces α-mating factor pro-sequence.

12. The microbial host cell of any of claims 1 to 10, wherein the secretion signal sequence is fused to the gene of interest without the presence of a pro-sequence between the signal peptide-encoding sequence and the gene of interest.

13. The microbial host cell of any of the preceding claims, wherein the microbial host cell is a yeast cell, optionally wherein the yeast is Komagataella phaffii.

14. A nucleic acid comprising a signal peptide-encoding sequence, and wherein the signal peptide-encoding sequence is a. the nucleotide sequence of any one of SEQ ID NOs: 101 to 103 or 106, b. a nucleotide sequence with at least 90% identity to any one of SEQ ID NOs: 101 to 103 or 106, c. a nucleotide sequence encoding a signal peptide having the amino acid sequence of any one of SEQ ID NOs: 118 to 120 or 123, or d. a nucleotide sequence encoding a signal peptide having an amino acid sequence with at least 90% identity to any one of SEQ ID NOs: 118 to 120 or 123.

15. A method for producing a protein, the method comprising a. culturing the microbial host cell of any one of claims 1 to 13, wherein the gene of interest encodes the protein, b. optionally isolating the protein, c. optionally purifying the protein, d. optionally modifying the protein, and e. optionally formulating the protein.

16. A peptide comprising the amino acid sequence provided in any one of SEQ ID NOs: 118 to 120 or 123, or an amino acid sequence with at least 90% identity to any one of SEQ ID NOs: 118 to 120 or 123.
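
Several of the claims above recite a sequence "with at least 90% identity" to a reference sequence. The claims themselves do not say how identity is calculated, so the following is only an illustrative sketch under stated assumptions: the two sequences are already aligned to equal length (with "-" marking gaps), and identity is taken as the number of matching positions divided by the number of aligned columns. The function names, the 90% default and the example peptides are hypothetical and are not taken from the application or its sequence listing.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences ('-' marks a gap).

    Assumption: identity = matching columns / aligned columns, where columns
    that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    # Drop columns in which both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned positions to compare")
    # A column counts as a match only when both residues are identical non-gaps.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 90.0) -> bool:
    """True when the percent identity is at least `threshold`."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical 20-residue peptides, NOT sequences from the application.
reference = "MKLSTILFTACATLAAALPS"
variant = "MKLSTILFTACATLAAAVPS"  # differs from the reference at one position

if __name__ == "__main__":
    print(percent_identity(reference, variant))
    print(meets_threshold(reference, variant))
```

With a 20-residue reference, up to two substitutions would still satisfy a 90% threshold under this definition, and a third would not. A different alignment method or gap treatment can give a different value for the same pair of sequences.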