WO2026017381A1 - Optimized bacillus host cells - Google Patents

Optimized bacillus host cells

Publication number
WO2026017381A1
WO2026017381A1 PCT/EP2025/068095 EP2025068095W WO2026017381A1 WO 2026017381 A1 WO2026017381 A1 WO 2026017381A1 EP 2025068095 W EP2025068095 W EP 2025068095W WO 2026017381 A1 WO2026017381 A1 WO 2026017381A1
Authority
WO
WIPO (PCT)
Prior art keywords
polypeptide
seq
polynucleotide
protease
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/EP2025/068095
Other languages
French (fr)
Inventor
Thomas Krogh KALLEHAUGE
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Novozymes AS
Original Assignee
Novozymes AS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Novozymes AS
Publication of WO2026017381A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N1/00 Microorganisms, e.g. protozoa; Compositions thereof; Processes of propagating, maintaining or preserving microorganisms or compositions thereof; Processes of preparing or isolating a composition containing a microorganism; Culture media therefor
    • C12N1/20 Bacteria; Culture media therefor
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61P SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P3/00 Drugs for disorders of the metabolism
    • A61P3/04 Anorexiants; Antiobesity agents
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61P SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P3/00 Drugs for disorders of the metabolism
    • A61P3/08 Drugs for disorders of the metabolism for glucose homeostasis
    • A61P3/10 Drugs for disorders of the metabolism for glucose homeostasis for hyperglycaemia, e.g. antidiabetics
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/195 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C07K14/575 Hormones
    • C07K14/605 Glucagons
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/48 Hydrolases (3) acting on peptide bonds (3.4)
    • C12N9/50 Proteinases, e.g. Endopeptidases (3.4.21-3.4.25)
    • C12N9/52 Proteinases, e.g. Endopeptidases (3.4.21-3.4.25) derived from bacteria or Archaea
    • C12N9/54 Proteinases, e.g. Endopeptidases (3.4.21-3.4.25) derived from bacteria or Archaea bacteria being Bacillus
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y304/00 Hydrolases acting on peptide bonds, i.e. peptidases (3.4)
    • C12Y304/21 Serine endopeptidases (3.4.21)
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K38/00 Medicinal preparations containing peptides
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/01 Fusion polypeptide containing a localisation/targetting motif
    • C07K2319/02 Fusion polypeptide containing a localisation/targetting motif containing a signal sequence
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12R INDEXING SCHEME ASSOCIATED WITH SUBCLASSES C12C - C12Q, RELATING TO MICROORGANISMS
    • C12R2001/00 Microorganisms; Processes using microorganisms
    • C12R2001/01 Bacteria or Actinomycetales; using bacteria or Actinomycetales
    • C12R2001/07 Bacillus
    • C12R2001/10 Bacillus licheniformis

Landscapes

  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Zoology (AREA)
  • Biochemistry (AREA)
  • Wood Science & Technology (AREA)
  • Diabetes (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Microbiology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biomedical Technology (AREA)
  • Veterinary Medicine (AREA)
  • Endocrinology (AREA)
  • Hematology (AREA)
  • Obesity (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • General Chemical & Material Sciences (AREA)
  • Gastroenterology & Hepatology (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Animal Behavior & Ethology (AREA)
  • Public Health (AREA)
  • Emergency Medicine (AREA)
  • Child & Adolescent Psychology (AREA)
  • Toxicology (AREA)
  • Tropical Medicine & Parasitology (AREA)
  • Virology (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)

Abstract

The present invention relates to nucleic acid constructs comprising a first polynucleotide encoding a signal peptide, and a second polynucleotide encoding a glucagon-like peptide-1 (GLP-1) receptor agonist; expression vectors and host cells comprising said nucleic acid constructs; methods for producing a GLP-1 receptor agonist; and fusion proteins comprising the signal peptide and a GLP-1 receptor agonist. The present invention also relates to Bacillus cells with reduced protease activities and to methods of producing recombinant proteins in said Bacillus cells.

Description

OPTIMIZED BACILLUS HOST CELLS
Reference to a Sequence Listing
This application contains a Sequence Listing in computer readable form, which is incorporated herein by reference.
Background of the Invention
Field of the Invention
The present invention relates to nucleic acid constructs comprising a first polynucleotide encoding a signal peptide, and a second polynucleotide encoding a glucagon-like peptide-1 (GLP-1) receptor agonist; expression vectors and host cells comprising said nucleic acid constructs; methods for producing a GLP-1 receptor agonist; and fusion proteins comprising the signal peptide and a GLP-1 receptor agonist. The present invention also relates to Bacillus cells with reduced protease activities and to methods of producing recombinant proteins in said Bacillus cells.
Description of the Related Art
Product development in pharmaceutical and industrial biotechnology involves a continuous challenge to increase recombinant protein yields at large scale to reduce costs. Two major approaches have been used for this purpose in the last decades. The first one is based on classical mutagenesis and screening. Here, the specific genetic modification is not predefined, and the main requirement is a screening assay that is sensitive enough to detect increments in yield. High-throughput screening enables large numbers of mutants to be screened in search of the desired phenotype, i.e., higher recombinant protein yields. The second approach includes numerous strategies ranging from the use of stronger promoters and multi-copy strains to ensure high expression of the gene of interest to the use of codon-optimized gene sequences to aid translation. However, high-level production of a given protein may in turn trigger several bottlenecks in the cellular machinery for secretion of the enzyme of interest into the medium, emphasizing the need for further optimization strategies.
Signal peptides (SPs) are short amino acid sequences present in the amino terminus of many newly synthesized polypeptides that target these into or across cellular membranes, thereby aiding maturation and secretion. The amino acid sequence of the SP influences secretion efficiency and thereby the yield of the polypeptide manufacturing process. Bioinformatic tools such as SignalP and SignalP5 can predict SPs from amino acid sequences, but most cannot distinguish between various types of SPs (Armenteros et al., Nat. Biotechnol. 37: 420-423, 2019). Moreover, a large degree of redundancy in the amino acid sequence of SPs makes it difficult to predict the efficiency of any given SP for production of recombinant proteins at industrial scale. Hence, SP selection is an important step for manufacturing of recombinant proteins, but the optimal combination of signal peptide and mature protein is very context dependent and highly difficult to predict.
Thus, there is a need for identifying signal peptides which are related to increased protein yield.
Even if protein expression bottlenecks are solved, protein stability during and after secretion is often a further challenge as secreted proteins are not sufficiently stabilized in the supernatant during fermentation. Thus, there is a further need to stabilize the protein(s) of interest, in particular proteins of interest which show low stability in Bacillus expression systems.
Summary of the Invention
The present invention provides signal peptides for increased GLP-1 receptor agonist (GLP-1) expression, and Bacillus protease deletion strains facilitating increased expression of polypeptides of interest which are highly sensitive to native proteases of Bacillus, including GLP-1.
Accordingly, in a 1st aspect the present invention relates to nucleic acid constructs comprising a first polynucleotide encoding a signal peptide comprising or consisting of an amino acid sequence having at least 80% sequence identity to the amino acid sequence of SEQ ID NO: 1, 16, or 18; and a second polynucleotide encoding a polypeptide of interest, wherein the polypeptide of interest comprises or consists of a glucagon-like peptide-1 (GLP-1) receptor agonist, and wherein the first polynucleotide and the second polynucleotide are operably linked in translational fusion.
As shown in the Examples and Figs. 1 and 2, the identified signal peptides resulted in a significant increase in GLP-1 expression in Bacillus.
In a 2nd aspect, the invention relates to expression vectors comprising a nucleic acid construct according to the 1st aspect.
In a 3rd aspect, the invention relates to Bacillus cells comprising in their genome one or more nucleic acid constructs according to the 1st aspect; and/or one or more expression vectors according to the 2nd aspect.
In a 4th aspect, the invention relates to mutants of a parent Bacillus strain comprising in its genome a heterologous promoter operably linked to a polynucleotide encoding a polypeptide of interest; and comprising in its genome one or more protease genes selected from the group consisting of: alkaline protease (apr), glu-specific protease (mpr), bacillopeptidase F (bprAB), a first minor extracellular serine protease (epr), a second minor extracellular serine protease (vpr), a cell-wall associated protease (wprA), and an intracellular serine protease (ispA), wherein at least six of the one or more protease genes are modified rendering at least six proteases truncated, partly or fully inactivated, present at reduced level or eliminated compared to the parent Bacillus strain when cultivated under identical conditions.
As shown in the Examples and Fig. 3, these protease modifications are required to express and stabilize protease-sensitive proteins of interest (e.g., GLP-1) in Bacillus host cells.
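The counting criterion of the 4th aspect (at least six of the seven listed protease genes modified) can be illustrated with a minimal Python sketch. The gene names are taken from the 4th aspect as written; the function names and the example gene lists are hypothetical and are not strains described in this application.

```python
# Protease genes listed in the 4th aspect.
LISTED_PROTEASE_GENES = ("apr", "mpr", "bprAB", "epr", "vpr", "wprA", "ispA")


def count_modified_listed_genes(modified_genes):
    """Count how many of the listed protease genes appear among the modified genes."""
    return len(set(LISTED_PROTEASE_GENES) & set(modified_genes))


def satisfies_six_gene_criterion(modified_genes, minimum=6):
    """Return True if at least `minimum` of the listed protease genes are modified."""
    return count_modified_listed_genes(modified_genes) >= minimum


# Hypothetical examples, for illustration only.
six_modified = ["apr", "mpr", "bprAB", "epr", "vpr", "wprA"]
two_modified = ["apr", "mpr"]

print(satisfies_six_gene_criterion(six_modified))
print(satisfies_six_gene_criterion(two_modified))
```

Genes outside the listed group are ignored by the set intersection, so only modifications of the seven listed genes count towards the threshold.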
In a 5th aspect, the invention relates to methods of producing a polypeptide of interest, the method comprising cultivating a host cell according to the 3rd aspect and/or 4th aspect under conditions conducive for production of the polypeptide of interest; and optionally recovering the polypeptide of interest.
In a 6th aspect, the invention relates to fusion polypeptides comprising:
a) a signal peptide comprising or consisting of:
(i) an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 1, 16, or 18, and/or
(ii) a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 1, 16, or 18, wherein the three-dimensional structure is calculated by AlphaFold; and
b) a GLP-1 receptor agonist comprising or consisting of:
(i) an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 3, 4, 5, 6, or 7; and/or
(ii) a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 3, 4, 5, 6, or 7, wherein the three-dimensional structure is calculated by AlphaFold.
In a 7th aspect, the invention relates to a composition, cell composition, or fermentation broth comprising the fusion polypeptide of the 6th aspect.
Brief Description of the Figures
Figure 1 shows GLP-1 expression using various signal peptides.
Figure 2 shows increased GLP-1 expression using the 1E6 signal peptide of SEQ ID NO: 1.
Figure 3 shows expression and stability of GLP-1 expressed in d2, d6, and d7 protease deletion strains.
Definitions
In accordance with this detailed description, the following definitions apply. Note that the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise.
Unless defined otherwise or clearly indicated by context, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
Alkaline protease: The term “alkaline protease” refers to a serine-type endopeptidase (E.C. 3.4.21.-), also known as a serine protease, which is an endo-peptide protease and is highly active in a neutral-to-alkaline pH range. In Bacillus, the alkaline protease is encoded by aprL. A non-limiting example for a Bacillus licheniformis aprL gene sequence is shown in SEQ ID NO: 8.
Glu-specific protease: The term “glu-specific protease” means a glutamic-acid-specific endopeptidase, within the group of RP-II proteases (Residual Protease II), including C-component proteases, belonging to the protease family S1B. Bacillus proteases of the RP-II type are serine proteases that in primary structure are similar to chymotrypsin. The glu-specific protease is encoded by mprL. Non-limiting examples of the mprL sequence include SEQ ID NO: 1 of US8563289, and SEQ ID NO: 9 of the present application.
Bacillopeptidase F: The term “bacillopeptidase F” refers to a serine protease (E.C. 3.4.21.-), also known as serine proteinase, esterase, or RP-I protease, with a size of about 90 kDa. In Bacillus, bacillopeptidase F is encoded by bprAB. A non-limiting example for a Bacillus licheniformis bprAB gene sequence is shown in SEQ ID NO: 10.
First minor extracellular serine protease: The term “first minor extracellular serine protease” refers to a secreted serine endopeptidase (E.C. 3.4.21.-) encoded by epr. A non-limiting example for a Bacillus licheniformis epr gene sequence is shown in SEQ ID NO: 11.
Second minor extracellular serine protease: The term “second minor extracellular serine protease” refers to a secreted serine endopeptidase (E.C. 3.4.21.-) encoded by vpr. A non-limiting example for a Bacillus licheniformis vpr gene sequence is shown in SEQ ID NO: 12.
Cell-wall associated protease: The term “cell-wall associated protease” means a serine-type cell-wall associated protease (E.C. 3.4.21.-) encoded by wprA. A non-limiting example for a Bacillus licheniformis wprA gene sequence is shown in SEQ ID NO: 13.
Intracellular serine protease: The term “intracellular serine protease” means a serine protease which is an intracellular member of the subtilisin family of serine proteases. Typically, the intracellular serine protease has a size of around 34 kDa. The intracellular serine protease belongs to the peptidase S8 family and is encoded by ispA. A non-limiting example for a Bacillus licheniformis ispA gene sequence is shown in SEQ ID NO: 14.
Aib: The term “Aib” refers to 2-aminoisobutyric acid (also known as α-aminoisobutyric acid, AIB, α-methylalanine, or 2-methylalanine) and is the non-proteinogenic amino acid with the structural formula H2N-C(CH3)2-COOH. Aib substitution at position 2 of the GLP-1 with SEQ ID NO: 5 enhances enzymatic stability, i.e., by protecting GLP-1 from enzymatic degradation by DPP-4.
AlphaFold structure prediction: AlphaFold 2 is a computational method for calculating the three-dimensional structure of a polypeptide from its amino acid sequence (Jumper et al., 2021, Nature 596: 583-589). Predicted structures for millions of polypeptides deposited in the UniProt database have been deposited in the AlphaFold Protein Structure Database, using the AlphaFold Monomer v2.0 algorithm (Varadi et al., 2021, Nucleic Acids Res. 50(D1): D439-D444). In the AlphaFold Protein Structure Database, the three-dimensional structure of a polypeptide can be obtained by searching for the UniProt accession number of the polypeptide.
In addition to the many three-dimensional structures that are already publicly available, code is available for reproducing and predicting structures of new polypeptides at source code repositories such as Github.com under deepmind/alphafold/, using notebooks/AlphaFold.ipynb, which uses AlphaFold v2.3.1 or newer. Additionally, it can be found in Github.com under sokrypton/ColabFold using v1.5.2 or newer, using AlphaFold2.ipynb. For technical details, please see Jumper et al. (vide supra).
AlphaFold 2 produces a per-residue estimate of its confidence on a scale from 0 to 100. This confidence measure is called pLDDT and corresponds to the model’s predicted score on the lDDT-Cα metric. It is stored in the B-factor fields of the mmCIF and PDB files available for download (although unlike a B-factor, higher pLDDT is better). Regions with a pLDDT score of more than 90 are expected to be modelled to high accuracy. These should be suitable for any application that benefits from high accuracy (e.g., characterization of binding sites). Regions with a pLDDT score between 70 and 90 are expected to be modelled well, corresponding to a generally good backbone prediction.
cDNA: The term "cDNA" means a DNA molecule that can be prepared by reverse transcription from a mature, spliced, mRNA molecule obtained from a eukaryotic or prokaryotic cell. cDNA lacks intron sequences that may be present in the corresponding genomic DNA. The initial, primary RNA transcript is a precursor to mRNA that is processed through a series of steps, including splicing, before appearing as mature spliced mRNA.
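The pLDDT convention described above (per-residue confidence stored in the B-factor field) can be illustrated with a minimal Python sketch that reads pLDDT values from the C-alpha ATOM records of a PDB-format file using the standard fixed-column layout. The function names, the synthetic records, and the label used for scores below 70 are assumptions made for illustration; mmCIF files are not handled here.

```python
def plddt_by_residue(pdb_text):
    """Return {(chain_id, residue_number): pLDDT} read from C-alpha ATOM records."""
    scores = {}
    for line in pdb_text.splitlines():
        # Fixed-column PDB layout: atom name in columns 13-16, chain ID in
        # column 22, residue number in columns 23-26, and the temperature
        # factor field (holding pLDDT in AlphaFold models) in columns 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain_id = line[21]
            residue_number = int(line[22:26])
            scores[(chain_id, residue_number)] = float(line[60:66])
    return scores


def confidence_band(plddt):
    """Map a pLDDT value to the bands named in the text above."""
    if plddt > 90:
        return "high accuracy"
    if plddt >= 70:
        return "modelled well"
    return "below 70 (not characterized in the text above)"


def format_ca_record(serial, res_name, chain_id, residue_number, plddt):
    """Build a synthetic C-alpha ATOM record with placeholder coordinates."""
    return "ATOM  {:5d} {:<4s} {:>3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}".format(
        serial, " CA", res_name, chain_id, residue_number, 0.0, 0.0, 0.0, 1.0, plddt
    )


# Synthetic two-residue example (values are illustrative, not real predictions).
example = "\n".join([
    format_ca_record(1, "HIS", "A", 1, 95.12),
    format_ca_record(2, "ALA", "A", 2, 74.50),
])

for key, score in plddt_by_residue(example).items():
    print(key, score, confidence_band(score))
```

Because higher pLDDT is better, the bands are applied directly to the stored values without any inversion.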
Coding sequence: The term “coding sequence” means a polynucleotide, which directly specifies the amino acid sequence of a polypeptide. The boundaries of the coding sequence are generally determined by an open reading frame, which begins with a start codon, such as ATG, GTG, or TTG, and ends with a stop codon, such as TAA, TAG, or TGA. The coding sequence may be a genomic DNA, cDNA, synthetic DNA, or a combination thereof.
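As a minimal illustration of the open-reading-frame boundaries described above, the following Python sketch returns the first stretch of a 5'-to-3' DNA string that begins with one of the listed start codons and ends, in the same reading frame, with one of the listed stop codons. The function name and the example sequence are hypothetical; real gene prediction considers further signals (e.g., ribosome binding sites) that this sketch ignores.

```python
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_first_open_reading_frame(dna):
    """Return the first start-to-stop open reading frame, or None if absent."""
    dna = dna.upper()
    for start in range(len(dna) - 2):
        if dna[start:start + 3] not in START_CODONS:
            continue
        # Walk codon by codon in the frame set by the start codon.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                return dna[start:pos + 3]
    return None


# Hypothetical sequence: two flanking bases, ATG, one codon, TAG, two bases.
print(find_first_open_reading_frame("CCATGAAATAGGG"))
```

The sketch scans only the given strand; a complete search would also examine the reverse complement.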
Control sequences: The term “control sequences” means nucleic acid sequences involved in regulation of expression of a polynucleotide in a specific organism or in vitro. Each control sequence may be native (i.e., from the same gene) or heterologous (i.e., from a different gene) to the polynucleotide encoding the polypeptide, and native or heterologous to each other. Such control sequences include, but are not limited to, leader, polyadenylation, prepropeptide, propeptide, signal peptide, promoter, terminator, enhancer, and transcription or translation initiator and terminator sequences. At a minimum, the control sequences include a promoter, and transcriptional and translational stop signals. The control sequences may be provided with linkers for the purpose of introducing specific restriction sites facilitating ligation of the control sequences with the coding region of the polynucleotide encoding a polypeptide.
Expression: The term “expression” means any step involved in the production of a polypeptide including, but not limited to, transcription, post-transcriptional modification, translation, post-translational modification, and secretion.
Expression vector: An "expression vector" refers to a linear or circular DNA construct comprising a DNA sequence encoding a polypeptide, which coding sequence is operably linked to a suitable control sequence capable of effecting expression of the DNA in a suitable host. Such control sequences may include a promoter to effect transcription, an optional operator sequence to control transcription, a sequence encoding suitable ribosome binding sites on the mRNA, enhancers and sequences which control termination of transcription and translation.
Extension: The term “extension” means an addition of one or more amino acids to the amino and/or carboxyl terminus of a polypeptide, wherein the “extended” polypeptide has GLP-1 receptor agonist activity.
Fragment: The term “fragment” means a polypeptide having one or more amino acids absent from the amino and/or carboxyl terminus of the mature polypeptide, wherein the fragment has GLP-1 receptor agonist activity.
Fusion polypeptide: The term “fusion polypeptide” means a polypeptide in which one polypeptide is fused at the N-terminus and/or the C-terminus of a polypeptide of the present invention. A fusion polypeptide is produced by fusing a polynucleotide encoding another polypeptide to a polynucleotide of the present invention, or by fusing two or more polynucleotides of the present invention together. Techniques for producing fusion polypeptides are known in the art, and include ligating the coding sequences encoding the polypeptides so that they are in frame and that expression of the fusion polypeptide is under control of the same promoter(s) and terminator. Fusion polypeptides may also be constructed using intein technology in which fusion polypeptides are created post-translationally (Cooper et al., 1993, EMBO J. 12: 2575-2583; Dawson et al., 1994, Science 266: 776-779). A fusion polypeptide can further comprise a cleavage site between the two polypeptides. Upon secretion of the fusion protein, the site is cleaved releasing the two polypeptides. Examples of cleavage sites include, but are not limited to, the sites disclosed in Martin et al., 2003, J. Ind. Microbiol. Biotechnol. 3: 568-576; Svetina et al., 2000, J. Biotechnol. 76: 245-251; Rasmussen-Wilson et al., 1997, Appl. Environ. Microbiol. 63: 3488-3493; Ward et al., 1995, Biotechnology 13: 498-503; Contreras et al., 1991, Biotechnology 9: 378-381; Eaton et al., 1986, Biochemistry 25: 505-512; Collins-Racie et al., 1995, Biotechnology 13: 982-987; Carter et al., 1989, Proteins: Structure, Function, and Genetics 6: 240-248; and Stevens, 2003, Drug Discovery World 4: 35-48.
GLP-1 receptor agonist: The term “GLP-1 receptor agonist” or “GLP-1” means a glucagon-like peptide-1 receptor agonist in the form of a polypeptide.
The term “GLP-1 receptor agonist”, synonymous with "GLP-1 peptide" and “GLP-1”, as used herein refers to a compound comprising a peptide and which, when active, fully or partially activates the human GLP-1 receptor. In some embodiments the GLP-1 peptide is a GLP-1 analogue, optionally comprising one substituent. The term "analogue" as used herein referring to a GLP-1 peptide (hereafter "peptide") means a peptide wherein at least one amino acid residue of the peptide has been substituted with another amino acid residue and/or wherein at least one amino acid residue has been deleted from the peptide and/or wherein at least one amino acid residue has been added to the peptide and/or wherein at least one amino acid residue of the peptide has been modified. Such addition or deletion of amino acid residues may take place at the N-terminal of the peptide and/or at the C-terminal of the peptide. In some embodiments a simple nomenclature is used to describe the GLP-1 peptide, e.g., [Aib2] GLP-1 designates an analogue of the wildtype GLP-1 of SEQ ID NO: 3 wherein the naturally occurring Ala in position 2 has been substituted with Aib. In some embodiments the GLP-1 peptide comprises a maximum of twelve, such as a maximum of 10, 8 or 6, amino acids which have been altered, e.g., by substitution, deletion, insertion and/or modification, compared to e.g. GLP-1 of SEQ ID NO: 3. In some embodiments the analogue comprises up to 10 substitutions, deletions, additions and/or insertions, such as up to 9 substitutions, deletions, additions and/or insertions, up to 8 substitutions, deletions, additions and/or insertions, up to 7 substitutions, deletions, additions and/or insertions, up to 6 substitutions, deletions, additions and/or insertions, up to 5 substitutions, deletions, additions and/or insertions, up to 4 substitutions, deletions, additions and/or insertions or up to 3 substitutions, deletions, additions and/or insertions, compared to e.g. GLP-1 of SEQ ID NO: 3.
Unless otherwise stated the GLP-1 comprises only L-amino acids. In some embodiments the term “GLP-1 receptor agonist” or "GLP-1 analogue" or "analogue of GLP-1" or “GLP-1” as used herein refers to a peptide, or a compound, which is a variant of the human Glucagon-Like Peptide-1 of SEQ ID NO:3. Wildtype human GLP-1 has the sequence HAEGTFTSDVSSYLEGQAAKEFIAWLVKGRG (SEQ ID NO: 3). In some embodiments the term "variant" refers to a compound which comprises one or more amino acid substitutions, deletions, additions and/or insertions. In some embodiments the GLP-1 peptide exhibits at least 60 percent, 65 percent, 70 percent, 80 percent or 90 percent sequence identity to SEQ ID NO: 3 over the entire length of GLP-1 SEQ ID NO:3.
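The analogue nomenclature and alteration limits described above can be illustrated with a minimal Python sketch. It applies the [Aib2] substitution to the wildtype sequence quoted above and counts how many positions differ from the wildtype. Residues are held as a list of tokens so that the non-proteinogenic residue can be written as "Aib". The function names are hypothetical, and only substitutions are handled (equal-length sequences); deletions, additions, and insertions would require an alignment step that is not shown.

```python
# Wildtype human GLP-1 as quoted in the text (SEQ ID NO: 3).
WILDTYPE_GLP1 = list("HAEGTFTSDVSSYLEGQAAKEFIAWLVKGRG")


def substitute(residues, position, new_residue):
    """Return a copy with the residue at the 1-based position replaced."""
    analogue = list(residues)
    analogue[position - 1] = new_residue
    return analogue


def count_substitutions(reference, analogue):
    """Count positions that differ between two equal-length residue lists."""
    if len(reference) != len(analogue):
        raise ValueError("this sketch only compares sequences of equal length")
    return sum(1 for ref, alt in zip(reference, analogue) if ref != alt)


# [Aib2] GLP-1: Ala in position 2 substituted with Aib.
aib2_glp1 = substitute(WILDTYPE_GLP1, 2, "Aib")

altered = count_substitutions(WILDTYPE_GLP1, aib2_glp1)
print(WILDTYPE_GLP1[1], "->", aib2_glp1[1], "| altered positions:", altered)
print("within a maximum of twelve alterations:", altered <= 12)
```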
The concentration of GLP-1 peptide may be determined using any suitable method. For example, LC-MS (Liquid Chromatography Mass Spectrometry) may be used, or immunoassays such as RIA (Radio Immuno Assay), ELISA (Enzyme-Linked Immuno Sorbent Assay), and LOCI (Luminescence Oxygen Channeling Immunoassay). General protocols for suitable RIA and ELISA assays are found in, e.g., WO 2009/030738 on pp. 116-118.
In some embodiments the GLP-1 peptide is liraglutide (SEQ ID NO: 4), or truncated liraglutide (SEQ ID NO: 6). Liraglutide is Arg34,Lys26-(N-epsilon-(gamma-L-glutamyl(N-alpha-hexadecanoyl)))-GLP-1(7-37), also known as N26-(hexadecanoyl-γ-glutamyl)-[34-arginine]GLP-1-(7-37)-peptide (WHO Drug Information Vol. 17, No. 2, 2003).
In some embodiments the GLP-1 peptide is semaglutide (SEQ ID NO: 5).
In some embodiments the GLP-1 peptide is truncated, e.g., the truncated GLP-1 of SEQ ID NO: 6, or the truncated GLP-1 of SEQ ID NO: 7.
Heterologous: The term "heterologous" means, with respect to a host cell, that a polypeptide or nucleic acid does not naturally occur in the host cell. The term "heterologous" means, with respect to a polypeptide or nucleic acid, that a control sequence, e.g., promoter, of a polypeptide or nucleic acid is not naturally associated with the polypeptide or nucleic acid, i.e., the control sequence is from a gene other than the gene encoding the mature polypeptide.
Host Strain or Host Cell: A "host strain" or "host cell" is an organism into which an expression vector, phage, virus, or other DNA construct, including a polynucleotide encoding a polypeptide of interest (e.g., an amylase) has been introduced. Exemplary host strains are microorganism cells (e.g., bacteria, filamentous fungi, and yeast) capable of expressing the polypeptide of interest and/or fermenting saccharides. The term "host cell" includes protoplasts created from cells.
Introduced: The term "introduced" in the context of inserting a nucleic acid sequence into a cell, means "transfection", "transformation" or "transduction," as known in the art.
Isolated: The term “isolated” means a polypeptide, nucleic acid, cell, or other specified material or component that has been separated from at least one other material or component, including but not limited to, other proteins, nucleic acids, cells, etc. An isolated polypeptide, nucleic acid, cell or other material is thus in a form that does not occur in nature. An isolated polypeptide includes, but is not limited to, a culture broth containing the secreted polypeptide expressed in a host cell.
Linker: The term “linker” means a polypeptide connecting the polypeptide of interest with the signal peptide. Typically, the linker is upstream of the polypeptide of interest and downstream of the signal peptide. Preferably the linker is a cleavable linker. Where the linker is a cleavable linker, the linker and the signal peptide can be removed from the polypeptide of interest after fermentation, e.g., by enzymatic cleavage, leaving a matured polypeptide of interest. A non-limiting example for a cleavable linker is the linker of SEQ ID NO: 2. The linker of SEQ ID NO: 2 comprises an enterokinase cleavage site and may be cleaved by the methods described in WO15091613.
Mature polypeptide: The term “mature polypeptide” means a polypeptide in its mature form following N-terminal and/or C-terminal processing (e.g., removal of signal peptide, and/or removal of the linker). In one aspect, the mature polypeptide is SEQ ID NO: 3. In one aspect, the mature polypeptide is SEQ ID NO: 4. In one aspect, the mature polypeptide is SEQ ID NO: 5. In one aspect, the mature polypeptide is SEQ ID NO: 6. In one aspect, the mature polypeptide is SEQ ID NO: 7.
Mature polypeptide coding sequence: The term “mature polypeptide coding sequence” means a polynucleotide that encodes a mature polypeptide having GLP-1 receptor agonist activity.
Native: The term "native" means a nucleic acid or polypeptide naturally occurring in a host cell.
Nucleic acid: The term "nucleic acid" encompasses DNA, RNA, heteroduplexes, and synthetic molecules capable of encoding a polypeptide. Nucleic acids may be single stranded or double stranded, and may contain chemical modifications. The terms "nucleic acid" and "polynucleotide" are used interchangeably. Because the genetic code is degenerate, more than one codon may be used to encode a particular amino acid, and the present compositions and methods encompass nucleotide sequences that encode a particular amino acid sequence. Unless otherwise indicated, nucleic acid sequences are presented in 5'-to-3' orientation.
Nucleic acid construct: The term "nucleic acid construct" means a nucleic acid molecule, either single- or double-stranded, which is isolated from a naturally occurring gene or is modified to contain segments of nucleic acids in a manner that would not otherwise exist in nature or which is synthetic, and which comprises one or more control sequences operably linked to the nucleic acid sequence.
Operably linked: The term "operably linked" means that specified components are in a relationship (including but not limited to juxtaposition) permitting them to function in an intended manner. For example, a regulatory sequence is operably linked to a coding sequence such that expression of the coding sequence is under control of the regulatory sequence.
Purified: The term “purified” means a nucleic acid, polypeptide or cell that is substantially free from other components as determined by analytical techniques well known in the art (e.g., a purified polypeptide or nucleic acid may form a discrete band in an electrophoretic gel, chromatographic eluate, and/or a media subjected to density gradient centrifugation). A purified nucleic acid or polypeptide is at least about 50% pure, usually at least about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, about 99.5%, about 99.6%, about 99.7%, about 99.8% or more pure (e.g., percent by weight or on a molar basis). In a related sense, a composition is enriched for a molecule when there is a substantial increase in the concentration of the molecule after application of a purification or enrichment technique. The term "enriched" refers to a compound, polypeptide, cell, nucleic acid, amino acid, or other specified material or component that is present in a composition at a relative or absolute concentration that is higher than a starting composition.
In one aspect, the term "purified" as used herein refers to the polypeptide or cell being essentially free from components (especially insoluble components) from the production organism. In other aspects, the term "purified" refers to the polypeptide being essentially free of insoluble components (especially insoluble components) from the native organism from which it is obtained. In one aspect, the polypeptide is separated from some of the soluble components of the organism and culture medium from which it is recovered. The polypeptide may be purified (i.e., separated) by one or more of the unit operations filtration, precipitation, or chromatography.
Accordingly, the polypeptide may be purified such that only minor amounts of other proteins, in particular, other polypeptides, are present. The term "purified" as used herein may refer to removal of other components, particularly other proteins and most particularly other enzymes present in the cell of origin of the polypeptide. The polypeptide may be "substantially pure", i.e., free from other components from the organism in which it is produced, e.g., a host organism for recombinantly produced polypeptide. In one aspect, the polypeptide is at least 40% pure by weight of the total polypeptide material present in the preparation. In one aspect, the polypeptide is at least 50%, 60%, 70%, 80% or 90% pure by weight of the total polypeptide material present in the preparation. As used herein, a "substantially pure polypeptide" may denote a polypeptide preparation that contains at most 10%, preferably at most 8%, more preferably at most 6%, more preferably at most 5%, more preferably at most 4%, more preferably at most 3%, even more preferably at most 2%, most preferably at most 1%, and even most preferably at most 0.5% by weight of other polypeptide material with which the polypeptide is natively or recombinantly associated.
It is, therefore, preferred that the substantially pure polypeptide is at least 92% pure, preferably at least 94% pure, more preferably at least 95% pure, more preferably at least 96% pure, more preferably at least 97% pure, more preferably at least 98% pure, even more preferably at least 99% pure, most preferably at least 99.5% pure by weight of the total polypeptide material present in the preparation. The polypeptide of the present invention is preferably in a substantially pure form (i.e., the preparation is essentially free of other polypeptide material with which it is natively or recombinantly associated). This can be accomplished, for example, by preparing the polypeptide by well-known recombinant methods or by classical purification methods.
Recombinant: The term "recombinant" is used in its conventional meaning to refer to the manipulation, e.g., cutting and rejoining, of nucleic acid sequences to form constellations different from those found in nature. The term recombinant refers to a cell, nucleic acid, polypeptide or vector that has been modified from its native state. Thus, for example, recombinant cells express genes that are not found within the native (non-recombinant) form of the cell, or express native genes at different levels or under different conditions than found in nature. The term “recombinant” is synonymous with “genetically modified” and “transgenic”.
Recover: The terms "recover" or “recovery” mean the removal of a polypeptide or fusion polypeptide from at least one fermentation broth component selected from the list of a cell, a nucleic acid, or other specified material, e.g., recovery of the polypeptide from the whole fermentation broth, or from the cell-free fermentation broth, by polypeptide crystal harvest, by filtration, e.g. depth filtration (by use of filter aids or packed filter media, cloth filtration in chamber filters, rotary-drum filtration, drum filtration, rotary vacuum-drum filters, candle filters, horizontal leaf filters or similar, using sheet or pad filtration in framed or modular setups) or membrane filtration (using sheet filtration, module filtration, candle filtration, microfiltration, ultrafiltration in either cross flow, dynamic cross flow or dead end operation), or by centrifugation (using decanter centrifuges, disc stack centrifuges, hydrocyclones or similar), or by precipitating the polypeptide and using relevant solid-liquid separation methods to harvest the polypeptide from the broth media by use of classification separation by particle sizes. Recovery encompasses isolation and/or purification of the polypeptide.
Sequence difference: The term "sequence difference" means the percent of amino acid differences between a polypeptide and a reference polypeptide, e.g., the polypeptide of SEQ ID NO: 1, and is calculated as follows:
(Different Residues x 100)/(Length of SEQ ID NO: 1) wherein the different residues comprise any substitution, deletion, or insertion (e.g., an extension at the N-terminus and/or C-terminus) in the sequence.
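The calculation above can be illustrated with a minimal Python sketch. The function name and the example numbers are hypothetical and chosen only for illustration; the count of different residues is assumed to have been determined beforehand from a comparison with the reference sequence.

```python
def sequence_difference(different_residues: int, reference_length: int) -> float:
    """Percent sequence difference: (Different Residues x 100) / (Length of reference).

    different_residues counts every substitution, deletion, or insertion
    (including N- or C-terminal extensions) relative to the reference.
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    if different_residues < 0:
        raise ValueError("different_residues must not be negative")
    return different_residues * 100 / reference_length


# Hypothetical example: 3 altered positions against a 30-residue reference.
print(sequence_difference(3, 30))  # 10.0
```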
Sequence identity: The relatedness between two amino acid sequences or between two nucleotide sequences is described by the parameter “sequence identity”.
Method 1:
For purposes of the present invention, the sequence identity between two amino acid sequences is determined as the output of “longest identity” using the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970, J. Mol. Biol. 48: 443-453) as implemented in the Needle program of the EMBOSS package (EMBOSS: The European Molecular Biology Open Software Suite, Rice et al., 2000, Trends Genet. 16: 276-277), preferably version 6.6.0 or later. The parameters used are a gap open penalty of 10, a gap extension penalty of 0.5, and the EBLOSUM62 (EMBOSS version of BLOSUM62) substitution matrix. In order for the Needle program to report the longest identity, the -nobrief option must be specified in the command line. The output of Needle labeled “longest identity” is calculated as follows:
(Identical Residues x 100)/(Length of Alignment - Total Number of Gaps in Alignment)
For purposes of the present invention, the sequence identity between two polynucleotide sequences is determined as the output of “longest identity” using the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970, supra) as implemented in the Needle program of the EMBOSS package (EMBOSS: The European Molecular Biology Open Software Suite, Rice et al., 2000, supra), preferably version 6.6.0 or later. The parameters used are a gap open penalty of 10, a gap extension penalty of 0.5, and the EDNAFULL (EMBOSS version of NCBI NUC4.4) substitution matrix. In order for the Needle program to report the longest identity, the -nobrief option must be specified in the command line. The output of Needle labeled “longest identity” is calculated as follows:
(Identical Deoxyribonucleotides x 100)/(Length of Alignment - Total Number of Gaps in Alignment)
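As an illustration of the "longest identity" formula only, the following Python sketch evaluates it on two already-aligned sequences of equal length. It does not perform the Needleman-Wunsch alignment itself, which is done by the Needle program. The function name is hypothetical, and the sketch assumes that "Total Number of Gaps in Alignment" means the number of alignment columns containing a gap character.

```python
def longest_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """(Identical residues x 100) / (alignment length - gap columns).

    Both inputs must be rows of the same pairwise alignment, so they
    have equal length and use `gap` to mark gap positions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(aligned_a, aligned_b))
    # Columns where either row has a gap (assumed meaning of "gaps").
    gaps = sum(1 for a, b in columns if a == gap or b == gap)
    # Columns where both rows carry the same residue.
    identical = sum(1 for a, b in columns if a == b and a != gap)
    ungapped = len(columns) - gaps
    if ungapped == 0:
        raise ValueError("alignment contains no ungapped columns")
    return identical * 100 / ungapped


# One gap column is excluded from the denominator, so 5 of 5 columns match.
print(longest_identity("ACDEFG", "ACD-FG"))  # 100.0
```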
Signal Peptide: A "signal peptide" is a sequence of amino acids attached to the N-terminal portion of a protein, which facilitates the secretion of the protein outside the cell. The mature form of an extracellular protein lacks the signal peptide, which is usually cleaved off during the secretion process. However, in some cases a portion of the signal peptide (or the full length signal peptide) may still be attached to a fraction of the polypeptide of interest.
Structural Similarity: For purposes of the present invention, the relatedness between the three-dimensional structure of two polypeptides is described by the parameter “structural similarity”.
A three-dimensional structure of any polypeptide may be obtained experimentally via, e.g., X-ray crystallography or using in silico methods such as AlphaFold 2 (vide supra). The structural similarity between three-dimensional structures may then be determined by the TM-score, which is calculated using the following general formula (Zhang & Skolnick, 2004, Proteins 57:702-710):
$$\text{TM-score} = \operatorname{Max}\left[\frac{1}{L_N}\sum_{i=1}^{L_T}\frac{1}{1+\left(\frac{d_i}{d_0}\right)^{2}}\right]$$
where $L_N$ is the length of the native structure, $L_T$ is the length of the aligned residues to the template structure, $d_i$ is the distance between pair $i$ of aligned residues and $d_0$ is a scale to normalize the match difference. ‘Max’ denotes the maximum value after optimal spatial superposition. For the purposes of the present invention, $L_N$ is the length of the reference polypeptide.
A structural alignment of the three-dimensional structures of two polypeptides is necessary before the TM-score can be calculated. This is achieved via algorithms that optimize the structural overlap, and several methods are available, such as CEalign (Shindyalov and Bourne, 1998, Protein Eng., 11 :739-747), DALI (Holm and Sander, 1995, Trends Biochem. Sci., 20:478-480), or TM-align (Zhang and Skolnick, 2005, Nucleic Acids Res. 33(7):2302-2309).
For the purposes of the present invention, TM-align is applied. For convenience, TM-score is integrated in the TM-align software, which is available from the author’s website (zhanggroup.org/TM-score/). The version of TM-align is preferably updated 2019-08-22 or later, and the TM-score between a reference and a query protein is determined by running this command:
TMalign <query.pdb> <reference.pdb> -L <length of reference>
Where <query.pdb> is the name of the PDB file containing coordinates of the query polypeptide, <reference.pdb> is the name of the PDB file containing coordinates of the reference polypeptide. The TM-score is calculated and reported in the output, along with several other parameters from the alignment.
The maximal TM-score is 1, i.e., 1.0, corresponding to identical three-dimensional structures.
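For illustration, the TM-score sum can be evaluated for one fixed superposition with the Python sketch below. This is not a replacement for TM-align: it does not search for the optimal superposition (the ‘Max’ step) and takes the distances between aligned residue pairs as given. The text above does not state the scale d0; the sketch assumes the length-dependent form d0 = 1.24 (L_N - 15)^(1/3) - 1.8 from Zhang & Skolnick (2004), with a lower bound of 0.5 Å added here as an assumption for short chains. All function names are hypothetical.

```python
def d0_scale(reference_length: int) -> float:
    """Distance scale d0 in Angstrom (assumed form, see lead-in)."""
    if reference_length > 15:
        d0 = 1.24 * (reference_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.0
    # Assumed lower bound so that very short chains keep a positive scale.
    return max(d0, 0.5)


def tm_score(distances, reference_length: int) -> float:
    """TM-score for one given superposition.

    distances: distances (Angstrom) between the aligned residue pairs.
    reference_length: L_N, the length of the reference polypeptide.
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    d0 = d0_scale(reference_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / reference_length


# Identical structures: every aligned pair has distance 0, all residues aligned.
print(tm_score([0.0] * 100, 100))  # 1.0
```

Normalizing by the reference length means that a query covering only half of the reference with perfect overlap scores 0.5 rather than 1.0.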
Subsequence: The term “subsequence” means a polynucleotide having one or more nucleotides absent from the 5' and/or 3' end of a mature polypeptide coding sequence; wherein the subsequence encodes a fragment having GLP-1 receptor agonist activity.
Variant: The term “variant” means a polypeptide having GLP-1 receptor agonist activity comprising a man-made mutation, i.e., a substitution, insertion (including extension), and/or deletion (e.g., truncation), at one or more positions.
The term “variant” also means a signal peptide having secretional activity comprising a man-made mutation, i.e., a substitution, insertion (including extension), and/or deletion (e.g., truncation), at one or more positions.
A substitution means replacement of the amino acid occupying a position with a different amino acid; a deletion means removal of the amino acid occupying a position; and an insertion means adding 1-5 amino acids (e.g., 1-3 amino acids, in particular, 1 amino acid) adjacent to and immediately following the amino acid occupying a position.
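The three alteration types defined above can be sketched on a one-letter amino acid string as follows. Positions are 1-based, the function names are hypothetical, and the example peptide is an arbitrary string used only for illustration.

```python
def _check_position(seq: str, position: int) -> None:
    if not 1 <= position <= len(seq):
        raise ValueError("position is outside the sequence")


def substitute(seq: str, position: int, new_residue: str) -> str:
    """Replace the residue at `position` with a different amino acid."""
    _check_position(seq, position)
    if len(new_residue) != 1 or seq[position - 1] == new_residue:
        raise ValueError("substitution needs one different amino acid")
    return seq[:position - 1] + new_residue + seq[position:]


def delete(seq: str, position: int) -> str:
    """Remove the residue occupying `position`."""
    _check_position(seq, position)
    return seq[:position - 1] + seq[position:]


def insert_after(seq: str, position: int, residues: str) -> str:
    """Add 1-5 residues immediately following the residue at `position`."""
    _check_position(seq, position)
    if not 1 <= len(residues) <= 5:
        raise ValueError("an insertion adds 1-5 amino acids")
    return seq[:position] + residues + seq[position:]


peptide = "HAEGT"  # arbitrary example string
print(substitute(peptide, 2, "G"))     # HGEGT
print(delete(peptide, 3))              # HAGT
print(insert_after(peptide, 5, "GG"))  # HAEGTGG (C-terminal extension)
```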
Wild-type: The term "wild-type" in reference to an amino acid sequence or nucleic acid sequence means that the amino acid sequence or nucleic acid sequence is a native or naturally- occurring sequence. As used herein, the term "naturally-occurring" refers to anything (e.g., proteins, amino acids, or nucleic acid sequences) that is found in nature. Conversely, the term "non-naturally occurring" refers to anything that is not found in nature (e.g., recombinant nucleic acids and protein sequences produced in the laboratory or modification of the wild-type sequence).
Detailed Description of the Invention
Signal Peptide and Propeptide
The present invention relates to a first polynucleotide encoding a signal peptide comprising or consisting of amino acids 1 to 28 of SEQ ID NO: 1.
The present invention also relates to a first polynucleotide encoding a signal peptide comprising or consisting of amino acids 1 to 29 of SEQ ID NO: 16.
The present invention also relates to a first polynucleotide encoding a signal peptide comprising or consisting of amino acids 1 to 27 of SEQ ID NO: 18.
The polynucleotides further comprise a second polynucleotide encoding a GLP-1 receptor agonist, which is operably linked to the signal peptide. The GLP-1 receptor agonist is preferably heterologous to the signal peptide.
The present invention also relates to nucleic acid constructs, expression vectors and recombinant host cells comprising such polynucleotides.
The present invention also relates to methods of producing a GLP-1 receptor agonist, comprising (a) cultivating a recombinant host cell comprising such polynucleotide; and optionally (b) recovering the GLP-1 receptor agonist protein.
Preferably, the GLP-1 receptor agonist protein is heterologous to a host cell. The term “protein” is not meant herein to refer to a specific length of the encoded product and, therefore, encompasses peptides, oligopeptides, and polypeptides. The term “protein” also encompasses two or more polypeptides combined to form the encoded product. The proteins also include hybrid polypeptides and fusion polypeptides.
Removal or Reduction of Protease Activity
The present invention also relates to methods of producing a Bacillus mutant of a parent cell, which comprises disrupting or deleting one or more polynucleotides, or a portion thereof, encoding a native protease, which results in the mutant cell producing less of the native protease than the parent cell when cultivated under the same conditions.
The mutant cell may be constructed by reducing or eliminating expression of the one or more polynucleotides using methods well known in the art, for example, one or more nucleotide insertions, one or more gene disruptions, one or more nucleotide replacements, or one or more nucleotide deletions. The one or more polynucleotides to be modified or inactivated may be, for example, the coding region or a part thereof essential for activity, or a regulatory or control element required for expression of the coding region, e.g., a functional part of a promoter sequence, and/or a regulatory or control element required for the transcription or translation of the polynucleotide. Other control sequences for possible modification include, but are not limited to, a leader, polyadenylation sequence, propeptide sequence, signal peptide sequence, transcription terminator, and transcriptional activator.
Modification or inactivation of the one or more polynucleotides may be performed by subjecting the parent cell to mutagenesis and selecting for mutant cells in which expression of the polynucleotide has been reduced or eliminated. The mutagenesis, which may be specific or random, may be performed, for example, by use of a suitable physical or chemical mutagenizing agent, by use of a suitable oligonucleotide, or by subjecting the DNA sequence to PCR generated mutagenesis. Furthermore, the mutagenesis may be performed by use of any combination of these mutagenizing agents.
Examples of a physical or chemical mutagenizing agent include ultraviolet (UV) irradiation, hydroxylamine, N-methyl-N'-nitro-N-nitrosoguanidine (MNNG), O-methyl hydroxylamine, nitrous acid, ethyl methane sulphonate (EMS), sodium bisulphite, formic acid, and nucleotide analogues (see J. L. Bose, Springer Protocols 2016, Methods in Molecular Biology, The Genetic Manipulation of Staphylococci).
Additionally or alternatively, nucleotides may be inserted or removed so as to result in the introduction of a stop codon, the removal of the start codon, or a change in the open reading frame. Such modification or inactivation may be accomplished by site-directed mutagenesis or PCR generated mutagenesis in accordance with methods known in the art, or by targeted gene editing using one or more nucleases, e.g., zinc-finger nucleases or CRISPR-associated nucleases. Additionally or alternatively, the modification or inactivation may be achieved by gene silencing, genetic repression, genetic activation, and/or post-translational mutagenesis, e.g., by methods employing non-coding RNA, RNAi, siRNA, miRNA, ribozymes, catalytically inactive nucleases, CRISPRi, nucleotide methylation, and/or histone acetylation. The modification may be transient and/or reversible, irreversible and/or stable, or the modification may be dependent on chemical inducers or dependent on cultivation conditions, such as the cultivation temperature.
The modification may be performed in vivo, i.e., directly on the cell expressing the polynucleotide to be modified, or the modification may be performed in vitro.
An example of a convenient way to eliminate or reduce expression of a polynucleotide is based on techniques of gene replacement, gene deletion, or gene disruption. For example, in the gene disruption method, a nucleic acid sequence corresponding to the endogenous polynucleotide is mutagenized in vitro to produce a defective nucleic acid sequence that is then transformed into the parent cell to produce a defective gene. By homologous recombination, the defective nucleic acid sequence replaces the endogenous polynucleotide. It may be desirable that the defective polynucleotide also encodes a marker that may be used for selection of transformants in which the polynucleotide has been modified or destroyed. In an aspect, the polynucleotide is disrupted with a selectable marker such as those described herein.
The present invention further relates to a mutant cell of a parent cell that comprises a disruption or deletion of one or more polynucleotides encoding a Bacillus protease or a control sequence thereof or a silenced gene encoding the protease, which results in the mutant cell producing less of the protease or no protease compared to the parent cell.
The protease-deficient mutant cells are useful as host cells for expression of native and heterologous polypeptides. Therefore, the present invention further relates to methods of producing a native or heterologous polypeptide, comprising (a) cultivating the mutant cell under conditions conducive for production of the polypeptide; and (b) recovering the polypeptide. The term "heterologous polypeptides" means polypeptides that are not native to the host cell, e.g., a variant of a native protein. The host cell may comprise more than one copy of a polynucleotide encoding the native or heterologous polypeptide.
The methods of the present invention for producing an essentially protease-free product are of interest in the production of polypeptides, e.g., proteins such as enzymes. The protease-deficient cells may also be used to express heterologous proteins of pharmaceutical interest such as hormones, growth factors, receptors, and the like, and in particular GLP-1.
In some embodiments, the present invention relates to a protein product essentially free from protease activity that is produced by a method of the present invention.
In one embodiment, the invention relates to a Bacillus host cell comprising in its genome a heterologous promoter operably linked to a polynucleotide encoding a polypeptide of interest, wherein protease activity of each of the native proteases encoded by aprL, mprL, bprAB, epr, vpr, and wprA is reduced or eliminated.
In one particular embodiment, the protease activity of each of the native proteases encoded by aprL, mprL, bprAB, epr, vpr, wprA, and ispA is reduced or eliminated.
In one embodiment, the protease activity is reduced or eliminated by one or more mutations in one or more of aprL, mprL, bprAB, epr, vpr, wprA, and ispA, e.g., by substitution, insertion, or deletion of one or more nucleotides; and/or wherein protease activity is reduced or eliminated by full and/or partial deletion of one or more of aprL, mprL, bprAB, epr, vpr, wprA, and ispA.
Polynucleotides
The present invention also relates to first polynucleotides encoding a signal peptide of the present invention, as described herein. The polynucleotide may be a genomic DNA, a cDNA, a synthetic DNA, a synthetic RNA, a mRNA, or a combination thereof. The polynucleotide may be cloned from a strain of Bacillus, or a related organism and thus, for example, may be a polynucleotide sequence encoding a variant of the polypeptide of the invention.
The first polynucleotide may be cloned from a strain of Bacillus amyloliquefaciens, Bacillus licheniformis, Bacillus subtilis, or Bacillus clausii.
In one embodiment the polynucleotide encoding the signal peptide of the present invention is isolated from a Bacillus cell.
The polynucleotide may also be mutated by introduction of nucleotide substitutions that do not result in a change in the amino acid sequence of the polypeptide, but which correspond to the codon usage of the host organism intended for production of the enzyme, or by introduction of nucleotide substitutions that may give rise to a different amino acid sequence. For a general description of nucleotide substitution, see, e.g., Ford et al., 1991, Protein Expression and Purification 2: 95-107.
In an aspect, the polynucleotide is isolated.
In another aspect, the polynucleotide is purified.
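The distinction drawn in this section between nucleotide substitutions that leave the amino acid sequence unchanged (e.g., codon-usage adaptation) and substitutions that give rise to a different amino acid sequence can be checked with a short Python sketch. The sketch assumes the standard genetic code and uses arbitrary toy sequences; the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence ('*' marks a stop codon)."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def is_synonymous(original: str, mutated: str) -> bool:
    """True if both coding sequences encode the same amino acid sequence."""
    return translate(original) == translate(mutated)


# Toy example: AAA -> AAG and GGT -> GGC keep Lys and Gly unchanged.
print(is_synonymous("ATGAAAGGT", "ATGAAGGGC"))  # True
# AAA -> AAT changes Lys to Asn.
print(is_synonymous("ATGAAAGGT", "ATGAATGGT"))  # False
```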
Nucleic Acid Constructs
In a 1st aspect, the invention relates to nucleic acid constructs comprising: a first polynucleotide encoding a signal peptide comprising or consisting of an amino acid sequence having at least 80% sequence identity to the amino acid sequence of SEQ ID NO: 1, 16, or 18; and a second polynucleotide encoding a polypeptide of interest, wherein the polypeptide of interest comprises or consists of a glucagon-like peptide-1 (GLP-1) receptor agonist, and wherein the first polynucleotide and the second polynucleotide are operably linked in translational fusion.
In one embodiment the signal peptide comprises or consists of: an amino acid sequence having at least 85%, e.g., at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 1.
In one embodiment the signal peptide comprises or consists of a polypeptide with the amino acid sequence of SEQ ID NO: 1.
In one embodiment the signal peptide comprises or consists of: an amino acid sequence having at least 85%, e.g., at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 16, or 18.
In one embodiment the signal peptide comprises or consists of a polypeptide with the amino acid sequence of SEQ ID NO: 16, or 18.
In one embodiment the signal peptide comprises or consists of: a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 1, wherein the three-dimensional structure is calculated by AlphaFold.
In one embodiment the signal peptide comprises or consists of: a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 16, or 18, wherein the three-dimensional structure is calculated by AlphaFold.
In one embodiment the GLP-1 receptor agonist comprises or consists of (a) a polypeptide derived from SEQ ID NO: 3, 4, 5, 6 or 7, by having 1-30 alterations (e.g., substitutions, deletions and/or insertions at one or more positions), e.g., 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26 or 27 or 28 or 29 or 30 alterations, in particular substitutions;
(b) a polypeptide derived from the polypeptide of (a), wherein the N- and/or C-terminal end has been extended by addition of one or more amino acids; and/or
(c) a fragment of the polypeptide of (a) or (b).
In one embodiment the signal peptide comprises or consists of (a) a polypeptide derived from SEQ ID NO: 1, 16, or 18, by having 1-30 alterations (e.g., substitutions, deletions and/or insertions at one or more positions), e.g., 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26 or 27 or 28 or 29 or 30 alterations, in particular substitutions;
(b) a polypeptide derived from the polypeptide of (a), wherein the N- and/or C-terminal end has been extended by addition of one or more amino acids; and/or
(c) a fragment of the polypeptide of (a) or (b).
In one embodiment the construct further comprises a promoter, wherein the promoter is heterologous to the first polynucleotide and heterologous to the second polynucleotide, and wherein the promoter is operably linked to the first polynucleotide.
In one embodiment the promoter comprises or consists of a P3 promoter, or a P3-based promoter.
In one embodiment the promoter is a tandem promoter comprising the P3 promoter, or is a tandem promoter derived from the P3 promoter.
In one embodiment the promoter comprises or consists of a nucleic acid sequence having a sequence identity of at least 80%, e.g. at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100%, to SEQ ID NO: 15, preferably the promoter comprises, consists essentially of, or consists of SEQ ID NO: 15.
In one embodiment the construct further comprises a third polynucleotide downstream of the first polynucleotide and upstream of the second polynucleotide, wherein the third polynucleotide encodes a linker, and wherein the first, second, and third polynucleotides are operably linked in translational fusion.
In one embodiment the linker is a cleavable linker.
In one embodiment the linker comprises or consists of: an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 2.
In one embodiment the linker comprises or consists of a polypeptide with the amino acid sequence of SEQ ID NO: 2.
In one embodiment the linker comprises or consists of a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 2, wherein the three-dimensional structure is calculated by AlphaFold.
In one embodiment the GLP-1 receptor agonist comprises or consists of: an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 3, 4, 5, 6, or 7. In one embodiment the GLP-1 receptor agonist comprises or consists of the polypeptide of SEQ ID NO: 3, 4, 5, 6, or 7.
In one embodiment the GLP-1 receptor agonist comprises or consists of a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 3, 4, 5, 6, or 7, wherein the three-dimensional structure is calculated by AlphaFold.
In one embodiment the GLP-1 receptor agonist is a human GLP-1 receptor agonist, a fragment thereof, or a variant thereof.
In one embodiment the construct is isolated.
In one embodiment the construct is purified.
In one embodiment sequence identity is determined by Sequence Identity Determination Method 1.
The present invention also relates to nucleic acid constructs comprising a polynucleotide of the present invention, wherein the polynucleotide is operably linked to one or more control sequences that direct the expression of the coding sequence in a suitable Bacillus host cell under conditions compatible with the control sequences.
The polynucleotide may be manipulated in a variety of ways to provide for expression of the polypeptide. Manipulation of the polynucleotide prior to its insertion into a vector may be desirable or necessary depending on the expression vector. Techniques for modifying polynucleotides utilizing recombinant DNA methods are well known in the art.
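The translational fusion of a signal peptide, an optional linker, and a polypeptide of interest described in this section requires that the coding segments share one reading frame. The Python sketch below illustrates one way such a check could look. It is not the method of the invention; the function names are hypothetical, and the toy sequences are arbitrary placeholders that do not correspond to any SEQ ID NO of the sequence listing.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(dna: str) -> list:
    """Split an in-frame DNA string into consecutive codons."""
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def fuse_in_frame(signal_cds: str, linker_cds: str, interest_cds: str) -> str:
    """Concatenate signal peptide, linker, and polypeptide-of-interest
    coding segments (5' to 3') into a single open reading frame."""
    segments = [s.upper() for s in (signal_cds, linker_cds, interest_cds)]
    for segment in segments:
        # Each segment must contain whole codons to preserve the frame.
        if len(segment) % 3 != 0:
            raise ValueError("segment length must be a multiple of 3")
    for segment in segments[:2]:
        # A stop codon upstream of the polypeptide of interest would
        # terminate translation before the fusion is complete.
        if any(codon in STOP_CODONS for codon in codons(segment)):
            raise ValueError("upstream segment contains a stop codon")
    return "".join(segments)


# Arbitrary placeholder segments (not sequences from the sequence listing).
fusion = fuse_in_frame("ATGAAA", "GGTTCT", "CATGCTTAA")
print(fusion)  # ATGAAAGGTTCTCATGCTTAA
```

An empty string may be passed as the linker segment for constructs without a linker.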
Promoters
The control sequence may be a promoter, a polynucleotide that is recognized by a host cell for expression of a polynucleotide encoding a polypeptide of the present invention. The promoter contains transcriptional control sequences that mediate the expression of the polypeptide. The promoter may be any polynucleotide that shows transcriptional activity in the host cell including mutant, truncated, and hybrid promoters, and may be obtained from genes encoding extracellular or intracellular polypeptides either homologous or heterologous to the host cell.
Examples of suitable promoters for directing transcription of the polynucleotide of the present invention in a bacterial host cell are described in Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Lab., NY, Davis et al., 2012, supra, and Song et al., 2016, PLOS One 11(7): e0158447.
Terminators
The control sequence may also be a transcription terminator, which is recognized by a host cell to terminate transcription. The terminator is operably linked to the 3’-terminus of the polynucleotide encoding the polypeptide. Any terminator that is functional in the host cell may be used in the present invention.
Preferred terminators for bacterial host cells may be obtained from the genes for Bacillus clausii alkaline protease (aprH), Bacillus licheniformis alpha-amylase (amyL), and Escherichia coli ribosomal RNA (rrnB).
mRNA Stabilizers
The control sequence may also be an mRNA stabilizer region downstream of a promoter and upstream of the coding sequence of a gene which increases expression of the gene.
Examples of suitable mRNA stabilizer regions are obtained from a Bacillus thuringiensis cryIIIA gene (WO 94/25612) and a Bacillus subtilis SP82 gene (Hue et al., 1995, J. Bacteriol. 177: 3465-3471).
Examples of mRNA stabilizer regions for fungal cells are described in Geisberg et al., 2014, Cell 156(4): 812-824, and in Morozov et al., 2006, Eukaryotic Cell 5(11): 1838-1846.
Leader Sequences
The control sequence may also be a leader, a non-translated region of an mRNA that is important for translation by the host cell. The leader is operably linked to the 5’-terminus of the polynucleotide encoding the polypeptide. Any leader that is functional in the host cell may be used.
Suitable leaders for bacterial host cells are described by Hambraeus et al., 2000, Microbiology 146(12): 3051-3059, and by Kaberdin and Blasi, 2006, FEMS Microbiol. Rev. 30(6): 967-979.
Polyadenylation Sequences
The control sequence may also be a polyadenylation sequence, a sequence operably linked to the 3’-terminus of the polynucleotide which, when transcribed, is recognized by the host cell as a signal to add polyadenosine residues to transcribed mRNA. Any polyadenylation sequence that is functional in the host cell may be used.
Signal Peptides
The control sequence may also be a signal peptide coding region that encodes a signal peptide linked to the N-terminus of a polypeptide and directs the polypeptide into the cell’s secretory pathway. The 5’-end of the coding sequence of the polynucleotide may inherently contain a signal peptide coding sequence naturally linked in translation reading frame with the segment of the coding sequence that encodes the polypeptide. Alternatively, the 5’-end of the coding sequence may contain a signal peptide coding sequence that is heterologous to the coding sequence. A heterologous signal peptide coding sequence may be required where the coding sequence does not naturally contain a signal peptide coding sequence. Alternatively, a heterologous signal peptide coding sequence may simply replace the natural signal peptide coding sequence to enhance secretion of the polypeptide.
Propeptides
The control sequence may also be a propeptide coding sequence that encodes a propeptide positioned at the N-terminus of a polypeptide. The resultant polypeptide is known as a proenzyme or propolypeptide (or a zymogen in some cases). A propolypeptide is generally inactive and can be converted to an active polypeptide by catalytic or autocatalytic cleavage of the propeptide from the propolypeptide. The propeptide coding sequence may be obtained from the genes for Bacillus subtilis alkaline protease (aprE), Bacillus subtilis neutral protease (nprT), Myceliophthora thermophila laccase (WO 95/33836), Rhizomucor miehei aspartic proteinase, and Saccharomyces cerevisiae alpha-factor.
Where both signal peptide and propeptide sequences are present, the propeptide sequence is positioned next to the N-terminus of a polypeptide and the signal peptide sequence is positioned next to the N-terminus of the propeptide sequence. Additionally or alternatively, when both signal peptide and propeptide sequences are present, the polypeptide may comprise only a part of the signal peptide sequence and/or only a part of the propeptide sequence. Alternatively, the final or isolated polypeptide may comprise a mixture of mature polypeptides and polypeptides which comprise, either partly or in full length, a propeptide sequence and/or a signal peptide sequence.
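The ordering described above (signal peptide at the N-terminus, followed by the propeptide, followed by the mature polypeptide) can be illustrated with a minimal Python sketch. The function name and the short amino acid strings are hypothetical placeholders chosen for illustration; they are not sequences from this disclosure.

```python
def assemble_precursor(signal_peptide: str, propeptide: str, mature: str) -> str:
    """Concatenate the segments in N-terminus to C-terminus order:
    signal peptide, then propeptide, then mature polypeptide."""
    return signal_peptide + propeptide + mature


# Hypothetical placeholder sequences (illustration only).
precursor = assemble_precursor("MKKLLA", "AEEAK", "HAEGTF")
print(precursor)
```

In this sketch the propeptide is directly adjacent to the N-terminus of the mature segment, and the signal peptide is directly adjacent to the N-terminus of the propeptide, which mirrors the arrangement stated in the text.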
Expression Vectors
In a 2nd aspect, the invention relates to expression vectors comprising a nucleic acid construct according to the 1st aspect.
The present invention also relates to recombinant expression vectors comprising a polynucleotide of the present invention, a promoter, and transcriptional and translational stop signals. The various nucleotide and control sequences may be joined together to produce a recombinant expression vector that may include one or more convenient restriction sites to allow for insertion or substitution of the polynucleotide encoding the polypeptide at such sites. Alternatively, the polynucleotide may be expressed by inserting the polynucleotide or a nucleic acid construct comprising the polynucleotide into an appropriate vector for expression. In creating the expression vector, the coding sequence is located in the vector so that the coding sequence is operably linked with the appropriate control sequences for expression.
The recombinant expression vector may be any vector (e.g., a plasmid or virus) that can be conveniently subjected to recombinant DNA procedures and can bring about expression of the polynucleotide. The choice of the vector will typically depend on the compatibility of the vector with the host cell into which the vector is to be introduced. The vector may be a linear or closed circular plasmid.
The vector may be an autonomously replicating vector, i.e., a vector that exists as an extrachromosomal entity, the replication of which is independent of chromosomal replication, e.g., a plasmid, an extrachromosomal element, a minichromosome, or an artificial chromosome. The vector may contain any means for assuring self-replication. Alternatively, the vector may be one that, when introduced into the host cell, is integrated into the genome and replicated together with the chromosome(s) into which it has been integrated. Furthermore, a single vector or plasmid or two or more vectors or plasmids that together contain the total DNA to be introduced into the genome of the host cell, or a transposon, may be used.
The vector preferably contains one or more selectable markers that permit easy selection of transformed, transfected, transduced, or the like cells. A selectable marker is a gene the product of which provides for biocide or viral resistance, resistance to heavy metals, prototrophy to auxotrophs, and the like.
The vector preferably contains at least one element that permits integration of the vector into the host cell's genome or autonomous replication of the vector in the cell independent of the genome.
For integration into the host cell genome, the vector may rely on the polynucleotide’s sequence encoding the polypeptide or any other element of the vector for integration into the genome by homologous recombination, such as homology-directed repair (HDR), or non-homologous recombination, such as non-homologous end-joining (NHEJ).
For autonomous replication, the vector may further comprise an origin of replication enabling the vector to replicate autonomously in the host cell in question. The origin of replication may be any plasmid replicator mediating autonomous replication that functions in a cell. The term “origin of replication” or “plasmid replicator” means a polynucleotide that enables a plasmid or vector to replicate in vivo.
More than one copy of a polynucleotide of the present invention may be inserted into a host cell to increase production of a polypeptide. For example, 2 or 3 or 4 or 5 or more copies are inserted into a host cell. An increase in the copy number of the polynucleotide can be obtained by integrating at least one additional copy of the sequence into the host cell genome or by including an amplifiable selectable marker gene with the polynucleotide where cells containing amplified copies of the selectable marker gene, and thereby additional copies of the polynucleotide, can be selected for by cultivating the cells in the presence of the appropriate selectable agent.
Host Cells
The present invention also relates to recombinant host cells, comprising a first polynucleotide of the present invention operably linked to one or more control sequences that direct the production of a GLP-1.
In a 3rd aspect, the invention also relates to Bacillus cells comprising in their genome one or more nucleic acid constructs according to the 1st aspect; and/or one or more expression vectors according to the 2nd aspect.
In one embodiment the cell comprises at least 2 copies of the one or more nucleic acid constructs and/or one or more expression vectors, e.g., at least 3 copies, at least 4 copies, or at least 5 copies.
In one embodiment the cell is a mutant of a parent Bacillus strain and comprises one or more protease genes selected from the group consisting of: alkaline protease (aprL), glu-specific protease (mprL), bacillopeptidase F (bprAB), a first minor extracellular serine protease (epr), a second minor extracellular serine protease (vpr), a Cell-wall associated protease (wprA), and an intracellular serine protease (ispA), wherein at least six of the one or more protease genes are modified rendering at least six proteases truncated, partly or fully inactivated, present at reduced level or eliminated compared to the parent Bacillus strain when cultivated under identical conditions.
In one embodiment the at least six modifications comprise modification of aprL, mprL, bprAB, epr, vpr, and wprA.
In one embodiment at least six proteases selected from the list of alkaline protease, glu-specific protease, bacillopeptidase F, a first minor extracellular serine protease, a second minor extracellular serine protease, and a Cell-wall associated protease are truncated, partly or fully inactivated, present at reduced level or eliminated compared to the parent Bacillus strain when cultivated under identical conditions.
In one embodiment at least seven proteases selected from the list of alkaline protease, a glu-specific protease, bacillopeptidase F, a first minor extracellular serine protease, a second minor extracellular serine protease, a Cell-wall associated protease, and an intracellular serine protease are truncated, partly or fully inactivated, present at reduced level or eliminated compared to the parent Bacillus strain when cultivated under identical conditions.
In one embodiment the mutant is deficient in the production of at least six proteases selected from the list of alkaline protease, glu-specific protease, bacillopeptidase F, first minor extracellular serine protease, second minor extracellular serine protease, and Cell-wall associated protease.
In one embodiment the mutant is deficient in the production of at least 7 proteases selected from the list of alkaline protease, glu-specific protease, bacillopeptidase F, first minor extracellular serine protease, second minor extracellular serine protease, Cell-wall associated protease, and intracellular serine protease.
In one embodiment at least 7 modifications comprise modification of aprL, mprL, bprAB, epr, vpr, wprA, and ispA.
In one embodiment the mutant is completely deficient in or produces at least 50 percent less of the alkaline protease, glu-specific protease, bacillopeptidase F, first minor extracellular serine protease, second minor extracellular serine protease, and Cell-wall associated protease compared to the parent Bacillus strain, when cultivated under identical conditions.
In one embodiment the mutant is completely deficient in or produces at least 50 percent less of the alkaline protease, glu-specific protease, bacillopeptidase F, first minor extracellular serine protease, second minor extracellular serine protease, Cell-wall associated protease, and intracellular serine protease compared to the parent Bacillus strain, when cultivated under identical conditions.
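The counting criteria in the embodiments above (at least six, or at least seven, of the listed protease genes modified; at least 50 percent less protease produced than the parent) can be expressed as a small Python sketch. The function names and the numeric protease levels are hypothetical and serve only to illustrate the logic; how a protease level is actually measured is not specified here.

```python
from typing import Iterable

# The seven protease genes named in the text.
PROTEASE_GENES = ("aprL", "mprL", "bprAB", "epr", "vpr", "wprA", "ispA")


def count_modified_proteases(modified_genes: Iterable[str]) -> int:
    """Count how many of the listed protease genes appear among the modified genes."""
    return len(set(modified_genes) & set(PROTEASE_GENES))


def meets_modification_criterion(modified_genes: Iterable[str], minimum: int = 6) -> bool:
    """True if at least `minimum` of the listed protease genes are modified."""
    return count_modified_proteases(modified_genes) >= minimum


def produces_at_least_50_percent_less(parent_level: float, mutant_level: float) -> bool:
    """True if the mutant level is at most half the parent level.
    A mutant level of 0 corresponds to complete deficiency.
    Levels are in arbitrary but identical units (an assumption of this sketch)."""
    if parent_level <= 0:
        raise ValueError("parent_level must be positive")
    return mutant_level <= 0.5 * parent_level


six = ["aprL", "mprL", "bprAB", "epr", "vpr", "wprA"]
print(meets_modification_criterion(six))             # six-gene criterion
print(meets_modification_criterion(six, minimum=7))  # seven-gene criterion
```

A strain modified in the six genes aprL, mprL, bprAB, epr, vpr, and wprA satisfies the six-gene criterion but not the seven-gene criterion, which additionally requires ispA.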
In one embodiment the aprL gene comprises or consists of a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polynucleotide sequence of SEQ ID NO: 8.
In one embodiment the mprL gene comprises or consists of a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polynucleotide sequence of SEQ ID NO: 9.
In one embodiment the bprAB gene comprises or consists of a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polynucleotide sequence of SEQ ID NO: 10.
In one embodiment the epr gene comprises or consists of a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polynucleotide sequence of SEQ ID NO: 11.
In one embodiment the vpr gene comprises or consists of a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polynucleotide sequence of SEQ ID NO: 12.
In one embodiment the wpr gene comprises or consists of a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polynucleotide sequence of SEQ ID NO: 13.
In one embodiment the ispA gene comprises or consists of a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polynucleotide sequence of SEQ ID NO: 14.
In one embodiment yield of the polypeptide of interest is increased in the mutant compared to the parent Bacillus cell when cultivated under identical conditions.
In one embodiment the yield of the polypeptide of interest is increased at least 5%, e.g., at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 100%, at least 105%, at least 110%, at least 115%, at least 120%, at least 125%, at least 130%, at least 135%, at least 140%, at least 145%, at least 150%, at least 155%, at least 160%, at least 165%, at least 170%, at least 175%, at least 180%, at least 185%, at least 190%, at least 195%, at least 200%, at least 205%, at least 210%, at least 215%, at least 220%, at least 225%, at least 230%, at least 235%, at least 240%, at least 245%, or at least 250%, relative to the yield of the parent Bacillus cell.
In one embodiment the yield of the polypeptide of interest is increased at least 100%, e.g., at least 200%, relative to the yield of the parent Bacillus cell.
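The percentage thresholds above can be made concrete with a short Python sketch. It assumes the increase is computed relative to the parent yield, as the wording "relative to the yield of the parent Bacillus cell" suggests; the function name and the example yield values are hypothetical.

```python
def percent_yield_increase(parent_yield: float, mutant_yield: float) -> float:
    """Percent increase of the mutant yield relative to the parent yield.
    Both yields must be in the same units and measured under identical conditions."""
    if parent_yield <= 0:
        raise ValueError("parent_yield must be positive")
    return (mutant_yield - parent_yield) / parent_yield * 100.0


# Hypothetical yields in arbitrary units.
print(percent_yield_increase(2.0, 4.0))  # doubled yield
print(percent_yield_increase(2.0, 6.0))  # tripled yield
```

Under this reading, an increase of at least 100% corresponds to at least a doubling of the parent yield, and an increase of at least 200% to at least a tripling.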
The host cell may be any Bacillus cell useful in the recombinant production of a polypeptide of interest.
The bacterial host cell may be any Bacillus cell including, but not limited to, Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus brevis, Bacillus circulans, Bacillus clausii, Bacillus coagulans, Bacillus firmus, Bacillus lautus, Bacillus lentus, Bacillus licheniformis, Bacillus megaterium, Bacillus pumilus, Bacillus stearothermophilus, Bacillus subtilis, and Bacillus thuringiensis cells. In an embodiment, the Bacillus cell is a Bacillus amyloliquefaciens, Bacillus licheniformis, or Bacillus subtilis cell.
For purposes of this invention, Bacillus classes/genera/species shall be defined as described in Patel and Gupta, 2020, Int. J. Syst. Evol. Microbiol. 70: 406-438.
A construct or vector comprising a polynucleotide is introduced into a host cell so that the construct or vector is maintained as a chromosomal integrant or as a self-replicating extrachromosomal vector as described earlier. The choice of a host cell will to a large extent depend upon the gene encoding the polypeptide and its source. The polypeptide can be native or heterologous to the recombinant host cell. Also, at least one of the one or more control sequences can be heterologous to the polynucleotide encoding the polypeptide. The recombinant host cell may comprise a single copy, or at least two copies, e.g., three, four, five, or more copies of the polynucleotide of the present invention.
In a 4th aspect, the invention also relates to mutants of a parent Bacillus strain comprising in its genome a heterologous promoter operably linked to a polynucleotide encoding a polypeptide of interest, and comprising in its genome one or more protease genes selected from the group consisting of: alkaline protease (aprL), glu-specific protease (mprL), bacillopeptidase F (bprAB), a first minor extracellular serine protease (epr), a second minor extracellular serine protease (vpr), a Cell-wall associated protease (wprA), and an intracellular serine protease (ispA), wherein at least six of the one or more protease genes are modified rendering at least six proteases truncated, partly or fully inactivated, present at reduced level or eliminated compared to the parent Bacillus strain when cultivated under identical conditions.
In one embodiment the at least six modifications comprise modification of aprL, mprL, bprAB, epr, vpr, and wprA.
In one embodiment at least six proteases selected from the list of alkaline protease, glu-specific protease, bacillopeptidase F, a first minor extracellular serine protease, a second minor extracellular serine protease, and a Cell-wall associated protease are truncated, partly or fully inactivated, present at reduced level or eliminated compared to the parent Bacillus strain when cultivated under identical conditions.
In one embodiment at least seven proteases selected from the list of alkaline protease, a glu-specific protease, bacillopeptidase F, a first minor extracellular serine protease, a second minor extracellular serine protease, a Cell-wall associated protease, and an intracellular serine protease are truncated, partly or fully inactivated, present at reduced level or eliminated compared to the parent Bacillus strain when cultivated under identical conditions.
In one embodiment the mutant is deficient in the production of at least six proteases selected from the list of alkaline protease, glu-specific protease, bacillopeptidase F, a first minor extracellular serine protease, a second minor extracellular serine protease, and a Cell-wall associated protease.
In one embodiment the mutant is deficient in the production of at least 7 proteases selected from the list of alkaline protease, a glu-specific protease, bacillopeptidase F, a first minor extracellular serine protease, a second minor extracellular serine protease, a Cell-wall associated protease, and an intracellular serine protease.
In one embodiment the at least 7 modifications comprise modification of aprL, mprL, bprAB, epr, vpr, wprA, and ispA.
In one embodiment the mutant is completely deficient in or produces at least 50 percent less of the alkaline protease, glu-specific protease, bacillopeptidase F, first minor extracellular serine protease, second minor extracellular serine protease, and Cell-wall associated protease compared to the parent Bacillus strain, when cultivated under identical conditions.
In one embodiment the mutant is completely deficient in or produces at least 50 percent less of the alkaline protease, glu-specific protease, bacillopeptidase F, first minor extracellular serine protease, second minor extracellular serine protease, Cell-wall associated protease, and intracellular serine protease compared to the parent Bacillus strain, when cultivated under identical conditions.
In one embodiment protease activity of each of the native proteases encoded by aprL, mprL, bprAB, epr, vpr, and wprA is reduced or eliminated.
In one embodiment protease activity of each of the native proteases encoded by aprL, mprL, bprAB, epr, vpr, wprA, and ispA is reduced or eliminated.
In one embodiment the protease activity is reduced or eliminated by substitution, insertion, or deletion of one or more nucleotides; and/or wherein protease activity is reduced or eliminated by full and/or partial gene deletion.
In one embodiment the mutant comprises at least 2 copies of the heterologous promoter operably linked to the polynucleotide encoding the polypeptide of interest, e.g., at least 3 copies, at least 4 copies, or at least 5 copies.
In one embodiment the polynucleotide encoding the polypeptide of interest is operably linked to a polynucleotide encoding a signal peptide.
In one embodiment the polynucleotide encoding the polypeptide of interest and/or the polynucleotide encoding the signal peptide is operably linked to one or more promoter.
In one embodiment the aprL gene comprises or consists of a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polynucleotide sequence of SEQ ID NO: 8.
In one embodiment the mprL gene comprises or consists of a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polynucleotide sequence of SEQ ID NO: 9.
In one embodiment the bprAB gene comprises or consists of a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polynucleotide sequence of SEQ ID NO: 10.
In one embodiment the epr gene comprises or consists of a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polynucleotide sequence of SEQ ID NO: 11.
In one embodiment the vpr gene comprises or consists of a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polynucleotide sequence of SEQ ID NO: 12.
In one embodiment the wpr gene comprises or consists of a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polynucleotide sequence of SEQ ID NO: 13.
In one embodiment the ispA gene comprises or consists of a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polynucleotide sequence of SEQ ID NO: 14.
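To illustrate how a percent sequence identity threshold such as "at least 60%" might be checked, the following Python sketch computes identity over two sequences that have already been aligned. It does not reproduce Sequence Identity Determination Method 1 referred to in this document; the formula used (identical positions divided by the number of alignment columns that contain no gap) is an assumption of this sketch, and the sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over a pairwise alignment of equal-length strings.
    Columns containing a gap in either sequence are excluded from the denominator
    (an assumption of this sketch, not a definition taken from the document)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a.upper() == b.upper():
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def meets_identity_threshold(identity_percent: float, threshold_percent: float) -> bool:
    """True if the identity is at least the stated threshold."""
    return identity_percent >= threshold_percent


# Hypothetical aligned fragments: 7 ungapped columns, 6 of them identical.
identity = percent_identity("ATGCA-TT", "ATGGACTT")
print(round(identity, 1), meets_identity_threshold(identity, 60.0))
```

In practice the alignment itself would be produced by a dedicated alignment tool before a calculation of this kind is applied.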
In one embodiment the cell or mutant further comprises one or more of the genes selected from the list of amyL, bglC, catL, cypX, forD, ggt, gntP, lacA2, sacB, spoIIAC, and xylA, wherein one or more of the genes are modified rendering their respective gene products (e.g., polypeptides) truncated, partly or fully inactivated, present at reduced level or eliminated compared to the parent Bacillus strain when cultivated under identical conditions.
In one embodiment the cell or mutant further comprises one or more of the genes selected from the list of amyL, bglC, catL, cypX, forD, ggt, gntP, lacA2, sacB, spoIIAC, and xylA, wherein 11 of the one or more genes are modified rendering their respective gene products (e.g., polypeptides) truncated, partly or fully inactivated, present at reduced level or eliminated compared to the parent Bacillus strain when cultivated under identical conditions.
In one embodiment yield of the polypeptide of interest is increased compared to the parent Bacillus cell when cultivated under identical conditions.
In one embodiment the yield of the polypeptide of interest is increased at least 5%, e.g., at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 100%, at least 105%, at least 110%, at least 115%, at least 120%, at least 125%, at least 130%, at least 135%, at least 140%, at least 145%, at least 150%, at least 155%, at least 160%, at least 165%, at least 170%, at least 175%, at least 180%, at least 185%, at least 190%, at least 195%, at least 200%, at least 205%, at least 210%, at least 215%, at least 220%, at least 225%, at least 230%, at least 235%, at least 240%, at least 245%, or at least 250%, relative to the yield of the parent Bacillus cell.
In one embodiment the yield of the polypeptide of interest is increased at least 100%, e.g., at least 200%, relative to the yield of the parent Bacillus cell.
In one embodiment the cell or mutant is a Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus brevis, Bacillus circulans, Bacillus clausii, Bacillus coagulans, Bacillus firmus, Bacillus lautus, Bacillus lentus, Bacillus licheniformis, Bacillus megaterium, Bacillus pumilus, Bacillus stearothermophilus, Bacillus subtilis, or Bacillus thuringiensis cell.
In one embodiment the cell or mutant is a Bacillus subtilis or a Bacillus licheniformis cell.
In one embodiment the polypeptide of interest is heterologous to the Bacillus cell.
In one embodiment the polypeptide of interest comprises or consists of a protease-sensitive polypeptide.
In one embodiment the polypeptide of interest comprises or consists of an enzyme, e.g., an enzyme selected from the list of a hydrolase, isomerase, ligase, lyase, oxidoreductase, or transferase, e.g., an aminopeptidase, amylase, carbohydrase, carboxypeptidase, catalase, cellobiohydrolase, cellulase, chitinase, cutinase, cyclodextrin glycosyltransferase, deoxyribonuclease, endoglucanase, esterase, alpha-galactosidase, beta-galactosidase, glucoamylase, alpha-glucosidase, beta-glucosidase, invertase, laccase, lipase, mannosidase, mutanase, oxidase, pectinolytic enzyme, peroxidase, phytase, polyphenoloxidase, proteolytic enzyme, ribonuclease, transglutaminase, xylanase, or beta-xylosidase; or a therapeutic polypeptide, e.g., a therapeutic polypeptide selected from the list of a GLP-1 receptor agonist, an antibody, an antibody fragment, an antibody-based drug, an Fc fusion protein, an anticoagulant, a blood factor, a bone morphogenetic protein, an engineered protein scaffold, an enzyme, a growth factor, a blood clotting factor, a hormone, an interferon (such as an interferon alpha-2b), an interleukin, a lactoferrin, an alpha-lactalbumin, a beta-lactalbumin, an ovomucoid, an ovostatin, a cytokine, an obestatin, a human galactosidase (such as a human alpha-galactosidase A), a vaccine, a protein vaccine, and a thrombolytic.
In one embodiment the polypeptide of interest comprises or consists of a GLP-1 receptor agonist, a lactase, an amylase, or an alternansucrase.
In one embodiment the polypeptide of interest comprises or consists of a GLP-1 receptor agonist, said GLP-1 receptor agonist comprising or consisting of: a) an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 3, 4, 5, 6, or 7; and/or b) a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 3, 4, 5, 6, or 7, wherein the three-dimensional structure is calculated by AlphaFold.
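For orientation, the TM-score referred to above is commonly computed as the sum of 1/(1 + (d_i/d0)^2) over aligned residue pairs, divided by the length of the reference structure, where d_i is the distance between the i-th pair of aligned C-alpha atoms after superposition and d0 is a length-dependent scale. The Python sketch below evaluates that sum for a given set of distances. It is a simplified illustration only: it assumes the structures have already been predicted and optimally superimposed by dedicated tools, and the 0.5 angstrom floor on d0 for short chains is an assumption modeled on common implementations, not a detail stated in this document.

```python
from typing import Sequence


def tm_d0(l_target: int) -> float:
    """Length-dependent distance scale d0 (in angstroms).
    The floor of 0.5 for short chains is an assumption of this sketch."""
    if l_target <= 21:
        return 0.5
    return max(0.5, 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances: Sequence[float], l_target: int) -> float:
    """TM-score for one fixed superposition.
    `distances` holds the C-alpha distances of the aligned residue pairs;
    `l_target` is the number of residues in the reference structure."""
    if l_target <= 0:
        raise ValueError("l_target must be positive")
    d0 = tm_d0(l_target)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_target


# Hypothetical 30-residue reference with every aligned pair perfectly superimposed.
print(tm_score([0.0] * 30, 30))
```

A score of 1.0 arises only when every residue of the reference is aligned at zero distance; unaligned residues contribute nothing to the sum, so aligning 27 of 30 residues perfectly gives 0.9 in this sketch.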
In one embodiment the polypeptide of interest comprises or consists of a fusion polypeptide according to the 6th aspect.
In one embodiment the polypeptide of interest does not comprise a cutinase, and/or does not comprise a disperin, e.g., a disperin having hexosaminidase activity.
In one embodiment at least one of the one or more control sequences is heterologous to the polynucleotide encoding the polypeptide of interest.
In one embodiment the cell or mutant is isolated.
In one embodiment the cell or mutant is purified.
Methods for introducing DNA into prokaryotic host cells are well-known in the art, and any suitable method can be used including but not limited to protoplast transformation, competent cell transformation, electroporation, conjugation, transduction, with DNA introduced as linearized or as circular polynucleotide. Persons skilled in the art will be readily able to identify a suitable method for introducing DNA into a given prokaryotic cell depending, e.g., on the genus. Methods for introducing DNA into prokaryotic host cells are for example described in Heinze et al., 2018, BMC Microbiology 18:56, Burke et al., 2001, Proc. Natl. Acad. Sci. USA 98: 6289-6294, Choi et al., 2006, J. Microbiol. Methods 64: 391-397, and Donald et al., 2013, J. Bacteriol. 195(11): 2612-2620.
Methods of Production
In a 5th aspect, the invention relates to methods of producing a polypeptide of interest, the method comprising cultivating a host cell or mutant according to the 3rd aspect and/or 4th aspect under conditions conducive for production of the polypeptide of interest; and optionally recovering the polypeptide of interest.
The present invention also relates to methods of producing a polypeptide of interest, comprising (a) cultivating a mutant according to the 3rd aspect, under conditions conducive for production of the polypeptide of interest; and optionally, (b) recovering the polypeptide of interest. In one aspect, the cell is a Bacillus cell. In another aspect, the cell is a Bacillus licheniformis cell. In another aspect, the cell is a Bacillus subtilis cell.
The host cell is cultivated in a nutrient medium suitable for production of the polypeptide using methods known in the art.
In one embodiment step a) comprises cultivation in a cultivation medium comprising 2-aminoisobutyric acid (aib).
For example, the cell may be cultivated by shake flask cultivation, or small-scale or large-scale fermentation (including continuous, batch, fed-batch, or solid-state, and/or microcarrier-based fermentations) in laboratory or industrial fermentors in a suitable medium and under conditions allowing the polypeptide to be expressed and/or isolated.
In one embodiment the host cell or mutant is cultivated in continuous fermentation, batch fermentation, or fed-batch fermentation.
In one embodiment the host cell or mutant is cultivated in continuous fermentation, preferably wherein the continuous fermentation has a duration of at least 96 hours, e.g., at least 120 hours, at least 144 hours, at least 168 hours, at least 192 hours, at least 216 hours, at least 240 hours, at least 264 hours, at least 288 hours, at least 312 hours, or at least 336 hours.
Suitable media are available from commercial suppliers or may be prepared according to published compositions (e.g., in catalogues of the American Type Culture Collection). If the polypeptide is secreted into the nutrient medium, the polypeptide can be recovered directly from the medium. If the polypeptide is not secreted, it can be recovered from cell lysates.
The polypeptide may be detected using methods known in the art that are specific for the polypeptide, including, but not limited to, the use of specific antibodies, formation of an enzyme product, disappearance of an enzyme substrate, or an assay determining the relative or specific activity of the polypeptide.
The polypeptide may be recovered from the medium using methods known in the art, including, but not limited to, collection, centrifugation, filtration, extraction, spray-drying, evaporation, or precipitation. In one aspect, a whole fermentation broth comprising the polypeptide is recovered. In another aspect, a cell-free fermentation broth comprising the polypeptide is recovered.
The polypeptide may be purified by a variety of procedures known in the art to obtain substantially pure polypeptides and/or polypeptide fragments (see, e.g., Wingfield, 2015, Current Protocols in Protein Science, 80(1): 6.1.1-6.1.35; Labrou, 2014, Protein Downstream Processing, 1129: 3-10).
In an alternative aspect, the polypeptide is not recovered.
Fusion Polypeptides
In a 6th aspect, the invention relates to fusion polypeptides comprising:
a) a signal peptide comprising or consisting of:
(i) an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 1, 16, or 18, and/or
(ii) a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 1, 16, or 18, wherein the three-dimensional structure is calculated by Alphafold; and
b) a GLP-1 receptor agonist comprising or consisting of
(i) an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 3, 4, 5, 6, or 7; and/or
(ii) a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 3, 4, 5, 6, or 7, wherein the three-dimensional structure is calculated by Alphafold.
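As a rough illustration of the TM-score thresholds recited above, the following minimal sketch evaluates the conventional TM-score sum for a set of aligned residue pairs that have already been superposed. It is a sketch under stated assumptions, not the procedure used in this specification: the function name, the input format (per-residue distances in angstroms), and the lower bound of 0.5 on the distance scale d0 (a convention found in common implementations, relevant for short peptides) are all assumptions. A full TM-score calculation additionally searches for the superposition that maximizes the sum, which is omitted here.

```python
def tm_score(distances, target_length, d0_floor=0.5):
    """TM-score sum for pre-superposed, aligned residue pairs.

    distances     -- distance in angstroms for each aligned residue pair
    target_length -- length of the reference (target) polypeptide
    d0_floor      -- assumed lower bound for the distance scale d0
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    if target_length > 15:
        # Conventional length-dependent distance scale.
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        # The formula is undefined or negative for very short chains.
        d0 = d0_floor
    d0 = max(d0, d0_floor)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


def meets_threshold(score, threshold=0.90):
    """True if a TM-score satisfies an 'at least' threshold such as 0.90."""
    return score >= threshold
```

Because the sum is normalized by the length of the reference polypeptide, unaligned residues lower the score even when the aligned residues superpose perfectly.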
In one embodiment the GLP-1 receptor agonist comprises or consists of an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 3, 4, 5, 6, or 7.
In one embodiment the signal peptide comprises or consists of an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 1, 16, or 18.
In one embodiment the GLP-1 receptor agonist comprises or consists of a) a polypeptide derived from SEQ ID NO: 3, 4, 5, 6, or 7, by having 1-30 alterations (e.g., substitutions, deletions and/or insertions at one or more positions, e.g., 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26 or 27 or 28 or 29 or 30 alterations, in particular substitutions;
(b) a polypeptide derived from the polypeptide of a), wherein the N- and/or C-terminal end has been extended by addition of one or more amino acids; and/or
(c) a fragment of the polypeptide of a), or b).
In one embodiment the signal peptide comprises or consists of a) a polypeptide derived from SEQ ID NO: 1, 16, or 18, by having 1-30 alterations (e.g., substitutions, deletions and/or insertions at one or more positions, e.g., 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26 or 27 or 28 or 29 or 30 alterations, in particular substitutions;
(b) a polypeptide derived from the polypeptide of a), wherein the N- and/or C-terminal end has been extended by addition of one or more amino acids; and/or
(c) a fragment of the polypeptide of a), or b).
In one embodiment the fusion polypeptide comprises a linker.
In one embodiment the linker is located at the N-terminal end of the signal peptide, and preferably at the C-terminal end of the GLP-1 receptor agonist.
In one embodiment the linker is a cleavable linker, e.g., for cleavage with a protease.
In one embodiment the linker comprises or consists of: an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 2.
In one embodiment the linker comprises or consists of a polypeptide with the amino acid sequence of SEQ ID NO: 2.
In one embodiment the linker comprises or consists of a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 2, wherein the three-dimensional structure is calculated by Alphafold.
In one embodiment the linker comprises or consists of a) a polypeptide derived from SEQ ID NO: 2, by having 1-15 alterations (e.g., substitutions, deletions and/or insertions at one or more positions, e.g., 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 alterations, in particular substitutions;
(b) a polypeptide derived from the polypeptide of a), wherein the N- and/or C-terminal end has been extended by addition of one or more amino acids; and/or
(c) a fragment of the polypeptide of a), or b).
In one embodiment the fusion polypeptide comprises or consists of an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of any one of SEQ ID NO: 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30.
In one embodiment the fusion polypeptide comprises or consists of a polypeptide with the amino acid sequence of any one of SEQ ID NO: 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30.
In one embodiment the fusion polypeptide comprises or consists of a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of any one of SEQ ID NO: 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30, wherein the three-dimensional structure is calculated by Alphafold.
In one embodiment the fusion polypeptide comprises or consists of a) a polypeptide derived from any one of SEQ ID NO: 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30, by having 1-30 alterations (e.g., substitutions, deletions and/or insertions at one or more positions, e.g., 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26 or 27 or 28 or 29 or 30 alterations, in particular substitutions;
(b) a polypeptide derived from the polypeptide of a), wherein the N- and/or C-terminal end has been extended by addition of one or more amino acids; and/or
(c) a fragment of the polypeptide of a), or b).
In one embodiment the fusion polypeptide comprises or consists of an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 19.
In one embodiment the fusion polypeptide comprises or consists of a polypeptide with the amino acid sequence of SEQ ID NO: 19.
In one embodiment the fusion polypeptide comprises or consists of a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 19, wherein the three-dimensional structure is calculated by Alphafold.
In one embodiment the fusion polypeptide comprises or consists of a) a polypeptide derived from SEQ ID NO: 19, by having 1-30 alterations (e.g., substitutions, deletions and/or insertions at one or more positions, e.g., 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26 or 27 or 28 or 29 or 30 alterations, in particular substitutions;
(b) a polypeptide derived from the polypeptide of a), wherein the N- and/or C-terminal end has been extended by addition of one or more amino acids; and/or
(c) a fragment of the polypeptide of a), or b).
In one embodiment the fusion polypeptide comprises or consists of an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 22.
In one embodiment the fusion polypeptide comprises or consists of a polypeptide with the amino acid sequence of SEQ ID NO: 22.
In one embodiment the fusion polypeptide comprises or consists of a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 22, wherein the three-dimensional structure is calculated by Alphafold.
In one embodiment the fusion polypeptide comprises or consists of a) a polypeptide derived from SEQ ID NO: 22, by having 1-30 alterations (e.g., substitutions, deletions and/or insertions at one or more positions, e.g., 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26 or 27 or 28 or 29 or 30 alterations, in particular substitutions;
(b) a polypeptide derived from the polypeptide of a), wherein the N- and/or C-terminal end has been extended by addition of one or more amino acids; and/or
(c) a fragment of the polypeptide of a), or b).
In one embodiment the fusion polypeptide comprises or consists of an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 25.
In one embodiment the fusion polypeptide comprises or consists of a polypeptide with the amino acid sequence of SEQ ID NO: 25.
In one embodiment the fusion polypeptide comprises or consists of a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 25, wherein the three-dimensional structure is calculated by Alphafold.
In one embodiment the fusion polypeptide comprises or consists of a) a polypeptide derived from SEQ ID NO: 25, by having 1-30 alterations (e.g., substitutions, deletions and/or insertions at one or more positions, e.g., 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26 or 27 or 28 or 29 or 30 alterations, in particular substitutions;
(b) a polypeptide derived from the polypeptide of a), wherein the N- and/or C-terminal end has been extended by addition of one or more amino acids; and/or
(c) a fragment of the polypeptide of a), or b).
In one embodiment the fusion polypeptide comprises or consists of an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 28.
In one embodiment the fusion polypeptide comprises or consists of a polypeptide with the amino acid sequence of SEQ ID NO: 28.
In one embodiment the fusion polypeptide comprises or consists of a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 28, wherein the three-dimensional structure is calculated by Alphafold.
In one embodiment the fusion polypeptide comprises or consists of a) a polypeptide derived from SEQ ID NO: 28, by having 1-30 alterations (e.g., substitutions, deletions and/or insertions at one or more positions, e.g., 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26 or 27 or 28 or 29 or 30 alterations, in particular substitutions;
(b) a polypeptide derived from the polypeptide of a), wherein the N- and/or C-terminal end has been extended by addition of one or more amino acids; and/or
(c) a fragment of the polypeptide of a), or b).
In one embodiment the fusion polypeptide is isolated.
In one embodiment the fusion polypeptide is purified.
In one embodiment sequence identity of the fusion polypeptide is determined by Sequence Identity Determination Method 1.
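Sequence Identity Determination Method 1 is defined elsewhere in the specification and is not reproduced in this section. Purely to illustrate how an "at least 80%" style threshold can be checked, the sketch below computes a generic percent identity over two sequences that have already been aligned (gaps written as "-"), counting identical residues over the aligned columns that contain no gap. This counting convention, the function names, and the gap symbol are assumptions and may differ from Method 1.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    compared = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # columns containing a gap are excluded (assumption)
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


def meets_identity(aligned_a, aligned_b, threshold=80.0):
    """True if the percent identity is at least the given threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```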
The fusion polypeptide may have an N-terminal and/or C-terminal extension of one or more amino acids, e.g., 1-5 amino acids.
In another aspect, the fusion polypeptide is a fragment containing at least 31 amino acid residues (e.g., amino acids 43 to 73 of SEQ ID NO: 19, 22, 25, or 28), at least 29 amino acid residues (e.g., amino acids 43 to 71 of SEQ ID NO: 19, 22, 25, or 28), or at least 28 amino acid residues (e.g., amino acids 44 to 71 of SEQ ID NO: 19, 22, 25, or 28).
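The residue counts above follow from inclusive, 1-based numbering: a fragment spanning positions first to last contains last - first + 1 residues. The short sketch below checks the three ranges quoted in the text and shows how such a fragment could be cut from a sequence string. The function names are hypothetical, and no actual sequence from the sequence listing is used.

```python
def inclusive_length(first, last):
    """Number of residues in a fragment numbered first..last (1-based, inclusive)."""
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    return last - first + 1


def fragment(sequence, first, last):
    """Return residues first..last (1-based, inclusive) of a sequence string."""
    if last > len(sequence):
        raise ValueError("fragment extends past the end of the sequence")
    inclusive_length(first, last)  # validates the range
    return sequence[first - 1:last]


# Ranges and residue counts as stated in the text above.
STATED_FRAGMENTS = {(43, 73): 31, (43, 71): 29, (44, 71): 28}
```

Each stated count agrees with the inclusive-range arithmetic, for example 73 - 43 + 1 = 31.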
In another aspect, the fusion polypeptide has at most 5%, e.g., at most 4%, at most 3%, at most 2%, or at most 1% sequence differences to the polypeptide of any one of SEQ ID NO: 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 wherein the fusion polypeptide has GLP-1 receptor agonist activity.
In some embodiments, the fusion polypeptide has a molecular weight of 3.5-10 kD, e.g., 4-8 kD, such as 4.5-7 kD.
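A molecular weight range of this kind can be checked approximately from an amino acid sequence by summing average residue masses and adding one water molecule for the free termini. The sketch below does this for the 20 standard amino acids. The mass table holds commonly tabulated average values, the function names are hypothetical, and modifications, non-standard residues, and counter-ions are ignored, so the result is an estimate only.

```python
# Average residue masses in daltons (commonly tabulated values).
RESIDUE_MASS_DA = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_DA = 18.01528


def molecular_weight_da(sequence):
    """Approximate average molecular weight of an unmodified linear peptide."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    try:
        residues = sum(RESIDUE_MASS_DA[aa] for aa in sequence.upper())
    except KeyError as err:
        raise ValueError(f"unsupported residue: {err.args[0]}") from None
    return residues + WATER_DA


def in_range_kd(weight_da, low_kd=3.5, high_kd=10.0):
    """True if a weight in daltons lies within [low_kd, high_kd] kilodaltons."""
    return low_kd * 1000.0 <= weight_da <= high_kd * 1000.0
```

The narrower ranges in the text (4-8 kD, 4.5-7 kD) can be checked by passing different bounds to `in_range_kd`.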
In another aspect, the fusion polypeptide is derived from any one of SEQ ID NO: 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 by substitution, deletion or addition of one or several amino acids. In another aspect, the fusion polypeptide is derived from a mature polypeptide of any one of SEQ ID NO: 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 by substitution, deletion or addition of one or several amino acids. In another aspect, the fusion polypeptide is derived from any one of SEQ ID NO: 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 by substitution, deletion or addition of one or more amino acids. In some embodiments, the fusion polypeptide is a variant of any one of SEQ ID NO: 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 comprising a substitution, deletion, and/or insertion at one or more positions. In one aspect, the number of amino acid substitutions, deletions and/or insertions introduced into the fusion polypeptide of any one of SEQ ID NO: 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 is up to 15, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15. The amino acid changes may be of a minor nature, that is conservative amino acid substitutions or insertions that do not significantly affect the folding and/or activity of the protein; small deletions, typically of 1-30 amino acids; small amino- or carboxyl-terminal extensions, such as an amino-terminal methionine residue; a small linker peptide of up to 20-25 residues; or a small extension that facilitates purification by changing net charge or another function, such as a poly-histidine tract, an antigenic epitope or a binding module. Essential amino acids in a polypeptide can be identified according to procedures known in the art, such as site-directed mutagenesis or alanine-scanning mutagenesis (Cunningham and Wells, 1989, Science 244: 1081-1085). In the latter technique, single alanine mutations are introduced at every residue in the molecule, and the resultant molecules are tested for GLP-1 receptor agonist activity to identify amino acid residues that are critical to the activity of the molecule. See also, Hilton et al., 1996, J. Biol. Chem. 271: 4699-4708. The active site of the enzyme or other biological interaction can also be determined by physical analysis of structure, as determined by such techniques as nuclear magnetic resonance, crystallography, electron diffraction, or photoaffinity labeling, in conjunction with mutation of putative contact site amino acids. See, for example, de Vos et al., 1992, Science 255: 306-312; Smith et al., 1992, J. Mol. Biol. 224: 899-904; Wlodawer et al., 1992, FEBS Lett. 309: 59-64. The identity of essential amino acids can also be inferred from an alignment with a related polypeptide, and/or be inferred from sequence homology and conserved catalytic machinery with a related polypeptide or within a polypeptide or protein family with polypeptides/proteins descending from a common ancestor, typically having similar three-dimensional structures, functions, and significant sequence similarity. Additionally or alternatively, protein structure prediction tools can be used for protein structure modelling to identify essential amino acids and/or active sites of polypeptides. See, for example, Jumper et al., 2021, “Highly accurate protein structure prediction with AlphaFold”, Nature 596: 583-589.
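The alanine-scanning design described above can be sketched as a simple enumeration: for each position in a sequence, produce the variant in which that residue is replaced by alanine. The sketch below is illustrative only. The function name and the variant labels (original residue, 1-based position, "A") are assumptions, and positions that already hold alanine are skipped because replacing them would change nothing. Testing each variant for GLP-1 receptor agonist activity is an experimental step outside the scope of the code.

```python
def alanine_scan(sequence):
    """Enumerate single-alanine variants of a peptide sequence.

    Returns a list of (label, variant_sequence) tuples, where the label has
    the form <original residue><1-based position>A, e.g. "K3A".
    """
    variants = []
    for index, residue in enumerate(sequence):
        if residue == "A":
            continue  # already alanine; no informative substitution
        mutated = sequence[:index] + "A" + sequence[index + 1:]
        variants.append((f"{residue}{index + 1}A", mutated))
    return variants
```

For a sequence of length n containing k alanine residues, this yields n - k variants, each differing from the parent at exactly one position.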
Single or multiple amino acid substitutions, deletions, and/or insertions can be made and tested using known methods of mutagenesis, recombination, and/or shuffling, followed by a relevant screening procedure, such as those disclosed by Reidhaar-Olson and Sauer, 1988, Science 241: 53-57; Bowie and Sauer, 1989, Proc. Natl. Acad. Sci. USA 86: 2152-2156; WO 95/17413; or WO 95/22625. Other methods that can be used include error-prone PCR, phage display (e.g., Lowman et al., 1991, Biochemistry 30: 10832-10837; US 5,223,409; WO 92/06204), and region-directed mutagenesis (Derbyshire et al., 1986, Gene 46: 145; Ner et al., 1988, DNA 7: 127).
Mutagenesis/shuffling methods can be combined with high-throughput, automated screening methods to detect activity of cloned, mutagenized polypeptides expressed by host cells (Ness et al., 1999, Nature Biotechnology 17: 893-896). Mutagenized DNA molecules that encode active polypeptides can be recovered from the host cells and rapidly sequenced using standard methods in the art. These methods allow the rapid determination of the importance of individual amino acid residues in a polypeptide.
In one aspect, the invention relates to polynucleotides encoding the fusion polypeptide of the 6th aspect. In one embodiment the polynucleotide is isolated. In one embodiment, the polynucleotide is purified.
Fermentation Broth Formulations or Cell Compositions
In a 7th aspect, the invention relates to a composition, cell composition, or fermentation broth comprising the fusion polypeptide of the 6th aspect. The composition may comprise an enzyme stabilizer (examples of which include polyols such as propylene glycol or glycerol, sugar or sugar alcohol, lactic acid, reversible protease inhibitor, boric acid, or a boric acid derivative, e.g., an aromatic borate ester, or a phenyl boronic acid derivative such as 4-formylphenyl boronic acid).
The present invention also relates to a fermentation broth formulation or a cell composition comprising a fusion polypeptide of the 6th aspect. The fermentation broth formulation or the cell composition further comprises additional ingredients used in the fermentation process, such as, for example, cells (including the host cells containing the gene encoding the fusion polypeptide which are used to produce the polypeptide of interest), cell debris, biomass, fermentation media and/or fermentation products. In some embodiments, the composition is a cell-killed whole broth containing organic acid(s), killed cells and/or cell debris, and culture medium.
The term "fermentation broth" as used herein refers to a preparation produced by cellular fermentation that undergoes no or minimal recovery and/or purification. For example, fermentation broths are produced when microbial cultures are grown to saturation, incubated under carbon-limiting conditions to allow protein synthesis (e.g., expression of enzymes by host cells) and secretion into cell culture medium. The fermentation broth can contain unfractionated or fractionated contents of the fermentation materials derived at the end of the fermentation. Typically, the fermentation broth is unfractionated and comprises the spent culture medium and cell debris present after the microbial cells (e.g., Bacillus cells) are removed, e.g., by centrifugation. In some embodiments, the fermentation broth contains spent cell culture medium, extracellular enzymes, and viable and/or nonviable microbial cells.
In some embodiments, the fermentation broth formulation or the cell composition comprises a first organic acid component comprising at least one 1-5 carbon organic acid and/or a salt thereof and a second organic acid component comprising at least one 6 or more carbon organic acid and/or a salt thereof. In some embodiments, the first organic acid component is acetic acid, formic acid, propionic acid, a salt thereof, or a mixture of two or more of the foregoing and the second organic acid component is benzoic acid, cyclohexanecarboxylic acid, 4-methylvaleric acid, phenylacetic acid, a salt thereof, or a mixture of two or more of the foregoing.
In one aspect, the composition contains an organic acid(s), and optionally further contains killed cells and/or cell debris. In some embodiments, the killed cells and/or cell debris are removed from a cell-killed whole broth to provide a composition that is free of these components.
The fermentation broth formulation or cell composition may further comprise a preservative and/or anti-microbial (e.g., bacteriostatic) agent, including, but not limited to, sorbitol, sodium chloride, potassium sorbate, and others known in the art. The cell-killed whole broth or cell composition may contain the unfractionated contents of the fermentation materials derived at the end of the fermentation. Typically, the cell-killed whole broth or cell composition contains the spent culture medium and cell debris present after the microbial cells (e.g., Bacillus cells) are grown to saturation, incubated under carbon-limiting conditions to allow protein synthesis. In some embodiments, the cell-killed whole broth or cell composition contains the spent cell culture medium, extracellular enzymes, and killed Bacillus cells. In some embodiments, the microbial cells present in the cell-killed whole broth or composition can be permeabilized and/or lysed using methods known in the art.
A whole broth or cell composition as described herein is typically a liquid, but may contain insoluble components, such as killed cells, cell debris, culture media components, and/or insoluble enzyme(s). In some embodiments, insoluble components may be removed to provide a clarified liquid composition.
The whole broth formulations and cell compositions of the present invention may be produced by a method described in WO 90/15861 or WO 2010/096673.
The present invention is further described by the following examples that should not be construed as limiting the scope of the invention.
Examples
Molecular biological methods
DNA manipulations and transformations were performed by standard molecular biology methods as described in:
• Sambrook et al. (1989): Molecular cloning: A laboratory manual. Cold Spring Harbor laboratory, Cold Spring Harbor, NY.
• Ausubel et al. (eds) (1995): Current protocols in Molecular Biology. John Wiley and Sons.
• Harwood and Cutting (eds) (1990): Molecular Biological Methods for Bacillus. John Wiley and Sons.
Enzymes for DNA manipulation were obtained from New England Biolabs, Inc. and used essentially as recommended by the supplier.
Direct transformation into B. licheniformis was done as previously described in US 2019/0185847 A1. Conjugation into B. licheniformis was performed as described in WO 2018/077796 A1.
Genomic DNA was prepared by using the commercially available QIAamp DNA Blood Kit from Qiagen. The respective DNA fragments were amplified by PCR using the Phusion Hot Start DNA Polymerase system (Thermo Scientific). PCR amplification reaction mixtures contained 1 µL (0.1 µg) of template DNA, 1 µL of sense primer (20 pmol/µL), 1 µL of anti-sense primer (20 pmol/µL), 10 µL of 5X PCR buffer with 7.5 mM MgCl2, 8 µL of dNTP mix (1.25 mM each), 39 µL water, and 0.5 µL (2 U) DNA polymerase. A thermocycler was used to amplify the fragment. The PCR products were purified from a 1.2% agarose gel with 1x TBE buffer using the Qiagen QIAquick Gel Extraction Kit (Qiagen, Inc., Valencia, CA) according to the manufacturer's instructions.
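The reaction mixture above lends itself to a simple dilution calculation: the final concentration of a component is its stock concentration multiplied by its volume and divided by the total reaction volume. The sketch below tabulates the listed volumes, taking all of them to be in microlitres (an assumption about the units), and provides helpers for the total volume and for final concentrations. The dictionary keys and function names are hypothetical.

```python
# Component volumes as listed in the reaction mixture, assumed to be in microlitres.
REACTION_UL = {
    "template DNA": 1.0,
    "sense primer": 1.0,
    "anti-sense primer": 1.0,
    "5X PCR buffer": 10.0,
    "dNTP mix": 8.0,
    "water": 39.0,
    "DNA polymerase": 0.5,
}


def total_volume_ul(mix):
    """Total reaction volume in microlitres."""
    return sum(mix.values())


def final_concentration(stock_concentration, component, mix):
    """Final concentration after dilution into the full reaction volume.

    The result has the same unit as stock_concentration
    (e.g. pmol/µL for primers, mM for each dNTP, X for the buffer).
    """
    return stock_concentration * mix[component] / total_volume_ul(mix)
```

For example, `final_concentration(20.0, "sense primer", REACTION_UL)` gives the primer concentration in pmol/µL in the assembled reaction.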
The conditions for POE-PCR were as follows: purified PCR products were used in a subsequent PCR reaction to create a single fragment using splice overlapping PCR (SOE) with the Phusion Hot Start DNA Polymerase system (Thermo Scientific). The 5' end fragment and the 3' end fragment have complementary ends, which allow the SOE product to concatemerize into the POE-PCR product. The PCR amplification reaction mixture contained 50 ng of each of the three gel-purified PCR products. POE-PCR was performed as described in You, C. et al. (2017), Methods Mol. Biol. 116, 183-92.
Media
Bacillus strains were grown on LB agar (10 g/L Tryptone, 5 g/L yeast extract, 5 g/L NaCl, 15 g/L agar) plates or in TY liquid medium (20 g/L Tryptone, 5 g/L yeast extract, 7 mg/L FeCl2, 1 mg/L MnCl2, 15 mg/L MgCl2). To select for erythromycin resistance, agar and liquid media were supplemented with 5 µg/ml erythromycin.
LB agar: 10 g/l peptone from casein; 5 g/l yeast extract, 10 g/l sodium chloride; 12 g/l Bacto-agar adjusted to pH 7.0 +/- 0.2. Premix from Merck was used (LB-agar (Miller) 110283).
Fermentations
For Biolector fermentations (examples 2 to 4), strains were fermented in flower plates (MTP-48-B) in 1 mL TY media for 24 hr at 37°C, 1000 rpm in the Biolector (m2p-labs) as a preculture. The cultivation plates were inoculated from an overnight culture grown in an M-tube in 10 mL TY media. The flower plates were inoculated to an OD (450 nm) of 0.05 and fermented with nutrient-controlled media for 72 hr.
SDS-PAGE electrophoresis
Raw supernatant from the Biolector fermentation was prepared for SDS electrophoresis as follows: 75 µL raw supernatant was mixed with 25 µL NuPAGE™ LDS Sample Buffer (4X) plus 10 µL NuPAGE™ Sample Reducing Agent and boiled for 5 min. 15 µL was loaded onto a NuPAGE Bis-Tris gel.
GLP-1 assay
LC-MS (Liquid Chromatography Mass Spectrometry) was used to quantify and detect the GLP-1 receptor agonist. Suitable LC-MS protocols are disclosed in WO19243502 (e.g., pages 25-28). The concentration of GLP-1 receptor agonist may be determined using any suitable method. For example, LC-MS may be used, or immunoassays such as RIA (Radio Immuno Assay), ELISA (Enzyme-Linked Immuno Sorbent Assay), and LOCI (Luminescence Oxygen Channeling Immunoassay). General protocols for suitable RIA and ELISA assays are found in, e.g., WO 2009/030738 on pages 116-118.
Strains
The parental B. licheniformis strains used in the following examples comprise one or more protease deletions in the genes encoding alkaline protease (encoded by aprL gene with SEQ ID NO: 8), Glu-specific protease (encoded by mprL gene with SEQ ID NO: 9), bacillopeptidase F (encoded by bprAB gene with SEQ ID NO: 10), minor extracellular serine proteases (encoded by epr gene with SEQ ID NO: 11, and encoded by vpr gene with SEQ ID NO: 12), cell-wall associated protease (encoded by wprA gene with SEQ ID NO: 13), and/or intracellular serine protease (encoded by the ispA gene with SEQ ID NO: 14).
The d2 strain has deletions in the genes aprL and mprL.
The d3 strain has deletions in the genes ispA, aprL and mprL.
The d6 strain has deletions in the genes aprL, mprL, epr, vpr, wprA, and bprAB.
The d7 strain has 7 deletions in the genes aprL, mprL, epr, vpr, wprA, bprAB, and ispA.
All parental strains are described in Table 1.
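The lineage definitions above map naturally onto sets of deleted genes, which makes it easy to see how one lineage differs from another. The sketch below only restates the deletions listed in the text; the dictionary and helper names are illustrative.

```python
# Deleted protease genes per parental lineage, as listed in the text above.
LINEAGES = {
    "d2": {"aprL", "mprL"},
    "d3": {"ispA", "aprL", "mprL"},
    "d6": {"aprL", "mprL", "epr", "vpr", "wprA", "bprAB"},
    "d7": {"aprL", "mprL", "epr", "vpr", "wprA", "bprAB", "ispA"},
}


def extra_deletions(derived: str, base: str) -> list[str]:
    """Genes deleted in `derived` that are still present in `base`."""
    return sorted(LINEAGES[derived] - LINEAGES[base])


for name, genes in LINEAGES.items():
    print(name, len(genes), sorted(genes))

print("d7 vs d6:", extra_deletions("d7", "d6"))
print("d6 vs d2:", extra_deletions("d6", "d2"))
# d3 carries the ispA deletion, which d6 does not, so d3 is not a subset of d6.
print("d3 subset of d6:", LINEAGES["d3"] <= LINEAGES["d6"])
```

The set view shows that d7 differs from d6 only by ispA, and that d3 and d6 are not nested lineages.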
Table 1. Bacillus licheniformis protease deletion lineages (parental strains)
Table 2. Bacillus licheniformis strain overview (generated from parental strains of Table 1)
Example 1: Generating strains expressing GLP-1 for the evaluation of protease deletion requirements
The GLP-1 expressing strains were constructed on the basis of the parental strains of the d2, d6 or d7 lineage described in Table 1. To do so, synthetic DNA was ordered comprising an open reading frame encoding the aprL signal peptide of SEQ ID NO: 16 and the GLP-1 of SEQ ID NO: 7 under control of the triple promoter P3 (SEQ ID NO: 15, and as described in WO 99/43835). A POE was generated consisting of the synthetic DNA and plasmid elements needed to generate a phit donor. Integration of such donor plasmids into attB sites using an integrase is described in WO 2006042548 A1. The resulting plasmid was transformed into a Bacillus subtilis donor and conjugated into either the d2, d6 or d7 strain lineage from Table 1, creating the following strains: BT11246, BT11247, BT11248, BT11249, and BT11255, shown in Table 2.
Example 2: Generating strains expressing GLP-1 for the evaluation of productivity
The GLP-1 expressing strains were constructed on the basis of the d7 lineage described in Table 1. To do so, synthetic DNA was ordered containing the open reading frame encoding GLP-1 (either the GLP-1 with SEQ ID NO: 7 or the GLP-1 with SEQ ID NO: 3) combined with a polynucleotide sequence encoding one of the signal peptides SPaprL (SEQ ID NO: 16), SPamyL (SEQ ID NO: 17), SPaprH (SEQ ID NO: 18), or SPII3 1 E6 (SEQ ID NO: 1). For all constructs, the GLP-1 expression was under control of the triple promoter P3 (SEQ ID NO: 15, and as described in WO 99/43835).
For selected constructs (see strains BT11296, BT11297, and BT19030 of Table 1), GLP-1 was expressed with the 14 amino acid long linker of SEQ ID NO: 2 located between the SP and the GLP-1.
POEs were generated by combining the different synthetic DNAs with plasmid elements needed to generate flp-FRT donor plasmids (WO 2018/077796 A1). The resulting plasmids were transformed into a Bacillus subtilis donor and conjugated into the d7 strain lineage from Table 1, creating the following strains: BT11290, BT11292, BT11293, BT11294, BT11295, BT11296, BT11297, and BT19030. These generated strains each comprised 3 copies of the GLP-1 expression cassette.
Example 3: GLP-1 expressed in B. licheniformis using various signal peptides
Expression constructs for GLP-1 with various signal peptides were generated as described in Example 2. B. licheniformis strains BT11290 and BT11291 expressed GLP-1 with SEQ ID NO: 7 (BT11290) and GLP-1 with SEQ ID NO: 3 (BT11291), respectively, using the signal peptide from aprH (SEQ ID NO: 18), and served as the reference constructs. Strains were fermented in a Biolector for 72 hours. Culture broth was loaded onto an SDS-PAGE gel for analysis. Expression of GLP-1 SEQ ID NO: 7 and GLP-1 SEQ ID NO: 3 using the signal peptide from aprH (SEQ ID NO: 18) was confirmed in Figure 1, lanes 2 and 3, where “*” marks the height of GLP-1 relative to the ladder in lane 1. The aprL signal peptide (SEQ ID NO: 16) in BT11292 and BT11293 increased GLP-1 expression (Figure 1, lanes 4 and 5) relative to the GLP-1 expression using the aprH SP. As also shown in Fig. 1, no GLP-1 was detected with the amyL signal peptide (SEQ ID NO: 17) in BT11294 and BT11295 (Figure 1, lanes 6 and 7). The improved performance of the aprL signal peptide (SEQ ID NO: 16) compared to the aprH signal peptide (SEQ ID NO: 18) was observed as the higher-intensity band in Fig. 1, lanes 4 and 5, compared to lanes 2 and 3. Expression of GLP-1 using the aprL SP was detected in BT11296 and BT11297 (Figure 1, lanes 8 and 9). For BT11296 and BT11297, the increased molecular weight of the GLP-1 may be caused by the 14 AA linker and/or amino acid residues from the aprL SP which were not cleaved off during secretion.
Example 4: Signal peptide of SEQ ID NO: 1 increased GLP-1 expression
The strains BT11297 and BT19030, generated as described in Example 2 and Table 1, were compared to one another regarding GLP-1 (SEQ ID NO: 3) expression. Both strains were cultivated for 3 days in the Biolector.
As can be seen from Figure 2A, both strains expressed GLP-1 (arrow indicates height of GLP-1 relative to ladder). Additionally, as can be seen from both Figure 2A and Figure 2B, strain BT19030 with the signal peptide of SEQ ID NO: 1 showed significantly increased GLP-1 expression compared to BT11297 with the aprL signal peptide. As seen in Fig. 2B, the signal peptide with SEQ ID NO: 1 increased GLP-1 yield 3.2-fold compared to the GLP-1 yield of the aprL signal peptide (SEQ ID NO: 16).
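The fold change reported here can be put on the same scale as the percent-increase thresholds used in the numbered paragraphs of this document. The sketch below shows only that conversion; the function names are illustrative, and no absolute yields are assumed because the document reports only the ratio.

```python
def fold_to_percent_increase(fold: float) -> float:
    """Convert a fold change (new / reference) to a percent increase over the reference."""
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return (fold - 1.0) * 100.0


def percent_increase_to_fold(percent: float) -> float:
    """Inverse conversion: a 100% increase corresponds to a 2-fold yield."""
    return 1.0 + percent / 100.0


# Fold change reported for SEQ ID NO: 1 versus the aprL signal peptide.
print(f"{fold_to_percent_increase(3.2):.0f}% increase")
print(f"{percent_increase_to_fold(250.0):.1f}-fold")
```

A 3.2-fold yield therefore corresponds to an increase of about 220% over the aprL reference.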
Example 5: At least six protease deletions are required for GLP-1 expression and stabilization
In Example 5 we investigated the sensitivity of GLP-1 towards Bacillus host cell proteases. To investigate the GLP-1 stability in Bacillus strains, lineages d2 (BT11249), d6 (BT11247), and d7 (BT11255) expressing GLP-1 (as described in Tables 1-2) were fermented in a Biolector for 5 days. Samples were taken at day 1, 2, 4, and 5 and analyzed by SDS-PAGE (Fig. 3). GLP-1 expression was confirmed in both the BT11247 strain (d6 lineage) and the BT11255 strain (d7 lineage); see Fig. 3, lanes 2-5 for BT11247 and lanes 7-10 for BT11255 (GLP-1 indicated by arrow). As also shown in Fig. 3, GLP-1 expression was observed in BT11247 (d6) at levels comparable to BT11255 (d7). Thus, GLP-1 can be expressed and stabilized in both a d6 and a d7 lineage. No GLP-1 expression/stabilization was observed from the d2 lineage in BT11249 (Fig. 3, lanes 12-15), indicating that GLP-1 production is not possible in Bacillus cells without tailored protease deletion strategies, as native Bacillus proteases degrade GLP-1 during fermentation. Surprisingly, deletion of at least 6 proteases is necessary to obtain measurable levels of GLP-1 during fermentation.
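The lineage-level outcome of Example 5 can be tabulated as follows. This is a minimal sketch that records only the three lineages fermented in this example and the qualitative SDS-PAGE result described above; the tuple layout and variable names are illustrative.

```python
# (lineage, strain, number of protease genes deleted, GLP-1 band detected on SDS-PAGE)
OBSERVATIONS = [
    ("d2", "BT11249", 2, False),
    ("d6", "BT11247", 6, True),
    ("d7", "BT11255", 7, True),
]

# Highest deletion count tested without a detectable GLP-1 band.
highest_without = max(n for _, _, n, seen in OBSERVATIONS if not seen)
# Lowest deletion count tested with a detectable GLP-1 band.
lowest_with = min(n for _, _, n, seen in OBSERVATIONS if seen)

for lineage, strain, n, seen in OBSERVATIONS:
    status = "detected" if seen else "not detected"
    print(f"{lineage} ({strain}): {n} deletions, GLP-1 {status}")

print(f"no GLP-1 at {highest_without} deletions; GLP-1 at {lowest_with} or more")
```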
Overview of sequence list
SEQ ID NO: 1 1 E6 signal peptide B. amyloliquefaciens
SEQ ID NO: 2 linker
SEQ ID NO: 3 GLP-1 receptor agonist WT
SEQ ID NO: 4 GLP-1 receptor agonist; liraglutide
SEQ ID NO: 5 GLP-1 receptor agonist; semaglutide
SEQ ID NO: 6 GLP-1 receptor agonist (truncated liraglutide)
SEQ ID NO: 7 GLP-1 receptor agonist (truncated WT)
SEQ ID NO: 8 aprL sequence B. licheniformis
SEQ ID NO: 9 mprL sequence B. licheniformis
SEQ ID NO: 10 bprAB sequence B. licheniformis
SEQ ID NO: 11 epr sequence B. licheniformis
SEQ ID NO: 12 vpr sequence B. licheniformis
SEQ ID NO: 13 wprA sequence B. licheniformis
SEQ ID NO: 14 ispA sequence B. licheniformis
SEQ ID NO: 15 P3 promoter
SEQ ID NO: 16 aprL SP B. licheniformis
SEQ ID NO: 17 amyL SP B. licheniformis
SEQ ID NO: 18 aprH SP B. clausii
SEQ ID NO: 19 1 E6 SP- linker - WT GLP-1
SEQ ID NO: 20 aprL SP- linker - WT GLP-1
SEQ ID NO: 21 aprH SP- linker - WT GLP-1
SEQ ID NO: 22 1 E6 SP- linker - liraglutide
SEQ ID NO: 23 aprL SP- linker - liraglutide
SEQ ID NO: 24 aprH SP- linker - liraglutide
SEQ ID NO: 25 1 E6 SP- linker - truncated liraglutide
SEQ ID NO: 26 aprL SP- linker - truncated liraglutide
SEQ ID NO: 27 aprH SP- linker - truncated liraglutide
SEQ ID NO: 28 1 E6 SP- linker - truncated WT
SEQ ID NO: 29 aprL SP- linker - truncated WT
SEQ ID NO: 30 aprH SP- linker - truncated WT
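The fusion entries SEQ ID NO: 19 to 30 follow a regular pattern: three signal peptides combined with four GLP-1 receptor agonist variants, numbered with the agonist as the outer group. The sketch below reproduces that numbering from the overview above; it is only an index of names, not sequence data, and the label format is illustrative.

```python
from itertools import product

# Order of entries as they appear in the sequence overview above.
SIGNAL_PEPTIDES = ["1 E6", "aprL", "aprH"]
AGONISTS = ["WT GLP-1", "liraglutide", "truncated liraglutide", "truncated WT"]
FIRST_FUSION_SEQ_ID = 19

# product() varies the last iterable fastest, so the signal peptide cycles
# within each agonist, matching the listed order.
fusions = {
    FIRST_FUSION_SEQ_ID + i: f"{sp} SP-linker-{agonist}"
    for i, (agonist, sp) in enumerate(product(AGONISTS, SIGNAL_PEPTIDES))
}

for seq_id, label in fusions.items():
    print(f"SEQ ID NO: {seq_id}  {label}")
```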
The invention described and claimed herein is not to be limited in scope by the specific aspects herein disclosed, since these aspects are intended as illustrations of several aspects of the invention. Any equivalent aspects are intended to be within the scope of this invention. Indeed, various modifications of the invention in addition to those shown and described herein will become apparent to those skilled in the art from the foregoing description. Such modifications are also intended to fall within the scope of the appended claims. In the case of conflict, the present disclosure including definitions will control.
The invention is further defined by the following numbered paragraphs:
1. A nucleic acid construct comprising: a first polynucleotide encoding a signal peptide comprising or consisting of an amino acid sequence having at least 80% sequence identity to the amino acid sequence of SEQ ID NO: 1, 16, or 18; and a second polynucleotide encoding a polypeptide of interest, wherein the polypeptide of interest comprises or consists of a glucagon like peptide-1 (GLP-1) receptor agonist, and wherein the first polynucleotide and the second polynucleotide are operably linked in translational fusion, preferably wherein the first polynucleotide and the second polynucleotide are heterologous to one another.
2. The nucleic acid construct according to paragraph 1, wherein the signal peptide comprises or consists of: an amino acid sequence having at least 85%, e.g., at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 1.
3. The nucleic acid construct according to any one of paragraphs 1-2, wherein the signal peptide comprises or consists of a polypeptide with the amino acid sequence of SEQ ID NO: 1.
4. The nucleic acid construct according to paragraph 1, wherein the signal peptide comprises or consists of: an amino acid sequence having at least 85%, e.g., at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 16, or 18.
5. The nucleic acid construct according to paragraph 1, wherein the signal peptide comprises or consists of a polypeptide with the amino acid sequence of SEQ ID NO: 16, or 18.
6. The nucleic acid construct according to any one of paragraphs 1-5, wherein the signal peptide comprises or consists of: a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 1, wherein the three-dimensional structure is calculated by Alphafold.
7. The nucleic acid construct according to any one of paragraphs 1-6, wherein the signal peptide comprises or consists of: a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 16, or 18, wherein the three-dimensional structure is calculated by Alphafold.
8. The nucleic acid construct according to any one of the preceding paragraphs, wherein the GLP-1 receptor agonist comprises or consists of
(a) a polypeptide derived from SEQ ID NO: 3, 4, 5, 6 or 7, by having 1-30 alterations (e.g., substitutions, deletions and/or insertions) at one or more positions, e.g., 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26 or 27 or 28 or 29 or 30 alterations, in particular substitutions;
(b) a polypeptide derived from the polypeptide of a), wherein the N- and/or C-terminal end has been extended by addition of one or more amino acids; and/or
(c) a fragment of the polypeptide of a), or b).
9. The nucleic acid construct according to any one of the preceding paragraphs, wherein the signal peptide comprises or consists of
(a) a polypeptide derived from SEQ ID NO: 1, 16, or 18, by having 1-30 alterations (e.g., substitutions, deletions and/or insertions) at one or more positions, e.g., 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26 or 27 or 28 or 29 or 30 alterations, in particular substitutions;
(b) a polypeptide derived from the polypeptide of a), wherein the N- and/or C-terminal end has been extended by addition of one or more amino acids; and/or
(c) a fragment of the polypeptide of a), or b).
10. The nucleic acid construct according to any one of the previous paragraphs, further comprising a promoter, wherein the promoter is heterologous to the first polynucleotide and heterologous to the second polynucleotide, and wherein the promoter is operably linked to the first polynucleotide.
11. The nucleic acid construct according to any one of the previous paragraphs, wherein the promoter comprises or consists of a P3 promoter, or a P3-based promoter.
12. The nucleic acid according to paragraphs 1-11, wherein the promoter is a tandem promoter comprising the P3 promoter, or is a tandem promoter derived from the P3 promoter.
13. The nucleic acid construct according to any one of the previous paragraphs, wherein the promoter comprises or consists of a nucleic acid sequence having a sequence identity of at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100%, to SEQ ID NO: 15, preferably the promoter comprises, consists essentially of, or consists of SEQ ID NO: 15.
14. The nucleic acid construct according to any one of the previous paragraphs, further comprising a third polynucleotide downstream of the first polynucleotide and upstream of the second polynucleotide, wherein the third polynucleotide is encoding a linker, and wherein the first, second, and third polynucleotide are operably linked in translational fusion.
15. The nucleic acid construct according to paragraph 14, wherein the linker is a cleavable linker.
16. The nucleic acid construct according to any one of paragraphs 14-15, wherein the linker comprises or consists of: an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 2.
17. The nucleic acid construct according to paragraph 16, wherein the linker comprises or consists of a polypeptide with the amino acid sequence of SEQ ID NO: 2.
18. The nucleic acid construct according to any one of paragraphs 14-17, wherein the linker comprises or consists of a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 2, wherein the three-dimensional structure is calculated by Alphafold.
19. The nucleic acid construct according to any one of paragraphs 1-18, wherein the GLP-1 receptor agonist comprises or consists of: an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NOs: 3, 4, 5, 6, or 7.
20. The nucleic acid construct according to any one of paragraphs 1-19, wherein the GLP-1 receptor agonist comprises or consists of the polypeptide of SEQ ID NO: 3, 4, 5, 6, or 7.
21. The nucleic acid construct according to any one of paragraphs 1-20, wherein the GLP-1 receptor agonist comprises or consists of a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 3, 4, 5, 6, or 7, wherein the three-dimensional structure is calculated by Alphafold.
22. The nucleic acid construct according to any one of paragraphs 1-21, wherein the GLP-1 receptor agonist is a human GLP-1 receptor agonist, a fragment thereof, or a variant thereof.
23. The nucleic acid construct of any one of paragraphs 1-22, which is isolated.
24. The nucleic acid construct of any one of paragraphs 1-23, which is purified.
25. The nucleic acid construct of any one of paragraphs 1-24, wherein sequence identity is determined by Sequence Identity Determination Method 1.
26. A nucleic acid construct or expression vector comprising the nucleic acid construct of any one of paragraphs 1-25, wherein the first polynucleotide and/or the second polynucleotide is operably linked to one or more control sequences, e.g., one or more promoter, that direct the production of the GLP-1 receptor agonist in an expression host, preferably in a Bacillus host, e.g., Bacillus subtilis, or Bacillus licheniformis.
27. A Bacillus cell comprising in its genome: a) one or more nucleic acid construct according to any one of paragraphs 1 to 26; and/or b) one or more expression vector according to paragraph 26.
28. The Bacillus host cell according to paragraph 27, wherein the cell comprises at least 2 copies of the one or more nucleic acid construct and/or one or more expression vector, e.g., at least 3 copies, at least 4 copies, or at least 5 copies.
28a. The Bacillus cell according to any one of paragraphs 27-28, wherein the cell is a mutant of a parent Bacillus strain, and comprising one or more protease genes selected from the group consisting of: alkaline protease (aprL), glu-specific protease (mprL), bacillopeptidase F (bprAB), a first minor extracellular serine protease (epr), a second minor extracellular serine protease (vpr), a cell-wall associated protease (wprA), and an intracellular serine protease (ispA), wherein at least six of the one or more protease genes are modified rendering at least six proteases truncated, partly or fully inactivated, present at reduced level or eliminated compared to the parent Bacillus strain when cultivated under identical conditions.
29. The Bacillus cell according to paragraph 28a, wherein the at least six modifications comprise modification of aprL, mprL, bprAB, epr, vpr, and wprA.
30. The Bacillus cell according to any one of paragraphs 27 to 29, wherein at least six proteases selected from the list of alkaline protease, glu-specific protease, bacillopeptidase F, a first minor extracellular serine protease, a second minor extracellular serine protease, and a cell-wall associated protease are truncated, partly or fully inactivated, present at reduced level or eliminated compared to the parent Bacillus strain when cultivated under identical conditions.
31. The Bacillus cell according to any one of paragraphs 27 to 30, wherein at least seven proteases selected from the list of alkaline protease, a glu-specific protease, bacillopeptidase F, a first minor extracellular serine protease, a second minor extracellular serine protease, a cell-wall associated protease, and an intracellular serine protease are truncated, partly or fully inactivated, present at reduced level or eliminated compared to the parent Bacillus strain when cultivated under identical conditions.
32. The Bacillus cell according to any one of paragraphs 27 to 31, wherein the mutant is deficient in the production of at least six proteases selected from the list of alkaline protease, glu-specific protease, bacillopeptidase F, first minor extracellular serine protease, second minor extracellular serine protease, and cell-wall associated protease.
33. The Bacillus cell according to any one of paragraphs 27 to 32, wherein the mutant is deficient in the production of at least 7 proteases selected from the list of alkaline protease, glu-specific protease, bacillopeptidase F, first minor extracellular serine protease, second minor extracellular serine protease, cell-wall associated protease, and intracellular serine protease.
34. The Bacillus cell according to any one of paragraphs 27 to 33, wherein the at least 7 modifications comprise modification of aprL, mprL, bprAB, epr, vpr, wprA, and ispA.
35. The Bacillus cell according to any one of paragraphs 27 to 34, which is completely deficient in or produces at least 50 percent less of the alkaline protease, glu-specific protease, bacillopeptidase F, first minor extracellular serine protease, second minor extracellular serine protease, and Cell-wall associated protease compared to the parent Bacillus strain, when cultivated under identical conditions.
35a. The Bacillus cell according to any one of paragraphs 27 to 35, which is completely deficient in or produces at least 50 percent less of the alkaline protease, glu-specific protease, bacillopeptidase F, first minor extracellular serine protease, second minor extracellular serine protease, Cell-wall associated protease, and intracellular serine protease compared to the parent Bacillus strain, when cultivated under identical conditions.
36. The Bacillus cell according to any one of paragraphs 27-35a, wherein the aprL gene comprises or consists of a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polynucleotide sequence of SEQ ID NO: 8.
37. The Bacillus cell according to any one of paragraphs 27-36, wherein the mprL gene comprises or consists of a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polynucleotide sequence of SEQ ID NO: 9.
38. The Bacillus cell according to any one of paragraphs 27-37, wherein the bprAB gene comprises or consists of a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polynucleotide sequence of SEQ ID NO: 10.
39. The Bacillus cell according to any one of paragraphs 27-38, wherein the epr gene comprises or consists of a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polynucleotide sequence of SEQ ID NO: 11.
40. The Bacillus cell according to any one of paragraphs 27-39, wherein the vpr gene comprises or consists of a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polynucleotide sequence of SEQ ID NO: 12.
41. The Bacillus cell according to any one of paragraphs 27-40, wherein the wprA gene comprises or consists of a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polynucleotide sequence of SEQ ID NO: 13.
42. The Bacillus cell according to any one of paragraphs 27-41, wherein the ispA gene comprises or consists of a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polynucleotide sequence of SEQ ID NO: 14.
43. The Bacillus cell according to any one of the previous paragraphs, wherein yield of the polypeptide of interest is increased in the mutant compared to the parent Bacillus cell when cultivated under identical conditions.
44. The Bacillus cell according to any one of the previous paragraphs, wherein the yield of the polypeptide of interest is increased at least 5%, e.g., at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 100%, at least 105%, at least 110%, at least 115%, at least 120%, at least 125%, at least 130%, at least 135%, at least 140%, at least 145%, at least 150%, at least 155%, at least 160%, at least 165%, at least 170%, at least 175%, at least 180%, at least 185%, at least 190%, at least 195%, at least 200%, at least 205%, at least 210%, at least 215%, at least 220%, at least 225%, at least 230%, at least 235%, at least 240%, at least 245%, or at least 250%, relative to the yield of the parent Bacillus cell.
45. The Bacillus cell according to any one of the previous paragraphs, wherein the yield of the polypeptide of interest is increased at least 100%, e.g., at least 200%, relative to the yield of the parent Bacillus cell.
46. A mutant of a parent Bacillus strain comprising in its genome a heterologous promoter operably linked to a polynucleotide encoding a polypeptide of interest, and comprising in its genome one or more protease genes selected from the group consisting of: alkaline protease (aprL), glu-specific protease (mprL), bacillopeptidase F (bprAB), a first minor extracellular serine protease (epr), a second minor extracellular serine protease (vpr), a cell-wall associated protease (wprA), and an intracellular serine protease (ispA), wherein at least six of the one or more protease genes are modified rendering at least six proteases truncated, partly or fully inactivated, present at reduced level or eliminated compared to the parent Bacillus strain when cultivated under identical conditions.
47. The mutant according to paragraph 46, wherein the at least six modifications comprise modification of aprL, mprL, bprAB, epr, vpr, and wprA.
48. The mutant according to any one of paragraphs 46-47, wherein at least six proteases selected from the list of alkaline protease, glu-specific protease, bacillopeptidase F, a first minor extracellular serine protease, a second minor extracellular serine protease, and a cell-wall associated protease are truncated, partly or fully inactivated, present at reduced level or eliminated compared to the parent Bacillus strain when cultivated under identical conditions.
49. The mutant according to any one of paragraphs 46-48, wherein at least seven proteases selected from the list of alkaline protease, a glu-specific protease, bacillopeptidase F, a first minor extracellular serine protease, a second minor extracellular serine protease, a cell-wall associated protease, and an intracellular serine protease are truncated, partly or fully inactivated, present at reduced level or eliminated compared to the parent Bacillus strain when cultivated under identical conditions.
50. The mutant according to any one of paragraphs 46-49, wherein the mutant is deficient in the production of at least six proteases selected from the list of alkaline protease, glu-specific protease, bacillopeptidase F, a first minor extracellular serine protease, a second minor extracellular serine protease, and a cell-wall associated protease.
51. The mutant according to any one of paragraphs 46-50, wherein the mutant is deficient in the production of at least 7 proteases selected from the list of alkaline protease, a glu-specific protease, bacillopeptidase F, a first minor extracellular serine protease, a second minor extracellular serine protease, a cell-wall associated protease, and an intracellular serine protease.
52. The mutant according to any one of paragraphs 46-51, wherein the at least 7 modifications comprise modification of aprL, mprL, bprAB, epr, vpr, wprA, and ispA.
53. The mutant according to any one of paragraphs 46-52, which is completely deficient in or produces at least 50 percent less of the alkaline protease, glu-specific protease, bacillopeptidase F, first minor extracellular serine protease, second minor extracellular serine protease, and cell-wall associated protease compared to the parent Bacillus strain, when cultivated under identical conditions.
54. The mutant according to any one of paragraphs 46-53, which is completely deficient in or produces at least 50 percent less of the alkaline protease, glu-specific protease, bacillopeptidase F, first minor extracellular serine protease, second minor extracellular serine protease, cell-wall associated protease, and intracellular serine protease compared to the parent Bacillus strain, when cultivated under identical conditions.
55. The mutant according to any one of paragraphs 46-54, wherein protease activity of each of the native proteases encoded by aprL, mprL, bprAB, epr, vpr, and wprA is reduced or eliminated.
55a. The mutant according to any one of paragraphs 46-55, wherein protease activity of each of the native proteases encoded by aprL, mprL, bprAB, epr, vpr, wprA, and ispA is reduced or eliminated.
56. The mutant according to any one of paragraphs 46-55a, wherein the protease activity is reduced or eliminated by substitution, insertion, or deletion of one or more nucleotides; and/or wherein protease activity is reduced or eliminated by full and/or partial gene deletion.
57. The mutant according to any one of paragraphs 46-56, wherein the cell comprises at least 2 copies of the heterologous promoter operably linked to the polynucleotide encoding the polypeptide of interest, e.g., at least 3 copies, at least 4 copies, or at least 5 copies.
58. The mutant according to any one of paragraphs 46-57, wherein the polynucleotide encoding the polypeptide of interest is operably linked to a polynucleotide encoding a signal peptide.
59. The mutant according to any one of paragraphs 46-58, wherein the second polynucleotide encoding the polypeptide of interest and/or the first polynucleotide encoding the signal peptide is operably linked to one or more promoter.
60. The mutant according to any one of paragraphs 46-59, wherein the aprL gene comprises or consists of a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polynucleotide sequence of SEQ ID NO: 8.
61. The mutant according to any one of paragraphs 46-60, wherein the mprL gene comprises or consists of a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polynucleotide sequence of SEQ ID NO: 9.
62. The mutant according to any one of paragraphs 46-61, wherein the bprAB gene comprises or consists of a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polynucleotide sequence of SEQ ID NO: 10.
63. The mutant according to any one of paragraphs 46-62, wherein the epr gene comprises or consists of a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polynucleotide sequence of SEQ ID NO: 11.
64. The mutant according to any one of paragraphs 46-63, wherein the vpr gene comprises or consists of a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polynucleotide sequence of SEQ ID NO: 12.
65. The mutant according to any one of paragraphs 46-64, wherein the wpr gene comprises or consists of a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polynucleotide sequence of SEQ ID NO: 13.
66. The mutant according to any one of paragraphs 46-65, wherein the ispA gene comprises or consists of a polynucleotide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the polynucleotide sequence of SEQ ID NO: 14.
67. The Bacillus cell or mutant of a parent strain according to any one of the previous paragraphs, comprising one or more of the genes selected from the list of amyL, bglC, catL, cypX, forD, ggt, gntP, lacA2, sacB, spoIIAC, and xylA, wherein one or more of the genes are modified rendering their respective gene products (e.g. polypeptides) truncated, partly or fully inactivated, present at reduced level or eliminated compared to the parent Bacillus strain when cultivated under identical conditions.
67a. The Bacillus cell or mutant of a parent strain according to any one of the previous paragraphs, comprising one or more of the genes selected from the list of amyL, bglC, catL, cypX, forD, ggt, gntP, lacA2, sacB, spoIIAC, and xylA, wherein 11 of the one or more genes are modified rendering their respective gene products (e.g. polypeptides) truncated, partly or fully inactivated, present at reduced level or eliminated compared to the parent Bacillus strain when cultivated under identical conditions.
68. The Bacillus cell or mutant according to any one of the previous paragraphs, wherein yield of the polypeptide of interest is increased compared to the parent Bacillus cell when cultivated under identical conditions.
69. The Bacillus cell or mutant according to any one of the previous paragraphs, wherein the yield of the polypeptide of interest is increased at least 5%, e.g., at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 100%, at least 105%, at least 110%, at least 115%, at least 120%, at least 125%, at least 130%, at least 135%, at least 140%, at least 145%, at least 150%, at least 155%, at least 160%, at least 165%, at least 170%, at least 175%, at least 180%, at least 185%, at least 190%, at least 195%, at least 200%, at least 205%, at least 210%, at least 215%, at least 220%, at least 225%, at least 230%, at least 235%, at least 240%, at least 245%, or at least 250%, relative to the yield of the parent Bacillus cell.
70. The Bacillus cell or mutant according to any one of the previous paragraphs, wherein the yield of the polypeptide of interest is increased at least 100%, e.g. at least 200%, relative to the yield of the parent Bacillus cell.
71. The Bacillus cell or mutant according to any one of the previous paragraphs, wherein the cell or mutant is a Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus brevis, Bacillus circulans, Bacillus clausii, Bacillus coagulans, Bacillus firmus, Bacillus lautus, Bacillus lentus, Bacillus licheniformis, Bacillus megaterium, Bacillus pumilus, Bacillus stearothermophilus, Bacillus subtilis, or Bacillus thuringiensis cell.
72. The Bacillus cell or mutant according to any one of the preceding paragraphs, wherein the cell or mutant is a Bacillus subtilis, or a Bacillus licheniformis.
73. The Bacillus cell or mutant according to any one of the preceding paragraphs, wherein the polypeptide of interest is heterologous to the Bacillus cell.
74. The mutant according to any one of paragraphs 46-73, wherein the polypeptide of interest comprises or consists of a protease-sensitive polypeptide.
75. The mutant according to any one of paragraphs 46-74, wherein the polypeptide of interest comprises or consists of an enzyme, e.g., an enzyme selected from the list of a hydrolase, isomerase, ligase, lyase, oxidoreductase, or transferase, e.g., an aminopeptidase, amylase, carbohydrase, carboxypeptidase, catalase, cellobiohydrolase, cellulase, chitinase, cutinase, cyclodextrin glycosyltransferase, deoxyribonuclease, endoglucanase, esterase, alpha-galactosidase, beta-galactosidase, glucoamylase, alpha-glucosidase, beta-glucosidase, invertase, laccase, lipase, mannosidase, mutanase, oxidase, pectinolytic enzyme, peroxidase, phytase, polyphenoloxidase, proteolytic enzyme, ribonuclease, transglutaminase, xylanase, or beta-xylosidase; or a therapeutic polypeptide, e.g. a therapeutic polypeptide selected from the list of a GLP-1 receptor agonist, an antibody, an antibody fragment, an antibody-based drug, an Fc fusion protein, an anticoagulant, a blood factor, a bone morphogenetic protein, an engineered protein scaffold, an enzyme, a growth factor, a blood clotting factor, a hormone, an interferon (such as an interferon alpha-2b), an interleukin, a lactoferrin, an alpha-lactalbumin, a beta-lactalbumin, an ovomucoid, an ovostatin, a cytokine, an obestatin, a human galactosidase (such as a human alpha-galactosidase A), a vaccine, a protein vaccine, and a thrombolytic.
76. The mutant according to any one of paragraphs 46-75, wherein the polypeptide of interest comprises or consists of a GLP-1 receptor agonist, a lactase, an amylase, or an alternansucrase.
76a. The mutant according to any one of paragraphs 46-76, wherein the polypeptide of interest comprises or consists of a GLP-1 receptor agonist, said GLP-1 receptor agonist comprising or consisting of: a) an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 3, 4, 5, 6, or 7; and/or b) a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 3, 4, 5, 6, or 7, wherein the three-dimensional structure is calculated by Alphafold.
77. The mutant according to any one of paragraphs 46-76a, wherein the polypeptide of interest does not comprise a cutinase, and/or does not comprise a disperin, e.g. a disperin having hexosaminidase activity.
78. The Bacillus cell or mutant according to any one of the preceding paragraphs, wherein at least one of the one or more control sequences is heterologous to the polynucleotide encoding the polypeptide of interest.
79. The Bacillus cell or mutant according to any one of the preceding paragraphs, which is isolated.
80. The Bacillus cell or mutant according to any one of the preceding paragraphs, which is purified.
81. A method of producing a polypeptide of interest, the method comprising: a) cultivating a host cell or mutant according to any one of paragraphs 27-80 under conditions conducive for production of the polypeptide of interest; and optionally b) recovering the polypeptide of interest.
82. The method according to paragraph 81, wherein step a) comprises cultivation in a cultivation medium comprising 2-aminoisobutyric acid (aib).
83. The method according to any one of paragraphs 81-82, wherein the host cell or mutant is cultivated in continuous fermentation, batch fermentation, or fed-batch fermentation.
84. The method according to any one of paragraphs 81-83, wherein the host cell or mutant is cultivated in continuous fermentation, preferably wherein the continuous fermentation has a duration of at least 96 hours, e.g., at least 120 hours, at least 144 hours, at least 168 hours, at least 192 hours, at least 216 hours, at least 240 hours, at least 264 hours, at least 288 hours, at least 312 hours, or at least 336 hours.
85. A fusion polypeptide comprising: a) a signal peptide comprising or consisting of:
(i) an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 1, 16, or 18, and/or
(ii) a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 1, 16, or 18, wherein the three-dimensional structure is calculated by Alphafold; and
b) a GLP-1 receptor agonist comprising or consisting of
(i) an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 3, 4, 5, 6, or 7; and/or
(ii) a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 3, 4, 5, 6, or 7, wherein the three-dimensional structure is calculated by Alphafold.
86. The fusion polypeptide according to paragraph 85, wherein the GLP-1 receptor agonist comprises or consists of an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 3, 4, 5, 6, or 7.
87. The fusion polypeptide of any one of paragraphs 85-86, wherein the signal peptide comprises or consists of an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 1, 16, or 18.
88. The fusion polypeptide according to any one of paragraphs 85-87, wherein the GLP-1 receptor agonist comprises or consists of a) a polypeptide derived from SEQ ID NO: 3, 4, 5, 6, or 7, by having 1-30 alterations (e.g., substitutions, deletions and/or insertions) at one or more positions, e.g., 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26 or 27 or 28 or 29 or 30 alterations, in particular substitutions;
(b) a polypeptide derived from the polypeptide of a), wherein the N- and/or C-terminal end has been extended by addition of one or more amino acids; and/or
(c) a fragment of the polypeptide of a), or b).
89. The fusion polypeptide according to any one of the preceding paragraphs, wherein the signal peptide comprises or consists of a) a polypeptide derived from SEQ ID NO: 1, 16, or 18, by having 1-30 alterations (e.g., substitutions, deletions and/or insertions) at one or more positions, e.g., 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26 or 27 or 28 or 29 or 30 alterations, in particular substitutions;
(b) a polypeptide derived from the polypeptide of a), wherein the N- and/or C-terminal end has been extended by addition of one or more amino acids; and/or
(c) a fragment of the polypeptide of a), or b).
90. The fusion polypeptide according to any one of the preceding paragraphs, further comprising a linker.
91. The fusion polypeptide of paragraph 90, wherein the linker is located at the N-terminal end of the signal peptide, and preferably at the C-terminal end of the GLP-1 receptor agonist.
92. The fusion polypeptide according to any one of paragraphs 90-91, wherein the linker is a cleavable linker, e.g., for cleavage with a protease.
93. The fusion polypeptide according to any one of paragraphs 90-92, wherein the linker comprises or consists of: an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 2.
94. The fusion polypeptide according to any one of paragraphs 90-93, wherein the linker comprises or consists of a polypeptide with the amino acid sequence of SEQ ID NO: 2.
95. The fusion polypeptide according to any one of paragraphs 90-94, wherein the linker comprises or consists of a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 2, wherein the three-dimensional structure is calculated by Alphafold.
96. The fusion polypeptide according to any one of the preceding paragraphs, wherein the linker comprises or consists of a) a polypeptide derived from SEQ ID NO: 2, by having 1-15 alterations (e.g., substitutions, deletions and/or insertions) at one or more positions, e.g., 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 alterations, in particular substitutions;
(b) a polypeptide derived from the polypeptide of a), wherein the N- and/or C-terminal end has been extended by addition of one or more amino acids; and/or
(c) a fragment of the polypeptide of a), or b).
97. The fusion polypeptide according to any one of paragraphs 85-96, wherein the fusion polypeptide comprises or consists of an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of any one of SEQ ID NO: 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30.
98. The fusion polypeptide according to any one of paragraphs 85-98, wherein the fusion polypeptide comprises or consists of a polypeptide with the amino acid sequence of any one of SEQ ID NO: 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30.
99. The fusion polypeptide according to any one of paragraphs 85-98, wherein the fusion polypeptide comprises or consists of a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of any one of SEQ ID NO: 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30, wherein the three-dimensional structure is calculated by Alphafold.
100. The fusion polypeptide according to any one of the preceding paragraphs, wherein the fusion polypeptide comprises or consists of a) a polypeptide derived from any one of SEQ ID NO: 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30, by having 1-30 alterations (e.g., substitutions, deletions and/or insertions) at one or more positions, e.g., 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26 or 27 or 28 or 29 or 30 alterations, in particular substitutions;
(b) a polypeptide derived from the polypeptide of a), wherein the N- and/or C-terminal end has been extended by addition of one or more amino acids; and/or
(c) a fragment of the polypeptide of a), or b).
101. The fusion polypeptide according to any one of paragraphs 85-100, wherein the fusion polypeptide comprises or consists of an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 19.
102. The fusion polypeptide according to any one of paragraphs 85-101, wherein the fusion polypeptide comprises or consists of a polypeptide with the amino acid sequence of SEQ ID NO: 19.
103. The fusion polypeptide according to any one of paragraphs 85-102, wherein the fusion polypeptide comprises or consists of a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 19, wherein the three-dimensional structure is calculated by Alphafold.
104. The fusion polypeptide according to any one of the preceding paragraphs, wherein the fusion polypeptide comprises or consists of a) a polypeptide derived from SEQ ID NO: 19, by having 1-30 alterations (e.g., substitutions, deletions and/or insertions) at one or more positions, e.g., 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26 or 27 or 28 or 29 or 30 alterations, in particular substitutions;
(b) a polypeptide derived from the polypeptide of a), wherein the N- and/or C-terminal end has been extended by addition of one or more amino acids; and/or
(c) a fragment of the polypeptide of a), or b).
105. The fusion polypeptide according to any one of paragraphs 85-100, wherein the fusion polypeptide comprises or consists of an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 22.
106. The fusion polypeptide according to any one of paragraphs 85-100, or paragraph 105, wherein the fusion polypeptide comprises or consists of a polypeptide with the amino acid sequence of SEQ ID NO: 22.
107. The fusion polypeptide according to any one of paragraphs 85-100 or 105-106, wherein the fusion polypeptide comprises or consists of a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 22, wherein the three-dimensional structure is calculated by Alphafold.
108. The fusion polypeptide according to any one of the preceding paragraphs, wherein the fusion polypeptide comprises or consists of a) a polypeptide derived from SEQ ID NO: 22, by having 1-30 alterations (e.g., substitutions, deletions and/or insertions) at one or more positions, e.g., 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26 or 27 or 28 or 29 or 30 alterations, in particular substitutions;
(b) a polypeptide derived from the polypeptide of a), wherein the N- and/or C-terminal end has been extended by addition of one or more amino acids; and/or
(c) a fragment of the polypeptide of a), or b).
109. The fusion polypeptide according to any one of paragraphs 85-100, wherein the fusion polypeptide comprises or consists of an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 25.
110. The fusion polypeptide according to any one of paragraphs 85-100 or paragraph 109, wherein the fusion polypeptide comprises or consists of a polypeptide with the amino acid sequence of SEQ ID NO: 25.
111. The fusion polypeptide according to any one of paragraphs 85-100 or 109-110, wherein the fusion polypeptide comprises or consists of a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 25, wherein the three-dimensional structure is calculated by Alphafold.
112. The fusion polypeptide according to any one of the preceding paragraphs, wherein the fusion polypeptide comprises or consists of a) a polypeptide derived from SEQ ID NO: 25, by having 1-30 alterations (e.g., substitutions, deletions and/or insertions) at one or more positions, e.g., 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26 or 27 or 28 or 29 or 30 alterations, in particular substitutions;
(b) a polypeptide derived from the polypeptide of a), wherein the N- and/or C-terminal end has been extended by addition of one or more amino acids; and/or
(c) a fragment of the polypeptide of a), or b).
113. The fusion polypeptide according to any one of paragraphs 85-100, wherein the fusion polypeptide comprises or consists of an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 28.
114. The fusion polypeptide according to any one of paragraphs 85-100 or paragraph 113, wherein the fusion polypeptide comprises or consists of a polypeptide with the amino acid sequence of SEQ ID NO: 28.
115. The fusion polypeptide according to any one of paragraphs 85-100 or 113-114, wherein the fusion polypeptide comprises or consists of a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 28, wherein the three-dimensional structure is calculated by Alphafold.
116. The fusion polypeptide according to any one of the preceding paragraphs, wherein the fusion polypeptide comprises or consists of a) a polypeptide derived from SEQ ID NO: 28, by having 1-30 alterations (e.g., substitutions, deletions and/or insertions) at one or more positions, e.g., 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26 or 27 or 28 or 29 or 30 alterations, in particular substitutions;
(b) a polypeptide derived from the polypeptide of a), wherein the N- and/or C-terminal end has been extended by addition of one or more amino acids; and/or
(c) a fragment of the polypeptide of a), or b).
117. The fusion polypeptide of any one of paragraphs 85-116, which is isolated.
118. The fusion polypeptide of any one of paragraphs 85-117, which is purified.
119. The fusion polypeptide of any one of paragraphs 85-118, wherein sequence identity is determined by Sequence Identity Determination Method 1.
120. A composition comprising the fusion polypeptide of any one of paragraphs 85-119.
121. A cell composition comprising the fusion polypeptide of any one of paragraphs 85-119.
122. A fermentation broth comprising the fusion polypeptide of any one of paragraphs 85-119.
123. A polynucleotide encoding the fusion polypeptide of any one of paragraphs 85-119.
124. The polynucleotide of paragraph 123, which is isolated.
125. The polynucleotide of any one of paragraphs 123-124, which is purified.
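
The paragraphs above repeatedly rely on three numeric criteria: percent sequence identity (e.g., at least 80%), a TM-score threshold (e.g., at least 0.90), and a relative yield increase (e.g., at least 100%). The following Python sketch illustrates how such values might be computed; it is not part of the disclosure. It makes several assumptions: "Sequence Identity Determination Method 1" is defined elsewhere in the specification, so the identity convention used here (identical residues x 100 / (alignment length - gapped columns), applied to an already aligned pair) is only an assumed stand-in; the TM-score function uses the commonly published formula with an assumed lower bound of 0.5 on the distance scale d0, and takes per-residue distances from a superposition that has already been computed (a real TM-score additionally maximizes over superpositions). All function names are hypothetical.

```python
from typing import Sequence


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over an existing pairwise alignment.

    Assumed convention (not taken from the source text):
    identical residues * 100 / (alignment length - gapped columns).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    gapped = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            gapped += 1
        elif a == b:
            identical += 1
    compared = len(aligned_a) - gapped
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for one fixed superposition.

    distances: distances (in angstroms) between aligned residue pairs.
    target_length: length of the reference polypeptide used for normalization.
    The 0.5 floor on d0 is an assumed guard for short peptides.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    if target_length > 15:
        d0 = max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)
    else:
        d0 = 0.5
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


def relative_yield_increase(mutant_yield: float, parent_yield: float) -> float:
    """Yield increase of the mutant in percent of the parent yield."""
    if parent_yield <= 0:
        raise ValueError("parent_yield must be positive")
    return 100.0 * (mutant_yield - parent_yield) / parent_yield
```

Under these assumptions, a candidate would satisfy an "at least 80% sequence identity" criterion when `percent_identity(...) >= 80.0`, a TM-score criterion when `tm_score(...) >= 0.90`, and an "increased at least 100%" criterion when `relative_yield_increase(...) >= 100.0`.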

Claims

1. A nucleic acid construct comprising: a first polynucleotide encoding a signal peptide comprising or consisting of an amino acid sequence having at least 80% sequence identity to the amino acid sequence of SEQ ID NO: 1, 16, or 18; and a second polynucleotide encoding a polypeptide of interest, wherein the polypeptide of interest comprises or consists of a glucagon like peptide-1 (GLP-1) receptor agonist, and wherein the first polynucleotide and the second polynucleotide are operably linked in translational fusion.
2. The nucleic acid construct according to claim 1, wherein the signal peptide comprises or consists of: a) an amino acid sequence having at least 85%, e.g., at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 1, 16, or 18, and/or b) a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 1, 16, or 18, wherein the three-dimensional structure is calculated by Alphafold.
3. The nucleic acid construct according to any one of claims 1-2, wherein the GLP-1 receptor agonist comprises or consists of: a) an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 3, 4, 5, 6, or 7; and/or b) a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 3, 4, 5, 6, or 7, wherein the three-dimensional structure is calculated by Alphafold.
4. An expression vector comprising a nucleic acid construct according to any one of claims 1-3.
5. A Bacillus cell comprising in its genome: a) one or more nucleic acid construct according to any one of claims 1 to 3; and/or b) one or more expression vector according to claim 4.
6. The Bacillus cell according to claim 5, wherein the cell is a mutant of a parent Bacillus strain, the mutant comprising one or more protease genes selected from the group consisting of: alkaline protease (aprL), glu-specific protease (mprL), bacillopeptidase F (bprAB), a first minor extracellular serine protease (epr), a second minor extracellular serine protease (vpr), a cell-wall associated protease (wprA), and an intracellular serine protease (ispA), wherein at least six of the one or more protease genes are modified rendering at least six proteases truncated, partly or fully inactivated, present at reduced level or eliminated compared to the parent Bacillus strain when cultivated under identical conditions.
7. A mutant of a parent Bacillus strain comprising in its genome a heterologous promoter operably linked to a polynucleotide encoding a polypeptide of interest, and comprising in its genome one or more protease genes selected from the group consisting of: alkaline protease (aprL), glu-specific protease (mprL), bacillopeptidase F (bprAB), a first minor extracellular serine protease (epr), second minor extracellular serine protease (vpr), cell-wall associated protease (wprA), and intracellular serine protease (ispA), wherein at least six of the one or more protease genes are modified rendering at least six proteases truncated, partly or fully inactivated, present at reduced level or eliminated compared to the parent Bacillus strain when cultivated under identical conditions.
8. The Bacillus cell or mutant according to any one of claims 6-7, wherein the at least six modifications comprise modification of aprL, mprL, bprAB, epr, vpr, and wprA.
9. The Bacillus cell or mutant according to any one of claims 6-8, wherein the aprL gene comprises or consists of a polynucleotide having at least 60% sequence identity to the polynucleotide sequence of SEQ ID NO: 8, the mprL gene comprises or consists of a polynucleotide having at least 60% sequence identity to the polynucleotide sequence of SEQ ID NO: 9, the bprAB gene comprises or consists of a polynucleotide having at least 60% sequence identity to the polynucleotide sequence of SEQ ID NO: 10, the epr gene comprises or consists of a polynucleotide having at least 60% sequence identity to the polynucleotide sequence of SEQ ID NO: 11, the vpr gene comprises or consists of a polynucleotide having at least 60% sequence identity to the polynucleotide sequence of SEQ ID NO: 12, and wherein the wpr gene comprises or consists of a polynucleotide having at least 60% sequence identity to the polynucleotide sequence of SEQ ID NO: 13.
10. The Bacillus cell or mutant according to any one of claims 5-9, wherein the cell or mutant is a Bacillus subtilis, or a Bacillus licheniformis.
11. The Bacillus cell or mutant according to any one of claims 6-10, wherein yield of the polypeptide of interest is increased compared to the parent Bacillus cell when cultivated under identical conditions.
12. The Bacillus cell or mutant according to any one of claims 7-11, wherein the polypeptide of interest comprises or consists of a GLP-1 receptor agonist.
13. A method of producing a polypeptide of interest, the method comprising: a) cultivating a Bacillus cell or mutant according to any one of claims 5-12 under conditions conducive for production of the polypeptide of interest; and optionally b) recovering the polypeptide of interest.
14. A fusion polypeptide, comprising: a) a signal peptide comprising or consisting of:
(i) an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 1, 16, or 18, and/or
(ii) a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 1, 16, or 18, wherein the three-dimensional structure is calculated by AlphaFold; and
b) a GLP-1 receptor agonist comprising or consisting of
(i) an amino acid sequence having at least 80%, e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 3, 4, 5, 6, or 7; and/or
(ii) a polypeptide having a TM-score of at least 0.90, e.g., at least 0.905, at least 0.910, at least 0.915, at least 0.920, at least 0.925, at least 0.930, at least 0.935, at least 0.940, at least 0.945, at least 0.950, at least 0.955, at least 0.960, at least 0.965, at least 0.970, at least 0.975, at least 0.980, at least 0.985, at least 0.990, at least 0.995, or even 1.0, to the three-dimensional structure of the polypeptide of SEQ ID NO: 3, 4, 5, 6, or 7, wherein the three-dimensional structure is calculated by AlphaFold.
15. The fusion polypeptide according to claim 14, wherein the GLP-1 receptor agonist comprises or consists of
(a) a polypeptide derived from SEQ ID NO: 3, 4, 5, 6, or 7, by having 1-30 alterations (e.g., substitutions, deletions and/or insertions) at one or more positions, e.g., 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26 or 27 or 28 or 29 or 30 alterations, in particular substitutions;
(b) a polypeptide derived from the polypeptide of a), wherein the N- and/or C-terminal end has been extended by addition of one or more amino acids; and/or
(c) a fragment of the polypeptide of a) or b).
16. A composition, or fermentation broth comprising the fusion polypeptide of any one of claims 14-15.
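
The two numerical thresholds used in the claims above (percent sequence identity and TM-score) can be illustrated with a minimal Python sketch. It is an illustration only, not the method defined by the application. The function names are hypothetical. The identity convention (identical residues × 100 divided by alignment length minus gap columns) and the TM-score length normalization (d0 = 1.24·(L − 15)^(1/3) − 1.8, with d0 fixed at 0.5 Å for targets of 21 residues or fewer) are assumptions modeled on common practice with Needleman-Wunsch alignments and the TM-score of Zhang and Skolnick. The sketch takes a precomputed alignment and precomputed residue-pair distances for one superposition; it performs neither the alignment nor the superposition search that a real TM-score calculation maximizes over.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a precomputed pairwise alignment ('-' marks a gap).

    Assumed convention: identical residues * 100 / (alignment length - gap columns).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    gap_columns = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            gap_columns += 1
        elif a == b:
            identical += 1
    compared = len(aligned_a) - gap_columns
    if compared == 0:
        raise ValueError("alignment has no ungapped columns")
    return 100.0 * identical / compared


def tm_score(distances, target_length: int) -> float:
    """TM-score for ONE given superposition.

    distances: distance in angstroms for each aligned residue pair.
    target_length: number of residues in the reference (target) structure.
    Assumption: d0 is fixed at 0.5 for short targets (<= 21 residues),
    where the cube-root formula would give a very small or negative scale.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    if target_length <= 21:
        d0 = 0.5
    else:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


if __name__ == "__main__":
    # Hypothetical toy alignment: 4 identities over 5 ungapped columns.
    print(percent_identity("ACDE-G", "ACDFHG"))  # 80.0
    # A perfect superposition of a 30-residue target scores 1.0.
    print(tm_score([0.0] * 30, 30))  # 1.0
```

Under these assumptions, a candidate would meet the "at least 80% sequence identity" limb when `percent_identity` returns 80.0 or more, and the structural limb when the maximized TM-score is 0.90 or more; the application's description, not this sketch, governs how either value is actually determined.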
PCT/EP2025/068095 2024-07-17 2025-06-26 Optimized bacillus host cells Pending WO2026017381A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP24189205 2024-07-17
EP24189205.8 2024-07-17

Publications (1)

Publication Number Publication Date
WO2026017381A1 (en) 2026-01-22

Family

ID=91958718

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2025/068095 Pending WO2026017381A1 (en) 2024-07-17 2025-06-26 Optimized bacillus host cells

Country Status (1)

Country Link
WO (1) WO2026017381A1 (en)

Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1990015861A1 (en) 1989-06-13 1990-12-27 Genencor International, Inc. A method for killing cells without cell lysis
WO1992006204A1 (en) 1990-09-28 1992-04-16 Ixsys, Inc. Surface expression libraries of heteromeric receptors
US5223409A (en) 1988-09-02 1993-06-29 Protein Engineering Corp. Directed evolution of novel binding proteins
WO1994025612A2 (en) 1993-05-05 1994-11-10 Institut Pasteur Nucleotide sequences for the control of the expression of dna sequences in a cellular host
WO1995017413A1 (en) 1993-12-21 1995-06-29 Evotec Biosystems Gmbh Process for the evolutive design and synthesis of functional polymers based on designer elements and codes
WO1995022625A1 (en) 1994-02-17 1995-08-24 Affymax Technologies N.V. Dna mutagenesis by random fragmentation and reassembly
WO1995033836A1 (en) 1994-06-03 1995-12-14 Novo Nordisk Biotech, Inc. Phosphonyldipeptides useful in the treatment of cardiovascular diseases
WO1999043835A2 (en) 1998-02-26 1999-09-02 Novo Nordisk Biotech, Inc. Methods for producing a polypeptide in a bacillus cell
WO2006042548A1 (en) 2004-10-22 2006-04-27 Novozymes A/S Stable genomic integration of multiple polynucleotide copies
WO2009030738A1 (en) 2007-09-05 2009-03-12 Novo Nordisk A/S Glucagon-like peptide-1 derivatives and their pharmaceutical use
WO2010096673A1 (en) 2009-02-20 2010-08-26 Danisco Us Inc. Fermentation broth formulations
US8563289B2 (en) 2004-02-13 2013-10-22 Novozymes A/S Protease variants
WO2015091613A1 (en) 2013-12-17 2015-06-25 Novo Nordisk A/S Enterokinase cleavable polypeptides
WO2017024198A1 (en) * 2015-08-06 2017-02-09 The Trustees Of The University Of Pennsylvania Glp-1 and use thereof in compositions for treating metabolic diseases
WO2018077796A1 (en) 2016-10-25 2018-05-03 Novozymes A/S Flp-mediated genomic integration in bacillus licheniformis
CN109868281A (en) * 2019-03-06 2019-06-11 江苏大学附属医院 A method of expressing glucagon-like peptide-1 using bacillus subtilis
US20190185847A1 (en) 2016-07-06 2019-06-20 Novozymes A/S Improving a Microorganism by CRISPR-Inhibition
WO2019243502A1 (en) 2018-06-21 2019-12-26 Novo Nordisk A/S Novel compounds for treatment of obesity
EP3838925A1 (en) * 2018-09-06 2021-06-23 Zhejiang Palo Alto Pharmaceuticals, Inc. Long-acting recombinant glp1-fc-cd47 protein, preparation method and use thereof
WO2022110499A1 (en) * 2020-11-27 2022-06-02 广州汉腾生物科技有限公司 Application of signal peptide in expression of glp-1 fusion protein

Non-Patent Citations (50)

* Cited by examiner, † Cited by third party
Title
"Current protocols in Molecular Biology.", 1995, JOHN WILEY AND SONS
"Molecular Biological Methods for Bacillus.", 1990, JOHN WILEY AND SONS
ARMENTEROS ET AL., NAT. BIOTECHNOL., vol. 37, 2019, pages 420 - 423
BOWIE AND SAUER, PROC. NATL. ACAD. SCI. USA, vol. 86, 1989, pages 2152 - 2156
BURKE ET AL., PROC. NATL. ACAD. SCI. USA, vol. 98, 2001, pages 6289 - 6294
CAO CHUNLAI ET AL: "Signal Peptide Optimization to Prevent N-terminal Truncation of Glucagon Like Peptide-1/IgG-Fc Fusion Protein", INTERNATIONAL JOURNAL OF PEPTIDE RESEARCH AND THERAPEUTICS, vol. 27, no. 1, 31 August 2020 (2020-08-31), pages 579 - 586, XP037366828, ISSN: 1573-3149, DOI: 10.1007/S10989-020-10112-9 *
CARTER ET AL., PROTEINS: STRUCTURE, FUNCTION, AND GENETICS, vol. 6, 1989, pages 240 - 248
CHOI ET AL., J. MICROBIOL. METHODS, vol. 64, 2006, pages 391 - 397
COLLINS-RACIE ET AL., BIOTECHNOLOGY, vol. 13, 1995, pages 982 - 987
CONTRERAS ET AL., BIOTECHNOLOGY, vol. 9, 1991, pages 378 - 381
COOPER ET AL., EMBO J., vol. 12, 1993, pages 2575 - 2583
CUNNINGHAM AND WELLS, SCIENCE, vol. 244, 1989, pages 1081 - 1085
DAWSON ET AL., SCIENCE, vol. 266, 1994, pages 776 - 779
DERBYSHIRE ET AL., GENE, vol. 46, 1986, pages 145
DONALD ET AL., J. BACTERIOL., vol. 195, no. 11, 2013, pages 2612 - 2620
EATON ET AL., BIOCHEMISTRY, vol. 25, 1986, pages 505 - 512
FORD ET AL., PROTEIN EXPRESSION AND PURIFICATION, vol. 2, pages 95 - 107
GEISBERG ET AL., CELL, vol. 156, no. 4, 2014, pages 812 - 824
HAMBRAEUS ET AL., MICROBIOLOGY, vol. 146, no. 12, 2000, pages 3051 - 3059
HEINZE ET AL., BMC MICROBIOLOGY, vol. 18, 2018, pages 56
HILTON ET AL., J. BIOL. CHEM., vol. 271, 1996, pages 4699 - 4708
HOLM AND SANDER, TRENDS BIOCHEM. SCI., vol. 20, 1995, pages 478 - 480
HUE ET AL., J. BACTERIOL., vol. 177, 1995, pages 3465 - 3471
J. L. BOSE: "Methods in Molecular Biology,", 2016, SPRINGER PROTOCOLS, article "The Genetic Manipulation of Staphylococc"
JUMPER ET AL.: "Highly accurate protein structure prediction with AlphaFold", NATURE, vol. 596, 2021, pages 583 - 589, XP037990370, DOI: 10.1038/s41586-021-03819-2
KABERDIN AND BLÄSI, FEMS MICROBIOL. REV., vol. 30, no. 6, 2006, pages 967 - 979
LABROU, PROTEIN DOWNSTREAM PROCESSING, vol. 1129, 2014, pages 3 - 10
LOWMAN ET AL., BIOCHEMISTRY, vol. 30, 1991, pages 10832 - 10837
MARTIN ET AL., J. IND. MICROBIOL. BIOTECHNOL, vol. 3, 2003, pages 568 - 576
MOROZOV ET AL., EUKARYOTIC CELL, vol. 5, no. 11, 2006, pages 1838 - 1846
NEEDLEMAN AND WUNSCH, J. MOL. BIOL., vol. 48, 1970, pages 443 - 453
NER ET AL., DNA, vol. 7, 1988, pages 127
NESS ET AL., NATURE BIOTECHNOLOGY, vol. 17, 1999, pages 893 - 896
PATEL AND GUPTA, INT. J. SYST. EVOL. MICROBIOL., vol. 70, 2020, pages 406 - 438
RASMUSSEN-WILSON ET AL., APPL. ENVIRON. MICROBIOL., vol. 63, 1997, pages 3488 - 3493
REIDHAAR-OLSON AND SAUER, SCIENCE, vol. 241, 1988, pages 53 - 57
RICE ET AL.: "EMBOSS: The European Molecular Biology Open Software Suite", TRENDS GENET., vol. 16, 2000, pages 276 - 277, XP004200114, DOI: 10.1016/S0168-9525(00)02024-2
SAMBROOK ET AL.: "Molecular cloning: A laboratory manual.", 1989, COLD SPRING HARBOR LABORATORY
SHINDYALOV AND BOURNE, PROTEIN ENG., vol. 11, 1998, pages 739 - 747
SMITH ET AL., J. MOL. BIOL., vol. 224, 1992, pages 899 - 904
SONG ET AL., PLOS ONE, vol. 11, no. 7, 2016, pages e0158447
STEVENS, DRUG DISCOVERY WORLD, vol. 4, 2003, pages 35 - 48
SVETINA ET AL., J. BIOTECHNOL, vol. 76, 2000, pages 245 - 251
VARADI ET AL., NUCLEIC ACIDS RES., vol. 50, no. D1, 2021, pages D439 - D444
VOS ET AL., SCIENCE, vol. 255, 1992, pages 306 - 312
WINGFIELD, CURRENT PROTOCOLS IN PROTEIN SCIENCE, vol. 80, no. 1, 2015, pages 1 - 35
WLODAVER ET AL., FEBS LETT., vol. 309, 1992, pages 59 - 64
YOU, C ET AL., METHODS MOL. BIOL., vol. 116, 2017, pages 183 - 92
ZHANG AND SKOLNICK, NUCLEIC ACIDS RES., vol. 33, no. 7, 2005, pages 2302 - 2309
ZHANG AND SKOLNICK, PROTEINS, vol. 57, 2004, pages 702 - 710

Similar Documents

Publication Publication Date Title
Li et al. Bottlenecks in the expression and secretion of heterologous proteins in Bacillus subtilis
CN102803290B (en) For increasing the Bacillus strain that protein produces
EP2689015B1 (en) Methods for producing secreted polypeptides
CN101903519B (en) Enhanced protein production in bacillus
US12492391B2 (en) Cognate foldase co-expression
EP3571217B1 (en) Methods and compositions for obtaining natural competence in bacillus host cells
US9732331B2 (en) Codon modified amylase from Bacillus akibai
EP4359423A1 (en) Bacillus licheniformis host cell for production of a compound of interest with increased purity
WO2026017381A1 (en) Optimized bacillus host cells
WO2023247514A1 (en) Recombinant mannanase expression
CN101978056A (en) Plasmid vector
US20250034535A1 (en) Improved protein production in recombinant bacteria
WO2025012213A1 (en) Artificial signal peptides
CN115335503A (en) Compositions and methods for enhancing protein production in Bacillus cells
JP7706628B2 (en) Modified 5' untranslated region and method for producing target substance using same
JP2025102402A (en) Method for modifying the 5' untranslated region
WO2023285348A1 (en) Recombinant cutinase expression
US20220170003A1 (en) Means And Methods For Improving Protease Expression
WO2024120767A1 (en) Modified rna polymerase activities
EP4061939A1 (en) Selection marker free methods for modifying the genome of bacillus and compositions thereof