
WO2024240965A2 - Droplet-based screening method - Google Patents

Droplet-based screening method

Info

Publication number
WO2024240965A2
Authority
WO
WIPO (PCT)
Prior art keywords
interest
polynucleotide
polypeptide
cells
sequence
Application number
PCT/EP2024/077083
Other languages
English (en)
Other versions
WO2024240965A3 (fr)
Inventor
Philipp Gruner
Kenneth Kyndi GRAVESEN
Ole Skyggebjerg
Original Assignee
Novozymes A/S
Application filed by Novozymes A/S
Publication of WO2024240965A2
Publication of WO2024240965A3


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N15/00 Investigating characteristics of particles; Investigating permeability, pore-volume or surface-area of porous materials
    • G01N15/10 Investigating individual particles
    • G01N15/14 Optical investigation techniques, e.g. flow cytometry
    • G01N15/1429 Signal processing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1075 Isolating an individual clone by screening libraries by coupling phenotype to genotype, not provided for in other groups of this subclass
    • C12N15/1086 Preparation or screening of expression libraries, e.g. reporter assays
    • G01N15/1456 Optical investigation techniques, e.g. flow cytometry without spatial resolution of the texture or inner structure of the particle, e.g. processing of pulse signals
    • G01N15/1459 Optical investigation techniques, e.g. flow cytometry without spatial resolution of the texture or inner structure of the particle, e.g. processing of pulse signals the analysis being performed on a sample stream
    • G01N15/149 Optical investigation techniques, e.g. flow cytometry specially adapted for sorting particles, e.g. by their size or optical properties
    • G01N2015/1006 Investigating individual particles for cytology
    • G01N2500/00 Screening for compounds of potential therapeutic value
    • G01N2500/10 Screening for compounds of potential therapeutic value involving cells

Definitions

  • the present invention relates to methods for screening a biological library using a microfluidic chip.
  • the invention also relates to nucleic acid sequences, vectors, and host cells which have been isolated and/or generated by the methods of the invention.
  • the methods of the invention also relate to identification of polynucleotides of interest and/or cells having desired characteristics.
  • Droplets sorted in the chip are sometimes referred to as microdroplets or microencapsulations. They typically have an average diameter of about 20 micrometers and are used as compartments or minuscule reaction vessels. They can contain live microbial cells that, for example, secrete an enzyme. Additionally or alternatively, the droplets can contain cell extracts that enable the expression of a protein encoded by a polynucleotide of interest.
  • the droplets may also contain other components, for example, a fluorogenic enzyme substrate that can reveal the activity of an enzyme.
  • the method of the invention is as robust as conventional methods (as demonstrated in Example 3) while showing reduced variability (evidenced by decreased standard deviation as shown in Example 4). Decreased standard deviation facilitates the generation of better-performing computational models.
  • the method of the invention enables the screening of larger libraries. Screening a higher number of library members allows for the generation of more effective models (as shown in Example 6). The method of the invention also identifies library members that are difficult or impossible to identify using other methods (as shown in Example 7).
  • Droplet microfluidics accelerates the generation of biological screening data by a factor of approximately 1000, at less than 1% of the cost of conventional high-throughput screening (HTS) methods.
  • the methods of the invention require a lower number of cells and/or library members in the starting sample without compromising the read-out quality.
  • the scoring of the polynucleotide of interest allows a detailed study of the relationship between variants of the polynucleotide of interest and a desired effect, e.g., enzyme activity or enzyme yield.
  • the methods of the invention allow a higher resolution of read-outs, i.e., it is possible to differentiate between multiple subsets after droplet separation based on multiple threshold values. For example, in some cases the highest binding activity of a polypeptide towards a substrate or inhibitor is not the most favorable; instead, a moderate binding activity within a “sweet spot” is preferred.
  • variant polynucleotide sequences correlated with positive effects, e.g., increased enzyme yield or increased enzyme activity
  • variant sequences correlated with less-desired effects, e.g., reduced enzyme yield or reduced enzyme activity
  • the results obtained with the methods of the invention unexpectedly correlate strongly with the results of the microtiter plate (MTP) screening methods.
  • using the methods of the instant invention allows the well-established MTP screening methods, which are known to be time- and resource-demanding, to be skipped or replaced.
  • the methods of the invention have also shown a lower standard deviation (STD) than MTP. This lower STD is of particular advantage when the resulting data are used as input for a machine learning model, because a lower STD results in a higher-confidence machine learning model in the form of a more precise and robust algorithm.
  • the method of the invention allows promising members of a large library to be identified that would otherwise not have been identified using conventional screening methods, because these conventional methods are limited to smaller libraries.
  • the methods of the invention provide a strategy in which droplets are sorted into multiple (at least three) output channels (pools). The abundance of each individual sequence in each pool is then measured and used to calculate a score for each polynucleotide sequence. In this manner, droplet screening technology can be used to generate extensive data sets efficiently while taking every polynucleotide sequence present in each pool into account.
  • the scoring, identification and/or sequencing of one or more polynucleotide of interest in one or more output channel can be utilized to train a computational model to obtain further insights about the sequence properties, to improve desired sequence characteristics (e.g., increased yield, and/or increased enzyme activity), and/or to generate synthetic sequences with such improved characteristics.
  • the invention relates to a method for screening a biological library, the method comprising the steps of: a) providing a microfluidic device comprising a droplet sorter (200), the droplet sorter comprising at least three output channels (301, 302, 303), b) providing an emulsion of droplets comprising a library of polynucleotides of interest and a screenable product, c) determining the amount of screenable product in one or more droplets in the microfluidic device, d) sorting the one or more droplets with the droplet sorter (200) into a receiving output channel of the at least three output channels (301, 302, 303), wherein the receiving output channel is determined based on the amount of screenable product per droplet, and wherein at least three receiving output channels receive a plurality of droplets comprising an amount of screenable product above and/or below one or more predetermined threshold levels, e) identifying one or more polynucleotides of interest present in the at least three
  • the invention relates to a host cell comprising in its genome the synthetic polynucleotide of interest generated in additional step h), and/or a polynucleotide of interest identified in step e).
  • in a third aspect, the invention relates to a method of producing a polypeptide of interest, the method comprising cultivating the cell according to the second aspect under conditions conducive to production of the polypeptide.
  • Figure 1 shows a schematic overview of a microfluidic device with multiple output channels according to one embodiment of the method of the invention.
  • Figure 2 shows the assay responses for 100,000 droplets and the thresholds used for separation into the five output channels (pools 1-5).
  • Figure 3 shows the relative abundance for 102 signal peptide variants sorted into five output channels (pools 1-5).
  • Figure 4 shows the correlation between the scores of the droplet method of the invention and the MTP assay.
  • Figure 5 shows the standard deviation (σ) for MTP fermentations (A) and droplet fermentations (B) based on counts and relative protein yield.
  • Figure 6 shows the fraction of proline-containing sequences amongst the sequences obtained from an MTP screen (A) and a ranking of signal peptide sequences obtained from the same MTP screen (B).
  • Figure 8 shows the correlation coefficient between predictions and observed values dependent on the size of the training data.
  • Non-limiting examples of DNA sequence variants include a library of wildtype cells comprising native DNA sequence variants.
  • the library of polynucleotides of interest is comprised in wildtype cells.
  • Non-limiting examples of amino acid variants include a library of purified polypeptide variants and a library of recombinant cells expressing polypeptide variants.
  • machine learning algorithms include, but are not limited to:
  • Linear Regression A foundational algorithm that models the relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data points. It is commonly used for tasks like predicting numerical values, such as housing prices based on factors like square footage and location.
  • Decision Trees A method that uses a tree-like structure to make decisions based on multiple conditions. Each node in the tree represents a decision based on a particular feature, eventually leading to a leaf node with the final prediction or classification. Decision trees are employed for tasks like classification, where an algorithm determines the category of an input based on features.
  • a random forest model is an ensemble machine learning algorithm used for diverse applications such as classification, regression, and data analysis.
  • the algorithm constructs an ensemble of numerous decision trees. Each decision tree is trained on a random subset of the training dataset and a random selection of input features.
  • the strength of the random forest model comes from combining the predictions of multiple decision trees (by averaging for regression or by majority vote for classification). Training each tree on a bootstrap sample of the data and aggregating the results, termed "bagging" (bootstrap aggregating), gives higher accuracy and robustness than individual trees.
  • the random forest model is less prone to overfitting than a single decision tree. Consequently, it tends to make more accurate predictions on new, previously unseen data instances.
  • the random forest model handles high-dimensional datasets and complex interdependencies between features, which makes it well suited to complex real-world problems. Depending on the implementation, it can also accommodate missing data values and maintain accuracy when portions of the data are incomplete.
  • Neural Networks Complex algorithms inspired by the structure and function of biological neural networks. They consist of layers of interconnected nodes (neurons) that process and transform data. Deep learning, which uses neural networks with multiple hidden layers, is applied to tasks such as natural language processing, image generation, and autonomous driving.
  • Reinforcement Learning An approach where an algorithm learns to make sequences of decisions by interacting with an environment to maximize a cumulative reward. This is often used in robotics, game playing, and autonomous systems.
  • Naive Bayes A probabilistic algorithm based on Bayes' theorem that is particularly effective for text classification tasks like spam detection and sentiment analysis.
  • Principal Component Analysis (PCA)
  • Generative Adversarial Network A specialized class of machine learning algorithm that involves two neural networks, a generator and a discriminator, engaged in a competitive process.
  • the generator creates synthetic data instances (such as images or text) that resemble real data, while the discriminator evaluates whether a given data instance is real or generated.
  • the two networks iteratively refine their performance, with the generator aiming to produce increasingly realistic data and the discriminator improving its ability to differentiate between real and generated data.
  • a non-limiting example of a suitable GAN is disclosed in WO2024/133344 (Novozymes A/S).
  • ddPCR The term “Droplet Digital PCR” or “ddPCR” refers to an advanced molecular biology technique employed for the precise analysis and quantification of nucleic acids, including DNA and RNA, within a sample. This method represents an innovation over conventional polymerase chain reaction (PCR) methodologies, devised to address the inherent limitations of traditional PCR by enabling accurate measurement and detection of rare target sequences or subtle variations in target concentrations.
  • the sample containing the target nucleic acid is partitioned into numerous individual droplets, each acting as an independent reaction compartment. This partitioning allows the target nucleic acid to be amplified in isolation, minimizing amplification bias and interference from non-target molecules.
  • the droplets undergo fluorescence-based analysis, determining the presence or absence of the amplified target sequence within each individual droplet.
  • Droplet sorter means an arrangement within the microfluidic device which allows the sorting of droplets into three or more output channels, wherein the sorting is based on the amount of screenable product detected in the droplet.
  • the sorting is carried out by using one or more sorting means, e.g., electrodes or valves.
  • the amount of screenable product of the droplet is detected by one or more sensing means, and communicated to the sorting means, e.g., two or more electrodes.
  • expression means any step involved in the production of a polypeptide including, but not limited to, transcription, post-transcriptional modification, translation, post-translational modification, and secretion.
  • Expression vector refers to a linear or circular DNA construct comprising a DNA sequence encoding a polypeptide, which coding sequence is operably linked to a suitable control sequence capable of effecting expression of the DNA in a suitable host.
  • control sequences may include a promoter to effect transcription, an optional operator sequence to control transcription, a sequence encoding suitable ribosome binding sites on the mRNA, enhancers and sequences which control termination of transcription and translation.
  • Extension means an addition of one or more amino acids to the amino and/or carboxyl terminus of a polypeptide, wherein the “extended” polypeptide has enzyme activity.
  • fragment means a polypeptide having one or more amino acids absent from the amino and/or carboxyl terminus of the mature polypeptide, wherein the fragment has enzyme activity.
  • Fusion polypeptide is a polypeptide in which one polypeptide is fused at the N-terminus and/or the C-terminus of a polypeptide of the present invention.
  • a fusion polypeptide is produced by fusing a polynucleotide encoding another polypeptide to a polynucleotide of the present invention, or by fusing two or more polynucleotides of the present invention together.
  • Techniques for producing fusion polypeptides are known in the art, and include ligating the coding sequences encoding the polypeptides so that they are in frame and that expression of the fusion polypeptide is under control of the same promoter(s) and terminator.
  • Fusion polypeptides may also be constructed using intein technology in which fusion polypeptides are created post-translationally (Cooper et al., 1993, EMBO J. 12: 2575-2583; Dawson et al., 1994, Science 266: 776-779).
  • a fusion polypeptide can further comprise a cleavage site between the two polypeptides. Upon secretion of the fusion protein, the site is cleaved releasing the two polypeptides. Examples of cleavage sites include, but are not limited to, the sites disclosed in Martin et al., 2003, J. Ind. Microbiol. Biotechnol. 3: 568-576; Svetina et al., 2000, J.
  • heterologous means, with respect to a host cell, that a polypeptide or nucleic acid does not naturally occur in the host cell.
  • heterologous means, with respect to a polypeptide or nucleic acid, that a control sequence, e.g., promoter, of a polypeptide or nucleic acid is not naturally associated with the polypeptide or nucleic acid, i.e., the control sequence is from a gene other than the gene encoding the mature polypeptide.
  • Host Strain or Host Cell is an organism comprising a polynucleotide of interest.
  • exemplary host strains are microorganism cells (e.g., bacteria, filamentous fungi, and yeast) capable of expressing a polypeptide of interest and/or fermenting saccharides, and/or probiotic microorganisms.
  • a recombinant host strain or recombinant host cell is an organism into which an expression vector, phage, virus, or other DNA construct, including a polynucleotide encoding a polypeptide of interest (e.g., an amylase), has been introduced.
  • exemplary recombinant host strains are microorganism cells (e.g., bacteria, filamentous fungi, and yeast) capable of expressing the polypeptide of interest and/or fermenting saccharides.
  • the term "host cell" includes protoplasts created from cells.
  • Isolated means a polypeptide, nucleic acid, cell, or other specified material or component that has been separated from at least one other material or component, including but not limited to, other proteins, nucleic acids, cells, etc.
  • An isolated polypeptide, nucleic acid, cell or other material is thus in a form that does not occur in nature.
  • An isolated polypeptide includes, but is not limited to, a culture broth containing the secreted polypeptide expressed in a host cell.
  • Mature polypeptide means a polypeptide in its mature form following N-terminal and/or C-terminal processing (e.g., removal of signal peptide).
  • Mature polypeptide coding sequence means a polynucleotide that encodes a mature polypeptide.
  • the microfluidic device comprises a droplet sorter (200), the droplet sorter comprising at least three output channels (301, 302, 303).
  • the microfluidic device also comprises a plurality of liquid inlets and/or liquid outlets.
  • the device comprises an incubation chamber (500).
  • Native means a nucleic acid or polypeptide naturally occurring in a host cell.
  • Nucleic acid encompasses DNA, RNA, heteroduplexes, and synthetic molecules capable of encoding a polypeptide. Nucleic acids may be single-stranded or double-stranded, and may contain chemical modifications. The terms “nucleic acid” and “polynucleotide” are used interchangeably. Because the genetic code is degenerate, more than one codon may be used to encode a particular amino acid, and the present compositions and methods encompass nucleotide sequences that encode a particular amino acid sequence. Unless otherwise indicated, nucleic acid sequences are presented in 5'-to-3' orientation.
  • nucleic acid construct means a nucleic acid molecule, either single- or double-stranded, which is isolated from a naturally occurring gene or is modified to contain segments of nucleic acids in a manner that would not otherwise exist in nature or which is synthetic, and which comprises one or more control sequences operably linked to the nucleic acid sequence.
  • operably linked means that specified components are in a relationship (including but not limited to juxtaposition) permitting them to function in an intended manner.
  • a regulatory sequence is operably linked to a coding sequence such that expression of the coding sequence is under control of the regulatory sequence.
  • the polynucleotide of interest encodes a protease.
  • Suitable proteases include those of bacterial, fungal, plant, viral or animal origin e.g. microbial or vegetable origin. Microbial origin is preferred. Chemically modified or protein engineered variants are included. It may be an alkaline protease, such as a serine protease or a metalloprotease.
  • a serine protease may for example be of the S1 family, such as trypsin, or the S8 family such as subtilisin.
  • a metalloprotease may, for example, be a thermolysin from, e.g., family M4, or another metalloprotease, such as those from the M5, M7 or M8 families.
  • Serine endopeptidases hydrolyse the substrate N-succinyl-Ala-Ala-Pro-Phe p-nitroanilide.
  • the reaction was performed at room temperature at pH 9.0.
  • the release of pNA results in an increase of absorbance at 405 nm and this increase is proportional to the enzymatic activity measured against a standard.
  • purified means a nucleic acid, polypeptide or cell that is substantially free from other components as determined by analytical techniques well known in the art (e.g., a purified polypeptide or nucleic acid may form a discrete band in an electrophoretic gel, chromatographic eluate, and/or a media subjected to density gradient centrifugation).
  • a purified nucleic acid or polypeptide is at least about 50% pure, usually at least about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, about 99.5%, about 99.6%, about 99.7%, about 99.8% or more pure (e.g., percent by weight or on a molar basis).
  • a composition is enriched for a molecule when there is a substantial increase in the concentration of the molecule after application of a purification or enrichment technique.
  • the term "enriched" refers to a compound, polypeptide, cell, nucleic acid, amino acid, or other specified material or component that is present in a composition at a relative or absolute concentration that is higher than a starting composition.
  • the term “purified” as used herein refers to the polypeptide or cell being essentially free from components (especially insoluble components) from the production organism. In other aspects, the term “purified” refers to the polypeptide being essentially free of components (especially insoluble components) from the native organism from which it is obtained. In one aspect, the polypeptide is separated from some of the soluble components of the organism and culture medium from which it is recovered. The polypeptide may be purified (i.e., separated) by one or more of the unit operations filtration, precipitation, or chromatography.
  • the polypeptide may be purified such that only minor amounts of other proteins, in particular, other polypeptides, are present.
  • purified as used herein may refer to removal of other components, particularly other proteins and most particularly other enzymes present in the cell of origin of the polypeptide.
  • the polypeptide may be "substantially pure", i.e., free from other components from the organism in which it is produced, e.g., a host organism for recombinantly produced polypeptide.
  • the polypeptide is at least 40% pure by weight of the total polypeptide material present in the preparation.
  • the polypeptide is at least 50%, 60%, 70%, 80% or 90% pure by weight of the total polypeptide material present in the preparation.
  • a "substantially pure polypeptide” may denote a polypeptide preparation that contains at most 10%, preferably at most 8%, more preferably at most 6%, more preferably at most 5%, more preferably at most 4%, more preferably at most 3%, even more preferably at most 2%, most preferably at most 1%, and even most preferably at most 0.5% by weight of other polypeptide material with which the polypeptide is natively or recombinantly associated.
  • the substantially pure polypeptide is at least 92% pure, preferably at least 94% pure, more preferably at least 95% pure, more preferably at least 96% pure, more preferably at least 97% pure, more preferably at least 98% pure, even more preferably at least 99% pure, most preferably at least 99.5% pure by weight of the total polypeptide material present in the preparation.
  • the polypeptide of the present invention is preferably in a substantially pure form (i.e., the preparation is essentially free of other polypeptide material with which it is natively or recombinantly associated). This can be accomplished, for example, by preparing the polypeptide by well-known recombinant methods or by classical purification methods.
  • Recombinant is used in its conventional meaning to refer to the manipulation, e.g., cutting and rejoining, of nucleic acid sequences to form constellations different from those found in nature.
  • the term recombinant refers to a cell, nucleic acid, polypeptide or vector that has been modified from its native state.
  • recombinant cells express genes that are not found within the native (non-recombinant) form of the cell, or express native genes at different levels or under different conditions than found in nature.
  • the term “recombinant” is synonymous with “genetically modified” and “transgenic”.
  • Recover means the removal of a polypeptide from at least one fermentation broth component selected from the list of a cell, a nucleic acid, or other specified material, e.g., recovery of the polypeptide from the whole fermentation broth, or from the cell-free fermentation broth, by polypeptide crystal harvest, by filtration, e.g.
  • Score In the context of the invention, a score is calculated for each of the one or more polynucleotides of interest.
  • the score is the sum, over all output channels, of the normalized relative abundance in each output channel multiplied by the sorting threshold score for that channel.
  • the score is calculated as described in Example 3.
  • Screenable product means a molecule which is detectable by the sensing means (600).
  • the screenable product includes but is not limited to fluorescent molecules (e.g., green fluorescent protein (GFP), mCherry, mVenus, DsRed, EGFP, Nile red (9-(diethylamino)benzo[a]phenoxazin-5-one), a fluorescent vitamin, DAPI (4′,6-diamidino-2-phenylindole), and BODIPY), and fluorogenic molecules, e.g., fluorogenic rhodamine.
  • the screenable product is added to the emulsion, or is generated from a substrate by a process taking place in the droplet, e.g., during incubation.
  • the screenable product is a polypeptide expressed in the droplets.
  • the screenable product is a host cell in the droplets.
  • the screenable product comprises an absorbing molecule.
  • the absorbing molecule comprises para-nitroaniline (pNA).
  • the amount of the screenable product in the droplet may be inversely proportional to the amount of a polypeptide of interest expressed in the droplet, and/or by the host cells, e.g., when the polypeptide of interest binds or degrades the screenable product.
  • the amount of the screenable product in the droplet may be proportional to the amount of a polypeptide of interest expressed in the droplet, and/or by the host cells, e.g., when the polypeptide of interest degrades a substrate, which results in formation of the screenable product, or when the screenable product incorporates into the host cells or parts thereof (e.g., host cell membrane, or host cell wall), for example Nile Red.
  • the screenable product can thus, for example, be used as a proxy for one or more of the features selected from the list of cell growth, cell division, polypeptide of interest expression, polypeptide of interest binding, polypeptide of interest stability, and polypeptide of interest activity.
  • more than one screenable product is present in the droplets, e.g., to determine two or more different features selected from the aforementioned features.
  • Sequence identity: The relatedness between two amino acid sequences or between two nucleotide sequences is described by the parameter “sequence identity”.
  • the sequence identity between two amino acid sequences is determined as the output of “longest identity” using the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970, J. Mol. Biol. 48: 443-453) as implemented in the Needle program of the EMBOSS package (EMBOSS: The European Molecular Biology Open Software Suite, Rice et al., 2000, Trends Genet. 16: 276-277), preferably version 6.6.0 or later.
  • the parameters used are a gap open penalty of 10, a gap extension penalty of 0.5, and the EBLOSUM62 (EMBOSS version of BLOSUM62) substitution matrix.
  • In order for the Needle program to report the longest identity, the -nobrief option must be specified in the command line.
  • the output of Needle labeled “longest identity” is calculated as follows:
  • the sequence identity between two polynucleotide sequences is determined as the output of “longest identity” using the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970, supra) as implemented in the Needle program of the EMBOSS package (EMBOSS: The European Molecular Biology Open Software Suite, Rice et al., 2000, supra), preferably version 6.6.0 or later.
  • the parameters used are a gap open penalty of 10, a gap extension penalty of 0.5, and the EDNAFULL (EMBOSS version of NCBI NUC4.4) substitution matrix.
  • the -nobrief option must be specified in the command line.
  • the output of Needle labeled “longest identity” is calculated as follows:
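Both definitions above end at "calculated as follows:" without stating the formula. A definition commonly used with Needle's "longest identity" output is (Identical Residues x 100)/(Length of Alignment - Total Number of Gaps in Alignment); the minimal Python sketch below assumes that formula. It takes an alignment that has already been produced (for example by Needle) as two equal-length strings with "-" as the gap character. The function name `percent_identity` is hypothetical, and the sketch does not perform the alignment itself.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of a pairwise alignment, assuming
    (identical positions x 100) / (alignment length - gap positions)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(aligned_a.upper(), aligned_b.upper()))
    # A column counts as a gap position if either sequence has a gap there.
    gap_positions = sum(1 for a, b in columns if a == gap or b == gap)
    # Identical positions: same residue/nucleotide in both sequences, gaps excluded.
    identical = sum(1 for a, b in columns if a == b and a != gap)
    denominator = len(columns) - gap_positions
    if denominator == 0:
        raise ValueError("alignment has no gap-free positions")
    return identical * 100 / denominator


# 6 alignment columns, 1 gap position, 4 identical positions -> 4 * 100 / 5
print(percent_identity("ACGT-A", "ACCTTA"))  # 80.0
```

The same calculation applies to amino acid and nucleotide alignments; only the substitution matrix used to build the alignment differs (EBLOSUM62 versus EDNAFULL).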
  • Signal peptide: A “signal peptide” is a sequence of amino acids attached to the N-terminal portion of a protein, which facilitates the secretion of the protein outside the cell.
  • the mature form of an extracellular protein lacks the signal peptide, which is cleaved off during the secretion process.
  • Subsequence means a polynucleotide having one or more nucleotides absent from the 5' and/or 3' end of a mature polypeptide coding sequence, wherein the subsequence encodes a fragment having enzyme activity.
  • variant means a polypeptide having enzyme activity comprising a man-made mutation, i.e., a substitution, insertion (including extension), and/or deletion (e.g., truncation), at one or more positions.
  • a substitution means replacement of the amino acid occupying a position with a different amino acid;
  • a deletion means removal of the amino acid occupying a position; and
  • an insertion means adding 1-5 amino acids (e.g., 1-3 amino acids, in particular, 1 amino acid) adjacent to and immediately following the amino acid occupying a position.
  • Wild-type, in reference to an amino acid sequence or nucleic acid sequence, means that the amino acid sequence or nucleic acid sequence is a native or naturally-occurring sequence.
  • naturally-occurring refers to anything (e.g., proteins, amino acids, or nucleic acid sequences) that is found in nature.
  • non-naturally occurring refers to anything that is not found in nature (e.g., recombinant nucleic acids and protein sequences produced in the laboratory or modification of the wild-type sequence).
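The three kinds of change that define a variant (substitution, deletion, insertion) can be made concrete with a small Python sketch. The classes `Substitution`, `Deletion` and `Insertion` and the function `apply_changes` are hypothetical; positions are assumed to be 1-based and to refer to the parent sequence, and an N-terminal extension (an insertion before position 1) is not modelled.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass(frozen=True)
class Substitution:
    position: int  # 1-based position in the parent sequence
    new: str       # replacing amino acid (must differ from the parent residue)


@dataclass(frozen=True)
class Deletion:
    position: int  # 1-based position whose amino acid is removed


@dataclass(frozen=True)
class Insertion:
    position: int  # residues are added immediately after this position
    residues: str  # 1-5 amino acids, per the definition above


Change = Union[Substitution, Deletion, Insertion]


def apply_changes(parent: str, changes: Sequence[Change]) -> str:
    """Return the variant obtained by applying `changes` to `parent`."""
    # One slot per parent position: [residue at that position, inserted residues...]
    slots = [[aa] for aa in parent]
    for change in changes:
        idx = change.position - 1
        if not 0 <= idx < len(parent):
            raise ValueError(f"position {change.position} is outside the parent sequence")
        if isinstance(change, Substitution):
            if len(change.new) != 1 or change.new == parent[idx]:
                raise ValueError("a substitution introduces one different amino acid")
            slots[idx][0] = change.new
        elif isinstance(change, Deletion):
            slots[idx][0] = ""
        else:  # Insertion
            if not 1 <= len(change.residues) <= 5:
                raise ValueError("an insertion adds 1-5 amino acids")
            slots[idx].append(change.residues)
    return "".join("".join(slot) for slot in slots)


print(apply_changes("MKTAY", [Substitution(2, "R"), Deletion(3), Insertion(4, "GG")]))  # MRAGGY
```

Keeping one slot per parent position means every change is interpreted against the parent numbering, regardless of the order in which the changes are listed.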
  • the invention relates to a method for screening a biological library, the method comprising the steps of: a) providing a microfluidic device comprising a droplet sorter (200), the droplet sorter comprising at least three output channels (301, 302, 303), b) providing an emulsion of droplets comprising a library of polynucleotides of interest, and a screenable product, c) determining the amount of screenable product in one or more droplets in the microfluidic device, d) sorting the one or more droplets with the droplet sorter (200) into a receiving output channel of the at least three output channels (301, 302, 303), wherein the receiving output channel is determined based on the amount of screenable product per droplet, and wherein at least three receiving output channels receive a plurality of droplets comprising an amount of screenable product above and/or below one or more predetermined threshold levels, e) identifying one or more polynucleotides of interest present in …
  • the emulsion of droplets comprises one or more host cells.
  • each host cell comprises one or more polynucleotides of interest from the library of polynucleotides of interest.
  • each droplet comprises at most one host cell, or a plurality of host cells derived from the same parent host cell.
  • each droplet comprises at most one polynucleotide of interest.
  • the screenable product is produced by the host cells.
  • the formation of the screenable product is catalyzed by an enzyme; preferably, the enzyme is encoded by the polynucleotide of interest.
  • the screenable product is encoded by the one or more polynucleotides of interest.
  • the screenable product is produced by a polypeptide expressed by the host cells.
  • the screenable product is produced by a polypeptide encoded by the one or more polynucleotides of interest.
  • the screenable product is a polypeptide expressed by the host cells.
  • the screenable product is an enzyme.
  • the enzyme is expressed by the host cells.
  • the enzyme is selected from the list of a hydrolase, isomerase, ligase, lyase, oxidoreductase, or transferase, e.g., an aminopeptidase, amylase, carbohydrase, carboxypeptidase, catalase, cellobiohydrolase, cellulase, chitinase, cutinase, cyclodextrin glycosyltransferase, deoxyribonuclease, endoglucanase, esterase, alpha-galactosidase, beta-galactosidase, glucoamylase, alpha-glucosidase, beta-glucosidase, invertase, laccase, lipase, mannosidase, mutanase, oxidase, pectinolytic enzyme, peroxidase, phytase, polyphenoloxidase, prote…
  • the screenable product is degraded by the host cells.
  • the screenable product is degraded by the polypeptide encoded by the one or more polynucleotides of interest.
  • the screenable product is degraded by a polypeptide expressed by the host cells.
  • the screenable product is an enzyme substrate, preferably for an enzyme selected from the list of a hydrolase, isomerase, ligase, lyase, oxidoreductase, or transferase, e.g., an aminopeptidase, amylase, carbohydrase, carboxypeptidase, catalase, cellobiohydrolase, cellulase, chitinase, cutinase, cyclodextrin glycosyltransferase, deoxyribonuclease, endoglucanase, esterase, alpha-galactosidase, beta-galactosidase, glucoamylase, alpha-glucosidase, beta-glucosidase, invertase, laccase, lipase, mannosidase, mutanase, oxidase, pectinolytic enzyme, peroxidase, phyta…
  • the amount of screenable product is inversely proportional to one or more of cell number, cell growth, cell division, cell viability, or cell growth rate.
  • the amount of screenable product is proportional to one or more of cell number, cell growth, cell division, cell viability, or cell growth rate.
  • the screenable product comprises or consists of one or more host cells.
  • the screenable product comprises or consists of substantially all the host cells in a droplet.
  • the score is proportional, e.g., normalized, to the number of identical DNA sequences for a first polynucleotide of interest present in an output channel.
  • the score is the total number of identical DNA sequences for a first polynucleotide of interest present in an output channel.
  • the score is proportional, e.g., normalized, to the number of identical DNA sequences for a second polynucleotide of interest present in an output channel.
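One way to compute such scores from sequencing data is sketched below in Python. It assumes that reads have already been trimmed to the polynucleotide of interest and grouped by output channel, and it uses the fraction of a channel's reads as the normalized score; that normalization, the function name `channel_scores` and the example sequences are illustrative choices, not taken from the source.

```python
from collections import Counter
from typing import Dict, List, Tuple


def channel_scores(reads_by_channel: Dict[str, List[str]]) -> Dict[str, Dict[str, Tuple[int, float]]]:
    """For each output channel, map every distinct DNA sequence to
    (number of identical reads, that number divided by the channel's total reads)."""
    scores: Dict[str, Dict[str, Tuple[int, float]]] = {}
    for channel, reads in reads_by_channel.items():
        counts = Counter(reads)  # identical sequences are counted together
        total = len(reads)
        scores[channel] = {seq: (n, n / total) for seq, n in counts.items()}
    return scores


reads = {
    "301": ["ATGAAA", "ATGAAA", "ATGCCC"],  # hypothetical low-signal pool
    "305": ["ATGCCC", "ATGCCC"],            # hypothetical high-signal pool
}
print(channel_scores(reads)["301"]["ATGAAA"])  # (2, 0.6666666666666666)
```

Comparing the normalized score of the same sequence across channels (e.g., channel 305 versus 301) gives a rough indication of the signal range in which that library member was enriched.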
  • the microfluidic device comprises an incubation zone (500).
  • the incubation does not take place in the microfluidic chip.
  • the incubation takes place on and/or in the microfluidic device.
  • the droplet sorter comprises one or more sensing means (600), preferably located downstream of the incubation zone (500), and/or upstream of the sorting means (401, 402).
  • the one or more sensing means (600) comprises a fluorescence sensor.
  • the one or more sensing means (600) comprises an absorption sensor.
  • the one or more sensing means (600) comprises an image sensor, e.g., a CMOS sensor, or a CCD sensor, or a PMT sensor.
  • the one or more sensing means (600) comprises a NEMS (nanoelectromechanical system) sensor.
  • the one or more sensing means (600) comprises a mass analyzer suitable for mass spectrometry, e.g., a quadrupole mass analyzer, a TOF mass analyzer, an ion trap mass analyzer, an orbitrap mass analyzer, a magnetic sector mass analyzer, a Q-TOF mass analyzer, or a FT-ICR mass analyzer.
  • step e) comprises DNA amplification of the one or more polynucleotides of interest within each output channel.
  • the DNA amplification is a PCR method.
  • the one or more polynucleotides of interest are identified by a DNA barcode.
  • the droplet sorter (200) comprises one or more sorting means (401, 402).
  • the one or more sorting means comprises at least two electrodes.
  • the one or more sorting means consists of one electrode.
  • the one or more sorting means consists of two electrodes.
  • the biological library comprises or consists of wild-type cells with different genotype and/or different phenotype.
  • the biological library comprises or consists of recombinant cells.
  • the biological library encodes different variants of the same polypeptide of interest, preferably the polypeptide of interest is an enzyme.
  • the biological library encodes different signal peptide variants.
  • the biological library encodes different promoter variants.
  • the biological library comprises different codon-optimized DNA sequences encoding the same amino acid sequence of a polypeptide of interest, e.g., a signal peptide, and/or an enzyme.
  • the polynucleotide of interest encodes a polypeptide of interest.
  • the biological library comprises or consists of a plurality of polynucleotides of interest, each polynucleotide of interest encoding a variant of a polypeptide of interest.
  • the polynucleotide of interest comprises a first polynucleotide of interest encoding a control sequence, and a second polynucleotide of interest encoding a polypeptide of interest.
  • the biological library comprises or consists of a plurality of polynucleotides of interest, each polynucleotide of interest encoding a variant of a control sequence.
  • the control sequence is a promoter sequence, a signal peptide, a leader sequence, a polyadenylation sequence, a propeptide sequence, or a transcription terminator.
  • the polynucleotide of interest comprises a first polynucleotide of interest encoding a signal peptide, and a second polynucleotide of interest encoding a polypeptide of interest, wherein the first polynucleotide of interest is operatively linked to the second polynucleotide of interest and located upstream of the second polynucleotide of interest.
  • the polynucleotide of interest comprises a first polynucleotide of interest comprising a promoter sequence, and a second polynucleotide of interest encoding a polypeptide of interest, wherein the first polynucleotide of interest is operatively linked to the second polynucleotide of interest and located upstream of the second polynucleotide of interest.
  • the biological library comprises identical second polynucleotides of interest, and a plurality of variants of the first polynucleotides of interest.
  • the biological library comprises identical first polynucleotides of interest, and a plurality of variants of the second polynucleotides of interest.
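The two library designs above (variable first polynucleotide with a fixed second one, or the reverse) can be enumerated as in the following Python sketch. Joining the parts by plain concatenation and the reading-frame check for signal-peptide fusions are simplifying assumptions; real constructs would typically include additional junction or cloning sequences. `build_library` is a hypothetical helper and all sequences are placeholders.

```python
from itertools import product
from typing import List


def build_library(first_variants: List[str], second_variants: List[str],
                  in_frame_fusion: bool = False) -> List[str]:
    """Place each first polynucleotide (e.g., promoter or signal-peptide coding
    sequence) directly upstream of each second polynucleotide."""
    constructs = []
    for first, second in product(first_variants, second_variants):
        # A signal-peptide coding sequence must keep the downstream gene in frame.
        if in_frame_fusion and len(first) % 3 != 0:
            raise ValueError(f"{first!r} would shift the reading frame")
        constructs.append(first + second)
    return constructs


# Design 1: placeholder promoter variants upstream of one identical gene.
print(build_library(["TTGACA", "TTTACA"], ["ATGAAATAA"]))
# Design 2: one placeholder signal-peptide coding sequence fused to gene variants.
print(build_library(["ATGAAA"], ["GCTTAA", "GGTTAA"], in_frame_fusion=True))
```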
  • the first polynucleotide of interest is heterologous to the second polynucleotide of interest.
  • the first polynucleotide of interest is endogenous to the second polynucleotide of interest.
  • the one or more polynucleotides of interest comprise a promoter, a polynucleotide encoding a signal peptide, a polynucleotide encoding a polypeptide of interest, or a native host cell gene.
  • the polynucleotide of interest is substantially the whole genome of the host cell.
  • the one or more polynucleotides of interest are heterologous to the host cell.
  • the one or more polynucleotides of interest are endogenous to the host cell.
  • the first polynucleotide of interest is heterologous to the host cell.
  • the first polynucleotide of interest is endogenous to the host cell.
  • the second polynucleotide of interest is heterologous to the host cell.
  • the second polynucleotide of interest is endogenous to the host cell.
  • the first and second polynucleotides of interest are heterologous to the host cell.
  • the first and second polynucleotides of interest are endogenous to the host cell.
  • the one or more polynucleotides of interest encode a polypeptide of interest.
  • the polypeptide of interest is an enzyme, a nanobody, an antibody, an antibody-fragment, a fluorescent polypeptide, e.g., GFP, or an alpha-lactalbumin.
  • the amount of screenable product in the droplet is proportional to the amount of a polypeptide encoded by the one or more polynucleotides of interest.
  • the amount of screenable product in the droplet is inversely proportional to the amount of a polypeptide encoded by the one or more polynucleotides of interest.
  • the biological library comprises at least 100, at least 200, at least 500, at least 1 000, at least 2 000, at least 3 000, at least 5 000, at least 10 000, at least 100 000, at least 1 000 000, at least 10 000 000, at least 50 000 000, or at least 100 000 000 different polynucleotides of interest.
  • the biological library comprises at least 100, at least 200, at least 500, at least 1 000, at least 2 000, at least 3 000, at least 5 000, at least 10 000, at least 100 000, at least 200 000, at least 500 000, at least 1 000 000, at least 5 000 000, at least 10 000 000, or at least 100 000 000 different host cells.
  • the amount of screenable product in the droplet is proportional to one or more of: stability of the polypeptide of interest, transcription of the polypeptide of interest, translation of the polypeptide of interest, secretion of the polypeptide of interest, yield of the polypeptide of interest, binding strength of the polypeptide of interest to a target molecule, and activity of the polypeptide of interest.
  • the amount of screenable product in the droplet is inversely proportional to one or more of: stability of the polypeptide of interest, transcription of the polypeptide of interest, translation of the polypeptide of interest, secretion of the polypeptide of interest, yield of the polypeptide of interest, binding strength of the polypeptide of interest to a target molecule, and activity of the polypeptide of interest.
  • the amount of screenable product in the droplet is proportional to one or more of: cell number, viability of the host cell, cell division rate of the host cell, cell growth rate of the host cell, cell size of the host cell, and protein secretion of the host cell.
  • the amount of screenable product in the droplet is inversely proportional to one or more of: cell number, viability of the host cell, cell division rate of the host cell, cell growth rate of the host cell, cell size of the host cell, and protein secretion of the host cell.
  • the one or more droplets comprise a substrate.
  • the substrate comprises or consists of the screenable product.
  • the substrate is a fluorescent substrate.
  • the substrate is a fluorogenic rhodamine.
  • the substrate is a fluorochrome.
  • the substrate is a fluorogenic substrate.
  • the substrate comprises a fluorophore, e.g., fluorescein or fluorescein-labelled starch. In one embodiment, the substrate is Nile red.
  • the substrate is DAPI (4’,6-diamidino-2-phenylindole).
  • before the optional incubation, the droplets have an average occupancy of at most 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, or 0.7 cells per droplet; preferably at most 0.1 cells per droplet.
  • the droplets have an average occupancy of at most 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, or 0.7 polynucleotides of interest per droplet; preferably at most 0.1 polynucleotides of interest per droplet.
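Why a low average occupancy is preferred can be illustrated by assuming that cells (or polynucleotides) are loaded into droplets at random, so that the number per droplet follows a Poisson distribution whose mean equals the average occupancy. The source does not state this model, so treat it as an assumption. The Python sketch below computes the expected fractions of empty, singly occupied and multiply occupied droplets; the function names are hypothetical.

```python
import math
from typing import Dict


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a droplet contains exactly k cells when the mean is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def occupancy_summary(lam: float) -> Dict[str, float]:
    if lam <= 0:
        raise ValueError("average occupancy must be positive")
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    multiple = 1.0 - empty - single
    return {
        "empty": empty,
        "single": single,
        "multiple": multiple,
        # Among droplets with at least one cell, the share holding more than one.
        "multiple_among_occupied": multiple / (1.0 - empty),
    }


for lam in (0.1, 0.3, 0.7):
    s = occupancy_summary(lam)
    print(f"lam={lam}: empty={s['empty']:.3f}, "
          f"multiple_among_occupied={s['multiple_among_occupied']:.3f}")
```

Under this model, at an average occupancy of 0.1 about 90% of droplets are empty and roughly 5% of the occupied droplets contain more than one cell. Raising the occupancy reduces the number of empty droplets to process but increases co-encapsulation, which blurs the link between a droplet's signal and a single library member.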
  • the droplet sorting is facilitated by an acoustic wave generated by one or more acoustic wave generators (401, 402) adjacent to the droplet sorter.
  • the droplet sorting is facilitated by a local pressure change generated by one or more pressure-controlled outlets (401, 402) adjacent to the droplet sorter, e.g., wherein the one or more pressure-controlled outlets are comprised in one or more output channels.
  • the amount of screenable product in step c) is determined using a fluorescence-based signal, absorbance, Raman spectroscopy, mass spectrometry (MS), or MALDI-MS.
  • a relative and/or an absolute amount of the screenable product per droplet is determined by the one or more sensing means (600).
  • one or more output channels comprise at least 10 000 droplets, at least 50 000 droplets, at least 100 000 droplets, at least 500 000 droplets, at least 1 000 000 droplets, at least 2 000 000 droplets, at least 5 000 000 droplets, at least 10 000 000 droplets, or at least 100 000 000 droplets.
  • the droplet sorter comprises at least four, at least five, at least six, at least seven, at least eight, at least nine, or at least ten output channels.
  • the host cell is a yeast host cell, e.g., a Candida, Hansenula, Kluyveromyces, Pichia, Saccharomyces, Schizosaccharomyces, or Yarrowia cell, such as a Kluyveromyces lactis, Saccharomyces carlsbergensis, Saccharomyces cerevisiae, Saccharomyces diastaticus, Saccharomyces douglasii, Saccharomyces kluyveri, Saccharomyces norbensis, Saccharomyces oviformis, or Yarrowia lipolytica cell.
  • the host cell is a filamentous fungal host cell, e.g., an Acremonium, Aspergillus, Aureobasidium, Bjerkandera, Ceriporiopsis, Chrysosporium, Coprinus, Coriolus, Cryptococcus, Filibasidium, Fusarium, Humicola, Magnaporthe, Mucor, Myceliophthora, Neocallimastix, Neurospora, Paecilomyces, Penicillium, Phanerochaete, Phlebia, Piromyces, Pleurotus, Schizophyllum, Talaromyces, Thermoascus, Thielavia, Tolypocladium, Trametes, or Trichoderma cell, in particular, an Aspergillus awamori, Aspergillus foetidus, Aspergillus fumigatus, Aspergillus japonicus, Aspergillus nidulans, Aspergillus …
  • the host cell is a prokaryotic host cell, e.g., a Gram-positive cell selected from the group consisting of Bacillus, Clostridium, Enterococcus, Geobacillus, Lactobacillus, Lactococcus, Oceanobacillus, Staphylococcus, Streptococcus, and Streptomyces cells, or a Gram-negative cell selected from the group consisting of Campylobacter, E. coli, Flavobacterium, Fusobacterium, Helicobacter, Ilyobacter, Neisseria, Pseudomonas, Salmonella, and Ureaplasma cells, such as Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus brevis, Bacillus circulans, Bacillus clausii, Bacillus coagulans, Bacillus firmus, Bacillus lautus, Bacillus lentus, Bacillus licheniformis, Bacillus megaterium, Bacillus pumilus, Bacillus stearothermophilus, Bacillus subtilis, Bacillus thuringiensis, Streptococcus equisimilis, Streptococcus pyogenes, Streptococcus uberis, and Streptococcus equi subsp. …
  • the host cell is Bacillus subtilis.
  • the host cell is Bacillus licheniformis.
  • the host cell is Trichoderma reesei.
  • the host cell is Aspergillus niger.
  • the host cell is Aspergillus oryzae.
  • the host cell is a Bifidobacterium, e.g., Bifidobacterium animalis, or Bifidobacterium animalis subsp. lactis. It is envisioned that the method of the present invention may employ a library of isolated polynucleotides and in vitro expression systems, as well as recombinant cells and/or wild-type cells. However, a preferred embodiment of the invention is a setup comprising host cells, i.e., wherein the library is comprised within a host cell.
  • each individual polynucleotide of the library is in its own separate host cell.
  • each droplet in step (b) of the first aspect comprises at most a single host cell, which optionally can be incubated to grow into a plurality of cells before determining the amount of screenable product in step c).
  • the substrate for the one or more enzyme is fluorogenic and the activity of the enzyme converts the fluorogenic substrate into a fluorescent product (screenable product).
  • the polynucleotide library member inside the droplet needs to be identified.
  • the polynucleotide is identified through DNA sequencing.
  • the polynucleotide may also have been outfitted with an identifying sequence tag to serve as a "bar code" when the library was constructed, thus obviating the need for sequencing. Based on the identification of the bar-code, the DNA sequence of the polynucleotide would then immediately be known and it would, thus, be identified.
  • the one or more polynucleotides of interest are identified in step e) by DNA sequencing of the one or more polynucleotides of interest.
  • the aliquots are usually much smaller in volume than the droplets, but they may in principle range in size up to the same volume as the droplets or even larger. In the examples below, the aliquots are significantly smaller than the droplets.
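Barcode-based identification amounts to a lookup from tag to library member. The Python sketch below assumes the barcode has already been extracted from each read and tolerates up to one substitution (Hamming distance), returning no call when the match is absent or ambiguous. The mismatch tolerance, the barcode table and the function names are hypothetical.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def identify(barcode: str, table: Dict[str, str], max_mismatches: int = 1) -> Optional[str]:
    """Return the library member tagged with `barcode`, or None if no unique match."""
    if barcode in table:  # exact match
        return table[barcode]
    hits = [member for tag, member in table.items()
            if len(tag) == len(barcode) and hamming(tag, barcode) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


barcodes = {"ACGTAC": "variant_1", "TTGCAA": "variant_2"}  # hypothetical barcode table
print(identify("ACGTAA", barcodes))  # variant_1 (one mismatch)
```

Tolerating a mismatch is only safe when the barcodes in the library differ from each other at several positions; a read that lies within the tolerance of two tags is left unassigned by this sketch.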
  • microfluidic devices that enable the application of an electric field to merge or coalesce two or more droplets are disclosed, for example, in WO 2007/061448.
  • Another way to introduce small aliquots of an aqueous liquid into an aqueous droplet in a microfluidic device is known as "pico-injection" and is disclosed, for example, in WO 2010/151776.
  • the aliquots were introduced into the droplets by merging or coalescing the aliquots and the droplets through the application of an electric field.
  • the aliquots are introduced into the droplets by merging or coalescing the aliquots and the droplets through the application of an electric field or by injection.
  • Figure 1 shows one embodiment of the invention, wherein the device comprises a droplet sorter (200) with five output channels (301, 302, 303, 304, 305), and with an incubation zone (500).
  • the device furthermore comprises sensing means (600) and two electrodes (401 , 402).
  • Droplets comprising host cells and screenable product are shown as circles. The amount of screenable product present in each droplet is represented schematically by a black filling, and the amount of black is proportional to the amount of screenable product in the droplet.
  • Flow directed from the incubation zone (500) to the output channels (301, 302, 303, 304, 305) allows droplets to pass the sensing means (600), which determines the amount of screenable product in each droplet (step c)).
  • the sensing means (600) communicates the amount of screenable product to the electrodes (401 , 402). Based on the information about the amount of screenable product in each droplet, the electrodes apply an electric field which allows sorting of the droplet into one of the five output channels (step d)).
  • droplets with a high amount of screenable product are sorted into the top output channel (305) and collected in pool 5, while droplets with no or a low amount of screenable product are sorted into the lowest output channel (301) and collected in pool 1.
  • droplets with intermediate amounts of screenable product are sorted into the remaining three output channels (302, 303, and 304) and collected in pools 2-4.
  • the design with three or more output channels allows parallel sorting into multiple output channels, using multiple predetermined threshold values, wherein no sample volume is lost.
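The assignment of droplets to output channels using several predetermined thresholds can be sketched as a simple binning rule. In this Python sketch the threshold values, the signal units and the convention that a signal exactly at a threshold goes to the higher channel are assumptions; `assign_channel` is a hypothetical name, and the electrode actuation that physically diverts each droplet is not modelled.

```python
from bisect import bisect_right
from collections import Counter
from typing import List, Sequence


def assign_channel(signal: float, thresholds: Sequence[float], channels: Sequence[str]) -> str:
    """Return the output channel for one droplet; channels are ordered from
    lowest to highest signal and there is one more channel than thresholds."""
    if len(channels) != len(thresholds) + 1:
        raise ValueError("need exactly one more channel than thresholds")
    if any(lo >= hi for lo, hi in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # bisect_right counts the thresholds <= signal, which is the channel index.
    return channels[bisect_right(thresholds, signal)]


CHANNELS = ["301", "302", "303", "304", "305"]  # five outputs, as in Figure 1
THRESHOLDS = [10.0, 20.0, 30.0, 40.0]           # hypothetical signal levels

signals: List[float] = [2.0, 12.5, 25.0, 25.0, 38.0, 55.0]
pools = Counter(assign_channel(s, THRESHOLDS, CHANNELS) for s in signals)
print(pools)  # every droplet lands in some pool, so none is discarded
```

Because every signal value maps to exactly one channel, this rule mirrors the point made above that sorting into several channels at once does not discard sample volume.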
  • the methods of the present invention utilize biological libraries of variants (amino acid sequences, DNA sequences and/or host cell variants), but also enable the generation of synthetic variant sequences based on the read-out of the method.
  • synthetic sequence variants are generated by substitution, deletion or addition of one or several amino acids (for polypeptide variants) or one or several nucleotides (for DNA sequence variants).
  • the polypeptide variant is derived from a mature polypeptide by substitution, deletion or addition of one or several amino acids.
  • the number of amino acid substitutions, deletions and/or insertions introduced into the polypeptide is up to 15, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15.
  • amino acid changes may be of a minor nature, that is conservative amino acid substitutions or insertions that do not significantly affect the folding and/or activity of the protein; small deletions, typically of 1-30 amino acids; small amino- or carboxyl-terminal extensions, such as an amino-terminal methionine residue; a small linker peptide of up to 20-25 residues; or a small extension that facilitates purification by changing net charge or another function, such as a poly-histidine tract, an antigenic epitope or a binding module.
  • Essential amino acids in a polypeptide can be identified according to procedures known in the art, such as site-directed mutagenesis or alanine-scanning mutagenesis (Cunningham and Wells, 1989, Science 244: 1081-1085). In the latter technique, single alanine mutations are introduced at every residue in the molecule, and the resultant molecules are tested for enzyme activity to identify amino acid residues that are critical to the activity of the molecule. See also Hilton et al., 1996, J. Biol. Chem. 271: 4699-4708.
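Designing an alanine scan is mechanical: every non-alanine position is substituted with alanine, one at a time. The Python sketch below generates the single mutants and labels them with the common <wild-type residue><position><new residue> notation. The function name and the choice to skip positions that are already alanine (where no substitution to a different amino acid is possible) are assumptions.

```python
from typing import List, Tuple


def alanine_scan(sequence: str, first_position: int = 1) -> List[Tuple[str, str]]:
    """Return (label, mutant sequence) for each single alanine substitution."""
    seq = sequence.upper()
    mutants = []
    for i, residue in enumerate(seq):
        if residue == "A":
            continue  # already alanine: nothing to substitute
        label = f"{residue}{i + first_position}A"
        mutant = seq[:i] + "A" + seq[i + 1:]
        mutants.append((label, mutant))
    return mutants


print(alanine_scan("MKAE"))  # [('M1A', 'AKAE'), ('K2A', 'MAAE'), ('E4A', 'MKAA')]
```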
  • the active site of the enzyme or other biological interaction can also be determined by physical analysis of structure, as determined by such techniques as nuclear magnetic resonance, crystallography, electron diffraction, or photoaffinity labeling, in conjunction with mutation of putative contact site amino acids. See, for example, de Vos et al., 1992, Science 255: 306-312; Smith et al., 1992, J. Mol. Biol. 224: 899-904; Wlodaver et al., 1992, FEBS Lett. 309: 59-64.
  • the identity of essential amino acids can also be inferred from an alignment with a related polypeptide, and/or be inferred from sequence homology and conserved catalytic machinery with a related polypeptide or within a polypeptide or protein family with polypeptides/proteins descending from a common ancestor, typically having similar three-dimensional structures, functions, and significant sequence similarity.
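A first, crude pass at inferring candidate essential residues from an alignment is to list the columns that are identical across all related sequences. The Python sketch below assumes a precomputed multiple sequence alignment given as equal-length strings with "-" for gaps; `conserved_columns` is a hypothetical name. Full conservation is only a heuristic, and residues it flags would still need experimental confirmation, for example by the mutagenesis approaches mentioned above.

```python
from typing import List


def conserved_columns(alignment: List[str], gap: str = "-") -> List[int]:
    """Return 1-based column indices where every sequence carries the same
    residue and no sequence has a gap."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for col in range(length):
        residues = {seq[col] for seq in alignment}
        if len(residues) == 1 and gap not in residues:
            conserved.append(col + 1)
    return conserved


print(conserved_columns(["MKHAE", "MRHAE", "MK-SE"]))  # [1, 5]
```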
  • protein structure prediction tools can be used for protein structure modelling to identify essential amino acids and/or active sites of polypeptides. See, for example, Jumper et al., 2021, “Highly accurate protein structure prediction with AlphaFold”, Nature 596: 583-589.
  • Single or multiple amino acid substitutions, deletions, and/or insertions can be made and tested using known methods of mutagenesis, recombination, and/or shuffling, followed by a relevant screening procedure, such as those disclosed by Reidhaar-Olson and Sauer, 1988, Science 241: 53-57; Bowie and Sauer, 1989, Proc. Natl. Acad. Sci. USA 86: 2152-2156; WO 95/17413; or WO 95/22625.
  • DNA-variant sequences are derived by substitution, deletion or addition of one or several nucleotides.
  • the polynucleotide may also be mutated by introduction of nucleotide substitutions that do not result in a change in the amino acid sequence of the polypeptide, but which correspond to the codon usage of the host organism intended for production of the enzyme, or by introduction of nucleotide substitutions that may give rise to a different amino acid sequence.
  • For a general description of nucleotide substitution, see, e.g., Ford et al., 1991, Protein Expression and Purification 2: 95-107.
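Whether a set of nucleotide substitutions toward a host's codon usage is silent can be checked mechanically: recode each codon and confirm that the translated protein is unchanged. The Python sketch below uses the standard genetic code (NCBI translation table 1). The `preferred` codon choices are hypothetical placeholders, not a real host codon-usage table, and practical codon optimization usually also weighs factors such as GC content.

```python
from typing import Dict, List

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order: TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codons(dna: str) -> List[str]:
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna: str) -> str:
    """Translate a coding sequence; '*' marks a stop codon."""
    return "".join(CODON_TABLE[c] for c in codons(dna))


def recode(dna: str, preferred: Dict[str, str]) -> str:
    """Swap each codon for the preferred synonymous codon, if one is given."""
    out = []
    for codon in codons(dna):
        amino_acid = CODON_TABLE[codon]
        new = preferred.get(amino_acid, codon)
        if CODON_TABLE[new] != amino_acid:
            raise ValueError(f"{new} does not encode {amino_acid}")
        out.append(new)
    return "".join(out)


original = "ATGCTTAAATAA"                # encodes M-L-K-stop
preferred = {"L": "CTG", "K": "AAG"}      # hypothetical host preferences
recoded = recode(original, preferred)
print(recoded, translate(original) == translate(recoded))  # ATGCTGAAGTAA True
```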
  • DNA sequences for the library design may be obtained from microorganisms of any genus.
  • polypeptide sequences comprising e.g., an enzyme, a signal peptide, or a nanobody may be obtained from microorganisms of any genus.
  • the term “obtained from” as used herein in connection with a given source shall mean that the polypeptide encoded by a polynucleotide is produced by the source or by a strain in which the polynucleotide of the invention has been inserted.
  • the polypeptide obtained from a given source is secreted extracellularly.
  • the invention encompasses both the perfect and imperfect states, and other taxonomic equivalents, e.g., anamorphs, regardless of the species name by which they are known. Those skilled in the art will readily recognize the identity of appropriate equivalents.
  • the polypeptides may be identified and obtained from other sources including microorganisms isolated from nature (e.g., soil, composts, water, etc.) or DNA samples obtained directly from natural materials (e.g., soil, composts, water, etc.) using the above-mentioned probes. Techniques for isolating microorganisms and DNA directly from natural habitats are well known in the art. A polynucleotide encoding the polypeptide may then be obtained by similarly screening a genomic DNA or cDNA library of another microorganism or mixed DNA sample.
  • the polynucleotide can be isolated or cloned by utilizing techniques that are known to those of ordinary skill in the art (see, e.g., Davis et al., 2012, Basic Methods in Molecular Biology, Elsevier).
  • Screening a biological library comprising control sequences
  • the present invention also relates to screening a biological library, wherein the biological library comprises or consists of a plurality of polynucleotides of interest, each polynucleotide of interest comprising a first polynucleotide of interest encoding a control sequence, and a second polynucleotide of interest encoding a polypeptide of interest.
  • the biological library comprises a plurality of variants of the control sequence.
  • the second polynucleotide of interest is operably linked to one or more control sequences (first polynucleotide of interest) that direct the expression of the second polynucleotide of interest in a suitable host cell under conditions compatible with the control sequences.
  • the control sequence may be manipulated in a variety of ways to provide for expression of the polypeptide of interest and/or to create a control sequence library. Manipulation of the control sequence prior to its insertion into a vector may be desirable or necessary depending on the expression vector. Techniques for modifying control sequences using recombinant DNA methods are well known in the art.
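One simple way to build a control sequence variant library of the kind described above is to enumerate every single-nucleotide substitution of a parent sequence. The sketch below assumes a DNA alphabet of A, C, G and T; the function name `single_substitution_variants` and the parent hexamer are illustrative choices, not part of the described method.

```python
from typing import List

NUCLEOTIDES = "ACGT"


def single_substitution_variants(parent: str) -> List[str]:
    """Return every sequence that differs from `parent` at exactly one position."""
    parent = parent.upper()
    variants = []
    for i, original in enumerate(parent):
        for base in NUCLEOTIDES:
            if base != original:
                # Substitute position i and keep all other positions unchanged.
                variants.append(parent[:i] + base + parent[i + 1:])
    return variants


if __name__ == "__main__":
    # Example parent: the E. coli sigma-70 -35 consensus hexamer, used only for illustration.
    library = single_substitution_variants("TTGACA")
    print(len(library))  # 18 (3 alternative bases x 6 positions)
    print(library[:3])   # ['ATGACA', 'CTGACA', 'GTGACA']
```

For a parent of length L this yields 3L variants; combinatorial or region-restricted libraries would need a different enumeration.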
  • the control sequence may be a promoter, a polynucleotide that is recognized by a host cell for expression of a polynucleotide encoding a polypeptide of the present invention.
  • the promoter contains transcriptional control sequences that mediate the expression of the polypeptide.
  • the promoter may be any polynucleotide that shows transcriptional activity in the host cell including mutant, truncated, and hybrid promoters, and may be obtained from genes encoding extracellular or intracellular polypeptides either homologous or heterologous to the host cell.
  • Suitable promoters for directing transcription of the polynucleotide of the present invention in a bacterial host cell are described in Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Lab., NY; Davis et al., 2012, supra; and Song et al., 2016, PLOS One 11(7): e0158447.
  • the control sequence may also be a transcription terminator, which is recognized by a host cell to terminate transcription.
  • the terminator is operably linked to the 3’-terminus of the polynucleotide encoding the polypeptide. Any terminator that is functional in the host cell may be used in the present invention.
  • Preferred terminators for bacterial host cells may be obtained from the genes for Bacillus clausii alkaline protease (aprH), Bacillus licheniformis alpha-amylase (amyL), and Escherichia coli ribosomal RNA (rrnB).
  • Preferred terminators for filamentous fungal host cells may be obtained from Aspergillus or Trichoderma species, such as obtained from the genes for Aspergillus niger glucoamylase, Trichoderma reesei beta-glucosidase, Trichoderma reesei cellobiohydrolase I, and Trichoderma reesei endoglucanase I, such as the terminators described in Mukherjee et al., 2013, “Trichoderma: Biology and Applications”, and by Schmoll and Dattenböck, 2016, “Gene Expression Systems in Fungi: Advancements and Applications”, Fungal Biology.
  • Preferred terminators for yeast host cells may be obtained from the genes for Saccharomyces cerevisiae enolase, Saccharomyces cerevisiae cytochrome C (CYC1), and Saccharomyces cerevisiae glyceraldehyde-3-phosphate dehydrogenase.
  • Other useful terminators for yeast host cells are described by Romanos et al., 1992, Yeast 8: 423-488.
  • the control sequence may also be an mRNA stabilizer region downstream of a promoter and upstream of the coding sequence of a gene, which increases expression of the gene.
  • mRNA stabilizer regions may be obtained from a Bacillus thuringiensis cryIIIA gene (WO 94/25612) and a Bacillus subtilis SP82 gene (Hue et al., 1995, J. Bacteriol. 177: 3465-3471).
  • mRNA stabilizer regions for fungal cells are described in Geisberg et al., 2014, Cell 156(4): 812-824, and in Morozov et al., 2006, Eukaryotic Cell 5(11): 1838-1846.
  • the control sequence may also be a leader, a non-translated region of an mRNA that is important for translation by the host cell.
  • the leader is operably linked to the 5’-terminus of the polynucleotide encoding the polypeptide. Any leader that is functional in the host cell may be used. Suitable leaders for bacterial host cells are described by Hambraeus et al., 2000, Microbiology 146(12): 3051-3059, and by Kaberdin and Blasi, 2006, FEMS Microbiol. Rev. 30(6): 967-979.
  • Preferred leaders for filamentous fungal host cells may be obtained from the genes for Aspergillus oryzae TAKA amylase and Aspergillus nidulans triose phosphate isomerase.
  • Suitable leaders for yeast host cells may be obtained from the genes for Saccharomyces cerevisiae enolase (ENO-1), Saccharomyces cerevisiae 3-phosphoglycerate kinase, Saccharomyces cerevisiae alpha-factor, and Saccharomyces cerevisiae alcohol dehydrogenase/glyceraldehyde-3-phosphate dehydrogenase (ADH2/GAP).
  • the control sequence may also be a polyadenylation sequence, a sequence operably linked to the 3’-terminus of the polynucleotide which, when transcribed, is recognized by the host cell as a signal to add polyadenosine residues to transcribed mRNA. Any polyadenylation sequence that is functional in the host cell may be used.
  • Preferred polyadenylation sequences for filamentous fungal host cells are obtained from the genes for Aspergillus nidulans anthranilate synthase, Aspergillus niger glucoamylase, Aspergillus niger alpha-glucosidase, Aspergillus oryzae TAKA amylase, and Fusarium oxysporum trypsin-like protease.
  • the control sequence may also be a signal peptide coding region that encodes a signal peptide linked to the N-terminus of a polypeptide and directs the polypeptide into the cell’s secretory pathway.
  • the 5’-end of the coding sequence of the polynucleotide may inherently contain a signal peptide coding sequence naturally linked in translation reading frame with the segment of the coding sequence that encodes the polypeptide.
  • the 5’-end of the coding sequence may contain a signal peptide coding sequence that is heterologous to the coding sequence.
  • a heterologous signal peptide coding sequence may be required where the coding sequence does not naturally contain a signal peptide coding sequence.
  • a heterologous signal peptide coding sequence may simply replace the natural signal peptide coding sequence to enhance secretion of the polypeptide. Any signal peptide coding sequence that directs the expressed polypeptide into the secretory pathway of a host cell may be used.
  • Effective signal peptide coding sequences for bacterial host cells are the signal peptide coding sequences obtained from the genes for Bacillus NCIB 11837 maltogenic amylase, Bacillus licheniformis subtilisin, Bacillus licheniformis beta-lactamase, Bacillus stearothermophilus alpha-amylase, Bacillus stearothermophilus neutral proteases (nprT, nprS, nprM), and Bacillus subtilis prsA. Further signal peptides are described by Freudl, 2018, Microbial Cell Factories 17: 52.
  • Effective signal peptide coding sequences for filamentous fungal host cells are the signal peptide coding sequences obtained from the genes for Aspergillus niger neutral amylase, Aspergillus niger glucoamylase, Aspergillus oryzae TAKA amylase, Humicola insolens cellulase, Humicola insolens endoglucanase V, Humicola lanuginosa lipase, and Rhizomucor miehei aspartic proteinase, such as the signal peptide described by Xu et al., 2018, Biotechnology Letters 40: 949-955.
  • Useful signal peptides for yeast host cells are obtained from the genes for Saccharomyces cerevisiae alpha-factor and Saccharomyces cerevisiae invertase. Other useful signal peptide coding sequences are described by Romanos et al., 1992, supra.
  • the control sequence may also be a propeptide coding sequence that encodes a propeptide positioned at the N-terminus of a polypeptide.
  • the resultant polypeptide is known as a proenzyme or propolypeptide (or a zymogen in some cases).
  • a propolypeptide is generally inactive and can be converted to an active polypeptide by catalytic or autocatalytic cleavage of the propeptide from the propolypeptide.
  • the propeptide coding sequence may be obtained from the genes for Bacillus subtilis alkaline protease (aprE), Bacillus subtilis neutral protease (nprT), Myceliophthora thermophila laccase (WO 95/33836), Rhizomucor miehei aspartic proteinase, and Saccharomyces cerevisiae alpha-factor.
  • the propeptide sequence is positioned next to the N-terminus of a polypeptide and the signal peptide sequence is positioned next to the N-terminus of the propeptide sequence.
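The N-to-C arrangement of signal peptide, propeptide and mature polypeptide, and the order in which they are removed, can be made explicit with a small data class. The class name, method names and example residues below are hypothetical and serve only to illustrate the ordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreProPolypeptide:
    """Hypothetical model of a precursor laid out from N-terminus to C-terminus."""

    signal_peptide: str
    propeptide: str
    mature: str

    def primary_product(self) -> str:
        # The signal peptide is N-terminal to the propeptide, which is N-terminal to the mature polypeptide.
        return self.signal_peptide + self.propeptide + self.mature

    def propolypeptide(self) -> str:
        # Removing the signal peptide in the secretory pathway leaves the (generally inactive) propolypeptide.
        return self.propeptide + self.mature

    def active_polypeptide(self) -> str:
        # Catalytic or autocatalytic removal of the propeptide yields the mature, active polypeptide.
        return self.mature


if __name__ == "__main__":
    precursor = PreProPolypeptide(signal_peptide="MKKLL", propeptide="AEAK", mature="GSWT")
    print(precursor.primary_product())     # MKKLLAEAKGSWT
    print(precursor.propolypeptide())      # AEAKGSWT
    print(precursor.active_polypeptide())  # GSWT
```

Partial processing, as mentioned for the final polypeptide, would correspond to intermediate strings between these three states.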
  • the polypeptide may comprise only a part of the signal peptide sequence and/or only a part of the propeptide sequence.
  • the final or isolated polypeptide may comprise a mixture of mature polypeptides and polypeptides which comprise, either partly or in full length, a propeptide sequence and/or a signal peptide sequence.
  • It may also be desirable to add regulatory sequences that regulate expression of the polypeptide relative to the growth of the host cell.
  • Examples of regulatory sequences are those that cause expression of the gene to be turned on or off in response to a chemical or physical stimulus, including the presence of a regulatory compound.
  • Regulatory sequences in prokaryotic systems include the lac, tac, and trp operator systems.
  • In yeast, the ADH2 system or GAL1 system may be used.
  • In filamentous fungi, the Aspergillus niger glucoamylase promoter, Aspergillus oryzae TAKA alpha-amylase promoter, Aspergillus oryzae glucoamylase promoter, Trichoderma reesei cellobiohydrolase I promoter, and Trichoderma reesei cellobiohydrolase II promoter may be used.
  • Other examples of regulatory sequences are those that allow for gene amplification. In fungal systems, these regulatory sequences include the dihydrofolate reductase gene that is amplified in the presence of methotrexate, and the metallothionein genes that are amplified with heavy metals.
  • the control sequence may also be a transcription factor, a polynucleotide encoding a polynucleotide-specific DNA-binding polypeptide that controls the rate of the transcription of genetic information from DNA to mRNA by binding to a specific polynucleotide sequence.
  • the transcription factor may function alone and/or together with one or more other polypeptides or transcription factors in a complex by promoting or blocking the recruitment of RNA polymerase.
  • Transcription factors are characterized by comprising at least one DNA-binding domain which often attaches to a specific DNA sequence adjacent to the genetic elements which are regulated by the transcription factor.
  • the transcription factor may regulate the expression of a protein of interest either directly, i.e., by activating the transcription of the gene encoding the protein of interest by binding to its promoter, or indirectly, i.e., by activating the transcription of a further transcription factor which regulates the transcription of the gene encoding the protein of interest, such as by binding to the promoter of the further transcription factor.
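The distinction between direct and indirect regulation can be expressed as a path search in a regulatory network, where an edge from a transcription factor to a gene means that the factor binds the gene's promoter. A minimal sketch, assuming a hypothetical adjacency-list network: a shortest path of two nodes corresponds to direct regulation, and a longer path to regulation via one or more further transcription factors.

```python
from collections import deque
from typing import Dict, List, Optional


def regulatory_path(network: Dict[str, List[str]], source: str, target: str) -> Optional[List[str]]:
    """Shortest chain of regulators from `source` to `target` (breadth-first search)."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for regulated in network.get(path[-1], []):
            if regulated not in visited:
                visited.add(regulated)
                queue.append(path + [regulated])
    return None


def regulation_mode(network: Dict[str, List[str]], tf: str, gene: str) -> str:
    """Classify regulation of `gene` by `tf` as 'direct', 'indirect' or 'none'."""
    if tf == gene:
        raise ValueError("transcription factor and target gene must differ")
    path = regulatory_path(network, tf, gene)
    if path is None:
        return "none"
    # [tf, gene]: tf binds the gene's own promoter; longer paths pass through further factors.
    return "direct" if len(path) == 2 else "indirect"


if __name__ == "__main__":
    # Hypothetical network: edges point from a transcription factor to the genes it binds.
    network = {"tfA": ["geneX", "tfB"], "tfB": ["geneY"]}
    print(regulation_mode(network, "tfA", "geneX"))  # direct
    print(regulation_mode(network, "tfA", "geneY"))  # indirect
    print(regulation_mode(network, "tfB", "geneX"))  # none
```

Because the search returns the shortest path, a factor that regulates a gene both directly and through an intermediate is reported as direct.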
  • Suitable transcription factors for fungal host cells are described in WO 2017/144177.
  • Suitable transcription factors for prokaryotic host cells are described in Seshasayee et al., 2011, Subcellular Biochemistry 52: 7-23, as well as in Balleza et al., 2009, FEMS Microbiol. Rev. 33(1): 133-151.
  • the method of the present invention also utilizes recombinant expression vectors comprising a polynucleotide of interest.
  • the various nucleotide and control sequences may be joined together to produce a recombinant expression vector that may include one or more convenient restriction sites to allow for insertion or substitution of the polynucleotide of interest at such sites.
  • the polynucleotide may be expressed by inserting the polynucleotide or a nucleic acid construct comprising the polynucleotide into an appropriate vector for expression.
  • the coding sequence is located in the vector so that the coding sequence is operably linked with the appropriate control sequences for expression.
  • the recombinant expression vector may be any vector (e.g., a plasmid or virus) that can be conveniently subjected to recombinant DNA procedures and can bring about expression of the polynucleotide.
  • the choice of the vector will typically depend on the compatibility of the vector with the host cell into which the vector is to be introduced.
  • the vector may be a linear or closed circular plasmid.
  • the vector may be an autonomously replicating vector, i.e., a vector that exists as an extrachromosomal entity, the replication of which is independent of chromosomal replication, e.g., a plasmid, an extrachromosomal element, a minichromosome, or an artificial chromosome.
  • the vector may contain any means for assuring self-replication.
  • the vector may be one that, when introduced into the host cell, is integrated into the genome and replicated together with the chromosome(s) into which it has been integrated.
  • a single vector or plasmid or two or more vectors or plasmids that together contain the total DNA to be introduced into the genome of the host cell, or a transposon may be used.
  • the vector preferably contains one or more selectable markers that permit easy selection of transformed, transfected, transduced, or the like cells.
  • a selectable marker is a gene the product of which provides for biocide or viral resistance, resistance to heavy metals, prototrophy to auxotrophs, and the like.
  • the vector preferably contains at least one element that permits integration of the vector into the host cell's genome or autonomous replication of the vector in the cell independent of the genome.
  • the vector may rely on the polynucleotide’s sequence encoding the polypeptide or any other element of the vector for integration into the genome by homologous recombination, such as homology-directed repair (HDR), or non-homologous recombination, such as non-homologous end-joining (NHEJ).
  • the vector may further comprise an origin of replication enabling the vector to replicate autonomously in the host cell in question.
  • the origin of replication may be any plasmid replicator mediating autonomous replication that functions in a cell.
  • the term “origin of replication” or “plasmid replicator” means a polynucleotide that enables a plasmid or vector to replicate in vivo.
  • More than one copy of a polynucleotide of interest may be inserted into a host cell to increase production of a polypeptide. For example, 2 or 3 or 4 or 5 or more copies are inserted into a host cell.
  • An increase in the copy number of the polynucleotide can be obtained by integrating at least one additional copy of the sequence into the host cell genome or by including an amplifiable selectable marker gene with the polynucleotide where cells containing amplified copies of the selectable marker gene, and thereby additional copies of the polynucleotide, can be selected for by cultivating the cells in the presence of the appropriate selectable agent.
  • the invention relates to a host cell comprising in its genome a polynucleotide sequence of interest generated in additional step h), and/or a polynucleotide sequence identified in step e).
  • the present invention also relates to host cells which are not recombinant, i.e., wild-type host cells.
  • host cells include but are not limited to probiotics, e.g. wherein the host cell is a Bifidobacterium, e.g., Bifidobacterium animalis, or Bifidobacterium animalis subsp. lactis.
  • the present invention also relates to recombinant host cells, comprising a polynucleotide of interest, and/or comprising a polynucleotide operably linked to one or more control sequences that direct the production of a polypeptide of interest.
  • a construct or vector comprising a polynucleotide is introduced into a host cell so that the construct or vector is maintained as a chromosomal integrant or as a self-replicating extra- chromosomal vector as described earlier.
  • the choice of a host cell will to a large extent depend upon the gene encoding the polypeptide and its source.
  • the polypeptide can be native or heterologous to the recombinant host cell.
  • at least one of the one or more control sequences can be heterologous to the polynucleotide encoding the polypeptide.
  • the recombinant host cell may comprise a single copy, or at least two copies, e.g., three, four, five, or more copies of the polynucleotide of the present invention.
  • the host cell may be any mammalian cell useful in the recombinant production of a polypeptide of interest, e.g., a Chinese hamster ovary cell, a BHK cell, a mouse cell, or a HEK cell.
  • the host cell may be any microbial cell useful in the recombinant production of a polypeptide of interest, e.g., a prokaryotic cell or a fungal cell.
  • the prokaryotic host cell may be any Gram-positive or Gram-negative bacterium.
  • Gram-positive bacteria include, but are not limited to, Bacillus, Bifidobacterium (e.g., BB-12®), Clostridium, Enterococcus, Geobacillus, Lactobacillus, Lactococcus, Oceanobacillus, Staphylococcus, Streptococcus, and Streptomyces.
  • Gram-negative bacteria include, but are not limited to, Campylobacter, E. coli, Flavobacterium, Fusobacterium, Helicobacter, Ilyobacter, Neisseria, Pseudomonas, Salmonella, and Ureaplasma.
  • the bacterial host cell may be any Bacillus cell including, but not limited to, Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus brevis, Bacillus circulans, Bacillus clausii, Bacillus coagulans, Bacillus firmus, Bacillus lautus, Bacillus lentus, Bacillus licheniformis, Bacillus megaterium, Bacillus pumilus, Bacillus stearothermophilus, Bacillus subtilis, and Bacillus thuringiensis cells.
  • the Bacillus cell is a Bacillus amyloliquefaciens, Bacillus licheniformis, or Bacillus subtilis cell.
  • Bacillus classes/genera/species shall be defined as described in Patel and Gupta, 2020, Int. J. Syst. Evol. Microbiol. 70: 406-438.
  • the bacterial host cell may also be any Streptococcus cell including, but not limited to, Streptococcus equisimilis, Streptococcus pyogenes, Streptococcus uberis, and Streptococcus equi subsp. zooepidemicus cells.
  • the bacterial host cell may also be any Streptomyces cell including, but not limited to, Streptomyces achromogenes, Streptomyces avermitilis, Streptomyces coelicolor, Streptomyces griseus, and Streptomyces lividans cells.
  • Methods for introducing DNA into prokaryotic host cells are well known in the art, and any suitable method can be used, including, but not limited to, protoplast transformation, competent cell transformation, electroporation, conjugation, and transduction, with the DNA introduced as a linearized or circular polynucleotide. Persons skilled in the art will readily be able to identify a suitable method for introducing DNA into a given prokaryotic cell depending, e.g., on the genus. Methods for introducing DNA into prokaryotic host cells are described, for example, in Heinze et al., 2018, BMC Microbiology 18: 56; Burke et al., 2001, Proc. Natl. Acad. Sci. USA 98: 6289-6294; Choi et al., 2006, J. Microbiol. Methods 64: 391-397; and Donald et al., 2013, J. Bacteriol. 195(11): 2612-2620.
  • the host cell may be a fungal cell.
  • “Fungi” as used herein includes the phyla Ascomycota, Basidiomycota, Chytridiomycota, and Zygomycota as well as the Oomycota and all mitosporic fungi (as defined by Hawksworth et al., In, Ainsworth and Bisby’s Dictionary of The Fungi, 8th edition, 1995, CAB International, University Press, Cambridge, UK).
  • Fungal cells may be transformed by protoplast-mediated transformation, Agrobacterium-mediated transformation, electroporation, biolistic methods, or shock-wave-mediated transformation, as reviewed by Li et al., 2017, Microbial Cell Factories 16: 168, and by the procedures described in EP 238023; Yelton et al., 1984, Proc. Natl. Acad. Sci. USA 81: 1470-1474; Christensen et al., 1988, Bio/Technology 6: 1419-1422; and Lubertozzi and Keasling, 2009, Biotechn. Advances 27: 53-75.
  • any method known in the art for introducing DNA into a fungal host cell can be used, and the DNA can be introduced as linearized or as circular polynucleotide.
  • the fungal host cell may be a yeast cell.
  • yeast as used herein includes ascosporogenous yeast (Endomycetales), basidiosporogenous yeast, and yeast belonging to the Fungi Imperfecti (Blastomycetes). For purposes of this invention, yeast shall be defined as described in Biology and Activities of Yeast (Skinner, Passmore, and Davenport, editors, Soc. App. Bacteriol. Symposium Series No. 9, 1980).
  • the yeast host cell may be a Candida, Hansenula, Kluyveromyces, Pichia, Saccharomyces, Schizosaccharomyces, or Yarrowia cell, such as a Kluyveromyces lactis, Saccharomyces carlsbergensis, Saccharomyces cerevisiae, Saccharomyces diastaticus, Saccharomyces douglasii, Saccharomyces kluyveri, Saccharomyces norbensis, Saccharomyces oviformis, or Yarrowia lipolytica cell.
  • the yeast host cell is a Pichia or Komagataella cell, e.g., a Pichia pastoris cell (Komagataella phaffii).
  • the fungal host cell may be a filamentous fungal cell.
  • “Filamentous fungi” include all filamentous forms of the subdivision Eumycota and Oomycota (as defined by Hawksworth et al., 1995, supra).
  • the filamentous fungi are generally characterized by a mycelial wall composed of chitin, cellulose, glucan, chitosan, mannan, and other complex polysaccharides. Vegetative growth is by hyphal elongation and carbon catabolism is obligately aerobic. In contrast, vegetative growth by yeasts such as Saccharomyces cerevisiae is by budding of a unicellular thallus and carbon catabolism may be fermentative.
  • the filamentous fungal host cell may be an Acremonium, Aspergillus, Aureobasidium, Bjerkandera, Ceriporiopsis, Chrysosporium, Coprinus, Coriolus, Cryptococcus, Filibasidium, Fusarium, Humicola, Magnaporthe, Mucor, Myceliophthora, Neocallimastix, Neurospora, Paecilomyces, Penicillium, Phanerochaete, Phlebia, Piromyces, Pleurotus, Schizophyllum, Talaromyces, Thermoascus, Thielavia, Tolypocladium, Trametes, or Trichoderma cell.
  • the filamentous fungal host cell is an Aspergillus, Trichoderma or Fusarium cell. In a further preferred embodiment, the filamentous fungal host cell is an Aspergillus niger, Aspergillus oryzae, Trichoderma reesei, or Fusarium venenatum cell.
  • the filamentous fungal host cell may be an Aspergillus awamori, Aspergillus foetidus, Aspergillus fumigatus, Aspergillus japonicus, Aspergillus nidulans, Aspergillus niger, Aspergillus oryzae, Bjerkandera adusta, Ceriporiopsis aneirina, Ceriporiopsis caregiea, Ceriporiopsis gilvescens, Ceriporiopsis pannocinta, Ceriporiopsis rivulosa, Ceriporiopsis subrufa, Ceriporiopsis subvermispora, Chrysosporium inops, Chrysosporium keratinophilum, Chrysosporium lucknowense, Chrysosporium merdarium, Chrysosporium pannicola, Chrysosporium queenslandicum, Chrysosporium tropicum, Chrysosporium zona
  • the host cell is isolated.
  • the host cell is purified.
  • the present invention also relates to methods of producing a polypeptide of interest, comprising (a) cultivating a cell according to the second aspect, under conditions conducive for production of the polypeptide; and optionally, (b) recovering the polypeptide.
  • the cell is a Bacillus cell.
  • the cell is a Bacillus licheniformis cell.
  • the cell is an Aspergillus cell.
  • the cell is an Aspergillus niger cell.
  • the cell is an Aspergillus oryzae cell.
  • the cell is a Trichoderma reesei cell.
  • the present invention also relates to methods of producing a host cell broth, comprising (a) cultivating a host cell according to the second aspect, under conditions conducive for production of the host cell; and optionally, (b) recovering the host cell.
  • the recovered cell is a Bifidobacterium, e.g., Bifidobacterium animalis, or Bifidobacterium animalis subsp. lactis.
  • the host cell is cultivated in a nutrient medium suitable for production of the polypeptide using methods known in the art.
  • the cell may be cultivated by shake flask cultivation, or small-scale or large-scale fermentation (including continuous, batch, fed-batch, or solid-state, and/or microcarrier-based fermentations) in laboratory or industrial fermentors in a suitable medium and under conditions allowing the polypeptide to be expressed and/or isolated.
  • suitable media are available from commercial suppliers or may be prepared according to published compositions (e.g., in catalogues of the American Type Culture Collection). If the polypeptide is secreted into the nutrient medium, the polypeptide can be recovered directly from the medium. If the polypeptide is not secreted, it can be recovered from cell lysates.
  • the polypeptide may be detected using methods known in the art that are specific for the polypeptide, including, but not limited to, the use of specific antibodies, formation of an enzyme product, disappearance of an enzyme substrate, or an assay determining the relative or specific activity of the polypeptide.
  • the polypeptide may be recovered from the medium using methods known in the art, including, but not limited to, collection, centrifugation, filtration, extraction, spray-drying, evaporation, or precipitation. In one aspect, a whole fermentation broth comprising the polypeptide is recovered. In another aspect, a cell-free fermentation broth comprising the polypeptide is recovered.
  • the polypeptide may be purified by a variety of procedures known in the art to obtain substantially pure polypeptides and/or polypeptide fragments (see, e.g., Wingfield, 2015, Current Protocols in Protein Science 80(1): 6.1.1-6.1.35; Labrou, 2014, Protein Downstream Processing, 1129: 3-10).
  • the polypeptide is not recovered.
  • the invention relates to the methods according to the first aspect, additionally comprising step g) training a computational model, e.g., machine learning algorithm, with sequence data obtained from step e) and/or score data obtained from step f).
  • the computational model of step g) is selected from the list of a linear regression, a decision tree, a random forest model, a support vector machine (SVM), a neural network, K-means clustering, naive Bayes, a Gaussian mixture model (GMM), or a generative model.
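As an illustration of step g), the simplest model in the list above, a linear regression, can be trained on one-hot encoded sequences from step e) and scores from step f). The sketch below uses ridge regression solved through the normal equations in pure Python; the encoding, the regularization strength `lam` and the toy data are assumptions made for the example, and a real screen would typically rely on an established machine learning library.

```python
from typing import List, Sequence

ALPHABET = "ACGT"


def one_hot(seq: str) -> List[float]:
    """Position-by-base indicators (4 per position) plus a trailing 1.0 intercept term."""
    features = [1.0 if base == letter else 0.0 for base in seq.upper() for letter in ALPHABET]
    return features + [1.0]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ridge(seqs: Sequence[str], scores: Sequence[float], lam: float = 1e-3) -> List[float]:
    """Weights from the normal equations (X^T X + lam * I) w = X^T y (intercept regularized too)."""
    x = [one_hot(s) for s in seqs]
    d = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) + (lam if i == j else 0.0) for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * y for row, y in zip(x, scores)) for i in range(d)]
    return solve(xtx, xty)


def predict(weights: List[float], seq: str) -> float:
    return sum(w * f for w, f in zip(weights, one_hot(seq)))


if __name__ == "__main__":
    # Hypothetical screening data: the score equals the number of G residues.
    seqs = ["AAA", "GAA", "AGA", "AAG", "GGA", "GGG"]
    scores = [0.0, 1.0, 1.0, 1.0, 2.0, 3.0]
    weights = fit_ridge(seqs, scores)
    print(round(predict(weights, "AGG"), 2))  # unseen sequence; expected to be close to 2.0
```

A small positive `lam` keeps the normal equations solvable even though one-hot columns are collinear with the intercept.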
  • the computational model is performed in an electronic device, for providing a candidate biological sequence, the method comprising:
  • the model is a generative model.
  • the generative model is non-unidirectional.
  • the input biological sequence comprises one or more polynucleotides of interest identified in step e).
  • the input biological sequence is one or more of: an amino acid sequence of a polypeptide of interest, a nucleic acid sequence encoding a polypeptide of interest, a control sequence, e.g., an expression control sequence, and a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence.
  • the candidate biological sequence is one or more of: a control sequence, e.g., an expression control sequence, a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence, an amino acid sequence of a polypeptide of interest, and a nucleic acid sequence encoding a polypeptide of interest.
  • the input biological sequence is an amino acid sequence of a polypeptide of interest and/or a nucleic acid sequence encoding a polypeptide of interest, and the candidate biological sequence is a nucleic acid sequence that increases compatibility with a host cell.
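Compatibility of a coding sequence with a host cell is often quantified by the codon adaptation index (CAI), the geometric mean of each codon's relative adaptiveness (its usage count divided by that of the most used synonymous codon). A minimal sketch, assuming a hypothetical codon-usage table that covers only three amino acids and contains only positive counts; codons absent from the table are skipped.

```python
import math
from typing import Dict

# Hypothetical host codon counts grouped by amino acid (illustrative numbers, not real usage data).
CODON_USAGE: Dict[str, Dict[str, int]] = {
    "K": {"AAA": 70, "AAG": 30},
    "E": {"GAA": 60, "GAG": 40},
    "G": {"GGC": 40, "GGT": 35, "GGA": 15, "GGG": 10},
}


def relative_adaptiveness(usage: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """w(codon) = count(codon) / count(most used synonymous codon)."""
    weights = {}
    for synonymous in usage.values():
        best = max(synonymous.values())
        for codon, count in synonymous.items():
            weights[codon] = count / best
    return weights


def codon_adaptation_index(cds: str, usage: Dict[str, Dict[str, int]] = CODON_USAGE) -> float:
    """Geometric mean of w over the codons of `cds` that appear in the usage table."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    weights = relative_adaptiveness(usage)
    logs = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3].upper()
        if codon in weights:
            logs.append(math.log(weights[codon]))
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    print(codon_adaptation_index("AAAGAAGGC"))  # 1.0: every codon is the most used one
    print(round(codon_adaptation_index("AAGGAGGGG"), 2))  # about 0.41
```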
  • the input biological sequence is an amino acid sequence of a polypeptide of interest and/or a nucleic acid sequence encoding a polypeptide of interest, and the candidate biological sequence is a control sequence, e.g., an expression control sequence, and/or a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence.
  • the input biological sequence is a control sequence, e.g., an expression control sequence, and/or a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence, and wherein the candidate biological sequence is a nucleic acid sequence encoding a polypeptide of interest.
  • the generative model is one or more of: a generative adversarial network model, a Wasserstein generative adversarial network model, a diffusion model, and a variational autoencoder.
  • applying the generative non-unidirectional model to the input data comprises partitioning the generative non-unidirectional model into a plurality of generators, wherein each generator of the plurality of generators is configured to determine, based on the input data and a predetermined criterion, one or more candidate biological sequences for a subset of nucleotides and/or a subset of amino acids.
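The partitioning described above can be sketched with several generators, each responsible for a contiguous slice of positions, whose outputs are concatenated into a full-length candidate that is kept only if it meets a predetermined criterion. In this sketch each "generator" simply mutates its slice of the input sequence at random, and the criterion is a GC-content window; both are hypothetical stand-ins for a trained generative model and for whatever criterion is actually used.

```python
import random
from typing import Callable, List

BASES = "ACGT"


class SegmentGenerator:
    """Hypothetical generator that proposes nucleotides for positions [start, end) only."""

    def __init__(self, start: int, end: int, mutation_rate: float) -> None:
        self.start, self.end, self.mutation_rate = start, end, mutation_rate

    def propose(self, input_seq: str, rng: random.Random) -> str:
        out = []
        for base in input_seq[self.start:self.end]:
            if rng.random() < self.mutation_rate:
                # Replace with a different base, chosen uniformly.
                out.append(rng.choice([b for b in BASES if b != base]))
            else:
                out.append(base)
        return "".join(out)


def partition(length: int, n_generators: int, mutation_rate: float) -> List[SegmentGenerator]:
    """Split positions 0..length-1 into contiguous, non-overlapping slices, one per generator."""
    bounds = [round(i * length / n_generators) for i in range(n_generators + 1)]
    return [SegmentGenerator(bounds[i], bounds[i + 1], mutation_rate) for i in range(n_generators)]


def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


def generate(input_seq: str, generators: List[SegmentGenerator], criterion: Callable[[str], bool],
             n_candidates: int, rng: random.Random, max_tries: int = 10_000) -> List[str]:
    """Concatenate segment proposals in order and keep candidates that satisfy `criterion`."""
    accepted: List[str] = []
    for _ in range(max_tries):
        if len(accepted) == n_candidates:
            break
        candidate = "".join(g.propose(input_seq, rng) for g in generators)
        if criterion(candidate):
            accepted.append(candidate)
    return accepted


if __name__ == "__main__":
    parent = "ATGGCTAGCAAAGGCGAAGAACTG"  # hypothetical input sequence
    gens = partition(len(parent), n_generators=3, mutation_rate=0.1)
    picks = generate(parent, gens, lambda s: 0.4 <= gc_fraction(s) <= 0.6, 5, random.Random(0))
    print(picks)
```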
  • determining the candidate biological sequence by applying the model to the input data comprises: predicting, using the generator, a compatibility of the candidate biological sequence with the host cell;
  • the predetermined criterion is based on one or more of:
  • the method comprises training the model based on a training set of biological sequences, wherein the training set of biological sequences includes training data indicative of one or more biological sequences related to the host cell.
  • the training set of biological sequences is heterologous to the genus of the host cell, preferably heterologous to one or more species of the host cell.
  • the training data comprises training input data indicative of one or more of: an amino acid sequence of a polypeptide of interest, a nucleic acid sequence encoding a polypeptide of interest, a control sequence, e.g., an expression control sequence, and a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence.
  • the training data comprises training output data indicative of one or more of: a control sequence, e.g., an expression control sequence, a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence, an amino acid sequence of a polypeptide of interest, and a nucleic acid sequence encoding a polypeptide of interest.
  • training the model comprises predicting, using a discriminator that takes as input the training set of biological sequences and a training candidate biological sequence, a score indicative of the training candidate biological sequence being a reference biological sequence.
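A trained discriminator is a learned model, but its input/output contract as described above (reference set plus candidate in, score out) can be illustrated with a fixed stand-in: the cosine similarity between the candidate's k-mer profile and the pooled k-mer profile of the training set. This is not a GAN discriminator; the function names, `k=3` and the toy sequences are assumptions for illustration only.

```python
import math
from collections import Counter
from typing import Iterable


def kmer_profile(seq: str, k: int = 3) -> Counter:
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(count * b[kmer] for kmer, count in a.items())  # Counter returns 0 for missing k-mers
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def discriminator_score(training_set: Iterable[str], candidate: str, k: int = 3) -> float:
    """Score in [0, 1]; higher means the candidate's k-mer content resembles the reference set."""
    reference = Counter()
    for seq in training_set:
        reference.update(kmer_profile(seq, k))
    return cosine_similarity(reference, kmer_profile(candidate, k))


if __name__ == "__main__":
    references = ["ATGAAAGAAGGC", "ATGAAAGAGGGC"]  # hypothetical training set
    print(discriminator_score(references, "ATGAAAGAAGGG"))  # similar k-mer content, high score
    print(discriminator_score(references, "CCCCCCCCCCCC"))  # 0.0: no shared 3-mers
```

In an adversarial setup this fixed scorer would be replaced by a model whose parameters are updated so that it separates reference sequences from generated ones.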
  • the method comprises obtaining, from a test environment data repository, experimental data associated with the candidate biological sequence and the host cell, wherein the experimental data indicates a yield performance of the candidate biological sequence associated with the host cell.
  • the method comprises validating the candidate biological sequence based on the experimental data.
  • the method comprises selecting one or more generators based on the experimental data.
  • the method comprises adapting the model based on the experimental data.
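The validation and generator-selection steps can be sketched as simple operations over experimental records. The record layout `(generator_id, candidate_sequence, measured_yield)`, the yield threshold and the ranking by mean yield are hypothetical choices; other statistics or adaptation rules could equally be used.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical record layout: (generator_id, candidate_sequence, measured_yield)
Record = Tuple[str, str, float]


def validated_candidates(records: List[Record], min_yield: float) -> List[str]:
    """Keep candidates whose measured yield reaches the threshold."""
    return [seq for _, seq, y in records if y >= min_yield]


def select_generators(records: List[Record], top_n: int = 1) -> List[str]:
    """Rank generators by the mean yield of their candidates and keep the best `top_n`."""
    yields: Dict[str, List[float]] = defaultdict(list)
    for generator_id, _, y in records:
        yields[generator_id].append(y)
    return sorted(yields, key=lambda g: mean(yields[g]), reverse=True)[:top_n]


if __name__ == "__main__":
    data = [("g1", "AAA", 1.2), ("g1", "AAC", 0.8), ("g2", "GGG", 2.0), ("g2", "GGC", 1.6), ("g3", "TTT", 0.3)]
    print(validated_candidates(data, min_yield=1.0))  # ['AAA', 'GGG', 'GGC']
    print(select_generators(data, top_n=2))           # ['g2', 'g1']
```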
  • obtaining input data indicative of an input biological sequence comprises obtaining the input data for the input biological sequence from a database and/or a memory of the electronic device.
  • the invention also relates to an electronic device comprising a memory circuitry, a processor circuitry, and an interface, wherein the electronic device is configured to perform any of the methods according to the invention.
  • the invention also relates to a computer-readable storage medium storing one or more programs, the one or more programs comprising instructions which, when executed by an electronic device, cause the electronic device to perform any of the methods of the invention.
  • the method comprises an additional step h) generating one or more synthetic polynucleotides of interest based on the output of the computational model.
  • the one or more synthetic polynucleotides of interest generated in step h) comprise or consist of a candidate biological sequence.
  • the one or more synthetic polynucleotides of interest generated in step h) are codon-optimized.
  • the one or more synthetic polynucleotides of interest generated in step h) encode a polypeptide with increased substrate binding, increased receptor binding, increased substrate specificity, increased specific activity, and/or increased stability.
  • the one or more synthetic polynucleotides of interest generated in step h) result in increased expression of a polypeptide of interest.
  • the one or more synthetic polynucleotides of interest generated in step h) comprise a control sequence.
  • the biological sequence pairs can be found in a database, such as a public and/or a private database (e.g., the National Center for Biotechnology Information (NCBI) database and/or a nucleotide archive such as EMBL). It may be envisaged to transfer the extracted learning to the experimental settings.
  • the present disclosure allows learning compatibility rules from native sequence pairs provided by a database and providing the learned compatibility rules. It may be envisaged that the compatibility rules are further adapted to experimental settings.
  • the present disclosure allows interaction between machine learning approaches and experimental approaches.
  • Machine learning-based analysis of biological sequences and experimental screening approaches contribute two different layers of learning.
  • the first layer of learning allows extraction of complex biological rules that need to be obeyed, while the second layer of learning accumulates data specific to the experimental settings.
  • the extracted learning is used (via a deep learning approach, such as a generative model) to provide a relevant subset of candidate biological sequences that can then be feasibly screened using experimental methods.
  • the disclosed technique may lead to unlocking the potential of experimental screening approaches by markedly reducing the complexity of the process of finding satisfactory ‘partner’ biological sequences.
  • the actual quality of a candidate biological sequence is validated by experiments.
  • the learnings can be used as feedback to update the model (thus ‘informing’ the model about the quality of the suggestions).
  • the present disclosure provides a method, performed by an electronic device, for providing a candidate biological sequence.
  • the method can be a computer-implemented method.
  • the method comprises obtaining input data indicative of an input biological sequence, e.g. from a biological library with diverse sequences.
  • the input data can be associated with the input biological sequence and/or be representative of the input biological sequence.
  • the input data comprises data representative of the input biological sequence, such as data representative of one or more properties of the input biological sequence.
  • the one or more properties of the input data include one or more of: a sequence of amino acids, a sequence of nucleic acids, a three-dimensional structure of the input biological polypeptide sequence (e.g., obtained with AlphaFold2), a folding of the input biological sequence, and a pairing of nucleic acids.
  • the method comprises determining the candidate biological sequence by applying a model, e.g., a generative model, to the input data.
  • the candidate biological sequence is determined for compatibility with a host cell, e.g., targeting compatibility with a given host cell, and/or for increasing compatibility with the given host cell.
  • the model applied to the input data aims at increasing one or more expression steps for a polypeptide of interest in a host cell, e.g. increasing or modifying one or more of: transcription, post-transcriptional modification, translation, post-translational modification, folding, secretion, phenotypic trait, and yield for a polypeptide of interest in a host cell.
  • Yield may be intracellular and/or extracellular. In other words, yield may be seen as a target performance parameter to optimize when determining the candidate biological sequence. Yield may be optimized via various steps, such as modified secretion, modified transcription, or modified translation, separately or jointly.
  • the model generates, based on the input data, the candidate biological sequence.
  • the candidate biological sequence may be determined based on one or more of: score generated in step f), host cell data, input data, and information indicating the type of biological sequence to be determined as candidate biological sequence.
  • the strain library comprises 102 different signal peptide-encoding polynucleotides, which were ordered as synthetic DNA and fused upstream of a protease-encoding DNA sequence (encoding a serine endopeptidase).
  • the library was transformed into Bacillus licheniformis strain MOL3320 as described in patent US 2019/0185847 A1. Selection was done on ERM. The resulting strains expressed the protease with different signal peptide variants.
  • the generated strains were then fermented compartmentalized in 50 pL droplets of nutrient-controlled medium in fluorinated oil (HFE 7500), produced on a microfluidic droplet production chip.
  • the droplets were stabilized with 2 wt% fluorosurfactant (008-Fluorosurfactant, RanBiotech).
  • the resulting emulsion was incubated in a collection vial at 37°C for 4 days.
  • the serine endopeptidase secreted by the host cells hydrolyses a proprietary fluorogenic rhodamine substrate.
  • fluorogenic rhodamine substrates include Rhodamine 110-bis-(succinoyl-L-alanyl-L-alanyl-L-prolyl-L-phenylalanyl amide) (CPC Scientific Inc., San Jose, CA). The substrate was added to each droplet on a microfluidic chip, and after 4 minutes of incubation the fluorescent assay response was measured, as shown in Fig. 2.
  • the release of Rhodamine 110 resulted in an increase of fluorescence at 520 nm. The increase is proportional to the enzymatic activity measured against a standard.
  • the measured fluorescence signal is directly related to the protease concentration in each droplet. Using the device shown in Fig. …, each droplet was sorted into one of the five output channels depending on its measured fluorescence level.
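Sorting each droplet into one of five channels by its fluorescence level amounts to locating the reading among a set of ascending thresholds. The sketch below assumes four hypothetical thresholds (five pools) and assumes that a reading exactly at a threshold goes to the higher pool; neither the threshold values nor this tie rule are given in the source, and `assign_pool` is an illustrative name.

```python
from bisect import bisect_right
from collections import Counter


def assign_pool(rfu, thresholds):
    """Return the 1-based pool index for a droplet with fluorescence `rfu`.

    `thresholds` holds the ascending lower bounds of pools 2..n, so five
    output channels need four thresholds. A reading exactly at a threshold
    goes to the higher pool (an assumption; the source does not specify).
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly ascending")
    return bisect_right(thresholds, rfu) + 1


# Hypothetical thresholds (RFU) for five pools; pool 1 catches empty and weak droplets.
thresholds = [3000, 6000, 12000, 24000]
droplet_rfu = [2500, 2600, 4000, 7000, 15000, 30000]

pool_counts = Counter(assign_pool(r, thresholds) for r in droplet_rfu)
print(sorted(pool_counts.items()))  # prints [(1, 2), (2, 1), (3, 1), (4, 1), (5, 1)]
```

Binary search keeps the per-droplet decision at O(log n) in the number of channels, although with five to ten channels a linear scan would be equally practical.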
  • the five output channels were connected to five collection tubes, and after collection of at least 1000 droplets in each tube, we separated the collection tubes from the microfluidic device.
  • the collected cell pools were named Pool 1, Pool 2, Pool 3, Pool 4, and Pool 5 (see Fig. 1).
  • the signal peptide sequences upstream of the protease-encoding DNA sequence contained in each pool were amplified via PCR, and were subsequently sent for DNA sequencing.
  • Pool 1 comprised empty droplets (peak at around 2500 RFU) and droplets with no or very weak activity.
  • sorting the library into five pools allows each signal peptide variant to be investigated according to the protease activity of its droplet.
  • each library member is analyzed, providing an analysis of the complete library without losing data on any library member, as each signal peptide coding sequence is sequenced in the step following sorting (see example 2).
  • the strains were fermented for approx. 120 hours and protease activity was measured at the end of fermentation.
  • the five droplet sorting pools are ordered according to the fluorescence thresholds used for separation (Pool 5 contains droplets measured with the highest fluorescence signals, and Pool 1 with the lowest fluorescence signals).
  • SP sequences with intermediate protease activities were found in Pool 2 (mid-low protease activities) and in Pool 4 (mid-high protease activities). Sorting into more than two pools, e.g., into 5 pools as shown in this example, increases the output resolution and allows identification not only of the very best or worst performers but also of sequences that lie in between. For signal peptides, for example, such an approach is particularly beneficial when aiming for fine-tuned expression of a polypeptide of interest.
  • the screening method allowed the complete library to be screened efficiently while sorting the library members into five pools, enabling a detailed analysis and understanding of each library member.
  • This example validates the results of the multi-channel sorted SP library (examples 1 and 2) against the results obtained from cultivating the same SP library in an MTP format.
  • a score is calculated for a given sequence.
  • the fraction of the corresponding reads in a pool is determined by dividing the number of reads of the given sequence by the total number of reads obtained when sequencing the entire pool. This is done for every pool generated in the experiment.
  • the relative proportions of the given sequence in each of the pools are calculated across all pools.
  • the score of the given sequence is calculated by summing, over all pools, the products of each relative proportion and the selection threshold of the corresponding pool.
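The three scoring steps above can be written compactly. For a sequence with n_p reads in pool p, whose sequencing yielded N_p reads in total, the read fraction is f_p = n_p / N_p, the relative proportion is r_p = f_p / Σ_q f_q, and the score is S = Σ_p r_p · t_p, where t_p is the selection threshold of pool p. The Python sketch below implements this reading of the text; the function name and example numbers are hypothetical, and which value is used as t_p for each pool (for example, its lower fluorescence bound) is an assumption left to the caller.

```python
def droplet_score(read_counts, pool_totals, pool_thresholds):
    """Score one sequence from its per-pool read counts.

    read_counts[p]:     reads of the given sequence in pool p
    pool_totals[p]:     total reads obtained when sequencing pool p
    pool_thresholds[p]: selection threshold associated with pool p
    """
    if not (len(read_counts) == len(pool_totals) == len(pool_thresholds)):
        raise ValueError("one entry per pool is required")
    # Step 1: fraction of each pool's reads that belong to this sequence.
    fractions = [n / total for n, total in zip(read_counts, pool_totals)]
    fraction_sum = sum(fractions)
    if fraction_sum == 0:
        raise ValueError("sequence was not observed in any pool")
    # Step 2: relative proportion of the sequence in each pool (sums to 1).
    proportions = [f / fraction_sum for f in fractions]
    # Step 3: threshold-weighted sum of the relative proportions.
    return sum(p * t for p, t in zip(proportions, pool_thresholds))


# Hypothetical example: three pools with thresholds 1, 2 and 3.
print(droplet_score([10, 0, 10], [100, 100, 100], [1, 2, 3]))  # prints 2.0
```

Normalizing by each pool's total read count before computing proportions keeps a more deeply sequenced pool from inflating the score.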
  • a score was obtained for each signal peptide sequence cultivated in MTP format, based on the protease activity shown for each sequence.
  • the multi-channel droplet sorting of the invention represents an improved and substantially cheaper screening method, saving both time and sample volume, whilst providing a high resolution output when screening large biological libraries (see Fig. 3).
  • Example 4 Microdroplet method reduces the standard deviation of the assigned scores
  • Example 5 Processing the screening results with a computational model
  • This example validates the results of the multi-channel microdroplet sorted SP library (examples 1 and 2) against the results obtained from cultivating the same SP library in an MTP format.
  • In Fig. 6A, the y-axis indicates the fraction of proline-containing sequences.
  • the fraction of signal peptides containing proline depends on the yield associated with the signal peptide, i.e., circa two-thirds of the good sequences contain at least one proline, whereas only circa one-third of the bad sequences do.
  • the model concludes that the presence of proline in a signal peptide is a strong indicator for good expression of the investigated POI.
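The descriptive statistic behind Fig. 6A, the fraction of signal peptides that contain at least one proline, can be computed per yield class as sketched below. This reproduces only the statistic, not the model's reasoning; the peptide sequences and the good/bad split are invented for illustration, and `proline_fraction` is a hypothetical helper name.

```python
def proline_fraction(peptides):
    """Fraction of peptide sequences (one-letter code) containing at least one proline."""
    if not peptides:
        raise ValueError("no sequences given")
    return sum(1 for seq in peptides if "P" in seq.upper()) / len(peptides)


# Hypothetical signal peptide groups, split by yield.
good = ["MKKLLPLAAL", "MPKLLFAL", "MKKLLFAL"]
bad = ["MKKLLFAL", "MAKLLFAL", "MKPLLFAL"]
print(round(proline_fraction(good), 2), round(proline_fraction(bad), 2))  # prints 0.67 0.33
```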
  • Example 6 Increased Training Data Size from Microdroplets improves performance of machine learning model
  • Example 7 Microdroplet method allows identification of superior library members
  • This example compares the results of the multi-channel sorted SP libraries from examples 1 and 2 to the results obtained from screening the same SP library in an MTP format. In contrast to the previous examples, which used 5 pools, this example sorted the library into 7 pools. Each library member is given a unique signal peptide identifier and consists of a different DNA sequence encoding a signal peptide.
  • the SP library was sorted into 7 pools, i.e., pool 1 to pool 7. Droplets with the lowest signal were sorted into pool 1, whereas droplets with the highest signal were sorted into pool 7. Pools 2-6 comprised cells with library members that showed signals lower than the threshold for pool 7 and higher than the threshold for pool 1. In other words, signal thresholds increased from pool 1 to pool 7.
  • Table 1 shows the results for 304 library members identified using the microdroplet method. Sequences identified both in the MTP screen and in the microdroplet screen are shaded gray (e.g., SP_GAN_208). For each sequence, Table 1 shows the number of droplets comprising that sequence in each pool. For example, sequence SP_GAN_205 appeared in 61 droplets of pool 7 and in 2 droplets of pool 1. Additionally, for the sequences identified with the microdroplet method, the table shows a score calculated as described in example 3. For sequences also identified in MTP, Table 1 shows the relative activity determined during MTP cultivation. The sequences in Table 1 are ranked by descending droplet score, i.e., the highest droplet scores are at the top of Table 1, whereas the lowest droplet scores are at the bottom.
  • the method of the invention allows identification of library members that would otherwise have been overlooked and/or not found using conventional methods such as MTP. These library members are shown as rows with a clear background in Table 1. Rows with a gray background represent library members that were identified using MTP.
  • the sequence SP_GAN_208 (marked in gray) was, in terms of scoring, the best-performing library member identified in the MTP screen. The same library member was also identified as a well-performing sequence in the microdroplet screen. However, the microdroplet method of the invention identified 6 additional library members with a higher score than SP_GAN_208: SP_GAN_205, SP_GAN_206, SP_GAN_217, SP_GAN_126, SP_GAN_47 and SP_GAN_232. Thus, the method of the invention is highly beneficial for addressing biotechnological challenges, e.g., increasing expression of a POI with a new signal peptide sequence.
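Ranking library members by descending droplet score, as in Table 1, and listing those that outscore a reference member (the role SP_GAN_208 plays above) takes only a sort and a filter. The sketch below uses invented IDs and scores; "REF" merely stands in for the best MTP-identified member, and the function names are hypothetical.

```python
def rank_by_score(scores):
    """Return library member IDs sorted by descending droplet score."""
    return sorted(scores, key=scores.__getitem__, reverse=True)


def outperformers(scores, reference_id):
    """Members whose droplet score is strictly higher than that of `reference_id`."""
    reference_score = scores[reference_id]
    return [m for m in rank_by_score(scores) if scores[m] > reference_score]


# Hypothetical droplet scores.
scores = {"SP_A": 5.0, "SP_B": 3.0, "REF": 4.0, "SP_C": 4.5}
print(rank_by_score(scores))         # prints ['SP_A', 'SP_C', 'REF', 'SP_B']
print(outperformers(scores, "REF"))  # prints ['SP_A', 'SP_C']
```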
  • a method for screening a biological library comprising the steps of: a) providing a microfluidic device comprising a droplet sorter (200), the droplet sorter comprising at least three output channels (301, 302, 303), b) providing an emulsion of droplets comprising a library of polynucleotides of interest and a screenable product, c) determining the amount of screenable product of one or more droplets in the microfluidic device, d) sorting the one or more droplets with the droplet sorter (200) into a receiving output channel of the at least three output channels (301, 302, 303), wherein the receiving output channel is determined based on the amount of screenable product per droplet, and wherein at least three receiving output channels receive a plurality of droplets comprising an amount of screenable product above and/or below one or more predetermined threshold levels, e) identifying one or more polynucleotide of interest present
  • each host cell comprises one or more polynucleotide of interest of the library of polynucleotides of interest.
  • each droplet comprises at most one host cell, or a plurality of host cells derived from the same parent host cell.
  • each droplet comprises at most one polynucleotide of interest.
  • the enzyme is selected from the list of a hydrolase, isomerase, ligase, lyase, oxidoreductase, or transferase, e.g., an aminopeptidase, amylase, carbohydrase, carboxypeptidase, catalase, cellobiohydrolase, cellulase, chitinase, cutinase, cyclodextrin glycosyltransferase, deoxyribonuclease, endoglucanase, esterase, alpha-galactosidase, beta-galactosidase, glucoamylase, alpha-glucosidase, beta-glucosidase, invertase, laccase, lipase, mannosidase, mutanase, oxidase, pectinolytic enzyme, peroxidase, phyta
  • the screenable product is an enzyme substrate, preferably for an enzyme selected from the list of a hydrolase, isomerase, ligase, lyase, oxidoreductase, or transferase, e.g., an aminopeptidase, amylase, carbohydrase, carboxypeptidase, catalase, cellobiohydrolase, cellulase, chitinase, cutinase, cyclodextrin glycosyltransferase, deoxyribonuclease, endoglucanase, esterase, alpha-galactosidase, beta-galactosidase, glucoamylase, alpha-glucosidase, beta-glucosidase, invertase, laccase, lipase, mannosidase, mutanase, oxidase, pectinolytic
  • the screenable product is a fluorescent product.
  • the fluorescent product is converted from a fluorogenic substrate by an enzyme encoded by the polynucleotide of interest.
  • the amount of screenable product is inversely proportional to one or more of cell number, cell growth, cell division, cell viability, or cell growth rate.
  • the amount of screenable product is proportional to one or more of cell number, cell growth, cell division, cell viability, or cell growth rate.
  • the screenable product comprises or consists of one or more host cells.
  • the screenable product comprises or consists of substantially all the host cells in a droplet.
  • the screenable product is a product of an enzymatic reaction, preferably of a reaction catalyzed by an enzyme selected from the list of a hydrolase, isomerase, ligase, lyase, oxidoreductase, or transferase, e.g., an aminopeptidase, amylase, carbohydrase, carboxypeptidase, catalase, cellobiohydrolase, cellulase, chitinase, cutinase, cyclodextrin glycosyltransferase, deoxyribonuclease, endoglucanase, esterase, alpha-galactosidase, beta-galactosidase, glucoamylase, alpha-glucosi
  • the score is proportional, e.g., normalized, to the number of identical DNA sequences for a first polynucleotide of interest present in an output channel.
  • the method of any one of the preceding paragraphs, wherein the score is the total number of identical DNA sequences for a first polynucleotide of interest present in an output channel.
  • the method of any one of the preceding paragraphs, wherein the score is proportional, e.g., normalized, to the number of identical DNA sequences for a second polynucleotide of interest present in an output channel.
  • the score is the total number of identical DNA sequences for a second polynucleotide of interest present in an output channel.
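Because the score may be either the total or a normalized count of identical DNA sequences in an output channel, a sketch can return both. It assumes the reads have already been trimmed to the polynucleotide-of-interest region, so that "identical" means an exact string match; the read data and the name `identical_read_counts` are hypothetical.

```python
from collections import Counter


def identical_read_counts(reads_by_channel, sequence):
    """Total and normalized counts of reads identical to `sequence` per channel.

    Returns {channel: (count, count / total_reads_in_channel)}; an empty
    channel yields (0, 0.0).
    """
    result = {}
    for channel, reads in reads_by_channel.items():
        counts = Counter(reads)
        total = sum(counts.values())
        n = counts[sequence]
        result[channel] = (n, n / total if total else 0.0)
    return result


# Hypothetical trimmed reads from two output channels.
reads = {1: ["ATGAAA", "ATGAAA", "ATGCCC"], 2: ["ATGCCC"]}
print(identical_read_counts(reads, "ATGAAA"))
```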
  • the microfluidic device comprises an incubation zone (500).
  • the cells comprised in one droplet are genetically identical, i.e., the cells are derived from one parental host cell, preferably the same parental host cell.
  • the droplet sorter comprises one or more sensing means (600), preferably located downstream of the incubation zone (500) and/or upstream of the sorting means (401, 402).
  • the one or more sensing means (600) comprises a fluorescence sensor.
  • the one or more sensing means (600) comprises an absorption sensor.
  • the one or more sensing means (600) comprises an image sensor, e.g., a CMOS sensor, or a CCD sensor, or a PMT sensor.
  • the one or more sensing means (600) comprises a NEMS (nanoelectromechanical system) sensor.
  • the one or more sensing means (600) comprises a mass analyzer suitable for mass spectrometry, e.g. a quadrupole mass analyzer, a TOF mass analyzer, an ion trap mass analyzer, an orbitrap mass analyzer, a magnetic sector mass analyzer, a Q-TOF mass analyser, or a FT-ICR mass analyzer.
  • step e) comprises DNA amplification of the one or more polynucleotide of interest within each output channel.
  • step e) comprises DNA sequencing of the one or more polynucleotide of interest, e.g., after PCR amplification, or by nanopore sequencing.
  • in step e), the one or more polynucleotides of interest are identified by a DNA barcode.
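Identification by DNA barcode can be sketched as exact-match demultiplexing of sequencing reads. The sketch assumes that every barcode has the same length and sits at the 5' end of the read, and that a barcode-to-member lookup table is available; all barcodes, reads and the function name are hypothetical. Practical pipelines commonly tolerate a small number of sequencing errors in the barcode, which this exact-match version does not.

```python
from collections import Counter


def assign_by_barcode(reads, barcode_to_member, barcode_length):
    """Count reads per library member using an exact-match 5' barcode.

    Assumes every barcode has the same length and sits at the start of the read.
    Reads whose prefix matches no known barcode are counted as unassigned.
    """
    if any(len(bc) != barcode_length for bc in barcode_to_member):
        raise ValueError("all barcodes must have length barcode_length")
    assigned = Counter()
    unassigned = 0
    for read in reads:
        member = barcode_to_member.get(read[:barcode_length])
        if member is None:
            unassigned += 1
        else:
            assigned[member] += 1
    return assigned, unassigned


# Hypothetical 4-nt barcodes and reads from one output channel.
barcodes = {"ACGT": "member_1", "TTAA": "member_2"}
reads = ["ACGTGGGA", "ACGTCCCA", "TTAAGGGA", "GGGGAAAA"]
counts, unassigned = assign_by_barcode(reads, barcodes, 4)
print(dict(counts), unassigned)  # prints {'member_1': 2, 'member_2': 1} 1
```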
  • step g) training a computational model, e.g., a machine learning algorithm, with sequence data obtained from step e) and/or score data obtained from step f).
  • the computational model of step g) is selected from the list of a linear regression, a decision tree, a random forest model, a support vector machine (SVM), a neural network, a K-means clustering, a naive Bayes, a Gaussian mixture model (GMM), or a generative model.
  • determining the candidate biological sequence by applying a model, e.g., a generative model, to the input data, preferably wherein the generative model is non-unidirectional; and providing biological sequence data indicative of the candidate biological sequence.
  • the input biological sequence is one or more of: an amino acid sequence of a polypeptide of interest, a nucleic acid sequence encoding a polypeptide of interest, a control sequence, e.g., an expression control sequence, and a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence.
  • the candidate biological sequence is one or more of: a control sequence, e.g., an expression control sequence, a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence, an amino acid sequence of a polypeptide of interest, and a nucleic acid sequence encoding a polypeptide of interest.
  • the input biological sequence is an amino acid sequence of a polypeptide of interest and/or a nucleic acid sequence encoding a polypeptide of interest, and wherein the candidate biological sequence is a nucleic acid sequence increasing compatibility with a host cell.
  • the input biological sequence is an amino acid sequence of a polypeptide of interest and/or a nucleic acid sequence encoding a polypeptide of interest, and wherein the candidate biological sequence is a control sequence, e.g., an expression control sequence, and/or a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence.
  • the input biological sequence is a control sequence, e.g., an expression control sequence, and/or a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence, and wherein the candidate biological sequence is a nucleic acid sequence encoding a polypeptide of interest.
  • the generative model is one or more of: a generative adversarial network (GAN) model, a Wasserstein generative adversarial network model, a diffusion model, and a variational autoencoder.
  • applying the generative non-unidirectional model to the input data comprises partitioning the generative non-unidirectional model into a plurality of generators, wherein each generator of the plurality of generators is configured to determine, based on the input data, one or more candidate biological sequences for a subset of nucleotides and/or a subset of amino acids and a predetermined criterion.
  • determining the candidate biological sequence by applying the model to the input data comprises: predicting, using the generator, a compatibility of the candidate biological sequence with the host cell;
  • the method comprises training the model based on a training set of biological sequences, wherein the training set of biological sequences includes training data indicative of one or more biological sequences related to the host cell.
  • the training set of biological sequences is heterologous to the genus of the host cell, preferably heterologous to one or more species of the host cell.
  • An electronic device comprising a memory circuitry, a processor circuitry, and an interface, wherein the electronic device is configured to perform any of the methods according to any one of the preceding paragraphs.
  • a computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by an electronic device cause the electronic device to perform any of the methods of any one of the preceding paragraphs.
  • the droplet sorter (200) comprises one or more sorting means (401, 402).
  • the one or more sorting means comprise or consist of one or more electrodes, one or more acoustic wave generators, one or more valves, and/or one or more pressure-controlled outlets.
  • the one or more sorting means comprises at least two electrodes.
  • the biological library comprises or consists of wild-type cells with different genotype and/or different phenotype.
  • the biological library comprises different codon-optimized DNA sequences encoding the same amino acid sequence of a polypeptide of interest, e.g., a signal peptide, and/or an enzyme.
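One way to confirm that codon-optimized library members really encode the same amino acid sequence is to translate each DNA variant and compare the results. The sketch assumes the standard genetic code (NCBI translation table 1) applies in the host cell and that each variant is an in-frame coding sequence; the function names and example sequences are illustrative.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame coding DNA sequence; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def synonymous(variants):
    """True if all DNA variants encode the same amino acid sequence."""
    return len({translate(v) for v in variants}) == 1


# Two hypothetical codon variants of the same short peptide (Met-Ala-Lys).
print(translate("ATGGCTAAA"), synonymous(["ATGGCTAAA", "ATGGCCAAG"]))  # prints MAK True
```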
  • the biological library comprises or consists of a plurality of polynucleotides of interest, each polynucleotide of interest encoding a variant of a polypeptide of interest.
  • the polynucleotide of interest comprises a first polynucleotide of interest encoding a control sequence, and a second polynucleotide of interest encoding a polypeptide of interest.
  • the biological library comprises or consists of a plurality of polynucleotides of interest, each polynucleotide of interest encoding a variant of a control sequence.
  • the control sequence is a promoter sequence, a signal peptide, a leader sequence, a polyadenylation sequence, a propeptide sequence, or a transcription terminator.
  • the polynucleotide of interest comprises a first polynucleotide of interest encoding a signal peptide, and a second polynucleotide of interest encoding a polypeptide of interest, wherein the first polynucleotide of interest is operatively linked to the second polynucleotide of interest and located upstream of the second polynucleotide of interest.
  • the polynucleotide of interest comprises a first polynucleotide of interest comprising a promoter sequence, and a second polynucleotide of interest encoding a polypeptide of interest, wherein the first polynucleotide of interest is operatively linked to the second polynucleotide of interest and located upstream of the second polynucleotide of interest.
  • the biological library comprises identical second polynucleotides of interest, and a plurality of variants of the first polynucleotides of interest.
  • the biological library comprises identical first polynucleotides of interest, and a plurality of variants of the second polynucleotides of interest.
  • the one or more polynucleotide of interest comprises a promoter, a polynucleotide encoding a signal peptide, a polynucleotide encoding a polypeptide of interest, or a native host cell gene.
  • the method of any one of the preceding paragraphs, wherein the first and second polynucleotides of interest are endogenous to the host cell.
  • the one or more polynucleotide of interest encodes a polypeptide of interest.
  • the polypeptide of interest is an enzyme, a nanobody, an antibody, an antibody fragment, a fluorescent polypeptide, e.g., GFP, or an alpha-lactalbumin.
  • the amount of screenable product in the droplet is proportional to the amount of a polypeptide encoded by the one or more polynucleotide of interest.
  • the amount of screenable product in the droplet is inversely proportional to the amount of a polypeptide encoded by the one or more polynucleotide of interest.
  • the biological library comprises at least 100 different one or more polynucleotides of interest, at least 200 different one or more polynucleotides of interest, at least 500 different one or more polynucleotides of interest, at least 1 000 different one or more polynucleotides of interest, at least 2000 different one or more polynucleotides of interest, at least 3 000 different one or more polynucleotides of interest, at least 5 000 different one or more polynucleotides of interest, at least 10 000 different one or more polynucleotides of interest, at least 100 000 different one or more polynucleotides of interest, at least 1 000 000 different one or more polynucleotides of interest, at least 10 000 000 different one or more polynucleotides of interest, at least 50 000 000 different one or more polynucleotides of interest, or at least 100 000 000 different polynucleotides of interest.
  • the biological library comprises at least 100 different host cells, at least 200 different host cells, at least 500 different host cells, at least 1 000 different host cells, at least 2 000 different host cells, at least 3 000 different host cells, at least 5 000 different host cells, at least 10 000 different host cells, at least 100 000 different host cells, at least 200 000 different host cells, at least 500 000 different host cells, at least 1 000 000 different host cells, at least 5 000 000 different host cells, at least 10 000 000 different host cells, or at least 100 000 000 different host cells.
  • the amount of screenable product in the droplet is proportional to one or more of: stability of the polypeptide of interest, transcription of the polypeptide of interest, translation of the polypeptide of interest, secretion of the polypeptide of interest, yield of the polypeptide of interest, binding strength of the polypeptide of interest to a target molecule, and activity of the polypeptide of interest.
  • the amount of screenable product in the droplet is inversely proportional to one or more of: stability of the polypeptide of interest, transcription of the polypeptide of interest, translation of the polypeptide of interest, secretion of the polypeptide of interest, yield of the polypeptide of interest, binding strength of the polypeptide of interest to a target molecule, and activity of the polypeptide of interest.
  • the amount of screenable product in the droplet is proportional to one or more of: cell number, viability of the host cell, cell division rate of the host cell, cell growth rate of the host cell, cell size of the host cell, and protein secretion of the host cell.
  • the amount of screenable product in the droplet is inversely proportional to one or more of: cell number, viability of the host cell, cell division rate of the host cell, cell growth rate of the host cell, cell size of the host cell, and protein secretion of the host cell.
  • the substrate comprises a fluorophore, e.g., fluorescein, or fluorescein-labelled starch.
  • each droplet, before the optional incubation, comprises an average occupation of at most 0.01 cells, at most 0.02 cells, at most 0.03 cells, at most 0.04 cells, at most 0.05 cells, at most 0.06 cells, at most 0.07 cells, at most 0.08 cells, at most 0.09 cells, at most 0.1 cells, at most 0.2 cells, at most 0.3 cells, at most 0.4 cells, at most 0.5 cells, at most 0.6 cells, or at most 0.7 cells; preferably at most 0.1 cells.
  • each droplet comprises an average occupation of at most 0.01 polynucleotide of interest, at most 0.02 polynucleotide of interest, at most 0.03 polynucleotide of interest, at most 0.04 polynucleotide of interest, at most 0.05 polynucleotide of interest, at most 0.06 polynucleotide of interest, at most 0.07 polynucleotide of interest, at most 0.08 polynucleotide of interest, at most 0.09 polynucleotide of interest, at most 0.1 polynucleotide of interest, at most 0.2 polynucleotide of interest, at most 0.3 polynucleotide of interest, at most 0.4 polynucleotide of interest, at most 0.5 polynucleotide of interest, at most 0.6 polynucleotide of interest, or at most 0.7 polynucleotide of interest; preferably at most 0.1 polyn
  • the droplet sorting is facilitated by an electric field generated by one or more electrode (401 , 402) adjacent to the droplet sorter. .
  • the droplet sorting is facilitated by an acoustic wave generated by one or more acoustic wave generators (401 , 402) adjacent to the droplet sorter.
  • the droplet sorting is facilitated by a local pressure change generated by one or more pressure-controlled outlets (401 , 402) adjacent to the droplet sorter, e.g., wherein the one or more pressure- controlled outlets are comprised in one or more output channel.
  • step c) the amount of screenable product in step c) is determined using a fluorescence-based signal, absorbance, Raman spectroscopy, mass spectrometry (MS), or MALDI-MS.
  • a relative and/or an absolute amount of the screenable product per droplet is determined by the one or more sensing means (600).
  • one or more output channels comprise at least 10 000 droplets, at least 50 000 droplets, at least 100 000 droplets, at least 500 000 droplets, at least 1 000 000 droplets, at least 2 000 000 droplets, at least 5 000 000 droplets, at least 10 000 000 droplets, or at least 100 000 000 droplets.
  • the droplet sorter comprises at least four output channels, at least five output channels, at least six output channels, at least seven output channels, at least eight output channels, at least nine output channels, or at least ten output channels.
  • the host cell is a yeast host cell, e.g., a Candida, Hansenula, Kluyveromyces, Pichia, Saccharomyces, Schizosaccharomyces, or Yarrowia cell, such as a Kluyveromyces lactis, Saccharomyces carlsbergensis, Saccharomyces cerevisiae, Saccharomyces diastaticus, Saccharomyces douglasii, Saccharomyces kluyveri, Saccharomyces norbensis, Saccharomyces oviformis, or Yarrowia lipolytica cell.
  • the host cell is a filamentous fungal host cell, e.g., an Acremonium, Aspergillus, Aureobasidium, Bjerkandera, Ceriporiopsis, Chrysosporium, Coprinus, Coriolus, Cryptococcus, Filibasidium, Fusarium, Humicola, Magnaporthe, Mucor, Myceliophthora, Neocallimastix, Neurospora, Paecilomyces, Penicillium, Phanerochaete, Phlebia, Piromyces, Pleurotus, Schizophyllum, Talaromyces, Thermoascus, Thielavia, Tolypocladium, Trametes, or Trichoderma cell, in particular, an Aspergillus awamori, Aspergillus foetidus, Aspergillus fumigatus, Aspergillus japonicus, Aspergillus n
  • the host cell is a prokaryotic host cell, e.g., a Gram-positive cell selected from the group consisting of Bacillus, Clostridium, Enterococcus, Geobacillus, Lactobacillus, Lactococcus, Oceanobacillus, Staphylococcus, Streptococcus, and Streptomyces cells, or a Gram-negative cell selected from the group consisting of Campylobacter, E. coli, Flavobacterium, Fusobacterium, Helicobacter, Ilyobacter, Neisseria, Pseudomonas, Salmonella, and Ureaplasma cells, such as Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus brevis, Bacillus circulans, Bacillus clausii, Bacillus coagulans, Bacillus firmus, Bacillus lautus, Bacillus lentus, Bacillus licheniformis, Bacillus megaterium, Bacillus pumilus, Bacillus stearothermophilus, Bacillus subtilis, Bacillus thuringiensis, Streptococcus equisimilis, Streptococcus pyogenes, Streptococcus uberis, and Streptococcus equi subsp.
  • the host cell is a Bifidobacterium cell, e.g., a Bifidobacterium animalis or Bifidobacterium animalis subsp. lactis cell.
  • a host cell comprising in its genome a polynucleotide sequence of interest generated in step h), and/or a polynucleotide sequence identified in step e).
  • the host cell of any one of the preceding paragraphs which comprises at least two copies, e.g., three, four, five, or more copies of the polynucleotide sequence of interest.
  • a method of producing a polypeptide of interest, comprising the steps of cultivating the cell according to any one of the preceding paragraphs under conditions conducive for production of the polypeptide.
  • a nucleic acid construct or expression vector comprising a polynucleotide of interest identified in step e), and/or a polynucleotide sequence generated in step h).
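The average occupation limits listed above (at most 0.1 cells or polynucleotides of interest per droplet being preferred) only make sense with a loading model in mind. The following is a minimal sketch under assumptions that this document does not state: cells are encapsulated independently, so per-droplet occupancy follows a Poisson distribution with mean λ (`lam`), and all library variants are equally abundant. The function names are hypothetical. The sketch estimates the share of empty, singly and multiply occupied droplets, and roughly how many droplets would be needed to sample a library of a given size.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a droplet holds exactly k cells when the mean occupancy is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def occupancy_summary(lam: float) -> dict:
    """Fractions of empty, singly and multiply occupied droplets (Poisson assumption)."""
    p_empty = poisson_pmf(0, lam)
    p_single = poisson_pmf(1, lam)
    return {
        "empty": p_empty,
        "single": p_single,
        "multiple": 1.0 - p_empty - p_single,
        # Share of non-empty droplets that hold exactly one cell.
        "single_among_occupied": p_single / (1.0 - p_empty),
    }


def droplets_for_coverage(library_size: int, lam: float, oversampling: float) -> int:
    """Droplets needed so that, on average, each variant is encapsulated `oversampling` times."""
    return math.ceil(library_size * oversampling / lam)


def expected_fraction_sampled(oversampling: float) -> float:
    """Approximate fraction of equally abundant variants encapsulated at least once
    (large-library limit of 1 - (1 - 1/N) ** (oversampling * N))."""
    return 1.0 - math.exp(-oversampling)


if __name__ == "__main__":
    for key, value in occupancy_summary(0.1).items():
        print(f"{key}: {value:.4f}")
    print("droplets for 1 000 000 variants at 3x:", droplets_for_coverage(1_000_000, 0.1, 3))
    print("fraction of variants sampled at 3x:", round(expected_fraction_sampled(3), 4))
```

Under these assumptions, λ = 0.1 leaves roughly 90.5% of droplets empty, fewer than 0.5% with two or more cells, and about 95% of the occupied droplets with exactly one cell. Sampling a 1 000 000-member library with threefold oversampling would then call for about 30 000 000 droplets, in which roughly 95% of the variants are expected to be encapsulated at least once.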
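A droplet sorter with at least four output channels, actuated by electrodes, acoustic wave generators or pressure-controlled outlets (401, 402), needs a rule that maps each droplet's measured signal to an output channel. The following is a hypothetical sketch, not the applicant's control logic. It assumes one scalar reading per droplet from the sensing means (600), for example fluorescence in arbitrary units, and a fixed set of strictly increasing bin edges. It computes only the channel index; driving the actuators is out of scope.

```python
from bisect import bisect_right
from typing import List, Sequence


def assign_channel(signal: float, thresholds: Sequence[float]) -> int:
    """Map one droplet's signal to an output channel index.

    Channel 0 collects droplets with signal < thresholds[0], channel i collects
    thresholds[i-1] <= signal < thresholds[i], and the last channel collects
    signal >= thresholds[-1]; there are len(thresholds) + 1 channels in total.
    """
    return bisect_right(thresholds, signal)


def sort_droplets(signals: Sequence[float], thresholds: Sequence[float]) -> List[int]:
    """Assign every droplet in a stream to an output channel."""
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    return [assign_channel(s, thresholds) for s in signals]


if __name__ == "__main__":
    # Hypothetical fluorescence readings (arbitrary units) and bin edges.
    readings = [0.2, 1.5, 2.0, 3.7, 0.9]
    edges = [1.0, 2.0, 3.0]  # three edges -> four output channels
    print(sort_droplets(readings, edges))  # prints [0, 1, 2, 3, 0]
```

With three edges, the droplets are distributed over four channels. When the amount of screenable product is inversely proportional to the property of interest, the same binning rule applies, but the lowest-signal channel holds the candidates to recover.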

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Wood Science & Technology (AREA)
  • Organic Chemistry (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biotechnology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biochemistry (AREA)
  • Zoology (AREA)
  • Physics & Mathematics (AREA)
  • Dispersion Chemistry (AREA)
  • Pathology (AREA)
  • Plant Pathology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • General Physics & Mathematics (AREA)
  • Analytical Chemistry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Signal Processing (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)

Abstract

The present invention relates to methods for screening a biological library. The invention also relates to nucleic acid sequences, vectors and host cells that have been isolated and/or generated by the methods of the invention.
PCT/EP2024/077083 2023-09-29 2024-09-26 Droplet-based screening method WO2024240965A2 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP23200852.4 2023-09-29
EP23200852 2023-09-29

Publications (2)

Publication Number Publication Date
WO2024240965A2 (fr) 2024-11-28
WO2024240965A3 (fr) 2025-01-09

Family

ID=88237933

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2024/077083 WO2024240965A2 (fr) 2023-09-29 2024-09-26 Droplet-based screening method

Country Status (1)

Country Link
WO (1) WO2024240965A2 (fr)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0238023A2 (fr) 1986-03-17 1987-09-23 Novo Nordisk A/S Process for the production of protein products in Aspergillus oryzae and promoter for use in Aspergillus
WO1992006204A1 (fr) 1990-09-28 1992-04-16 Ixsys, Inc. Surface expression libraries of heteromeric receptors
US5223409A (en) 1988-09-02 1993-06-29 Protein Engineering Corp. Directed evolution of novel binding proteins
WO1994025612A2 (fr) 1993-05-05 1994-11-10 Institut Pasteur Nucleotide sequences for the control of the expression of DNA sequences in a host cell
WO1995017413A1 (fr) 1993-12-21 1995-06-29 Evotec Biosystems Gmbh Method for the evolutionary design and synthesis of functional polymers based on remodelling elements and codes
WO1995022625A1 (fr) 1994-02-17 1995-08-24 Affymax Technologies N.V. DNA mutagenesis by random fragmentation and reassembly
WO1995033836A1 (fr) 1994-06-03 1995-12-14 Novo Nordisk Biotech, Inc. Phosphonyldipeptides effective in the treatment of cardiovascular diseases
WO2007061448A2 (fr) 2005-05-18 2007-05-31 President And Fellows Of Harvard College Fabrication of conductive pathways, microcircuits and microstructures in microfluidic networks
WO2010151776A2 (fr) 2009-06-26 2010-12-29 President And Fellows Of Harvard College Fluid injection
WO2017144177A1 (fr) 2016-02-26 2017-08-31 Keskin Hüseyin Driving and/or flight simulator
US20190185847A1 (en) 2016-07-06 2019-06-20 Novozymes A/S Improving a Microorganism by CRISPR-Inhibition
WO2024133344A1 (fr) 2022-12-20 2024-06-27 Novozymes A/S Method for providing a candidate biological sequence and related electronic device

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
WO2016150771A1 (fr) * 2015-03-20 2016-09-29 Novozymes A/S Droplet-based selection by injection
PT3289362T (pt) * 2015-04-30 2022-06-21 European Molecular Biology Laboratory Detection and sorting of microfluidic droplets
EP4041310A4 (fr) * 2019-10-10 2024-05-15 1859, Inc. Methods and systems for microfluidic screening

Non-Patent Citations (54)

Title
"Biology and Activities of Yeast", 1980, SOC. APP. BACTERIOL. SYMPOSIUM SERIES
BALLEZA ET AL., FEMS MICROBIOL. REV, vol. 33, no. 1, 2009, pages 133 - 151
BOWIE AND SAUER, PROC. NATL. ACAD. SCI. USA, vol. 86, 1989, pages 2152 - 2156
BURKE ET AL., PROC. NATL. ACAD. SCI. USA, vol. 98, 2001, pages 6289 - 6294
CARTER ET AL., PROTEINS: STRUCTURE, FUNCTION, AND GENETICS, vol. 6, 1989, pages 240 - 248
CHOI ET AL., J. MICROBIOL. METHODS, vol. 64, 2006, pages 391 - 397
CHRISTENSEN ET AL., BIO/TECHNOLOGY, vol. 6, 1988, pages 1419 - 1422
COLLINS-RACIE ET AL., BIOTECHNOLOGY, vol. 13, 1995, pages 982 - 987
CONTRERAS ET AL., BIOTECHNOLOGY, vol. 9, 1991, pages 378 - 381
COOPER ET AL., EMBO J., vol. 12, 1993, pages 2575 - 2583
CUNNINGHAM AND WELLS, SCIENCE, vol. 244, 1989, pages 1081 - 1085
DAVIS ET AL.: "Basic Methods in Molecular Biology", 2012, ELSEVIER
DERBYSHIRE ET AL., GENE, 1986, pages 145
DONALD ET AL., J. BACTERIOL, vol. 195, no. 11, 2013, pages 2612 - 2620
EATON ET AL., BIOCHEMISTRY, vol. 25, 1986, pages 505 - 512
FORD ET AL., PROTEIN EXPRESSION AND PURIFICATION, vol. 2, 1991, pages 95 - 107
FREUDL, MICROBIAL CELL FACTORIES, vol. 17, 2018, pages 52
GEISBERG ET AL., CELL, vol. 156, no. 4, 2014, pages 812 - 824
GUO AND SHERMAN, MOL. CELLULAR BIOL, vol. 15, 1995, pages 5983 - 5990
HAMBRAEUS ET AL., MICROBIOLOGY, vol. 146, no. 12, 2000, pages 3051 - 3059
HAWKSWORTH ET AL.: "Ainsworth and Bisby's Dictionary of The Fungi", 1995, CAB INTERNATIONAL, UNIVERSITY PRESS
HEINZE ET AL., BMC MICROBIOLOGY, vol. 18, 2018, pages 56
HILTON ET AL., J. BIOL. CHEM., vol. 271, 1996, pages 4699 - 4708
HUE ET AL., J. BACTERIOL, vol. 177, 1995, pages 3465 - 3471
JUMPER ET AL.: "Highly accurate protein structure prediction with AlphaFold", NATURE, vol. 596, 2021, pages 583 - 589, XP055888904, DOI: 10.1038/s41586-021-03819-2
KABERDIN AND BLASI, FEMS MICROBIOL. REV, vol. 30, no. 6, 2006, pages 967 - 979
LABROU, PROTEIN DOWNSTREAM PROCESSING, vol. 1129, 2014, pages 3 - 10
LI ET AL., MICROBIAL CELL FACTORIES, vol. 16, 2017, pages 168
LOWMAN ET AL., BIOCHEMISTRY, vol. 30, 1991, pages 10832 - 10837
LUBERTOZZI AND KEASLING, BIOTECHN. ADVANCES, vol. 27, 2009, pages 53 - 75
MARTIN ET AL., J. IND. MICROBIOL. BIOTECHNOL, vol. 3, 2003, pages 568 - 576
MOROZOV ET AL., EUKARYOTIC CELL, vol. 5, no. 11, pages 1838 - 1846
MUKHERJEE ET AL., TRICHODERMA: BIOLOGY AND APPLICATIONS, 2013
NEEDLEMAN AND WUNSCH, J. MOL. BIOL., vol. 48, 1970, pages 443 - 453
NER ET AL., DNA, vol. 7, 1988, pages 127
NESS ET AL., NATURE BIOTECHNOLOGY, vol. 17, 1999, pages 893 - 896
PATEL AND GUPTA, INT. J. SYST. EVOL. MICROBIOL, vol. 70, 2020, pages 406 - 438
RASMUSSEN-WILSON ET AL., APPL. ENVIRON. MICROBIOL, vol. 63, 1997, pages 3488 - 3493
REIDHAAR-OLSON AND SAUER, SCIENCE, vol. 241, 1988, pages 53 - 57
RICE ET AL.: "EMBOSS: The European Molecular Biology Open Software Suite", TRENDS GENET., vol. 16, 2000, pages 276 - 277
ROMANOS ET AL., YEAST, vol. 8, 1992, pages 423 - 488
SAMBROOK ET AL.: "Molecular Cloning: A Laboratory Manual", 1989, COLD SPRING HARBOR LAB
SCHMOLL AND DATTENBÖCK: "Gene Expression Systems in Fungi: Advancements and Applications", FUNGAL BIOLOGY, 2016
SESHASAYEE ET AL., SUBCELLULAR BIOCHEMISTRY, vol. 52, 2011, pages 7 - 23
SMITH ET AL., J. MOL. BIOL., vol. 224, 1992, pages 899 - 904
SMOLKE ET AL., SYNTHETIC BIOLOGY: PARTS, DEVICES AND APPLICATIONS, 2018
SONG ET AL., PLOS ONE, vol. 11, no. 7, 2016, pages 0158447
STEVENS, DRUG DISCOVERY WORLD, vol. 4, 2003, pages 35 - 48
SVETINA ET AL., J. BIOTECHNOL, vol. 76, 2000, pages 245 - 251
WLODAVER ET AL., FEBS LETT, vol. 309, 1992, pages 59 - 64
VOS ET AL., SCIENCE, vol. 255, 1992, pages 306 - 312
WINGFIELD, CURRENT PROTOCOLS IN PROTEIN SCIENCE, vol. 80, no. 1, 2015, pages 1 - 35
XU ET AL., BIOTECHNOLOGY LETTERS, vol. 40, 2018, pages 949 - 955
YELTON ET AL., PROC. NATL. ACAD. SCI. USA, vol. 81, pages 1470 - 1474

Also Published As

Publication number Publication date
WO2024240965A3 (fr) 2025-01-09
