Detailed Description
Aspects of the present disclosure relate to compositions and methods for characterizing analytes using nanopore-based systems. The present disclosure is based in part on a protein nanopore complex formed by a CsgG pore and one or more accessory proteins that form one or more channel constrictions in the nanopore complex. In some embodiments, the one or more accessory proteins are fusion proteins. As further described in the examples, it has surprisingly been found that helper proteins that confer certain desirable characteristics on CsgG nanopores (e.g., modulation of pore width, extension of the pore lumen, formation of one or more additional constrictions, etc.) can be designed de novo using computer-based structural analysis tools. In some embodiments, the de novo designed accessory protein (e.g., fusion protein) forms one or more additional constrictions in the lumen of the CsgG pore and improves discrimination of polymer units as the analyte moves through the nanopore.
Helper proteins
Protein nanopore complexes (also interchangeably referred to as protein pore complexes) as described in the present disclosure may include one or more accessory proteins. As used herein, the terms "peptide," "polypeptide," and "protein" are used interchangeably and refer to two or more amino acids joined together by peptide bonds. In some embodiments, the protein (also referred to as a polypeptide or peptide) comprises between 2 and 2000 amino acids. In some embodiments, the protein comprises between 2 and 10 amino acids, between 2 and 25 amino acids, between 2 and 50 amino acids, between 2 and 100 amino acids, between 2 and 500 amino acids, or between 2 and 1000 amino acids (or any number of amino acids therebetween, such as 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 75, 100, 250, 500, 750, 1000 amino acids, etc.). In some embodiments, the protein comprises more than 2000 amino acids. In some embodiments, the peptide, polypeptide, or protein is of synthetic origin (e.g., is not found in nature, e.g., is not naturally expressed in any living organism). In some embodiments, the peptide, polypeptide, or protein is naturally occurring (e.g., naturally expressed in a living organism that has not been genetically modified to express the peptide, polypeptide, or protein). In some embodiments, the peptide, polypeptide, or protein may be naturally expressed by an organism. In some embodiments, the peptide, polypeptide, or protein is heterologously expressed in an organism (e.g., an organism genetically modified to express the peptide, polypeptide, or protein). In some embodiments, the peptide, polypeptide, or protein is chemically synthesized (e.g., by in vitro transcription, peptide synthesis, etc.).
The peptide, polypeptide, or protein may comprise one or more naturally occurring amino acids (L-amino acids, D-amino acids, etc.), one or more non-naturally occurring amino acids (e.g., radiolabeled amino acids, non-canonical amino acids, non-natural amino acids, etc.), or a combination of one or more naturally occurring amino acids and one or more non-naturally occurring amino acids.
In some embodiments, the helper protein is a fusion protein. The term "fusion protein" refers to a naturally occurring, synthetic, semisynthetic, or recombinant single protein molecule that comprises all or a portion of two or more heterologous polypeptides (e.g., polypeptides that are heterologous with respect to each other) joined by peptide bonds. In some embodiments, the fusion protein comprises all or a portion of at least 2, 3, 4, 5, 6, 7, 8, 9, or 10 heterologous polypeptides joined by peptide bonds. As used herein, "a portion of a peptide" refers to 2 or more amino acids of the peptide. In some embodiments, a portion of the peptide comprises at least 5, 10, 20, 30, 50, or 100 amino acids (e.g., 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100 amino acids) of the complete amino acid sequence of the peptide, which amino acids are contiguous or gapped. Portions of the fusion protein may be arranged in any suitable manner (e.g., C-terminal to N-terminal, N-terminal to C-terminal, C-terminal to C-terminal, N-terminal to N-terminal, etc.). In some embodiments, the C-terminus of the first portion can be joined (e.g., linked) to the N-terminus of the second portion. The portions of the fusion protein may be directly joined (e.g., a first portion may be joined to a second portion via a peptide bond between the terminal amino acids of each portion) or indirectly joined (e.g., a first portion of the fusion protein may be bound, e.g., via a first peptide bond, to a linker that is bound to a second portion of the fusion protein via a second peptide bond).
In some embodiments, the first helper protein is a first portion of a fusion protein and the second helper protein is a second portion of the fusion protein. The use of linkers to join portions of fusion proteins is further described herein, for example in the section entitled "linkers".
In some embodiments, the protein nanopore complex comprises a plurality of subunits or monomers (e.g., a plurality of CsgG monomers) disposed about a central cavity or pore (also referred to as the "inner cavity" or lumen of the nanopore). The formation of protein nanopores is further described herein, for example, in the section entitled "CsgG nanopore". In some embodiments, one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more) accessory proteins are arranged in, or form a continuous channel (e.g., a continuous lumen) with, the lumen of the nanopore. In some embodiments, the protein nanopore complex comprises a ratio of pore monomers (e.g., CsgG pore monomers) to accessory proteins of 9:1, 9:2, 9:3, 9:4, 9:5, 9:6, 9:7, 9:8, 9:9 (e.g., 1:1), 9:10, 9:11, 9:12, 9:13, 9:14, 9:15, 9:16, 9:17, or 9:18 (e.g., 1:2). In some embodiments, the one or more accessory proteins or one or more fusion proteins may have the same symmetry as the nanopore. For example, when the nanopore comprises eight monomers around the central axis, there are eight helper proteins (or eight fusion proteins), and when the nanopore comprises nine monomers around the central axis, there are nine helper proteins (or nine fusion proteins), etc. In some embodiments, the one or more helper proteins (or one or more fusion proteins) may comprise more or fewer monomers than the nanopore, such as one more or one fewer.
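The stoichiometries listed above can be reduced to lowest terms to confirm the equivalences the text gives (9:9 is 1:1, and 9:18 is 1:2). The following is a minimal sketch; the function name `reduce_ratio` is hypothetical and is used only for illustration.

```python
from math import gcd


def reduce_ratio(pore_monomers: int, accessory_proteins: int) -> tuple[int, int]:
    """Reduce a pore-monomer : accessory-protein ratio to lowest terms."""
    if pore_monomers <= 0 or accessory_proteins <= 0:
        raise ValueError("both counts must be positive integers")
    divisor = gcd(pore_monomers, accessory_proteins)
    return pore_monomers // divisor, accessory_proteins // divisor


# Ratios listed in the text for a nine-monomer pore: 9:1 through 9:18.
ratios = {f"9:{n}": reduce_ratio(9, n) for n in range(1, 19)}

for label, (pore, accessory) in ratios.items():
    print(f"{label} reduces to {pore}:{accessory}")
```

Only ratios in which the accessory-protein count shares a factor with nine (3, 6, 9, 12, 15, 18) reduce; the others are already in lowest terms.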
The inner lumen of the nanopore or protein nanopore complex may have one or more constrictions. The terms "constriction", "orifice", "constriction region", "channel constriction", and "constriction site", as used interchangeably herein, refer to an aperture defined by the luminal surface of a pore or protein pore complex that functions to allow ions and target molecules (e.g., without limitation, polynucleotides or single nucleotides), but not other non-target molecules, to pass through the pore or protein pore complex channel. A constriction is typically the narrowest aperture within a pore or protein pore complex, or within a channel defined by a pore or pore complex. A constriction may be used to limit the passage of molecules through the pore. The size of the constriction is often a key factor in determining the suitability of a pore or pore complex for analyte characterization. If the constriction is too small, the molecule to be characterized will not pass. However, to achieve maximum effect on ion flow through the channel, each constriction should not be too large. For example, each constriction should not be wider than the solvent-accessible transverse diameter of the target analyte. Ideally, the diameter of each constriction should be as close as possible to the transverse diameter of the analyte passing through it.
The number of constrictions in the protein pore complexes described in this disclosure may vary. In some embodiments, the protein pore complex comprises at least 1, 2, 3, 4, 5, or more constrictions. In some embodiments, the protein pore complex comprises 2 or 3 constrictions. In some embodiments, the protein pore complex comprises 2 constrictions. In some embodiments, the first constriction is formed by a first helper protein and the second constriction is formed by a second helper protein. In some embodiments, the first constriction is formed by a portion of the CsgG nanopore and the second constriction is formed by a helper protein or fusion protein. In some embodiments, the protein pore complex comprises 3 constrictions. In some embodiments, the first constriction is formed by a portion of the CsgG nanopore, the second constriction is formed by a first helper protein, and the third constriction is formed by a second helper protein. In some embodiments, the first constriction is formed by a portion of the CsgG nanopore, and the second constriction and the third constriction are formed by a fusion protein.
The narrowest point of the central lumen or bore typically forms a constriction in the continuous channel. In some embodiments, the constriction diameter is calculated by measuring the distance between the α-carbons (Cα) of the amino acid residues that extend furthest into the lumen of the nanopore to form the constriction. In some embodiments, the constriction diameter is calculated by measuring the distance between the van der Waals radii of the atoms that extend furthest into the lumen of the nanopore to form the constriction. In some embodiments, the minimum diameter of the constriction (e.g., a constriction formed by a portion of the CsgG protein, a constriction formed by the accessory protein, a constriction formed by the fusion protein, etc.) is in the range of about 0.5 nm to about 4.0 nm (e.g., as measured by the distance between van der Waals radii). In some embodiments, the minimum diameter of the constriction is in the range of about 0.5 nm to about 3.0 nm, or about 0.5 nm to about 2.0 nm, preferably from about 0.7 nm to about 1.8 nm, about 0.8 nm to about 1.7 nm, about 0.9 nm to about 1.6 nm, or about 1.0 nm to about 1.5 nm, such as about 1.1, 1.2, 1.3, or 1.4 nm. In some embodiments, the minimum diameter of the constriction is in the range of about [value missing] to about [value missing], for example [value missing] or [value missing] (e.g., as measured from Cα to Cα). In some embodiments, the minimum diameter of the constriction is in the range of about [value missing] to about [value missing] (e.g., as measured from Cα to Cα). In some embodiments, the minimum diameter of the constriction is in the range of about [value missing] to about [value missing] (e.g., as measured from Cα to Cα).
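As an illustration of the Cα-based measurement, the sketch below estimates a constriction diameter from the Cα coordinates of the constriction-forming residue in each subunit. It rests on several assumptions that are not taken from the disclosure: the pore axis is taken to be parallel to z and to pass through the centroid of the Cα atoms, the diameter is approximated as twice the mean radial distance (which avoids needing an exactly opposite atom in a nine-fold ring), and the coordinates are a hypothetical ideal ring in nanometers rather than data from a real structure.

```python
import math


def constriction_diameter(ca_coords):
    """Estimate the constriction diameter as twice the mean distance of the
    C-alpha atoms from the pore axis (assumed parallel to z, through the
    centroid of the supplied atoms)."""
    n = len(ca_coords)
    if n < 2:
        raise ValueError("need at least two C-alpha positions")
    cx = sum(x for x, _, _ in ca_coords) / n
    cy = sum(y for _, y, _ in ca_coords) / n
    radii = [math.hypot(x - cx, y - cy) for x, y, _ in ca_coords]
    return 2.0 * sum(radii) / n


def ideal_ring(n_subunits, radius, z=0.0):
    """Hypothetical test data: n points evenly spaced on a circle."""
    return [
        (radius * math.cos(2 * math.pi * k / n_subunits),
         radius * math.sin(2 * math.pi * k / n_subunits),
         z)
        for k in range(n_subunits)
    ]


# Nine-fold ring with a 0.6 nm radius (illustrative value only).
diameter_nm = constriction_diameter(ideal_ring(9, 0.6))
print(f"estimated constriction diameter: {diameter_nm:.2f} nm")
```

A van der Waals based diameter would be smaller than the Cα-to-Cα value, because side-chain atoms and their radii project further into the lumen than the backbone does.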
The distance between the constrictions in the lumen of the protein pore complex may vary. In some embodiments, the distance between the first and second constriction regions is in the range of about [value missing] to about [value missing]. In some embodiments, the distance between the first and second constriction regions is about [value missing] or [value missing] in length. In some embodiments, the distance between the first and second constriction regions is greater in length than [value missing] (e.g., [value missing], etc.).
In some embodiments, the distance between the second and third constriction regions is in the range of about [value missing] to about [value missing]. In some embodiments, the distance between the first and second constriction regions is about [value missing] or [value missing] in length. In some embodiments, the distance between the second and third constriction regions is greater in length than [value missing] (e.g., [value missing], etc.).
In some embodiments, the distance between the first and third constriction regions is in the range of about [value missing] to about [value missing]. In some embodiments, the distance between the first and second constriction regions is about [value missing] or [value missing] in length. In some embodiments, the distance between the first and third constriction regions is greater in length than [value missing] (e.g., [value missing], etc.).
In some embodiments, the helper protein (or fusion protein) may be modified from its native state to provide a constriction with a desired minimum diameter. For example, the helper protein may be modified, such as by introducing one or more large residues via targeted mutation, to produce a constriction having a minimum diameter within the ranges described above. In one embodiment, the maximum height of the helper protein is about 3 nm to about 20 nm, such as about 4 nm to about 10 nm. In one embodiment, the channel in the accessory protein is about 3 nm to about 20 nm in length, such as about 4 nm to about 10 nm. Height is the dimension of the accessory protein in the direction perpendicular to the membrane.
In some embodiments, the helper protein (e.g., the first helper protein or the second helper protein) or the fusion protein (e.g., the first portion of the fusion protein or the second portion of the fusion protein) extends outside the lumen of the protein pore complex. The helper protein or fusion protein may extend outside of the cis or trans side of the protein pore complex lumen (e.g., when the protein pore complex is inserted into a membrane). In some embodiments, the distance that the helper protein or fusion protein extends outside the lumen of the protein pore complex is calculated by measuring the distance between the Cα of an amino acid residue of the helper protein or fusion protein and the Cα of a reference amino acid (e.g., amino acid residue Phe144 or Tyr196 of a wild-type CsgG monomer) of the protein pore (e.g., CsgG pore). In some embodiments, the helper protein or fusion protein extends between about [value missing] and about [value missing] outside the lumen. In some embodiments, the helper protein or fusion protein extends between about [value missing] and about [value missing] outside the lumen. In some embodiments, the helper protein or fusion protein extends between about [value missing] and about [value missing] outside the lumen. In some embodiments, the helper protein or fusion protein extends about [value missing] or about [value missing] outside the lumen.
The distance between the first and second constrictions of the protein pore complex generally affects the axial length of the protein pore complex. In some embodiments, the axial length of the protein pore complex refers to the distance between the top of the lumen of the protein pore complex and the bottom of the lumen of the protein pore complex. In some embodiments, the protein pore complex has an axial length greater than [value missing]. In some embodiments, the axial length of the protein pore complex (e.g., a protein pore complex comprising one or more accessory proteins or one or more fusion proteins) is in the range of about [value missing] to about [value missing], for example [value missing] or [value missing].
In some embodiments, the helper protein or fusion protein comprises one or more positively charged amino acids, such as arginine, lysine, or histidine, or one or more aromatic amino acids, such as tyrosine or tryptophan, located at or near (e.g., within about 1, 2, 3, 4, or 5 nm of) the constriction formed by the helper protein or fusion protein. In some embodiments, the helper protein or fusion protein comprises one or more polar amino acids, negatively charged amino acids, or hydrophobic amino acids located at or near (e.g., within about 1, 2, 3, 4, or 5 nm of) the constriction formed by the helper protein or fusion protein. In some embodiments, the one or more amino acids located at or near (e.g., within about 1, 2, 3, 4, or 5 nm of) the constriction formed by the helper protein or fusion protein are asparagine, threonine, serine, or glutamic acid. These amino acids generally facilitate interactions between the pore and the polynucleotide.
The location of the one or more accessory proteins (or one or more fusion proteins) of the protein pore complex may vary. In some embodiments, the helper protein (or fusion protein) is located entirely within the lumen of the protein pore complex. In some embodiments, the helper protein or fusion protein comprises a portion that extends beyond the lumen of the protein pore complex, e.g., extends above the lumen of the protein pore complex (e.g., extends above the cap region on the cis side of the protein pore complex) and/or extends below the protein pore complex (e.g., extends below the transmembrane domain (e.g., the barrel) on the trans side of the protein pore complex). In some embodiments, a helper protein or fusion protein (or a portion of a helper protein or fusion protein, such as the first portion or the second portion) is attached to the nanopore (e.g., a CsgG nanopore). In some embodiments, the helper protein or fusion protein (or portion thereof) is covalently attached to the nanopore. In some embodiments, the helper protein or fusion protein (or portion thereof) is non-covalently attached to the nanopore. In some embodiments, the first and second helper proteins are attached (e.g., covalently attached, non-covalently attached, etc.) to each other. In some embodiments, the first portion of the fusion protein and the second portion of the fusion protein are attached to each other (e.g., covalently attached, non-covalently attached, etc.). In some embodiments, the helper protein or fusion protein (or a portion of the helper protein or fusion protein, such as the first portion or the second portion) is not attached to the nanopore (e.g., a CsgG nanopore). In some embodiments, the first and second helper proteins are not attached to each other.
In some embodiments, the helper protein (e.g., first helper protein) is not CsgF or CsgF peptide, or a functional homolog, fragment, or modified form thereof. In some embodiments, the portion (e.g., the first portion and/or the second portion) of the fusion protein is not CsgF or CsgF peptide, or a functional homolog, fragment, or modified form thereof. In some embodiments, the helper protein is not a CsgG nanopore or a homolog, fragment, or modified form thereof. In some embodiments, a portion (e.g., the first portion and/or the second portion) of the fusion protein is not a CsgG nanopore or a homolog, fragment, or modified form thereof.
In some embodiments, the accessory protein is not a polynucleotide binding protein. In some embodiments, the helper protein is not a functional polynucleotide binding protein, e.g., the helper protein is not a polynucleotide binding protein having enzymatic activity. In some embodiments, the accessory protein may be a protein other than a nucleic acid processing enzyme, e.g., an accessory protein that is not a helicase or a polymerase, or a protein derived from such an enzyme. In some embodiments, the accessory protein has no enzymatic activity. In some embodiments, the helper protein does not undergo a conformational change as the target analyte passes through the continuous channel formed in the protein pore complex.
In some embodiments, the helper protein or fusion protein (e.g., part of the fusion protein) is a component of the nanopore system, or a modified component of such a system, other than the component that forms the transmembrane pore. An example of such a component is CsgF or a truncated version of CsgF. In some embodiments, the helper protein or fusion protein comprises a CsgF protein or a homolog or modified form thereof, such as a fragment. In some embodiments, the pore complex comprises a CsgF protein or peptide, or a homolog or modified form thereof such as a fragment, and a non-CsgG pore.
The term "CsgF protein" or "CsgF peptide" preferably defines a CsgF peptide that has been truncated from its C-terminus (i.e., an N-terminal fragment). The CsgF peptide may be a fragment of wild-type E. coli CsgF (e.g., as shown in FIG. 3A), or a fragment of a wild-type homolog of E. coli CsgF, such as, for example, a peptide comprising any of the amino acid sequences shown in WO 2019/002893 (incorporated herein by reference in its entirety). A CsgF homolog refers to a polypeptide having at least 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or 99% complete sequence identity with wild-type E. coli CsgF. A CsgF homolog can also be defined as a polypeptide containing the PFAM domain PF10614, which is characteristic of CsgF-like proteins. A list of the currently known CsgF homologs and CsgF constructs can be found at http://pfam.xfam.org/family/PF10614. Mature CsgF (e.g., as shown in FIG. 3A) can be divided into three main regions: the "CsgF constriction peptide" (FCP), the "neck" region, and the "head" region. The "head" region of the CsgF peptide is distinct from the constriction of the pore as described herein. The "head" region of the CsgF peptide may also be referred to as the "C-terminal head domain". The structure of CsgF is discussed in detail in WO 2019/002893 (incorporated herein by reference in its entirety).
In some embodiments, the CsgF peptide is a truncated CsgF peptide that lacks the C-terminal head, lacks the C-terminal head and a portion of the neck domain of CsgF (e.g., the truncated CsgF peptide may comprise only a portion of the neck domain of CsgF), or lacks the C-terminal head and neck domain of CsgF. The CsgF peptide may lack a portion of the CsgF neck domain, e.g., the CsgF peptide may comprise a portion of the neck domain starting from amino acid residue 36 at the N-terminus of the neck domain (e.g., residues 36-40, 36-41, 36-42, 36-43, 36-45, 36-46, up to residues 36-50 or 36-60 of wild-type E. coli CsgF). In some embodiments, the CsgF peptide comprises a CsgG binding region and a region that forms a constriction in the lumen of the pore. The CsgG binding region typically comprises residues 1 to 11 and/or 29 to 32 of the CsgF protein (e.g., wild-type E. coli CsgF or a homolog from another species) and may include one or more modifications. The region that forms the constriction in the pore typically comprises residues 9 to 28 of the CsgF protein (e.g., wild-type E. coli CsgF or a homolog from another species) and may include one or more modifications. In some embodiments, residues 9 to 17 comprise the conserved motif N9PXFGGXXX17 and form a turn region. In some embodiments, residues 9 to 28 form an α-helix. In some embodiments, the amino acid residue at position 17 of the CsgF peptide forms the apex of the constriction region, corresponding to the narrowest portion of the CsgF constriction in the pore. In some embodiments, the CsgF constriction region is also in stable contact with the CsgG β-barrel, predominantly at residues 8, 9, 11, 12, 18, 21, and 22 of the CsgF peptide. In some embodiments, the CsgF peptide comprises or consists of the amino acid sequence GTMTFQFRNPNFGGNPNNGAFLLNSAQAQN (SEQ ID NO: 60), which corresponds to amino acid residues 1-30 of wild-type E. coli CsgF. In some embodiments, the CsgF peptide is a first accessory protein.
In some embodiments, the CsgF peptide is part (e.g., the first portion or the second portion) of a fusion protein. In some embodiments, the CsgF peptide comprises or consists of amino acid residues 1-23 of wild-type E. coli CsgF. In some embodiments, the CsgF peptide comprises or consists of amino acid residues 1-24 of wild-type E. coli CsgF.
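The residue ranges above can be read directly off SEQ ID NO: 60, which the text gives as residues 1-30 of wild-type E. coli CsgF. A minimal sketch follows; the helper name `n_terminal_fragment` is hypothetical, and residue numbering is assumed to be 1-based and inclusive, as is conventional for such ranges.

```python
# Residues 1-30 of wild-type E. coli CsgF, as given in the text (SEQ ID NO: 60).
SEQ_ID_NO_60 = "GTMTFQFRNPNFGGNPNNGAFLLNSAQAQN"


def n_terminal_fragment(sequence: str, last_residue: int) -> str:
    """Return residues 1..last_residue (1-based, inclusive),
    i.e. the peptide truncated from its C-terminus."""
    if not 1 <= last_residue <= len(sequence):
        raise ValueError("last_residue is outside the sequence")
    return sequence[:last_residue]


frag_1_23 = n_terminal_fragment(SEQ_ID_NO_60, 23)
frag_1_24 = n_terminal_fragment(SEQ_ID_NO_60, 24)

print("residues 1-23:", frag_1_23)
print("residues 1-24:", frag_1_24)
```

Fragments longer than 30 residues cannot be derived this way, because the sequence beyond residue 30 is not reproduced in this section.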
In some embodiments, the CsgF peptide has a length of 28 to 60 amino acids, such as 29 to 49, 30 to 45, or 32 to 40 amino acids. In some embodiments, the CsgF peptide comprises 29 to 35 amino acids or 29 to 45 amino acids. In some embodiments, the CsgF peptide comprises 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, or 60 amino acids. In some embodiments, the CsgF peptide comprises all or part of the FCP, which corresponds to residues 1 to 35 of wild-type E. coli CsgF (or the corresponding residues in CsgF homologs). In some embodiments, where the CsgF peptide is shorter than the FCP, truncation is preferably performed at the C-terminus.
In the CsgF peptide, one or more residues may be modified. For example, the CsgF peptide may comprise modifications at positions corresponding to one or more of the positions G1, M3, T4, F5, R8, N9, N11, F12, N17, A20, N24, A26, and Q29 of SEQ ID NO: 60. In some embodiments, the CsgF peptide is modified to introduce one or more cysteines, one or more hydrophobic amino acids, one or more charged amino acids, one or more unnatural amino acids, one or more polar amino acids, or one or more photoreactive amino acids, e.g., at a position corresponding to one or more of the following positions in SEQ ID NO: 60: G1, T4, F5, R8, N9, N11, F12, N17, A20, N24, A26, Q27, and Q29. Any number and combination of such introductions may be made. The introductions are preferably made by substitution.
In some embodiments, the CsgF peptide comprises modifications at positions corresponding to one or more of the following positions in SEQ ID NO: 60: N15, N17, A20, N24, and A28. In some embodiments, the CsgF peptide comprises one or more of the following substitutions: N15S/A/T/Q/G/L/V/I/F/Y/W/R/K/D/C/E; N17S/A/T/Q/G/L/V/I/F/Y/W/R/K/D/C/E; A20S/T/Q/N/G/L/V/I/F/Y/W/R/K/D/C/E; N24S/T/Q/A/G/L/V/I/F/Y/W/R/K/D/C/E; or A28S/T/Q/N/G/L/V/I/F/Y/W/R/K/D/C/E.
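The substitution labels above follow the usual convention of wild-type residue, 1-based position, then replacement residue. The sketch below applies one such label to SEQ ID NO: 60 and checks that the named wild-type residue actually occupies that position. The function name `apply_substitution` is hypothetical, and only single-residue labels such as N17S are handled; the slash-separated shorthand would need to be expanded first.

```python
import re

# Residues 1-30 of wild-type E. coli CsgF, as given in the text (SEQ ID NO: 60).
SEQ_ID_NO_60 = "GTMTFQFRNPNFGGNPNNGAFLLNSAQAQN"


def apply_substitution(sequence: str, label: str) -> str:
    """Apply a point substitution written as <wild-type><position><new>,
    e.g. 'N17S', using 1-based positions."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognised substitution label: {label!r}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, "
            f"found {sequence[position - 1]}"
        )
    return sequence[: position - 1] + replacement + sequence[position:]


variant_n17s = apply_substitution(SEQ_ID_NO_60, "N17S")
print(variant_n17s)

# Each position named in the text carries the stated wild-type residue.
for label in ("N15S", "N17S", "A20S", "N24S", "A28S"):
    apply_substitution(SEQ_ID_NO_60, label)
```

Validating the wild-type residue catches off-by-one numbering mistakes, which are easy to make when a peptide is a fragment of a longer protein.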
In some embodiments, the CsgF peptide is preferably a variant of any of the CsgF sequences discussed above (including SEQ ID NO: 60), which contains one or more modifications as compared to the comparison sequence. The variant will preferably be at least 40% homologous, based on amino acid identity, to the amino acid sequence of SEQ ID NO: 60 over the entire length of this sequence. More preferably, the variant may be at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, and more preferably at least 95%, 97%, or 99% homologous, based on amino acid identity, to the amino acid sequence of SEQ ID NO: 60 over the entire sequence. The variant will preferably be at least 40% identical to the amino acid sequence of SEQ ID NO: 60 over the entire length of the sequence. More preferably, the variant may be at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, and more preferably at least 95%, 97%, or 99% identical to that sequence over its entire length. There may be at least 80%, for example at least 85%, 90%, or 95%, amino acid identity over a stretch of 15 or more, for example 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more, consecutive amino acids ("hard homology"). These homology/identity levels apply equally to any of the other CsgF peptides described above.
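For orientation, the sketch below computes percent identity over the full length of SEQ ID NO: 60 for a variant carrying a single substitution. It assumes the two sequences are already aligned, of equal length, and free of gaps; sequences of different lengths would first need a pairwise alignment, which is not shown. The function name `percent_identity` is hypothetical.

```python
# Residues 1-30 of wild-type E. coli CsgF, as given in the text (SEQ ID NO: 60).
SEQ_ID_NO_60 = "GTMTFQFRNPNFGGNPNNGAFLLNSAQAQN"


def percent_identity(reference: str, variant: str) -> float:
    """Percent of positions with identical residues.
    Assumes pre-aligned, equal-length, ungapped sequences."""
    if len(reference) != len(variant) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(reference, variant))
    return 100.0 * matches / len(reference)


# Single substitution at position 17 (N to S), one of the options listed in the text.
variant = SEQ_ID_NO_60[:16] + "S" + SEQ_ID_NO_60[17:]
identity = percent_identity(SEQ_ID_NO_60, variant)
print(f"identity to SEQ ID NO: 60: {identity:.1f}%")
```

Because the peptide has 30 residues, each substitution lowers identity by one thirtieth, so a single substitution remains above the 95% level and two substitutions fall below it.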
Any number of CsgF peptides in a pore or pore complex, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10, may contain one or more substitutions as compared to SEQ ID NO: 60. In some embodiments, all six to ten monomers in a pore or pore complex preferably contain one or more substitutions as compared to SEQ ID NO: 60. The CsgF peptides in the pore complex may be the same or different. The CsgF peptide is preferably the same in each pore monomer conjugate in the pore complexes of the present disclosure.
Aspects of the disclosure relate to helper proteins or fusion proteins comprising one or more alpha helices. In some embodiments, such proteins may be referred to as "helix-forming proteins". The present disclosure is based, in part, on the recognition that a helix-forming protein can be located in the lumen of certain nanopores (e.g., CsgG nanopores) to form one or more constrictions in the lumen of the nanopore, and that the presence of such one or more constrictions improves the signal-to-noise ratio (e.g., discrimination of polynucleotide bases) of the resulting protein pore complex. The term "helix" or "helical" generally refers to a coiled arrangement of a protein that results from the formation of hydrogen bonds between the backbones of non-contiguous amino acid residues in a repeating pattern. In some embodiments, the helices are alpha helices (also known as 3.6₁₃ helices), in which each turn comprises about 3.6 amino acid residues and 13 atoms participate in the ring closed by each hydrogen bond. In some embodiments, the helix is a 3₁₀ helix, which contains about three residues per turn and has 10 atoms in the ring closed by each hydrogen bond.
The number of helices (e.g., alpha helices, 3₁₀ helices, pi helices, etc.) in the helper protein or fusion protein may vary. In some embodiments, the number of helices in the helper protein (e.g., first helper protein, second helper protein, etc.) is in the range of about 0 to about 15, e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15. In some embodiments, the number of helices in the helper protein (e.g., first helper protein, second helper protein, etc.) is greater than 15 (e.g., 20, 25, etc.). In some embodiments, the fusion protein (e.g., first portion of the fusion protein, second portion of the fusion protein, etc.) comprises between 0 and about 15 helices (e.g., alpha helices, 3₁₀ helices, pi helices, etc.), such as 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 helices.
The number of turns in the helix (e.g., alpha helix, 3₁₀ helix, pi helix, etc.) may vary. In some embodiments, each helix (e.g., alpha helix, 3₁₀ helix, pi helix, etc.) of the helper protein or fusion protein comprises about 0 to about 15 turns, such as 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 turns. The helix (e.g., alpha helix, 3₁₀ helix, pi helix, etc.) may include one or more half turns, e.g., 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5, 13.5, 14.5, etc. turns.
The number of amino acids forming the helix (e.g., alpha helix, 3₁₀ helix, pi helix, etc.) may vary. In some embodiments, each helix (e.g., alpha helix, 3₁₀ helix, pi helix, etc.) of the helper protein or fusion protein comprises between 2 and 55 amino acid residues, e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, or 55 amino acid residues.
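The residue counts and turn counts above are linked by the residues-per-turn figure for each helix type. The sketch below converts a residue count into an approximate number of turns, using about 3.6 residues per turn for an alpha helix and about three for a 3₁₀ helix. It also estimates the axial length of an ideal alpha helix using a rise of 0.15 nm per residue, which is a standard textbook value and is not stated in this disclosure. All names are hypothetical.

```python
# Approximate residues per turn for each helix type.
RESIDUES_PER_TURN = {"alpha": 3.6, "3_10": 3.0}

# Assumption: ideal alpha-helix rise per residue (textbook value, not from the disclosure).
ALPHA_RISE_NM = 0.15


def helix_turns(n_residues: int, kind: str = "alpha") -> float:
    """Approximate number of turns formed by n_residues in the given helix type."""
    if n_residues < 0:
        raise ValueError("n_residues must be non-negative")
    return n_residues / RESIDUES_PER_TURN[kind]


def alpha_helix_length_nm(n_residues: int) -> float:
    """Approximate axial length of an ideal alpha helix, in nanometers."""
    if n_residues < 0:
        raise ValueError("n_residues must be non-negative")
    return n_residues * ALPHA_RISE_NM


for n in (18, 36, 54):
    print(f"{n} residues: about {helix_turns(n):.1f} alpha-helical turns, "
          f"about {alpha_helix_length_nm(n):.1f} nm long")
```

Real helices deviate from these ideal values, so the output is a rough guide to scale rather than a structural prediction.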
The helix angle of the helper protein or fusion protein may vary. In some embodiments, the helix includes Phi angles in the range of about -45° to -90° (e.g., -45°, -46°, -47°, -48°, -49°, -50°, -51°, -52°, -53°, -54°, -55°, -56°, -57°, -58°, -59°, -60°, -61°, -62°, -63°, -64°, -65°, -66°, -67°, -68°, -69°, -70°, -71°, -72°, -73°, -74°, -75°, -76°, -77°, -78°, -79°, -80°, -81°, -82°, -83°, -84°, -85°, -86°, -87°, -88°, -89°, or -90°). In some embodiments, the helix includes Psi angles in the range of about 0° to -70° (e.g., 0°, -1°, -2°, -3°, -4°, -5°, -6°, -7°, -8°, -9°, -10°, -11°, -12°, -13°, -14°, -15°, -16°, -17°, -18°, -19°, -20°, -21°, -22°, -23°, -24°, -25°, -26°, -27°, -28°, -29°, -30°, -31°, -32°, -33°, -34°, -35°, -36°, -37°, -38°, -39°, -40°, -41°, -42°, -43°, -44°, -45°, -46°, -47°, -48°, -49°, -50°, -51°, -52°, -53°, -54°, -55°, -56°, -57°, -58°, -59°, -60°, -61°, -62°, -63°, -64°, -65°, -66°, -67°, -68°, -69°, or -70°). In some embodiments, each of the helices comprises 1 to 20 amino acid residues having a Phi angle in the range of about -45° to -90° and a Psi angle in the range of about 0° to -70°. In some embodiments, each of the helices comprises 1 to 30 amino acid residues having a Phi angle in the range of about -45° to -90° and a Psi angle in the range of about 0° to -70°.
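The angle criteria above amount to a simple window test on each residue's backbone dihedral angles. The sketch below counts how many residues in a list of (Phi, Psi) pairs fall inside both ranges. The example angles are hypothetical values chosen for illustration, not measurements from any structure, and the function names are likewise hypothetical.

```python
# Ranges stated in the text, in degrees (lower bound, upper bound).
PHI_RANGE = (-90.0, -45.0)
PSI_RANGE = (-70.0, 0.0)


def is_helical(phi: float, psi: float) -> bool:
    """True if both dihedral angles fall within the stated helical ranges."""
    return (PHI_RANGE[0] <= phi <= PHI_RANGE[1]
            and PSI_RANGE[0] <= psi <= PSI_RANGE[1])


def count_helical(angles) -> int:
    """Count residues whose (phi, psi) pair satisfies is_helical."""
    return sum(is_helical(phi, psi) for phi, psi in angles)


# Hypothetical backbone angles for four residues.
example_angles = [(-57.0, -47.0), (-60.0, -45.0), (-120.0, 130.0), (-49.0, -26.0)]
n_helical = count_helical(example_angles)
print(f"{n_helical} of {len(example_angles)} residues fall in the helical ranges")
```

The third pair lies outside both ranges and is excluded, so three of the four example residues are counted.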
In some embodiments, one or more helices of the helper protein or fusion protein include structural features that facilitate packing the helices together. "Packing" of helices generally refers to the close association of two or more helices with one another due to covalent or non-covalent interactions between the helices, e.g., salt bridges, hydrogen bonds, disulfide bonds, and close hydrophobic side-chain to side-chain contacts, side-chain to main-chain contacts, main-chain to main-chain contacts, and the like, as described in Walter and Argos, J Mol Biol. 1996 Jan 26;255(3):536-53. doi:10.1006/jmbi.1996.0044. Methods for predicting helix packing are known, for example, as described in Eilers et al., Proc Natl Acad Sci U S A. 2000 May 23;97(11):5796-5801.
Aspects of the present disclosure relate to the recognition that cyclized fusion proteins improve the discrimination of target analytes in protein pore complexes. A "cyclized" protein generally refers to a protein (e.g., a fusion protein) that comprises one or more intramolecular interactions that result in the formation of one or more cyclization bonds. Examples of cyclization include side-chain-to-side-chain cyclization (e.g., intramolecular disulfide bond formation), head-to-tail cyclization (e.g., formation of an amide bond between the N- and C-terminal amino acids of a protein), tail-to-side-chain cyclization, and head-to-side-chain cyclization, e.g., as described in Hayes et al., Org Biomol Chem. 2021 May 12; 19(18): 3983-4001. In some embodiments, the fusion protein comprises one or more side-chain-to-side-chain cyclization bonds. In some embodiments, at least one of the side-chain-to-side-chain cyclization bonds is a disulfide bond. In some embodiments, one or more cyclization bonds result in cyclization between the first portion of the fusion protein and the second portion of the fusion protein (e.g., cyclization between the CsgF peptide and the helix-forming protein). In some embodiments, the helper protein or fusion protein comprises a loop region (e.g., a linker forming a loop region) comprising one or more cyclization bonds. In some embodiments, the cyclization bond is formed by a chemical crosslinker and/or includes a disulfide bond.
CsgG nanopore
Aspects of the disclosure relate to protein pore complexes. In some embodiments, the protein pore complexes described in the present disclosure comprise nanopores (e.g., CsgG nanopores). A nanopore is a hole or channel through a membrane that allows hydrated ions driven by an applied potential to flow through or within the membrane.
In some embodiments, the nanopore is a transmembrane protein pore. Transmembrane protein pores typically span the entire membrane and may have structures that extend beyond the membrane on one or both sides. Transmembrane protein pores are monomeric or multimeric proteins that allow hydrated ions to flow from one side of the membrane to the other. A transmembrane protein pore comprises a channel that allows an analyte (e.g., a polynucleotide such as DNA or RNA) to move or be moved into and/or through the pore.
Transmembrane protein pores typically include a barrel or channel through which ions can flow. The subunits of the pore generally surround a central axis and contribute strands to a transmembrane β-barrel or channel, or to a transmembrane α-helical bundle or channel.
The barrel or channel of a transmembrane protein pore typically contains amino acids that facilitate interactions with the polynucleotide. These amino acids are preferably located near the constriction of the barrel or channel (such as within 1, 2, 3, 4 or 5 nm). Transmembrane protein pores typically contain one or more polar or hydrophobic residues. These amino acids generally facilitate interactions between the pore and a nucleotide, polynucleotide or nucleic acid.
In some embodiments, the nanopore is a CsgG pore, such as, for example, CsgG from Escherichia coli strain K-12 substrain MC4100 or a homolog or mutant thereof. The mutant CsgG pore may comprise one or more mutant monomers. The CsgG pores may be homo-oligomers comprising identical monomers or hetero-oligomers comprising two or more different monomers. Suitable pores derived from CsgG are disclosed in WO 2016/034591, WO 2017/149418, WO 2017/149197, WO 2017/149318, International Patent Application Nos. PCT/GB2018/051191 and PCT/GB2018/051858, and Chinese Patent Publication Nos. CN113773373, CN113896776, CN113912683 and CN113754743, each of which is incorporated herein by reference in its entirety. Other examples of CsgG pores include, but are not limited to, UniProt reference numbers K4KIX, A0A086D1N6, A0A1I1MNE8, A0A143HJG2, A0A090RS48, and A0A090SZM0.
The CsgG pores typically contain one or more CsgG monomers. A CsgG pore monomer is a monomer capable of forming a CsgG pore. Such monomers are known in the art, in particular from WO 2019/002893 (incorporated herein by reference in its entirety). The CsgG pore preferably comprises one or more of (a) a cap region, (b) a constriction region, and (c) a transmembrane β-barrel region, such as (a), (b), (c), (a) and (b), (a) and (c), (b) and (c), or (a), (b) and (c). The CsgG pore monomer preferably comprises one or more of (a) a cap-forming region, (b) a constriction-forming region, and (c) a transmembrane β-barrel-forming region, such as (a), (b), (c), (a) and (b), (a) and (c), (b) and (c), or (a), (b) and (c). The CsgG pore formed from the monomer may have any structure, but preferably has or comprises the structure of a wild-type E. coli CsgG pore (e.g., as described in PDB accession No. 4UV3). The protein structure of CsgG defines a channel or pore that allows molecules and ions to translocate from one side of the membrane to the other.
The CsgG pore may be of any size, but preferably has the size of a wild-type E. coli CsgG pore (e.g., as described in PDB accession No. 4UV3). These dimensions are shown in fig. 19. In some embodiments, the CsgG pore has an outer diameter at its widest point of about 100 to about […], such as about 110 to about […] or about 115 to about […]. In some embodiments, the CsgG pore has an outer diameter at its widest point of about […]. In some embodiments, the CsgG pore has a total length of about 80 to about […], such as about 90 to about […] or about 95 to about […]. In some embodiments, the CsgG pore has a total length of about […]. References to "total length" and "length" refer to the length of a pore or pore region when viewed from the side (see, e.g., a cis-to-trans cross-section of a pore inserted into a membrane). This may be the side view in fig. 19. In some embodiments, the outer diameter is measured by calculating the Cα-to-Cα distance between the furthest-apart amino acid residues on the outside of the CsgG pore. In some embodiments, the outer diameter is measured by calculating the distance between the van der Waals radii of the furthest-apart amino acid residues on the outside of the CsgG pore.
In some embodiments, the cap region has a length of about 20 to about […], such as about 30 to about […] or about 35 to about […]. In some embodiments, the cap region has a length of about […]. In some embodiments, the channel defined by the cap region has an opening with a diameter of about 30 to about […], such as about 40 to about […] or about 45 to about […]. In some embodiments, the channel defined by the cap region has an opening with a diameter of about […]. In some embodiments, the diameter of the channel defined by the cap region at its narrowest point is from about 20 to about […], such as about 30 to about […] or about 32 to about […]. In some embodiments, the diameter of the channel defined by the cap region at its narrowest point is preferably about […]. In some embodiments, this diameter is measured by calculating the Cα-to-Cα distance between the amino acid residues closest together on the channel of the cap region of the CsgG pore. In some embodiments, this diameter is measured by calculating the distance between the van der Waals radii of the amino acid residues closest together on the channel of the cap region.
In some embodiments, the constriction region formed by the CsgG pore (when present) has a length of about 5 to about […], such as about 10 to about […] or about 15 to about […]. In some embodiments, the constriction region has a length of about […]. In some embodiments, the diameter of the channel defined by the constriction is from about 2 to about […], such as a diameter at its narrowest point of about 5 to about […], about 8 to about […], or about 10 to about […]. In some embodiments, the diameter of the channel defined by the constriction is about […]. In some embodiments, the diameter of the channel defined by the constriction is about […]. In some embodiments, the diameter of the constriction is from about 2 to about […], such as about 5 to about […], about 8 to about […], or about 10 to about […]. In some embodiments, the diameter of the constriction is about […]. In some embodiments, the constriction region of a CsgG pore is measured by calculating the Cα-to-Cα distance between the amino acid residues that extend furthest into the lumen of the pore and form the constriction. In some embodiments, this diameter is measured by calculating the distance between the van der Waals radii of the amino acid residues that extend furthest into the lumen of the pore and form the constriction.
In some embodiments, the transmembrane β-barrel region has a length of about 20 to about […], such as about 30 to about […] or about 35 to about […]. In some embodiments, the transmembrane β-barrel region has a length of about […]. In some embodiments, the diameter of the channel defined by the transmembrane β-barrel region at its narrowest point is from about 20 to about […], such as about 30 to about […] or about 35 to about […]. In some embodiments, the diameter of the channel defined by the transmembrane β-barrel region at its narrowest point is about […].
All of the above measurements are based on backbone-to-backbone measurements of amino acids forming the different regions (as shown in fig. 19).
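As a non-limiting illustration of a backbone-based measurement, a diameter may be estimated as the largest Cα-to-Cα distance within a chosen set of residues (e.g., the outermost residues of the pore, or the ring of residues forming a constriction). The sketch below assumes that Cα coordinates have already been extracted from a structure file; the function name and the coordinates are hypothetical and do not correspond to PDB entry 4UV3.

```python
import math
from itertools import combinations


def max_ca_distance(ca_coords):
    """Largest pairwise distance between C-alpha coordinates.

    ca_coords: sequence of (x, y, z) tuples, one per residue, in angstroms.
    """
    if len(ca_coords) < 2:
        raise ValueError("at least two coordinates are required")
    return max(math.dist(a, b) for a, b in combinations(ca_coords, 2))


# Hypothetical ring of four C-alpha atoms lying 5 angstroms from the pore axis.
toy_ring = [(5.0, 0.0, 0.0), (-5.0, 0.0, 0.0), (0.0, 5.0, 0.0), (0.0, -5.0, 0.0)]
toy_diameter = max_ca_distance(toy_ring)
```

A van der Waals based measurement would additionally account for the atomic radii of the residues concerned, which this sketch does not attempt.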
SEQ ID NO. 59 shows the sequence of the wild-type E.coli CsgG as mature protein. Residues 1 to 41 of SEQ ID NO. 59 form a cap region. Residues 64 to 131 of SEQ ID NO. 59 form a constriction region. Residues 156 to 180 and 212 to 262 of SEQ ID NO. 59 form a transmembrane beta barrel region.
In some embodiments, the CsgG pore monomer is a variant of SEQ ID NO: 59 in that it has a cysteine at a position corresponding to position 153 or 133 of SEQ ID NO: 59. In some embodiments, the variant CsgG monomer may also be referred to as a modified CsgG pore monomer or a mutant CsgG pore monomer. Modifications or mutations in the variants include, but are not limited to, any one or more modifications disclosed herein or combinations of such modifications. The CsgG pore monomer may be a CsgG homolog monomer. A CsgG homolog monomer is a polypeptide having at least 40%, 50%, 60%, 70%, 80%, 85%, 90%, 95% or 99% complete sequence identity with wild-type E. coli CsgG as set forth in SEQ ID NO: 59. A CsgG homolog is also referred to as a polypeptide containing the PFAM domain PF03783, which is characteristic of CsgG-like proteins. A list of CsgG homologs and CsgG architectures known to date can be found at http://pfam.xfam.org//family/PF03783.
In some embodiments, the CsgG pore monomer is a variant of SEQ ID NO:59 that includes one or more modifications in addition to a cysteine at a position corresponding to position 153 or 133 in SEQ ID NO: 59. The variant will preferably be at least 40% homologous to the amino acid sequence of SEQ ID NO. 59, based on amino acid identity, over the entire length of the sequence. More preferably, the variant may be at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% homologous based on amino acid identity over the entire sequence with the amino acid sequence of SEQ ID NO 59. The variant will preferably be at least 40% identical to the sequence over the entire length of the amino acid sequence of SEQ ID NO. 59. More preferably, the variant may be at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% identical over the whole sequence to SEQ ID NO 59.
Sequence identity may also relate to fragments or portions of CsgG pore monomers. Thus, a sequence may have less than 40% overall sequence homology/identity to SEQ ID NO: 59, but the sequence of a particular region, domain or subunit may share at least 80%, 90% or up to 99% sequence homology/identity with the corresponding region of SEQ ID NO: 59. At least 80%, for example at least 85%, 90% or 95% amino acid identity ("hard homology") may be present over a stretch of 100 or more, for example 125, 150, 175 or 200 or more consecutive amino acids. In some embodiments, the CsgG pore monomer is preferably a variant of SEQ ID NO: 59 that comprises a sequence that is at least 40% homologous to the cap region (residues 1 to 41) of SEQ ID NO: 59. More preferably, the variant may comprise a sequence that is at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% homologous based on amino acid identity to residues 1 to 41 of SEQ ID NO: 59. In some embodiments, the variant comprises a sequence that is at least 40% identical to residues 1 to 41 of SEQ ID NO: 59. In some embodiments, the variant comprises a sequence that is at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, and more preferably at least 95%, 97%, or 99% identical to residues 1 to 41 of SEQ ID NO: 59.
In some embodiments, the CsgG pore monomer is a variant of SEQ ID NO: 59 comprising a sequence that is at least 40% homologous to the constriction region (residues 64 to 131) of SEQ ID NO: 59. In some embodiments, the variant comprises a sequence that is at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, and more preferably at least 95%, 97%, or 99% homologous based on amino acid identity to residues 64 to 131 of SEQ ID NO: 59. In some embodiments, the variant comprises a sequence that is at least 40% identical to residues 64 to 131 of SEQ ID NO: 59. In some embodiments, the variant comprises a sequence that is at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, and more preferably at least 95%, 97%, or 99% identical to residues 64 to 131 of SEQ ID NO: 59.
In some embodiments, the CsgG pore monomer is a variant of SEQ ID NO: 59 comprising a sequence that is at least 40% homologous to the transmembrane β-barrel region (residues 156-180 and 212-262) of SEQ ID NO: 59. In some embodiments, the variant comprises a sequence that is at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% homologous based on amino acid identity to residues 156-180 and 212-262 of SEQ ID NO: 59. In some embodiments, the variant comprises a sequence that is at least 40% identical to residues 156-180 and 212-262 of SEQ ID NO: 59. In some embodiments, the variant comprises a sequence that is at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, and more preferably at least 95%, 97%, or 99% identical to residues 156-180 and 212-262 of SEQ ID NO: 59.
CsgG pore monomers are highly conserved (as can be readily appreciated from figures 45 to 47 of WO 2017/149193). Furthermore, from knowledge of the mutation associated with SEQ ID NO:59, the equivalent position of the mutation of the CsgG pore monomer other than SEQ ID NO:59 can be determined.
Thus, reference to a mutant CsgG pore monomer comprising a variant of the sequence as set forth in SEQ ID NO: 59 and specific amino acid mutations thereof, as set forth in the claims and elsewhere in the specification, also encompasses a mutant CsgG pore monomer comprising a variant of any of the sequences set forth in SEQ ID NOs: 68 to 88 of WO 2019/002893 (incorporated herein by reference in its entirety) and the corresponding amino acid mutations thereof. The CsgG pore monomer may also be any of the sequences shown in CN 113773373A, CN 113896776A, CN 113912683A and CN 113754743A or a variant thereof.
Homology can be determined using standard methods in the art. For example, the UWGCG Package provides the BESTFIT program, which can be used to calculate homology, e.g., on its default settings (Devereux et al. (1984) Nucleic Acids Research, pages 387-395). The PILEUP and BLAST algorithms can be used to calculate homology or to align sequences, such as to identify equivalent residues or corresponding sequences (typically on their default settings), for example as described in Altschul, S.F. (1993) J Mol Evol 36:290-300; Altschul, S.F. et al. (1990) J Mol Biol 215:403-10. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/).
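For illustration, once two sequences have been aligned by a tool such as BESTFIT or BLAST, percent identity over the whole reference sequence or over a specific region of it may be computed along the following lines. This is a minimal sketch and not the algorithm used by any of those programs; it assumes a pre-computed pairwise alignment with "-" as the gap character, counts identity relative to the reference residues compared, and uses short hypothetical sequences rather than SEQ ID NO: 59.

```python
def percent_identity(ref_aln, var_aln, start=None, end=None):
    """Percent identity of a variant to a reference over aligned positions.

    ref_aln, var_aln: equal-length aligned strings, "-" marks a gap.
    start, end: optional 1-based inclusive residue numbers in the
    ungapped reference, restricting the comparison to one region.
    """
    if len(ref_aln) != len(var_aln):
        raise ValueError("aligned sequences must have equal length")
    position = 0   # residue number in the ungapped reference
    compared = 0
    matches = 0
    for ref_res, var_res in zip(ref_aln, var_aln):
        if ref_res == "-":
            continue  # insertion in the variant; no reference residue here
        position += 1
        if start is not None and position < start:
            continue
        if end is not None and position > end:
            break
        compared += 1
        if ref_res == var_res:
            matches += 1
    if compared == 0:
        raise ValueError("no reference residues in the requested region")
    return 100.0 * matches / compared


# Hypothetical aligned toy sequences (not taken from the sequence listing).
toy_ref = "MKTAYIAKQR"
toy_var = "MKTAHIA-QR"
overall = percent_identity(toy_ref, toy_var)           # whole reference
region_1_4 = percent_identity(toy_ref, toy_var, 1, 4)  # residues 1 to 4
region_5_8 = percent_identity(toy_ref, toy_var, 5, 8)  # residues 5 to 8
```

The same region arguments could be used with reference residue ranges such as 1 to 41 or 64 to 131, given a real alignment against the reference sequence.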
SEQ ID NO: 59 is a wild-type CsgG pore monomer from E. coli strain K-12 substrain MC4100. The variant of SEQ ID NO: 59 may comprise any substitution present in another CsgG homolog. Preferred CsgG homologs are shown in SEQ ID NOs: 68 to 88 of WO 2019/002893, which is incorporated herein by reference in its entirety. Variants may comprise, as compared to SEQ ID NO: 59, a combination of one or more of the substitutions present in SEQ ID NOs: 68 to 88 of WO 2019/002893 (incorporated herein by reference in its entirety).
The CsgG pore monomers in the pore monomer conjugates of the present disclosure generally retain the ability to form the same 3D structure as the wild-type CsgG pore monomer, such as the same 3D structure as the CsgG pore monomer having the sequence of SEQ ID NO: 59. The 3D structure of CsgG is known in the art and is disclosed, for example, in Goyal et al. (2014) Nature 516(7530): 250-3. In addition to the mutations described herein, any number of mutations can be made in the wild-type CsgG sequence, provided that the CsgG pore monomer retains the improved properties imparted to it by the mutation.
Amino acid substitutions, for example up to 1,2, 3, 4, 5, 10, 20 or 30 substitutions, may be made to the amino acid sequence of SEQ ID NO. 59 in addition to those discussed above. Conservative substitutions replace amino acids with other amino acids of similar chemical structure, similar chemical nature, or similar side-chain volume. The introduced amino acids may have similar polarity, hydrophilicity, hydrophobicity, basicity, acidity, neutrality, or charge as the amino acids they replace. Alternatively, conservative substitutions may introduce another amino acid that is aromatic or aliphatic in place of the pre-existing aromatic or aliphatic amino acid.
In some embodiments, CsgG pore monomers are modified to incorporate one or more cysteines, one or more hydrophobic amino acids, one or more charged amino acids, one or more unnatural amino acids, one or more polar amino acids, or one or more photoreactive amino acids. Any number and combination of such introductions may be made. The introductions are preferably made by substitution.
One or more amino acid residues of the amino acid sequence of SEQ ID NO. 59 may additionally be deleted from the above-mentioned polypeptide. Up to 1,2,3,4, 5, 10, 20 or 30 or more residues may be deleted.
Variants may include fragments of SEQ ID NO: 59. Such fragments retain pore-forming activity. Fragments may be at least 50, at least 100, at least 150, at least 200, or at least 250 amino acids in length. Such fragments can be used to create pores. The fragment preferably comprises the transmembrane domain of SEQ ID NO: 59, namely K135-Q153 and S183-S208.
One or more amino acids may alternatively or additionally be added to the polypeptides described above. An extension may be provided at the amino terminus or the carboxy terminus of the amino acid sequence of SEQ ID NO: 59 or of a polypeptide variant or fragment thereof. The extension may be very short, for example 1 to 10 amino acids in length. Alternatively, the extension may be longer, e.g., up to 50 or 100 amino acids. A carrier protein may be fused to the amino acid sequence. Other fusion proteins are discussed in more detail elsewhere in this disclosure, for example in the section entitled "Helper proteins".
A variant of SEQ ID NO: 59 is a polypeptide that has an amino acid sequence differing from that of SEQ ID NO: 59 and that retains the ability to form a pore. Variants typically contain the regions of SEQ ID NO: 59 responsible for pore formation. The pore-forming ability of CsgG, which contains a β-barrel, is provided by the β-sheets in the transmembrane β-barrel region of each subunit monomer. The variant of SEQ ID NO: 59 generally comprises the regions of SEQ ID NO: 59 that form the β-sheets, namely K134-Q154 and S183-S208. One or more modifications may be made to the β-sheet-forming regions of SEQ ID NO: 59, provided that the resulting variant retains its ability to form a pore.
One or more modifications in the CsgG pore monomer preferably improve the ability of the pore complex comprising the pore monomer to characterize the analyte. For example, modifications/mutations/substitutions are contemplated that alter the number, size, shape, placement or orientation of constrictions within the channels formed from the pore monomer conjugates of the present disclosure. The CsgG pore monomer or variant of SEQ ID NO: 59 may have any of the specific modifications or substitutions disclosed in WO 2016/034591, WO 2017/149416, WO 2017/149193, WO 2017/149418, WO 2018/211241 and WO 2019/002893 (all incorporated herein by reference in their entirety).
Preferred modifications or substitutions in SEQ ID NO: 59 include, but are not limited to, one or more of, such as 2 or more, 3 or more, 4 or more, 5 or more, 6 or more, 7 or more or all of:
(a) Substitution at position Y51, such as Y51I, Y51L, Y51A, Y51V, Y51T, Y51S, Y51Q or Y51N;
(b) Substitution at position N55, such as N55I, N55L, N55A, N55V, N55T, N55S or N55Q;
(c) Substitution at position F56, such as F56I, F56L, F56A, F56V, F56T, F56S, F56Q or F56N;
(d) Substitution at position L90, such as L90N, L90D, L90E, L90R or L90K;
(e) Substitution at position N91, such as N91D, N91E, N91R or N91K;
(f) Substitution at position K94, such as K94R, K94F, K94Y, K94Q, K94W, K94L, K94S or K94N;
(g) Substitution at position R192, such as R192Q, R192F, R192S, R192D or R192T; and
(h) Substitution at position C215, such as C215T, C215S, C215I, C215L, C215A, C215V or C215G.
The variant of SEQ ID NO: 59 may further comprise a deletion of one or more positions, such as a deletion of T104-N109, a deletion of F193-L199 or a deletion of F195-L199.
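For illustration, the substitution notation used above (wild-type residue, 1-based position, replacement residue, e.g., Y51A) and the range deletions can be applied to a sequence programmatically. The sketch below is an assumption-laden example: the helper names are hypothetical, and the six-residue toy sequence merely stands in for a real reference sequence such as SEQ ID NO: 59, which is not reproduced here.

```python
import re

# Pattern for substitutions written as <wild type><position><replacement>.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence, substitutions):
    """Apply point substitutions such as "Y51A" to a one-letter sequence.

    Raises ValueError if a substitution is malformed, out of range, or
    names a wild-type residue that does not match the sequence.
    """
    residues = list(sequence)
    for sub in substitutions:
        match = _SUBSTITUTION.match(sub)
        if match is None:
            raise ValueError(f"malformed substitution: {sub!r}")
        wild_type, position, replacement = (
            match.group(1), int(match.group(2)), match.group(3))
        if not 1 <= position <= len(residues):
            raise ValueError(f"position out of range: {sub!r}")
        if residues[position - 1] != wild_type:
            raise ValueError(f"wild-type residue mismatch: {sub!r}")
        residues[position - 1] = replacement
    return "".join(residues)


def delete_range(sequence, first, last):
    """Delete residues first..last (1-based, inclusive) from a sequence."""
    if not 1 <= first <= last <= len(sequence):
        raise ValueError("deletion range out of bounds")
    return sequence[:first - 1] + sequence[last:]


# Hypothetical toy sequence; positions refer to this toy only.
toy = "ACDYNF"
substituted = apply_substitutions(toy, ["Y4A", "N5S"])
shortened = delete_range(toy, 2, 3)
```

The wild-type check guards against applying a substitution at a position that was numbered against a different reference sequence.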
Any number of the CsgG pore monomers in a pore or pore complex, such as 6, 7, 8, 9, or 10, may be variants of SEQ ID NO: 59. All six to ten monomers in a pore or pore complex are preferably variants of SEQ ID NO: 59. Variants in the pore complex may be the same or different. The variants are preferably identical in each pore monomer conjugate in the pore complex.
Linkers
In some embodiments, the protein pore complex is stabilized by attaching (e.g., covalently attaching) a helper protein or a fusion protein to the nanopore. The covalent linkage may be made, for example, by a disulfide bond or by click chemistry. As a further example, cysteine residues may be linked by a linker such as BMOE. The helper or fusion proteins and/or transmembrane protein nanopores may be modified to facilitate such covalent interactions. In some embodiments, the helper protein or fusion protein is non-covalently attached to the nanopore. In some embodiments, the helper protein or fusion protein is attached to the nanopore through one or more (e.g., 1, 2, 3, 4, 5, or more) linkers.
In some embodiments, the helper protein or fusion protein is attached to the nanopore by hydrophobic interactions and/or by one or more disulfide bonds. One or more monomers in the pores, such as 2, 3, 4, 5, 6, 8, 9, e.g., all monomers, may be modified to enhance such interactions. This may be accomplished in any suitable manner. Other suitable interactions include salt bridging, electrostatic interactions, hydrogen bond formation, peptide bond formation, and pi-pi interactions.
At least one cysteine residue in the amino acid sequence of the transmembrane protein nanopore located at the interface between the nanopore and the helper protein (or fusion protein) may be disulfide bonded to at least one cysteine residue in the amino acid sequence of the helper protein located at the interface between the nanopore and the helper protein. In some embodiments, at least one cysteine residue in the amino acid sequence of the first helper protein is disulfide bonded to at least one cysteine residue in the amino acid sequence of the second helper protein. In some embodiments, at least one cysteine residue in the amino acid sequence of the first portion of the fusion protein is disulfide bonded to at least one cysteine residue in the amino acid sequence of the second portion of the fusion protein. The cysteine residues in the nanopore and/or the cysteine residues in the helper protein or the fusion protein may be cysteine residues not present in the wild-type transmembrane protein pore monomer or the wild-type helper protein. A plurality of disulfide bonds, such as 2, 3,4,5, 6,7,8 or 9 to 16, 18, 24, 27, 32, 36, 40, 45, 48, 54, 56 or 63, may be formed between the nanopore in the pore complex and the helper protein (or fusion protein). One or both of the nanopore and the helper protein (or fusion protein) may comprise at least one monomer or subunit, such as up to 8, 9, or 10 monomers or subunits, comprising a cysteine residue at the interface between the nanopore and the helper protein (or fusion protein).
The nanopore and/or helper protein (or fusion protein) may comprise one or more hydrophobic amino acid residues at the interface between the nanopore and the helper protein (or fusion protein) that are more hydrophobic than residues present at corresponding positions in the wild-type nanopore and/or helper protein (or fusion protein). At least one monomer or subunit in the nanopore and/or at least one monomer or subunit in the helper protein (or fusion protein) may comprise at least one residue at the interface between the nanopore and the helper protein (or fusion protein) that is more hydrophobic than the residue present at the corresponding position in the wild-type pore or helper protein (or fusion protein). For example, 2 to 10, such as 3,4, 5,6,7,8, or 9 residues in the nanopore and/or helper protein (or fusion protein) may be more hydrophobic than residues at the same position in the corresponding wild-type nanopore and/or helper protein (or fusion protein). Such hydrophobic residues enhance the interaction between the nanopore in the pore complex and the helper protein (or fusion protein). In the case when the residue at the interface in the wild-type nanopore or helper protein (or fusion protein) is R, Q, N or E, the hydrophobic residue is typically I, L, V, M, F, W, A or Y. In the case where the residue at the interface in the wild-type nanopore or helper protein (or fusion protein) is I, the hydrophobic residue is typically L, V, M, F, W, A or Y. In the case where the residue at the interface in the wild-type nanopore or helper protein (or fusion protein) is L, the hydrophobic residue is typically I, V, M, F, W, A or Y.
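The replacement rules stated above can be summarized as a small lookup, shown here purely as an illustrative sketch. The function name is hypothetical, and returning an empty list for wild-type residues that the text does not address is an assumption made for this example only.

```python
# Hydrophobic residues named in the text, in the order given there.
HYDROPHOBIC = ("I", "L", "V", "M", "F", "W", "A", "Y")


def candidate_replacements(wild_type):
    """Typical hydrophobic replacements for an interface residue.

    R, Q, N or E -> any of I, L, V, M, F, W, A, Y
    I            -> L, V, M, F, W, A, Y
    L            -> I, V, M, F, W, A, Y
    Other residues are not addressed by the text; an empty list is
    returned for them (an assumption of this sketch).
    """
    if wild_type in ("R", "Q", "N", "E"):
        return list(HYDROPHOBIC)
    if wild_type in ("I", "L"):
        return [res for res in HYDROPHOBIC if res != wild_type]
    return []


options_for_q = candidate_replacements("Q")
options_for_i = candidate_replacements("I")
options_for_l = candidate_replacements("L")
```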
Molecular dynamics simulations can be performed to determine which residues in the helper protein and the nanopore are in close proximity. This information can be used to design helper protein and/or transmembrane protein nanopore mutants that can increase the stability of the complex. For example, simulations can be performed using the GROMACS software package, version 4.6.5, with the GROMOS 53a6 force field and the SPC water model, starting from a cryo-EM structure of the proteins. The complex may be solvated and its energy then minimized using a steepest descent algorithm. Throughout the simulation, restraints can be imposed on the backbone of the protein, while the side chains of the residues can move freely. The system can be simulated for 20 ns in the NPT ensemble, using a Berendsen thermostat and a Berendsen barostat at 300 K. Contacts between the accessory protein and the nanopore may be analyzed using GROMACS analysis software and/or locally written code. Two residues may be defined as being in contact if they are within 3 angstroms of each other.
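As an illustration of the kind of locally written analysis code referred to above, the following minimal sketch lists residue pairs that are in contact between two groups of atoms in a single simulation frame. It is not GROMACS code. It assumes that atomic coordinates (in angstroms) have already been read from a trajectory frame, treats two residues as being in contact if any pair of their atoms lies within the 3 angstrom cutoff (the text does not specify which atoms are compared), and uses hypothetical chain labels and coordinates.

```python
import math
from collections import namedtuple

# One atom: chain label, residue number, and Cartesian coordinates (angstroms).
Atom = namedtuple("Atom", "chain resid x y z")


def residue_contacts(atoms_a, atoms_b, cutoff=3.0):
    """Return sorted residue pairs with any inter-atomic distance <= cutoff.

    Each pair is ((chain_a, resid_a), (chain_b, resid_b)).
    """
    contacts = set()
    for a in atoms_a:
        for b in atoms_b:
            if math.dist((a.x, a.y, a.z), (b.x, b.y, b.z)) <= cutoff:
                contacts.add(((a.chain, a.resid), (b.chain, b.resid)))
    return sorted(contacts)


# Hypothetical single-frame coordinates for two helper-protein atoms ("F")
# and two nanopore atoms ("G").
helper_atoms = [Atom("F", 1, 0.0, 0.0, 0.0), Atom("F", 4, 10.0, 0.0, 0.0)]
pore_atoms = [Atom("G", 153, 2.5, 0.0, 0.0), Atom("G", 133, 10.0, 0.0, 4.0)]
contacts = residue_contacts(helper_atoms, pore_atoms)
```

Running such a check over many frames and counting how often each pair appears would give a simple measure of how persistent a contact is.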
For example, in a pore complex, the interaction between the CsgF peptide and the CsgG pore may be stabilized, for example, by hydrophobic interactions, electrostatic interactions, or covalent bonds at positions corresponding to one or more of the following pairs of positions of SEQ ID NO: 60 and SEQ ID NO: 59, respectively: 1 and 153, 4 and 133, 5 and 136, 8 and 187, 8 and 203, 9 and 203, 11 and 142, 11 and 201, 12 and 149, 12 and 203, 26 and 191, 29 and 144, or 30 and 196. Residues at one or more of these positions in CsgF and/or CsgG may be modified to enhance the interaction between CsgG and CsgF in the pore.
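For convenience, the position pairs listed above can be held in a simple data structure and queried in either direction. The sketch below only restates those pairs; the variable and function names are hypothetical.

```python
# (CsgF position in SEQ ID NO: 60, CsgG position in SEQ ID NO: 59),
# restating the pairs listed in the text.
POSITION_PAIRS = [
    (1, 153), (4, 133), (5, 136), (8, 187), (8, 203), (9, 203),
    (11, 142), (11, 201), (12, 149), (12, 203), (26, 191), (29, 144),
    (30, 196),
]


def csgg_partners(csgf_position):
    """CsgG positions paired with a given CsgF position."""
    return [g for f, g in POSITION_PAIRS if f == csgf_position]


def csgf_partners(csgg_position):
    """CsgF positions paired with a given CsgG position."""
    return [f for f, g in POSITION_PAIRS if g == csgg_position]


partners_of_csgf_8 = csgg_partners(8)
partners_of_csgg_203 = csgf_partners(203)
```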
Covalent attachment or binding is, for example, via a cysteine, wherein the sulfhydryl side group of the cysteine is covalently linked to another amino acid residue or moiety, and/or via an interaction between unnatural (photo)reactive amino acids. (Photo)reactive amino acids are artificial analogs of natural amino acids that can be used to crosslink protein complexes and can be incorporated into proteins and peptides in vivo or in vitro. Commonly used photoreactive amino acid analogs are the photoreactive diazirine analogs of leucine and methionine and p-benzoyl-phenylalanine, as well as azidohomoalanine, homopropargylglycine, homoallylglycine, p-acetyl-Phe, p-azido-Phe, p-propargyloxy-Phe and p-benzoyl-Phe (Wang et al. 2012; Chin et al. 2002). When exposed to ultraviolet light, they are activated and covalently bind to interacting proteins that are within a few angstroms of the photoreactive amino acid analog.
Pore complexes can be prepared and disulfide bond formation induced by the use of an oxidizing agent (e.g., copper-phenanthroline). Other interactions (e.g., hydrophobic interactions, charge-charge interactions/electrostatic interactions) may also be used at these sites instead of cysteine interactions. In another embodiment, unnatural amino acids can also be incorporated at those positions. In this embodiment, the covalent bond is formed by click chemistry. For example, unnatural amino acids with azide or alkyne or with Dibenzocyclooctyne (DBCO) groups and/or bicyclo [6.1.0] nonyne (BCN) groups can be introduced at one or more of these positions.
For example, the CsgG pore may comprise at least one CsgG monomer, such as 2, 3, 4, 5, 6, 7, 8, 9, or 10 CsgG monomers, modified to promote attachment to a helper protein or fusion protein. For example, cysteine residues may be introduced at positions corresponding to one or more of positions 132, 133, 136, 138, 140, 142, 144, 145, 147, 149, 151, 153, 155, 183, 185, 187, 189, 191, 201, 203, 205, 207, and 209 of SEQ ID NO: 59 and/or at any position predicted to be in contact with a helper protein or fusion protein, to facilitate covalent attachment to the helper protein or fusion protein. Alternatively or in addition to covalent attachment via a cysteine residue, the pore may be stabilized by hydrophobic interactions or electrostatic interactions. To facilitate such interactions, non-natural reactive or photoreactive amino acids may be introduced at positions corresponding to one or more of positions 132, 133, 136, 138, 140, 142, 144, 145, 147, 149, 151, 153, 155, 183, 185, 187, 189, 191, 201, 203, 205, 207 and 209 of SEQ ID NO: 59.
For example, CsgF peptides can be modified to promote attachment to CsgG pores. For example, a cysteine residue may be introduced at a position corresponding to one or more of positions 1, 4, 5, 8, 9, 11, 12, 26, or 29 of SEQ ID NO: 60 and/or at any position expected to be in contact with CsgG, to facilitate covalent attachment to CsgG. Alternatively or in addition to covalent attachment via a cysteine residue, the pore may be stabilized by hydrophobic interactions or electrostatic interactions. To facilitate such interactions, a non-natural reactive or photoreactive amino acid may be introduced at a position corresponding to one or more of positions 1, 2, 3, 4, 5, 8, 9, 11, 12, 26 or 29 of SEQ ID NO: 60.
Such stabilizing mutations may be combined with any other modification to the helper protein or fusion protein, such as a modification to improve the interaction of the pore complex with the polynucleotide, or a modification to improve certain properties of the complex (e.g., discrimination of polymer units, such as nucleotides of the polynucleotide).
In some embodiments, the nanopore may be isolated, substantially isolated, purified, or substantially purified. A pore is isolated or purified if it is completely free of any other components, such as lipids or other pores. A pore is substantially isolated if it is mixed with a carrier or diluent that does not interfere with its intended use. For example, a pore is substantially isolated or substantially purified if it is in a form that contains less than 10%, less than 5%, less than 2%, or less than 1% of other components (such as block copolymers, lipids, or other pores). Alternatively, the pore may be present in a membrane. Suitable membranes are discussed below.
The pore complexes may be present in the membrane as individual or unitary pores. Alternatively, the pore complexes may be present in a homologous or heterologous population of two or more pores.
The helper protein or fusion protein may be attached directly to the transmembrane protein nanopore, or both proteins (e.g., a first helper protein and a second helper protein; a first portion of the fusion protein and a second portion of the fusion protein, etc.) may be attached using a linker, such as a chemical cross-linker or peptide linker.
Suitable chemical cross-linking agents are well known in the art. Examples of cross-linking agents include, but are not limited to, 2,5-dioxopyrrolidin-1-yl 3-(pyridin-2-yldithio)propionate, 2,5-dioxopyrrolidin-1-yl 4-(pyridin-2-yldithio)butyrate, and 2,5-dioxopyrrolidin-1-yl 8-(pyridin-2-yldithio)octanoate. In some embodiments, the crosslinker is succinimidyl 3-(2-pyridyldithio)propionate (SPDP). Typically, the molecule is covalently attached to the bifunctional crosslinker before the molecule/crosslinker complex is covalently attached to the mutant monomer, but the bifunctional crosslinker may instead be covalently attached to the monomer before the bifunctional crosslinker/monomer complex is attached to the molecule. In some embodiments, the linker is resistant to dithiothreitol (DTT). Other suitable linkers include, but are not limited to, iodoacetamide-based and maleimide-based linkers.
Suitable amino acid linkers, such as peptide linkers, are known in the art. The length, flexibility and hydrophilicity of the amino acid or peptide linker are typically designed such that the helper protein or fusion protein forms a constriction in the pore complex. Preferred flexible peptide linkers are stretches of 2 to 20, such as 4, 6, 8, 10 or 16, serine and/or glycine amino acids. More preferred flexible linkers include (SG)1, (SG)2, (SG)3, (SG)4, (SG)5, (SG)8, (SG)10, (SG)15 or (SG)20, where S is serine and G is glycine. Preferred rigid linkers are stretches of 2 to 30, such as 4, 6, 8, 16 or 24, proline amino acids. More preferred rigid linkers include (P)12, where P is proline.
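As an illustration only, the repeat notation used for these linkers can be expanded into one-letter amino acid sequences with a few lines of Python. The sketch below is not part of the disclosed methods; the function name repeat_linker and the dictionary layout are hypothetical choices made for this example, and only the repeat counts are taken from the text above.

```python
# Minimal sketch: expand repeat notation such as (SG)4 or (P)12 into
# one-letter amino acid sequences. All names here are hypothetical.

# Repeat counts listed in the text for the flexible (SG)n linkers.
FLEXIBLE_REPEATS = (1, 2, 3, 4, 5, 8, 10, 15, 20)


def repeat_linker(unit: str, n: int) -> str:
    """Return the sequence formed by repeating `unit` n times."""
    if n < 1:
        raise ValueError("repeat count must be at least 1")
    return unit * n


# Map the notation used in the text to the expanded sequence.
flexible_linkers = {f"(SG){n}": repeat_linker("SG", n) for n in FLEXIBLE_REPEATS}

# The rigid linker named in the text: twelve consecutive prolines.
rigid_linker = repeat_linker("P", 12)

if __name__ == "__main__":
    for name, seq in flexible_linkers.items():
        print(name, seq, len(seq))
    print("(P)12", rigid_linker, len(rigid_linker))
```

Expanding the notation this way makes the residue count of each linker explicit, which is convenient when comparing candidate linkers of different lengths.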
Suitable chemical crosslinkers include, but are not limited to, those containing functional groups such as maleimides, active esters, succinimides, azides, alkynes (such as dibenzocyclooctynol (DIBO or DBCO), difluorocycloalkynes and linear alkynes), phosphines (such as those used for traceless and non-traceless Staudinger ligation), haloacetyl groups (such as iodoacetamide), phosgene-type reagents, sulfonyl chloride reagents, isothiocyanates, acyl halides, hydrazines, disulfides, vinyl sulfones, aziridines, and photoreactive reagents (such as aryl azides and diaziridines).
The reaction between the amino acid and the functional group may be spontaneous, as for cysteine/maleimide, or may require external reagents, such as Cu(I) for linking an azide and a linear alkyne.
The linker may comprise any molecule that extends across the desired distance. The length of the linker may vary from one carbon (phosgene-type linker) to many angstroms. Examples of linker molecules include, but are not limited to, polyethylene glycol (PEG), polypeptides, polysaccharides, deoxyribonucleic acid (DNA), peptide nucleic acid (PNA), threose nucleic acid (TNA), glycerol nucleic acid (GNA), saturated and unsaturated hydrocarbons, and polyamides. These linkers may be inert or reactive; in particular, they may be chemically cleavable at defined positions, or they may themselves be modified with fluorophores or ligands. After covalent attachment of the helper protein or fusion protein to the CsgG pore monomer, the linker is preferably resistant to dithiothreitol (DTT).
In some embodiments, the crosslinking agent is selected from: 2,5-dioxopyrrolidin-1-yl 3-(pyridin-2-yldithio)propionate, 2,5-dioxopyrrolidin-1-yl 4-(pyridin-2-yldithio)butyrate, 2,5-dioxopyrrolidin-1-yl 8-(pyridin-2-yldithio)octanoate, bismaleimide PEG 1k, bismaleimide PEG 3.4k, bismaleimide PEG 5k, bismaleimide PEG 10k, bis(maleimide)ethane (BMOE), bismaleimide hexane (BMH), 1,4-bismaleimide butane (BMB), 1,4-bismaleimide-2,3-dihydroxybutane (BMDB), BM[PEO]2 (1,8-bismaleimide diglycol), BM[PEO]3 (1,11-bismaleimide triglycol), tris[2-maleimidoethyl]amine (TMEA), dithiobismaleimide ethane (DTME), bismaleimide PEG3, bismaleimide PEG11, DBCO-maleimide, DBCO-PEG4-NH2, DBCO-PEG4-NHS, DBCO-PEG-DBCO 2.8 kDa, DBCO-PEG-DBCO 4.0 kDa, DBCO-15 atoms-DBCO, DBCO-26 atoms-DBCO, DBCO-35 atoms-DBCO, DBCO-PEG4-S-S-PEG3-biotin, DBCO-S-S-PEG3-biotin, DBCO-S-S-PEG11-biotin, succinimidyl 3-(2-pyridyldithio)propionate (SPDP), and maleimide-PEG(2 kDa)-maleimide (α,ω-bismaleimide poly(ethylene glycol)). In some embodiments, the cross-linking agent is maleimide-propyl-SRDFWRS-(1,2-diaminoethane)-propyl-maleimide.
The linked CsgG pore monomer and the helper protein or fusion protein may be coupled via covalent bond formation between the groups. Any of the specific linkers disclosed in WO 2010/086602 (incorporated herein by reference in its entirety) may be used.
The linker may be labeled. Suitable labels include, but are not limited to, fluorescent molecules (such as Cy3 or 555), radioisotopes (e.g., 125I, 35S, 32P), enzymes, antibodies, antigens, polynucleotides, and ligands (such as biotin). Such labels allow the amount of linker to be quantified. The label may also be a cleavable purification tag, such as biotin, or a specific sequence that shows up in an identification method, such as a peptide that is not present in the protein itself but that is released by trypsin digestion.
A preferred method of attaching the pore monomer conjugate is via a cysteine linkage. This can be mediated by a bifunctional chemical crosslinker or by an amino acid linker with a terminal cysteine residue.
Another preferred attachment method is via a 4-azidophenylalanine (Faz) linkage. This can be mediated by a bifunctional chemical linker or by a polypeptide linker with a terminal Faz residue.
In some embodiments, the linker is a bond formed by a sulfur(VI) fluoride exchange (SuFEx) reaction. In some embodiments, a helper protein (e.g., CsgF or a portion of CsgF) can be functionalized with sulfonyl fluoride groups that, when in suitable proximity, can react with a nucleophilic amino acid (e.g., a nucleophilic amino acid of a CsgG pore monomer, a nucleophilic amino acid of another helper protein, etc.) to form a sulfonyl bond (SuFEx).
The helper protein or fusion protein may be genetically fused to the transmembrane protein nanopore. The pore monomer and the helper protein (or fusion protein) are genetically fused if the entire construct is expressed from a single polynucleotide coding sequence. The monomer or subunit of the accessory protein (or fusion protein) may be fused directly to the monomer or subunit of the transmembrane protein nanopore. Alternatively, the monomer or subunit of the accessory protein (or fusion protein) may be fused to the monomer or subunit of the transmembrane protein nanopore via one or more linkers.
The distance between the CsgG pore monomer and the helper protein or fusion protein and/or the length of the linker in the CsgG pore monomer conjugate is preferably less than about 2.00 nm, such as less than about 1.90 nm, less than about 1.80 nm, less than about 1.70 nm, less than about 1.60 nm, less than about 1.50 nm, less than about 1.40 nm, less than about 1.30 nm, less than about 1.20 nm, less than about 1.10 nm, less than about 1.00 nm, less than about 0.90 nm, less than about 0.80 nm, less than about 0.70 nm, less than about 0.60 nm, less than about 0.50 nm, or less than about 0.40 nm. The distance between the CsgG pore monomer and the helper protein or fusion protein and/or the length of the linker in the pore monomer conjugate is preferably less than about 1.20 nm. This distance/length may be achieved using maleimidocaproic acid, as discussed in more detail below. The distance between the CsgG pore monomer and the helper protein or fusion protein and/or the length of the linker in the pore monomer conjugate is preferably less than about 0.80 nm. This distance/length can be achieved using maleimide propionic acid, as discussed below.
The distance between the CsgG pore monomer and the helper protein or fusion protein and/or the length of the linker in the pore monomer conjugate is preferably about 0.40nm to about 2.0nm, such as about 0.45nm to about 1.90nm, about 0.50nm to about 1.80nm, about 0.55nm to about 1.7nm, about 0.60nm to about 1.6nm, about 0.65nm to about 1.5nm, about 0.7nm to about 1.4nm, about 0.75nm to about 1.3nm, about 0.80nm to about 1.2nm, about 0.85nm to about 1.1nm, and about 0.90nm to about 1.00nm. The distance between the CsgG pore monomer and the helper protein or fusion protein and/or the length of the linker in the pore monomer conjugate is preferably about 0.50nm to about 1.50nm. The distance between the CsgG pore monomer and the helper protein or fusion protein and/or the length of the linker in the pore monomer conjugate is preferably about 0.60nm to about 1.2nm. This distance/length may be achieved using any of the specific maleimide-containing linkers discussed below.
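By way of illustration only, the nested preferred ranges given above can be written as a small lookup that reports the narrowest quoted range containing a candidate distance or linker length. This is a sketch, not part of the disclosure: the function name narrowest_matching_range is hypothetical, and the word "about" in the text is treated here as a hard bound, which is a simplifying assumption.

```python
# Minimal sketch: find the narrowest preferred range (in nm) that contains
# a candidate linker length. Bounds are the "about" values from the text,
# treated as hard limits for simplicity (an assumption of this example).

# Ordered from narrowest to broadest.
PREFERRED_RANGES_NM = (
    (0.60, 1.20),
    (0.50, 1.50),
    (0.40, 2.00),
)


def narrowest_matching_range(length_nm: float):
    """Return the narrowest quoted (low, high) range containing length_nm, or None."""
    for low, high in PREFERRED_RANGES_NM:
        if low <= length_nm <= high:
            return (low, high)
    return None


if __name__ == "__main__":
    for candidate in (0.8, 1.4, 1.8, 2.5):
        print(candidate, narrowest_matching_range(candidate))
```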
The maleimide-containing linker may be any of the linkers discussed below with reference to the constructs described herein. The maleimide-containing linker preferably comprises or consists of a maleimide group and a linear carbon chain of 2, 3, 4, 5, 6 or more carbon atoms. The linear carbon chain is typically attached to a nitrogen atom in the maleimide group. The linear carbon chain also preferably comprises a terminal carboxyl group. The carboxyl group is capable of forming an amide bond with an amino acid in the helper protein or the fusion protein. The linker is preferably maleimide acetic acid, maleimide propionic acid, maleimide butyric acid, maleimide valeric acid or maleimide caproic acid. The linker is most preferably maleimide propionic acid. The linker is shown in FIG. 15.
The present disclosure also provides a pore monomer conjugate comprising a CsgG pore monomer covalently attached to a helper protein or fusion protein, wherein the helper protein or fusion protein is covalently attached to a cysteine residue in the CsgG pore monomer via a linker comprising a thiol-reactive group. The thiol-reactive group may be a maleimide group, a pyridyldisulfide group, a halogen group, a p-fluoro group, an alkenyl group, an alkynyl group, a vinyl sulfone group, or a sulfosulfone group. These groups are shown in FIG. 16. The linker comprising a thiol-reactive group may be any linker discussed below with reference to the constructs of the present disclosure. The linker preferably comprises or consists of a thiol-reactive group and a linear carbon chain of 2, 3, 4, 5, 6 or more carbon atoms. The linear carbon chain also preferably comprises a terminal carboxyl group. The carboxyl group is capable of forming an amide bond with an amino acid in the helper protein or the fusion protein. The linker may be any of the specific maleimide-containing linkers discussed above, wherein the maleimide is replaced with a different thiol-reactive group. The linker containing the thiol-reactive group may be any length discussed above.
Suitable linking groups can be designed using conventional modeling techniques. The linker is generally flexible enough to allow the monomers or subunits to assemble into their respective protein oligomers and align along their common symmetry axis to create a continuous channel within the pore complex.
Identification and selection of helper proteins
Aspects of the present disclosure relate to computer-based methods of designing and/or selecting helper proteins and/or fusion proteins for inclusion in protein pore complexes (e.g., protein pore complexes comprising CsgG nanopores). In some embodiments, the method includes providing an amino acid sequence (e.g., a CsgF amino acid sequence) as an input to software that includes code that implements a protein backbone selection technique and processes the amino acid sequence to produce a backbone amino acid sequence as an output. In some embodiments, the protein backbone selection technique may be MASTER (e.g., as described in Zhou and Grigoryan, Protein Sci. 2015 Apr; 24(4): 508-524, the entire contents of which are incorporated herein by reference). In some embodiments, the protein backbone selection technique includes selecting a protein backbone structure having one or more target features (e.g., the ability to form one or more helical regions, the ability to stack with one or more helical regions of a protein pore, etc.) from known protein backbone structures (e.g., as described in the Protein Data Bank, PDB). In some embodiments, backbone structures are provided as inputs to software that includes code that implements a protein sequence design and structure prediction technique and processes the backbone structures to produce one or more de novo designed peptide sequences. In some embodiments, the protein sequence design and structure prediction technique may be Rosetta (e.g., as described in Leaver-Fay et al., Rosetta3: An Object-Oriented Software Suite for the Simulation and Design of Macromolecules, Methods in Enzymology, Academic Press, volume 487, 2011, pages 545-574, doi.org/10.1016/B978-0-12-381270-4.00019-6, the entire contents of which are incorporated herein by reference).
In some embodiments, the de novo designed peptide sequence comprises one or more target features identical to one or more desired features of the backbone amino acid sequence.
Methods of producing nanopore complexes
In one embodiment, a pore complex comprising a helper protein or fusion protein and a transmembrane protein nanopore may be prepared via co-expression. In some embodiments, the method includes the steps of expressing both the pore monomer and the helper protein or fusion protein, or a helper protein subunit or monomer, in a suitable host cell and allowing pore complex formation in vivo. In this embodiment, at least one gene encoding a pore monomer in one vector and a gene encoding an accessory protein or fusion protein, or at least one accessory protein subunit or monomer, in a second vector may be transformed together to express the proteins and produce the complex in the transformed cell. This is preferably done ex vivo or in vitro. Alternatively, the two genes encoding the pore monomer and the helper protein (or fusion protein) or subunits thereof may be placed in one vector, either under the control of a single promoter or under the control of two separate promoters, which may be the same or different.
Another method for producing a pore complex formed by a helper protein or fusion protein and a transmembrane protein nanopore is in vitro reconstitution of the protein to obtain a functional pore. In some embodiments, the method includes the step of contacting the monomers of the transmembrane protein nanopore with a helper protein (or fusion protein), or a helper protein subunit or monomer, in a suitable system to allow complex formation. The system may be an "in vitro system", which refers to a system comprising at least the components and environments necessary to perform the method, and which allows for a more detailed, more convenient or more efficient analysis than with an entire organism, using biomolecules, organisms, cells (or parts of cells) outside of their normal naturally occurring environment. The in vitro system may further comprise a suitable buffer composition provided in a test tube, wherein said protein component forming the complex has been added. Those skilled in the art will recognize the option of providing the system.
In this embodiment, the nanopore may be created by expressing the monomer separately from the helper protein or fusion protein. The pore monomer or nanopore may be purified from cells transformed with a vector encoding at least one pore monomer or with more than one vector each expressing a pore monomer. The helper protein or fusion protein may be purified from cells transformed with a vector encoding at least one helper protein or fusion protein. The purified pore monomer/nanopore may then be incubated with a helper protein or fusion protein to prepare a pore complex.
In another embodiment, the nanopore monomer and/or the helper protein or fusion protein are each produced separately by in vitro translation and transcription (IVTT). The nanopore monomer may then be incubated with the helper protein or fusion protein to prepare a pore complex.
The above embodiments may be combined such that, for example, (i) the nanopore is produced in vivo and the helper protein or fusion protein is produced in vivo, (ii) the nanopore is produced in vitro and the helper protein or fusion protein is produced in vivo, (iii) the nanopore is produced in vivo and the helper protein or fusion protein is produced in vitro, or (iv) the nanopore is produced in vitro and the helper protein or fusion protein is produced in vitro.
One or both of the nanopore monomer and the helper protein or fusion protein may be labeled to facilitate purification. Purification can also be performed when the nanopore monomer and/or the helper protein or fusion protein are unlabeled. Methods known in the art (e.g., ion exchange, gel filtration, hydrophobic interaction column chromatography, etc.) may be used alone or in various combinations to purify the components of the pore complexes.
Any known tag may be used for either of these two proteins. In one embodiment, a two-tag purification may be used to purify the pore complex from its constituent parts. For example, a Strep tag can be used on the nanopore and a His tag can be used on the helper protein (or fusion protein), or vice versa. A similar end result can be obtained when the two proteins are purified separately and mixed together, followed by another round of Strep and His purification.
The pore complexes may be prepared either prior to insertion into the membrane or after insertion of the nanopores into the membrane. In the latter case, the nanopore may be inserted into the membrane, and then an accessory protein (or fusion protein) may be added so that the pore complex is formed in situ. For example, in one embodiment, where the trans or cis side of the membrane is accessible (e.g., in a chip or chamber for electrophysiological measurements), a nanopore may be inserted into the membrane, and then an accessory protein (or fusion protein) may be added from the trans or cis side of the membrane, such that the complex is formed in situ.
In one embodiment, the helper protein may comprise a protease cleavage site (e.g., TEV, HRV 3C or any other protease cleavage site) and be cleaved either before or after association with the nanopore. For example, full-length helper proteins (or fusion proteins) may be used to form the pores. Amino acid residues that do not form part of the channel constriction and that are not required for interaction with the transmembrane pore may be cleaved from the helper protein or fusion protein. In this embodiment, once the pore complex is formed, a protease is used to cleave the helper protein or fusion protein. Alternatively, proteases may be used to produce helper or fusion proteins prior to assembly of the pore complex.
Some protease sites will leave additional tags (or a portion thereof, such as one or more amino acids of a tag) after cleavage. For example, the TEV protease cleavage sequence is ENLYFQS. TEV protease cleaves proteins between Q and S, leaving ENLYFQ intact at the C-terminus of the CsgF peptide. As another example, the HRV 3C cleavage site is LEVLFQGP and the enzyme cleaves between Q and G, leaving LEVLFQ intact at the C-terminus of the CsgF peptide.
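The residual tag left by each protease can be illustrated with a short Python sketch that splits a one-letter sequence at the cleavage positions stated above. This is an illustrative example only: the function name cleave, the SITES table layout, and the example construct (a placeholder peptide followed by a cleavage site and a His tag) are hypothetical and do not correspond to any sequence of the disclosure.

```python
# Minimal sketch: split a sequence at the TEV or HRV 3C cleavage site and
# show which residues remain on the N-terminal fragment.
# The motifs and cut positions are those stated in the text; everything
# else (names, example construct) is hypothetical.

# motif, number of motif residues retained on the N-terminal fragment
SITES = {
    "TEV": ("ENLYFQS", 6),     # cut between Q and S
    "HRV3C": ("LEVLFQGP", 6),  # cut between Q and G
}


def cleave(sequence: str, protease: str):
    """Return (n_terminal_fragment, c_terminal_fragment) after cleavage.

    If the motif is absent, the sequence is returned uncut with an empty
    C-terminal fragment. Only the first occurrence of the motif is cut.
    """
    motif, offset = SITES[protease]
    start = sequence.find(motif)
    if start == -1:
        return sequence, ""
    cut = start + offset
    return sequence[:cut], sequence[cut:]


if __name__ == "__main__":
    placeholder_peptide = "AAAA"  # stands in for a helper peptide sequence
    construct = placeholder_peptide + "ENLYFQS" + "HHHHHH"
    n_frag, c_frag = cleave(construct, "TEV")
    print(n_frag)  # placeholder peptide followed by the residual ENLYFQ
    print(c_frag)  # released fragment beginning with S
```

In this sketch the N-terminal fragment always ends in the six residues that the text describes as remaining at the C-terminus of the peptide after cleavage.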
The protein may be chemically modified with a molecular adaptor that facilitates the interaction between a pore comprising the monomer and a target nucleotide or target polynucleotide sequence. Suitable adaptors, including cyclic molecules, cyclodextrins, species capable of hybridization, DNA binding agents or intercalators, peptides or peptide analogues, synthetic polymers, aromatic planar molecules, positively charged small molecules or small molecules capable of hydrogen bonding, are described in WO 2019/002893 (incorporated herein by reference in its entirety). Any of the methods and linkers discussed above may be used to attach the molecular adaptors.
The protein may be attached to a polynucleotide binding protein. This forms a modular sequencing system. Polynucleotide binding proteins are discussed below. The protein may be covalently attached to the monomer using any method known in the art. The monomers and proteins may be chemically fused or genetically fused. Gene fusion of monomers to polynucleotide binding proteins is discussed in WO 2010/004265 (incorporated herein by reference in its entirety). The polynucleotide binding protein may be attached via a cysteine bond using any of the methods described above.
The polynucleotide binding protein may be attached to the protein directly or via one or more linkers. The molecule may be attached to the CsgG pore monomer using a hybridization linker as described in WO 2010/086602 (incorporated herein by reference in its entirety). Alternatively, peptide linkers may be used. Suitable peptide linkers are discussed above.
Any protein can be produced using standard methods known in the art. The polynucleotide sequence encoding the protein may be derived and replicated using methods standard in the art. The polynucleotide sequence encoding the protein may be expressed in bacterial host cells using techniques standard in the art. Proteins can be produced in cells by expressing the polypeptide in situ from a recombinant expression vector. The expression vector optionally carries an inducible promoter to control expression of the polypeptide. These methods are described in Sambrook, J. and Russell, D. (2001) Molecular Cloning: A Laboratory Manual, 3rd edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Proteins can be produced on a large scale following purification, by any protein liquid chromatography system, from protein-producing organisms or after recombinant expression. Typical protein liquid chromatography systems include FPLC, AKTA systems, Bio-Cad systems, Bio-Rad BioLogic systems, and Gilson HPLC systems.
Systems
In another aspect, the present disclosure is directed to a system for characterizing a target polynucleotide, the system comprising a membrane and a pore complex, wherein the pore complex comprises (i) a nanopore in the membrane, and (ii) an accessory protein or fusion protein attached to the nanopore, wherein the nanopore and accessory protein or fusion protein together form a continuous channel across the membrane, the channel comprising a first constriction region and a second constriction region.
The pore complex, nanopore and accessory or fusion protein may be any as described above.
In one embodiment, the system further comprises a first chamber and a second chamber, wherein the first chamber and the second chamber are separated by a membrane. When used to characterize a target polynucleotide, the system may further comprise a target polynucleotide, wherein the target polynucleotide is transiently located within the continuous channel, and wherein one end of the target polynucleotide is located in the first chamber and one end of the target polynucleotide is located in the second chamber.
In one embodiment, the system further comprises a conductive solution in contact with the nanopore, electrodes providing a voltage potential across the membrane, and a measurement system for measuring the current through the nanopore. In one embodiment, the voltage applied across the membrane and pore complex is +5 V to -5 V, such as -600 mV to +600 mV or -400 mV to +400 mV. The voltage used is preferably in the range of 100 mV to 240 mV and more preferably in the range of 120 mV to 220 mV. Discrimination between different nucleotides by the pore can be increased by using an increased applied potential. Any suitable conductive solution may be used. For example, the solution may contain charge carriers such as metal salts (e.g., alkali metal salts) or halide salts (e.g., chloride salts, such as alkali metal chloride salts). The charge carriers may comprise ionic liquids or organic salts, such as tetramethylammonium chloride, trimethylphenyl ammonium chloride, phenyltrimethylammonium chloride or 1-ethyl-3-methylimidazolium chloride. In an exemplary system, the salt is present in an aqueous solution in the chamber. Potassium chloride (KCl), sodium chloride (NaCl), cesium chloride (CsCl) or a mixture of potassium ferrocyanide and potassium ferricyanide is typically used. KCl, NaCl and a mixture of potassium ferrocyanide and potassium ferricyanide are preferred. The charge carriers may be asymmetric across the membrane. For example, the type and/or concentration of charge carriers may be different on each side of the membrane (e.g., in each chamber).
The salt concentration may be at saturation. The salt concentration may be 3 M or less, and is typically 0.1 to 2.5 M, 0.3 to 1.9 M, 0.5 to 1.8 M, 0.7 to 1.7 M, 0.9 to 1.6 M, or 1 M to 1.4 M. The salt concentration is preferably 150 mM to 1 M. The method is preferably performed using a salt concentration of at least 0.3 M, such as at least 0.4 M, at least 0.5 M, at least 0.6 M, at least 0.8 M, at least 1.0 M, at least 1.5 M, at least 2.0 M, at least 2.5 M, or at least 3.0 M. High salt concentrations provide a high signal-to-noise ratio and allow currents indicative of the presence of a nucleotide to be identified against the background of normal current fluctuations.
The buffer may be present in the conductive solution. Typically, the buffer is a phosphate buffer. Other suitable buffers are HEPES and Tris-HCl buffers. The pH of the conductive solution may be 4.0 to 12.0, 4.5 to 10.0, 5.0 to 9.0, 5.5 to 8.8, 6.0 to 8.7, or 7.0 to 8.8, or 7.5 to 8.5. The pH used is preferably about 6.9.
The system may include an array of pore complexes present in membranes. In a preferred embodiment, each membrane in the array comprises one pore complex. Owing to the manner in which the array is formed, the array may, for example, comprise one or more membranes that do not comprise a pore complex, and/or one or more membranes that comprise two or more pore complexes. The array may comprise from about 2 to about 12,000 membranes, such as from about 10 to about 800 membranes, from about 20 to about 600 membranes, from about 30 to about 500 membranes, from about 250 to about 2000 membranes, from about 500 to about 4000 membranes, from about 1000 to about 5000 membranes, from about 2500 to about 10,000 membranes, or from about 5000 to about 12,000 membranes. In some embodiments, the array comprises more than 12,000 membranes.
The system may be included in a device. The device may be any conventional device for analyte analysis, such as an array or chip. The device is preferably set up to carry out the disclosed method. For example, the device may include a chamber containing an aqueous solution and a barrier dividing the chamber into two sections. The barrier typically has an aperture in which the membrane containing the pore is formed. Alternatively, the barrier forms the membrane in which the pore is present.
In one embodiment, the device includes a sensor device capable of supporting a plurality of pores and membranes and operable to perform analyte characterization using the pores and membranes, and at least one port for delivering a material for performing the characterization.
In one embodiment, the device includes a sensor device capable of supporting a plurality of pores and membranes, the sensor device being operable to perform analyte characterization using the pores and membranes, and at least one reservoir for containing a material for performing the characterization.
In one embodiment, the device includes a sensor device capable of supporting the membrane and a plurality of pores and membranes and operable to perform analyte characterization using the pores and membranes, at least one reservoir for containing a material for performing the characterization, a fluidic system configured to controllably supply material from the at least one reservoir to the sensor device, and one or more receptacles for receiving respective samples, the fluidic system being configured to selectively supply samples from the one or more receptacles to the sensor device.
The device may also include circuitry capable of applying an electrical potential and measuring an electrical signal across the membrane and pore complexes. The device may be any of those described in WO 2008/102120, WO 2009/077734, WO 2010/122293, WO 2011/067559 or WO 00/28312.
Film and method for producing the same
Any suitable membrane may be used in the system. The membrane is preferably an amphiphilic layer. The amphiphilic layer is a layer formed of amphiphilic molecules having hydrophilic and lipophilic properties, such as phospholipids. The amphiphilic molecules may be synthetic or naturally occurring. Non-naturally occurring amphiphiles and amphiphiles that form monolayers are known in the art and include, for example, block copolymers (Gonzalez-Perez et al, langmuir,2009,25,10447-10450). A block copolymer is a polymeric material in which two or more monomer subunits are polymerized together to produce a single polymer chain. The block copolymer generally has the properties contributed by each monomer subunit. However, block copolymers may have unique properties that are not possessed by polymers formed from individual subunits. The block copolymer may be designed such that one of the monomer subunits is hydrophobic (i.e., lipophilic) while the other subunits are hydrophilic when in an aqueous medium. In this case, the block copolymer may possess amphiphilic properties, and may form a structure simulating a biofilm. The block copolymer may be diblock (composed of two monomer subunits), but may also be constructed from more than two monomer subunits to form a more complex arrangement that appears to be an amphiphile. The copolymer may be a triblock, tetrablock or pentablock copolymer. The film is preferably a triblock copolymer film.
Archaebacteria bipolar tetraether lipids are naturally occurring lipids that are structured such that the lipids form a monolayer film. These lipids are typically found in extreme microorganisms, thermophilic microorganisms, halophilic microorganisms and acidophilic microorganisms that survive in harsh biological environments. Its stability is believed to be due to the fusion properties of the final bilayer. It is simple to construct block copolymer materials that mimic these biological entities by producing triblock polymers with the general motif hydrophilic-hydrophobic-hydrophilic. Such materials can form monomeric membranes that behave like lipid bilayers and encompass a range of stages from vesicles to lamellar membranes. Membranes formed from these triblock copolymers retain several advantages over biolipid membranes. Because triblock copolymers are synthesized, the exact construction can be carefully controlled to provide the correct chain length and properties required to form films and interact with pores and other proteins.
Block copolymers may also be constructed from subunits that are not classified as lipid sub-materials; for example, a hydrophobic polymer may be made from siloxane or other non-hydrocarbon-based monomers. The hydrophilic subsection of the block copolymer may also possess low protein-binding properties, which allows the creation of a membrane that is highly resistant when exposed to raw biological samples. This head group unit may also be derived from non-classical lipid head groups.
Triblock copolymer membranes also have increased mechanical and environmental stability compared with biological lipid membranes, for example a much higher operating temperature or a wider operating pH range. The synthetic nature of block copolymers provides a platform for tailoring polymer-based membranes to a wide range of applications.
The membrane is most preferably one of the membranes disclosed in International Application No. WO 2014/064443 or WO 2014/064444.
Amphiphilic molecules may be chemically modified or functionalized to facilitate coupling of polynucleotides. The amphiphilic layer may be a monolayer or a bilayer. The amphiphilic layer is typically planar, but it may be curved. The amphiphilic layer may be supported.
Amphiphilic membranes are typically naturally mobile, essentially acting as two-dimensional fluids with lipid diffusion rates of approximately 10⁻⁸ cm s⁻¹. This means that the pore and the coupled polynucleotide can typically move within the amphiphilic membrane.
The membrane may be a lipid bilayer. Lipid bilayers are models of cell membranes and serve as excellent platforms for a range of experimental studies. For example, lipid bilayers can be used for in vitro investigation of membrane proteins by single-channel recording. Alternatively, lipid bilayers can be used as biosensors to detect the presence of a range of substances. The lipid bilayer may be any lipid bilayer. Suitable lipid bilayers include, but are not limited to, planar lipid bilayers, supported bilayers, and liposomes. The lipid bilayer is preferably a planar lipid bilayer. Suitable lipid bilayers are disclosed in WO 2008/102121, WO 2009/077734 and WO 2006/100484.
Methods for forming lipid bilayers are known in the art. Lipid bilayers are commonly formed by the method of Montal and Mueller (Proc. Natl. Acad. Sci. USA, 1972; 69: 3561-3566), in which a lipid monolayer is carried on an aqueous solution/air interface past either side of an aperture that is perpendicular to that interface. The lipid is normally added to the surface of an aqueous electrolyte solution by first dissolving it in an organic solvent and then allowing a drop of the solvent to evaporate on the surface of the aqueous solution on either side of the aperture. Once the organic solvent has evaporated, the solution/air interfaces on either side of the aperture are physically moved up and down past the aperture until a bilayer is formed. A planar lipid bilayer may be formed across an aperture in a membrane or across an opening into a recess.
The Montal and Mueller method is popular because it is a cost-effective and relatively simple way of forming good-quality lipid bilayers suitable for protein pore insertion. Other common methods of bilayer formation include tip-dipping, painting bilayers, and patch-clamping of liposome bilayers.
Tip-dipping bilayer formation entails touching the aperture surface (e.g., a pipette tip) onto the surface of a test solution that carries a monolayer of lipid. Again, the lipid monolayer is first generated at the solution/air interface by allowing a drop of lipid dissolved in an organic solvent to evaporate at the solution surface. The bilayer is then formed by the Langmuir-Schaefer process, which requires mechanical automation to move the aperture relative to the solution surface.
For painted bilayers, a drop of lipid dissolved in an organic solvent is applied directly to the aperture, which is submerged in an aqueous test solution. The lipid solution is spread thinly over the aperture using a paintbrush or equivalent. Thinning of the solvent results in the formation of a lipid bilayer. However, complete removal of the solvent from the bilayer is difficult, and consequently a bilayer formed by this method is less stable and more prone to noise during electrochemical measurements.
Patch-clamping is commonly used in the study of biological cell membranes. The cell membrane is clamped to the end of a pipette by suction, and a patch of the membrane becomes attached over the aperture. The method has been adapted to produce lipid bilayers by clamping liposomes, which then burst to leave a lipid bilayer sealing the aperture of the pipette. This method requires stable, giant, unilamellar liposomes and the fabrication of small apertures in materials having a glass surface.
Liposomes can be formed by sonication, extrusion, or the Mozafari method (Colas et al. (2007) Micron 38:841-847). In a preferred embodiment, the lipid bilayer is formed as described in International Application No. WO 2009/077734. Advantageously, in this method, the lipid bilayer is formed from dried lipids. In the most preferred embodiment, the lipid bilayer is formed across an opening as described in WO 2009/077734.
The lipid bilayer is formed from two opposing layers of lipids. The two layers are arranged such that their hydrophobic tail groups face each other to form a hydrophobic interior. The hydrophilic head groups of the lipids face outward toward the aqueous environment on each side of the bilayer. Bilayers can exist in a number of lipid phases including, but not limited to, the liquid disordered phase (fluid lamellar), the liquid ordered phase, the solid ordered phase (lamellar gel phase, interdigitated gel phase) and planar bilayer crystals (lamellar sub-gel phase, lamellar crystalline phase).
Any lipid composition that forms a lipid bilayer may be used. The lipid composition is selected such that a lipid bilayer having the desired properties, such as surface charge, ability to support membrane proteins, packing density, or mechanical properties, is formed. The lipid composition may comprise one or more different lipids. For example, a lipid composition may contain up to 100 lipids. The lipid composition preferably contains 1 to 10 lipids. The lipid composition may comprise naturally occurring lipids and/or artificial lipids.
Lipids generally comprise a head group, an interfacial moiety, and two hydrophobic tail groups, which may be the same or different. Suitable head groups include, but are not limited to, neutral head groups such as diacylglycerol (DG) and ceramide (CM), zwitterionic head groups such as phosphatidylcholine (PC), phosphatidylethanolamine (PE) and sphingomyelin (SM), negatively charged head groups such as phosphatidylglycerol (PG), phosphatidylserine (PS), phosphatidylinositol (PI), phosphatidic acid (PA) and cardiolipin (CA), and positively charged head groups such as trimethylammonium-propane (TAP). Suitable interfacial moieties include, but are not limited to, naturally occurring interfacial moieties, such as glycerol-based or ceramide-based moieties. Suitable hydrophobic tail groups include, but are not limited to, saturated hydrocarbon chains such as lauric acid (n-dodecanoic acid), myristic acid (n-tetradecanoic acid), palmitic acid (n-hexadecanoic acid), stearic acid (n-octadecanoic acid), and arachidic acid (n-eicosanoic acid), unsaturated hydrocarbon chains such as oleic acid (cis-9-octadecenoic acid), and branched hydrocarbon chains such as phytanic acid. The chain length and the position and number of double bonds in an unsaturated hydrocarbon chain may vary. The chain length and the position and number of branches, such as methyl groups, in a branched hydrocarbon chain may also vary. The hydrophobic tail groups may be attached to the interfacial moiety as an ether or an ester. The lipid may be mycolic acid.
Lipids may also be chemically modified. The head group or the tail group of the lipid may be chemically modified. Suitable lipids whose head groups have been chemically modified include, but are not limited to, PEG-modified lipids such as 1,2-diacyl-sn-glycero-3-phosphoethanolamine-N-[methoxy(polyethylene glycol)-2000], functionalized PEG lipids such as 1,2-distearoyl-sn-glycero-3-phosphoethanolamine-N-[biotinyl(polyethylene glycol)2000], and lipids modified for conjugation such as 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine-N-(succinyl) and 1,2-dipalmitoyl-sn-glycero-3-phosphoethanolamine-N-(biotinyl). Suitable lipids whose tail groups have been chemically modified include, but are not limited to, polymerizable lipids such as 1,2-bis(10,12-tricosadiynoyl)-sn-glycero-3-phosphocholine, fluorinated lipids such as 1-palmitoyl-2-(16-fluoropalmitoyl)-sn-glycero-3-phosphocholine, deuterated lipids such as 1,2-dipalmitoyl-D62-sn-glycero-3-phosphocholine, and ether-linked lipids such as 1,2-di-O-phytanyl-sn-glycero-3-phosphocholine. Lipids may be chemically modified or functionalized to facilitate coupling of polynucleotides.
Amphiphilic layers, such as lipid compositions, typically contain one or more additives that affect the properties of the layer. Suitable additives include, but are not limited to, fatty acids such as palmitic acid, myristic acid and oleic acid, fatty alcohols such as palmityl alcohol, myristyl alcohol and oleyl alcohol, sterols such as cholesterol, ergosterol, lanosterol, sitosterol and stigmasterol, lysophospholipids such as 1-acyl-2-hydroxy-sn-glycero-3-phosphocholine, and ceramides.
In another preferred embodiment, the membrane comprises a solid-state layer. The solid-state layer may be formed from organic or inorganic materials including, but not limited to, microelectronic materials, insulating materials (such as Si3N4, Al2O3 and SiO), organic and inorganic polymers (such as polyamide), plastics, elastomers (such as two-component addition-cure silicone rubber), and glasses. The solid-state layer may be formed from graphene. Suitable graphene layers are disclosed in WO 2009/035647. If the membrane comprises a solid-state layer, the pore is typically present in an amphiphilic membrane or layer contained within the solid-state layer, for instance within a hole, well, gap, channel, trench or slit within the solid-state layer. The skilled artisan can prepare suitable solid-state/amphiphilic hybrid systems. Suitable systems are disclosed in WO 2009/020682 and WO 2012/005857. Any of the amphiphilic membranes or layers discussed above may be used.
The methods are typically performed using (i) an artificial amphiphilic layer comprising a pore, (ii) an isolated, naturally occurring lipid bilayer comprising a pore, or (iii) a cell having a pore inserted therein. The methods are typically performed using an artificial amphiphilic layer, such as an artificial triblock copolymer layer. The layer may comprise other transmembrane and/or intramembrane proteins, as well as other molecules, in addition to the pore. Suitable equipment and conditions are discussed below. The methods of the present disclosure are typically performed in vitro.
Method for characterizing an analyte
In another aspect, a method of determining the presence, absence, or one or more characteristics of a target analyte is disclosed. The method includes contacting the target analyte with a membrane comprising a pore complex such that the target analyte moves relative to a continuous channel, such as into or through the continuous channel, which comprises at least two constrictions provided by the nanopore and the accessory protein or peptide, respectively, in the pore complex; taking one or more measurements as the analyte moves relative to the channel; and thereby determining the presence, absence, or one or more characteristics of the analyte. The analyte may pass through the nanopore constriction and subsequently through the accessory protein constriction. In an alternative embodiment, the analyte may pass through the accessory protein constriction and then through the nanopore constriction, depending on the orientation of the pore complex in the membrane.
In one embodiment, the method is used to determine the presence, absence, or one or more characteristics of the target analyte. The method may be used to determine the presence, absence, or one or more characteristics of at least one analyte. The method may involve determining the presence, absence, or one or more characteristics of two or more analytes. The method may include determining the presence, absence, or one or more characteristics of any number of analytes, such as 2, 5, 10, 15, 20, 30, 40, 50, 100, or more analytes. Any number of characteristics of the one or more analytes may be determined, such as 1, 2, 3, 4, 5, 10, or more characteristics.
Binding of a molecule in the channel of the pore complex, or near either opening of the channel, affects the open-channel ion flow through the pore; this is the essence of "molecular sensing" by pore channels. As in nucleic acid sequencing applications, a change in open-channel ion flow can be measured as a change in current using suitable measurement techniques (e.g., WO 2000/28312 and D. Stoddart et al., Proc. Natl. Acad. Sci., 2010, 106, 7702-7, or WO 2009/077734). The extent of the reduction in ion flow, as measured by the reduction in current, is related to the size of the obstruction within or near the pore. Thus, binding of a molecule of interest (also referred to as an "analyte") in or near the pore provides a detectable and measurable event, forming the basis of a "biosensor". Suitable molecules for nanopore sensing include nucleic acids, proteins, peptides, polysaccharides and small molecules (referred to herein as low molecular weight (e.g., <900 Da or <500 Da) organic or inorganic compounds), such as drugs, toxins, cytokines and contaminants. Detecting the presence of biomolecules may be applied in personalized drug development, medicine, diagnostics, life sciences research, environmental monitoring, and the security and/or defense industries.
The target analyte may be a metal ion, an inorganic salt, a polymer, an amino acid, a peptide, a polypeptide, a protein, a nucleotide, an oligonucleotide, a polynucleotide, a monosaccharide, a polysaccharide, a dye, a bleach, a drug, a diagnostic agent, a recreational drug, an explosive, a toxic compound, or an environmental contaminant. The method may involve determining the presence, absence, or one or more characteristics of two or more analytes of the same type, such as two or more proteins, two or more nucleotides, or two or more drugs. Alternatively, the method may involve determining the presence, absence, or one or more characteristics of two or more different types of analytes, such as one or more proteins, one or more nucleotides, and one or more drugs.
The target analyte may be secreted from the cell. Alternatively, the target analyte may be an analyte present inside the cell, such that the analyte must be extracted from the cell before the method can be performed.
In one embodiment, the analyte is an amino acid, peptide, polypeptide, or protein. The amino acid, peptide, polypeptide or protein may be naturally occurring or non-naturally occurring. Polypeptides or proteins may include synthetic or modified amino acids therein. Several different types of modifications to amino acids are known in the art. Suitable amino acids and modifications thereof are described above. It will be appreciated that the target analyte may be modified by any method available in the art.
In a preferred embodiment, the analyte is a polynucleotide, such as a nucleic acid. A polynucleotide is defined as a macromolecule comprising two or more nucleotides. Naturally occurring nucleobases in DNA and RNA can be distinguished by their physical size. When a nucleic acid molecule or a single base passes through the channel of a nanopore, the size difference between the bases results in a directly related decrease in ion flow through the channel. The change in ion flow can be recorded. Suitable electrical measurement techniques for recording changes in ion flow are described, for example, in WO 2000/28312 and D. Stoddart et al., Proc. Natl. Acad. Sci., 2010, 106, pages 7702-7 (single-channel recording device), and in WO 2009/077734 (multi-channel recording technique). With proper calibration, the characteristic reduction in ion flow can be used to identify, in real time, the specific nucleotide and associated base passing through the channel. In typical nanopore nucleic acid sequencing, the open-channel ion flow is reduced as each nucleotide of the nucleic acid sequence of interest passes through the channel of the nanopore in turn, because the nucleotide partially obstructs the channel. This reduction in ion flow is measured using the recording techniques described above. The reduction in ion flow can be calibrated against the reduction measured for known nucleotides passing through the channel, thereby providing a means of determining which nucleotide is passing through the channel and, when performed sequentially, a means of determining the nucleotide sequence of a nucleic acid passing through the nanopore. To determine individual nucleotides accurately, it is generally desirable that the reduction in ion flow through the channel be directly related to the size of the individual nucleotide passing through the constriction (or "read head").
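The calibration idea described above can be illustrated with a minimal sketch. It assumes a deliberately simplified picture in which each observed current level is attributed to a single nucleotide and is assigned to the nearest calibrated level; the function name `call_bases` and the calibration values are hypothetical placeholders, not measured data, and real signals depend on several bases at once.

```python
def call_bases(observed_levels, calibration):
    """Assign each observed current level to the nucleotide whose
    calibrated level is closest (nearest-level classification)."""
    calls = []
    for level in observed_levels:
        # Pick the base with the smallest absolute difference in current.
        base = min(calibration, key=lambda b: abs(calibration[b] - level))
        calls.append(base)
    return "".join(calls)


# Hypothetical calibration table (arbitrary current units, placeholder values).
example_calibration = {"A": 52.0, "C": 47.0, "G": 58.0, "T": 41.0}
example_calls = call_bases([51.2, 42.0, 57.1, 46.5], example_calibration)
```

In practice the mapping from current to sequence is learned over multi-base windows rather than looked up per nucleotide; the sketch only shows why calibration against known nucleotides makes identification possible.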
It will be appreciated that whole nucleic acid polymers that are "threaded" through the pore via the action of, for example, an associated polymerase or helicase may be sequenced. Alternatively, the sequence may be determined by passage of nucleotide triphosphates sequentially removed from the target nucleic acid in the vicinity of the pore (see, for example, WO 2014/187924).
The polynucleotide or nucleic acid may comprise any combination of any nucleotides. The nucleotides may be naturally occurring or artificial. One or more nucleotides in the polynucleotide may be oxidized or methylated. One or more nucleotides in the polynucleotide may be damaged. For example, the polynucleotide may comprise a pyrimidine dimer. Such dimers are typically associated with damage by ultraviolet light and are the primary cause of cutaneous melanoma. One or more nucleotides in the polynucleotide may be modified, for example with a label or tag, suitable examples of which are known to the skilled person. The polynucleotide may comprise one or more spacers. A nucleotide typically contains a nucleobase, a sugar and at least one phosphate group. The nucleobase and sugar form a nucleoside. The nucleobase is typically heterocyclic. Nucleobases include, but are not limited to, purines and pyrimidines, and more specifically adenine (A), guanine (G), thymine (T), uracil (U), and cytosine (C). The sugar is typically a pentose sugar. Nucleotide sugars include, but are not limited to, ribose and deoxyribose. The sugar is preferably deoxyribose. The polynucleotide preferably comprises deoxyadenosine (dA), deoxyuridine (dU) and/or thymidine (dT), deoxyguanosine (dG) and deoxycytidine (dC). The nucleotides are typically ribonucleotides or deoxyribonucleotides. A nucleotide typically contains a monophosphate, diphosphate or triphosphate. A nucleotide may comprise more than three phosphates, such as 4 or 5 phosphates. Phosphates may be attached on the 5' or 3' side of a nucleotide. The nucleotides in the polynucleotide may be linked to each other in any manner. Nucleotides are typically linked by their sugar and phosphate groups, as in nucleic acids. Nucleotides may be linked by their nucleobases, as in pyrimidine dimers. The polynucleotide may be single-stranded or double-stranded. At least a portion of the polynucleotide is preferably double-stranded.
The polynucleotide is most preferably ribonucleic acid (RNA) or deoxyribonucleic acid (DNA). In particular, the method of using a polynucleotide as an analyte alternatively comprises determining one or more characteristics selected from (i) the length of the polynucleotide, (ii) the identity of the polynucleotide, (iii) the sequence of the polynucleotide, (iv) the secondary structure of the polynucleotide, and (v) whether the polynucleotide is modified.
The polynucleotide may have any length (i). For example, the polynucleotide may be at least 10, at least 50, at least 100, at least 150, at least 200, at least 250, at least 300, at least 400, or at least 500 nucleotides or nucleotide pairs in length. The polynucleotide may be 1000 or more nucleotides or nucleotide pairs, 5000 or more nucleotides or nucleotide pairs, or 100000 or more nucleotides or nucleotide pairs in length. Any number of polynucleotides may be investigated. For example, the method may involve characterizing 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 50, 100, or more polynucleotides. If two or more polynucleotides are characterized, they may be different polynucleotides or two instances of the same polynucleotide. Polynucleotides may be naturally occurring or artificial. For example, the method can be used to verify the sequence of a manufactured oligonucleotide. The method is typically performed in vitro.
The nucleotide may have any identity (ii) and includes, but is not limited to, adenosine monophosphate (AMP), guanosine monophosphate (GMP), thymidine monophosphate (TMP), uridine monophosphate (UMP), 5-methylcytidine monophosphate, 5-hydroxymethylcytidine monophosphate, cytidine monophosphate (CMP), cyclic adenosine monophosphate (cAMP), cyclic guanosine monophosphate (cGMP), deoxyadenosine monophosphate (dAMP), deoxyguanosine monophosphate (dGMP), deoxythymidine monophosphate (dTMP), deoxyuridine monophosphate (dUMP), deoxycytidine monophosphate (dCMP), and deoxymethylcytidine monophosphate. The nucleotide is preferably selected from AMP, TMP, GMP, CMP, UMP, dAMP, dTMP, dGMP, dCMP and dUMP. A nucleotide may be abasic (i.e., lack a nucleobase). A nucleotide may also lack both a nucleobase and a sugar (i.e., be a C3 spacer). The sequence (iii) of nucleotides is determined by the consecutive identity of successive nucleotides attached to each other along the polynucleotide strand in the 5' to 3' direction.
The pore complexes comprising at least two contractions are particularly useful in analyzing homopolymers. For example, a pore may be used to determine the sequence of a polynucleotide comprising two or more, such as at least 3, 4, 5, 6, 7, 8, 9, or 10 identical contiguous nucleotides. For example, the pore may be used to sequence polynucleotides comprising polyA, polyT, polyG and/or a polyC region.
In some embodiments, the CsgG pore constriction consists of residues at positions 51, 55, and 56 of SEQ ID NO: 59. As DNA passes through the constriction, the interaction of approximately 5 bases of the DNA with the constriction at any given time dominates the current signal. Although some CsgG pores (e.g., CsgG pores lacking one or more accessory proteins or fusion proteins as described herein) are very good at reading mixed-sequence regions of DNA (in which A, T, G and C are mixed), the signal becomes flat and lacks information when a homopolymeric region (e.g., polyT, polyG, polyA, polyC) is present within the DNA. Because 5 bases dominate the signal of CsgG and its constriction mutants, it is difficult to discriminate homopolymers longer than 5 bases without using additional dwell-time information. However, if the DNA also passes through a second constriction, more DNA bases interact with the combined constrictions, thereby increasing the length of homopolymer that can be discriminated.
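The effect of the number of signal-dominating bases on homopolymer discrimination can be illustrated with a toy sketch. It assumes, as a simplification, that the current at each step depends only on the k bases currently in the sensing region, so that two consecutive identical k-mers give no change in current (a "flat" step). The helper names, the flanking sequence and the wider window of 9 bases are illustrative assumptions, not values from the disclosure.

```python
def kmers(seq, k):
    """All overlapping k-mers of seq, in order of translocation."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def flat_steps(seq, k):
    """Count translocation steps at which the k-mer in the sensing region
    is identical to the previous one, i.e. steps with no current change."""
    ks = kmers(seq, k)
    return sum(1 for prev, cur in zip(ks, ks[1:]) if prev == cur)


def embed_homopolymer(n, base="T", flank="ACGAACGA"):
    """Place a homopolymer of length n between mixed-sequence flanks
    (hypothetical flank chosen so it does not extend the homopolymer)."""
    return flank + base * n + flank


# With 5 signal-dominating bases, an 8-base homopolymer yields flat steps
# that can only be counted using dwell time.
flat_with_k5 = flat_steps(embed_homopolymer(8), 5)
# With a wider combined sensing region (assumed here to be 9 bases),
# every step across the same homopolymer changes the k-mer.
flat_with_k9 = flat_steps(embed_homopolymer(8), 9)
```

Under this toy model, a homopolymer of length n produces max(0, n - k) flat steps, which is why lengthening the effective sensing region extends the homopolymer length that can be resolved from current levels alone.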
Kit for detecting a substance in a sample
In another aspect, the present disclosure also provides a kit for characterizing a polynucleotide of interest. The kit includes the disclosed pore complex and the components of a membrane. The membrane is preferably formed from these components. The pore complex is preferably present in the membrane, together forming a transmembrane pore complex channel. The kit may comprise components of any type of membrane, such as an amphiphilic layer or a triblock copolymer membrane. The kit may further comprise a polynucleotide binding protein, such as a nucleic acid processing enzyme, e.g., a polymerase or helicase. The kit may further comprise one or more anchors, such as cholesterol, for coupling the polynucleotide to the membrane. The kit may further comprise one or more polynucleotide adaptors that can be attached to the target polynucleotide to facilitate characterization of the polynucleotide. In one embodiment, an anchor, such as cholesterol, is attached to the polynucleotide adaptor. The kit may additionally include one or more other reagents or instruments that enable any of the embodiments described above to be carried out. Such reagents or instruments include one or more of a suitable buffer (aqueous solution), a means for obtaining a sample from a subject (such as a vessel or an instrument comprising a needle), a means for amplifying and/or expressing a polynucleotide, or a voltage or patch-clamp apparatus. Reagents may be present in the kit in a dry state such that a fluid sample resuspends the reagents. The kit may also optionally include instructions that enable the kit to be used in the methods of the present disclosure, or details regarding the organisms for which the method may be used. Finally, the kit may also include additional components useful in polynucleotide characterization.
It is to be understood that although specific embodiments, specific configurations, and materials and/or molecules have been discussed herein with respect to engineered cells and methods according to the present disclosure, various changes or modifications in form and detail may be made without departing from the scope and spirit of the present disclosure. The following examples are provided to better illustrate specific embodiments and should not be construed as limiting the application. The application is limited only by the claims.
Examples
Example 1
To create a helical constriction, de novo design is used to select small protein domains that fold well and project to the desired extent into the lumen of the nanopore. Many programs can be used for this purpose. This example describes a workflow that uses the program MASTER to facilitate backbone design and uses Rosetta with variable backbone geometry for sequence selection.
To create a new domain that projects into the pore lumen, a program such as RFdiffusion, CHROMA or MASTER may be used. Here, MASTER is used. The Protein Data Bank (PDB) is searched for structures meeting the following criteria: 1) stabilization of the target region of CsgF (residues 16-30); 2) projection into the nanopore so that, when all units are generated using a 9-fold symmetry operator, a new constriction is created with a diameter between […] and […] (measured as the Cα-to-Cα distance between the amino acid residues extending furthest into the pore lumen); and 3) the new domain should not clash with any atom in CsgG or with the symmetry mates of CsgF.
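The geometric part of the second criterion can be sketched as follows. The sketch assumes that the pore axis lies along z, that the 9-fold symmetry operator is a rotation about that axis, and that the constriction diameter is taken as the largest Cα-to-Cα distance between symmetry copies of the innermost residue. The function names and the example coordinate are hypothetical and are not taken from the disclosure.

```python
import math


def symmetry_copies(ca_xyz, n_fold=9):
    """Generate n_fold copies of a C-alpha coordinate by rotating it
    about the z axis (assumed here to be the pore axis)."""
    x, y, z = ca_xyz
    copies = []
    for k in range(n_fold):
        angle = 2.0 * math.pi * k / n_fold
        copies.append((
            x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle),
            z,
        ))
    return copies


def constriction_diameter(ca_xyz, n_fold=9):
    """Largest C-alpha to C-alpha distance between symmetry copies of the
    residue that extends furthest into the lumen."""
    copies = symmetry_copies(ca_xyz, n_fold)
    return max(math.dist(p, q) for p in copies for q in copies)


# Hypothetical innermost C-alpha, 10 length units from the pore axis.
example_diameter = constriction_diameter((10.0, 0.0, 0.0))
```

Because 9 is odd, no two copies sit exactly opposite each other, so the largest Cα-to-Cα distance is slightly smaller than twice the radial distance (2r·sin(4π/9) rather than 2r). A candidate backbone would be kept only if this value fell inside the target range.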
First, a helix is identified that forms an interface with the target region in CsgF and its symmetry-related neighbors and whose geometry is frequently observed in native proteins in the PDB, and which is therefore "designable". After clustering the outputs according to the RMSD of the target region plus the identified helix pairs, the best candidate is selected based on the number of closely related helix-helix pairs found in the database (fig. 1). In this way, the geometry of the helix is selected to pack closely against the target and its four N-terminal amino acids. Furthermore, the database is searched for helices that make favorable helix-helix interactions with the symmetry-related partner. The database of helical backbones is then used to select linkers (e.g., loop structures) that connect the helices (fig. 1). The sequence of the resulting backbone is then designed using Rosetta. Representative sequences (e.g., SEQ ID NOS: 1-58) were generated.
Sequences for experimental validation were selected based on the lowest energy score and the highest PackStat score, as shown in fig. 2. To further prioritize sequences, aggregation propensity can also be assessed using one of a variety of aggregation and amyloid prediction programs.
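One way to combine the two selection criteria is sketched below. This is an assumed ranking scheme (sum of the rank by energy and the rank by PackStat), not necessarily the procedure used in the example; the function name, field names and scores are hypothetical placeholders.

```python
def rank_designs(designs):
    """Order designs so that low energy and high PackStat are both favored.
    Each design is a dict with 'name', 'energy' and 'packstat' keys."""
    by_energy = sorted(designs, key=lambda d: d["energy"])       # lowest first
    by_packstat = sorted(designs, key=lambda d: -d["packstat"])  # highest first
    energy_rank = {d["name"]: i for i, d in enumerate(by_energy)}
    packstat_rank = {d["name"]: i for i, d in enumerate(by_packstat)}
    # Combined rank; ties are broken in favor of the lower energy score.
    return sorted(
        designs,
        key=lambda d: (energy_rank[d["name"]] + packstat_rank[d["name"]], d["energy"]),
    )


# Placeholder scores for three hypothetical designs.
example_designs = [
    {"name": "design_b", "energy": -110.0, "packstat": 0.60},
    {"name": "design_a", "energy": -120.0, "packstat": 0.70},
    {"name": "design_c", "energy": -100.0, "packstat": 0.65},
]
example_order = [d["name"] for d in rank_designs(example_designs)]
```

A rank sum avoids having to put energy units and PackStat values on a common scale; a weighted score or a Pareto filter would be equally reasonable choices.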
Example 2
Materials and methods
E. coli CsgG pore production
Recombinant expression vectors encoding CsgG variant nanopores with a C-terminal Strep affinity tag and an ampicillin resistance gene were transformed into chemically competent E. coli cells. Cells were plated on LB agar plates containing the appropriate antibiotics for selection and incubated overnight at 37 ℃. Individual colonies from the agar plates were inoculated into LB medium containing the appropriate antibiotics and grown overnight with shaking at 37 ℃. Cultures were diluted into auto-induction medium supplemented with the necessary antibiotics and incubated with shaking at 18 ℃ for 68 hours. Cells were harvested by centrifugation, then lysed and extracted into a buffer containing 1x Bugbuster extraction reagent (Merck 70921) and 0.1% DDM. The lysate was centrifuged, and the pores were purified from the soluble extract using affinity chromatography, heat treatment, and then size exclusion chromatography, with oligomeric nanopores selected as judged by SDS-PAGE.
CsgG/CsgF or fusion protein complex formation protocol
The CsgG-CsgF complex was prepared from the purified nanopores described above and chemically synthesized de novo fusion proteins with or without maleimide modification. For fusion proteins comprising cysteine, cyclization of the fusion protein was achieved by cross-linking the thiols at the appropriate cysteines. The nanopore was buffer-exchanged into pH 7.0 buffer without reducing agent and incubated with an 8-fold molar excess of peptide relative to CsgG monomer for 1 hour at 25 ℃. The sample was then heated at 60 ℃ for 15 minutes and centrifuged to remove any precipitate, and DTT was added to prevent any further reaction.
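The amount of peptide corresponding to an 8-fold molar excess over CsgG monomer follows from simple stoichiometry, as in the sketch below. The function name and the molecular weights used in the example are hypothetical placeholders; the actual masses depend on the specific CsgG variant and fusion protein.

```python
def peptide_mass_for_excess(pore_mass_ug, monomer_mw_da, peptide_mw_da, fold_excess=8.0):
    """Mass of peptide (in micrograms) giving the requested molar excess
    relative to CsgG monomer."""
    # micrograms divided by g/mol gives micromoles of monomer
    monomer_umol = pore_mass_ug / monomer_mw_da
    peptide_umol = fold_excess * monomer_umol
    return peptide_umol * peptide_mw_da


# Placeholder values: 100 ug of pore, 30 kDa monomer, 3 kDa peptide.
example_peptide_ug = peptide_mass_for_excess(100.0, 30000.0, 3000.0)
```

The excess is expressed per monomer rather than per assembled pore, so the oligomeric state of the pore does not enter the calculation.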
SDS-PAGE analysis
[…] μg of complex and CsgG-only pore control were each added to a 0.5 mL Protein LoBind Eppendorf tube (Fisher, 10316752) and made up to a volume of 10 μL with reaction buffer. The final volume was brought to 20 μL by adding 10 μL of 2x Laemmli buffer. The whole of each sample was loaded onto a 4%-20% TGX gel (BioRad, 5671093) run in 1x TGS buffer (Sigma, T7777). The gel was run at 300 V for 21 minutes. To image the gel, SYPRO Ruby stain (Merck, S4942) was used according to the manufacturer's instructions. The gel was then imaged on a GE Typhoon gel imager using a 450 nm laser.
For some assays, 1 μg of complex and CsgG-only pore control were each added to a PCR tube and made up to a volume of 10 μL with reaction buffer. A fresh 1 M DTT stock was prepared and added to each PCR tube to a final concentration of 10 mM. The final volume was brought to 20 μL by adding 10 μL of 2x Laemmli buffer. Each sample was heated on a PCR thermocycler at 95 ℃ for 2 minutes and cooled for 5 minutes, and then the whole of each sample was loaded onto a 4%-20% TGX gel (BioRad, 5671093) run in 1x TGS buffer (Sigma, T7777). The gel was run at 300 V for 21 minutes. To image the gel, SYPRO Ruby stain (Merck, S4942) was used according to the manufacturer's instructions. The gel was then imaged on a GE Typhoon gel imager using a 450 nm laser.
Electrical measurement
Electrical measurements were taken from CsgG alone, CsgG/CsgF, or CsgG/fusion protein complexes inserted into a MinION flow cell. After insertion of single pores into the block copolymer membrane, 1 mL of buffer comprising 25 mM potassium phosphate, 150 mM potassium ferrocyanide (II), and 150 mM potassium ferricyanide (III), pH 8.0, was flowed through the system to remove any excess nanopores.
The analyte used to assess the DNA curves was a 3.6 kilobase DNA segment from the 3' end of the lambda genome, as shown in figure 23. Analyte preparation, ligation of the analyte to Y adaptors, SPRI bead purification of the ligated analyte, and addition to the MinION flow cell were performed using the Oxford Nanopore Technologies Q-SQK-LSK110 protocol.
Electrical measurements were obtained using a MinION Mk1B from Oxford Nanopore Technologies. Standard sequencing scripts were run at -180 mV for 6 hours, with static flicks every 5 minutes to clear extended nanopore blockages. Raw data were collected in a bulk FAST5 file using MinKNOW software (Oxford Nanopore Technologies).
Discrimination analysis
FAST5 files containing DNA traces (e.g., electrical measurements) for a 3.6 kilobase DNA segment from the 3' end of the lambda genome (3.6 kb lambda) were obtained. The DNA traces were trimmed using a custom Python script to remove any electrical signal measurements captured before DNA sequencing began.
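The custom trimming script itself is not reproduced in this disclosure. Purely as an illustration, the following minimal sketch shows one way such trimming could be done, assuming that the start of DNA translocation can be recognized as a sustained drop of the current below a fraction of the open pore level. The function name `trim_leading_open_pore` and the parameters `drop_fraction` and `min_run` are hypothetical and are not taken from the original script.

```python
import numpy as np

def trim_leading_open_pore(signal, open_pore_pa, drop_fraction=0.7, min_run=20):
    """Drop samples recorded before the first sustained current drop.

    Assumption (not from the source): DNA translocation starts at the first
    run of `min_run` consecutive samples below drop_fraction * open_pore_pa.
    """
    signal = np.asarray(signal, dtype=float)
    below = signal < drop_fraction * open_pore_pa
    run = 0
    for i, flag in enumerate(below):
        run = run + 1 if flag else 0
        if run >= min_run:
            # Keep everything from the first sample of the sustained drop.
            return signal[i - min_run + 1:]
    return signal[:0]  # no sustained drop found: return an empty trace

# Synthetic trace: 100 open-pore samples (~180 pA) then 200 DNA samples (~75 pA).
raw = np.concatenate([np.full(100, 180.0), np.full(200, 75.0)])
trimmed = trim_leading_open_pore(raw, open_pore_pa=180.0)
```

The current values in the synthetic trace follow the approximate open pore and median DNA currents reported for CsgG-only pores; real traces are noisy, so a practical script would likely smooth the signal or tune the thresholds.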
The trimmed 3.6 kb lambda traces and the corresponding genome reference for this region were used as inputs for training a neural network. The neural network has 4 layers and models sequences of a user-specified window length together with the current levels associated with those sequences. The window length specified for these models allows a region of ±12 nucleotides to contribute to the current level at any one position.
The trained neural network was used to predict the current levels corresponding to the 3.6 kb lambda DNA reference sequence. It was also used to predict the current levels for all possible single-base edits of the sequence.
Changing the base at a single position (L) in the sequence changes the predicted current as that base passes through the main constriction of the pore, and also changes the current before and after the base passes through the main constriction. The predicted current levels for the compiled set of 3.6 kb lambda sequences were analyzed to calculate the range of predicted currents at position L+x (where x is the offset) when the base at position L was changed. Offsets between -16 and +16 were analyzed at each position. The median range of the predicted current at each offset was calculated to give the data in the graphs. The model is centered such that the maximum peak, indicating the CsgG constriction, corresponds to position 0.
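The calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used to produce the figures: `toy_predict` is a hypothetical stand-in for the trained neural network (a simple weighted sum over a ±2 base window), single-base edits are taken to be substitutions among A, C, G and T, and the function and variable names are invented for this example.

```python
import numpy as np

BASES = "ACGT"

def toy_predict(seq):
    """Hypothetical stand-in for the trained network: the current at each
    position is a weighted sum of base codes in a +/-2 base window."""
    codes = np.array([BASES.index(b) for b in seq], dtype=float)
    kernel = np.array([0.5, 1.0, 4.0, 1.0, 0.5])  # largest weight at the centre
    return np.convolve(codes, kernel, mode="same")

def discrimination_profile(seq, predict, max_offset=16):
    """Median range of predicted current at L+x when the base at L is changed."""
    offsets = np.arange(-max_offset, max_offset + 1)
    ranges = []
    for pos in range(max_offset, len(seq) - max_offset):
        # Predicted current levels for each possible base at position `pos`.
        preds = np.array([predict(seq[:pos] + b + seq[pos + 1:]) for b in BASES])
        window = preds[:, pos - max_offset: pos + max_offset + 1]
        # Range (max minus min) across the four variants at each offset.
        ranges.append(window.max(axis=0) - window.min(axis=0))
    # Median over all edited positions gives one value per offset.
    return offsets, np.median(np.array(ranges), axis=0)

rng = np.random.default_rng(0)
seq = "".join(rng.choice(list(BASES), size=100))
offsets, profile = discrimination_profile(seq, toy_predict)
```

With this toy predictor the profile has a single peak at offset 0, which mirrors the single discrimination peak attributed to the CsgG constriction. A model trained on a pore with an additional constriction would be expected to give a further peak at a non-zero offset.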
Example 3
De novo fusion protein sequences designed using Rosetta were analyzed, and sequences were selected for experimental validation based on the lowest energy score and highest PackStat score (figure 2). PSIPRED analysis (e.g., as described in McGuffin LJ, Bryson K, Jones DT, Bioinformatics, 16, 404-405, 2000) was performed to predict fusion protein secondary structure. Residues are colored according to whether they are predicted to be strands, helices, or coils. The secondary structure analysis of the mature sequences of a de novo designed fusion protein (e.g., the extended CsgF protein) and wild-type CsgF is shown in figure 3A. Structural analysis of the de novo designed fusion proteins ONT1 to ONT10, ONT11 to ONT20 and ONT21 to ONT25 is shown in fig. 3B to 3C.
The three-dimensional structures of the candidate de novo fusion protein sequences were also studied using protein folding algorithms. The predicted 3-D structures of the de novo designed fusion proteins ONT1 to ONT10, ONT11 to ONT20 and ONT21 to ONT25 are shown in FIGS. 4A to 4C. The structures are colored according to a confidence measure, the predicted local distance difference test (pLDDT).
SDS-PAGE gel analysis was performed on CsgG-only pores and CsgG/fusion protein complexes. The complexes contained either the CsgF-del(S31-F119) control or de novo designed fusion proteins, with or without maleimide crosslinking agent (FIG. 5). Complexes comprising fusion proteins showed a band shift, indicating that these samples were nanopore complexes. Note that these samples were not heated prior to loading onto the gel.
SDS-PAGE gel analysis was also performed on CsgG-only pores and CsgG/fusion protein complexes containing either the CsgF-del(S31-F119) control or de novo designed fusion proteins, with or without maleimide crosslinking agent (FIG. 6). When boiled in the presence of DTT prior to loading onto the gel, the pores break down into their constituent monomers. No band shift was observed in the absence of maleimide crosslinking, indicating that these bands consist of CsgG monomers only. Lane 7 shows a band shift compared to the CsgG-only control, indicating that the fusion protein was covalently bound to the CsgG pore due to the presence of maleimide. Lanes 8 and 9 show further band shifts due to the increased mass of the fusion protein. This indicates that the fusion protein is covalently bound to the CsgG pore.
The ion current (pA) versus time (s) was measured for single-stranded DNA translocation through CsgG-only pores. Each individual plot corresponds to a single pore inserted into the MinION flow cell. The observed open pore current for CsgG-only pores was about 180 pA at an applied voltage of -180 mV. Table 1 below shows representative data for the median range, median noise, and median signal-to-noise ratio (SNR) of protein pore complexes as described in the present disclosure.
Table 1. Metrics table
Figures 7 to 11 show representative traces of ion current (pA) versus time (s) as single-stranded DNA translocates through a pore of CsgG alone, CsgG comprising the del(S31-F119) CsgF peptide, or CsgG comprising a de novo designed fusion protein. The raw current trace is shown as a black line and the event detection signal as a red line. For each pore, the top row shows the complete DNA current trace, while the bottom row shows an enlarged view of the first portion of the trace. The open pore current of CsgG-only pores was observed to be about 175 pA to 200 pA, and the median current of the DNA trace was about 75 pA. For pores containing CsgF peptides, the open pore current was about 90 pA to 120 pA and the median current was about 35 pA to 50 pA.
FIG. 7 shows traces of DNA translocation through CsgG-only pores. FIG. 8 shows representative traces of ion current (pA) versus time (s), with (right) or without (left) maleimide crosslinker, as single-stranded DNA translocates through CsgG comprising the del(S31-F119) CsgF peptide. FIG. 9 shows a representative trace of ion current (pA) versus time (s) as single-stranded DNA translocates through CsgG comprising the de novo designed fusion protein ONLP20623 in the absence of maleimide crosslinker. FIG. 10 shows representative traces of ion current (pA) versus time (s) as single-stranded DNA translocates through CsgG comprising either of the de novo designed fusion proteins ONLP20624 (without maleimide crosslinker) or ONLP20627 (with maleimide crosslinker). FIG. 11 shows representative traces of ion current (pA) versus time (s) as single-stranded DNA translocates through CsgG comprising either of the de novo designed fusion proteins ONLP20628 (with maleimide crosslinker) or ONLP20625 (without maleimide crosslinker).
In some embodiments, the fusion protein comprises an arginine residue at position 37 (37R) as well as cysteine residues that form an internal disulfide bond within the peptide, i.e., that cyclize the fusion protein.
As the DNA molecule translocates through the pore, a curve is generated showing each location within the pore and its contribution to the overall change in ion current level ("discrimination"). The distance within the pore is measured in nucleotide steps relative to the main constriction. A negative value corresponds to a position below the main constriction, while a positive value corresponds to a position above the main constriction (CsgG). The dashed box shows the region that will be affected by the introduction of the de novo designed fusion protein. FIG. 12 shows a representative curve when DNA molecules translocate through CsgG-only pores. The CsgG-only pore (with or without Q153C) shows one major discrimination peak at position 0. FIG. 13 shows a representative curve when DNA molecules translocate through a CsgG/CsgF pore. CsgG-CsgF-del(S31-F119) pores with (right) or without (left) maleimide crosslinker showed two discrimination peaks. As seen in CsgG-only pores, the main discrimination peak is at position 0, and the additional discrimination peak is 4 to 6 nucleotides below the main constriction (positions -4 to -6). This additional discrimination region has less effect on ion current than the main discrimination peak at position 0. FIG. 14 shows a representative curve when DNA molecules translocate through a CsgG/fusion protein (ONLP20641 or ONLP20644) pore. Complexes comprising CsgG and a cyclized de novo designed fusion protein containing K37R, with (right) or without (left) maleimide crosslinker, show three distinct peaks. As seen in CsgG-only pores, the main discrimination peak is at position 0, and the additional peaks are at positions -6 and -9. The peak at position -9 corresponds to the expected constriction produced by the de novo designed fusion protein when folded in the correct orientation.
Example 3
Pores formed with nine subunits of the sequence shown in SEQ ID NO: 61 (with or without maleimide crosslinker; not cyclized in either case) were also tested as described in Example 2. The results are shown in FIGS. 17 to 18.
Representative sequences
>(SEQ ID NO:1)CsgF-WT-del(S31-F119)-Ext(31-GGELAAKLWANGDETNALSLFQTIIQS)(ONLP20623)
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNGGELAAKLWANGDETNALSLFQTIIQS
>(SEQ ID NO:2)CsgF-WT-K37R-del(S31-F119)-Ext(31-GGELAAKLWANGDETNALSLFQTIIQS)(ONLP20624)
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNGGELAARLWANGDETNALSLFQTIIQS
>(SEQ ID NO:3)CsgF-WT-N24C/K37R-del(S31-F119)-Ext(31-GGELAAKLWANGDETNALSLFQTIIQSC)(ONLP20625)
GTMTFQFRNPNFGGNPNNGAFLLCSAQAQNGGELAARLWANGDETNALSLFQTIIQSC
>(SEQ ID NO:4)Mat-CsgF-Eco-(WT-Del(S31-F119)-Ext(31-AGELAKKLWENGNVNQALSLFQTVIQS)(ONLZ19432,DGLONT76)
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAKKLWENGNVNQALSLFQTVIQS
>(SEQ ID NO:5)Mat-CsgF-Eco-(WT-K36R/K37R-Del(S31-F119)-Ext(31-AGELAKKLWENGNVNQALSLFQTVIQS)(ONLZ19431)
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELARRLWENGNVNQALSLFQTVIQS
>(SEQ ID NO:6)Mat-CsgF-Eco-(WT-N24C/K36R/K37R-Del(S31-F119)-Ext(31-AGELARRLWENGNVNQALSLFQTVIQSC)(ONLZ19781)
GTMTFQFRNPNFGGNPNNGAFLLCSAQAQNAGELARRLWENGNVNQALSLFQTVIQSC
>(SEQ ID NO:7)ONT113_2
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAAELAAKLWANADETNALSLFQTIIQS
>(SEQ ID NO:8)ONT113_3
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAAELAAKLWANADETNALSLFQTLIQS
>(SEQ ID NO:9)ONT1
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLFKKGDLTNALSLFQTVIQS
>(SEQ ID NO:10)ONT2
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELVEKLFKNGDWTNAISIFQTVIQS
>(SEQ ID NO:11)ONT3
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAEKLWRNGDETNALSLFQTVIQS
>(SEQ ID NO:12)ONT4
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAEKLWKNGDETNALSLFQTVIQS
>(SEQ ID NO:13)ONT5
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAKKLWENGDETNALSLFQTVVQS
>(SEQ ID NO:14)ONT6
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAEKLWRNGNESDALSLFQTVIQS
>(SEQ ID NO:15)ONT7
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAKKLFENGDKTNALSLFQTVIQS
>(SEQ ID NO:16)ONT8
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAKKLWENGDETNALSLFQTVIQS
>(SEQ ID NO:17)ONT9
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAKKLWEKGNSEDALALFRTVVQS
>(SEQ ID NO:18)ONT10
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAKKLFDNGDMENAMKLFQTVIAS
>(SEQ ID NO:19)ONT11
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAEKLWRNGDKDRALALFRTVIQS
>(SEQ ID NO:20)ONT12
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELADKLWKNGDKDRALSLFQTVIQS
>(SEQ ID NO:21)ONT13
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAKKLFDNGDMDRALALFRTVIAS
>(SEQ ID NO:22)ONT14
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAKKLFDNGNEEDALALFRTVVAS
>(SEQ ID NO:23)ONT15
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLWKKGDEENALKLFRTVVTS
>(SEQ ID NO:24)ONT16
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLFKNGNMEDALKLFRTVIAS
>(SEQ ID NO:25)ONT17
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGKVAAILWKNGNKSDALSLFQTVVTS
>(SEQ ID NO:26)ONT18
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLFKNGDLTNALSLFQTVVQS
>(SEQ ID NO:27)ONT19
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELGLKLLRKGDVETALTLFAQVISG
>(SEQ ID NO:28)ONT20
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELGLKLILKGDLETALKLFAIVIAG
>(SEQ ID NO:29)ONT21
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELGLKLLRKGDVETALKLFAIVIAG
>(SEQ ID NO:30)ONT22
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAKKLYENGLIELALMLFALVIAS
>(SEQ ID NO:31)ONT23
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELYKKLWDNGEVDKALDLFAKIIAG
>(SEQ ID NO:32)ONT24
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELGKKLIEKGDLETALKLFAIVIAG
>(SEQ ID NO:33)ONT25
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGEIALRLLKNGKEEEALKTLLVTIAG
>(SEQ ID NO:34)ONT26
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLWKKGDETNALSLFQTVVTS
>(SEQ ID NO:35)ONT27
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGKVAAILWKNGNKSDALSLFQTVVTS
>(SEQ ID NO:36)ONT28
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAKKLWEKGDETNALSLFQTVVTS
>(SEQ ID NO:37)ONT29
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGDLAAKLWKKGDETNALSLFQTVVTS
>(SEQ ID NO:38)ONT30
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLWKNGNSSDALSLFQTVVTS
>(SEQ ID NO:39)ONT31
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAKKLWEKGDETNALSLFQTVVTS
>(SEQ ID NO:40)ONT32
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAKKLWEKGDSSNALSLFQTVVTS
>(SEQ ID NO:41)ONT33
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGDLAAKLWKNGDETNALSLFQTVVTS
>(SEQ ID NO:42)ONT34
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLFKNGDLTNALSLFQTVVQS
>(SEQ ID NO:43)ONT35
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLWKKGDETNALSLFQTVVTS
>(SEQ ID NO:44)ONT36
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLFNSGDLDRALALFRTVVTS
>(SEQ ID NO:45)ONT37
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGKVAKELYDNGDEKWALLLFRTVVTS
>(SEQ ID NO:46)ONT38
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGKVAAELYKNGDEKNALLLFRTVVAS
>(SEQ ID NO:47)ONT39
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLFKNGDMENALALFRTVVTS
>(SEQ ID NO:48)ONT40
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAKKLWEKGNSEDALALFRTVVQS
>(SEQ ID NO:49)ONT41
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLFNKGDEDRALALFRTVVQS
>(SEQ ID NO:50)ONT42
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLWKNGDEENALALFRTVVTS
>(SEQ ID NO:51)ONT43
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAEKLWRSGDADRALALFRTVVTS
>(SEQ ID NO:52)ONT44
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLWKNGNEEDALALFRTVVTS
>(SEQ ID NO:53)ONT45
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLFNNGDEDRALALFRTVVQS
>(SEQ ID NO:54)ONT46
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLWKKGDEDRALALFRTVVTS
>(SEQ ID NO:55)ONT47
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLFNSGDEDRALALFRTVVQS
>(SEQ ID NO:56)ONT48
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLYNNGDLDRADATFRTVVQS
>(SEQ ID NO:57)ONT49
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAKKLWENGNEEDALALFRTVVTS
>(SEQ ID NO:58)ONT50
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGEIAKQLWEKGDESSAITVATIVLSS
Wild-type Escherichia coli CsgG protein monomer (no signal sequence)
CLTAPPKEAARPTLMPRAQSYKDLTHLPAPTGKIFVSVYNIQDETGQFKPYPASNFSTAVP
QSATAMLVTALKDSRWFIPLERQGLQNLLNERKIIRAAQENGTVAINNRIPLQSLTAANIM
VEGSIIGYESNVKSGGVGARYFGIGADTQYQLDQIAVNLRVVNVSTGEILSSVNTSKTILS
YEVQAGVFRFIDYQRLLEGEVGYTSNEPVMLCLMSAIETGVIFLINDGIDRGLWDLQNK
AERQNDILVKYRHMSVPPES
Residues 1-30 of the CsgF peptide (SEQ ID NO: 60)
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQN
>(SEQ ID NO:61)CsgF-WT-del(S31-F119)-Ext(31-AGILAAQLWNNGDYDRALSLFIAVVQS-57)
GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGILAAQLWNNGDYDRALSLFIAVVQS