WO2024138382A1

WO2024138382A1 - Porin monomer, porin, mutant thereof and use thereof

Info

Publication number: WO2024138382A1
Application number: PCT/CN2022/142486
Authority: WO
Inventors: 陈俊毅; 林雪琼; 钟沛彬; 季州翔; 王乐乐; 曾涛; 郭斐; 黎宇翔; 董宇亮; 章文蔚; 徐讯
Original assignee: BGI Shenzhen Co Ltd
Current assignee: BGI Shenzhen Co Ltd
Priority date: 2022-12-27
Filing date: 2022-12-27
Publication date: 2024-07-04
Anticipated expiration: 2025-06-27
Also published as: CN120265646A

Abstract

Provided in the present invention are a porin monomer, a porin, a mutant thereof and the use thereof. The porin monomer comprises: (a) a protein composed of an amino acid sequence as shown in SEQ ID NO: 1; or (b) a protein mutant, wherein the amino acid sequence of the protein mutant is subjected to substitution, deletion and/or addition of one or several amino acids at at least one of the following positions in the amino acid sequence as shown in SEQ ID NO: 1: position 63, position 64, etc., and the protein mutant has the function of forming a pore channel structure via polymerization; or (c) a porin monomer having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% identity to the protein of (a) or (b), and having the function of forming a pore channel structure via polymerization. By means of the present invention, the problem in the prior art of the poor stability of a pore channel of the porin can be solved; and the present invention is applicable to the field of single-molecule sequencing.

Description

Porin monomer, porin, mutant thereof and use thereof

Technical Field

The present invention relates to the field of single-molecule sequencing, and in particular to a porin monomer, a porin, mutants thereof, and uses thereof.

Background Art

At present, commercial nanopore instruments exist, including the MinION, GridION and PromethION sequencers from Oxford Nanopore Technologies (UK) and the Qitan QNome-3841 nanopore gene sequencer. However, these instruments still have major shortcomings in sequencing accuracy, throughput, chip stability and range of applications, and cannot meet the ultimate needs of molecular biology research. There is therefore an urgent need for a single-molecule sequencer with high accuracy, high integration and high stability. A nanopore-based single-molecule sequencer is a detection system that tightly integrates multiple disciplines and technologies. Developing such an instrument requires deep interdisciplinary collaboration across physics, biology, chemistry, semiconductors, computer science and other fields, so that a high-precision single-molecule nanopore sequencing system can be built up from its underlying core modules.

Nanopore sequencing requires the sensing region inside the pore protein to be sharp enough to give high spatial resolution both laterally and longitudinally. To date, only a few natural porins, such as Mycobacterium smegmatis porin A (MspA) and the curli-specific transport channel (CsgG), meet the industrial requirements for a porin used as a single-molecule detector. How to find more high-performing porins suitable for single-molecule sequencing by gene mining remains an unsolved problem.

Summary of the Invention

The main object of the present invention is to provide a porin monomer, a porin, mutants thereof and uses thereof, so as to solve the problem of poor pore stability of porins in the prior art.

To achieve the above object, according to a first aspect of the present invention, a porin monomer is provided, which comprises: (a) a protein consisting of the amino acid sequence shown in SEQ ID NO: 1; or (b) a protein mutant whose amino acid sequence has a substitution, deletion and/or addition of one or several amino acids at at least one of the following positions of the amino acid sequence shown in SEQ ID NO: 1: 63, 64, 65, 66, 67, 68, 69, 103, 107, 108, 109, 113, 116, 117, 123, 126, 153, 156, 167, 169, 201, 206, 209, 213, 216, 218, the protein mutant having the function of forming a pore structure by polymerization; or (c) a porin monomer that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% identity with the protein in (a) or (b), and has the function of forming a pore structure by polymerization.

Further, in (b), the amino acid substituted at each position is independently selected from the following: G63 is mutated to G63A, G63S or G63T; E64 is mutated to E64G, E64A, E64S, E64T, E64N or E64Q; K65 is mutated to K65G, K65A, K65S, K65T, K65N or K65Q; F66 is mutated to F66G, F66A, F66S, F66T, F66N or F66Q; A67 is mutated to A67G, A67S, A67T, A67N or A67Q; N68 is mutated to N68G, N68A, N68S, N68T or N68Q; I69 is mutated to I69G, I69A, I69S, I69T, I69N or I69Q; D103 is mutated to D103N, D103A, D103G, D103S, D103T, D103Q, D103R or D103K; K107 is mutated to K107N, K107A, K107G, K107S, K107T or K107Q; R109 is mutated to R109W, R109F, R109Y, R109N, R109Q, R109S, R109T, R109A or R109G; R113 is mutated to R113W, R113F, R113Y, R113N, R113Q, R113S, R113T, R113A or R113G; R116 is mutated to R116N, R116A, R116G, R116S, R116T or R116Q; E117 is mutated to E117N, E117A, E117G, E117S, E117T, E117Q, E117R or E117K; E123 is mutated to E123N, E123A, E123G, E123S, E123T, E123Q, E123R or E123K; K126 is mutated to K126N, K126Q, K126S, K126T, K126A or K126G; D153 is mutated to D153A, D153G, D153V, D153L, D153I, D153Y, D153F or D153W; R156 is mutated to R156A, R156G, R156N, R156Q, R156S, R156T, R156D or R156E; R167 is mutated to R167N, R167Q, R167S, R167T, R167A or R167G; D169 is mutated to D169N, D169Q, D169S, D169T, D169A or D169G; D201 is mutated to D201N, D201Q, D201S, D201T, D201A or D201G; R206 is mutated to R206N, R206Q, R206S, R206T, R206A, R206G, R206D or R206E; D209 is mutated to D209N, D209Q, D209S, D209T, D209A, D209G, D209R or D209K; E213 is mutated to E213N, E213Q, E213S, E213T, E213A or E213G; E216 is mutated to E216N, E216Q, E216S, E216T, E216A or E216G; E218 is mutated to E218N, E218Q, E218S, E218T, E218A or E218G.

To achieve the above object, according to a second aspect of the present invention, a protein construct is provided, which is composed of two or more of the above porin monomers linked covalently or non-covalently.

To achieve the above object, according to a third aspect of the present invention, a porin is provided, which is composed of 7 to 11 of the above porin monomers linked covalently or non-covalently.

Further, the porin is composed of 9 porin monomers linked non-covalently.

Further, the pore diameter of the porin is 0.5 to 3 nm.

To achieve the above object, according to a fourth aspect of the present invention, a kit is provided, which comprises the above porin monomer, the above protein construct, or the above porin.

Further, the kit also includes a membrane layer, and the membrane layer includes a lipid layer or an artificial polymer membrane.

Further, the kit also includes a sequencing buffer and/or cholesterol-linked single-stranded DNA.

To achieve the above object, according to the fourth aspect of the present invention, an isolated DNA molecule is provided, which has a nucleotide sequence encoding the above porin monomer, the above protein construct, or the above porin.

Further, the DNA molecule has the nucleotide sequence shown in SEQ ID NO: 8.

Further, the DNA molecule has 70% or more, preferably 80% or more, more preferably 90% or more, and still more preferably 95% or more identity with the nucleotide sequence shown in SEQ ID NO: 8, and encodes a protein with the same function.

To achieve the above object, according to a fifth aspect of the present invention, a recombinant vector is provided, which comprises the above DNA molecule.

To achieve the above object, according to a sixth aspect of the present invention, a host cell is provided, which is transformed with the above recombinant vector.

To achieve the above object, according to a seventh aspect of the present invention, a nanopore sensor is provided, which comprises: a membrane layer; and a porin inserted into the membrane layer to form a pore, such that a current flows through the pore when a voltage is applied across the membrane layer; wherein the porin comprises the above porin.

Further, the membrane layer includes a lipid layer or an artificial polymer membrane; preferably, the lipid layer includes amphiphilic lipids; preferably, the amphiphilic lipids comprise a phospholipid bilayer; preferably, the lipid layer includes a planar membrane layer or a liposome; preferably, the liposome includes a multilamellar liposome or a unilamellar liposome; preferably, the lipid layer includes a phospholipid bilayer composed of diphytanoylphosphatidylcholine.

To achieve the above object, according to an eighth aspect of the present invention, a nanopore sequencing device is provided, which comprises the above nanopore sensor.

Further, the nanopore sequencing device includes: an electrolytic cell containing a sequencing buffer; the nanopore sensor, which is located in the center of the electrolytic cell and divides the electrolytic cell and the sequencing buffer into a positive electrolyte region and a negative electrolyte region; and a first electrode and a second electrode, which are arranged in the positive electrolyte region and the negative electrolyte region respectively and are connected to a signal processing chip; preferably, the first electrode and the second electrode comprise a metal or a composite electrode material; preferably, the first electrode and the second electrode are different, being silver and silver chloride respectively; or the first electrode and the second electrode are the same, each independently selected from gold, platinum, graphene or titanium nitride.

To achieve the above object, according to a ninth aspect of the present invention, a sequencing method is provided, which uses the above porin, the above nanopore sensor, or the above nanopore sequencing device to detect and analyze the electrical signal generated when a biomolecule to be tested passes through the pore of the porin, thereby determining the sequence of the biomolecule to be tested.

Further, the biomolecule to be tested includes any one of the following modified or unmodified biomolecules: DNA, RNA or polypeptide.

Further, the biomolecule to be tested is a target nucleic acid sequence, and the sequencing method includes: (a) contacting the target nucleic acid sequence with a nucleic acid binding protein, which controls the speed at which the target nucleic acid sequence moves through the pore of the porin; and (b) applying a voltage across the pore so that the target nucleic acid sequence moves through the pore, and measuring the electrical signal through the pore, wherein different types of nucleotides generate different electrical signals when passing through the pore, so that the sequence information of the target nucleic acid is determined from the electrical signal; preferably, the nucleic acid binding protein is selected from nucleases, polymerases, topoisomerases, ligases, helicases and single-strand binding proteins; preferably, the electrical signal includes a current.

To achieve the above object, according to a tenth aspect of the present invention, use of the above porin monomer, the above porin, the above kit, the above DNA molecule, the above recombinant vector, the above host cell, the above nanopore sensor, or the above nanopore sequencing device in small-molecule detection, DNA sequencing, RNA sequencing or polypeptide sequencing is provided.

The present invention provides a new porin monomer that can polymerize to form the porin BCP20. The protein and its mutants have good stability, meet the requirements of single-molecule nanopore sequencing, and enable the detection of small molecules such as nucleotides, amino acids, sugars and vitamins, as well as DNA, RNA and polypeptides.

Brief Description of the Drawings

The accompanying drawings, which form a part of the present application, are provided for a further understanding of the present invention. The exemplary embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:

FIG. 1 shows a side view of the predicted three-dimensional structure of BCP20 according to Example 1 of the present invention.

FIG. 2 shows a top view of the predicted three-dimensional structure of BCP20 according to Example 1 of the present invention.

FIG. 3 shows a schematic diagram of the predicted side-chain structure of the key amino acids in the gating region (sensor region) of BCP20 according to Example 1 of the present invention, wherein A shows the distances between the key amino acids in the sensor region, and B shows an enlarged view of the key amino acids in the sensor region.

FIG. 4 shows the SDS-PAGE image obtained from purification of the BCP20 protein according to Example 4 of the present invention.

FIG. 5 shows a schematic diagram of the sequencing library structure according to Example 5 of the present invention, wherein a: sense strand (top strand); b: antisense strand (bottom strand); c: double-stranded target fragment to be tested; d: helicase BCH105.

FIG. 6 shows the open-pore current of BCP20 in a phospholipid bilayer at different voltages according to Example 6 of the present invention.

FIG. 7 shows the current trace as the DNA to be tested passes through the nanopore BCP20 according to Example 7 of the present invention.

FIG. 8 shows a schematic diagram of the structure of a sequencing library bound to a cholesterol-bearing single-stranded DNA according to Example 7 of the present invention, wherein a: sense strand (top strand); b: antisense strand (bottom strand); c: double-stranded target fragment to be tested; d: helicase BCH105; e: cholesterol-bearing single-stranded DNA.

FIG. 9 shows the specific structure of iSp18 according to Example 5 of the present invention.

Detailed Description

It should be noted that, where no conflict arises, the embodiments of the present application and the features in those embodiments can be combined with each other. The present invention is described in detail below with reference to the embodiments.

As mentioned in the Background Art, very few existing porins can be used for single-molecule nanopore sequencing, and their protein stability is poor. In the present application, the inventors therefore set out to develop a new porin. Using gene mining guided by computer-assisted structure prediction, a new porin monomer was mined from a deep-sea metagenome (from samples taken at a depth of 11,000 meters in the Mariana Trench). Multiple monomers can be covalently linked or non-covalently polymerized into a porin with a pore. The porin can be used as a detection protein for small molecules such as nucleotides, amino acids, sugars and vitamins, or in nanopore-based DNA, RNA or polypeptide sequencing. The series of protection schemes of the present application is proposed on this basis.

In a first typical embodiment of the present application, a porin monomer is provided, which comprises: (a) a protein consisting of the amino acid sequence shown in SEQ ID NO: 1; or (b) a protein mutant whose amino acid sequence has a substitution, deletion and/or addition of one or several amino acids at at least one of the following positions of the amino acid sequence shown in SEQ ID NO: 1: 63, 64, 65, 66, 67, 68, 69, 103, 107, 108, 109, 113, 116, 117, 123, 126, 153, 156, 167, 169, 201, 206, 209, 213, 216, 218, the protein mutant having the function of forming a pore structure by polymerization; or (c) a porin monomer that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% identity with the porin monomer in (a) or (b), and has the function of forming a pore structure by polymerization.

SEQ ID NO: 1:

The porin monomer defined in (a) above can polymerize to form the porin BCP20, which has a pore structure. When applied to nanopore sequencing, it allows the biomolecules to be tested to pass through the pore one by one, generating a current signal. After the protein is mutated on the basis of sequence (a), for example by substituting, deleting and/or adding one or several amino acids at other positions such as the mutation sites disclosed in (b) or (c), a protein that retains the pore structure and function of the above porin can still be obtained. Mutating the porin monomer may affect the stability of the protein and its assemblies, the inner diameter of the pore, and the amino acid residues on the inner wall of the pore, and thereby affect its physicochemical properties and how well the biomolecules to be tested pass through it. However, the routine procedures for mutation, and the methods for screening proteins that have a nanopore structure and functional activity, are well known to those skilled in the art.

Identity in this specification refers to the "identity" between amino acid sequences, that is, the overall proportion of amino acid residues of the same type in the amino acid sequences. The identity of amino acid sequences can be determined using alignment programs such as BLAST (Basic Local Alignment Search Tool) and FASTA.
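The identity definition above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned (for example by BLAST or FASTA) into equal-length strings with `-` marking gaps, and it takes the number of aligned columns as the denominator. Alignment programs differ in how they choose this denominator, so the function `percent_identity` and its conventions are illustrative assumptions, not the method used in the application.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of aligned columns holding the same residue in both sequences.

    Assumption: inputs are pre-aligned, equal-length strings; a residue
    opposite a gap counts as a mismatch; gap-gap columns are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # column carries no information
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / columns


# Toy sequences (hypothetical, not taken from SEQ ID NO: 1).
print(percent_identity("ACDE", "ACDF"))
print(percent_identity("AC-E", "ACDE"))
```

A candidate monomer would fall within option (c) on the identity criterion only if a value computed in this spirit reaches the stated threshold (at least 70%) and the protein also retains the pore-forming function.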

Proteins that have 70%, 75%, 80%, 85%, 90%, 95%, 99% or more identity (for example 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 98.5%, 99%, 99.5%, 99.6%, 99.7%, 99.8% or more, or even 99.9% or more) and the same function are homologous proteins obtained by amino acid mutation; their active sites, active pockets, mechanisms of action, protein structures and so on are most likely the same as those of the protein provided by sequence (a).

As used herein, amino acid residues are abbreviated as follows: alanine (Ala; A), asparagine (Asn; N), aspartic acid (Asp; D), arginine (Arg; R), cysteine (Cys; C), glutamic acid (Glu; E), glutamine (Gln; Q), glycine (Gly; G), histidine (His; H), isoleucine (Ile; I), leucine (Leu; L), lysine (Lys; K), methionine (Met; M), phenylalanine (Phe; F), proline (Pro; P), serine (Ser; S), threonine (Thr; T), tryptophan (Trp; W), tyrosine (Tyr; Y) and valine (Val; V).

Regarding rules for substitution and replacement: in general, exchanging amino acids with similar properties has a similar effect. For example, conservative amino acid replacements may occur in the above homologous proteins. "Conservative amino acid replacements" include, but are not limited to:

hydrophobic amino acids (Ala, Cys, Gly, Pro, Met, Val, Ile, Leu) replaced by other hydrophobic amino acids;

hydrophobic amino acids with bulky side chains (Phe, Tyr, Trp) replaced by other hydrophobic amino acids with bulky side chains;

amino acids with positively charged side chains (Arg, His, Lys) replaced by other amino acids with positively charged side chains;

amino acids with polar, uncharged side chains (Ser, Thr, Asn, Gln) replaced by other amino acids with polar, uncharged side chains.

Those skilled in the art may also make conservative amino acid substitutions according to well-known amino acid substitution rules in the prior art, such as the BLOSUM62 scoring matrix.
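The four replacement groups listed above can be encoded directly. The following is a minimal sketch in which the group names and the function `is_conservative` are illustrative; it covers only the groups explicitly named in the text, which the text itself describes as non-exhaustive, and it does not implement BLOSUM62 scoring.

```python
# Groups exactly as listed in the text, in one-letter codes.
CONSERVATIVE_GROUPS = {
    "hydrophobic": set("ACGPMVIL"),
    "bulky hydrophobic": set("FYW"),
    "positively charged": set("RHK"),
    "polar uncharged": set("STNQ"),
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues differ and fall in the same listed group."""
    if original == replacement:
        return False  # not a replacement at all
    return any(
        original in members and replacement in members
        for members in CONSERVATIVE_GROUPS.values()
    )


print(is_conservative("S", "T"))  # both polar, uncharged
print(is_conservative("A", "W"))  # different groups
```

Under this encoding a pair such as Asp/Glu is reported as non-conservative only because the text does not list an acidic group; a matrix-based rule such as BLOSUM62 could classify it differently.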

The "AlphaFold2-Multimer" used in this application is a publicly available artificial intelligence model that can predict the conformation of protein complexes. Its predictions of protein three-dimensional structure can come very close to what is observed experimentally with equipment such as cryo-electron microscopes. It can therefore provide relatively realistic protein structures to guide the study of protein structure and activity.

In a preferred embodiment, in (b), the amino acid substituted at each position is independently selected from the following: G63 is mutated to G63A, G63S, G63T; E64 is mutated to E64G, E64A, E64S, E64T, E64N, E64Q; K65 is mutated to K65G, K65A, K65S, K65T, K65N, K65Q; F66 is mutated to F66G, F66A, F66S, F66T, F66N, F66Q; A67 is mutated to A67G, A67S, A67T, A67N, A67Q; N68 is mutated to N68G, N68A, N68S, N68T, N68Q; I69 is mutated to I69G, I69A, I69S, I69T, I69N, I69Q; D103 is mutated to D103N, D103A, D103G, D103S, D103T, D103Q, D103R, D103K; K107 is mutated to K107N, K107A, K107G, K107S, K107T, K107Q; R109 is mutated to R109W, R109F, R109Y, R109N, R109Q, R109S, R109T, R109A, R109G; R113 is mutated to R113W, R113F, R113Y, R113N, R113Q, R113S, R113T, R113A, R113G; R116 is mutated to R116N, R116A, R116G, R116S, R116T, R116Q; E117 is mutated to E117N, E117A, E117G, E117S, E117T, E117Q, E117R, E117K; E123 is mutated to E123N, E123A, E123G, E123S, E123T, E123Q, E123R, E123K; K126 is mutated to K126N, K126Q, K126S, K126T, K126A, K126G; D153 is mutated to D153A, D153G, D153V, D153L, D153I, D153Y, D153F, D153W; R156 is mutated to R156A, R156G, R156N, R156Q, R156S, R156T, R156D, R156E; R167 is mutated to R167N, R167Q, R167S, R167T, R167A, R167G; D169 is mutated to D169N, D169Q, D169S, D169T, D169A, D169G; D201 is mutated to D201N, D201Q, D201S, D201T, D201A, D201G; R206 is mutated to R206N, R206Q, R206S, R206T, R206A, R206G, R206D, R206E; D209 is mutated to D209N, D209Q, D209S, D209T, D209A, D209G, D209R, D209K; E213 is mutated to E213N, E213Q, E213S, E213T, E213A, E213G; E216 is mutated to E216N, E216Q, E216S, E216T, E216A, E216G; E218 is mutated to E218N, E218Q, E218S, E218T, E218A, E218G; wherein the letter before the number denotes the original amino acid and the letter after the number denotes the mutated amino acid.
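The mutation notation described above (original residue, position, mutated residue) is straightforward to parse. The following is a minimal sketch in which the function names are illustrative; the `WILD_TYPE` table is transcribed from the residue letters given in the substitution list, since the full SEQ ID NO: 1 sequence is not reproduced here. Position 108 appears in the list of mutable positions but has no listed substitutions, so it is absent from the table.

```python
import re

# Wild-type residue at each position, as implied by labels such as "G63A".
WILD_TYPE = {
    63: "G", 64: "E", 65: "K", 66: "F", 67: "A", 68: "N", 69: "I",
    103: "D", 107: "K", 109: "R", 113: "R", 116: "R", 117: "E",
    123: "E", 126: "K", 153: "D", 156: "R", 167: "R", 169: "D",
    201: "D", 206: "R", 209: "D", 213: "E", 216: "E", 218: "E",
}

MUTATION_PATTERN = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_mutation(label: str) -> tuple[str, int, str]:
    """Split a label such as 'G63A' into (original, position, mutated)."""
    match = MUTATION_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def matches_wild_type(label: str) -> bool:
    """Check that the label's original residue agrees with the table."""
    original, position, _ = parse_mutation(label)
    return WILD_TYPE.get(position) == original


print(parse_mutation("G63A"))
print(matches_wild_type("D153W"))
```

A check of this kind catches transcription errors in mutation labels, for example a label that names the wrong original residue for a position.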

The above mutation sites fall mainly into three classes: mutation sites in the gating region (sensor region), mutation sites at the entrance of the porin, and mutation sites in the transmembrane region of the porin. The mutation sites in the sensor region mainly determine how open the pore is, and thus directly determine the open-pore current. Because the nucleic acid to be tested is negatively charged, adjusting the amino acids at the mutation sites at the porin entrance, including but not limited to mutating uncharged or negatively charged amino acids to positively charged amino acids, or mutating positively charged amino acids to other types of amino acids, can tune the capture rate of the library. Mutations in the transmembrane region can improve the stability of porin insertion into lipid or polymer membranes. In addition, charged amino acids located on the inner wall or at the exit of the pore structure can also affect the translocation of the sample to be tested.

Among the above mutation sites, G63, N68, I69, E64, K65, F66 and A67 are located in the sensor region. D103, K107, R109, R113, R116, E117, E123 and K126 are located at the entrance of the porin; mutations that change their charge play an important role in tuning the library capture rate and the sequencing noise. D153 is located on the outer wall of the transmembrane region; mutating it to a hydrophobic amino acid can improve the stability of porin insertion into the membrane. R156, R167, D169, D201, E216 and E218 are located on the inner wall of the pore structure and are charged amino acids on the inner wall of the porin barrel; mutations that change their charge can help molecules to be tested, such as DNA, pass smoothly through the porin. R206, D209 and E213 are located in the exit loop region of the pore structure; mutating them can help the sequenced nucleic acid strand leave the porin.
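The region assignments above, together with the emphasis on charge changes, can be captured in a small lookup. The following is a minimal sketch in which the region labels and the function `describe_mutation` are illustrative. The charge table is a simplifying assumption: Asp and Glu are taken as -1, Lys and Arg as +1, and every other residue (including His) as 0, which ignores the local environment and pH.

```python
import re

# Regions as assigned in the text; each position belongs to one region.
REGION_BY_POSITION = {}
for region, positions in {
    "sensor": (63, 64, 65, 66, 67, 68, 69),
    "entrance": (103, 107, 109, 113, 116, 117, 123, 126),
    "transmembrane outer wall": (153,),
    "channel inner wall": (156, 167, 169, 201, 216, 218),
    "exit loop": (206, 209, 213),
}.items():
    for position in positions:
        REGION_BY_POSITION[position] = region

# Assumed formal side-chain charges; all unlisted residues count as 0.
SIDE_CHAIN_CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def describe_mutation(label: str) -> tuple[str, int]:
    """Return (region, change in formal side-chain charge) for a label."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    original, position, mutated = match.group(1), int(match.group(2)), match.group(3)
    region = REGION_BY_POSITION.get(position, "not classified in the text")
    delta = SIDE_CHAIN_CHARGE.get(mutated, 0) - SIDE_CHAIN_CHARGE.get(original, 0)
    return region, delta


print(describe_mutation("D103K"))  # entrance site, negative to positive
print(describe_mutation("D153W"))  # outer wall, acidic to hydrophobic
```

With this convention, an entrance mutation such as D103K raises the formal charge by two units per monomer, which is the kind of change the text associates with a higher capture rate for negatively charged nucleic acids.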

The mutation sites above and the amino acids introduced at them can all affect the channel structure, translocation performance, library capture ability and other properties of the porin monomer and of the porin formed by its oligomerization. In particular, G63, N68 and I69 in the sensor region sit at the very center of the pore channel and strongly influence its diameter, stability, affinity for translocating molecules and ability to generate an electrical signal. N68 and I69 are the most central amino acids of the sensor region; mutating them can directly determine the current magnitude and the resolution for the translocating analyte. Mutations at these two sites include, but are not limited to, substitution with polar uncharged or small-side-chain amino acids such as N, Q, S, T, G or A. G63 lies below N68 and I69; to reduce its effect on the sequencing current and on nucleic acid translocation, it can be mutated to a small-side-chain amino acid, including but not limited to A, S or T.

In a second typical embodiment of the present application, a protein construct is provided, formed from two or more of the above porin monomers linked covalently or non-covalently.

In a third typical embodiment of the present application, a porin is provided, formed from 7 to 11 of the above porin monomers linked covalently or non-covalently.

In a preferred embodiment, the porin is formed from 9 porin monomers linked non-covalently.

Porin monomers can spontaneously assemble into a porin through hydrogen bonds, ionic bonds, hydrophobic interactions and similar forces. Porin monomers obtained by expression and purification therefore exist as oligomers, in particular nonamers, under non-denaturing conditions, and as individual monomers once the protein is denatured.

In a preferred embodiment, the pore channel diameter of the porin is 0.5 to 3 nm.

Within a certain range, the smaller the channel diameter of the porin, the higher its accuracy in sequencing. If the channel diameter is too large, more than one molecule may pass through the channel at a time, which makes it difficult to meet the requirements of single-molecule sequencing: when the biomolecule to be tested passes through an oversized channel, the current signal may be missed or erroneous, lowering sequencing accuracy. In single-molecule sequencing, accurate results are obtained by sequencing the same molecule multiple times, so the higher the sequencing accuracy, the fewer sequencing passes and the less time are needed. Sequencing with a high-accuracy porin can therefore greatly reduce sequencing time and cost, an advantage that is especially pronounced in high-throughput sequencing.

In a fourth typical embodiment of the present application, a kit is provided that includes the above porin monomer, protein construct, or porin.

To further improve ease of use, in a preferred embodiment the kit further includes a membrane layer, which comprises a lipid layer or an artificial polymer membrane.

Preferably, the lipid layer comprises amphiphilic lipids; preferably, the amphiphilic lipids comprise a phospholipid bilayer; preferably, the lipid layer comprises a planar membrane layer or a liposome; preferably, the liposome comprises a multilamellar liposome or a unilamellar liposome; preferably, the lipid layer comprises a phospholipid bilayer composed of diphytanoylphosphatidylcholine.

The artificial polymer membrane includes, but is not limited to, one or more of: polysiloxanes, polyolefins, perfluoropolyethers, perfluoroalkyl polyethers, polystyrene, polyoxypropylene, polyvinyl acetate, polyoxybutylene, polyisoprene, polybutadiene, polyvinyl chloride, polyalkyl acrylates, polyalkyl methacrylates, polyacrylonitrile, polypropylene, PTHF, polymethacrylates, polyacrylates, polysulfones, polyvinyl ethers, poly(propylene oxide) and copolymers thereof, substituted C1-C6 alkyl acrylates and methacrylates, acrylamide, methacrylamide, (C1-C6 alkyl) acrylamides and methacrylamides, N,N-dialkyl-acrylamides, ethoxylated acrylates and methacrylates, polyethylene glycol monomethacrylate and polyethylene glycol monomethyl ether methacrylate, hydroxy-substituted (C1-C6 alkyl) acrylamides and methacrylamides, hydroxy-substituted C1-C6 alkyl vinyl ethers, sodium vinylsulfonate, sodium styrenesulfonate, 2-acrylamido-2-methylpropanesulfonic acid, N-vinylpyrrole, N-vinyl-2-pyrrolidone, 2-vinyloxazoline, 2-vinyl-4,4′-dialkyloxazolin-5-one, 2,4-vinylpyridine, ethylenically unsaturated carboxylic acids having a total of 3 to 5 carbon atoms, amino(C1-C6 alkyl)-, mono(C1-C6 alkylamino)(C1-C6 alkyl)- and di(C1-C6 alkylamino)(C1-C6 alkyl)-acrylates and methacrylates, allyl alcohol, 3-trimethylammonium 2-hydroxypropyl methacrylate chloride, dimethylaminoethyl methacrylate (DMAEMA), dimethylaminoethyl methacrylamide, glycerol methacrylate, N-(1,1-dimethyl-3-oxobutyl)acrylamide, cyclic imino ethers, vinyl ethers, cyclic ethers including epoxy derivatives, cyclic unsaturated ethers, N-substituted ethylenimines, β-lactones and β-lactams, ketene acetals, vinyl acetals and phosphoranes.

In a preferred embodiment, the kit further includes a sequencing buffer and/or cholesterol-linked single-stranded DNA.

Using the porin, membrane layer and sequencing buffer of the kit, one or more porins can be inserted into the membrane layer in the sequencing buffer to form a nanopore sensor. The nanopore experiment buffer provides a neutral environment that keeps the porin and the membrane layer stable, and the metal ions it contains give it good electrical conductivity. Many membrane choices are possible: the porin can insert into planar membranes or spherical liposomes, and into lipid layers of different compositions, to form a nanopore sensor.

Cholesterol carried on single-stranded DNA, RNA or other analyte molecules can bind to the lipid layer or artificial polymer membrane described above, which helps the nanopore capture the sequencing library and reduces the amount of library that must be loaded. In practice, the cholesterol component of the kit can first be bound to the analyte molecule and then added to the compartment containing the nanopore sensor for sequencing.

In a fifth typical embodiment of the present application, an isolated DNA molecule is provided that has a nucleotide sequence encoding the above porin monomer, the above protein construct, or the above porin.

In a preferred embodiment, the DNA molecule has the nucleotide sequence shown in SEQ ID NO: 8.

In a preferred embodiment, the DNA molecule has at least 70%, preferably at least 80%, more preferably at least 90%, and still more preferably at least 95% identity to the nucleotide sequence shown in SEQ ID NO: 8 (for example 70%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 98.5%, 99%, 99.5%, 99.6%, 99.7%, 99.8% or more, or even 99.9% or more) and encodes a protein with the same function.
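The identity thresholds above can be illustrated with a short calculation. The application does not state how identity is computed, so the following is a sketch under one common convention, assumed here: identical columns divided by compared columns of a pairwise alignment that has already been produced by some alignment tool. The function names and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over two pre-aligned sequences of equal length.

    '-' marks a gap. A column with a gap in only one sequence counts as a
    mismatch; a column with gaps in both sequences is skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # carries no information about either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 70.0) -> bool:
    """True if the identity reaches the given percentage (70% is the lowest tier above)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

With the toy alignment `"ATGC"` against `"ATGA"`, three of four columns match, so the pair passes the 70% tier but not the 80% tier. Other conventions (for example, dividing by the length of the shorter sequence) give different numbers for the same pair.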

SEQ ID NO: 8:

The above DNA molecule can encode a porin monomer having the structure and function described in this application. Nucleotides may be mutated on the basis of the sequence shown in SEQ ID NO: 8, such that the resulting molecule hybridizes under stringent conditions with the DNA molecule defined in (a) and no frameshift mutation occurs. If the mutation falls in nucleotides encoding the pore channel, it may produce a porin with an altered channel, affecting the pore size and the properties of the amino acid residues on the inner wall of the channel; if the mutation falls in nucleotides encoding a non-channel part of the protein, it may affect the folding, three-dimensional structure and other properties of the encoded protein, and thereby its physicochemical properties and stability.

In this application, "isolated" means altered "by the hand of man" from the natural state; that is, if the material occurs in nature, it has been changed and/or removed from its original environment. For example, a polynucleotide or polypeptide naturally present in a living organism is not "isolated", whereas the same polynucleotide or polypeptide separated from the materials with which it coexists in its natural state is "isolated" as the term is used herein.

In a sixth typical embodiment of the present application, a recombinant vector is provided that comprises the above DNA molecule.

The above DNA molecule, that is, the porin expression gene, is inserted into a recombinant vector, and the vector's capacity for extensive self-replication is used to replicate the porin expression gene in large quantities. "Recombinant" here refers to genetically engineered DNA prepared by transplanting or splicing a gene from one species into the cells of a host organism of a different species. Such DNA becomes part of the host's genetic makeup and is replicated.

In a seventh typical embodiment of the present application, a host cell is provided that is transformed with the above recombinant vector.

The recombinant vector is transformed into a host cell, and the host cell replicates, transcribes and translates the porin expression gene on the vector, producing porin in large quantities. Host cells include commonly used hosts such as Escherichia coli, yeast, mammalian cells and insect cells. The host cell folds the porin into the correct three-dimensional structure, yielding a porin with normal structure and function.

In an eighth typical embodiment of the present application, a nanopore sensor is provided that includes: a membrane layer; and a porin inserted into the membrane layer to form a channel, through which a current flows when a voltage is applied across the membrane layer; wherein the porin includes the porin described above.

The nanopore sensor in this application refers specifically to a membrane layer with porin inserted. Inserting a channel-bearing porin into the membrane layer fixes the orientation of the channel. When a voltage is applied across the membrane layer, the channel diameter is perpendicular to the direction of the electric force, and ions on both sides of the membrane pass through the porin channel under that force, generating a current.

In a preferred embodiment, the membrane layer comprises a lipid layer or an artificial polymer membrane; preferably, the lipid layer comprises amphiphilic lipids; preferably, the amphiphilic lipids comprise a phospholipid bilayer; preferably, the lipid layer comprises a planar membrane layer or a liposome; preferably, the liposome comprises a multilamellar liposome or a unilamellar liposome; preferably, the lipid layer comprises a phospholipid bilayer composed of diphytanoylphosphatidylcholine (DPhPC, 1,2-diphytanoyl-sn-glycero-3-phosphocholine).

Many lipid layer choices are possible: the porin can insert into planar membranes or spherical liposomes, and into lipid layers of different compositions, to form a nanopore sensor.

When an electric field is applied across the membrane layer, the biomolecule to be tested passes through the porin via the channel under the applied voltage. The biomolecules to be tested include biological macromolecules that carry biological genetic information, such as DNA, RNA, polypeptides or proteins. The biomolecule may carry a modifying group, including but not limited to cholesterol, polyethylene glycol of various degrees of polymerization, biotin or a fluorescent group.

In a ninth typical embodiment of the present application, a nanopore sequencing device is provided that includes the above nanopore sensor.

In a preferred embodiment, the nanopore sequencing device includes: an electrolytic cell containing a sequencing buffer; a nanopore sensor located at the center of the electrolytic cell, dividing the cell and the sequencing buffer into a positive electrolyte region and a negative electrolyte region; and a first electrode and a second electrode, arranged in the positive electrolyte region and the negative electrolyte region respectively and connected to a signal processing chip. Preferably, the first and second electrodes comprise a metal or a composite electrode material. Preferably, the first and second electrodes are different, being silver and silver chloride respectively; or the first and second electrodes are the same, each independently selected from gold, platinum, graphene or titanium nitride.

The nanopore sequencing device includes an electrolytic cell containing electrolyte, a nanopore sensor, a first electrode and a second electrode. The nanopore sensor is placed at the center of the electrolytic cell and divides it into a positive electrolyte region and a negative electrolyte region, each provided with an electrode; the two electrodes create the electric field applied across the nanopore sensor. As the biomolecule to be tested passes through the porin in the membrane, it produces characteristic current amplitudes. These amplitudes are picked up and passed to the signal processing chip connected to the electrodes. From the differences in current amplitude, the signal processing chip, and thus the nanopore sequencing device containing it, can analyze the data and determine the sequence of the biomolecule.

In a tenth typical embodiment of the present application, a sequencing method is provided that uses the above porin, nanopore sensor, or nanopore sequencing device to detect and interpret the electrical signal generated when a biomolecule to be tested passes through the porin channel, thereby determining the sequence of the biomolecule.

In a preferred embodiment, the biomolecule to be tested includes any one of the following modified or unmodified biomolecules: DNA, RNA or polypeptide.

In a preferred embodiment, the biomolecule to be tested is a target nucleic acid sequence, and the sequencing method includes: (a) contacting the target nucleic acid sequence with a nucleic acid binding protein, which controls the speed at which the target nucleic acid sequence moves through the porin channel; and (b) applying a voltage across the pore so that the target nucleic acid sequence moves through the channel, and measuring the electrical signal through the channel, wherein different types of nucleotides produce different electrical signals as they pass through the channel, so that the sequence information of the target nucleic acid is determined from the electrical signal. Preferably, the nucleic acid binding protein is selected from nucleases, polymerases, topoisomerases, ligases, helicases and single-strand binding proteins. Preferably, the electrical signal includes a current.

When a voltage is applied across the pore, ions in solution pass through the central gating region of the porin and generate a current. As the target nucleic acid sequence (DNA or RNA) moves through the channel, the electrical signal through the channel is measured. Because different types of nucleotides differ in size, they block the current to different degrees, so analytes of different nucleotide composition produce current signals of different magnitude as they pass through the channel. Interpreting this current signal yields the sequence information of the target nucleic acid.
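The idea that different nucleotides block the current to different degrees can be made concrete with a toy calculation. This is an illustrative sketch only, not the signal-processing method of the application: the functions `fractional_blockade` and `segment_levels`, the fixed jump threshold and any current values passed to them are assumptions, and practical basecalling relies on trained models rather than a simple threshold.

```python
from statistics import mean


def fractional_blockade(open_pore_pA: float, blocked_pA: float) -> float:
    """Fraction of the open-pore current that is blocked by the analyte."""
    if open_pore_pA == 0:
        raise ValueError("open-pore current must be non-zero")
    return (open_pore_pA - blocked_pA) / open_pore_pA


def segment_levels(trace_pA: list[float], jump_pA: float) -> list[float]:
    """Split a current trace into runs and return the mean of each run.

    A new run starts whenever two consecutive samples differ by more than
    jump_pA (a hypothetical, hand-chosen threshold).
    """
    if not trace_pA:
        return []
    levels: list[float] = []
    run = [trace_pA[0]]
    for previous, current in zip(trace_pA, trace_pA[1:]):
        if abs(current - previous) > jump_pA:
            levels.append(mean(run))  # close the finished run
            run = []
        run.append(current)
    levels.append(mean(run))  # close the last run
    return levels
```

Each level returned by `segment_levels` corresponds to one stretch of roughly constant current; mapping such levels to nucleotide identity is the step that requires a calibrated model and is not attempted here.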

In an eleventh typical embodiment of the present application, use of the above porin monomer, porin, kit, DNA molecule, recombinant vector, host cell, nanopore sensor, or nanopore sequencing device in small molecule detection, DNA sequencing, RNA sequencing or polypeptide sequencing is provided.

The small molecules include, but are not limited to, small-molecule compounds such as nucleotides, amino acids, polysaccharides or vitamins.

A major advantage of nanopore sequencing over traditional sequencing is that accuracy is not degraded by accumulated errors, so extremely long reads can be achieved. Long reads close the gaps that are unavoidable when assembling the short fragments of traditional sequencing, reveal whether long segments of a chromosome have been deleted, duplicated, inverted or translocated, and cover full-length transcripts, which are typically several kb long, thereby offering new solutions for research on genome assembly, structural variation and alternative splicing.

Because nanopore sequencing requires no PCR amplification, the original base modifications on the nucleic acid molecule are preserved, and the type, position and abundance of modified bases can be read out directly in a single sequencing run. The porin of the present application can therefore also be used to detect nucleic acid molecules carrying various DNA/RNA base modifications, including 5-methylcytosine (5mC), 6-methyladenine (m6A), 7-methylguanine (m7G) and pseudouridine (Ψ). With model training and algorithm development specific to each modified base, nanopore sequencing can identify and locate further modified bases and so build a more complete genome/transcriptome modification map.

In addition, from a clinical perspective, the long reads, high portability, fast sequencing and real-time readout of nanopore sequencing make it well suited to monitoring major outbreaks and to rapid pathogen detection (for example, in responses to large-scale epidemics such as Zika virus, Ebola virus, Dengue virus and the novel coronavirus), where timeliness is critical. Beyond viruses, nanopore sequencing can also be used for rapid detection of other pathogens such as bacteria and fungi.

Because proteins and nucleic acids share common compositional features, the nanopore sequencing platform also has great potential in protein sequencing. For example, exploratory work to date has used protein unfoldases as a rate-control tool to observe characteristic protein signals and achieve preliminary identification of protein type and modification state, demonstrating the feasibility of nanopore protein sequencing. In the future, further optimization of the rate-control system and development of suitable porins and signal-analysis algorithms may ultimately allow fingerprint identification, and even sequence determination, of proteins at the single-molecule level.

Beyond sequencing, the nanopore platform can also serve as a basic detection platform that, combined with sensing techniques, performs metabolomic detection of a wide range of small and large molecules. By combining genomics, proteomics and metabolomics, the nanopore platform may ultimately develop into a general-purpose measurement platform for multi-omics analysis, providing a powerful research tool for a deeper understanding of the principles of life and the mechanisms of disease.

The beneficial effects of the present application are explained in further detail below with reference to specific examples.

Example 1 AlphaFold2-Multimer predicted structure of wild-type BCP20

Nine porin monomers (SEQ ID NO: 1) assemble non-covalently into a nonamer to give the porin BCP20, whose structure was predicted with AlphaFold2-Multimer. The predictions are shown in Figures 1, 2 and 3. Figure 1 is a side view of the predicted BCP20 structure, and Figure 2 is a top view. Figure 3 shows the side chains of the key amino acids in the sensor region of the predicted structure, namely G63, N68 and I69. Panel A of Figure 3 marks the distance between the I69 residues, the distance between the N68 residues and the distance between the G63 residues (the distance values are not reproduced in this text). Panel B of Figure 3 shows an enlarged view of the side-chain structure.

SEQ ID NO: 1:

Example 2 Construction of expression vectors for the porin monomer and its mutants

Using the In-Fusion method, the DNA sequence encoding the porin monomer (SEQ ID NO: 8) was inserted into the multiple cloning site of vector pET24a after digestion with NdeI and XhoI. A StrepII amino acid sequence was added to the C-terminus of the porin monomer amino acid sequence (SEQ ID NO: 1) as a purification tag; the selection marker was kanamycin, and the resulting vector was named pET24a-BCP20. Mutants were constructed by site-directed mutagenesis with an Agilent site-directed mutagenesis kit, using the porin monomer expression vector as the template. The porin monomer mutants G63A, N68Q and I69N were constructed in this application; the mutant constructs were built in the same way as the wild type.

Example 3 Culture and induction of porin monomer strains

The expression plasmid for the porin monomer or each of its mutants was transformed separately into the E. coli expression strain BL21(DE3). The bacterial suspension was spread evenly on a plate containing 50 μg/mL kanamycin and incubated overnight at 37 °C. The next day, a single colony was picked into 5 mL of LB liquid medium containing 50 μg/mL kanamycin and grown overnight at 37 °C and 200 rpm. This culture was inoculated at a volume ratio of 1:100 into 50 mL of LB liquid medium containing 50 μg/mL kanamycin and grown at 37 °C and 200 rpm for 4 h. The expanded culture was then inoculated at a volume ratio of 1:100 into 2 L of LB liquid medium containing 50 μg/mL kanamycin and grown at 37 °C and 200 rpm. When the OD600 reached about 0.6 to 0.8, IPTG was added to a final concentration of 0.5 mM and the culture was grown at 16 °C and 200 rpm for about 16 to 18 h. The cells were collected by centrifugation at 8000 rpm and stored frozen at -20 °C until use.
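The volumes implied by this protocol follow from simple dilution arithmetic. The sketch below works them out; the 1 M IPTG stock concentration is an assumption chosen for illustration (the application gives only the final concentration), and the function names are hypothetical.

```python
def inoculum_volume_mL(culture_mL: float, ratio: float = 100.0) -> float:
    """Starter culture volume for a 1:ratio (v/v) inoculation."""
    return culture_mL / ratio


def stock_volume_mL(final_mM: float, culture_mL: float, stock_mM: float) -> float:
    """Stock volume from C1*V1 = C2*V2, ignoring the small volume added."""
    return final_mM * culture_mL / stock_mM


def antibiotic_mass_mg(conc_ug_per_mL: float, culture_mL: float) -> float:
    """Total antibiotic mass needed for the given culture volume."""
    return conc_ug_per_mL * culture_mL / 1000.0


# Values from the protocol: 1:100 inoculations into 50 mL and 2 L,
# 50 ug/mL kanamycin, 0.5 mM IPTG final concentration.
seed_into_50_mL = inoculum_volume_mL(50.0)        # 0.5 mL of overnight culture
seed_into_2_L = inoculum_volume_mL(2000.0)        # 20 mL of expanded culture
kanamycin_for_2_L = antibiotic_mass_mg(50.0, 2000.0)  # 100 mg
# Assumed 1 M (1000 mM) IPTG stock; not specified in the source.
iptg_for_2_L = stock_volume_mL(0.5, 2000.0, 1000.0)   # 1 mL
```

A different stock concentration changes only the last line; for example, a 100 mM stock would require ten times the volume.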

Example 4 Extraction and purification of recombinant porin monomers

(1) Preparation of purification buffers

Buffer A: 20 mM Tris-HCl, 250 mM NaCl, 1% DDM, pH 8.0.

Buffer B: 20 mM Tris-HCl, 250 mM NaCl, 0.05% DDM, pH 8.0.

Buffer C: 20 mM Tris-HCl, 250 mM NaCl, 0.05% DDM, 5 mM desthiobiotin, pH 8.0.
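The three buffer compositions above can be converted into weigh-out amounts. This is a sketch under stated assumptions, none of which come from the application: the DDM percentage is read as weight per volume, Tris-HCl is prepared from Tris base and titrated to pH 8.0 with HCl, and the batch volume is 1 L. The molar masses are standard values; the function names are hypothetical.

```python
# Molar masses in g/mol (standard values).
MOLAR_MASS = {"Tris base": 121.14, "NaCl": 58.44, "desthiobiotin": 214.26}


def grams_from_mM(conc_mM: float, molar_mass: float, volume_L: float) -> float:
    """Mass needed for a given millimolar concentration and volume."""
    return conc_mM / 1000.0 * molar_mass * volume_L


def grams_from_percent_wv(percent: float, volume_L: float) -> float:
    """Mass for a w/v percentage: 1% w/v is 10 g per litre (assumed reading)."""
    return percent * 10.0 * volume_L


def buffer_recipe(ddm_percent: float, desthiobiotin_mM: float = 0.0,
                  volume_L: float = 1.0) -> dict[str, float]:
    """Grams of each solid for 20 mM Tris, 250 mM NaCl and the given additives."""
    recipe = {
        "Tris base": grams_from_mM(20.0, MOLAR_MASS["Tris base"], volume_L),
        "NaCl": grams_from_mM(250.0, MOLAR_MASS["NaCl"], volume_L),
        "DDM": grams_from_percent_wv(ddm_percent, volume_L),
    }
    if desthiobiotin_mM:
        recipe["desthiobiotin"] = grams_from_mM(
            desthiobiotin_mM, MOLAR_MASS["desthiobiotin"], volume_L
        )
    return recipe


BUFFER_A = buffer_recipe(ddm_percent=1.0)
BUFFER_B = buffer_recipe(ddm_percent=0.05)
BUFFER_C = buffer_recipe(ddm_percent=0.05, desthiobiotin_mM=5.0)
```

Under these assumptions the only differences between the buffers are the twentyfold lower DDM in Buffers B and C and the desthiobiotin in Buffer C, which matches their roles as lysis, wash and elution buffers in the purification steps.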

(2) Purification steps

The cells were thoroughly resuspended at a ratio of 10 mL of Buffer A per 1 g of cells and disrupted by sonication until the suspension cleared. The lysate was then rotated overnight at 4 °C on a rotator. The next day it was centrifuged at 18,000 rpm and 4 °C for 1 h, and the supernatant was collected, passed through a 0.22 μm filter and kept at 4 °C until use.

A Strep-Tactin beads (IBA Lifesciences) column was equilibrated with 5 column volumes (CV) of Buffer A on an AKTA pure chromatography system, and the sample was loaded at 2 mL/min. After loading, the column was washed with 20 CV of Buffer B and eluted with Buffer C, and the target protein was collected.

The protein was concentrated to 1 mL and passed through a Superdex 6 Increase 10/300 GL (Cytiva) column equilibrated with Buffer B; the target protein was collected and stored at -80 °C. The purified protein was analyzed by SDS-PAGE, and the results are shown in Figure 4, which includes bands for the unboiled sample (porin nonamer BCP20) and for the sample boiled at 95 °C (denatured to porin monomer). The results show that the target protein is oligomeric when unboiled and monomeric after boiling. The SDS-PAGE results for the mutants were consistent with those of the wild-type protein and are not shown here.

Example 5 Library Construction

The sense strand and the antisense strand (SEQ ID NO: 4), two partially complementary DNA strands, were annealed to form a linker, which was ligated to the double-stranded target fragment pUC57 (SEQ ID NO: 5) with T4 DNA ligase at room temperature and purified to prepare a sequencing library. The sequencing library was then incubated with the helicase BCH105 (SEQ ID NO: 6) at 25°C for 1 h (molar concentration ratio 1:8) to form a sequencing library carrying the BCH105 motor protein (as shown in Figure 5). During sequencing, this library can further base-pair with cholesterol-bearing single-stranded DNA (SEQ ID NO: 7, with cholesterol attached to the 5′ end of the DNA) to form the structure shown in Figure 8.

Linker sequence sense strand: S1-(iSp18)₄-S2.

The sequence of S1 is shown in SEQ ID NO: 2, the sequence of S2 is shown in SEQ ID NO: 3, and the structure of iSp18 is shown in Figure 9.

SEQ ID NO：2：tttttttttttttttttttttttttttttttttttttttt。SEQ ID NO:2:tttttttttttttttttttttttttttttttttttttttttttt.

SEQ ID NO: 3: ggttgtttctgttggtgctgatattgct.

SEQ ID NO: 4 (linker sequence antisense strand):

gcaatatcagcaccaacagaaacaacctttgaggcgagcggtcaa.
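The partial complementarity between the sense-strand segment S2 (SEQ ID NO: 3) and the antisense strand (SEQ ID NO: 4) can be checked directly from the two sequences given above. The sketch below is only a consistency check on the printed sequences; the variable names are illustrative, and the reading of the unpaired bases as overhangs is an interpretation rather than a statement from the text.

```python
# Lower-case DNA complement table.
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lower-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


S2 = "ggttgtttctgttggtgctgatattgct"  # SEQ ID NO: 3
ANTISENSE = "gcaatatcagcaccaacagaaacaacctttgaggcgagcggtcaa"  # SEQ ID NO: 4

rc_s2 = reverse_complement(S2)

# The antisense strand starts with the reverse complement of S2 minus its
# first base, so the last base of S2 has no partner on the antisense strand.
paired_region = rc_s2[1:]
pairs_with_s2 = ANTISENSE.startswith(paired_region)
paired_length = len(paired_region)
sense_unpaired_3prime = S2[-1]
antisense_tail = ANTISENSE[paired_length:]

if __name__ == "__main__":
    print("antisense pairs with S2:", pairs_with_s2)
    print("paired length (nt):", paired_length)
    print("unpaired 3' base of S2:", sense_unpaired_3prime)
    print("unpaired antisense tail:", antisense_tail)
```

On these sequences the two strands pair over 27 nt, leaving a single unpaired t at the 3′ end of S2 and an 18 nt single-stranded tail at the 3′ end of the antisense strand.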

SEQ ID NO: 5:

SEQ ID NO: 6:

Example 6 Construction of a nanopore biosensor using porin BCP20 and its mutants

Current signals are collected with a patch clamp amplifier or another electrical signal amplifier. A single-channel nanopore detection system based on a patch clamp and a signal amplifier was built according to the method disclosed in the literature (Ji Z, Guo P. Channel from bacterial virus T7 DNA packaging motor for the differentiation of peptides composed of a mixture of acidic and basic amino acids. Biomaterials. 2019 May 21; 214: 119222). Ag/AgCl electrodes were immersed in the sequencing buffer, one in the cis compartment and one in the trans compartment of the electrolytic cell. The porin (i.e., the porin BCP20 purified in Example 4) was diluted with 1x PBS buffer, and a single BCP20 nanopore was then inserted, under an applied electric field, into a phospholipid bilayer composed of diphytanoylphosphatidylcholine (DPhPC, 1,2-diphytanoyl-sn-glycero-3-phosphocholine) to form a nanopore biosensor. The dilution factor is chosen according to whether the porin inserts into the membrane (i.e., pore insertion). Typically, starting from a protein concentration of 0.1 mg/mL, a 100-fold, 50-fold, or other dilution in PBS is tried. If no pore inserts at a given dilution, the dilution factor is lowered and the attempt is repeated until a nanopore successfully inserts into the membrane. A voltage is then applied to obtain the current amplitude of a single porin. Figure 6 shows the biosensing current generated by the BCP20 pore at applied voltages of 0.02 V, 0.04 V, 0.10 V, 0.14 V, and 0.18 V. Likewise, the BCP20 mutants (G63A, N68Q, and I69N) also generated biosensing currents.
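The trial-and-error dilution procedure described above can be sketched as a small search over dilution factors. This is a hedged illustration: the list of candidate factors and the `pore_inserted` callback, which stands in for the experimental observation of an insertion event, are hypothetical. Only the 0.1 mg/mL starting concentration, the 100-fold and 50-fold examples, and the rule of lowering the dilution factor after a failed attempt come from the text.

```python
from typing import Callable, Iterable, Optional


def diluted_ug_per_mL(stock_mg_per_mL: float, fold: float) -> float:
    """Protein concentration after dilution, in micrograms per mL."""
    return stock_mg_per_mL * 1000.0 / fold


def first_inserting_fold(folds: Iterable[float],
                         pore_inserted: Callable[[float], bool],
                         stock_mg_per_mL: float = 0.1) -> Optional[float]:
    """Try dilution factors from highest to lowest; return the first that works.

    pore_inserted is a hypothetical stand-in for the experiment: it takes
    the diluted concentration (ug/mL) and reports whether a pore inserted.
    """
    for fold in sorted(folds, reverse=True):
        if pore_inserted(diluted_ug_per_mL(stock_mg_per_mL, fold)):
            return fold
    return None


if __name__ == "__main__":
    # 0.1 mg/mL diluted 100-fold and 50-fold.
    print(diluted_ug_per_mL(0.1, 100), diluted_ug_per_mL(0.1, 50))
```

With a 0.1 mg/mL stock, the 100-fold and 50-fold dilutions mentioned in the text correspond to roughly 1 μg/mL and 2 μg/mL of protein.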

Example 7 Use of porin BCP20 and its mutants for DNA sequencing

The sequencing library containing the pUC57 sequence (SEQ ID NO: 5) prepared in Example 5 and the cholesterol-bearing single-stranded DNA (SEQ ID NO: 7, with cholesterol attached to the 5′ end of the DNA) were mixed with sequencing buffer and added to the nanopore biosensor. After a voltage of 0.14 V or 0.18 V was applied, DNA was observed to be captured by the nanopore, producing a characteristic blockade current amplitude. As the DNA moved through the nanopore, the current amplitude changed, and different DNA sequences produced different blockade current amplitudes. The cholesterol-bearing single-stranded DNA can bind to the phospholipid bilayer, which helps the nanopore capture the sequencing library and reduces the amount of library that must be loaded. Figure 7 shows the current changes produced when library DNA passes through porin BCP20 at an applied voltage of 0.14 V. Likewise, current changes were also produced when library DNA passed through the BCP20 mutants (G63A, N68Q, and I69N).

Single-stranded DNA sequence with cholesterol (SEQ ID NO: 7):

ttgaccgctcgcctc.

The cholesterol-bearing single-stranded DNA described above can bind to the membrane and, by pairing with the bottom strand of the library, pull the library to the vicinity of the nanopore, as shown in Figure 8, which increases the capture efficiency during nanopore sequencing.
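The pairing between the cholesterol-bearing strand (SEQ ID NO: 7) and the bottom strand can be confirmed from the printed sequences, taking the linker antisense strand (SEQ ID NO: 4) as the bottom strand. That identification is an assumption based on the sequence match, and the variable names below are illustrative.

```python
# Lower-case DNA complement table.
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lower-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


TETHER = "ttgaccgctcgcctc"  # SEQ ID NO: 7 (cholesterol at the 5' end)
ANTISENSE = "gcaatatcagcaccaacagaaacaacctttgaggcgagcggtcaa"  # SEQ ID NO: 4

# Site on the antisense strand that the tether would base-pair with.
binding_site = reverse_complement(TETHER)
site_start = ANTISENSE.find(binding_site)
binds_3prime_end = ANTISENSE.endswith(binding_site)

if __name__ == "__main__":
    print("binding site:", binding_site)
    print("start index on antisense strand:", site_start)
    print("located at the 3' end:", binds_3prime_end)
```

The 15 nt tether is fully complementary to the last 15 nt of the antisense strand, so it occupies the 3′ end of the single-stranded tail that is left unpaired by the sense strand.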

From the above description, it can be seen that the above embodiments of the present invention achieve the following technical effects: the present invention identifies a new porin monomer that polymerizes to form porin BCP20. Porin BCP20 and its mutants have good stability and can meet the needs of single-molecule nanopore sequencing. Using this porin and its mutants, a nanopore sensor and, in turn, a nanopore sequencing device can be built to detect samples such as DNA, RNA, polypeptides, and proteins.

The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may be subject to various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

A porin monomer, characterized in that the porin monomer comprises:

(a) a protein consisting of the amino acid sequence shown in SEQ ID NO: 1; or

(b) a protein mutant, wherein the amino acid sequence of the protein mutant undergoes substitution, deletion and/or addition of one or more amino acids at at least one of the following positions of the amino acid sequence shown in SEQ ID NO: 1: 68, 69, 63, 64, 65, 66, 67, 103, 107, 108, 109, 113, 116, 117, 123, 126, 153, 156, 167, 169, 201, 206, 209, 213, 216, 218, and the protein mutant has the function of forming a pore structure through polymerization; or

(c) a porin monomer having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% identity to the protein described in (a) or (b) and having the function of forming a pore structure through polymerization.

The porin monomer according to claim 1, characterized in that, in (b), the type of amino acid substituted at each site is independently selected from the following:

G63 is mutated to G63A, G63S, or G63T;

E64 is mutated to E64G, E64A, E64S, E64T, E64N, or E64Q;

K65 is mutated to K65G, K65A, K65S, K65T, K65N, or K65Q;

F66 is mutated to F66G, F66A, F66S, F66T, F66N, or F66Q;

A67 is mutated to A67G, A67S, A67T, A67N, or A67Q;

N68 is mutated to N68G, N68A, N68S, N68T, or N68Q;

I69 is mutated to I69G, I69A, I69S, I69T, I69N, or I69Q;

D103 is mutated to D103N, D103A, D103G, D103S, D103T, D103Q, D103R, or D103K;

K107 is mutated to K107N, K107A, K107G, K107S, K107T, or K107Q;

R109 is mutated to R109W, R109F, R109Y, R109N, R109Q, R109S, R109T, R109A, or R109G;

R113 is mutated to R113W, R113F, R113Y, R113N, R113Q, R113S, R113T, R113A, or R113G;

R116 is mutated to R116N, R116A, R116G, R116S, R116T, or R116Q;

E117 is mutated to E117N, E117A, E117G, E117S, E117T, E117Q, E117R, or E117K;

E123 is mutated to E123N, E123A, E123G, E123S, E123T, E123Q, E123R, or E123K;

K126 is mutated to K126N, K126Q, K126S, K126T, K126A, or K126G;

D153 is mutated to D153A, D153G, D153V, D153L, D153I, D153Y, D153F, or D153W;

R156 is mutated to R156A, R156G, R156N, R156Q, R156S, R156T, R156D, or R156E;

R167 is mutated to R167N, R167Q, R167S, R167T, R167A, or R167G;

D169 is mutated to D169N, D169Q, D169S, D169T, D169A, or D169G;

D201 is mutated to D201N, D201Q, D201S, D201T, D201A, or D201G;

R206 is mutated to R206N, R206Q, R206S, R206T, R206A, R206G, R206D, or R206E;

D209 is mutated to D209N, D209Q, D209S, D209T, D209A, D209G, D209R, or D209K;

E213 is mutated to E213N, E213Q, E213S, E213T, E213A, or E213G;

E216 is mutated to E216N, E216Q, E216S, E216T, E216A, or E216G;

E218 is mutated to E218N, E218Q, E218S, E218T, E218A, or E218G.

A protein construct, characterized in that the protein construct is formed by covalently or non-covalently linking two or more porin monomers according to claim 1 or 2.

A porin, characterized in that the porin is composed of 7 to 11 porin monomers according to claim 1 or 2, connected covalently or non-covalently.

The porin according to claim 4, characterized in that the porin is composed of 9 porin monomers connected non-covalently.

The porin according to claim 4 or 5, characterized in that the pore diameter of the porin is 0.5 to 3 nm.

A kit, characterized in that the kit comprises the porin monomer according to claim 1 or 2, or the protein construct according to claim 3, or the porin according to any one of claims 4 to 6.

The kit according to claim 7, characterized in that the kit further comprises a membrane layer, and the membrane layer comprises a lipid layer or an artificial polymer membrane.

The kit according to claim 7 or 8, characterized in that the kit further comprises a sequencing buffer and/or single-stranded DNA linked to cholesterol.

An isolated DNA molecule, characterized in that the DNA molecule has:

a nucleotide sequence encoding the porin monomer according to claim 1 or 2, or encoding the protein construct according to claim 3, or encoding the porin according to any one of claims 4 to 6.

The DNA molecule according to claim 10, characterized in that the DNA molecule has the nucleotide sequence shown in SEQ ID NO: 8.

The DNA molecule according to claim 11, characterized in that the DNA molecule has more than 70%, preferably more than 80%, more preferably more than 90%, and further preferably more than 95% identity with the nucleotide sequence shown in SEQ ID NO: 8 and encodes a protein having the same function.

A recombinant vector, characterized in that the recombinant vector comprises the DNA molecule according to any one of claims 10 to 12.

A host cell, characterized in that the host cell is transformed with the recombinant vector according to claim 13.

A nanopore sensor, characterized in that the nanopore sensor comprises:

a membrane layer; and

a porin inserted in the membrane layer to form a pore, wherein the pore generates an electric current when a voltage is applied across the membrane layer;

wherein the porin comprises the porin according to any one of claims 4 to 6.

The nanopore sensor according to claim 15, characterized in that the membrane layer comprises a lipid layer or an artificial polymer membrane;

preferably, the lipid layer comprises amphiphilic lipids;

preferably, the amphiphilic lipid comprises a phospholipid bilayer;

preferably, the lipid layer comprises a planar membrane layer or a liposome;

preferably, the liposomes comprise multilamellar liposomes or unilamellar liposomes;

preferably, the lipid layer comprises a phospholipid bilayer composed of diphytanoylphosphatidylcholine.

A nanopore sequencing device, characterized in that the nanopore sequencing device comprises the nanopore sensor according to claim 15 or 16.

The nanopore sequencing device according to claim 17, characterized in that the nanopore sequencing device comprises:

an electrolytic cell containing a sequencing buffer;

a nanopore sensor, wherein the nanopore sensor is located in the center of the electrolytic cell and divides the electrolytic cell and the sequencing buffer into a positive electrode electrolyte region and a negative electrode electrolyte region;

a first electrode and a second electrode, wherein the first electrode and the second electrode are arranged in the positive electrode electrolyte region and the negative electrode electrolyte region, respectively, and are connected to a signal processing chip;

preferably, the first electrode and the second electrode comprise metal or composite electrode materials;

preferably, the first electrode and the second electrode are different, being silver and silver chloride respectively; or the first electrode and the second electrode are the same, and are each independently selected from gold, platinum, graphene or titanium nitride.

A sequencing method, characterized in that the sequencing method uses the porin according to any one of claims 4 to 6, or the nanopore sensor according to claim 15 or 16, or the nanopore sequencing device according to claim 17 or 18 to detect and analyze the electrical signal generated when the biological molecule to be tested passes through the pore of the porin, so as to determine the sequence of the biological molecule to be tested.

The sequencing method according to claim 19, characterized in that the biological molecule to be tested includes any one of the following modified or unmodified biological molecules: DNA, RNA or polypeptide.

The sequencing method according to claim 19, characterized in that the biological molecule to be tested is a target nucleic acid sequence, and the sequencing method comprises:

(a) contacting the target nucleic acid sequence with a nucleic acid binding protein, wherein the nucleic acid binding protein controls the speed at which the target nucleic acid sequence moves through the pore of the porin;

(b) when a voltage is applied across the pore, the target nucleic acid sequence moves through the pore, and an electrical signal passing through the pore is measured, wherein different types of nucleotides generate different electrical signals when passing through the pore, thereby determining the sequence information of the target nucleic acid based on the electrical signal;

preferably, the nucleic acid binding protein is selected from the group consisting of a nuclease, a polymerase, a topoisomerase, a ligase, a helicase and a single-stranded binding protein;

preferably, the electrical signal comprises an electric current.

Use of the porin monomer of claim 1 or 2, the porin of any one of claims 4 to 6, the kit of any one of claims 7 to 9, the DNA molecule of any one of claims 10 to 12, the recombinant vector of claim 13, the host cell of claim 14, the nanopore sensor of claim 15 or 16, or the nanopore sequencing device of claim 17 or 18 in small molecule detection, DNA sequencing, RNA sequencing or polypeptide sequencing.