KR20250048551A

KR20250048551A - New pore

Info

Publication number: KR20250048551A
Application number: KR1020257005137A
Authority: KR
Inventors: 엘리자베스 제인 월러스; 라크말 니산타 자야싱헤; 리차드 조지 햄블리; 알리스테어 제임스 스콧; 랑가 프라바트 말라비아라챠기 라벨; 리스 코너 그리피스; 앰버 엘리자베스 레켄비; 프라틱 라지 싱; 알베르토 리에라; 윌리암 에프. 데그라도; 리 슈나이더; 니콜라스 폴리찌
Original assignee: Oxford Nanopore Technologies PLC; The Regents of the University of California
Priority date: 2022-08-09
Filing date: 2023-08-09
Publication date: 2025-04-09
Also published as: JP2025528144A; WO2024033447A1; EP4569331A1; CA3262945A1; AU2023322679A1; CN119678047A

Abstract

Aspects of the present disclosure relate to protein pore complexes and their use in analyte detection and characterization. The present disclosure is based in part on a nanopore complex formed by a CsgG-like pore and one or more accessory proteins, which form one or more channel constrictions in the nanopore complex. In some embodiments, the one or more accessory proteins are fusion proteins. The present disclosure also relates to methods for designing the accessory proteins and producing the nanopore complexes, and to their use in molecular sensing and analyte sequencing applications.

Description

New pore

Two important components of polymer characterization using nanopore sensing are (1) control of polymer movement through the pore and (2) discrimination of the constituent building blocks as the polymer moves through the pore. During nanopore sensing, the narrowest part of the pore forms the constriction, which is the most discriminating part of the nanopore with respect to the current signature as a function of the passing analyte. CsgG was identified as an ungated, non-selective protein secretion channel from Escherichia coli (Goyal et al., 2014) and has been used as a nanopore to detect and characterize analytes. Mutations to the wild-type CsgG pore that improve the properties of the pore in this context have also been disclosed (WO2016/034591, WO2017/149316, WO2017/149317 and WO2017/149318, PCT/GB2018/051191, all of which are incorporated herein by reference in their entirety).

For analytes that are polynucleotides, nucleotide discrimination is achieved by passage through such mutant pores, but the current signature has been shown to be sequence dependent, with multiple nucleotides contributing to the observed current, so that the height of the channel constriction and the extent of the interaction surface with the analyte affect the relationship between the observed current and the polynucleotide sequence. While mutation of the CsgG pore has improved the current range for nucleotide discrimination, a sequencing system would perform better if the current differences between nucleotides could be further improved.

In some aspects, the present disclosure relates to protein pore complexes and their use in analyte detection and characterization. The present disclosure is based in part on a nanopore complex formed by a CsgG pore and one or more accessory proteins, which form one or more channel constrictions in the nanopore complex. In some embodiments, the one or more accessory proteins are fusion proteins. As further described in the Examples, it has surprisingly been found that accessory proteins that impart certain desirable features to a CsgG protein nanopore (e.g., tuning the pore width, extending the pore lumen, forming one or more additional constrictions, etc.) can be designed de novo using computational structural analysis tools. In some embodiments, the de novo designed accessory proteins (e.g., fusion proteins) form one or more constrictions in the lumen of the CsgG nanopore and improve discrimination of polymer units as an analyte moves through the nanopore.

Some aspects of the present disclosure further relate to methods of designing the accessory proteins and producing the nanopore complexes, and to methods of using them in molecular sensing and nucleic acid sequencing applications.

In some aspects, the present disclosure provides a protein nanopore complex comprising: a CsgG nanopore comprising a lumen; and a fusion polypeptide comprising a first portion comprising a CsgF protein and a second portion comprising a helix-forming accessory protein, wherein the fusion protein is attached to the nanopore.

In some embodiments, the first portion of the fusion protein is attached to the CsgG nanopore. In some embodiments, the first portion of the fusion protein is positioned inside the lumen of the CsgG nanopore. In some embodiments, the first portion of the fusion protein extends outside the lumen of the CsgG nanopore. In some embodiments, the first portion forms a first constriction region in the lumen of the CsgG nanopore.

In some embodiments, the second portion forms a second constriction region.

In some embodiments, the CsgG nanopore further comprises a constriction region.

In some embodiments, the second portion is not attached to the CsgG nanopore. In some embodiments, the second portion comprises one or more helices (e.g., alpha helices, etc.).

In some embodiments, each helix (e.g., alpha helix, etc.) of the second portion comprises 0 to 15 alpha helical turns. In some embodiments, the second portion comprises a first alpha helix comprising 1 to 4 alpha helical turns and a second alpha helix comprising 3 to 6 alpha helical turns. In some embodiments, the second alpha helix packs against the first alpha helix. In some embodiments, the second portion comprises 1 to 55 amino acid residues. In some embodiments, each helix comprises 1 to 20 amino acid residues having a Phi angle in the range of about -45° to -90° and a Psi angle in the range of about 0° to -70°. In some embodiments, each helix comprises 1 to 30 amino acid residues having a Phi angle in the range of about -45° to -90° and a Psi angle in the range of about 0° to -70°.
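The backbone-angle criterion above can be illustrated with a minimal sketch. It assumes that Phi/Psi angles have already been computed and are supplied as (phi, psi) tuples in degrees. The function names are hypothetical, and the figure of 3.6 residues per turn is the textbook value for an ideal alpha helix rather than a number stated in this disclosure.

```python
# Angle windows taken from the text: Phi about -45 to -90 degrees, Psi about 0 to -70 degrees.
PHI_RANGE = (-90.0, -45.0)
PSI_RANGE = (-70.0, 0.0)
# Assumption: ideal alpha helix geometry, not a value given in the disclosure.
RESIDUES_PER_TURN = 3.6


def is_helical(phi, psi):
    """Return True if a residue's backbone angles fall inside both windows."""
    return PHI_RANGE[0] <= phi <= PHI_RANGE[1] and PSI_RANGE[0] <= psi <= PSI_RANGE[1]


def helical_segments(angles):
    """Return (start, end_exclusive) index pairs for runs of consecutive helical residues."""
    segments = []
    start = None
    for i, (phi, psi) in enumerate(angles):
        if is_helical(phi, psi):
            if start is None:
                start = i
        elif start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(angles)))
    return segments


def estimated_turns(segment):
    """Estimate the number of alpha helical turns in a (start, end_exclusive) segment."""
    start, end = segment
    return (end - start) / RESIDUES_PER_TURN
```

Applied to a designed second portion, the segment lengths and estimated turn counts could then be compared with the residue and turn ranges recited above.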

In some embodiments, the distance (e.g., vertical distance) between the first constriction region and the second constriction region is in the range of about 5 Å to about 80 Å (e.g., as measured between the alpha carbons (Cα) of the amino acid residue extending furthest into the lumen of the nanopore forming the first constriction and the amino acid residue extending furthest into the lumen of the nanopore forming the second constriction). In some embodiments, the protein nanopore complex has an axial length of greater than 90 Å, optionally wherein the axial length is in the range of about 95 Å to about 160 Å.
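The Cα-based distance measurement described above can be sketched as follows. This is an illustration under stated assumptions only: the pore axis is taken to coincide with the z axis of the coordinate frame, coordinates are taken to be in ångströms, and the residue "extending furthest into the lumen" is approximated as the one whose Cα lies closest to that axis. All function names are hypothetical.

```python
import math


def innermost_residue(ca_coords):
    """Return the C-alpha coordinate closest to the pore axis (assumed to be the z axis)."""
    return min(ca_coords, key=lambda p: math.hypot(p[0], p[1]))


def constriction_separation(first_ca, second_ca):
    """Return (vertical, direct) distances between the innermost C-alpha atoms.

    first_ca, second_ca: lists of (x, y, z) C-alpha coordinates for the residues
    lining each constriction.
    """
    a = innermost_residue(first_ca)
    b = innermost_residue(second_ca)
    vertical = abs(a[2] - b[2])   # separation along the assumed pore axis
    direct = math.dist(a, b)      # straight-line C-alpha to C-alpha distance
    return vertical, direct


def within_range(value, low=5.0, high=80.0):
    """Check a distance against the 'about 5 to about 80' range recited in the text."""
    return low <= value <= high
```

Reporting both values makes explicit whether a quoted separation is the vertical distance along the channel or the direct distance between the two atoms.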

In some embodiments, the fusion protein is attached to the nanopore by a linker. In some embodiments, the linker comprises a bond, a peptide linker, or a chemical linker. In some embodiments, the linker comprises a bond formed by a sulfur(VI) fluoride exchange (SuFEx) reaction. In some embodiments, the linker comprises one or more maleimide molecules.

In some embodiments, the fusion protein is cyclized. In some embodiments, the cyclization comprises one or more side chain-to-side chain cyclization bonds. In some embodiments, at least one of the side chain-to-side chain cyclization bonds is a disulfide bond.

In some aspects, the present disclosure provides a protein nanopore complex comprising: a CsgG nanopore comprising a lumen and a first constriction region formed within the lumen of the nanopore; and a fusion protein comprising a first portion comprising a CsgF protein and a second portion comprising a helix-forming accessory protein, wherein the fusion protein is attached to the nanopore.

In some embodiments, the first portion of the fusion protein is attached to the CsgG nanopore. In some embodiments, the first portion of the fusion protein is positioned inside the lumen of the CsgG nanopore.

In some embodiments, the second portion of the fusion protein is positioned outside the lumen of the CsgG nanopore.

In some embodiments, the first portion forms a second constriction region in the lumen of the CsgG nanopore. In some embodiments, the second portion forms a third constriction region in the lumen of the CsgG nanopore.

In some embodiments, the second portion is not attached to the CsgG nanopore.

In some embodiments, the second portion comprises one or more helices (e.g., alpha helices, etc.). In some embodiments, each helix (e.g., alpha helix) comprises 0 to 15 alpha helical turns. In some embodiments, the second portion comprises 1 to 54 amino acid residues. In some embodiments, each helix comprises 1 to 36 amino acid residues having a Phi angle in the range of about -45° to -90° and a Psi angle in the range of about 0° to -70°.

In some embodiments, the fusion protein is cyclized. In some embodiments, the cyclization comprises one or more side chain-to-side chain cyclization bonds. In some embodiments, the cyclization comprises one or more side chain-to-tail (e.g., C-terminal) cyclization bonds. In some embodiments, at least one of the cyclization bonds is a disulfide bond.

In some aspects, the present disclosure provides a protein nanopore complex comprising: a CsgG nanopore comprising a lumen and a first constriction region formed within the lumen of the nanopore; a first accessory protein attached to the CsgG nanopore and forming a second constriction region within the lumen of the nanopore; and a second accessory protein attached to the CsgG nanopore or to the first accessory protein and forming a third constriction region.

In some embodiments, the first accessory protein is positioned inside the lumen of the CsgG nanopore. In some embodiments, the first accessory protein comprises a CsgF protein or peptide.

In some embodiments, the second accessory protein comprises one or more helices (e.g., alpha helices, etc.). In some embodiments, each of the one or more helices (e.g., alpha helices) comprises 0 to 15 alpha helical turns. In some embodiments, the second accessory protein comprises two alpha helices.

In some embodiments, one of the alpha helices comprises 1 to 6 alpha helical turns. In some embodiments, one of the alpha helices comprises 1 to 10 alpha helical turns. In some embodiments, one of the alpha helices comprises 3 alpha helical turns and the other alpha helix comprises 3 or 4 alpha helical turns. In some embodiments, each helix comprises 1 to 36 amino acid residues having a Phi angle in the range of about -45° to -90° and a Psi angle in the range of about 0° to -70°.

In some embodiments, the second accessory protein comprises at least one alpha helix that packs against an alpha helix of the first accessory protein. In some embodiments, the second accessory protein comprises 1 to 55 amino acid residues.

In some embodiments, the distance (e.g., vertical distance) between the first constriction and the second constriction is in the range of about 20 Å to about 80 Å (e.g., as measured between the alpha carbons (Cα) of the amino acid residue extending furthest into the lumen of the nanopore forming the first constriction and the amino acid residue extending furthest into the lumen of the nanopore forming the second constriction). In some embodiments, the distance between the second constriction and the third constriction is in the range of about 5 Å to about 80 Å. In some embodiments, the protein nanopore complex has an axial length of greater than 90 Å, optionally wherein the axial length is in the range of about 95 Å to about 160 Å.

In some embodiments, the first accessory protein and the second accessory protein are attached by a linker. In some embodiments, the linker comprises a bond, a peptide linker, or a chemical linker. In some embodiments, the linker comprises a bond formed by a sulfur(VI) fluoride exchange (SuFEx) reaction. In some embodiments, the linker comprises one or more maleimide molecules. In some embodiments, the linker comprises one or more cyclization bonds (e.g., a first amino acid of the linker can be covalently or non-covalently attached to a second amino acid of the linker, e.g., by a crosslinker).

In some embodiments, the first accessory protein and the second accessory protein comprise one or more side chain-to-side chain cyclization bonds. In some embodiments, the first accessory protein and the second accessory protein comprise one or more side chain-to-tail (e.g., C-terminal) cyclization bonds. In some embodiments, at least one of the cyclization bonds is a disulfide bond.

In some aspects, the present disclosure provides a system for characterizing a target analyte, the system comprising a protein nanopore complex as described herein inserted into a membrane.

In some embodiments, the system further comprises an electrically conductive solution in contact with the protein nanopore complex, electrodes providing a voltage potential across the membrane, and a measurement system for measuring the current passing through the protein nanopore complex.

In some aspects, the present disclosure provides a method for characterizing a target analyte, the method comprising: contacting a system as described herein with the target analyte; applying a potential across the membrane such that the target analyte enters the lumen formed by the protein nanopore complex; and taking one or more measurements as the target analyte moves with respect to the lumen, thereby characterizing the target analyte.

In some embodiments, the target analyte comprises a target polynucleotide.

In some embodiments, taking one or more measurements comprises measuring the current passing through the continuous channel, wherein the current is indicative of the presence and/or one or more characteristics of the target analyte, thereby detecting and/or characterizing the target analyte.

In some embodiments, the target analyte is a polynucleotide, and nucleotides in the polynucleotide interact with the first constriction region and the second constriction region (and optionally the third constriction region) within the lumen. Each of the first constriction region and the second constriction region (and optionally the third constriction region) is capable of discriminating between different nucleotides, such that the overall current passing through the lumen is affected by the interaction between each of the first, second, and third constriction regions and the nucleotides located at that region.
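The figure descriptions report each constriction's contribution as a peak in a discrimination profile indexed by nucleotide position relative to the main constriction (position 0, with negative values below it). The sketch below shows one simple way such peaks could be located in a profile. It is an illustrative assumption, not the analysis used in this disclosure: the function name, the local-maximum rule, and the `min_height` threshold are all hypothetical.

```python
def discrimination_peaks(positions, values, min_height):
    """Return positions of local maxima in a discrimination profile.

    positions: nucleotide offsets relative to the main constriction (0 = main constriction).
    values: contribution of each position to the change in ionic current (arbitrary units).
    min_height: hypothetical threshold below which maxima are ignored.
    """
    if len(positions) != len(values):
        raise ValueError("positions and values must have the same length")
    peaks = []
    for i in range(1, len(values) - 1):
        higher_than_left = values[i] > values[i - 1]
        not_lower_than_right = values[i] >= values[i + 1]
        if values[i] >= min_height and higher_than_left and not_lower_than_right:
            peaks.append(positions[i])
    return peaks
```

A pore with a single constriction would be expected to yield one position from such a function, whereas a complex with additional constrictions would yield additional positions at negative offsets.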

In some aspects, the present disclosure provides a method of preparing a protein nanopore complex, wherein the protein nanopore complex comprises:

(a) a CsgG nanopore comprising a lumen; and

(b) a fusion polypeptide comprising a first portion comprising a CsgF protein and a second portion comprising a helix-forming accessory protein, wherein the fusion protein is attached to the nanopore and at least one domain of the fusion polypeptide is designed using a computer-generated algorithm.
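The description of Figure 2 states that sequences for experimental validation were selected on the basis of the lowest energy score and the highest PackStat score. The sketch below shows one possible way to combine the two criteria, by summing each design's rank on both scores. The rank-sum rule, the function name, and the tuple layout are assumptions made for illustration; the disclosure does not specify how the two scores were combined.

```python
def rank_designs(designs, top_n):
    """Return the names of the top_n designs by combined rank.

    designs: list of (name, energy_score, packstat_score) tuples.
    Lower energy is better; higher PackStat is better.
    """
    by_energy = sorted(designs, key=lambda d: d[1])
    by_packing = sorted(designs, key=lambda d: d[2], reverse=True)
    energy_rank = {d[0]: i for i, d in enumerate(by_energy)}
    packing_rank = {d[0]: i for i, d in enumerate(by_packing)}
    # Ties in the combined rank are broken in favour of the lower energy score.
    ranked = sorted(
        designs,
        key=lambda d: (energy_rank[d[0]] + packing_rank[d[0]], d[1]),
    )
    return [d[0] for d in ranked[:top_n]]
```

A rank-based combination avoids having to put the two scores on a common scale, which is why it is used here instead of a weighted sum.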

Figures 1a to 1c show the workflow for the de novo design of fusion proteins. Figure 1a shows the design workflow using the CsgG nanopore. Wild-type CsgF (residues 1-35; left panel) is shown in orange. Residues 17 to 30 of wild-type CsgF (red) were selected as the target in a search for geometrically matched, designable helices that pack against them and project over the pore to create a new constriction (cyan) with a diameter of 10 Å to 30 Å. The two helices were joined by a loop (yellow), and sequence design of the resulting backbone was performed with Rosetta. Figure 1b shows helix-helix interactions with symmetry-related partners. Figure 1c shows a top view of the nonameric CsgG-fusion protein complex, demonstrating the additional constriction achieved by the de novo designed fusion protein.
Figure 2 shows representative data for prioritization of novel fusion protein sequences designed using Rosetta. Sequences for experimental validation were selected based on the lowest energy score and highest PackStat score.
Figures 3a to 3d show PSIPRED protein secondary structure analysis based on the amino acid sequences of the de novo designed fusion proteins. Residues are shaded according to whether they are predicted to be strand, helix, or coil. Figure 3a shows the secondary structure prediction for the mature sequences of the fusion protein and wild-type CsgF. Figure 3b shows the secondary structure analysis for the de novo designed fusion proteins ONT1 to ONT10. Figure 3c shows the secondary structure analysis for the de novo designed fusion proteins ONT11 to ONT20. Figure 3d shows the secondary structure analysis for the de novo designed fusion proteins ONT21 to ONT25.
Figures 4a to 4c show predicted three-dimensional structures of alternative sequences for the newly designed fusion proteins. Figure 4a shows the predicted structure for the newly designed fusion proteins ONT1 to ONT10. Figure 4b shows the predicted structure for the newly designed fusion proteins ONT11 to ONT20. Figure 4c shows the predicted structure for the newly designed fusion proteins ONT21 to ONT25.
Figure 5 shows representative SDS-PAGE gel analysis of the CsgG-only pore and CsgG/fusion protein complexes, wherein the complexes comprise the CsgF-del(S31-F119) control or the de novo designed fusion proteins, with or without a maleimide crosslinker. Complexes comprising the fusion proteins show a band shift, indicating that these samples are pore complexes. Samples were not heated prior to loading onto the gel.
Figure 6 shows representative SDS-PAGE gel analysis of the CsgG-only pore and CsgG/fusion protein complexes, wherein the complexes comprise the CsgF-del(S31-F119) control or the de novo designed fusion proteins, with or without a maleimide crosslinker. The pores dissociated into their constituent monomers upon boiling in the presence of DTT prior to loading onto the gel.
Figure 7 shows representative ionic current (pA) versus time (s) traces as single-stranded DNA translocates through a CsgG-only pore. The raw current trace is shown as a black line, and the event detection signal is shown as a red line. For each pore, the top row shows the entire DNA current trace, and the bottom row shows an enlarged view of the first section of the current trace.
Figure 8 shows representative ionic current (pA) versus time (s) traces as single-stranded DNA translocates through CsgG comprising the del(S31-F119) CsgF peptide, with or without a maleimide crosslinker.
Figure 9 shows representative ionic current (pA) versus time (s) traces as single-stranded DNA translocates through CsgG comprising the de novo designed fusion protein in the absence of a maleimide crosslinker.
Figure 10 shows representative ionic current (pA) versus time (s) traces as single-stranded DNA translocates through CsgG comprising the de novo designed fusion protein with or without maleimide crosslinking.
Figure 11 shows representative ionic current (pA) versus time (s) traces as single-stranded DNA translocates through CsgG comprising the de novo designed fusion proteins, with or without a maleimide crosslinker. The fusion proteins contain a K37R mutation together with cysteine residues that form an internal disulfide bond within the peptide, i.e., that cyclize the fusion protein.
Figure 12 shows representative profiles of positions within the pore and their contribution to the overall change in ionic current level ("discrimination") as a DNA molecule translocates through the pore. The CsgG-only pore (+/- Q153C) shows one major discrimination peak at position 0.
Figure 13 shows representative profiles of positions within the pore and their contribution to the overall change in ionic current level ("discrimination") as a DNA molecule translocates through the pore. Dashed boxes indicate the regions affected by introduction of the de novo designed fusion proteins. CsgG-CsgF-del(S31-F119) pores, with or without the maleimide crosslinker, exhibit two discrimination peaks: the major discrimination peak at position 0, as seen in the CsgG-only pore, and an additional discrimination peak 4-6 nucleotides below the major constriction (positions -4 to -6). This additional discrimination region has less effect on the ionic current than the major discrimination peak at position 0.
Figure 14 shows representative profiles of positions within the pore and their contribution to the overall change in ionic current level ("discrimination") as a DNA molecule translocates through the pore. Distance within the pore is measured in nucleotide steps relative to the major constriction. Negative values correspond to positions below the major constriction and positive values correspond to positions above the major constriction (CsgG). The dashed box indicates the region affected by introduction of the de novo designed fusion protein. Complexes composed of CsgG and the de novo designed fusion protein containing K37R (with or without maleimide crosslinker; with cyclization) exhibit three discrimination peaks. As seen in the CsgG-only pore, the major discrimination peak is at position 0, with additional peaks at positions -6 and -9. The peak at position -9 corresponds to the expected constriction created by the de novo designed fusion protein when folded in the correct orientation.
Figure 15 shows an example of two proteins linked by a maleimidopropionic acid linker.
Figure 16 shows an example of a pore protein and an accessory protein (e.g., a fusion protein) functionalized with a reactive modifier, such as a thiol modifier.
Figure 17 shows representative ionic current (pA) versus time (s) traces when single-stranded DNA is translocated through CsgG comprising the de novo designed fusion protein (SEQ ID NO: 61) with (top two traces) or without (bottom two traces) a maleimide crosslinker. The raw current traces are shown as black lines, and the event detection signals are shown as red lines. For each pore, the top row shows the entire DNA current trace, and the bottom row shows an enlarged view of the first section of the current trace.
Figure 18 shows representative profiles of position within the pore and the contribution of each position to the overall change in ionic current level (“discrimination”) as a DNA molecule translocates through the pore. Distance within the pore is measured in nucleotide steps relative to the major constriction. Negative values correspond to positions below the major constriction and positive values correspond to positions above the major constriction (CsgG). The dashed box indicates the region affected by introduction of the de novo designed fusion protein. Three discrimination peaks are shown for complexes consisting of CsgG and the de novo designed fusion protein (SEQ ID NO: 61) with (bottom profile) or without (top profile) a maleimide crosslinker, both without cyclization. As seen in the CsgG-only pore, the major discrimination peak is at position 0, with additional peaks at positions -5 and -11. The peak at position -11 corresponds to the expected constriction created by the de novo designed fusion protein when folded in the correct orientation.
Figure 19 shows the structure and dimensions of the wild-type CsgG pore of E. coli strain K12 (the databank accession code for this structure is 4UV3). The distances shown are measured from backbone to backbone of the amino acids forming the pore structure. The CsgG pore is a tightly interconnected, symmetrical nonameric pore resembling a crown. Its overall height is 98 Å and its maximum outer diameter is 120 Å. It defines a central channel and consists of three parts: (A) the cap region, (B) the constriction region, and (C) the transmembrane beta-barrel region. The axial length (height) of the cap is 39 Å; its inner diameter is 43 Å and its entrance is 66 Å. The beta barrel has 36 strands, an axial length of 39 Å and an inner diameter of 55 Å. The transition between the cap and the beta barrel is abrupt, with a constriction located between them at the level of the predicted lipid-aqueous interface. The constriction has a diameter of about 18.5 Å and a length of 20 Å along the channel axis.
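As a quick consistency check on the Figure 19 dimensions, the three axial segments can be added up and compared with the stated overall height. The sketch below assumes that the quoted values are in ångströms and that the cap, constriction and beta-barrel segments stack end to end along the channel axis without overlap; the constant and function names are illustrative and are not taken from the disclosure.

```python
# Axial dimensions quoted for wild-type CsgG in Figure 19.
# Assumption: all values are in ångströms.
CAP_LENGTH = 39.0           # (A) cap region
CONSTRICTION_LENGTH = 20.0  # (B) constriction region
BARREL_LENGTH = 39.0        # (C) transmembrane beta-barrel region
OVERALL_HEIGHT = 98.0       # stated overall height of the pore


def stacked_height(*segments):
    """Sum axial segment lengths, assuming they stack without overlap."""
    return sum(segments)


total = stacked_height(CAP_LENGTH, CONSTRICTION_LENGTH, BARREL_LENGTH)
print(f"sum of segments: {total} Å, stated height: {OVERALL_HEIGHT} Å")
```

Under that stacking assumption the three segments account for the full stated height, which supports reading the figure's values as a single consistent set of axial measurements.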

Aspects of the present disclosure relate to compositions and methods for characterizing analytes using nanopore-based systems. The present disclosure is based in part on a protein nanopore complex formed by a CsgG pore and one or more accessory proteins, which form one or more channel constrictions in the nanopore complex. In some embodiments, the one or more accessory proteins are fusion proteins. As further described in the Examples, it has surprisingly been discovered that, using computational structural analysis tools, accessory proteins can be de novo designed that impart certain desirable features to a CsgG nanopore (e.g., tuning the pore width, elongating the pore lumen, forming one or more additional constrictions, etc.). In some embodiments, the de novo designed accessory proteins (e.g., fusion proteins) form one or more additional constrictions in the lumen of the CsgG nanopore and improve discrimination of polymer units as an analyte moves through the nanopore.

Accessory protein

The protein nanopore complexes (also referred to interchangeably as protein pore complexes) described herein may comprise one or more accessory proteins. The terms "peptide," "polypeptide," and "protein" are used interchangeably herein and refer to two or more amino acids joined together by peptide bonds. In some embodiments, the protein (also referred to as a polypeptide or peptide) comprises from 2 to 2000 amino acids. In some embodiments, the protein comprises 2 to 10 amino acids, 2 to 25 amino acids, 2 to 50 amino acids, 2 to 100 amino acids, 2 to 500 amino acids, or 2 to 1000 amino acids (or any number of amino acids therebetween, e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 75, 100, 250, 500, 750, 1000 amino acids, etc.). In some embodiments, the protein comprises more than 2000 amino acids.
In some embodiments, the peptide, polypeptide or protein is synthetic in origin (e.g., not existing in nature, e.g., not naturally expressed in any living organism). In some embodiments, the peptide, polypeptide, or protein occurs naturally (e.g., is expressed naturally in a living organism that has not been genetically modified to express the peptide, polypeptide, or protein). In some embodiments, the peptide, polypeptide, or protein can be naturally expressed by the organism. In some embodiments, the peptide, polypeptide, or protein is heterologously expressed by the organism (e.g., an organism that has been genetically modified to express the peptide, polypeptide, or protein). In some embodiments, the peptide, polypeptide, or protein is chemically synthesized (e.g., by in vitro transcription, peptide synthesis, or the like). The peptide, polypeptide, or protein can comprise one or more naturally occurring amino acids (e.g., L-amino acids, D-amino acids, etc.), one or more non-naturally occurring amino acids (e.g., radiolabeled amino acids, non-standard amino acids, unnatural amino acids, etc.), or one or more naturally occurring amino acids and one or more non-naturally occurring amino acids.

In some embodiments, the accessory protein is a fusion protein. The term "fusion protein" refers to a naturally occurring, synthetic, semi-synthetic, or recombinant single protein molecule comprising all or portions of two or more heterologous polypeptides (e.g., polypeptides that are heterologous to one another) joined by peptide bonds. In some embodiments, the fusion protein comprises all or portions of at least 2, 3, 4, 5, 6, 7, 8, 9, or 10 heterologous polypeptides joined by peptide bonds. As used herein, "portion of a peptide" refers to two or more amino acids of a peptide.
In some embodiments, a portion of a peptide comprises the complete amino acid sequence of the peptide or at least 5, 10, 20, 30, 50 or 100 amino acids, contiguous or including gaps, of the entire amino acid sequence of the peptide (e.g., 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100 amino acids). The portions of the fusion protein can be arranged in any suitable manner (e.g., C-terminus to N-terminus, N-terminus to C-terminus, C-terminus to C-terminus, N-terminus to N-terminus, etc.). In some embodiments, the C-terminus of the first portion can be linked (e.g., connected) to the N-terminus of the second portion. The portions of the fusion protein can be directly linked (e.g., an amino acid of one portion can be directly linked to an amino acid of the second portion, e.g., via a peptide bond between the terminal amino acids of the portions) or indirectly linked (e.g., an amino acid of one portion can be linked, e.g., by a first peptide bond, to a linker that is linked to the second portion of the fusion protein by a second peptide bond). In some embodiments, the first accessory protein is the first portion of the fusion protein and the second accessory protein is the second portion of the fusion protein. Linking of fusion protein portions using linkers is further described herein, e.g., in the section entitled “Linkers.”

In some embodiments, the protein nanopore complex comprises multiple subunits or monomers (e.g., multiple CsgG monomers) arranged around a central cavity or perforation (also referred to as the "lumen" of the nanopore). Formation of protein nanopores is further described herein, e.g., in the section entitled "CsgG Pore." In some embodiments, one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more) accessory proteins are arranged within or with the lumen of the nanopore to form a continuous channel (e.g., a continuous lumen). In some embodiments, the protein nanopore complex comprises a ratio of pore monomers (e.g., CsgG pore monomers) to accessory proteins of 9:1, 9:2, 9:3, 9:4, 9:5, 9:6, 9:7, 9:8, 9:9 (i.e., 1:1), 9:10, 9:11, 9:12, 9:13, 9:14, 9:15, 9:16, 9:17 or 9:18 (i.e., 1:2). In some embodiments, the one or more accessory proteins or the one or more fusion proteins can have the same symmetry as the nanopore.
For example, if the nanopore comprises eight monomers around a central axis, then eight accessory proteins (or eight fusion proteins) are present, if the nanopore comprises nine monomers around a central axis, then nine accessory proteins (or nine fusion proteins) are present, and so on. In some embodiments, the one or more accessory proteins (or the one or more fusion proteins) may comprise more or fewer, for example, one more or one less, monomers than the nanopore.
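The ratios listed above are all expressed against nine pore monomers, with 9:9 and 9:18 noted as equivalent to 1:1 and 1:2. A minimal sketch of that reduction, and of the "same symmetry" case in which the number of accessory proteins equals the number of pore monomers, is given below. The function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def simplified_ratio(pore_monomers, accessory_proteins):
    """Return the pore-monomer : accessory-protein ratio in lowest terms."""
    frac = Fraction(pore_monomers, accessory_proteins)
    return f"{frac.numerator}:{frac.denominator}"


def matches_pore_symmetry(pore_monomers, accessory_proteins):
    """True when there is exactly one accessory protein per pore monomer."""
    return pore_monomers == accessory_proteins


# Three of the listed stoichiometries for a nonameric pore.
for n_accessory in (3, 9, 18):
    print(f"9:{n_accessory} reduces to {simplified_ratio(9, n_accessory)}; "
          f"same symmetry: {matches_pore_symmetry(9, n_accessory)}")
```

Only the 9:9 case satisfies the symmetry check here; an eight-monomer pore with eight accessory proteins would satisfy it in the same way.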

The lumen of a nanopore or protein nanopore complex may have one or more constrictions. The terms "constriction," "orifice," "constriction region," "channel constriction," and "constriction site," as used interchangeably herein, refer to an opening defined by the lumenal surface of a pore or protein pore complex that acts to allow passage of ions and target molecules (e.g., but not limited to, polynucleotides or individual nucleotides) but prevent other nontarget molecules from passing through the pore or protein pore complex channel. A constriction is typically the narrowest opening within a pore or protein pore complex, or the narrowest opening within a channel defined by a pore or pore complex. A constriction may serve to restrict passage of molecules through the pore. The size of the constriction is typically a key factor in determining the suitability of a pore or pore complex for analyte characterization. If the constriction is too small, the molecules to be characterized will not be able to pass through. However, to achieve maximum effect on ion flow through the channel, each constriction should not be too large. For example, each constriction should not be wider than the solvent-accessible transverse diameter of the target analyte.
Ideally, each constriction should be as close as possible to the transverse diameter of the analyte passing through it.

The number of constrictions of the protein pore complexes described in the present disclosure can vary. In some embodiments, the protein pore complex comprises at least 1, 2, 3, 4, 5 or more constrictions. In some embodiments, the protein pore complex comprises 2 or 3 constrictions. In some embodiments, the protein pore complex comprises 2 constrictions. In some embodiments, the first constriction is formed by the first accessory protein and the second constriction is formed by the second accessory protein. In some embodiments, the first constriction is formed by a portion of a CsgG nanopore and the second constriction is formed by an accessory protein or a fusion protein. In some embodiments, the protein pore complex comprises 3 constrictions. In some embodiments, the first constriction is formed by a portion of a CsgG nanopore, the second constriction is formed by the first accessory protein and the third constriction is formed by the second accessory protein. In some embodiments, the first constriction is formed by a portion of a CsgG nanopore, and the second and third constrictions are formed by the fusion protein.

The narrowest point of the central cavity or perforation typically forms a constriction in the continuous channel. In some embodiments, the diameter of the constriction is calculated by measuring the distance between the alpha-carbons (Cα) of the amino acid residues that extend furthest into the lumen of the nanopore to form the constriction. In some embodiments, the diameter of the constriction is calculated by measuring the distance between the van der Waals radii of the atoms that extend furthest into the lumen of the nanopore to form the constriction. In some embodiments, the minimum diameter of a constriction (e.g., a constriction formed by a portion of the CsgG protein, a constriction formed by an accessory protein, a constriction formed by a fusion protein, etc.) is in the range of about 0.5 nm to about 4.0 nm (e.g., as measured by the distance between van der Waals radii). In some embodiments, the minimum diameter of the constriction is in the range of about 0.5 nm to about 3.0 nm, or about 0.5 nm to about 2.0 nm, preferably about 0.7 nm to about 1.8 nm, about 0.8 nm to about 1.7 nm, about 0.9 nm to about 1.6 nm, or about 1.0 nm to about 1.5 nm, for example about 1.1, 1.2, 1.3 or 1.4 nm. In some embodiments, the minimum diameter of the constriction is in the range of about 10 Å to about 30 Å (e.g., as measured from Cα to Cα), for example 10 Å, 11 Å, 12 Å, 13 Å, 14 Å, 15 Å, 16 Å, 17 Å, 18 Å, 19 Å, 20 Å, 21 Å, 22 Å, 23 Å, 24 Å, 25 Å, 26 Å, 27 Å, 28 Å, 29 Å or 30 Å. In some embodiments, the minimum diameter of the constriction is in the range of about 15 Å to about 25 Å (e.g., as measured from Cα to Cα).
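The two measurement conventions described above (atom-centre distances such as Cα to Cα, and distances between van der Waals surfaces) can be illustrated with a short sketch. It is only one plausible way to turn coordinates into a diameter: it assumes the constriction-forming atoms lie on a roughly circular ring and takes the diameter as twice their mean distance from the ring centroid. The coordinates are invented, with the ring radius chosen so that the result matches the approximately 18.5 (read here as ångströms) quoted for the wild-type CsgG constriction in Figure 19. The function names, and the example van der Waals radius of 1.7 Å for carbon, are assumptions made for illustration.

```python
import math

ANGSTROM_PER_NM = 10.0  # 1 nm = 10 Å


def ring_diameter(coords):
    """Centre-to-centre diameter of a ring of atoms (same unit as coords).

    Assumption: the atoms form a roughly circular ring, so the diameter is
    taken as twice the mean distance of the atoms from their centroid.
    """
    n = len(coords)
    centroid = tuple(sum(c[i] for c in coords) / n for i in range(3))
    mean_radius = sum(math.dist(c, centroid) for c in coords) / n
    return 2.0 * mean_radius


def surface_diameter(centre_diameter, vdw_radius):
    """Diameter between van der Waals surfaces: one atomic radius is
    subtracted on each side of the centre-to-centre diameter."""
    return centre_diameter - 2.0 * vdw_radius


def in_range(value, low, high):
    """True when low <= value <= high."""
    return low <= value <= high


# Hypothetical nine-fold ring of Cα atoms, 9.25 Å from the channel axis.
ring = [
    (9.25 * math.cos(2 * math.pi * k / 9),
     9.25 * math.sin(2 * math.pi * k / 9),
     0.0)
    for k in range(9)
]

d_ca = ring_diameter(ring)            # Cα-to-Cα style diameter, in Å
d_vdw = surface_diameter(d_ca, 1.7)   # surface-to-surface diameter, in Å

print(f"Cα diameter: {d_ca:.1f} Å, within 10-30 Å: {in_range(d_ca, 10.0, 30.0)}")
print(f"vdW diameter: {d_vdw / ANGSTROM_PER_NM:.2f} nm, "
      f"within 0.5-4.0 nm: {in_range(d_vdw / ANGSTROM_PER_NM, 0.5, 4.0)}")
```

For real structures the atom positions would come from a coordinate file rather than a generated ring, and a channel-profiling tool may give a more faithful minimum diameter than this centroid-based estimate.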

The distance between constrictions in the lumen of the protein pore complex can vary. In some embodiments, the distance between the first constriction region and the second constriction region is in the range of about 5 Å to about 80 Å. In some embodiments, the distance between the first constriction region and the second constriction region is about 5 Å, 6 Å, 7 Å, 8 Å, 9 Å or 10 Å in length, or any other integer value up to and including about 80 Å. In some embodiments, the distance between the first constriction region and the second constriction region is greater than 80 Å in length (e.g., 90 Å, 100 Å, etc.).

In some embodiments, the distance between the second constriction region and the third constriction region is in the range of about 5 Å to about 80 Å. In some embodiments, the distance between the second constriction region and the third constriction region is about 5 Å, 6 Å, 7 Å, 8 Å, 9 Å or 10 Å in length, or any other integer value up to and including about 80 Å. In some embodiments, the distance between the second constriction region and the third constriction region is greater than 80 Å in length (e.g., 90 Å, 100 Å, etc.).

In some embodiments, the distance between the first constriction region and the third constriction region is in the range of about 10 Å to about 160 Å. In some embodiments, the distance between the first constriction region and the third constriction region is about 10 Å, 11 Å, 12 Å, 13 Å, 14 Å or 15 Å in length, or any other integer value up to and including about 160 Å. In some embodiments, the distance between the first constriction region and the third constriction region is greater than 160 Å in length (e.g., 190 Å, 200 Å, etc.).
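The first-to-third range quoted above is what simple addition of the two adjacent spacings would give. The sketch below makes that arithmetic explicit under one assumption that the disclosure does not state: the three constrictions lie in order along a single channel axis, so the first-to-third distance is the sum of the first-to-second and second-to-third distances. Values are read as ångströms, and the names are illustrative.

```python
def add_ranges(a, b):
    """Add two (low, high) ranges end to end."""
    return (a[0] + b[0], a[1] + b[1])


# Ranges for adjacent constriction spacings, assumed to be in ångströms.
FIRST_TO_SECOND = (5.0, 80.0)
SECOND_TO_THIRD = (5.0, 80.0)

first_to_third = add_ranges(FIRST_TO_SECOND, SECOND_TO_THIRD)
print(first_to_third)  # (10.0, 160.0)
```

The same helper applies to any pair of adjacent spacings, for example when a design places the second constriction closer to the first than to the third.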

The accessory protein (or fusion protein) can, in some embodiments, be modified from its native state to provide a constriction having a desired minimum diameter. For example, the accessory protein can be modified by introducing one or more bulky residues by targeted mutation to create a constriction having a minimum diameter within the ranges specified above. The maximum height of the accessory protein is, in one embodiment, from about 3 nm to about 20 nm, such as from about 4 nm to about 10 nm. In one embodiment, the length of the channel in the accessory protein is from about 3 nm to about 20 nm, such as from about 4 nm to about 10 nm. The height is the size of the accessory protein in the direction perpendicular to the membrane.

In some embodiments, the accessory protein (e.g., the first accessory protein or the second accessory protein) or the fusion protein (e.g., the first portion of the fusion protein or the second portion of the fusion protein) extends outside the lumen of the protein pore complex. The accessory protein or fusion protein can extend outside the lumen of the protein pore complex on either the cis side or the trans side (e.g., when the protein pore complex is inserted into a membrane). In some embodiments, the distance that the accessory protein or fusion protein extends outside the lumen of the protein pore complex is calculated by measuring the distance between the Cα of the amino acid residue of the accessory protein or fusion protein that extends furthest outside the lumen and the Cα of a reference amino acid of the protein pore (e.g., a CsgG pore), e.g., amino acid residue Phe144 or Tyr196 of a wild-type CsgG monomer. In some embodiments, the accessory protein or fusion protein extends outside the lumen by about 0 Å to about 50 Å. In some embodiments, the accessory protein or fusion protein extends outside the lumen by about 5 Å to about 30 Å. In some embodiments, the accessory protein or fusion protein extends outside the lumen by about 10 Å to about 25 Å. In some embodiments, the accessory protein or fusion protein extends outside the lumen by about 1 Å, 2 Å, 3 Å, 4 Å or 5 Å, or by any other integer value up to and including about 50 Å.

The length between the first constriction and the second constriction of the protein pore complex typically affects the axial length of the protein pore complex. In some embodiments, the axial length of the protein pore complex refers to the distance between the top of the lumen of the protein pore complex and the bottom of the lumen of the protein pore complex. In some embodiments, the protein pore complex has an axial length of greater than 90 Å. In some embodiments, the axial length of the protein pore complex (e.g., a protein pore complex comprising one or more accessory proteins or one or more fusion proteins) is in the range of about 95 Å to about 160 Å, for example 95 Å, 96 Å, 97 Å, 98 Å, 99 Å or 100 Å, or any other integer value up to and including 160 Å.

In some embodiments, the accessory protein or fusion protein comprises one or more positively charged amino acids, such as arginine, lysine, or histidine, or an aromatic amino acid, such as tyrosine or tryptophan, positioned at or near a constriction formed by the accessory protein or fusion protein (e.g., within about 1, 2, 3, 4, or 5 nm of the constriction). In some embodiments, the accessory protein or fusion protein comprises one or more polar amino acids, negatively charged amino acids, or hydrophobic amino acids positioned at or near a constriction formed by the accessory protein or fusion protein (e.g., within about 1, 2, 3, 4, or 5 nm of the constriction). In some embodiments, the one or more amino acids positioned at or near a constriction formed by the accessory protein or fusion protein (e.g., within about 1, 2, 3, 4, or 5 nm of the constriction) are asparagine, threonine, serine, or glutamate. Such amino acids typically facilitate interactions between the pore and the polynucleotide.

The location of the one or more accessory proteins (or one or more fusion proteins) of the protein pore complex can vary. In some embodiments, the accessory protein (or fusion protein) is positioned entirely within the lumen of the protein pore complex. In some embodiments, the accessory protein or fusion protein comprises a portion that extends beyond the lumen of the protein pore complex, e.g., a portion that extends above the lumen of the protein pore complex (e.g., a portion that extends above the cap region on the cis side of the protein pore complex) and/or a portion that extends below the protein pore complex (e.g., a portion that extends below a transmembrane domain (e.g., barrel) on the trans side of the protein pore complex). In some embodiments, the accessory protein or fusion protein (or a portion of the accessory protein or fusion protein, such as the first portion or the second portion) is attached to a nanopore (e.g., a CsgG nanopore). In some embodiments, the accessory protein or fusion protein (or a portion thereof) is covalently attached to the nanopore. In some embodiments, the accessory protein or fusion protein (or a portion thereof) is noncovalently attached to the nanopore. In some embodiments, the first accessory protein and the second accessory protein are attached to each other (e.g., covalently attached, noncovalently attached, etc.). In some embodiments, the first portion of the fusion protein and the second portion of the fusion protein are attached to each other (e.g., covalently attached, noncovalently attached, etc.). In some embodiments, the accessory protein or fusion protein (or a portion of the accessory protein or fusion protein, e.g., the first portion or the second portion) is not attached to the nanopore (e.g., a CsgG nanopore). In some embodiments, the first accessory protein and the second accessory protein are not attached to each other.

In some embodiments, the accessory protein (e.g., the first accessory protein) is not CsgF or a CsgF peptide or a functional homolog, fragment, or modified version thereof. In some embodiments, a portion of the fusion protein (e.g., the first portion and/or the second portion) is not CsgF or a CsgF peptide or a functional homolog, fragment, or modified version thereof. In some embodiments, the accessory protein is not a CsgG nanopore or a homolog, fragment, or modified version thereof. In some embodiments, a portion of the fusion protein (e.g., the first portion and/or the second portion) is not a CsgG nanopore or a homolog, fragment, or modified version thereof.

In some embodiments, the accessory protein is not a polynucleotide binding protein. In some embodiments, the accessory protein is not a functional polynucleotide binding protein. For example, the accessory protein is not a polynucleotide binding protein having enzymatic activity. In some embodiments, the accessory protein can be a protein other than a nucleic acid processing enzyme, for example, a protein that is neither a helicase or polymerase nor a protein derived from such an enzyme. In some embodiments, the accessory protein does not have enzymatic activity. In some embodiments, the accessory protein does not undergo a structural change when a target analyte passes through the continuous channel formed in the protein pore complex.

In some embodiments, the accessory protein or fusion protein (e.g., a portion of a fusion protein) is a component of a nanopore system other than the component that forms the transmembrane pore, or a modified version of such a component. Examples of such a component are CsgF and truncated versions of CsgF. In some embodiments, the accessory protein or fusion protein comprises a CsgF protein or a homolog or modified version thereof, such as a fragment. In some embodiments, the pore complex comprises a CsgF protein or peptide and a non-CsgG pore, or a homolog or modified version thereof, such as a fragment.

The term "CsgF protein" or "CsgF peptide" preferably defines a CsgF peptide that is truncated at the C-terminal end (i.e., an N-terminal fragment). The CsgF peptide may be a fragment of wild-type E. coli CsgF (e.g., as shown in FIG. 3a), or a fragment of a wild-type homolog of E. coli CsgF, such as a peptide comprising any one of the amino acid sequences set forth in, for example, WO 2019/002893 (which is incorporated herein by reference in its entirety). A CsgF homolog refers to a polypeptide having at least 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or 99% complete sequence identity to wild-type E. coli CsgF. A CsgF homolog may also refer to a polypeptide containing the PFAM domain PF10614, which is characteristic of CsgF-like proteins. A list of currently known CsgF homologs and CsgF architectures can be found at http://pfam.xfam.org//family/PF10614. Mature CsgF (e.g., as shown in FIG. 3a) can be divided into three major regions: the "CsgF constriction peptide" (FCP), the "neck" region, and the "head" region. The "head" region of the CsgF peptide is distinct from the constriction of the pore as described herein. The "head" region of the CsgF peptide may also be referred to as the "C-terminal head domain". The structure of CsgF is discussed in detail in WO 2019/002893 (which is incorporated herein by reference in its entirety).

In some embodiments, the CsgF peptide is a truncated CsgF peptide lacking the C-terminal head; a truncated CsgF peptide lacking the C-terminal head and a portion of the neck domain of CsgF (e.g., the truncated CsgF peptide can comprise only a portion of the neck domain of CsgF); or a truncated CsgF peptide lacking the C-terminal head and the neck domain of CsgF. The CsgF peptide can lack a portion of the CsgF neck domain; for example, the CsgF peptide can comprise a portion of the neck domain starting from amino acid residue 36 at the N-terminal end of the neck domain (e.g., residues 36-40, 36-41, 36-42, 36-43, 36-45, 36-46 to 36-50, or 36-60 of wild-type E. coli CsgF). In some embodiments, the CsgF peptide comprises a CsgG binding region and a region that forms a constriction in the lumen of the pore. The CsgG binding region typically comprises residues 1 to 11 and/or 29 to 32 of the CsgF protein (e.g., wild-type E. coli CsgF or a homolog from another species) and may comprise one or more modifications. The region that forms a constriction in the pore typically comprises residues 9 to 28 of the CsgF protein (e.g., wild-type E. coli CsgF or a homolog from another species) and may comprise one or more modifications. In some embodiments, residues 9 to 17 comprise the conserved motif N₉PXFGGXXX₁₇ and form a turn region. In some embodiments, residues 9 to 28 form an alpha helix. In some embodiments, the amino acid residue at position 17 of the CsgF peptide forms the apex of the constriction region, which corresponds to the narrowest part of the CsgF constriction in the pore. In some embodiments, the CsgF constriction region also makes stabilizing contacts with the CsgG beta barrel, primarily at residues 8, 9, 11, 12, 18, 21, and 22 of the CsgF peptide. In some embodiments, the CsgF peptide comprises or consists of the amino acid sequence GTMTFQFRNPNFGGNPNNGAFLLNSAQAQN (SEQ ID NO: 60), which corresponds to amino acid residues 1-30 of wild-type E. coli CsgF. In some embodiments, the CsgF peptide is a first accessory protein. In some embodiments, the CsgF peptide is a portion (e.g., the first portion or the second portion) of the fusion protein. In some embodiments, the CsgF peptide comprises or consists of amino acid residues 1-23 of wild-type E. coli CsgF. In some embodiments, the CsgF peptide comprises or consists of amino acid residues 1-24 of wild-type E. coli CsgF.
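The residue numbering used above can be checked directly against SEQ ID NO: 60. The following is a minimal Python sketch, not part of the disclosure: the helper `residues` and the regular-expression rendering of the N₉PXFGGXXX₁₇ motif are illustrative assumptions, with X read as "any amino acid".

```python
import re

# SEQ ID NO: 60 as quoted in the text: residues 1-30 of wild-type E. coli CsgF.
CSGF_1_30 = "GTMTFQFRNPNFGGNPNNGAFLLNSAQAQN"


def residues(seq: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based, inclusive numbering."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("residue range lies outside the sequence")
    return seq[start - 1:end]


# Hypothetical regex form of the conserved motif N9-P-X-F-G-G-X-X-X17.
TURN_MOTIF = re.compile(r"NP.FGG.{3}")

turn_region = residues(CSGF_1_30, 9, 17)          # "NPNFGGNPN"
constriction_region = residues(CSGF_1_30, 9, 28)  # 20 residues
apex = residues(CSGF_1_30, 17, 17)                # "N", i.e. N17
has_motif = TURN_MOTIF.fullmatch(turn_region) is not None

print(turn_region, apex, has_motif)
```

The same helper yields the shorter truncations mentioned above, e.g. `residues(CSGF_1_30, 1, 23)` for residues 1-23.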

In some embodiments, the CsgF peptide has a length of 28 to 60 amino acids, such as 29 to 49, 30 to 45, or 32 to 40 amino acids. In some embodiments, the CsgF peptide comprises 29 to 35 amino acids or 29 to 45 amino acids. In some embodiments, the CsgF peptide has a length of 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, or 60 amino acids. In some embodiments, the CsgF peptide comprises all or a portion of an FCP corresponding to residues 1 to 35 of wild-type E. coli CsgF (or the corresponding residues in a CsgF homolog). In some embodiments, when the CsgF peptide is shorter than the FCP, the truncation is preferably made at the C-terminal end.

One or more residues in the CsgF peptide can be modified. For example, the CsgF peptide can comprise a modification at a position corresponding to one or more of the following positions in SEQ ID NO: 6: G1, M3, T4, F5, R8, N9, N11, F12, N17, A20, N24, A26, and Q29 of SEQ ID NO: 60. In some embodiments, the CsgF peptide is modified to introduce one or more cysteines, one or more hydrophobic amino acids, one or more charged amino acids, one or more unnatural amino acids, one or more polar amino acids, or one or more photoreactive amino acids, for example at a position corresponding to one or more of the following positions in SEQ ID NO: 60: G1, T4, F5, R8, N9, N11, F12, N17, A20, N24, A26, Q27, and Q29. Such introductions can be made in any number and combination. The introductions are preferably made by substitution.

In some embodiments, the CsgF peptide comprises a modification at a position corresponding to one or more of the following positions in SEQ ID NO: 60: N15, N17, A20, N24, and A28. In some embodiments, the CsgF peptide comprises one or more of the following substitutions: N15S/A/T/Q/G/L/V/I/F/Y/W/R/K/D/C/E; N17S/A/T/Q/G/L/V/I/F/Y/W/R/K/D/C/E; A20S/T/Q/N/G/L/V/I/F/Y/W/R/K/D/C/E; N24S/T/Q/A/G/L/V/I/F/Y/W/R/K/D/C/E; or A28S/T/Q/N/G/L/V/I/F/Y/W/R/K/D/C/E.
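The compact substitution notation used here (wild-type residue, position, then slash-separated replacement residues) can be expanded mechanically. The sketch below is one possible reading of that notation in Python; the function names `expand_substitutions` and `apply_substitution` are hypothetical and are not taken from the disclosure.

```python
from typing import List, Tuple

# SEQ ID NO: 60 as quoted in the text (residues 1-30 of wild-type E. coli CsgF).
CSGF_1_30 = "GTMTFQFRNPNFGGNPNNGAFLLNSAQAQN"


def expand_substitutions(spec: str) -> List[Tuple[str, int, str]]:
    """Expand e.g. 'N15S/A/T' into [('N', 15, 'S'), ('N', 15, 'A'), ('N', 15, 'T')]."""
    wild_type = spec[0]
    i = 1
    while i < len(spec) and spec[i].isdigit():
        i += 1
    position = int(spec[1:i])  # 1-based residue number
    return [(wild_type, position, new) for new in spec[i:].split("/")]


def apply_substitution(seq: str, wild_type: str, position: int, new: str) -> str:
    """Return seq with the residue at the 1-based position replaced by new."""
    if seq[position - 1] != wild_type:
        raise ValueError(f"expected {wild_type} at position {position}")
    return seq[:position - 1] + new + seq[position:]


options = expand_substitutions("N17S/A/T/Q/G/L/V/I/F/Y/W/R/K/D/C/E")
n17s = apply_substitution(CSGF_1_30, *options[0])
print(len(options), n17s)
```

The wild-type check in `apply_substitution` guards against numbering errors: it raises if the stated residue is not actually present at that position of the reference sequence.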

In some embodiments, the CsgF peptide is a variant of any of the CsgF sequences discussed above, including SEQ ID NO: 60, preferably comprising one or more modifications relative to the reference sequence. Over the entire length of the amino acid sequence of SEQ ID NO: 60, the variant will preferably be at least 40% homologous to that sequence, based on amino acid identity. More preferably, the variant may be at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, and more preferably at least 95%, 97%, or 99% homologous to the amino acid sequence of SEQ ID NO: 60 over the entire sequence, based on amino acid identity. Over the entire length of the amino acid sequence of SEQ ID NO: 60, the variant will preferably be at least 40% identical to that sequence. More preferably, the variant may be at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, and more preferably at least 95%, 97%, or 99% identical to SEQ ID NO: 60 over the entire sequence. There may be at least 80%, for example at least 85%, 90%, or 95%, amino acid identity over a stretch of 15 or more, for example 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more, contiguous amino acids ("strong homology"). These levels of homology/identity apply equally to any other CsgF peptide described above.
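The two identity criteria above (identity over the full length, and identity over a contiguous stretch of 15 or more residues) can be illustrated with a short Python sketch. It assumes the two sequences are already aligned to equal length and that gap characters ("-") count as mismatches; the text does not prescribe an alignment method or a denominator, so both choices are assumptions made for illustration only.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions carrying the same residue (gaps never match)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be pre-aligned, equal length, non-empty")
    matches = sum(x == y and x != "-" for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def has_strong_homology(a: str, b: str, window: int = 15,
                        threshold: float = 80.0) -> bool:
    """True if any contiguous window of the given size reaches the threshold."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return any(
        percent_identity(a[i:i + window], b[i:i + window]) >= threshold
        for i in range(len(a) - window + 1)
    )


reference = "GTMTFQFRNPNFGGNPNNGAFLLNSAQAQN"  # SEQ ID NO: 60 as quoted in the text
variant = "GTMTFQFRNPNFGGNPSNGAFLLNSAQAQN"    # illustrative N17S substitution

print(round(percent_identity(reference, variant), 1))
print(has_strong_homology(reference, variant))
```

A single substitution in a 30-residue peptide leaves 29 of 30 positions identical, so such a variant sits well above the 95% level and also satisfies the windowed criterion.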

Any number of the CsgF peptides in a pore or pore complex, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10, may contain one or more substitutions compared to SEQ ID NO: 60. In some embodiments, all 6 to 10 monomers in a pore or pore complex preferably contain one or more substitutions compared to SEQ ID NO: 60. The CsgF peptides in a pore complex can be identical or different. The CsgF peptides are preferably identical in each pore monomer conjugate of the pore complex of the present disclosure.

Aspects of the present disclosure relate to accessory proteins or fusion proteins comprising one or more alpha helices. In some embodiments, such proteins may be referred to as "helix-forming proteins." The present disclosure is based in part on the recognition that helix-forming proteins can be positioned in the lumen of certain nanopores (e.g., a CsgG nanopore) to form one or more constrictions in the lumen of the nanopore, and that the presence of such one or more constrictions improves the signal-to-noise ratio (e.g., discrimination of polynucleotide bases) of the resulting protein pore complex. The terms "helix" and "helical" generally refer to a coiled structural arrangement of a protein that arises from hydrogen bonds formed, in a repeating pattern, between the backbones of non-contiguous amino acid residues. In some embodiments, the helix is an alpha helix (also referred to as a 3.6₁₃ helix), comprising about 3.6 amino acid residues per helical turn, with 13 atoms in the ring formed by the hydrogen bond. In some embodiments, the helix is a 3₁₀ helix, comprising about 3 residues per turn, with 10 atoms in the ring formed by the hydrogen bond.

The number of helices (e.g., alpha helices, 3₁₀ helices, π helices, etc.) in the accessory protein or fusion protein can vary. In some embodiments, the number of helices in the accessory protein (e.g., the first accessory protein, the second accessory protein, etc.) ranges from about 0 to about 15, for example, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15. In some embodiments, the number of helices in the accessory protein (e.g., the first accessory protein, the second accessory protein, etc.) is greater than 15 (e.g., 20, 25, etc.). In some embodiments, the fusion protein (e.g., the first portion of the fusion protein, the second portion of the fusion protein, etc.) comprises from 0 to about 15 helices (e.g., alpha helices, 3₁₀ helices, π helices, etc.), for example, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 helices.

The number of turns in a helix (e.g., an alpha helix, a 3₁₀ helix, a π helix, etc.) can vary. In some embodiments, each helix (e.g., alpha helix, 3₁₀ helix, π helix, etc.) of the accessory protein or fusion protein comprises from about 0 to about 15 helical turns, for example, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 helical turns. A helix (e.g., an alpha helix, a 3₁₀ helix, a π helix, etc.) can contain one or more half-helices (e.g., half-turns), for example, 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5, 13.5, 14.5, etc. helical turns.

The number of amino acids that form a helix (e.g., an alpha helix, a 3₁₀ helix, a π helix, etc.) can vary. In some embodiments, each helix (e.g., alpha helix, 3₁₀ helix, π helix, etc.) of the accessory protein or fusion protein comprises from 2 to 55 amino acid residues, e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, or 55 amino acid residues.
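The residue counts and turn counts given for helices are linked by the residues-per-turn figures stated earlier (about 3.6 for an alpha helix, about 3 for a 3₁₀ helix). The Python sketch below shows that arithmetic; the function names are hypothetical, and rounding to the nearest half-turn is an illustrative convention rather than something the disclosure specifies.

```python
# Approximate residues per helical turn, as stated in the description.
RESIDUES_PER_TURN = {"alpha": 3.6, "3_10": 3.0}


def helical_turns(n_residues: int, helix_type: str = "alpha") -> float:
    """Estimate the number of turns formed by n_residues helical residues."""
    if n_residues < 0:
        raise ValueError("residue count must be non-negative")
    return n_residues / RESIDUES_PER_TURN[helix_type]


def nearest_half_turn(n_residues: int, helix_type: str = "alpha") -> float:
    """Round the estimate to the nearest half-turn (0.5, 1.0, 1.5, ...)."""
    return round(2 * helical_turns(n_residues, helix_type)) / 2


for n in (9, 18, 20, 54):
    print(n, nearest_half_turn(n, "alpha"), nearest_half_turn(n, "3_10"))
```

Under these figures, an alpha helix of 54 residues corresponds to about 15 turns, which is in line with the upper bounds of roughly 15 turns and 55 residues given for a single helix.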

The angles of the helices of the accessory protein or fusion protein can vary. In some embodiments, the helix comprises Phi angles in the range of about -45° to -90° (e.g., -45°, -46°, -47°, -48°, -49°, -50°, -51°, -52°, -53°, -54°, -55°, -56°, -57°, -58°, -59°, -60°, -61°, -62°, -63°, -64°, -65°, -66°, -67°, -68°, -69°, -70°, -71°, -72°, -73°, -74°, -75°, -76°, -77°, -78°, -79°, -80°, -81°, -82°, -83°, -84°, -85°, -86°, -87°, -88°, -89°, or -90°). In some embodiments, the helix comprises Psi angles in the range of about 0° to -70° (e.g., 0°, -1°, -2°, -3°, -4°, -5°, -6°, -7°, -8°, -9°, -10°, -11°, -12°, -13°, -14°, -15°, -16°, -17°, -18°, -19°, -20°, -21°, -22°, -23°, -24°, -25°, -26°, -27°, -28°, -29°, -30°, -31°, -32°, -33°, -34°, -35°, -36°, -37°, -38°, -39°, -40°, -41°, -42°, -43°, -44°, -45°, -46°, -47°, -48°, -49°, -50°, -51°, -52°, -53°, -54°, -55°, -56°, -57°, -58°, -59°, -60°, -61°, -62°, -63°, -64°, -65°, -66°, -67°, -68°, -69°, or -70°). In some embodiments, each helix comprises 1 to 20 amino acid residues having a Phi angle in the range of about -45° to -90° and a Psi angle in the range of about 0° to -70°. In some embodiments, each helix comprises 1 to 30 amino acid residues having a Phi angle in the range of about -45° to -90° and a Psi angle in the range of about 0° to -70°.
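The Phi/Psi windows above amount to a simple per-residue test. The Python sketch below applies those windows to a list of backbone dihedral angles and reports the longest run of consecutive residues that fall inside both. The function names and the example angles are illustrative assumptions; in practice the angles would come from a structure analysis tool, which is not shown here.

```python
from typing import Iterable, Tuple

PHI_RANGE = (-90.0, -45.0)  # degrees, from the description
PSI_RANGE = (-70.0, 0.0)    # degrees, from the description


def is_helical(phi: float, psi: float) -> bool:
    """True if both dihedral angles fall inside the stated helical windows."""
    return (PHI_RANGE[0] <= phi <= PHI_RANGE[1]
            and PSI_RANGE[0] <= psi <= PSI_RANGE[1])


def longest_helical_run(angles: Iterable[Tuple[float, float]]) -> int:
    """Length of the longest stretch of consecutive helical residues."""
    best = run = 0
    for phi, psi in angles:
        run = run + 1 if is_helical(phi, psi) else 0
        best = max(best, run)
    return best


# Made-up (phi, psi) pairs: three helix-like residues, one extended, one helix-like.
example = [(-60.0, -45.0), (-57.0, -47.0), (-63.0, -41.0),
           (-120.0, 130.0), (-62.0, -40.0)]
print(longest_helical_run(example))
```

The run length returned can then be compared against the residue ranges given above (for example, 1 to 20 or 1 to 30 residues per helix).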

In some embodiments, the accessory protein or fusion protein comprises structural features that facilitate packing of one or more of its helices against one another. "Packing" of helices typically refers to the close association of two or more helices with one another through covalent or non-covalent interactions between the helices, such as salt bridges, hydrogen bonds, disulfide bonds, and close hydrophobic side-chain-to-side-chain contacts, side-chain-to-backbone contacts, backbone-to-backbone contacts, etc., as described in Walther and Argos, J Mol Biol. 1996 Jan 26;255(3):536-53. doi: 10.1006/jmbi.1996.0044. Methods for predicting helix packing are described, for example, in Eilers et al., Proc Natl Acad Sci U S A. 2000 May 23;97(11):5796-5801.

Aspects of the present disclosure relate to the recognition that cyclized fusion proteins improve discrimination of a target analyte in a protein pore complex. A "cyclized" protein typically refers to a protein (e.g., a fusion protein) that comprises one or more intramolecular interactions that result in the formation of one or more circular arrangements of bonds. Examples of cyclization include side-chain-to-side-chain cyclization (e.g., intramolecular disulfide bond formation), head-to-tail cyclization (e.g., formation of an amide bond between the N-terminal and C-terminal amino acids of the protein), tail-to-side-chain cyclization, and head-to-side-chain cyclization, as described, for example, in Hayes et al., Org Biomol Chem. 2021 May 12;19(18):3983-4001. In some embodiments, the fusion protein comprises one or more side-chain-to-side-chain cyclization bonds. In some embodiments, at least one of the side-chain-to-side-chain cyclization bonds is a disulfide bond. In some embodiments, the one or more cyclization bonds result in cyclization between the first portion of the fusion protein and the second portion of the fusion protein (e.g., cyclization between the CsgF peptide and the helix-forming protein). In some embodiments, the accessory protein or fusion protein comprises a loop region (e.g., a linker forming the loop region) comprising one or more cyclization bonds. In some embodiments, the cyclization bond is formed by a chemical cross-linker and/or comprises a disulfide bond.

CsgG nanopore

Aspects of the present disclosure relate to protein pore complexes. In some embodiments, the protein pore complexes described by the present disclosure comprise a nanopore (e.g., a CsgG nanopore). A nanopore is a hole or channel through a membrane that allows hydrated ions, driven by an applied potential, to flow across or within the membrane.

The nanopore, in some embodiments, is a transmembrane protein pore. A transmembrane protein pore typically spans the entire membrane and can have a structure that extends beyond the membrane on one or both sides. A transmembrane protein pore is a single protein or a multimeric protein that allows hydrated ions to flow from one side of the membrane to the other. A transmembrane protein pore includes a channel that allows an analyte, for example a polynucleotide such as DNA or RNA, to move or be moved into and/or through the pore.

A transmembrane protein pore typically contains a barrel or channel through which ions can flow. The subunits of the pore typically surround a central axis and contribute strands to a transmembrane β barrel or channel, or to a transmembrane α helix bundle or channel.

The barrel or channel of the transmembrane protein pore typically comprises amino acids that facilitate interaction with polynucleotides. Such amino acids are preferably located near the constriction of the barrel or channel (e.g., within 1, 2, 3, 4, or 5 nm). The transmembrane protein pore typically comprises one or more polar or hydrophobic residues. Such amino acids typically facilitate interaction between the pore and a nucleotide, polynucleotide, or nucleic acid.

In some embodiments, the nanopore is a CsgG pore, e.g., CsgG of E. coli strain K-12 substrain MC4100 or a homolog or mutant thereof. A mutant CsgG pore can comprise one or more mutant monomers. The CsgG pore can be a homopolymer comprising identical monomers or a heteropolymer comprising two or more different monomers. Suitable pores derived from CsgG are disclosed in WO 2016/034591, WO2017/149316, WO2017/149317, WO2017/149318, International Patent Application Nos. PCT/GB2018/051191 and PCT/GB2018/051858, and Chinese Patent Application Nos. CN113773373, CN113896776, CN113912683, and CN113754743, each of which is incorporated herein by reference in its entirety. Additional examples of CsgG pores include, but are not limited to, Uniprot reference numbers K4KIX7, A0A086D1N6, A0A1I1MNE8, A0A143HJG2, A0A090RS48, and A0A090SZM0.

A CsgG pore typically comprises one or more CsgG monomers. A CsgG pore monomer is a monomer capable of forming a CsgG pore. Such monomers are known in the art, particularly from WO 2019/002893 (which is incorporated herein by reference in its entirety). The CsgG pore preferably comprises one or more of (a) a cap region, (b) a constriction region, and (c) a transmembrane beta-barrel region, such as (a), (b), (c), (a) and (b), (a) and (c), (b) and (c), or (a), (b) and (c). The CsgG pore monomer preferably comprises one or more of (a) a cap-forming region, (b) a constriction-forming region, and (c) a transmembrane beta-barrel-forming region, such as (a), (b), (c), (a) and (b), (a) and (c), (b) and (c), or (a), (b) and (c). The CsgG pore formed by the monomers can have any structure, but preferably has or comprises the structure of a wild-type E. coli CsgG pore (e.g., as described by PDB Accession No. 4UV3). The protein structure of CsgG defines a channel or hole that allows translocation of molecules and ions from one side of the membrane to the other.

The CsgG pore can be of any size, but preferably has the size of a wild-type E. coli CsgG pore (e.g., as described by PDB Accession No. 4UV3). These dimensions are shown in FIG. 19. In some embodiments, the CsgG pore has an outer diameter of about 100 Å to about 150 Å at its widest point, such as about 110 Å to about 140 Å or about 115 Å to about 125 Å at its widest point. In some embodiments, the CsgG pore has an outer diameter of about 120 Å at its widest point. In some embodiments, the CsgG pore has a total length of about 80 Å to about 120 Å, such as about 90 Å to about 110 Å or about 95 Å to about 105 Å. In some embodiments, the CsgG pore has a total length of about 98 Å. References to "total length" and "length" relate to the length of the pore or pore region when viewed from the side (e.g., a cis-to-trans cross-section of a pore inserted in a membrane). This may be the side view of FIG. 19. In some embodiments, the outer diameter is measured by calculating the Cα-to-Cα distance between the amino acid residues on the outside of the CsgG pore that are furthest apart. In some embodiments, the outer diameter is measured by calculating the distance between the van der Waals radii of the amino acid residues on the outside of the CsgG pore that are furthest apart.

In some embodiments, the cap region has a length of about 20 Å to about 60 Å, such as about 30 Å to about 50 Å or about 35 Å to about 45 Å. In some embodiments, the cap region has a length of about 39 Å. In some embodiments, the channel defined by the cap region has an opening with a diameter of about 30 Å to about 70 Å, such as about 40 Å to about 60 Å or about 45 Å to about 55 Å. In some embodiments, the channel defined by the cap region has an opening with a diameter of about 66 Å. In some embodiments, the channel defined by the cap region has a diameter of about 20 Å to about 66 Å at its narrowest point, such as about 30 Å to about 50 Å or about 32 Å to about 43 Å at its narrowest point. In some embodiments, the channel defined by the cap region preferably has a diameter of about 43 Å at its narrowest point. In some embodiments, the outer diameter is measured by calculating the Cα-to-Cα distance between the amino acid residues lining the channel of the cap region of the CsgG pore that are closest to each other. In some embodiments, the outer diameter is measured by calculating the distance between the van der Waals radii of the amino acid residues lining the channel of the cap region that are closest to each other.

In some embodiments, the constriction region formed by the CsgG pore (if present) has a length of about 5 Å to about 40 Å, such as about 10 Å to about 30 Å or about 15 Å to about 25 Å. In some embodiments, the constriction region has a length of about 20 Å. In some embodiments, the channel defined by the constriction region has a diameter of about 2 Å to about 30 Å at its narrowest point, e.g., about 5 Å to about 25 Å, about 8 Å to about 20 Å or about 10 Å to about 15 Å at its narrowest point. In some embodiments, the channel defined by the constriction region has a diameter of about 9 Å. In some embodiments, the channel defined by the constriction region has a diameter of about 18.5 Å. In some embodiments, the constriction has a diameter of about 2 Å to about 30 Å, such as about 5 Å to about 25 Å, about 8 Å to about 20 Å or about 10 Å to about 15 Å. In some embodiments, the constriction has a diameter of about 12 Å. In some embodiments, the constriction region of the CsgG pore is measured by calculating the Cα-to-Cα distance between the amino acid residues that extend furthest into the lumen of the pore to form the constriction. In some embodiments, the outer diameter is measured by calculating the distance between the van der Waals radii of the amino acid residues that extend furthest into the lumen of the pore to form the constriction.

In some embodiments, the transmembrane beta-barrel region has a length of about 20 Å to about 60 Å, such as about 30 Å to about 50 Å or about 35 Å to about 45 Å. In some embodiments, the transmembrane beta-barrel has a length of about 39 Å. In some embodiments, the channel defined by the transmembrane beta-barrel region has a diameter of about 20 Å to about 60 Å at its narrowest point, such as about 30 Å to about 50 Å or about 35 Å to about 45 Å at its narrowest point. In some embodiments, the channel defined by the transmembrane beta-barrel region has a diameter of about 55 Å at its narrowest point.

All of the above measurements are based on backbone-to-backbone measurements of the amino acids forming the different regions (as shown in FIG. 19).
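The Cα-to-Cα measurement described above can be sketched in a few lines of Python. This is a minimal illustration only: the coordinates are hypothetical placeholder values in ångströms, not atoms taken from PDB 4UV3, and the function names `max_ca_distance` and `min_ca_distance` are assumptions introduced for the example. The widest pairwise distance stands in for an outer diameter, and the narrowest for a channel or constriction diameter.

```python
import math
from itertools import combinations

# Hypothetical C-alpha coordinates (x, y, z) in angstroms.
# Illustrative values only; they are NOT taken from PDB 4UV3.
ca_coords = [
    (60.0, 0.0, 0.0),
    (-60.0, 0.0, 0.0),
    (0.0, 55.0, 10.0),
    (0.0, -55.0, -10.0),
]


def max_ca_distance(coords):
    """Largest pairwise distance: residues that are furthest apart."""
    return max(math.dist(a, b) for a, b in combinations(coords, 2))


def min_ca_distance(coords):
    """Smallest pairwise distance: residues that are closest to each other."""
    return min(math.dist(a, b) for a, b in combinations(coords, 2))


outer_diameter = max_ca_distance(ca_coords)
print(f"widest C-alpha to C-alpha distance: {outer_diameter:.1f} A")
```

With real data, the coordinate list would be restricted to the residues of interest (for example, residues on the outside of the pore for the outer diameter) before the distances are computed.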

SEQ ID NO: 59 shows the sequence of wild-type E. coli CsgG as a mature protein. Residues 1 to 41 of SEQ ID NO: 59 form the cap region. Residues 64 to 131 of SEQ ID NO: 59 form the constriction region. Residues 156 to 180 and 212 to 262 of SEQ ID NO: 59 form the transmembrane beta-barrel region.
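As an illustration, the residue ranges stated above can be held in a small lookup table. The sketch below assumes 1-based numbering relative to SEQ ID NO: 59; the names `CSGG_REGIONS` and `region_of` are hypothetical and introduced only for this example.

```python
# Residue ranges (1-based, inclusive) as stated for SEQ ID NO: 59.
CSGG_REGIONS = {
    "cap": [(1, 41)],
    "constriction": [(64, 131)],
    "transmembrane beta-barrel": [(156, 180), (212, 262)],
}


def region_of(position):
    """Return the region name containing a residue position, or None."""
    for name, ranges in CSGG_REGIONS.items():
        if any(start <= position <= end for start, end in ranges):
            return name
    return None


print(region_of(90))   # constriction
print(region_of(50))   # None: not assigned to any of the three regions
```

Positions that fall between the listed ranges, such as residue 50, are simply reported as unassigned; the text does not attribute them to any of the three regions.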

In some embodiments, the CsgG pore monomer is a variant of SEQ ID NO: 59 because it has a cysteine at a position corresponding to position 153 or 133 of SEQ ID NO: 59. In some embodiments, the variant CsgG monomer may also be referred to as a modified CsgG pore monomer or a mutant CsgG pore monomer. The modifications or mutations in the variant include, but are not limited to, any one or more of the modifications disclosed herein, or a combination of such modifications. The CsgG pore monomer can be a CsgG homolog monomer. A CsgG homolog monomer is a polypeptide having at least 40%, 50%, 60%, 70%, 80%, 85%, 90%, 95% or 99% complete sequence identity to the wild-type E. coli CsgG shown in SEQ ID NO: 59. A CsgG homolog is also referred to as a polypeptide containing the PFAM domain PF03783, which is characteristic of CsgG-like proteins. A list of currently known CsgG homologs and CsgG architectures can be found at http://pfam.xfam.org//family/PF03783.

In some embodiments, the CsgG pore monomer is a variant of SEQ ID NO: 59 comprising one or more modifications in addition to a cysteine at a position corresponding to position 153 or 133 of SEQ ID NO: 59. Over the entire length of the amino acid sequence of SEQ ID NO: 59, the variant will preferably be at least 40% homologous to that sequence based on amino acid identity. More preferably, the variant may be at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% homologous, based on amino acid identity, to the amino acid sequence of SEQ ID NO: 59 over the entire sequence. Over the entire length of the amino acid sequence of SEQ ID NO: 59, the variant will preferably be at least 40% identical to that sequence. More preferably, the variant may be at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% identical to SEQ ID NO: 59 over the entire sequence.

Sequence identity can also relate to a fragment or portion of a CsgG pore monomer. Thus, a sequence may have less than 40% overall sequence homology/identity with SEQ ID NO: 59, while the sequence of a particular region, domain or subunit shares at least 80%, 90% or up to 99% sequence homology/identity with the corresponding region of SEQ ID NO: 59. There may be at least 80%, for example at least 85%, 90% or 95%, amino acid identity over a stretch of 100 or more, for example 125, 150, 175 or 200 or more, contiguous amino acids ("strong homology"). In some embodiments, the CsgG pore monomer is a variant of SEQ ID NO: 3, preferably comprising a sequence that is at least 40% homologous to the cap region (residues 1 to 41) of SEQ ID NO: 3. More preferably, the variant may comprise a sequence that is at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% homologous, based on amino acid identity, to residues 1 to 41 of SEQ ID NO: 59. In some embodiments, the variant comprises a sequence that is at least 40% identical to residues 1 to 41 of SEQ ID NO: 59. In some embodiments, the variant comprises a sequence that is at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% identical to residues 1 to 41 of SEQ ID NO: 59.

In some embodiments, the CsgG pore monomer is a variant of SEQ ID NO: 59 comprising a sequence that is at least 40% homologous to the constriction region (residues 64 to 131) of SEQ ID NO: 59. In some embodiments, the variant comprises a sequence that is at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% homologous, based on amino acid identity, to residues 64 to 131 of SEQ ID NO: 59. In some embodiments, the variant comprises a sequence that is at least 40% identical to residues 64 to 131 of SEQ ID NO: 59. In some embodiments, the variant comprises a sequence that is at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% identical to residues 64 to 131 of SEQ ID NO: 59.

In some embodiments, the CsgG pore monomer is a variant of SEQ ID NO: 59 comprising a sequence that is at least 40% homologous to the transmembrane beta-barrel region (residues 156-180 and 212-262) of SEQ ID NO: 3. In some embodiments, the variant comprises a sequence that is at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% homologous, based on amino acid identity, to residues 156-180 and 212-262 of SEQ ID NO: 59. In some embodiments, the variant comprises a sequence that is at least 40% identical to residues 156-180 and 212-262 of SEQ ID NO: 59. In some embodiments, the variant comprises a sequence that is at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% identical to residues 156-180 and 212-262 of SEQ ID NO: 59.

CsgG pore monomers are highly conserved (as is readily apparent from FIGS. 45 to 47 of WO 2017/149317). Furthermore, from knowledge of the mutations described in relation to SEQ ID NO: 59, it is possible to determine the equivalent positions for mutations in CsgG pore monomers other than that of SEQ ID NO: 59.

Thus, a reference to a mutant CsgG pore monomer comprising a variant of the sequence shown in SEQ ID NO: 59 and the specific amino acid mutations specified in the claims and elsewhere in the specification also encompasses a mutant CsgG pore monomer comprising a variant of any one of the sequences shown in SEQ ID NOs: 68 to 88 of WO 2019/002893 (which is incorporated herein by reference in its entirety) and the corresponding amino acid mutations thereof. The CsgG pore monomer may also be any one of the sequences shown in CN 113773373 A, CN 113896776 A, CN 113912683 A and CN 113754743 A, or a variant thereof.

Standard methods in the art can be used to determine homology. For example, the UWGCG package provides the BESTFIT program, which can be used to calculate homology, for example with its default settings (Devereux et al. (1984) Nucleic Acids Research 12, p387-395). The PILEUP and BLAST algorithms can be used to calculate homology or to align sequences (e.g., to identify equivalent residues or corresponding sequences), typically with their default settings, as described, for example, in Altschul S. F. (1993) J Mol Evol 36:290-300 and Altschul, S. F. et al. (1990) J Mol Biol 215:403-10. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/).
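To make the percent-identity thresholds concrete, the following sketch computes identity over the full length of a reference sequence from two rows of an existing alignment. It is not an implementation of BESTFIT, PILEUP or BLAST: it assumes the alignment has already been produced by such a tool, uses `-` as the gap character, and the function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aligned_ref, aligned_var, gap="-"):
    """Percent identity of a variant relative to the full reference length.

    Both arguments are rows of the same pairwise alignment (equal length).
    The denominator is the number of reference residues, so gaps in the
    variant count against it.
    """
    if len(aligned_ref) != len(aligned_var):
        raise ValueError("aligned sequences must have the same length")
    ref_length = sum(1 for r in aligned_ref if r != gap)
    if ref_length == 0:
        raise ValueError("reference contains no residues")
    matches = sum(
        1 for r, v in zip(aligned_ref, aligned_var) if r == v and r != gap
    )
    return 100.0 * matches / ref_length


# Toy aligned sequences (hypothetical; not SEQ ID NO: 59).
reference = "MKTAYIAKQR"
variant = "MKTSYIAK-R"
identity = percent_identity(reference, variant)
print(f"{identity:.1f}% identical")  # 8 of 10 reference positions match
```

Restricting both rows to the columns of a single region (for example the columns covering residues 64 to 131 of the reference) gives the region-level identity discussed in the preceding paragraphs.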

SEQ ID NO: 59 is the wild-type CsgG pore monomer from E. coli strain K-12 substrain MC4100. A variant of SEQ ID NO: 59 may comprise any of the substitutions present in other CsgG homologs. Preferred CsgG homologs are shown in SEQ ID NOs: 68 to 88 of WO 2019/002893 (which is incorporated herein by reference in its entirety). A variant may comprise a combination of one or more of the substitutions present in SEQ ID NOs: 68 to 88 of WO 2019/002893, compared with SEQ ID NO: 59.

The CsgG pore monomer in the pore monomer conjugate of the present disclosure typically retains the ability to form the same 3D structure as a wild-type CsgG pore monomer, such as a CsgG pore monomer having the sequence of SEQ ID NO: 59. The 3D structure of CsgG is known in the art and is disclosed, for example, in Goyal et al. (2014) Nature 516(7530):250-3. Any number of mutations may be made in the wild-type CsgG sequence in addition to the mutations described herein, provided that the CsgG pore monomer retains the improved properties conferred by those mutations.

Amino acid substitutions, for example up to 1, 2, 3, 4, 5, 10, 20 or 30 substitutions, may be made to the amino acid sequence of SEQ ID NO: 59 in addition to those discussed above. Conservative substitutions replace amino acids with other amino acids of similar chemical structure, similar chemical properties or similar side-chain volume. The amino acids introduced may have similar polarity, hydrophilicity, hydrophobicity, basicity, acidity, neutrality or charge to the amino acids they replace. Alternatively, a conservative substitution may introduce another amino acid that is aromatic or aliphatic in place of an existing aromatic or aliphatic amino acid.
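One simple way to picture a conservative substitution is to check whether the original and replacement residues fall in the same property class. The grouping below is a common textbook-style classification chosen purely for illustration; it is an assumption of this sketch and is not a definition given in the present text, and `PROPERTY_GROUPS` and `is_conservative` are hypothetical names.

```python
# Illustrative property classes (an assumption, not defined by the text).
PROPERTY_GROUPS = {
    "aliphatic": set("AVLIM"),
    "aromatic": set("FWY"),
    "polar uncharged": set("STNQ"),
    "basic": set("KRH"),
    "acidic": set("DE"),
}


def group_of(residue):
    """Return the property class of a one-letter residue code, or None."""
    for name, members in PROPERTY_GROUPS.items():
        if residue in members:
            return name
    return None


def is_conservative(original, replacement):
    """True when both residues share a property class in PROPERTY_GROUPS."""
    group = group_of(original)
    return group is not None and group == group_of(replacement)


print(is_conservative("L", "I"))  # True: both aliphatic
print(is_conservative("K", "R"))  # True: both basic
print(is_conservative("L", "D"))  # False: aliphatic vs acidic
```

Other criteria named above, such as side-chain volume, would need their own tables; this sketch covers only class membership.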

In some embodiments, the CsgG pore monomer is modified to introduce one or more cysteines, one or more hydrophobic amino acids, one or more charged amino acids, one or more non-natural amino acids, one or more polar amino acids or one or more photoreactive amino acids. Such introductions can be made in any number and combination. The introductions are preferably made by substitution.

One or more amino acid residues of the amino acid sequence of SEQ ID NO: 59 may additionally be deleted from the polypeptides described above. Up to 1, 2, 3, 4, 5, 10, 20 or 30 or more residues may be deleted.

A variant may comprise a fragment of SEQ ID NO: 59. Such fragments retain pore-forming activity. Fragments may be at least 50, at least 100, at least 150, at least 200 or at least 250 amino acids in length. Such fragments may be used to produce pores. A fragment preferably comprises the membrane-spanning domains of SEQ ID NO: 59, namely K135-Q153 and S183-S208.

One or more amino acids may alternatively or additionally be added to the polypeptides described above. An extension may be provided at the amino terminus or carboxy terminus of the amino acid sequence of SEQ ID NO: 59 or of a polypeptide variant or fragment thereof. The extension may be quite short, for example from 1 to 10 amino acids in length. Alternatively, the extension may be longer, for example up to 50 or 100 amino acids. A carrier protein may be fused to the amino acid sequence. Other fusion proteins are discussed in more detail elsewhere in this disclosure, for example in the section entitled "Accessory proteins".

A variant of SEQ ID NO: 59 is a polypeptide that has an amino acid sequence which differs from that of SEQ ID NO: 59 and which retains the ability to form a pore. A variant typically contains the regions of SEQ ID NO: 59 that are responsible for pore formation. The pore-forming ability of CsgG, which contains a β-barrel, is provided by the β-sheets in the transmembrane beta-barrel region of each subunit monomer. A variant of SEQ ID NO: 59 typically comprises the regions of SEQ ID NO: 59 that form β-sheets, namely K134-Q154 and S183-S208. One or more modifications can be made to the regions of SEQ ID NO: 3 that form β-sheets, as long as the resulting variant retains the ability to form a pore.

The one or more modifications of the CsgG pore monomer preferably improve the ability of a pore complex comprising the pore monomer to characterize an analyte. For example, modifications/mutations/substitutions that alter the number, size, shape, placement or orientation of the constrictions within the channel formed from the pore monomer conjugates of the present disclosure are contemplated. The CsgG pore monomer or variant of SEQ ID NO: 59 may have any of the specific modifications or substitutions disclosed in WO 2016/034591, WO 2017/149316, WO 2017/149317, WO 2017/149318, WO 2018/211241 and WO 2019/002893 (all of which are incorporated herein by reference in their entireties).

Preferred modifications or substitutions of SEQ ID NO: 59 include, but are not limited to, one or more, such as two or more, three or more, four or more, five or more, six or more, seven or more, or all, of the following:

(a) a substitution at position Y51, such as Y51I, Y51L, Y51A, Y51V, Y51T, Y51S, Y51Q or Y51N;

(b) a substitution at position N55, such as N55I, N55L, N55A, N55V, N55T, N55S or N55Q;

(c) a substitution at position F56, such as F56I, F56L, F56A, F56V, F56T, F56S, F56Q or F56N;

(d) a substitution at position L90, such as L90N, L90D, L90E, L90R or L90K;

(e) a substitution at position N91, such as N91D, N91E, N91R or N91K;

(f) a substitution at position K94, such as K94R, K94F, K94Y, K94Q, K94W, K94L, K94S or K94N;

(g) a substitution at position R192, such as R192Q, R192F, R192S, R192D or R192T; and

(i) a substitution at position C215, such as C215T, C215S, C215I, C215L, C215A, C215V or C215G.

A variant of SEQ ID NO: 3 may additionally comprise a deletion of one or more positions, such as a deletion of T104-N109, a deletion of F193-L199 or a deletion of F195-L199.
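The substitution labels used above follow the usual wild-type residue, position, replacement pattern (for example, Y51A replaces the tyrosine at position 51 with alanine). The sketch below parses such labels and applies them to a sequence while checking that the stated wild-type residue is actually present. The toy sequence is hypothetical and much shorter than SEQ ID NO: 59, and the names `Substitution`, `parse_substitution` and `apply_substitutions` are assumptions made for this example.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    wild_type: str    # one-letter code expected in the parent sequence
    position: int     # 1-based residue number
    replacement: str  # one-letter code to introduce


_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label):
    """Parse a label such as 'Y51A' into its three components."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def apply_substitutions(sequence, labels):
    """Return a copy of sequence with every labelled substitution applied."""
    residues = list(sequence)
    for label in labels:
        sub = parse_substitution(label)
        if not 1 <= sub.position <= len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        if residues[sub.position - 1] != sub.wild_type:
            raise ValueError(f"{label}: parent residue does not match")
        residues[sub.position - 1] = sub.replacement
    return "".join(residues)


# Toy parent sequence (hypothetical; not SEQ ID NO: 59).
toy_parent = "MYNFK"
print(apply_substitutions(toy_parent, ["Y2A", "F4Q"]))  # MANQK
```

The wild-type check is a deliberate design choice: it catches numbering mismatches, which matter when equivalent positions are mapped between SEQ ID NO: 59 and a homolog.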

Any number of the CsgG pore monomers in the pore or pore complex, such as 6, 7, 8, 9 or 10, may be variants of SEQ ID NO: 59. All 6 to 10 monomers in the pore or pore complex are preferably variants of SEQ ID NO: 59. The variants in the pore complex may be the same or different. The variant is preferably the same in each pore monomer conjugate in the pore complex.

Linker

In some embodiments, the protein pore complex is stabilized by attaching (e.g., covalently attaching) the accessory protein or fusion protein to the nanopore. The covalent linkage can be, for example, a disulfide bond or a linkage formed by click chemistry. As a further example, cysteine residues can be linked by a linker such as BMOE. The accessory protein or fusion protein and/or the transmembrane protein nanopore can be modified to facilitate such covalent interactions. In some embodiments, the accessory protein or fusion protein is non-covalently attached to the nanopore. In some embodiments, the accessory protein or fusion protein is attached to the nanopore by one or more (e.g., 1, 2, 3, 4, 5 or more) linkers.

In some embodiments, the accessory protein or fusion protein is attached to the nanopore by hydrophobic interactions and/or by one or more disulfide bonds. One or more, such as 2, 3, 4, 5, 6, 8 or 9, for example all, of the monomers in the pore may be modified to enhance such interactions. This may be achieved in any suitable manner. Further suitable interactions include salt bridges, electrostatic interactions, hydrogen bonding, peptide bond formation and pi-pi interactions.

나노포어와 보조 단백질(또는 융합 단백질) 사이의 계면에서 막횡단 단백질 나노포어의 아미노산 서열에 있는 적어도 하나의 시스테인 잔기는 나노포어와 보조 단백질 사이의 계면에서 보조 단백질의 아미노산 서열에 있는 적어도 하나의 시스테인 잔기와 이황화물 결합될 수 있다. 일부 실시형태에서, 제1 보조 단백질의 아미노산 서열에 있는 적어도 하나의 시스테인 잔기는 제2 보조 단백질의 아미노산 서열에 있는 적어도 하나의 시스테인 잔기에 이황화 결합된다. 일부 실시형태에서, 융합 단백질의 제1 부분의 아미노산 서열에 있는 적어도 하나의 시스테인 잔기는 융합 단백질의 제2 부분의 아미노산 서열에 있는 적어도 하나의 시스테인 잔기에 이황화 결합된다. 나노포어에 있는 시스테인 잔기 및/또는 보조 단백질 또는 융합 단백질에 있는 시스테인 잔기는 야생형 막횡단 단백질 포어 단량체 또는 야생형 보조 단백질에 존재하지 않는 시스테인 잔기일 수 있다. 2, 3, 4, 5, 6, 7, 8 또는 9 내지 16, 18, 24, 27, 32, 36, 40, 45, 48, 54, 56 또는 63과 같은 다중 이황화 결합이 포어 복합체에 있는 나노포어와 보조단백질(또는 융합 단백질) 사이에서 형성될 수 있다. 나노포어와 보조 단백질(또는 융합 단백질) 중 하나 또는 둘 모두는 나노포어와 보조 단백질(또는 융합 단백질) 사이의 계면에 시스테인 잔기를 포함하는 적어도 하나의 단량체 또는 서브유닛, 예를 들어 최대 8, 9 또는 10개의 단량체 또는 서브유닛을 포함할 수 있다.At least one cysteine residue in the amino acid sequence of the transmembrane protein nanopore at the interface between the nanopore and the accessory protein (or fusion protein) can be disulfide bonded with at least one cysteine residue in the amino acid sequence of the accessory protein at the interface between the nanopore and the accessory protein. In some embodiments, at least one cysteine residue in the amino acid sequence of the first accessory protein is disulfide bonded to at least one cysteine residue in the amino acid sequence of the second accessory protein. In some embodiments, at least one cysteine residue in the amino acid sequence of the first portion of the fusion protein is disulfide bonded to at least one cysteine residue in the amino acid sequence of the second portion of the fusion protein. The cysteine residue in the nanopore and/or the cysteine residue in the accessory protein or fusion protein can be a cysteine residue that is not present in the wild-type transmembrane protein pore monomer or in the wild-type accessory protein. 
Multiple disulfide bonds, such as from 2, 3, 4, 5, 6, 7, 8 or 9 up to 16, 18, 24, 27, 32, 36, 40, 45, 48, 54, 56 or 63, can be formed between the nanopore and the accessory protein (or fusion protein) in the pore complex. One or both of the nanopore and the accessory protein (or fusion protein) can comprise at least one monomer or subunit comprising a cysteine residue at the interface between the nanopore and the accessory protein (or fusion protein), for example up to 8, 9 or 10 monomers or subunits.

나노포어 및/또는 보조 단백질(또는 융합 단백질)은 나노포어와 보조 단백질(또는 융합 단백질) 사이의 계면에 하나 이상의 소수성 아미노산 잔기를 포함할 수 있으며, 이는 야생형 나노포어 또는 보조 단백질(또는 융합 단백질)에 있는 상응하는 위치에 존재하는 잔기보다 더 소수성이다. 나노포어에 있는 적어도 하나의 단량체 또는 서브유닛 및/또는 보조 단백질(또는 융합 단백질)에 있는 적어도 하나의 단량체 또는 서브유닛은 나노포어와 보조 단백질(또는 융합 단백질) 사이의 계면에 적어도 하나의 잔기를 포함할 수 있으며, 이 잔기는 야생형 포어 또는 보조 단백질(또는 융합 단백질)에 있는 상응하는 위치에 존재하는 잔기보다 더 소수성이다. 예를 들어, 나노포어 및/또는 보조 단백질(또는 융합 단백질)에 있는 2개 내지 10개, 즉 3, 4, 5, 6, 7, 8 또는 9개의 잔기는 상응하는 야생형 나노포어 및/또는 보조 단백질(또는 융합 단백질)에 있는 동일한 위치에 있는 잔기보다 더 소수성일 수 있다. 이러한 소수성 잔기는 포어 복합체에 있는 나노포어와 보조 단백질(또는 융합 단백질) 사이의 상호작용을 강화한다. 야생형 나노포어 또는 보조 단백질(또는 융합 단백질)의 계면에서 잔기가 R, Q, N 또는 E인 경우, 소수성 잔기는 전형적으로 I, L, V, M, F, W, A 또는 Y이다. 야생형 나노포어 또는 보조 단백질(또는 융합 단백질)의 계면에서 잔기가 I인 경우, 소수성 잔기는 전형적으로 L, V, M, F, W, A 또는 Y이다. 야생형 나노포어 또는 보조 단백질(또는 융합 단백질)의 계면에서 잔기가 L인 경우, 소수성 잔기는 전형적으로 I, V, M, F, W, A 또는 Y이다.The nanopore and/or the accessory protein (or fusion protein) can comprise one or more hydrophobic amino acid residues at the interface between the nanopore and the accessory protein (or fusion protein) that are more hydrophobic than the residues present at the corresponding positions in the wild-type nanopore or the accessory protein (or fusion protein). At least one of the monomers or subunits in the nanopore and/or at least one of the monomers or subunits in the accessory protein (or fusion protein) can comprise at least one residue at the interface between the nanopore and the accessory protein (or fusion protein), which residue is more hydrophobic than the residues present at the corresponding positions in the wild-type pore or the accessory protein (or fusion protein). For example, from two to ten, i.e., three, four, five, six, seven, eight or nine, residues in the nanopore and/or the accessory protein (or fusion protein) can be more hydrophobic than the residues present at the same positions in the corresponding wild-type nanopore and/or the accessory protein (or fusion protein). 
These hydrophobic residues enhance the interaction between the nanopore and the accessory protein (or fusion protein) in the pore complex. When the residue at the interface of the wild-type nanopore or the accessory protein (or fusion protein) is R, Q, N or E, the hydrophobic residues are typically I, L, V, M, F, W, A or Y. When the residue at the interface of the wild-type nanopore or the accessory protein (or fusion protein) is I, the hydrophobic residues are typically L, V, M, F, W, A or Y. When the residue at the interface of the wild-type nanopore or the accessory protein (or fusion protein) is L, the hydrophobic residues are typically I, V, M, F, W, A or Y.
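
The substitution rules stated above can be written as a small lookup. The following is only an illustrative sketch: the function and constant names are hypothetical, and the residue sets are copied from the passage rather than taken from any hydrophobicity scale.

```python
from __future__ import annotations

# Hydrophobic replacement residues listed in the passage (one-letter codes).
HYDROPHOBIC = ("I", "L", "V", "M", "F", "W", "A", "Y")
# Wild-type interface residues for which the full hydrophobic set applies.
POLAR_OR_CHARGED = {"R", "Q", "N", "E"}


def hydrophobic_candidates(wild_type: str) -> tuple[str, ...]:
    """Return the replacement residues the passage lists for a wild-type residue.

    Hypothetical helper: residues not covered by the passage return an empty tuple.
    """
    wt = wild_type.upper()
    if wt in POLAR_OR_CHARGED:
        return HYDROPHOBIC
    if wt in ("I", "L"):
        # For I or L, every listed hydrophobic residue except the wild-type one.
        return tuple(res for res in HYDROPHOBIC if res != wt)
    return ()
```

For a wild-type isoleucine, for instance, the helper returns every listed residue except I, matching the second rule in the passage.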

Molecular dynamics simulations can be performed to establish which residues of the accessory protein and the nanopore are in close proximity. This information can be used to design accessory protein and/or transmembrane protein nanopore mutants that may increase the stability of the complex. For example, simulations can be performed with the GROMACS package version 4.6.5, using the GROMOS 53a6 force field, the SPC water model and the cryo-EM structure of the protein. The complex can be solvated and energy minimized using the steepest descent algorithm. The protein backbone can be restrained throughout the simulation, while the residue side chains remain free to move. The system can be simulated for 20 ns in the NPT ensemble at 300 K using the Berendsen thermostat and the Berendsen barostat. Contacts between the accessory protein and the nanopore can be analyzed using the GROMACS analysis software and/or locally written code. Two residues can be defined as being in contact when they are within 3 angstroms of each other.
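
As an illustration of the "locally written code" option, the contact criterion could be sketched as below. This is not the analysis code of the disclosure: the function name, the input layout (a dictionary from residue number to atom coordinates in angstroms) and the use of the minimum atom-to-atom distance as the residue-to-residue distance are all assumptions. GROMACS trajectories store coordinates in nanometres, so they would need converting first.

```python
from itertools import product
from math import dist

CONTACT_CUTOFF_ANGSTROM = 3.0  # cutoff stated in the passage


def residue_contacts(pore_atoms, partner_atoms, cutoff=CONTACT_CUTOFF_ANGSTROM):
    """List (pore residue, partner residue) pairs with any two atoms within `cutoff`.

    pore_atoms / partner_atoms: dict mapping a residue number to a list of
    (x, y, z) atom coordinates in angstroms (hypothetical input format).
    """
    contacts = set()
    for (res_a, atoms_a), (res_b, atoms_b) in product(
        pore_atoms.items(), partner_atoms.items()
    ):
        # Assumption: two residues are "within 3 angstroms" if any atom pair is.
        if any(dist(a, b) <= cutoff for a in atoms_a for b in atoms_b):
            contacts.add((res_a, res_b))
    return sorted(contacts)


# Toy coordinates, invented purely for illustration.
pore = {153: [(0.0, 0.0, 0.0)], 133: [(10.0, 0.0, 0.0)]}
peptide = {1: [(2.5, 0.0, 0.0)], 4: [(20.0, 0.0, 0.0)]}
toy_contacts = residue_contacts(pore, peptide)
```

In practice the contacts would be accumulated over trajectory frames, and pairs that persist would be the candidates for stabilizing mutations.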

예를 들어, 포어 복합체에서 CsgF 펩티드와 CsgG 포어 사이의 상호작용은 예를 들어 서열 번호: 60 및 서열 번호: 59의 다음 위치 쌍 중 하나 이상에 상응하는 위치에서 소수성 상호작용, 정전기 상호작용 또는 공유 결합에 의해 안정화될 수 있다: 1 및 153, 4 및 133, 5 및 136, 8 및 187, 8 및 203, 9 및 203, 11 및 142, 11 및 201, 12 및 149, 12 및 203, 26 및 191, 29 및 144 또는 30 및 196. 이러한 위치 중 하나 이상에 있는 CsgF 및/또는 CsgG의 잔기는 포어에서 CsgG와 CsgF 사이의 상호작용을 향상시키기 위해 변형될 수 있다.For example, the interaction between a CsgF peptide and a CsgG pore in the pore complex can be stabilized by hydrophobic interactions, electrostatic interactions or covalent bonds at positions corresponding to, for example, one or more of the following position pairs of SEQ ID NO: 60 and SEQ ID NO: 59: 1 and 153, 4 and 133, 5 and 136, 8 and 187, 8 and 203, 9 and 203, 11 and 142, 11 and 201, 12 and 149, 12 and 203, 26 and 191, 29 and 144 or 30 and 196. Residues of CsgF and/or CsgG at one or more of these positions can be modified to enhance the interaction between CsgG and CsgF in the pore.

Covalent linkages or bonds are formed, for example, via cysteine linkages, wherein the sulfhydryl side group of the cysteine is covalently linked to another amino acid residue or moiety, and/or via interactions between non-natural (photo)reactive amino acids. (Photo)reactive amino acids are artificial analogues of natural amino acids that can be used for cross-linking protein complexes and that can be incorporated into proteins and peptides in vivo or in vitro. Commonly used photoreactive amino acid analogues are photoreactive diazirine analogues of leucine and methionine, and para-benzoyl-phenylalanine, as well as azidohomoalanine, homopropargylglycine, homoallylglycine, p-acetyl-Phe, p-azido-Phe, p-propargyloxy-Phe and p-benzoyl-Phe (Wang et al. 2012; Chin et al. 2002). Upon exposure to ultraviolet light, they are activated and covalently bind to interacting proteins that are within a few angstroms of the photoreactive amino acid analogue.

산화제(예를 들어, 구리-오르토페난트롤린)를 사용하여 포어 복합체를 만들고 이황화 결합 형성을 유도할 수 있다. 다른 상호작용(예를 들어, 소수성 상호작용, 전하-전하 상호작용/정전기 상호작용)도 시스테인 상호작용 대신 해당 위치에서 사용될 수 있다. 다른 실시형태에서, 비천연 아미노산도 이러한 위치에 포함될 수 있다. 이러한 실시형태에서, 공유 결합은 클릭 화학을 통해 만들어진다. 예를 들어, 아지드 또는 알킨 또는 디벤조사이클로옥틴(DBCO) 그룹 및/또는 비시클로[6.1.0]노닌(BCN) 그룹을 갖는 비천연 아미노산이 이러한 위치 중 하나 이상에 도입될 수 있다. An oxidizing agent (e.g., copper-orthophenanthroline) can be used to form a pore complex and induce disulfide bond formation. Other interactions (e.g., hydrophobic interactions, charge-charge interactions/electrostatic interactions) can also be used at this position in place of the cysteine interaction. In other embodiments, a non-natural amino acid can also be incorporated at this position. In such embodiments, the covalent bond is formed via click chemistry. For example, a non-natural amino acid having an azide or alkyne or dibenzocyclooctyne (DBCO) group and/or a bicyclo[6.1.0]nonyne (BCN) group can be introduced at one or more of these positions.

For example, the CsgG pore can comprise at least one, e.g., 2, 3, 4, 5, 6, 7, 8, 9 or 10, CsgG monomers modified to facilitate attachment to an accessory protein or fusion protein. For example, a cysteine residue can be introduced at one or more positions corresponding to positions 132, 133, 136, 138, 140, 142, 144, 145, 147, 149, 151, 153, 155, 183, 185, 187, 189, 191, 201, 203, 205, 207 and 209 of SEQ ID NO: 59 and/or at any position predicted to contact an accessory protein or fusion protein, to facilitate covalent attachment to the accessory protein or fusion protein. As an alternative or in addition to covalent attachment via a cysteine residue, the pore may be stabilized by hydrophobic interactions or electrostatic interactions. To facilitate such interactions, a non-natural reactive or photoreactive amino acid is positioned at a position corresponding to one or more of positions 132, 133, 136, 138, 140, 142, 144, 145, 147, 149, 151, 153, 155, 183, 185, 187, 189, 191, 201, 203, 205, 207 and 209 of SEQ ID NO: 59.

For example, the CsgF peptide can be modified to facilitate attachment to a CsgG pore. For example, a cysteine residue can be introduced at one or more positions corresponding to positions 1, 4, 5, 8, 9, 11, 12, 26 or 29 of SEQ ID NO: 60 and/or at any position predicted to contact CsgG, to facilitate covalent attachment to CsgG. As an alternative or in addition to covalent attachment via a cysteine residue, the pore can be stabilized by hydrophobic interactions or electrostatic interactions. To facilitate such interactions, a non-natural reactive or photoreactive amino acid is positioned at a position corresponding to one or more of positions 1, 2, 3, 4, 5, 8, 9, 11, 12, 26 or 29 of SEQ ID NO: 60.

이러한 안정화 돌연변이는 보조 단백질 또는 융합 단백질에 대한 다른 모든 변형, 예를 들어, 폴리뉴클레오티드와 포어 복합체의 상호작용을 개선하거나 복합체의 특정 특성(예를 들어, 폴리뉴클레오티드의 뉴클레오티드와 같은 중합체 단위의 판별)을 개선하기 위한 변형과 결합될 수 있다.These stabilizing mutations may be combined with any other modifications to the accessory protein or fusion protein, for example, modifications that improve the interaction of the polynucleotide with the pore complex or that improve certain properties of the complex (for example, the discrimination of polymeric units such as nucleotides in the polynucleotide).

일부 실시형태에서, 나노포어는 단리되거나 실질적으로 단리되거나 정제되거나 실질적으로 정제될 수 있다. 포어는 임의의 다른 구성요소, 예컨대, 지질 또는 다른 포어가 완전히 없는 경우 단리되거나 정제된다. 포어가 의도된 용도를 방해하지 않을 담체 또는 희석제와 혼합된 경우, 이는 실질적으로 단리된다. 예를 들어, 포어가 10% 미만, 5% 미만, 2% 미만 또는 1% 미만의 다른 구성요소, 예컨대, 블록 공중합체, 지질 또는 다른 포어를 포함하는 형태로 존재하는 경우, 이는 실질적으로 단리되거나 실질적으로 정제된다. 대안적으로, 포어는 막에 존재할 수 있다. 적합한 막은 아래에서 논의된다.In some embodiments, the nanopores can be isolated, substantially isolated, purified, or substantially purified. The pores are isolated or purified when they are completely free of any other components, such as lipids or other pores. The pores are substantially isolated when they are mixed with a carrier or diluent that does not interfere with the intended use. For example, the pores are substantially isolated or substantially purified when they are present in a form that includes less than 10%, less than 5%, less than 2%, or less than 1% of other components, such as block copolymers, lipids, or other pores. Alternatively, the pores can be present in a membrane. Suitable membranes are discussed below.

포어 복합체는 개별 또는 단일 포어로 막에 존재할 수 있다. 대안적으로, 포어 복합체는 2개 이상의 포어의 동종 또는 이종 집단으로 존재할 수 있다.Pore complexes may exist in the membrane as individual or single pores. Alternatively, pore complexes may exist as homogeneous or heterogeneous populations of two or more pores.

보조 단백질 또는 융합체는 막횡단 단백질 나노포어에 직접 부착될 수 있거나 두 단백질(예를 들어, 제1 보조 단백질과 제2 보조 단백질; 융합 단백질의 제1 부분과 융합 단백질의 제2 부분 등)이 화학적 가교제 또는 펩티드 링커 같은 링커를 사용하여 부착될 수 있다.The accessory protein or fusion may be attached directly to the transmembrane protein nanopore, or two proteins (e.g., a first accessory protein and a second accessory protein; a first portion of a fusion protein and a second portion of a fusion protein, etc.) may be attached using a linker, such as a chemical cross-linker or a peptide linker.

적합한 화학적 가교제는 당업계에 잘 알려져 있다. 가교제의 예는 2,5-디옥소피롤리딘-1-일 3-(피리딘-2-일디술파닐)프로파노에이트, 2,5-디옥소피롤리딘-1-일 4-(피리딘-2-일디술파닐)부타노에이트 및 2,5-디옥소피롤리딘-1-일 8-(피리딘-2-일디술파닐)옥타나노에이트를 포함하나 이에 제한되지 않는다. 일부 실시형태에서, 가교제는 숙신이미딜 3-(2-피리딜디티오)프로피오네이트(SPDP)이다. 전형적으로, 분자/가교제 복합체가 돌연변이체 단량체에 공유적으로 부착되기 전에 분자는 이기능적 가교제에 공유적으로 부착되지만, 이기능적 가교제/단량체 복합체가 분자에 부착되기 전에 이기능적 가교제를 단량체에 공유적으로 부착시키는 것이 또한 가능하다. 일부 실시형태에서, 링커는 디티오트레이톨(DTT)에 저항성이다. 추가적인 적합한 링커는 요오도아세트아미드-기반 및 말레이미드-기반 링커를 포함하나 이에 제한되지 않는다.Suitable chemical cross-linkers are well known in the art. Examples of cross-linkers include, but are not limited to, 2,5-dioxopyrrolidin-1-yl 3-(pyridin-2-yldisulfanyl)propanoate, 2,5-dioxopyrrolidin-1-yl 4-(pyridin-2-yldisulfanyl)butanoate, and 2,5-dioxopyrrolidin-1-yl 8-(pyridin-2-yldisulfanyl)octanoate. In some embodiments, the cross-linker is succinimidyl 3-(2-pyridyldithio)propionate (SPDP). Typically, the molecule is covalently attached to the bifunctional cross-linker before the molecule/cross-linker complex is covalently attached to the mutant monomer, but it is also possible to covalently attach the bifunctional cross-linker to the monomer before the bifunctional cross-linker/monomer complex is attached to the molecule. In some embodiments, the linker is resistant to dithiothreitol (DTT). Additional suitable linkers include, but are not limited to, iodoacetamide-based and maleimide-based linkers.

Suitable amino acid linkers, such as peptide linkers, are known in the art. The length, flexibility and hydrophilicity of the amino acid or peptide linker are typically designed to allow the accessory protein or fusion protein to form a constriction in the pore complex. Preferred flexible peptide linkers are stretches of 2 to 20, e.g., 4, 6, 8, 10 or 16, serine and/or glycine amino acids. More preferred flexible linkers include (SG)₁, (SG)₂, (SG)₃, (SG)₄, (SG)₅, (SG)₈, (SG)₁₀, (SG)₁₅ or (SG)₂₀, wherein S is serine and G is glycine. Preferred rigid linkers are stretches of 2 to 30, e.g., 4, 6, 8, 16 or 24, proline amino acids. A more preferred rigid linker comprises (P)₁₂, wherein P is proline.
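
The (SG)ₙ and (P)ₙ notation above simply denotes repeats of the one-letter residue codes. A minimal sketch of expanding that notation into linker sequences follows; the function names are hypothetical and nothing here reflects a particular construct of the disclosure.

```python
def flexible_linker(repeats: int) -> str:
    """Expand (SG)n into its one-letter amino acid sequence."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return "SG" * repeats


def rigid_linker(length: int = 12) -> str:
    """Expand (P)n into a polyproline sequence; the default is (P)12 from the passage."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return "P" * length


# Repeat counts listed in the passage for the more preferred flexible linkers.
SG_REPEATS = (1, 2, 3, 4, 5, 8, 10, 15, 20)
flexible_linkers = {n: flexible_linker(n) for n in SG_REPEATS}
```

Here `flexible_linkers[3]` is the six-residue sequence `SGSGSG`, and `rigid_linker()` gives twelve consecutive prolines.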

Suitable chemical cross-linkers include, but are not limited to, those containing the following functional groups: maleimides, active esters, succinimides, azides, alkynes (e.g., dibenzocyclooctynol (DIBO or DBCO), difluoro cycloalkynes and linear alkynes), phosphines (e.g., those used in traceless and non-traceless Staudinger ligations), haloacetyls (e.g., iodoacetamide), phosgene-type reagents, sulfonyl chloride reagents, isothiocyanates, acyl halides, hydrazines, disulfides, vinyl sulfones, aziridines and photoreactive reagents (e.g., aryl azides, diaziridines).

아미노산과 작용기 사이의 반응은 시스테인/말레이미드와 같이 자발적일 수도 있고 아지드와 선형 알킨을 연결하기 위해 Cu(I)와 같은 외부 시약이 필요할 수도 있다.Reactions between amino acids and functional groups can be spontaneous, such as cysteine/maleimide, or may require external reagents, such as Cu(I) to link azides and linear alkynes.

링커는 필요한 거리에 걸쳐 늘어나는 임의의 분자를 포함할 수 있다. 링커의 길이는 탄소 1개(포스겐 유형 링커)부터 수 옹스트롬까지 다양하다. 링커 분자의 예는 폴리에틸렌글리콜(PEG), 폴리펩티드, 다당류, 데옥시리보핵산(DNA), 펩티드 핵산(PNA), 트레오스 핵산(TNA), 글리세롤 핵산(GNA), 포화 및 불포화 탄화수소, 폴리아미드 등을 포함하나 이에 제한되지 않는다. 이러한 링커는 불활성이거나 반응성일 수 있으며, 특히 정의된 위치에서 화학적으로 절단 가능할 수 있거나 그 자체가 형광단 또는 리간드로 변형될 수 있다. 링커는 바람직하게는 CsgG 포어 단량체에 대한 보조 단백질 또는 융합 단백질의 공유 부착 후 디티오트레이톨(DTT)에 저항성을 갖는다. The linker can comprise any molecule that extends over the required distance. The length of the linker can range from one carbon (phosgene type linker) to several angstroms. Examples of linker molecules include, but are not limited to, polyethylene glycol (PEG), polypeptides, polysaccharides, deoxyribonucleic acid (DNA), peptide nucleic acid (PNA), threose nucleic acid (TNA), glycerol nucleic acid (GNA), saturated and unsaturated hydrocarbons, polyamides, and the like. Such linkers can be inert or reactive, and in particular can be chemically cleavable at defined positions or can themselves be modified with fluorophores or ligands. The linker is preferably resistant to dithiothreitol (DTT) following covalent attachment of the accessory protein or fusion protein to the CsgG pore monomer.

In some embodiments, the cross-linker is selected from: 2,5-dioxopyrrolidin-1-yl 3-(pyridin-2-yldisulfanyl)propanoate, 2,5-dioxopyrrolidin-1-yl 4-(pyridin-2-yldisulfanyl)butanoate and 2,5-dioxopyrrolidin-1-yl 8-(pyridin-2-yldisulfanyl)octanoate, di-maleimide PEG 1k, di-maleimide PEG 3.4k, di-maleimide PEG 5k, di-maleimide PEG 10k, bis(maleimido)ethane (BMOE), bis-maleimidohexane (BMH), 1,4-bis-maleimidobutane (BMB), 1,4-bis-maleimidyl-2,3-dihydroxybutane (BMDB), BM[PEO]2 (1,8-bis-maleimidodiethylene glycol), BM[PEO]3 (1,11-bis-maleimidotriethylene glycol), tris[2-maleimidoethyl]amine (TMEA), DTME (dithiobismaleimidoethane), bis-maleimide PEG3, bis-maleimide PEG11, DBCO-maleimide, DBCO-PEG4-maleimide, DBCO-PEG4-NH2, DBCO-PEG4-NHS, DBCO-NHS, DBCO-PEG-DBCO 2.8 kDa, DBCO-PEG-DBCO 4.0 kDa, DBCO-15 atoms-DBCO, DBCO-26 atoms-DBCO, DBCO-35 atoms-DBCO, DBCO-PEG4-S-S-PEG3-biotin, DBCO-S-S-PEG3-biotin, DBCO-S-S-PEG11-biotin, succinimidyl 3-(2-pyridyldithio)propionate (SPDP) and maleimide-PEG(2 kDa)-maleimide (alpha,omega-bis-maleimido poly(ethylene glycol)). In some embodiments, the cross-linker is maleimide-propyl-SRDFWRS-(1,2-diaminoethane)-propyl-maleimide.

연결된 CsgG 포어 단량체 및 보조 단백질 또는 융합 단백질은 그룹 사이의 공유 결합 형성을 통해 커플링될 수 있다. WO 2010/086602(전문이 본원에 인용방식에 의해 원용됨)에서 개시되는 특정 링커 중 임의의 하나가 사용될 수 있다.The linked CsgG pore monomer and the accessory protein or fusion protein can be coupled via covalent bond formation between the groups. Any one of the specific linkers disclosed in WO 2010/086602 (which is incorporated herein by reference in its entirety) can be used.

The linker may be labeled. Suitable labels include, but are not limited to, fluorescent molecules (e.g., Cy3 or AlexaFluor®555), radioisotopes (e.g., ¹²⁵I, ³⁵S, ³²P), enzymes, antibodies, antigens, polynucleotides and ligands such as biotin. Such labels allow the amount of linker to be quantified. The label may also be a cleavable purification tag, such as biotin, or a specific sequence that shows up in an identification method, such as a peptide that is not present in the protein itself but that is released by trypsin digestion.

포어 단량체 접합체를 연결하는 바람직한 방법은 시스테인 연결을 통한 것이다. 이는 이중 기능성 화학적 가교제 또는 말단에 제시된 시스테인 잔기를 갖는 아미노산 링커에 의해 매개될 수 있다. A preferred method of linking the pore monomer conjugates is via a cysteine linkage. This can be mediated by a bifunctional chemical cross-linker or an amino acid linker having cysteine residues presented at the terminal end.

다른 바람직한 부착 방법은 4-아지도페닐알라닌(Faz) 연결을 통한 것이다. 이는 이중기능성 화학적 링커 또는 말단에 제시된 Faz 잔기를 갖는 폴리펩티드 링커에 의해 매개될 수 있다.Another preferred method of attachment is via a 4-azidophenylalanine (Faz) linkage. This can be mediated by a bifunctional chemical linker or a polypeptide linker having Faz residues presented at the termini.

In some embodiments, the linker is a bond formed by a sulfur(VI) fluoride exchange (SuFEx) reaction. In some embodiments, the accessory protein (e.g., CsgF or a portion of CsgF) can be functionalized with a sulfonyl fluoride group that, when in suitable proximity, can react with a nucleophilic amino acid (e.g., a nucleophilic amino acid of a CsgG pore monomer, a nucleophilic amino acid of another accessory protein, etc.) to form a sulfonyl fluoride bond (SuFEx).

보조 단백질 또는 융합 단백질은 막횡단 단백질 나노포어에 유전적으로 융합될 수 있다. 포어 단량체와 보조 단백질(또는 융합 단백질)은 전체 작제물이 단일 폴리뉴클레오티드 코딩 서열로부터 발현되는 경우 유전적으로 융합된다. 단량체 또는 서브유닛, 보조 단백질(또는 융합 단백질)은 막횡단 단백질 나노포어의 단량체 또는 서브유닛에 직접 융합될 수 있다. 대안적으로, 단량체 또는 서브유닛, 보조 단백질(또는 융합 단백질)은 하나 이상의 링커를 통해 막횡단 단백질 나노포어의 단량체 또는 서브유닛에 융합될 수 있다. An accessory protein or fusion protein can be genetically fused to the transmembrane protein nanopore. The pore monomer and the accessory protein (or fusion protein) are genetically fused when the entire construct is expressed from a single polynucleotide coding sequence. The monomer or subunit, the accessory protein (or fusion protein) can be fused directly to a monomer or subunit of the transmembrane protein nanopore. Alternatively, the monomer or subunit, the accessory protein (or fusion protein) can be fused to a monomer or subunit of the transmembrane protein nanopore via one or more linkers.

CsgG 포어 단량체 접합체에 있는 CsgG 포어 단량체와 보조 단백질 또는 융합 단백질 사이의 거리 및/또는 링커의 길이는 바람직하게는 약 2.00 nm 미만, 예컨대 약 1.90 nm 미만, 약 1.80 nm 미만, 약 1.70 nm 미만, 약 1.60 nm 미만, 약 1.50 nm 미만, 약 1.40 nm 미만, 약 1.30 nm 미만, 약 1.20 nm 미만, 약 1.10 nm 미만, 약 1.00 nm 미만, 약 0.90 nm 미만, 약 0.80 nm 미만, 약 0.70 nm 미만, 약 0.60 nm 미만, 약 0.50 nm 미만 또는 약 0.40 nm 미만이다. 포어 단량체 접합체에 있는 CsgG 포어 단량체와 보조 단백질 또는 융합 단백질 사이의 거리 및/또는 링커의 길이는 바람직하게는 약 1.20 nm 미만이다. 이러한 거리/길이는 아래에서 더 상세히 논의되는 바와 같이 말레이미도헥사논산을 사용하여 달성될 수 있다. 포어 단량체 접합체에 있는 CsgG 포어 단량체와 보조 단백질 또는 융합 단백질 사이의 거리 및/또는 링커의 길이는 바람직하게는 약 0.8 nm 미만이다. 이 거리/길이는 아래에 논의된 대로 말레이미도프로피온산을 사용하여 달성될 수 있다.The distance between the CsgG pore monomer and the accessory protein or fusion protein in the CsgG pore monomer conjugate and/or the length of the linker is preferably less than about 2.00 nm, such as less than about 1.90 nm, less than about 1.80 nm, less than about 1.70 nm, less than about 1.60 nm, less than about 1.50 nm, less than about 1.40 nm, less than about 1.30 nm, less than about 1.20 nm, less than about 1.10 nm, less than about 1.00 nm, less than about 0.90 nm, less than about 0.80 nm, less than about 0.70 nm, less than about 0.60 nm, less than about 0.50 nm or less than about 0.40 nm. The distance between the CsgG pore monomer and the accessory protein or fusion protein in the pore monomer conjugate and/or the length of the linker is preferably less than about 1.20 nm. This distance/length can be achieved using maleimidohexanoic acid as discussed in more detail below. The distance and/or length of the linker between the CsgG pore monomer and the accessory protein or fusion protein in the pore monomer conjugate is preferably less than about 0.8 nm. This distance/length can be achieved using maleimidopropionic acid as discussed below.

단량체 접합체에 있는 CsgG 포어 단량체와 포어 보조 단백질 또는 융합 단백질 사이의 거리 및/또는 링커의 길이는 바람직하게는 약 0.40 nm 내지 약 2.0 nm, 예를 들어 약 0.45 nm 내지 약 1.90 nm, 약 0.50 nm 내지 약 1.80 nm, 약 0.55 nm 내지 약 1.7 nm, 약 0.60 nm 내지 약 1.6 nm, 약 0.65 nm 내지 약 1.5 nm, 약 0.7 nm 내지 약 1.4 nm, 약 0.75 nm 내지 약 1.3 nm, 약 0.80 nm 내지 약 1.2 nm, 약 0.85 nm 내지 약 1.1 nm 및 약 0.90 nm 내지 약 1.00 nm이다. 포어 단량체 접합체에 있는 CsgG 포어 단량체와 보조 단백질 또는 융합 단백질 사이의 거리 및/또는 링커의 길이는 바람직하게는 약 0.50 nm 내지 약 1.50 nm이다. 포어 단량체 접합체에 있는 CsgG 포어 단량체와 보조 단백질 또는 융합 단백질 사이의 거리 및/또는 링커의 길이는 바람직하게는 약 0.60 nm 내지 약 1.2 nm이다. 이러한 거리/길이는 아래에 논의된 특정 말레이미드 함유 링커를 사용하여 달성될 수 있다.The distance between the CsgG pore monomer and the pore accessory protein or fusion protein in the monomer conjugate and/or the length of the linker is preferably about 0.40 nm to about 2.0 nm, for example about 0.45 nm to about 1.90 nm, about 0.50 nm to about 1.80 nm, about 0.55 nm to about 1.7 nm, about 0.60 nm to about 1.6 nm, about 0.65 nm to about 1.5 nm, about 0.7 nm to about 1.4 nm, about 0.75 nm to about 1.3 nm, about 0.80 nm to about 1.2 nm, about 0.85 nm to about 1.1 nm and about 0.90 nm to about 1.00 nm. The distance between the CsgG pore monomer and the accessory protein or fusion protein in the pore monomer conjugate and/or the length of the linker is preferably from about 0.50 nm to about 1.50 nm. The distance between the CsgG pore monomer and the accessory protein or fusion protein in the pore monomer conjugate and/or the length of the linker is preferably from about 0.60 nm to about 1.2 nm. Such distances/lengths can be achieved using certain maleimide-containing linkers discussed below.

The maleimide-containing linker can be any one of the linkers discussed below in connection with the constructs described herein. The maleimide-containing linker preferably comprises or consists of a maleimide group and a linear carbon chain of 2, 3, 4, 5, 6 or more carbon atoms. The linear carbon chain is typically attached to the nitrogen atom of the maleimide group. The linear carbon chain also preferably comprises a terminal carboxyl group. This carboxyl group is capable of forming an amide bond with an amino acid of the accessory protein or fusion protein. The linker is preferably maleimidoacetic acid, maleimidopropionic acid, maleimidobutyric acid, maleimidopentanoic acid or maleimidohexanoic acid. The linker is most preferably maleimidopropionic acid. Such linkers are illustrated in FIG. 15.

The present disclosure also provides a pore monomer conjugate comprising a CsgG pore monomer covalently attached to an accessory protein or fusion protein, wherein the accessory protein or fusion protein is covalently attached to a cysteine residue of the CsgG pore monomer by a linker comprising a thiol-reactive group. The thiol-reactive group can be a maleimide group, a pyridyldithio group, a halogeno group, a parafluoro group, an ene group, an yne group, a vinylsulfone group or a thiosulfone group. Such groups are illustrated in FIG. 16. The linker comprising a thiol-reactive group can be any of the linkers discussed below in connection with the constructs of the present disclosure. The linker preferably comprises or consists of the thiol-reactive group and a linear carbon chain of 2, 3, 4, 5, 6 or more carbon atoms. The linear carbon chain also preferably comprises a terminal carboxyl group. This carboxyl group is capable of forming an amide bond with an amino acid of the accessory protein or fusion protein. The linker can be one of the specific maleimide-containing linkers discussed above in which the maleimide is replaced with a different thiol-reactive group. The linker containing the thiol-reactive group can be any one of the lengths discussed above.

적절한 연결 그룹은 기존 모델링 기법을 사용하여 설계될 수 있다. 링커는 전형적으로 단량체 또는 서브유닛이 각각의 단백질 올리고머로 조립되고 공통 대칭 축을 따라 정렬되어 포어 복합체 내에 연속 채널을 생성할 수 있을 만큼 충분히 유연하다.Suitable linkage groups can be designed using existing modeling techniques. The linker is typically flexible enough to allow the monomers or subunits to assemble into individual protein oligomers and align along a common axis of symmetry to create a continuous channel within the pore complex.

Identification and selection of accessory proteins

Aspects of the present disclosure relate to computer-based methods for designing and/or selecting accessory proteins and/or fusion proteins for inclusion in a protein pore complex (e.g., a protein pore complex comprising a CsgG nanopore). In some embodiments, the method comprises providing an amino acid sequence (e.g., a CsgF amino acid sequence) as input to software comprising code that implements a protein backbone sequence selection technique and processes the amino acid sequence to generate a backbone amino acid sequence as output. In some embodiments, the protein backbone selection technique can be MASTER (e.g., as described in Zhou and Grigoryan, Protein Sci. 2015 Apr; 24(4): 508-524, the entire contents of which are incorporated herein by reference). In some embodiments, the protein backbone selection technique comprises selecting, from known protein backbone structures (e.g., as deposited in the Protein Data Bank, PDB), a protein backbone structure having one or more target properties (e.g., the ability to form one or more helical regions, the ability to pack against one or more helical regions of a protein pore, etc.). In some embodiments, the backbone structure is provided as input to software comprising code that implements a protein sequence design and structure prediction technique and processes the backbone structure to generate one or more de novo designed peptide sequences. In some embodiments, the protein sequence design and structure prediction technique can be Rosetta (e.g., as described in Leaver-Fay et al., Chapter 19 - Rosetta3: An Object-Oriented Software Suite for the Simulation and Design of Macromolecules, Methods in Enzymology, Academic Press, Vol. 487, 2011, pp. 545-574, doi.org/10.1016/B978-0-12-381270-4.00019-6, the entire contents of which are incorporated herein by reference). In some embodiments, the de novo designed peptide sequence comprises one or more target features that are identical to one or more desirable features of the backbone amino acid sequence.
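The selection step described above, choosing backbone structures that meet one or more target properties, can be sketched as a simple filter. This is a minimal illustration only: it does not call MASTER or Rosetta, and the record type `BackboneCandidate`, its fields, the threshold values, and the example entries are all hypothetical and not taken from the disclosure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BackboneCandidate:
    """Hypothetical record for one backbone returned by a structure search."""
    name: str                # illustrative label, not a real search hit
    helical_fraction: float  # fraction of residues in helices, 0..1
    rmsd_to_query: float     # backbone RMSD to the query motif, in angstroms

def select_backbones(candidates, min_helical_fraction=0.8, max_rmsd=1.0):
    """Keep candidates that meet both target properties, lowest RMSD first.

    The two thresholds are assumed values chosen for illustration.
    """
    kept = [
        c for c in candidates
        if c.helical_fraction >= min_helical_fraction
        and c.rmsd_to_query <= max_rmsd
    ]
    return sorted(kept, key=lambda c: c.rmsd_to_query)

# Made-up candidates standing in for search results.
candidates = [
    BackboneCandidate("hit_A", 0.92, 0.6),
    BackboneCandidate("hit_B", 0.55, 0.4),  # rejected: too little helix
    BackboneCandidate("hit_C", 0.88, 1.7),  # rejected: too far from the query
    BackboneCandidate("hit_D", 0.81, 0.3),
]
selected = select_backbones(candidates)
print([c.name for c in selected])
```

In a real workflow the surviving backbones would then be passed to the sequence design and structure prediction step; that step is deliberately not sketched here.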

Methods for producing nanopore complexes

A pore complex comprising an accessory protein or fusion protein and a transmembrane protein nanopore can, in one embodiment, be produced by co-expression. In some embodiments, the method comprises expressing, in a suitable host cell, a pore monomer together with an accessory protein or fusion protein (or a subunit or monomer of the accessory protein), and allowing the pore complex to form in vivo. In such embodiments, at least one gene encoding a pore monomer in one vector and a gene encoding the accessory protein or fusion protein, or at least one accessory protein subunit or monomer, in a second vector are co-transformed to express the proteins and form the complex within the transformed cell. This is preferably done ex vivo or in vitro. Alternatively, the two genes encoding the pore monomer and the accessory protein (or fusion protein) or subunit thereof can be placed in one vector under the control of a single promoter or under the control of two separate promoters, which can be the same or different.

Another method for producing a pore complex formed by an accessory protein or fusion protein and a transmembrane protein nanopore is to reconstitute the proteins in vitro to obtain a functional pore. In some embodiments, the method comprises contacting a monomer of the transmembrane protein nanopore with an accessory protein (or fusion protein), or an accessory protein subunit or monomer, in a suitable system to allow complex formation. The system may be an "in vitro system," which refers to a system comprising at least the components and environment necessary to perform the method, and which uses biological molecules, organisms, cells (or parts of cells) outside their normal, naturally occurring environment, thereby allowing more detailed, more convenient, or more efficient analysis than would be possible with a whole organism. The in vitro system may also comprise a suitable buffer composition provided in a test tube, to which the protein components for forming the complex have been added. Those skilled in the art are aware of options for providing such a system.

In such embodiments, the nanopore can be produced by expressing the monomer separately from the accessory protein or fusion protein. The pore monomer or nanopore can be purified from a cell transformed with a vector encoding at least one pore monomer, or from a cell transformed with two or more vectors each expressing a pore monomer. The accessory protein or fusion protein can be purified from a cell transformed with a vector encoding at least one accessory protein or fusion protein. The purified pore monomer or nanopore can then be incubated with the accessory protein or fusion protein to form a pore complex.

In another embodiment, the nanopore monomer and/or the accessory protein or fusion protein are produced separately by in vitro translation and transcription (IVTT). The nanopore monomer can then be incubated with the accessory protein or fusion protein to form a pore complex.

The above embodiments can be combined, so that, for example, (i) the nanopore is produced in vivo and the accessory protein or fusion protein is produced in vivo; (ii) the nanopore is produced in vitro and the accessory protein or fusion protein is produced in vivo; (iii) the nanopore is produced in vivo and the accessory protein or fusion protein is produced in vitro; or (iv) the nanopore is produced in vitro and the accessory protein or fusion protein is produced in vitro.

Either or both of the nanopore monomer and the accessory protein or fusion protein may be tagged to facilitate purification. Purification may also be performed when the nanopore monomer and/or the accessory protein or fusion protein is not tagged. Components of the pore complex may be purified using methods known in the art (e.g., ion exchange, gel filtration, hydrophobic interaction column chromatography, etc.), alone or in different combinations.

Any known tag may be used on either of the two proteins. In one embodiment, a two-tag purification may be used to purify the pore complex away from its component parts. For example, a Strep tag may be used on the nanopore and a His tag on the accessory protein (or fusion protein), or vice versa. A similar end result may be obtained by purifying the two proteins separately, mixing them together, and then performing a further Strep and His purification.
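The logic of the two-tag purification can be illustrated with a toy model. The sketch below assumes idealized affinity steps in which a species is retained if and only if any of its components carries the tag; the species names and the `affinity_step` helper are hypothetical, and real columns have neither perfect capture nor perfect rejection.

```python
def affinity_step(species, tag):
    """Idealized affinity column: keep only species carrying the given tag."""
    return [s for s in species if tag in s["tags"]]

# Hypothetical mixture after assembly: Strep tag on the nanopore,
# His tag on the accessory protein (or fusion protein).
mixture = [
    {"name": "free nanopore", "tags": {"Strep"}},
    {"name": "free accessory protein", "tags": {"His"}},
    {"name": "pore complex", "tags": {"Strep", "His"}},
    {"name": "untagged contaminant", "tags": set()},
]

after_strep = affinity_step(mixture, "Strep")   # removes free accessory protein
after_both = affinity_step(after_strep, "His")  # removes free nanopore
print([s["name"] for s in after_both])
```

Only a species that carries both tags, that is, the assembled complex, survives both steps, which is why sequential purification on the two tags separates the complex from its unassembled components.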

The pore complex can be formed before insertion into the membrane or after the nanopore has been inserted into the membrane. Thus, the pore complex can also be formed in situ by adding the accessory protein (or fusion protein) after the nanopore has been inserted into the membrane. For example, in one embodiment, in a system where the trans side or cis side of the membrane is accessible (e.g., in a chip or chamber for electrophysiological measurements), the nanopore can be inserted into the membrane and the accessory protein (or fusion protein) can then be added from the trans side or cis side of the membrane so that the complex forms in situ.

In one embodiment, the accessory protein can include a protease cleavage site (e.g., a TEV site, an HRV 3C site, or any other protease cleavage site) and can be cleaved before or after associating with the nanopore. For example, a full-length accessory protein (or fusion protein) can be used to form the pore. Amino acid residues that do not form part of the channel configuration and are not required for interaction with the transmembrane pore can be cleaved from the accessory protein or fusion protein. In such embodiments, once the pore complex has been formed, a protease is used to cleave the accessory protein or fusion protein. Alternatively, the protease can be used to generate the accessory protein or fusion protein prior to pore complex assembly.

Some protease sites leave an additional tag (or part of a tag, e.g., one or more amino acids of the tag) after cleavage. For example, the TEV protease cleavage sequence is ENLYFQS. TEV protease cleaves the protein between Q and S, leaving ENLYFQ at the C-terminus of the CsgF peptide. As another example, the HRV 3C cleavage site is LEVLFQGP, and the enzyme cleaves between Q and G, leaving LEVLFQ at the C-terminus of the CsgF peptide.
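The residual residues left by each protease can be checked with a short string-based sketch. The recognition sequences and cut positions are those stated above; the `cleave` helper, the placeholder peptide `MKTAAG` (not a CsgF sequence), and the His-tag suffix are assumptions made purely for illustration.

```python
# Recognition sequence and number of site residues kept on the N-terminal
# fragment, per the cleavage positions stated in the text.
PROTEASE_SITES = {
    "TEV": ("ENLYFQS", 6),      # cut between Q and S
    "HRV 3C": ("LEVLFQGP", 6),  # cut between Q and G
}

def cleave(sequence, protease):
    """Split a one-letter amino acid sequence at the first protease site.

    Returns (n_terminal_fragment, c_terminal_fragment). If the site is
    absent, the sequence is returned unchanged with an empty second fragment.
    """
    site, kept = PROTEASE_SITES[protease]
    start = sequence.find(site)
    if start == -1:
        return sequence, ""
    cut = start + kept
    return sequence[:cut], sequence[cut:]

peptide = "MKTAAG"  # placeholder peptide, NOT a real CsgF sequence
his_tag = "HHHHHH"  # hypothetical purification tag downstream of the site

tev_n, tev_c = cleave(peptide + "ENLYFQS" + his_tag, "TEV")
hrv_n, hrv_c = cleave(peptide + "LEVLFQGP" + his_tag, "HRV 3C")
print(tev_n, tev_c)
print(hrv_n, hrv_c)
```

In both cases six residues of the recognition sequence remain attached to the C-terminus of the upstream peptide, while the rest of the site leaves with the tag.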

The protein may be chemically modified with a molecular adaptor that facilitates the interaction between a pore comprising the monomer and a target nucleotide or target polynucleotide sequence. Suitable adaptors, including cyclic molecules, cyclodextrins, species capable of hybridization, DNA binders or intercalators, peptides or peptide analogues, synthetic polymers, aromatic planar molecules, small positively charged molecules, or small molecules capable of hydrogen bonding, are described in WO 2019/002893 (which is incorporated herein by reference in its entirety). The molecular adaptor may be attached using any of the methods and linkers discussed above.

The protein can be attached to a polynucleotide binding protein. This forms a modular sequencing system. Polynucleotide binding proteins are discussed below. The protein can be covalently attached to the monomer using any method known in the art. The monomer and the protein can be chemically fused or genetically fused. Genetic fusion of a monomer to a polynucleotide binding protein is discussed in WO 2010/004265 (which is incorporated herein by reference in its entirety). The polynucleotide binding protein can be attached via a cysteine linkage using any of the methods described above.

The polynucleotide binding protein can be attached to the protein directly or via one or more linkers. The molecule can be attached to the CsgG pore monomer using the hybridization linkers described in WO 2010/086602 (which is incorporated herein by reference in its entirety). Alternatively, a peptide linker can be used. Suitable peptide linkers are discussed above.

Any of the proteins can be produced using standard methods known in the art. A polynucleotide sequence encoding the protein can be derived and replicated using standard methods in the art. A polynucleotide sequence encoding the protein can be expressed in a bacterial host cell using standard techniques in the art. The protein can be produced in a cell by in situ expression of the polypeptide from a recombinant expression vector. The expression vector optionally carries an inducible promoter to control expression of the polypeptide. Such methods are described in Sambrook, J. and Russell, D. (2001). Molecular Cloning: A Laboratory Manual, 3rd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

The protein can be produced on a large scale following purification by any protein liquid chromatography system from a protein-producing organism, or after recombinant expression. Typical protein liquid chromatography systems include FPLC, AKTA systems, the Bio-Cad system, the Bio-Rad BioLogic system, and the Gilson HPLC system.

System

In another aspect, the present disclosure relates to a system for characterizing a target polynucleotide, the system comprising a membrane and a pore complex; wherein the pore complex comprises (i) a nanopore located in the membrane and (ii) an accessory protein or fusion protein attached to the nanopore; wherein the nanopore and the accessory protein or fusion protein together form a continuous channel across the membrane, the channel comprising a first constriction region and a second constriction region.

The pore complex, nanopore, and accessory protein or fusion protein may be any of those described herein above.

In one embodiment, the system further comprises a first chamber and a second chamber, wherein the first chamber and the second chamber are separated by the membrane. When used to characterize a target polynucleotide, the system can further comprise the target polynucleotide, wherein the target polynucleotide is transiently located within the continuous channel, with one end of the target polynucleotide located in the first chamber and the other end located in the second chamber.

In some embodiments, the system further comprises an electrically conductive solution in contact with the nanopore, electrodes providing a voltage potential across the membrane, and a measurement system for measuring the current through the nanopore. In one embodiment, the voltage applied across the membrane and pore complex is from +5 V to -5 V, such as from -600 mV to +600 mV or from -400 mV to +400 mV. The voltage used is preferably in the range of 100 mV to 240 mV, more preferably in the range of 120 mV to 220 mV. It is possible to increase the discrimination between different nucleotides by the pore by using an increased applied potential. Any suitable electrically conductive solution can be used. For example, the solution can comprise a charge carrier, such as a metal salt (for example, an alkali metal salt) or a halide salt (for example, a chloride salt such as an alkali metal chloride salt). The charge carrier can include an ionic liquid or an organic salt, for example tetramethyl ammonium chloride, trimethylphenyl ammonium chloride, phenyltrimethyl ammonium chloride, or 1-ethyl-3-methyl imidazolium chloride. In an exemplary system, the salt is present in the aqueous solution in the chamber. Potassium chloride (KCl), sodium chloride (NaCl), cesium chloride (CsCl), or a mixture of potassium ferrocyanide and potassium ferricyanide is typically used. KCl, NaCl, and a mixture of potassium ferrocyanide and potassium ferricyanide are preferred. The charge carrier can be asymmetric across the membrane. For example, the type and/or concentration of the charge carrier can be different on each side of the membrane, e.g., in each chamber.

The salt concentration can be at saturation. The salt concentration can be 3 M or lower, and is typically from 0.1 to 2.5 M, from 0.3 to 1.9 M, from 0.5 to 1.8 M, from 0.7 to 1.7 M, from 0.9 to 1.6 M, or from 1 M to 1.4 M. The salt concentration is preferably from 150 mM to 1 M. The method is preferably carried out using a salt concentration of at least 0.3 M, such as at least 0.4 M, at least 0.5 M, at least 0.6 M, at least 0.8 M, at least 1.0 M, at least 1.5 M, at least 2.0 M, at least 2.5 M, or at least 3.0 M. High salt concentrations provide a high signal-to-noise ratio and allow currents indicative of the presence of a nucleotide to be identified against the background of normal current fluctuations.
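As a worked example of the concentrations above, the mass of salt needed for a given molarity and volume follows from mass = molar mass x molarity x volume. The sketch below uses standard molar masses rounded to two decimals; the function name `grams_needed` and the 100 mL volume are assumptions for illustration, not values from the disclosure.

```python
# Standard molar masses in g/mol, rounded to two decimals.
MOLAR_MASS_G_PER_MOL = {"KCl": 74.55, "NaCl": 58.44, "CsCl": 168.36}

def grams_needed(salt, molarity, volume_ml):
    """Mass of salt (g) to dissolve for `volume_ml` of solution at `molarity` (mol/L)."""
    return MOLAR_MASS_G_PER_MOL[salt] * molarity * volume_ml / 1000.0

# Endpoints of the preferred 150 mM to 1 M range, for 100 mL of KCl solution.
low = grams_needed("KCl", 0.150, 100)
high = grams_needed("KCl", 1.0, 100)
print(f"{low:.3f} g to {high:.3f} g of KCl per 100 mL")
```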

A buffer may be present in the electrically conductive solution. Typically, the buffer is a phosphate buffer. Other suitable buffers are HEPES and Tris-HCl buffer. The pH of the electrically conductive solution may be from 4.0 to 12.0, from 4.5 to 10.0, from 5.0 to 9.0, from 5.5 to 8.8, from 6.0 to 8.7, from 7.0 to 8.8, or from 7.5 to 8.5. The pH used is preferably about 6.9.

The system can include an array of pore complexes present in membranes. In a preferred embodiment, each membrane of the array includes one pore complex. In practice, because of the manner in which the array is formed, the array can include one or more membranes that do not contain a pore complex, and/or one or more membranes that contain two or more pore complexes. The array can include from about 2 to about 12,000 membranes, for example, from about 10 to about 800, from about 20 to about 600, from about 30 to about 500, from about 250 to about 2000, from about 500 to about 4000, from about 1000 to about 5000, from about 2500 to about 10,000, or from about 5000 to about 12,000 membranes. In some embodiments, the array includes more than 12,000 membranes.
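One way to see why an array can contain empty and multiply occupied membranes is to treat pore insertion as a random, independent event per membrane. The sketch below assumes a Poisson model for the number of pore complexes per membrane; this model, the function name, and the mean values are illustrative assumptions and are not stated in the disclosure.

```python
import math

def occupancy_fractions(mean_pores_per_membrane):
    """Expected fractions of empty, singly and multiply occupied membranes,
    assuming Poisson-distributed insertions (an illustrative assumption)."""
    lam = mean_pores_per_membrane
    empty = math.exp(-lam)
    single = lam * math.exp(-lam)
    return {"empty": empty, "single": single, "multiple": 1.0 - empty - single}

for lam in (0.5, 1.0, 2.0):
    f = occupancy_fractions(lam)
    print(f"mean={lam}: empty={f['empty']:.3f}, "
          f"single={f['single']:.3f}, multiple={f['multiple']:.3f}")
```

Under this assumption the singly occupied fraction peaks at a mean of one pore per membrane, where it is exp(-1), roughly 37%, with the remainder split between empty and multiply occupied membranes.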

The system may be incorporated into a device. The device may be any conventional device for analyte analysis, such as an array or a chip. The device is preferably set up to carry out the disclosed method. For example, the device may include a chamber containing an aqueous solution and a barrier separating the chamber into two sections. The barrier typically has an aperture in which the membrane containing the pore is formed. Alternatively, the barrier forms the membrane in which the pore is present.

In one embodiment, the device comprises a sensor device capable of supporting a plurality of pores and membranes and operable to perform analyte characterization using the pores and membranes; and at least one port for delivery of the material for performing the characterization.

In one embodiment, the device comprises a sensor device capable of supporting a plurality of pores and membranes and operable to perform analyte characterization using the pores and membranes; and at least one reservoir for holding material for performing the characterization.

In one embodiment, the device comprises a sensor device capable of supporting the membrane and a plurality of pores and membranes and operable to perform analyte characterization using the pores and membranes; at least one reservoir for holding material for performing the characterization; a fluidics system configured to controllably supply material from the at least one reservoir to the sensor device; and one or more containers for receiving respective samples, the fluidics system being configured to supply the samples selectively from the one or more containers to the sensor device.

The device may also include an electrical circuit capable of applying a potential and measuring an electrical signal across the membrane and pore complex. The device may be any of those described in WO 2008/102120, WO 2009/077734, WO 2010/122293, WO 2011/067559 or WO 00/28312.

Membrane

Any suitable membrane may be used in the system. The membrane is preferably an amphiphilic layer. An amphiphilic layer is a layer formed from amphiphilic molecules, such as phospholipids, which have both hydrophilic and lipophilic properties. The amphiphilic molecules may be synthetic or naturally occurring. Non-naturally occurring amphiphiles and amphiphiles that form a monolayer are known in the art and include, for example, block copolymers (Gonzalez-Perez et al., Langmuir, 2009, 25, 10447-10450). Block copolymers are polymeric materials in which two or more monomer sub-units are polymerized together to create a single polymer chain. Block copolymers typically have properties that are contributed by each monomer sub-unit. However, a block copolymer may have unique properties that polymers formed from the individual sub-units do not possess. Block copolymers can be engineered such that one of the monomer sub-units is hydrophobic (i.e., lipophilic) while the other sub-unit(s) are hydrophilic in aqueous media. In this case, the block copolymer may possess amphiphilic properties and may form a structure that mimics a biological membrane. The block copolymer may be a diblock (consisting of two monomer sub-units), but may also be constructed from more than two monomer sub-units to form more complex arrangements that behave as amphiphiles. The copolymer may be a triblock, tetrablock, or pentablock copolymer. The membrane is preferably a triblock copolymer membrane.

Archaeal bipolar tetraether lipids are naturally occurring lipids that are constructed such that the lipid forms a monolayer membrane. These lipids are generally found in extremophiles that survive in harsh biological environments, such as thermophiles, halophiles, and acidophiles. Their stability is believed to derive from the fused nature of the final bilayer. It is straightforward to construct block copolymer materials that mimic these biological entities by creating a triblock polymer that has the general motif hydrophilic-hydrophobic-hydrophilic. This material may form monomeric membranes that behave similarly to lipid bilayers and encompass a range of phase behaviors from vesicles through to lamellar membranes. Membranes formed from these triblock copolymers hold several advantages over biological lipid membranes. Because the triblock copolymer is synthesized, the exact construction can be carefully controlled to provide the correct chain lengths and properties required to form membranes and to interact with pores and other proteins.

Block copolymers may also be constructed from sub-units that are not classed as lipid sub-materials. For example, a hydrophobic polymer may be made from siloxane or other non-hydrocarbon-based monomers. The hydrophilic sub-section of the block copolymer may also possess low protein binding properties, which allows the creation of a membrane that is highly resistant when exposed to raw biological samples. This head group unit may also be derived from non-classical lipid head groups.

Triblock copolymer membranes also have increased mechanical and environmental stability compared with biological lipid membranes, for example, a much higher operational temperature or pH range. The synthetic nature of the block copolymers provides a platform for customizing polymer-based membranes for a wide range of applications.

The membrane is most preferably one of the membranes disclosed in International Application No. WO2014/064443 or WO2014/064444.

The amphiphilic molecules may be chemically modified or functionalized to facilitate coupling of the polynucleotide. The amphiphilic layer may be a monolayer or a bilayer. The amphiphilic layer is typically planar. The amphiphilic layer may be curved. The amphiphilic layer may be supported.

Amphiphilic membranes are typically naturally mobile, essentially acting as two-dimensional fluids with lipid diffusion rates of approximately 10^-8 cm s^-1. This means that the pore and coupled polynucleotide can typically move within an amphiphilic membrane.
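To give a feel for this mobility, the sketch below estimates how far a membrane component drifts by two-dimensional diffusion, using the relation r_rms = sqrt(4 D t). It assumes, purely for illustration, that the figure quoted above can be read as a lateral diffusion coefficient D of about 10^-8 cm^2 s^-1; that reading, the function name, and the chosen times are assumptions and are not stated in the disclosure.

```python
import math

D_CM2_PER_S = 1e-8  # ASSUMED lateral diffusion coefficient, cm^2/s

def rms_displacement_um(diffusion_cm2_per_s, seconds):
    """Root-mean-square displacement in micrometres for 2D diffusion."""
    r_cm = math.sqrt(4.0 * diffusion_cm2_per_s * seconds)
    return r_cm * 1e4  # 1 cm = 10^4 micrometres

for t in (1, 100):
    print(f"t = {t} s: about {rms_displacement_um(D_CM2_PER_S, t):.1f} um")
```

Under these assumptions a component moves on the order of micrometres per second, which is consistent with pores and coupled polynucleotides being able to move within the membrane.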

The membrane may be a lipid bilayer. Lipid bilayers are models of cell membranes and serve as excellent platforms for a range of experimental studies. For example, lipid bilayers can be used for in vitro investigation of membrane proteins by single-channel recording. Alternatively, lipid bilayers can be used as biosensors to detect the presence of a range of substances. The lipid bilayer may be any lipid bilayer. Suitable lipid bilayers include, but are not limited to, planar lipid bilayers, supported bilayers, and liposomes. The lipid bilayer is preferably a planar lipid bilayer. Suitable lipid bilayers are disclosed in WO 2008/102121, WO 2009/077734 and WO 2006/100484.

Methods for forming lipid bilayers are known in the art. Lipid bilayers are commonly formed by the method of Montal and Mueller (Proc. Natl. Acad. Sci. USA., 1972; 69: 3561-3566), in which a lipid monolayer is carried on the aqueous solution/air interface past either side of an aperture that is perpendicular to that interface. The lipid is normally added to the surface of an aqueous electrolyte solution by first dissolving it in an organic solvent and then allowing a drop of the solvent to evaporate on the surface of the aqueous solution on either side of the aperture. Once the organic solvent has evaporated, the solution/air interfaces on either side of the aperture are physically moved up and down past the aperture until a bilayer is formed. Planar lipid bilayers may be formed across an aperture in a membrane or across an opening into a recess.

The method of Montal and Mueller is popular because it is a cost-effective and relatively straightforward way of forming good-quality lipid bilayers that are suitable for protein pore insertion. Other common methods of bilayer formation include tip-dipping, painting bilayers and patch-clamping of liposome bilayers.

Tip-dipping bilayer formation entails touching the aperture surface (for example, a pipette tip) onto the surface of a test solution that is carrying a monolayer of lipid. Again, the lipid monolayer is first generated at the solution/air interface by allowing a drop of lipid dissolved in organic solvent to evaporate at the solution surface. The bilayer is then formed by the Langmuir-Schaefer process, which requires mechanical automation to move the aperture relative to the solution surface.

For painted bilayers, a drop of lipid dissolved in organic solvent is applied directly to the aperture, which is submerged in an aqueous test solution. The lipid solution is spread thinly over the aperture using a paintbrush or an equivalent. Thinning of the solvent results in formation of a lipid bilayer. However, complete removal of the solvent from the bilayer is difficult, and consequently a bilayer formed by this method is less stable and more prone to noise during electrochemical measurement.

Patch-clamping is commonly used in the study of biological cell membranes. The cell membrane is clamped to the end of a pipette by suction, and a patch of the membrane becomes attached over the aperture. The method has been adapted for producing lipid bilayers by clamping liposomes, which then burst to leave a lipid bilayer sealing over the aperture of the pipette. The method requires stable, giant and unilamellar liposomes and the fabrication of small apertures in materials having a glass surface.

Liposomes can be formed by sonication, extrusion or the Mozafari method (Colas et al. (2007) Micron 38:841-847). In a preferred embodiment, the lipid bilayer is formed as described in International Application No. WO 2009/077734. Advantageously, in this method the lipid bilayer is formed from dried lipids. In a most preferred embodiment, the lipid bilayer is formed across an opening as described in WO 2009/077734.

A lipid bilayer is formed from two opposing layers of lipids. The two layers are arranged such that their hydrophobic tail groups face towards each other to form a hydrophobic interior. The hydrophilic head groups of the lipids face outwards towards the aqueous environment on each side of the bilayer. The bilayer may be present in a number of lipid phases including, but not limited to, the liquid disordered phase (fluid lamellar), the liquid ordered phase, the solid ordered phase (lamellar gel phase, interdigitated gel phase) and planar bilayer crystals (lamellar sub-gel phase, lamellar crystalline phase).

Any lipid composition that forms a lipid bilayer may be used. The lipid composition is chosen such that a lipid bilayer having the required properties, such as surface charge, ability to support membrane proteins, packing density or mechanical properties, is formed. The lipid composition can comprise one or more different lipids. For instance, the lipid composition can contain up to 100 lipids. The lipid composition preferably contains 1 to 10 lipids. The lipid composition may comprise naturally occurring lipids and/or artificial lipids.

The lipids typically comprise a head group, an interfacial moiety and two hydrophobic tail groups, which may be the same or different. Suitable head groups include, but are not limited to, neutral head groups, such as diacylglycerides (DG) and ceramides (CM); zwitterionic head groups, such as phosphatidylcholine (PC), phosphatidylethanolamine (PE) and sphingomyelin (SM); negatively charged head groups, such as phosphatidylglycerol (PG), phosphatidylserine (PS), phosphatidylinositol (PI), phosphatidic acid (PA) and cardiolipin (CA); and positively charged head groups, such as trimethylammonium-propane (TAP). Suitable interfacial moieties include, but are not limited to, naturally occurring interfacial moieties, such as glycerol-based or ceramide-based moieties. Suitable hydrophobic tail groups include, but are not limited to, saturated hydrocarbon chains, such as lauric acid (n-dodecanoic acid), myristic acid (n-tetradecanoic acid), palmitic acid (n-hexadecanoic acid), stearic acid (n-octadecanoic acid) and arachidic acid (n-eicosanoic acid); unsaturated hydrocarbon chains, such as oleic acid (cis-9-octadecenoic acid); and branched hydrocarbon chains, such as phytanoyl. The length of the chain and the position and number of the double bonds in the unsaturated hydrocarbon chains can vary. The length of the chain and the position and number of the branches, such as methyl groups, in the branched hydrocarbon chains can vary. The hydrophobic tail groups can be linked to the interfacial moiety as an ether or an ester. The lipid may be mycolic acid.

The lipids can also be chemically modified. The head group or the tail group of the lipids may be chemically modified. Suitable lipids whose head groups have been chemically modified include, but are not limited to, PEG-modified lipids, such as 1,2-diacyl-sn-glycero-3-phosphoethanolamine-N-[methoxy(polyethylene glycol)-2000]; functionalized PEG lipids, such as 1,2-distearoyl-sn-glycero-3-phosphoethanolamine-N-[biotinyl(polyethylene glycol)2000]; and lipids modified for conjugation, such as 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine-N-(succinyl) and 1,2-dipalmitoyl-sn-glycero-3-phosphoethanolamine-N-(biotinyl). Suitable lipids whose tail groups have been chemically modified include, but are not limited to, polymerizable lipids, such as 1,2-bis(10,12-tricosadiynoyl)-sn-glycero-3-phosphocholine; fluorinated lipids, such as 1-palmitoyl-2-(16-fluoropalmitoyl)-sn-glycero-3-phosphocholine; deuterated lipids, such as 1,2-dipalmitoyl-D62-sn-glycero-3-phosphocholine; and ether-linked lipids, such as 1,2-di-O-phytanyl-sn-glycero-3-phosphocholine. The lipids may be chemically modified or functionalized to facilitate coupling of the polynucleotide.

The amphiphilic layer, for example the lipid composition, typically comprises one or more additives that will affect the properties of the layer. Suitable additives include, but are not limited to, fatty acids, such as palmitic acid, myristic acid and oleic acid; fatty alcohols, such as palmitic alcohol, myristic alcohol and oleic alcohol; sterols, such as cholesterol, ergosterol, lanosterol, sitosterol and stigmasterol; lysophospholipids, such as 1-acyl-2-hydroxy-sn-glycero-3-phosphocholine; and ceramides.

In another preferred embodiment, the membrane comprises a solid state layer. Solid state layers can be formed from both organic and inorganic materials including, but not limited to, microelectronic materials, insulating materials such as Si₃N₄, Al₂O₃ and SiO, organic and inorganic polymers such as polyamide, plastics such as Teflon® or elastomers such as two-component addition-cure silicone rubber, and glasses. The solid state layer may be formed from graphene. Suitable graphene layers are disclosed in WO 2009/035647. If the membrane comprises a solid state layer, the pore is typically present in an amphiphilic membrane or layer contained within the solid state layer, for instance within a hole, well, gap, channel, trench or slit within the solid state layer. The skilled person can prepare suitable solid state/amphiphilic hybrid systems. Suitable systems are disclosed in WO 2009/020682 and WO 2012/005857. Any of the amphiphilic membranes or layers discussed above may be used.

The method is typically carried out using (i) an artificial amphiphilic layer comprising a pore, (ii) an isolated, naturally occurring lipid bilayer comprising a pore, or (iii) a cell having a pore inserted therein. The method is typically carried out using an artificial amphiphilic layer, such as an artificial triblock copolymer layer. The layer may comprise other transmembrane and/or intramembrane proteins, as well as other molecules, in addition to the pore. Suitable apparatus and conditions are discussed below. The methods of the present disclosure are typically carried out in vitro.

Methods for characterizing analytes

In a further aspect, a method of determining the presence, absence, or one or more characteristics of a target analyte is disclosed. The method comprises contacting the target analyte with a membrane comprising a pore complex, such that the target analyte moves with respect to, for example into or through, a continuous channel comprising at least two constrictions provided by the nanopore and the accessory protein or peptide in the pore complex, and taking one or more measurements as the analyte moves with respect to the channel, thereby determining the presence, absence, or one or more characteristics of the analyte. The analyte may pass through the nanopore constriction and then through the accessory protein constriction. In an alternative embodiment, depending on the orientation of the pore complex in the membrane, the analyte may pass through the accessory protein constriction and then through the nanopore constriction.

In one embodiment, the method is for determining the presence, absence, or one or more characteristics of a target analyte. The method may be for determining the presence, absence, or one or more characteristics of at least one analyte. The method may concern determining the presence, absence, or one or more characteristics of two or more analytes. The method may comprise determining the presence, absence, or one or more characteristics of any number of analytes, such as 2, 5, 10, 15, 20, 30, 40, 50, 100 or more analytes. Any number of characteristics of the one or more analytes may be determined, such as 1, 2, 3, 4, 5, 10 or more characteristics.

Binding of a molecule in the channel of the pore complex, or in the vicinity of either opening of the channel, will affect the open-channel ion flow through the pore, which is the essence of "molecular sensing" by pore channels. In a manner similar to nucleic acid sequencing applications, variation in the open-channel ion flow can be measured as a change in current using suitable measurement techniques (see, e.g., WO 2000/28312 and D. Stoddart et al., Proc. Natl. Acad. Sci., 2010, 106, 7702-7, or WO 2009/077734). The degree of reduction in ion flow, as measured by the reduction in electrical current, is related to the size of the obstruction within, or in the vicinity of, the pore. Binding of a molecule of interest, also referred to as an "analyte", in or near the pore therefore provides a detectable and measurable event, thereby forming the basis of a "biological sensor". Suitable molecules for nanopore sensing include nucleic acids; proteins; peptides; polysaccharides; and small molecules (referring here to low molecular weight (e.g., < 900 Da or < 500 Da) organic or inorganic compounds) such as pharmaceuticals, toxins, cytokines and pollutants. Detecting the presence of biological molecules finds application in personalized drug development, medicine, diagnostics, life science research, environmental monitoring, and the security and/or defense industry.
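By way of illustration only, the relationship between a binding event and the measured current can be sketched in a few lines of Python. The sketch is not part of the disclosed method: the function name `detect_blockades`, the 80% threshold and the synthetic trace are all assumptions chosen for the example, and the open-channel level is estimated as the median of the trace on the assumption that the pore is unobstructed most of the time.

```python
from statistics import median


def detect_blockades(current_pa, threshold_fraction=0.8):
    """Return (start, end, fractional_blockade) for each run of samples
    whose current falls below threshold_fraction of the open-channel level.

    Hypothetical helper for illustration; the threshold is an assumption.
    """
    open_level = median(current_pa)  # assumes the pore is open most of the time
    cutoff = threshold_fraction * open_level
    events = []
    start = None
    for i, value in enumerate(current_pa):
        if value < cutoff and start is None:
            start = i  # a blockade begins
        elif value >= cutoff and start is not None:
            segment = current_pa[start:i]
            mean_level = sum(segment) / len(segment)
            events.append((start, i, 1.0 - mean_level / open_level))
            start = None
    if start is not None:  # trace ended while still blocked
        segment = current_pa[start:]
        mean_level = sum(segment) / len(segment)
        events.append((start, len(current_pa), 1.0 - mean_level / open_level))
    return events


# Synthetic trace (made-up values, in pA): two blockades of different depth.
trace = [100.0] * 10 + [60.0] * 4 + [100.0] * 10 + [30.0] * 3 + [100.0] * 5
events = detect_blockades(trace)
for start, end, depth in events:
    print(f"samples {start}-{end}: fractional blockade {depth:.2f}")
```

In this toy trace the deeper blockade stands in for a larger obstruction; real recordings are noisy and would need filtering and a more careful baseline estimate.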

The target analyte may be a metal ion, an inorganic salt, a polymer, an amino acid, a peptide, a polypeptide, a protein, a nucleotide, an oligonucleotide, a polynucleotide, a monosaccharide, a polysaccharide, a dye, a bleach, a pharmaceutical, a diagnostic agent, a recreational drug, an explosive, a toxic compound, or an environmental pollutant. The method may concern determining the presence, absence, or one or more characteristics of two or more analytes of the same type, such as two or more proteins, two or more nucleotides or two or more pharmaceuticals. Alternatively, the method may concern determining the presence, absence, or one or more characteristics of two or more analytes of different types, such as one or more proteins, one or more nucleotides and one or more pharmaceuticals.

The target analyte may be secreted from cells. Alternatively, the target analyte may be an analyte that is present inside cells, such that the analyte must be extracted from the cells before the method can be carried out.

In one embodiment, the analyte is an amino acid, a peptide, a polypeptide or a protein. The amino acid, peptide, polypeptide or protein may be naturally occurring or non-naturally occurring. The polypeptide or protein may include within it synthetic or modified amino acids. Several different types of modification to amino acids are known in the art. Suitable amino acids and modifications thereof are discussed above. It is to be understood that the target analyte may be modified by any method available in the art.

In a preferred embodiment, the analyte is a polynucleotide, such as a nucleic acid. A polynucleotide is defined as a macromolecule comprising two or more nucleotides. The naturally occurring nucleic acid bases in DNA and RNA may be distinguished by their physical size. As a nucleic acid molecule, or an individual base, passes through the channel of a nanopore, the size differential between the bases causes a directly correlated reduction in the ion flow through the channel. The variation in ion flow may be recorded. Suitable electrical measurement techniques for recording ion flow variations are described in, for example, WO 2000/28312 and D. Stoddart et al., Proc. Natl. Acad. Sci., 2010, 106, pp 7702-7 (single-channel recording equipment), and, for example, WO 2009/077734 (multi-channel recording techniques). Through suitable calibration, the characteristic reduction in ion flow can be used to identify, in real time, the particular nucleotide and associated base traversing the channel. In typical nanopore nucleic acid sequencing, the open-channel ion flow is reduced as the individual nucleotides of the nucleic acid sequence of interest sequentially pass through the channel of the nanopore, owing to partial blockage of the channel by the nucleotide. It is this reduction in ion flow that is measured using the suitable recording techniques described above. The reduction in ion flow may be calibrated against the reduction in ion flow measured for known nucleotides passing through the channel, which provides a means of determining which nucleotide is passing through the channel and therefore, when done sequentially, a way of determining the nucleotide sequence of the nucleic acid passing through the nanopore. For accurate determination of individual nucleotides, it has typically been required that the reduction in ion flow through the channel be directly correlated with the size of the individual nucleotide passing through the constriction (or "reading head"). It will be appreciated that sequencing may be performed on an intact nucleic acid polymer that is "threaded" through the pore, for example via the action of an associated polymerase or helicase. Alternatively, sequences may be determined by the passage of nucleotide triphosphate bases that have been sequentially removed from a target nucleic acid in proximity to the pore (see, e.g., WO 2014/187924).
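The calibration idea described above can be illustrated with a deliberately simplified Python sketch. It assumes, purely for illustration, that each nucleotide produces one characteristic fractional reduction in ion flow and that an observed reduction is assigned to the nearest calibrated level. The names `calibrate` and `call_nucleotide` and all numerical values are hypothetical; they are not measurements from the disclosure.

```python
def calibrate(reference_measurements):
    """Map each known nucleotide to the mean reduction in ion flow
    measured for it (hypothetical calibration step)."""
    return {
        nucleotide: sum(values) / len(values)
        for nucleotide, values in reference_measurements.items()
    }


def call_nucleotide(reduction, levels):
    """Assign an observed reduction to the nearest calibrated level."""
    return min(levels, key=lambda nucleotide: abs(levels[nucleotide] - reduction))


# Made-up fractional current reductions recorded for known nucleotides.
reference = {
    "A": [0.52, 0.54],
    "C": [0.40, 0.42],
    "G": [0.61, 0.63],
    "T": [0.46, 0.48],
}
levels = calibrate(reference)

# Made-up reductions observed as an unknown strand passes through the channel.
observed = [0.53, 0.41, 0.47, 0.62]
called = "".join(call_nucleotide(r, levels) for r in observed)
print(called)
```

A nearest-level lookup is used here only because it is the simplest instance of calibration against known nucleotides; it ignores noise and the fact that several bases can influence the signal at once.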

A polynucleotide or nucleic acid may comprise any combination of any nucleotides. The nucleotides can be naturally occurring or artificial. One or more nucleotides in the polynucleotide can be oxidized or methylated. One or more nucleotides in the polynucleotide may be damaged. For instance, the polynucleotide may comprise a pyrimidine dimer. Such dimers are typically associated with damage by ultraviolet light and are a primary cause of skin melanomas. One or more nucleotides in the polynucleotide may be modified, for instance with a label or a tag, suitable examples of which are known to the skilled person. The polynucleotide may comprise one or more spacers. A nucleotide typically contains a nucleobase, a sugar and at least one phosphate group. The nucleobase and sugar form a nucleoside. The nucleobase is typically heterocyclic. Nucleobases include, but are not limited to, purines and pyrimidines, and more specifically adenine (A), guanine (G), thymine (T), uracil (U) and cytosine (C). The sugar is typically a pentose sugar. Nucleotide sugars include, but are not limited to, ribose and deoxyribose. The sugar is preferably deoxyribose. The polynucleotide preferably comprises the following nucleosides: deoxyadenosine (dA), deoxyuridine (dU) and/or thymidine (dT), deoxyguanosine (dG) and deoxycytidine (dC). The nucleotide is typically a ribonucleotide or a deoxyribonucleotide. The nucleotide typically contains a monophosphate, diphosphate or triphosphate. The nucleotide may comprise more than three phosphates, such as 4 or 5 phosphates. Phosphates may be attached on the 5' or 3' side of a nucleotide. The nucleotides in the polynucleotide may be attached to each other in any manner. The nucleotides are typically attached by their sugar and phosphate groups, as in nucleic acids. The nucleotides may be connected via their nucleobases, as in pyrimidine dimers. The polynucleotide may be single-stranded or double-stranded. At least a portion of the polynucleotide is preferably double-stranded. The polynucleotide is most preferably ribonucleic acid (RNA) or deoxyribonucleic acid (DNA). In particular, the method using a polynucleotide as the analyte may comprise determining one or more characteristics selected from (i) the length of the polynucleotide, (ii) the identity of the polynucleotide, (iii) the sequence of the polynucleotide, (iv) the secondary structure of the polynucleotide and (v) whether or not the polynucleotide is modified.

(i) The polynucleotide can be any length. For example, the polynucleotide can be at least 10, at least 50, at least 100, at least 150, at least 200, at least 250, at least 300, at least 400 or at least 500 nucleotides or nucleotide pairs in length. The polynucleotide can be 1000 or more nucleotides or nucleotide pairs, 5000 or more nucleotides or nucleotide pairs, or 100000 or more nucleotides or nucleotide pairs in length. Any number of polynucleotides can be investigated. For instance, the method may concern characterizing 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 50, 100 or more polynucleotides. If two or more polynucleotides are characterized, they may be different polynucleotides or two instances of the same polynucleotide. The polynucleotide can be naturally occurring or artificial. For instance, the method may be used to verify the sequence of a manufactured oligonucleotide. The method is typically carried out in vitro.

(ii) The nucleotides can have any identity, and include, but are not limited to, adenosine monophosphate (AMP), guanosine monophosphate (GMP), thymidine monophosphate (TMP), uridine monophosphate (UMP), 5-methylcytidine monophosphate, 5-hydroxymethylcytidine monophosphate, cytidine monophosphate (CMP), cyclic adenosine monophosphate (cAMP), cyclic guanosine monophosphate (cGMP), deoxyadenosine monophosphate (dAMP), deoxyguanosine monophosphate (dGMP), deoxythymidine monophosphate (dTMP), deoxyuridine monophosphate (dUMP), deoxycytidine monophosphate (dCMP) and deoxymethylcytidine monophosphate. The nucleotides are preferably selected from AMP, TMP, GMP, CMP, UMP, dAMP, dTMP, dGMP, dCMP and dUMP. A nucleotide may be abasic (i.e., lack a nucleobase). A nucleotide may also lack both a nucleobase and a sugar (i.e., be a C3 spacer). (iii) The sequence of the nucleotides is determined by the consecutive identity of the nucleotides attached to each other along the polynucleotide strand, in the 5' to 3' direction of the strand.

A pore complex comprising at least two constrictions is particularly useful for analyzing homopolymers. For example, the pore may be used to determine the sequence of a polynucleotide comprising two or more, such as at least 3, 4, 5, 6, 7, 8, 9 or 10, consecutive nucleotides that are identical. For example, the pore may be used to sequence a polynucleotide comprising a polyA, polyT, polyG and/or polyC region.

In some embodiments, the CsgG pore constriction is made up of the residues at positions 51, 55 and 56 of SEQ ID NO: 59. As DNA passes through the constriction, the interaction of approximately 5 bases of DNA with the constriction of the pore dominates the current signal at any given time. Certain CsgG pores (e.g., CsgG pores without one or more accessory proteins or fusion proteins as described herein) are very good at reading mixed-sequence regions of DNA (where A, T, G and C are mixed), but the signal becomes flat and lacks information when there is a homopolymer region within the DNA (e.g., polyT, polyG, polyA, polyC). Because 5 bases dominate the signal of CsgG and its constriction mutants, it is difficult to discriminate homopolymers longer than 5 bases without using additional dwell-time information. However, if the DNA also passes through a second constriction, more DNA bases interact with the combined constrictions, which increases the length of homopolymer that can be discriminated.
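The homopolymer limitation can be made concrete with a toy model in Python. The sketch assumes, as a simplification that is not taken from the disclosure, that the signal at each step depends only on the k bases currently in the reading window, and that consecutive identical states cannot be counted without dwell-time information. The window sizes (5 for a single constriction, 10 for a combined read), the flanking sequences and the function names are all hypothetical.

```python
from itertools import groupby


def collapsed_states(sequence, k):
    """List the k-base windows seen as the strand steps through the reader,
    merging runs of identical consecutive windows (no dwell-time information)."""
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    return [state for state, _ in groupby(kmers)]


def distinguishable(seq_a, seq_b, k):
    """True if the two strands give different collapsed state sequences."""
    return collapsed_states(seq_a, k) != collapsed_states(seq_b, k)


def with_homopolymer(n):
    """Hypothetical test strand: a polyT run of length n between fixed flanks."""
    return "ACGTAC" + "T" * n + "GCATGA"


short_run = with_homopolymer(6)
long_run = with_homopolymer(9)

# A 5-base window cannot tell a 6-base run from a 9-base run in this model...
single_constriction = distinguishable(short_run, long_run, k=5)
# ...whereas a longer effective window (assumed 10 bases here) can.
combined_constrictions = distinguishable(short_run, long_run, k=10)
print(single_constriction, combined_constrictions)
```

In this model a homopolymer longer than the window yields a repeated, identical state, so its length is lost once repeats are merged; widening the effective window is one way to picture why a second constriction extends the length of homopolymer that can be resolved.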

Kits

In a further aspect, the present disclosure also provides a kit for characterizing a target polynucleotide. The kit comprises a disclosed pore complex and the components of a membrane. The membrane is preferably formed from the components. The pore complex is preferably present in the membrane, so that together they form a transmembrane pore complex channel. The kit may comprise components of any type of membrane, such as an amphiphilic layer or a triblock copolymer membrane. The kit may further comprise a polynucleotide binding protein, for example a nucleic acid handling enzyme such as a polymerase or a helicase. The kit may further comprise one or more anchors, such as cholesterol, for coupling the polynucleotide to the membrane. The kit may further comprise one or more polynucleotide adaptors that can be attached to a target polynucleotide to facilitate characterization of the polynucleotide. In one embodiment, the anchor, such as cholesterol, is attached to the polynucleotide adaptor. The kit may additionally comprise one or more other reagents or instruments that enable any of the embodiments mentioned above to be carried out. Such reagents or instruments include one or more of the following: suitable buffer(s) (aqueous solutions), means to obtain a sample from a subject (such as a vessel or an instrument comprising a needle), means to amplify and/or express polynucleotides, or a voltage or patch clamp apparatus. Reagents may be present in the kit in a dry state such that a fluid sample resuspends the reagents. The kit may also, optionally, comprise instructions to enable the kit to be used in the methods of the present disclosure, or details regarding the organisms for which the method may be used. Finally, the kit may also comprise additional components useful in polynucleotide characterization.

본 개시내용에 따른 조작된 세포 및 방법에 대해 특정 실시형태, 특이적 구성뿐만 아니라 물질 및/또는 분자가 본원에서 논의되었지만, 형태 및 상세사항에 있어서 다양한 변화 또는 변형이 본 개시내용의 범위 및 사상을 벗어나지 않으면서 이루어질 수 있음이 이해되어야 한다. 다음의 실시예는 특정 실시양태를 더 잘 예시하기 위해 제공되며, 이들은 적용을 제한하는 것으로 간주되어서는 안 된다. 적용은 청구범위에 의해서만 제한된다.Although specific embodiments, specific configurations, as well as materials and/or molecules, for the engineered cells and methods according to the present disclosure have been discussed herein, it should be understood that various changes or modifications in form and detail may be made without departing from the scope and spirit of the present disclosure. The following examples are provided to better illustrate specific embodiments and are not to be construed as limiting the application. The application is limited only by the claims.

Examples

Example 1

To create the helical constriction, de novo design was used to select small protein domains that are well folded and project into the lumen of the nanopore to the desired extent. Several programs can be used for this purpose. This example describes a workflow that uses the MASTER program to facilitate backbone design and Rosetta to select sequences using variable backbone geometries.

Programs such as RFdiffusion, Chroma, or MASTER can be used to project new domains into the pore lumen. Here, MASTER was used. The Protein Data Bank (PDB) was searched for structures that meet the following criteria: 1) they stabilize the target region of CsgF (residues 16-30); 2) they project into the nanopore so as to create a new constriction with a diameter (the Cα-to-Cα distance of the amino acid residues extending furthest into the pore lumen) between 10 and 30 when all units are generated using the nine-fold symmetry operator; and 3) the new domain does not clash with atoms of CsgG or with symmetry mates of CsgF.
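Criteria 2 and 3 can be pictured with a small geometric sketch. It is not the MASTER workflow: it assumes the pore axis is the z axis, treats the constriction diameter as twice the smallest Cα distance from that axis (with nine-fold symmetry no two copies sit exactly opposite each other), assumes the 10 to 30 bounds are in ångström, and uses a hypothetical 3.0 clash cutoff. Symmetry copies of the new domain stand in for the CsgF symmetry mates.

```python
import math

Point = tuple[float, float, float]


def symmetry_copies(point: Point, fold: int = 9) -> list[Point]:
    """Rotate a point about the z axis (assumed to be the pore axis)."""
    x, y, z = point
    copies = []
    for k in range(fold):
        angle = 2 * math.pi * k / fold
        copies.append((x * math.cos(angle) - y * math.sin(angle),
                       x * math.sin(angle) + y * math.cos(angle),
                       z))
    return copies


def constriction_diameter(ca_atoms: list[Point]) -> float:
    """Approximate diameter: twice the smallest Cα distance from the axis."""
    return 2 * min(math.hypot(x, y) for x, y, _ in ca_atoms)


def has_clash(domain: list[Point], others: list[Point], cutoff: float = 3.0) -> bool:
    """True if any domain atom lies within `cutoff` of any other atom."""
    return any(math.dist(a, b) < cutoff for a in domain for b in others)


def passes_criteria(domain_ca: list[Point], csgg_atoms: list[Point],
                    low: float = 10.0, high: float = 30.0) -> bool:
    """Check the diameter window (criterion 2) and clashes (criterion 3)."""
    copies = [c for atom in domain_ca for c in symmetry_copies(atom)]
    # Symmetry mates: every copy except the identity rotation (k = 0).
    mates = [c for atom in domain_ca for c in symmetry_copies(atom)[1:]]
    in_range = low <= constriction_diameter(copies) <= high
    return in_range and not has_clash(domain_ca, csgg_atoms + mates)
```

A real screen would work on full atomic models and a proper clash definition; the sketch only shows how a symmetry operator turns one designed unit into a ring whose innermost atoms set the constriction size.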

First, helices were identified that dock onto the target region of CsgF and its symmetry neighbors in "designable" geometries, i.e., geometries frequently observed in native proteins in the PDB. The output was clustered based on the RMSD of the target region and the helix found, and the top candidates were then selected based on the number of closely related helix-helix pairs found in the database (Figure 1). In this way, helix geometries were selected that pack well against the target and the four amino acids N-terminal to the target. The database was also searched for helices that engage in favorable helix-helix interactions with their symmetry-related partners. Linkers (e.g., loop structures) connecting the helices were selected using a database of helix backbones (Figure 1). The sequences of the resulting backbones were then designed using Rosetta, and representative sequences were generated (e.g., SEQ ID NOs: 1-58).

Sequences for experimental validation were selected on the basis of the lowest energy scores and the highest PackStat scores, as shown in Figure 2. To prioritize sequences further, their aggregation propensity can also be tested using one of a number of aggregation and amyloid prediction programs.
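How the two scores were combined is not stated here. One simple way to shortlist designs on both criteria at once is to keep the candidates that no other candidate beats on both energy and PackStat (a Pareto front). The sketch below assumes that approach; the `Design` type and any scores passed to it are hypothetical.

```python
from typing import NamedTuple


class Design(NamedTuple):
    name: str
    energy: float    # lower is better
    packstat: float  # higher is better


def pareto_front(designs: list[Design]) -> list[Design]:
    """Keep designs that are not dominated on (energy, packstat)."""
    def dominated(d: Design) -> bool:
        return any(
            o.energy <= d.energy and o.packstat >= d.packstat
            and (o.energy < d.energy or o.packstat > d.packstat)
            for o in designs
        )
    return [d for d in designs if not dominated(d)]


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    candidates = [
        Design("design_a", -120.0, 0.70),
        Design("design_b", -110.0, 0.75),
        Design("design_c", -100.0, 0.60),
    ]
    for d in pareto_front(candidates):
        print(d.name, d.energy, d.packstat)
```

An aggregation-propensity score, where available, could be added as a third field and handled the same way.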

Example 2

Materials and Methods

Escherichia coli CsgG pore production

A recombinant expression vector encoding a CsgG variant nanopore with a C-terminal Strep affinity tag, and carrying an ampicillin resistance gene, was transformed into chemically competent E. coli cells. The cells were plated on LB agar plates containing the appropriate antibiotic for selection and incubated overnight at 37 °C. LB medium containing the appropriate antibiotic was inoculated with a single colony from the agar plate and grown overnight at 37 °C with shaking. The culture was diluted into autoinduction medium with the required antibiotic and incubated at 18 °C for 68 hours with shaking. The cells were harvested by centrifugation, then lysed and extracted with a buffer containing 1x Bugbuster extraction reagent (Merck 70921) and 0.1% DDM. The lysate was spun down and the pores were purified from the soluble extract using affinity chromatography, heat treatment, and size exclusion chromatography; oligomeric nanopores were selected as judged by SDS-PAGE.

Protocol for formation of CsgG/CsgF or fusion protein complexes

CsgG-CsgF complexes were prepared from nanopores purified as above and from chemically synthesized de novo designed fusion proteins, with or without a maleimide modification. For fusion proteins containing cysteines, cyclization of the fusion protein was achieved by cross-linking the thiols of the appropriate cysteines. The nanopores were buffer exchanged into a pH 7.0 buffer without reducing agent and incubated with an 8-fold molar excess of peptide over CsgG monomer for 1 hour at 25 °C. The samples were then heated at 60 °C for 15 minutes and centrifuged to remove any precipitate, and DTT was added to prevent further reaction.
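The 8-fold molar excess is defined relative to CsgG monomer, not to the assembled pore, so the peptide amount scales with the number of monomers. A minimal sketch of the arithmetic follows; the molecular weights in the usage example are placeholders, not values from this disclosure.

```python
def peptide_mass_ug(csgg_mass_ug: float, csgg_monomer_kda: float,
                    peptide_kda: float, fold_excess: float = 8.0) -> float:
    """Mass of peptide giving `fold_excess` moles per mole of CsgG monomer.

    1 kDa equals 1 ug/nmol, so ug divided by kDa gives nmol directly.
    """
    monomer_nmol = csgg_mass_ug / csgg_monomer_kda
    peptide_nmol = monomer_nmol * fold_excess
    return peptide_nmol * peptide_kda


if __name__ == "__main__":
    # Placeholder weights: 30 kDa monomer, 6 kDa peptide, 30 ug of pore.
    print(peptide_mass_ug(30.0, 30.0, 6.0))
```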

SDS-PAGE analysis

1 μg of complex and of the CsgG-only pore control were added to separate 0.5 mL Protein LoBind Eppendorf tubes (Fisher, 10316752) and made up to a volume of 10 μL with reaction buffer. 10 μL of 2x Laemmli buffer was added to give a final volume of 20 μL. Each sample was loaded in its entirety onto a 4-20% TGX gel (BioRad, 5671093) run in 1x TGS buffer (Sigma, T7777). The gel was run at 300 V for 21 minutes. To image the gel, SYPRO Ruby stain (Merck, S4942) was used according to the manufacturer's instructions. The gel was then imaged on a GE Typhoon gel imager using a 450 nm laser.

For some analyses, 1 μg of complex and of the CsgG-only pore control were added to separate PCR tubes and made up to a volume of 10 μL with reaction buffer. A fresh 1 M DTT stock was prepared and added to each PCR tube to a final concentration of 10 mM. 10 μL of 2x Laemmli buffer was added to give a final volume of 20 μL. Each sample was heated at 95 °C for 2 minutes in a PCR thermocycler. After cooling for 5 minutes, each sample was loaded in its entirety onto a 4-20% TGX gel (BioRad, 5671093) run in 1x TGS buffer (Sigma, T7777). The gel was run at 300 V for 21 minutes. To image the gel, SYPRO Ruby stain (Merck, S4942) was used according to the manufacturer's instructions. The gel was then imaged on a GE Typhoon gel imager using a 450 nm laser.
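The sample preparation above is a pair of simple dilutions (C1·V1 = C2·V2). The sketch below works them through. It assumes the 10 mM DTT target refers to the 20 μL final sample, which the text does not state explicitly; the function names are illustrative.

```python
def stock_volume_ul(stock_mm: float, final_mm: float, final_volume_ul: float) -> float:
    """Volume of stock needed: C1 * V1 = C2 * V2, solved for V1."""
    return final_mm * final_volume_ul / stock_mm


def final_strength(stock_strength: float, volume_added_ul: float,
                   final_volume_ul: float) -> float:
    """Strength of a reagent after dilution into the final volume."""
    return stock_strength * volume_added_ul / final_volume_ul


if __name__ == "__main__":
    # 1 M (1000 mM) DTT stock to 10 mM in an assumed 20 uL final volume.
    print(stock_volume_ul(1000.0, 10.0, 20.0), "uL of DTT stock")
    # 10 uL of 2x Laemmli buffer in 20 uL total.
    print(final_strength(2.0, 10.0, 20.0), "x Laemmli")
```

Under that assumption the DTT volume is a fraction of a microlitre, so the small added volume does not materially change the 20 μL total.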

Electrical measurements

Electrical measurements were acquired from CsgG-only pores, CsgG/CsgF complexes, or CsgG/fusion protein complexes inserted into MinION flow cells. After insertion of single pores into the block copolymer membrane, 1 mL of a buffer comprising 25 mM potassium phosphate, 150 mM potassium ferrocyanide (II), and 150 mM potassium ferricyanide (III), pH 8.0, was flowed through the system to remove any excess nanopores.

The analyte used to assess DNA waveforms was a 3.6-kilobase section of DNA from the 3' end of the lambda genome, as described in Figure 23. Preparation of the analyte, ligation of the analyte to the Y-adapter, SPRI-bead clean-up of the ligated analyte, and addition to the MinION flow cell were carried out using the Oxford Nanopore Technologies Q-SQK-LSK110 protocol.

Electrical measurements were acquired using a MinION Mk1b from Oxford Nanopore Technologies. A standard sequencing script was run at -180 mV for 6 hours, with a static flick every 5 minutes to clear prolonged nanopore blocks. Raw data were collected as bulk FAST5 files using MinKNOW software (Oxford Nanopore Technologies).

Discrimination profiling

FAST5 files containing DNA waveforms (e.g., electrical measurements) for the 3.6-kilobase section of DNA from the 3' end of the lambda genome (3.6 Kb lambda) were acquired. The DNA waveforms were trimmed using a custom Python script to remove any electrical signal measurements captured before DNA sequencing began.
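The trimming script itself is not reproduced in this text. As a rough sketch of one way such trimming could work, the function below discards samples recorded before the current first stays below a fraction of the open-pore level for a few consecutive samples. The threshold fraction, the run length, and the function name are all assumptions made for illustration.

```python
def trim_before_strand(signal_pa: list[float], open_pore_pa: float,
                       threshold_fraction: float = 0.7,
                       min_run: int = 3) -> list[float]:
    """Drop samples before the current first stays below the threshold.

    A run of `min_run` consecutive sub-threshold samples is required so that
    a single brief spike is not mistaken for the start of the strand.
    """
    threshold = open_pore_pa * threshold_fraction
    run = 0
    for i, value in enumerate(signal_pa):
        run = run + 1 if value < threshold else 0
        if run == min_run:
            return signal_pa[i - min_run + 1:]
    return []  # no sustained drop found


if __name__ == "__main__":
    # Toy trace: open pore near 180 pA, strand levels near 75 pA.
    trace = [180.0, 182.0, 60.0, 181.0, 75.0, 80.0, 70.0, 78.0]
    print(trim_before_strand(trace, open_pore_pa=180.0))
```

The toy levels follow the approximate open-pore and DNA currents reported for CsgG-only pores elsewhere in this description.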

The trimmed 3.6 Kb lambda waveforms and the corresponding genome reference for this region were used to train the parameters of a neural network. The neural network, comprising four layers, models sequences of a user-specified span length together with their associated current levels. The span length specified for these models allowed a region of +/- 12 nucleotides to contribute to the current level at any one position.

The trained neural network was used to predict the current levels corresponding to the 3.6 Kb lambda DNA reference sequence. It was also used to predict the current levels for every possible single-base edit of that sequence.
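Enumerating the edited sequences is straightforward. The sketch below assumes that "single-base edit" means a substitution (no insertions or deletions), which gives three edited sequences per reference position; the function name is illustrative.

```python
from collections.abc import Iterator


def single_base_edits(reference: str,
                      alphabet: str = "ACGT") -> Iterator[tuple[int, str, str]]:
    """Yield (position, new_base, edited_sequence) for every substitution."""
    for pos, original in enumerate(reference):
        for base in alphabet:
            if base != original:
                yield pos, base, reference[:pos] + base + reference[pos + 1:]


if __name__ == "__main__":
    for pos, base, edited in single_base_edits("ACG"):
        print(pos, base, edited)
```

For a reference of N bases this yields 3N edited sequences, each of which would be passed to the trained model.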

Changing the base at a single position (L) in the sequence changes the predicted current as that base passes through the main constriction of the pore, but it also changes the current before and after the base passes through the main constriction. The predicted current levels for the set of edited 3.6 Kb lambda sequences were analyzed to calculate the range of the predicted current at position L+X (the offset) when the base at position L is changed. Offsets from -16 to +16 were analyzed at each position. The median range of the predicted current at each offset was calculated to provide the data for the diagrams. The model was centered so that the largest peak, representing the CsgG constriction, corresponds to position 0.
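The profile calculation described above can be sketched as follows. The trained neural network is replaced by any callable `predict` that returns one current level per sequence position; `toy_predict` is a made-up stand-in in which each level depends strongly on the base at that position and weakly on the base two positions later. All names are illustrative, and edits are assumed to be substitutions.

```python
from statistics import median
from typing import Callable


def discrimination_profile(reference: str,
                           predict: Callable[[str], list[float]],
                           max_offset: int = 16,
                           alphabet: str = "ACGT") -> dict[int, float]:
    """Median, over positions L, of the range of predicted current at L + X
    when the base at L is varied over the alphabet."""
    ranges: dict[int, list[float]] = {x: [] for x in range(-max_offset, max_offset + 1)}
    for pos in range(len(reference)):
        # One prediction per choice of base at this position.
        predictions = [predict(reference[:pos] + b + reference[pos + 1:])
                       for b in alphabet]
        for offset in ranges:
            target = pos + offset
            if 0 <= target < len(reference):
                levels = [p[target] for p in predictions]
                ranges[offset].append(max(levels) - min(levels))
    return {x: median(r) for x, r in ranges.items() if r}


_VALUE = {"A": 0, "C": 1, "G": 2, "T": 3}


def toy_predict(seq: str) -> list[float]:
    """Stand-in model: main reader at offset 0, weak second reader at -2."""
    return [10.0 * _VALUE[b] + (3.0 * _VALUE[seq[i + 2]] if i + 2 < len(seq) else 0.0)
            for i, b in enumerate(seq)]


if __name__ == "__main__":
    profile = discrimination_profile("ACGTACGTAC", toy_predict, max_offset=4)
    for offset in sorted(profile):
        print(offset, profile[offset])
```

With the toy model the profile shows a large peak at offset 0 and a smaller one at offset -2, mirroring how a second constriction would appear as an additional peak away from position 0.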

Example 3

The de novo designed fusion protein sequences generated using Rosetta were analyzed, and sequences for experimental validation were selected on the basis of the lowest energy scores and the highest PackStat scores (Figure 2). PSIPRED analysis (described, e.g., in McGuffin LJ, Bryson K, Jones DT, Bioinformatics, 16, 404-405, 2000) was carried out to predict the secondary structure of the fusion proteins. Residues are shaded according to whether they are predicted to be strand, helix, or coil. Secondary structure analysis of the mature sequences of a de novo designed fusion protein (e.g., an extended CsgF protein) and of wild-type CsgF is shown in Figure 3a. Structural analyses of the de novo designed fusion proteins ONT1 to ONT10, ONT11 to ONT20, and ONT21 to ONT25 are shown in Figures 3b to 3c.

The three-dimensional structures of alternative sequences for the de novo designed fusion proteins were also examined using a protein folding algorithm. The predicted 3D structures of the de novo designed fusion proteins ONT1 to ONT10, ONT11 to ONT20, and ONT21 to ONT25 are shown in Figures 4a to 4c. The structures are shaded according to the predicted local distance difference test (pLDDT), a measure of confidence.

SDS-PAGE gel analysis of the CsgG-only pore and the CsgG/fusion protein complexes was carried out. The complexes consisted of either the CsgF-del(S31-F119) control or a de novo designed fusion protein, with or without the maleimide crosslinker (Figure 5). The samples were not heated before loading onto the gel. Complexes comprising a fusion protein showed a band shift, indicating that these samples are nanopore complexes.

SDS-PAGE gel analysis of the CsgG-only pore and the CsgG/fusion protein complexes was also carried out under reducing, denaturing conditions. The complexes again consisted of either the CsgF-del(S31-F119) control or a de novo designed fusion protein, with or without the maleimide crosslinker (Figure 6). Boiling in the presence of DTT before loading onto the gel broke the pores down into their constituent monomers. No band shift was observed in the absence of the maleimide crosslink, indicating that these bands consisted of CsgG monomer only. Lane 7 showed a band shift relative to the CsgG-only control, indicating that the fusion protein is covalently attached to the CsgG pore owing to the presence of the maleimide. Lanes 8 and 9 showed a further band shift owing to the increased mass of the fusion protein, which likewise indicates that the fusion protein is covalently attached to the CsgG pore.

Ionic current (pA) versus time (s) was measured as single-stranded DNA translocated through CsgG-only pores. Each individual graph corresponds to a single pore inserted into a MinION flow cell. The open-pore current observed for CsgG-only pores was approximately 180 pA at an applied voltage of -180 mV. Table 1 below shows representative data for the median range, median noise, and median signal-to-noise ratio (SNR) of protein pore complexes as described by the present disclosure.

Table 1

Pore monomer (conjugate) | Median range (pA) | Median noise (pA) | Median SNR
CsgG-F56Q | 25.01 | 3.84 | 6.49
CsgG-F56Q / del(S31-F119) | 12.64 | 1.77 | 7.11
CsgG-F56Q / del(S31-F119)-EXT1 | 12.63 | 1.78 | 6.99
CsgG-F56Q / K37R-del(S31-F119)-EXT1 | 12.46 | 1.74 | 7.04
CsgG-F56Q / N24C-K37R-del(S31-F119)-EXT2 | 15.72 | 2.59 | 5.71
CsgG-F56Q-Q153C | 25.62 | 3.53 | 7.26
CsgG-F56Q-Q153C / Mal-Del(S31-F119) | 13.17 | 1.75 | 7.48
CsgG-F56Q-Q153C / Mal-K37R-Del(S31-F119)-EXT1 | 12.85 | 1.77 | 7.25
CsgG-F56Q-Q153C / Mal-N24C/K37R-Del(S31-F119)-EXT2 | 15.81 | 2.15 | 7.36
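When reading the table, note that a median SNR need not equal the median range divided by the median noise, because the median of per-pore ratios generally differs from the ratio of medians. The sketch below, which assumes SNR is defined as range divided by noise (the definition is not given here), compares the two for a few rows copied from the table.

```python
# (median range pA, median noise pA, reported median SNR), copied from the table.
ROWS = {
    "CsgG-F56Q": (25.01, 3.84, 6.49),
    "CsgG-F56Q / del(S31-F119)": (12.64, 1.77, 7.11),
    "CsgG-F56Q / N24C-K37R-del(S31-F119)-EXT2": (15.72, 2.59, 5.71),
    "CsgG-F56Q-Q153C": (25.62, 3.53, 7.26),
}


def ratio_of_medians(median_range: float, median_noise: float) -> float:
    """Range-to-noise ratio computed from the two column medians."""
    return median_range / median_noise


def compare(rows: dict[str, tuple[float, float, float]]) -> dict[str, tuple[float, float]]:
    """Map each pore to (ratio of medians, reported median SNR)."""
    return {name: (round(ratio_of_medians(r, n), 2), snr)
            for name, (r, n, snr) in rows.items()}


if __name__ == "__main__":
    for name, (ratio, reported) in compare(ROWS).items():
        print(f"{name}: ratio of medians {ratio}, reported median SNR {reported}")
```

The two figures are close for most rows, which is consistent with the assumed definition, but they are not expected to match exactly.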

Figures 7 to 11 show representative ionic current (pA) versus time (s) traces as single-stranded DNA translocates through a CsgG-only pore, CsgG comprising the del(S31-F119) CsgF peptide, or CsgG comprising a de novo designed fusion protein. Raw current traces are shown as black lines and the event-detection signal as red lines. For each pore, the top row shows the full DNA current trace and the bottom row shows an enlarged view of the first section of the trace. For CsgG-only pores, the open-pore current was observed to be approximately 175-200 pA and the median current of the DNA waveform approximately 75 pA. For pores comprising a CsgF peptide, the open-pore current was approximately 90-120 pA and the median current approximately 35-50 pA.
Figure 7 shows traces as DNA translocates through CsgG-only pores. Figure 8 shows traces as single-stranded DNA translocates through CsgG comprising the del(S31-F119) CsgF peptide with (right) or without (left) the maleimide crosslinker. Figure 9 shows traces for CsgG comprising the de novo designed fusion protein ONLP20623 in the absence of the maleimide crosslinker. Figure 10 shows traces for CsgG comprising the de novo designed fusion protein ONLP20624 (no maleimide crosslink) or ONLP20627 (with maleimide crosslink). Figure 11 shows traces for CsgG comprising the de novo designed fusion protein ONLP20628 (with maleimide crosslinker) or ONLP20625 (without maleimide crosslinker). In some embodiments, the fusion protein comprises a 37R residue together with cysteine residues that form an internal disulfide bond within the peptide, i.e., that cyclize the fusion protein.

Profiles were generated showing positions within the pore and their contribution to the overall change in ionic current level ("discrimination") as a DNA molecule translocates through the pore. Distance within the pore is measured in nucleotide steps relative to the main constriction. Negative values correspond to positions below the main constriction and positive values to positions above the main constriction (CsgG). Dashed boxes indicate the regions affected by the introduction of the de novo designed fusion proteins.
Figure 12 shows representative profiles for DNA molecules translocating through CsgG-only pores. CsgG-only pores (+/- Q153C) show a single main discrimination peak at position 0. Figure 13 shows representative profiles for DNA molecules translocating through CsgG/CsgF pores. CsgG-CsgF-del(S31-F119) pores with (right) or without (left) the maleimide crosslinker show two discrimination peaks: the main discrimination peak at position 0, as seen in CsgG-only pores, and an additional discrimination peak 4-6 nucleotides below the main constriction (positions -4 to -6). This additional discrimination region has less effect on the ionic current than the main discrimination peak at position 0. Figure 14 shows representative profiles for DNA molecules translocating through CsgG/fusion protein (ONLP20641 or ONLP20644) pores. Complexes consisting of CsgG and a cyclized, K37R-containing de novo designed fusion protein, with (right) or without (left) the maleimide crosslinker, show three discrimination peaks: the main discrimination peak at position 0, as seen in CsgG-only pores, and additional peaks at positions -6 and -9. The peak at position -9 corresponds to the expected constriction created by the de novo designed fusion protein when it folds in the correct orientation.

Example 3

Pores formed from nine subunits as shown in SEQ ID NO: 61 (with or without the maleimide crosslinker; both without cyclization) were also tested as described in Example 2. The results are shown in Figures 17 to 18.

Representative sequences

> (SEQ ID NO: 1) CsgF-WT-del(S31-F119)-Ext(31-GGELAAKLWANGDETNALSLFQTIIQS) (ONLP20623)

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNGGELAAKLWANGDETNALSLFQTIIQS

> (SEQ ID NO: 2) CsgF-WT-K37R-del(S31-F119)-Ext(31-GGELAAKLWANGDETNALSLFQTIIQS) (ONLP20624)

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNGGELAARLWANGDETNALSLFQTIIQS

> (SEQ ID NO: 3) CsgF-WT-N24C/K37R-del(S31-F119)-Ext(31-GGELAAKLWANGDETNALSLFQTIIQSC) (ONLP20625)

GTMTFQFRNPNFGGNPNNGAFLLCSAQAQNGGELAARLWANGDETNALSLFQTIIQSC
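The substitution labels in these names (e.g., K37R, N24C) appear to use 1-based numbering over the sequence as listed. The sketch below checks that reading against SEQ ID NOs: 1 to 3 by applying the named substitutions to SEQ ID NO: 1. The helper name is illustrative, and the trailing cysteine of SEQ ID NO: 3 is treated as a C-terminal addition, as its extension string suggests.

```python
import re

SEQ_ID_1 = "GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNGGELAAKLWANGDETNALSLFQTIIQS"
SEQ_ID_2 = "GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNGGELAARLWANGDETNALSLFQTIIQS"
SEQ_ID_3 = "GTMTFQFRNPNFGGNPNNGAFLLCSAQAQNGGELAARLWANGDETNALSLFQTIIQSC"


def apply_substitutions(sequence: str, mutations: list[str]) -> str:
    """Apply labels such as 'K37R' (old residue, 1-based position, new residue)."""
    residues = list(sequence)
    for mutation in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if match is None:
            raise ValueError(f"unrecognized mutation label: {mutation}")
        old, position, new = match.group(1), int(match.group(2)), match.group(3)
        if residues[position - 1] != old:
            raise ValueError(f"{mutation}: found {residues[position - 1]} at {position}")
        residues[position - 1] = new
    return "".join(residues)


if __name__ == "__main__":
    print(apply_substitutions(SEQ_ID_1, ["K37R"]) == SEQ_ID_2)
    print(apply_substitutions(SEQ_ID_1, ["N24C", "K37R"]) + "C" == SEQ_ID_3)
```

The check on the original residue makes a numbering mismatch fail loudly instead of silently editing the wrong position.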

> (SEQ ID NO: 4) Mat-CsgF-Eco-(WT-Del(S31-F119)-Ext(31-AGELAKKLWENGNVNQALSLFQTVIQS) (ONLZ19432, DGLONT76)

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAKKLWENGNVNQALSLFQTVIQS

> (SEQ ID NO: 5) Mat-CsgF-Eco-(WT-K36R/K37R-Del(S31-F119)-Ext(31-AGELAKKLWENGNVNQALSLFQTVIQS) (ONLZ19431)

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELARRLWENGNVNQALSLFQTVIQS

> (SEQ ID NO: 6) Mat-CsgF-Eco-(WT-N24C/K36R/K37R-Del(S31-F119)-Ext(31-AGELARRLWENGNVNQALSLFQTVIQSC) (ONLZ19781)

GTMTFQFRNPNFGGNPNNGAFLLCSAQAQNAGELARRLWENGNVNQALSLFQTVIQSC

> (SEQ ID NO: 7) ONT113_2

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAAELAAKLWANADETNALSLFQTIIQS

> (SEQ ID NO: 8) ONT113_3

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAAELAAKLWANADETNALSLFQTLIQS

> (SEQ ID NO: 9) ONT1

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLFKKGDLTNALSLFQTVIQS

> (SEQ ID NO: 10) ONT2

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELVEKLFKNGDWTNAISIFQTVIQS

> (SEQ ID NO: 11) ONT 3

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAEKLWRNGDETNALSLFQTVIQS

> (SEQ ID NO: 12) ONT 4

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAEKLWKNGDETNALSLFQTVIQS

> (SEQ ID NO: 13) ONT 5

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAKKLWENGDETNALSLFQTVVQS

> (SEQ ID NO: 14) ONT 6

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAEKLWRNGNESDALSLFQTVIQS

> (SEQ ID NO: 15) ONT 7

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAKKLFENGDKTNALSLFQTVIQS

> (SEQ ID NO: 16) ONT 8

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAKKLWENGDETNALSLFQTVIQS

> (SEQ ID NO: 17) ONT 9

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAKKLWEKGNSEDALALFRTVVQS

> (SEQ ID NO: 18) ONT 10

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAKKLFDNGDMENAMKLFQTVIAS

> (SEQ ID NO: 19) ONT 11

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAEKLWRNGDKDRALALFRTVIQS

> (SEQ ID NO: 20) ONT 12

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELADKLWKNGDKDRALSLFQTVIQS

> (SEQ ID NO: 21) ONT 13

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAKKLFDNGDMDRALALFRTVIAS

> (SEQ ID NO: 22) ONT 14

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAKKLFDNGNEEDALALFRTVVAS

> (SEQ ID NO: 23) ONT 15

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLWKKGDEENALKLFRTVVTS

> (SEQ ID NO: 24) ONT 16

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLFKNGNMEDALKLFRTVIAS

> (SEQ ID NO: 25) ONT 17

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGKVAAILWKNGNKSDALSLFQTVVTS

> (SEQ ID NO: 26) ONT 18

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLFKNGDLTNALSLFQTVVQS

> (SEQ ID NO: 27) ONT 19

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELGLKLLRKGDVETALTLFAQVISG

> (SEQ ID NO: 28) ONT 20

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELGLKLILKGDLETALKLFAIVIAG

> (SEQ ID NO: 29) ONT 21

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELGLKLLRKGDVETALKLFAIVIAG

> (SEQ ID NO: 30) ONT 22

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAKKLYENGLIELALMLFALVIAS

> (SEQ ID NO: 31) ONT 23

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELYKKLWDNGEVDKALDLFAKIIAG

> (SEQ ID NO: 32) ONT 24

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELGKKLIEKGDLETALKLFAIVIAG

> (SEQ ID NO: 33) ONT 25

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGEIALRLLKNGKEEEALKTLLVTIAG

> (SEQ ID NO: 34) ONT26

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLWKKGDETNALSLFQTVVTS

> (SEQ ID NO: 35) ONT27

> (SEQ ID NO: 36) ONT28

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAKKLWEKGDETNALSLFQTVVTS

> (SEQ ID NO: 37) ONT29

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGDLAAKLWKKGDETNALSLFQTVVTS

> (SEQ ID NO: 38) ONT30

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLWKNGNSSDALSLFQTVVTS

> (서열 번호: 39) ONT31> (SEQ ID NO: 39) ONT31

> (서열 번호: 40) ONT32> (SEQ ID NO: 40) ONT32

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAKKLWEKGDSSNALSLFQTVVTS

> (SEQ ID NO: 41) ONT33

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGDLAAKLWKNGDETNALSLFQTVVTS

> (SEQ ID NO: 42) ONT34

> (SEQ ID NO: 43) ONT35

> (SEQ ID NO: 44) ONT36

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLFNSGDLDRALALFRTVVTS

> (SEQ ID NO: 45) ONT37

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGKVAKELYDNGDEKWALLLFRTVVTS

> (SEQ ID NO: 46) ONT38

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGKVAAELYKNGDEKNALLLFRTVVAS

> (SEQ ID NO: 47) ONT39

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLFKNGDMENALALFRTVVTS

> (SEQ ID NO: 48) ONT40

> (SEQ ID NO: 49) ONT41

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLFNKGDEDRALALFRTVVQS

> (SEQ ID NO: 50) ONT42

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLWKNGDEENALALFRTVVTS

> (SEQ ID NO: 51) ONT43

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAEKLWRSGDADRALALFRTVVTS

> (SEQ ID NO: 52) ONT44

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLWKNGNEEDALALFRTVVTS

> (SEQ ID NO: 53) ONT45

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLFNNGDEDRALALFRTVVQS

> (SEQ ID NO: 54) ONT46

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLWKKGDEDRALALFRTVVTS

> (SEQ ID NO: 55) ONT47

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLFNSGDEDRALALFRTVVQS

> (SEQ ID NO: 56) ONT48

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAAKLYNNGDLDRADATFRTVVQS

> (SEQ ID NO: 57) ONT49

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAKKLWENGNEEDALALFRTVVTS

> (SEQ ID NO: 58) ONT50

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGEIAKQLWEKGDESSAITVATIVLSS

> (SEQ ID NO: 59) Wild-type E. coli CsgG protein monomer (without signal sequence)

CLTAPPKEAARPTLMPRAQSYKDLTHLPAPTGKIFVSVYNIQDETGQFKPYPASNFSTAVPQSATAMLVTALKDSRWFIPLERQGLQNLLNERKIIRAAQENGTVAINNRIPLQSLTAANIMVEGSIIGYESNVKSGGVGARYFGIGADTQYQLDQIAVNLRVVNVSTGEILSSVNTSKTILSYEVQAGVFRFIDYQRLLEGEVGYTSNEPVMLCLMSAIETGVIFLINDGIDRGLWDLQNKAERQNDILVKYRHMSVPPES

> (SEQ ID NO: 60) Residues 1-30 of CsgF peptide

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQN

> (SEQ ID NO: 61) CsgF-WT-del(S31-F119)-Ext(31-AGILAAQLWNNGDYDRALSLFIAVVQS-57)

GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGILAAQLWNNGDYDRALSLFIAVVQS

Sequence listing attached as an electronic file
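As a minimal sketch of how the CsgF-derived sequences listed above are organized, the following Python snippet checks that a listed sequence begins with residues 1-30 of the CsgF peptide (SEQ ID NO: 60) and returns the remainder as the extension. The helper name `split_fusion` is hypothetical and used only for illustration; the sequences are copied from SEQ ID NOs: 22, 60 and 61.

```python
# Residues 1-30 of the CsgF peptide (SEQ ID NO: 60).
CSGF_1_30 = "GTMTFQFRNPNFGGNPNNGAFLLNSAQAQN"

# SEQ ID NO: 61 and SEQ ID NO: 22 (ONT 14), copied from the listing.
SEQ_61 = "GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGILAAQLWNNGDYDRALSLFIAVVQS"
SEQ_22 = "GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNAGELAKKLFDNGNEEDALALFRTVVAS"


def split_fusion(seq, prefix=CSGF_1_30):
    """Split a sequence into (CsgF 1-30 part, extension).

    Hypothetical helper: raises ValueError when the sequence
    does not start with the CsgF 1-30 prefix.
    """
    if not seq.startswith(prefix):
        raise ValueError("sequence does not start with CsgF residues 1-30")
    return seq[:len(prefix)], seq[len(prefix):]


csgf_part, extension = split_fusion(SEQ_61)
print(len(csgf_part), len(extension), extension)

csgf_part_22, extension_22 = split_fusion(SEQ_22)
print(len(csgf_part_22), len(extension_22), extension_22)
```

For SEQ ID NO: 61 the extension is the 27-residue stretch AGILAAQLWNNGDYDRALSLFIAVVQS, which matches the "Ext(31-...-57)" numbering in its name.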

Claims

A protein nanopore complex comprising:
(a) a CsgG nanopore comprising an inner lumen; and
(b) a fusion polypeptide comprising a first portion comprising a CsgF protein and a second portion comprising a helix-forming accessory protein, wherein the fusion protein is attached to the nanopore.

A protein nanopore complex according to claim 1, wherein the first portion of the fusion protein is attached to the CsgG nanopore.

A protein nanopore complex according to claim 1 or 2, wherein the first portion of the fusion protein is positioned within the lumen of the CsgG nanopore.

A protein nanopore complex according to any one of claims 1 to 3, wherein the first portion of the fusion protein extends outside the lumen of the CsgG nanopore.

A protein nanopore complex according to any one of claims 1 to 4, wherein the first portion forms a first constriction region in the lumen of the CsgG nanopore.

A protein nanopore complex according to claim 5, wherein the second portion forms a second constriction region.

A protein nanopore complex according to any one of claims 1 to 6, wherein the CsgG nanopore further comprises a constriction region.

A protein nanopore complex according to any one of claims 1 to 7, wherein the second portion is not attached to the CsgG nanopore.

A protein nanopore complex according to any one of claims 1 to 8, wherein the second portion comprises at least one alpha helix.

A protein nanopore complex according to any one of claims 1 to 9, wherein each of said alpha helices comprises 0 to 15 alpha helical turns.

A protein nanopore complex according to any one of claims 1 to 10, wherein the second portion comprises a first alpha helix comprising 1 to 4 alpha helical turns and a second alpha helix comprising 3 to 6 alpha helical turns.

A protein nanopore complex according to claim 11, wherein the second alpha helix is packed into the first alpha helix.

A protein nanopore complex according to any one of claims 9 to 12, wherein the second portion comprises 1 to 55 amino acid residues.

A protein nanopore complex according to any one of claims 6 to 13, wherein the distance between the first constriction region and the second constriction region is in a range of about 5 Å to about 80 Å.

A protein nanopore complex according to any one of claims 1 to 14, wherein the protein nanopore complex has an axial length exceeding 90 Å, optionally wherein the axial length is in the range of about 95 Å to about 160 Å.

A protein nanopore complex according to any one of claims 1 to 15, wherein the fusion protein is attached to the nanopore by a linker.

A protein nanopore complex according to claim 16, wherein the linker comprises a bond, a peptide linker or a chemical linker.

A protein nanopore complex according to claim 16 or 17, wherein the linker comprises a bond formed by a sulfur (VI) fluoride exchange (SuFEx) reaction.

A protein nanopore complex according to claim 16 or 17, wherein the linker comprises one or more maleimide molecules.

A protein nanopore complex according to any one of claims 1 to 19, wherein the fusion protein is a cyclized protein.

A protein nanopore complex according to claim 20, wherein the cyclization comprises one or more side chain-to-side chain cyclization bonds.

A protein nanopore complex according to claim 21, wherein at least one of the side chain-to-side chain cyclization bonds is a disulfide bond.
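The claims above bound the second portion by residue count (1 to 55 amino acid residues) and each alpha helix by number of turns (0 to 15). As a rough, hedged illustration of how these two measures relate, the sketch below applies textbook ideal alpha-helix geometry (about 3.6 residues per turn and about 1.5 Å rise per residue along the helix axis). These constants and the helper name `helix_estimate` are assumptions introduced here, not values taken from the claims, and the result is only an upper-bound estimate that treats the whole stretch as one continuous helix.

```python
# Textbook ideal alpha-helix geometry (assumed, not from the claims).
RESIDUES_PER_TURN = 3.6
RISE_PER_RESIDUE_ANGSTROM = 1.5


def helix_estimate(n_residues):
    """Return (turns, axial length in angstroms) for a fully helical stretch.

    Hypothetical helper for illustration only.
    """
    turns = n_residues / RESIDUES_PER_TURN
    length = n_residues * RISE_PER_RESIDUE_ANGSTROM
    return turns, length


# 27 is the length of the extension in SEQ ID NO: 61; 55 is the claimed upper bound.
for n in (1, 27, 55):
    turns, length = helix_estimate(n)
    print(f"{n:2d} residues: {turns:.1f} turns, {length:.1f} angstroms")
```

Under these assumptions, a 55-residue stretch corresponds to roughly 15 turns, which is in line with the upper limit on turns recited above.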

A protein nanopore complex comprising:
(a) a CsgG nanopore comprising a lumen and a first constriction region formed within the lumen of the nanopore; and
(b) a fusion protein comprising a first portion comprising a CsgF protein and a second portion comprising a helix-forming accessory protein, wherein the fusion protein is attached to the nanopore.

A protein nanopore complex according to claim 23, wherein the first portion of the fusion protein is attached to the CsgG nanopore.

A protein nanopore complex according to claim 23 or 24, wherein the first portion of the fusion protein is positioned within the lumen of the CsgG nanopore.

A protein nanopore complex according to any one of claims 23 to 25, wherein the second portion of the fusion protein is positioned outside the lumen of the CsgG nanopore.

A protein nanopore complex according to any one of claims 23 to 26, wherein the first portion forms a second constriction region in the lumen of the CsgG nanopore.

A protein nanopore complex according to claim 27, wherein the second portion forms a third constriction region in the lumen of the CsgG nanopore.

A protein nanopore complex according to any one of claims 23 to 28, wherein the second portion is not attached to the CsgG nanopore.

A protein nanopore complex according to any one of claims 23 to 29, wherein the second portion comprises at least one alpha helix.

A protein nanopore complex according to claim 30, wherein each of said alpha helices comprises 0 to 15 alpha helical turns.

A protein nanopore complex according to any one of claims 23 to 31, wherein the second portion comprises 1 to 55 amino acid residues.

A protein nanopore complex according to any one of claims 23 to 32, wherein the fusion protein is a cyclized protein.

A protein nanopore complex according to claim 33, wherein the cyclization comprises one or more side chain-to-side chain cyclization bonds.

A protein nanopore complex according to claim 34, wherein at least one of the side chain-to-side chain cyclization bonds is a disulfide bond.

A protein nanopore complex comprising:
(a) a CsgG nanopore comprising a lumen and a first constriction region formed within the lumen of the nanopore;
(b) a first accessory protein that is attached to the CsgG nanopore and forms a second constriction region within the lumen of the nanopore; and
(c) a second accessory protein that is attached to the CsgG nanopore or to the first accessory protein and forms a third constriction region.

A protein nanopore complex according to claim 36, wherein the first accessory protein is positioned within the lumen of the CsgG nanopore.

A protein nanopore complex according to claim 36 or 37, wherein the first accessory protein comprises a CsgF protein or peptide.

A protein nanopore complex according to any one of claims 36 to 38, wherein the second accessory protein comprises at least one alpha helix.

A protein nanopore complex according to claim 39, wherein each of said one or more alpha helices comprises 0 to 15 alpha helical turns.

A protein nanopore complex according to claim 39 or 40, wherein the second accessory protein comprises two alpha helices.

A protein nanopore complex according to claim 41, wherein one of the alpha helices comprises 1 to 6 alpha helical turns.

A protein nanopore complex according to claim 41 or 42, wherein one of the alpha helices comprises 1 to 10 alpha helical turns.

A protein nanopore complex according to any one of claims 41 to 43, wherein one of the alpha helices comprises three alpha helical turns and the other alpha helix comprises three or four alpha helical turns.

A protein nanopore complex according to any one of claims 36 to 44, wherein the second accessory protein comprises at least one alpha helix packed into an alpha helix of the first accessory protein.

A protein nanopore complex according to any one of claims 36 to 45, wherein the second accessory protein comprises 1 to 55 amino acid residues.

A protein nanopore complex according to any one of claims 36 to 46, wherein the distance between the first constriction region and the second constriction region is in a range of about 10 Å to about 80 Å.

A protein nanopore complex according to any one of claims 36 to 47, wherein the distance between the second constriction region and the third constriction region is in the range of about 5 Å to about 80 Å.

A protein nanopore complex according to any one of claims 36 to 48, wherein the protein nanopore complex has an axial length exceeding 90 Å, optionally wherein the axial length is in the range of about 95 Å to about 160 Å.

A protein nanopore complex according to any one of claims 36 to 49, wherein the first accessory protein and the second accessory protein are attached by a linker.

A protein nanopore complex according to claim 50, wherein the linker comprises a bond, a peptide linker or a chemical linker.

A protein nanopore complex according to claim 50 or 51, wherein the linker comprises a bond formed by a sulfur (VI) fluoride exchange (SuFEx) reaction.

A protein nanopore complex according to claim 50 or 51, wherein the linker comprises one or more maleimide molecules.

A protein nanopore complex according to any one of claims 36 to 53, wherein the first accessory protein and the second accessory protein comprise one or more side chain-to-side chain cyclization bonds.

A protein nanopore complex according to claim 54, wherein at least one of the side chain-to-side chain cyclization bonds is a disulfide bond.

A system for characterizing a target analyte, comprising a protein nanopore complex of any one of claims 1 to 55 inserted into a membrane.

A system according to claim 56, further comprising: an electrically conductive solution in contact with the protein nanopore complex; an electrode providing a voltage potential across the membrane; and a measurement system for measuring current passing through the protein nanopore complex.

A method for characterizing a target analyte, comprising: (a) contacting a system according to claim 56 with the target analyte; (b) applying a potential across the membrane so that the target analyte moves with respect to the lumen formed by the protein nanopore complex; and (c) performing one or more measurements as the target analyte moves with respect to the lumen, thereby characterizing the target analyte.

A method according to claim 58, wherein the target analyte comprises a target polynucleotide.

A method according to claim 58 or 59, wherein step (c) comprises measuring a current passing through the continuous channel, wherein the current is indicative of the presence and/or one or more characteristics of the target analyte, thereby detecting and/or characterizing the target analyte.

A method according to any one of claims 58 to 60, wherein the target analyte is a polynucleotide, wherein nucleotides in the polynucleotide interact with the first constriction region, the second constriction region, and optionally the third constriction region within the lumen, and wherein each of the first constriction region, the second constriction region, and optionally the third constriction region is capable of discriminating a different nucleotide, such that the total current passing through the lumen is affected by the interaction between each of the first constriction region, the second constriction region, and the third constriction region and the nucleotides located in each of the regions.
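The final claim describes a total current that depends on the nucleotides located at each constriction region at the same time. The toy Python sketch below illustrates only that idea and is not a model taken from the disclosure: the per-base values in `BLOCKADE`, the constriction `offsets` (in nucleotides) and the `weights` are all hypothetical numbers in arbitrary units, chosen purely for illustration.

```python
# Hypothetical per-base contributions in arbitrary units (illustration only).
BLOCKADE = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def total_signal(sequence, offsets=(0, 3, 6), weights=(1.0, 0.5, 0.25)):
    """Toy signal: one level per translocation step.

    At each step, the level is the weighted sum of the contributions
    of the bases sitting at each constriction (given by `offsets`).
    All parameters are assumed values, not measured ones.
    """
    span = max(offsets)
    levels = []
    for i in range(len(sequence) - span):
        level = sum(
            w * BLOCKADE[sequence[i + o]] for o, w in zip(offsets, weights)
        )
        levels.append(level)
    return levels


# Changing the base at any one of the three constriction positions
# changes the combined level for that step.
print(total_signal("ACGTACG"))
print(total_signal("ACGAACG"))  # base at the second constriction changed
print(total_signal("ACGTACA"))  # base at the third constriction changed
```

In this sketch each level carries information about three separated positions of the strand at once, which is the sense in which several constriction regions can each contribute to the measured current.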