CN113025697A - Rapid sequencing method based on nanopore - Google Patents
- Publication number
- CN113025697A (application number CN202110305615.5A)
- Authority
- CN
- China
- Prior art keywords
- module
- nanopore
- data
- signal
- computational analysis
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
Abstract
The invention discloses a rapid nanopore-based sequencing method. The method comprises the following steps: step S1, a nanopore chip detects the signal change caused by a molecule passing through the nanopore; step S2, an encoding module encodes the signal change into electronic data; step S3, the electronic data are uploaded to a server through a transmission module; step S4, a decoding module on the server decodes the uploaded electronic data back into signal data; step S5, a computational analysis module on the server identifies the signal data as a molecular sequence. The invention can be applied wherever molecular sequencing is needed.
Description
Technical Field
The invention relates to the field of molecular sequencing, in particular to a rapid sequencing method based on a nanopore.
Background
Nanopore sequencing is a new generation of sequencing technology that has emerged in recent years. The most widely adopted nanopore sequencing platform on the market today is the biological nanopore sequencer from Oxford Nanopore Technologies (ONT). Compared with second-generation sequencing, it offers single-molecule sequencing, long read lengths (the longest reported reads exceed 2 Mb), real-time data acquisition, no need for amplification, and direct detection of nucleic-acid modifications. It has become increasingly indispensable in a number of specific application fields, such as metagenome sequencing, genome sequencing of new species, pathogen sequencing, and epigenetic sequencing.
The principle of nanopore sequencing is that a motor protein guides a DNA/RNA strand to the nanopore protein, and the potential difference across the membrane drives the unwound single strand through the pore. Because bases differ in structure and charge, they block the pore to different degrees, producing different resistances and therefore different electrical signals; the bases are finally identified by interpreting the raw electrical signal (base-calling).
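To make the base-calling principle concrete, here is a deliberately simplified Python sketch. It assumes, hypothetically, that each base produces one fixed current level (the `LEVELS_PA` values are invented for illustration). In a real pore the current depends on the several bases (the k-mer) occupying the constriction at once, so practical base-callers rely on trained models rather than a lookup table like this.

```python
# Hypothetical mean current levels in picoamperes, one per base (invented values).
LEVELS_PA = {"A": 80.0, "C": 95.0, "G": 110.0, "T": 125.0}


def segment_events(signal, jump=6.0):
    """Split the trace wherever the current jumps by more than `jump` pA.

    Returns the mean current of each event. Note that two identical bases in a
    row produce no jump and are merged into one event (a toy limitation).
    """
    events, current = [], [signal[0]]
    for prev, x in zip(signal, signal[1:]):
        if abs(x - prev) > jump:
            events.append(current)
            current = []
        current.append(x)
    events.append(current)
    return [sum(e) / len(e) for e in events]


def call_bases(signal):
    """Assign each event the base whose reference level is closest to its mean."""
    return "".join(
        min(LEVELS_PA, key=lambda b: abs(LEVELS_PA[b] - mean))
        for mean in segment_events(signal)
    )


# Three noisy samples per base for the sequence G, A, T, C.
trace = [110.5, 109.2, 111.0, 80.3, 79.1, 81.2, 124.6, 125.8, 124.9, 95.4, 94.7, 96.1]
print(call_bases(trace))  # GATC
```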
Because the current is typically sampled only about 7-9 times for each base that moves through the nanopore, base-calling poses a significant technical challenge. Compared with second-generation Illumina sequencing data, nanopore data have longer reads, a higher error rate, and a more uneven read-length distribution.
Sequencing typically generates gigabytes or even terabytes of data, which is difficult to store by conventional means. Moreover, the base-calling algorithm and the downstream analysis algorithms are computationally very intensive and place extremely high demands on software and hardware, which makes them difficult to use.
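The sampling-rate and data-volume points above can be checked with back-of-the-envelope arithmetic. The sketch below assumes a 4 kHz sampling rate, a translocation speed of 450 bases per second, 2-byte raw samples, and a 3.1 Gb human genome sequenced at 30x coverage; these are illustrative assumptions, not figures given in this patent.

```python
# Illustrative assumptions (not from the patent): 4 kHz sampling, 450 bases/s
# translocation, 2-byte (int16) raw samples, ~3.1 Gb genome at 30x coverage.
SAMPLE_RATE_HZ = 4000
BASES_PER_SECOND = 450
BYTES_PER_SAMPLE = 2


def samples_per_base(sample_rate_hz, bases_per_second):
    """Average number of current measurements taken while one base is read."""
    return sample_rate_hz / bases_per_second


def raw_signal_bytes(genome_bases, coverage, spb, bytes_per_sample=BYTES_PER_SAMPLE):
    """Raw signal volume before any compression."""
    return genome_bases * coverage * spb * bytes_per_sample


spb = samples_per_base(SAMPLE_RATE_HZ, BASES_PER_SECOND)
total = raw_signal_bytes(3.1e9, 30, spb)
print(f"{spb:.1f} samples per base")           # 8.9 samples per base
print(f"{total / 1e12:.2f} TB of raw signal")  # 1.65 TB of raw signal
```

Under these assumptions a single 30x human genome already produces raw signal on the terabyte scale, which is the storage and compute burden the method aims to move off the user's machine.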
Disclosure of Invention
To address these shortcomings of the prior art, the invention provides a rapid nanopore-based sequencing method.
The technical scheme of the invention is as follows: a rapid sequencing method based on a nanopore.
As shown in fig. 1, the method comprises the steps of:
step S1, the pre-prepared molecules to be tested are drawn through the nanopore, and the signal change caused by their passage is detected;
step S2, the signal change detected as a molecule passes through the nanopore is encoded by an encoding module into electronic data for transmission and storage;
step S3, the system uploads the data to the server through a transmission module;
step S4, a decoding module on the server decodes the uploaded electronic data back into the original signal data;
step S5, a calculation and analysis module on the server identifies the signal data as a molecular sequence.
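Steps S1 to S5 can be sketched as a single data flow. In this minimal Python sketch, a list stands in for the network, `struct` plus `zlib` stand in for the encoding and decoding modules, and `toy_recogniser` with its `LEVELS` table is a hypothetical placeholder for the server-side recognition algorithm; none of these names come from the patent.

```python
import struct
import zlib

# Hypothetical per-base current levels (integer units), for illustration only.
LEVELS = {"A": 80, "C": 95, "G": 110, "T": 125}


def encode(signal):
    """S2: pack samples as little-endian int16 and compress them."""
    raw = struct.pack(f"<{len(signal)}h", *signal)
    return zlib.compress(raw)


def upload(payload, server_inbox):
    """S3: stand-in for the transmission module (a list plays the network)."""
    server_inbox.append(payload)


def decode(payload):
    """S4: exact inverse of `encode` on the server side."""
    raw = zlib.decompress(payload)
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))


def toy_recogniser(signal):
    """S5 placeholder: nearest level per sample, consecutive repeats collapsed.

    Collapsing repeats means real homopolymers would be lost; a trained
    base-caller would replace this function.
    """
    calls = [min(LEVELS, key=lambda b: abs(LEVELS[b] - s)) for s in signal]
    return "".join(b for i, b in enumerate(calls) if i == 0 or b != calls[i - 1])


def run_pipeline(signal, recogniser=toy_recogniser):
    inbox = []
    upload(encode(signal), inbox)  # device side: S2 + S3
    decoded = decode(inbox.pop())  # server side: S4
    return recogniser(decoded)     # server side: S5


# S1 is simulated: a short detector trace for G, A, T, C (two samples per base).
trace = [110, 111, 80, 79, 125, 124, 95, 96]
print(run_pipeline(trace))  # GATC
```

Keeping the recogniser as a parameter of `run_pipeline` mirrors the split in the method: the device only encodes and uploads, while the server-side algorithm can be replaced without touching the instrument.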
Further, the nanopore chip described in step S1 consists of a nanopore element, a signal capture module, and a signal transmission module, integrated by a chipset.
Furthermore, the nanopore element is a solid-state nanopore consisting of a silicon substrate and a silicon nitride membrane.
Further, the signal encoding of step S2 also involves an encryption module, a compression module, and a data buffering module. The encryption module may use a symmetric or an asymmetric encryption algorithm; the compression module is an efficient compression algorithm tailored to the particular characteristics of the signal data; and the data buffer has two layers, the first being memory and the second a solid-state drive.
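The patent does not disclose its compression algorithm or buffer implementation. As one plausible reading, the sketch below uses delta encoding followed by `zlib` (a generic, swapped-in technique that exploits the correlation between adjacent current samples) and a two-layer buffer in which an in-memory list spills to a file standing in for the solid-state drive. Encryption is omitted because the Python standard library provides no cipher; a vetted cryptography library would be used in practice. `delta_compress`, `delta_decompress`, and `TwoLevelBuffer` are hypothetical names.

```python
import os
import struct
import tempfile
import zlib


def delta_compress(signal):
    """Keep the first sample and successive differences, then zlib-compress.

    Adjacent current samples are correlated, so the differences are small and
    repetitive, which typically helps the general-purpose compressor.
    """
    if not signal:
        return zlib.compress(b"")
    deltas = [signal[0]] + [b - a for a, b in zip(signal, signal[1:])]
    return zlib.compress(struct.pack(f"<{len(deltas)}i", *deltas))


def delta_decompress(blob):
    """Inverse of `delta_compress`: running sum of the stored differences."""
    raw = zlib.decompress(blob)
    out, acc = [], 0
    for d in struct.unpack(f"<{len(raw) // 4}i", raw):
        acc += d
        out.append(acc)
    return out


class TwoLevelBuffer:
    """Layer 1: in-memory list. Layer 2: an append-only file (stand-in for the SSD)."""

    def __init__(self, spill_path, max_in_memory=4):
        self.memory = []
        self.spill_path = spill_path
        self.max_in_memory = max_in_memory

    def put(self, chunk):
        self.memory.append(chunk)
        if len(self.memory) > self.max_in_memory:
            self._spill()

    def _spill(self):
        # Each chunk is written with a 4-byte length prefix so it can be read back.
        with open(self.spill_path, "ab") as f:
            for chunk in self.memory:
                f.write(struct.pack("<I", len(chunk)) + chunk)
        self.memory.clear()

    def drain(self):
        """Return all chunks in arrival order: spilled (older) first, then in-memory."""
        chunks = []
        if os.path.exists(self.spill_path):
            with open(self.spill_path, "rb") as f:
                while header := f.read(4):
                    (length,) = struct.unpack("<I", header)
                    chunks.append(f.read(length))
            os.remove(self.spill_path)
        chunks.extend(self.memory)
        self.memory.clear()
        return chunks


# Usage: compress five chunks of a synthetic trace and buffer them.
signal = [500 + (i % 7) for i in range(1000)]
buf = TwoLevelBuffer(os.path.join(tempfile.mkdtemp(), "spill.bin"), max_in_memory=2)
for start in range(0, len(signal), 200):
    buf.put(delta_compress(signal[start:start + 200]))
restored = [s for chunk in buf.drain() for s in delta_decompress(chunk)]
print(restored == signal)  # True
```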
Further, the decoding module of step S4 includes a decryption module, a decompression module, and a data caching module. The decryption and decompression algorithms correspond to the encryption and compression modules of step S2, and the data cache has three layers: the first is memory, the second a solid-state drive, and the third a mechanical hard disk drive.
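The three-layer server cache can be modelled as a cascade of least-recently-used (LRU) tiers. This is a sketch under stated assumptions: two temporary directories stand in for the SSD and the mechanical hard disk, capacities are counted in items, keys are plain file names, and the `TieredCache` class is hypothetical rather than the patent's design.

```python
import os
import tempfile
from collections import OrderedDict


class TieredCache:
    """Memory -> 'SSD' directory -> 'HDD' directory, each tier LRU-ordered.

    An item evicted from a full tier is demoted to the next one; the last tier
    (the 'HDD' directory) is unbounded. Reads promote items back to memory.
    """

    def __init__(self, ssd_dir, hdd_dir, mem_capacity=2, ssd_capacity=2):
        self.mem = OrderedDict()  # key -> bytes
        self.ssd = OrderedDict()  # key -> file path on the 'SSD' tier
        self.ssd_dir, self.hdd_dir = ssd_dir, hdd_dir
        self.mem_capacity, self.ssd_capacity = mem_capacity, ssd_capacity

    def _hdd_path(self, key):
        return os.path.join(self.hdd_dir, key)

    def _discard_lower(self, key):
        """Remove stale copies of `key` from the disk tiers."""
        if key in self.ssd:
            os.remove(self.ssd.pop(key))
        if os.path.exists(self._hdd_path(key)):
            os.remove(self._hdd_path(key))

    def put(self, key, value):
        self._discard_lower(key)
        self.mem[key] = value
        self.mem.move_to_end(key)
        if len(self.mem) > self.mem_capacity:
            # Demote the least recently used memory item to the 'SSD' tier.
            old_key, old_value = self.mem.popitem(last=False)
            path = os.path.join(self.ssd_dir, old_key)
            with open(path, "wb") as f:
                f.write(old_value)
            self.ssd[old_key] = path
            if len(self.ssd) > self.ssd_capacity:
                # Demote the least recently used 'SSD' item to the 'HDD' tier.
                cold_key, cold_path = self.ssd.popitem(last=False)
                os.replace(cold_path, self._hdd_path(cold_key))

    def get(self, key):
        """Return (value, tier it was found in) and promote the item to memory."""
        if key in self.mem:
            self.mem.move_to_end(key)
            return self.mem[key], "memory"
        if key in self.ssd:
            path, tier = self.ssd.pop(key), "ssd"
        elif os.path.exists(self._hdd_path(key)):
            path, tier = self._hdd_path(key), "hdd"
        else:
            raise KeyError(key)
        with open(path, "rb") as f:
            value = f.read()
        os.remove(path)
        self.put(key, value)
        return value, tier


def make_cache():
    base = tempfile.mkdtemp()
    ssd, hdd = os.path.join(base, "ssd"), os.path.join(base, "hdd")
    os.makedirs(ssd)
    os.makedirs(hdd)
    return TieredCache(ssd, hdd)


cache = make_cache()
for i in range(1, 6):
    cache.put(f"chunk{i}", str(i).encode())
print(cache.get("chunk1"))  # (b'1', 'hdd')
```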
Further, the server side in step S5 is a cloud server cluster, and the calculation and analysis module is a parallel algorithm running on that cluster.
Further, the calculation and analysis module of step S5 also performs calibration and data-quality evaluation of the sequence.
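As one example of data-quality evaluation, the sketch below computes a read's mean quality from a FASTQ-style Phred+33 quality string. It averages in error-probability space (one common convention, since averaging Phred scores directly overstates quality) and applies a hypothetical pass threshold of Q7; the function names and the threshold are assumptions, not part of the patent.

```python
import math


def phred_to_error(q):
    """Convert a Phred score to an error probability: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def mean_read_quality(qual_string, offset=33):
    """Mean quality of a read, averaged as error probabilities then converted back."""
    if not qual_string:
        raise ValueError("empty quality string")
    errors = [phred_to_error(ord(c) - offset) for c in qual_string]
    return -10 * math.log10(sum(errors) / len(errors))


def passes_qc(qual_string, min_q=7.0):
    """Hypothetical pass/fail rule on the mean read quality."""
    return mean_read_quality(qual_string) >= min_q


# '+' encodes Q10 and '5' encodes Q20; the naive average would be Q15.
print(round(mean_read_quality("+5"), 2))  # 12.6
```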
Further, the molecular sequences in step S5 include DNA and RNA sequences, as well as DNA and RNA modifications.
Further, after the sequence is identified, the calculation and analysis module of step S5 also performs further computational or statistical analysis, and all of these analyses are executed in parallel by the server-side cluster.
Further, the computational and statistical analysis of step S5 also covers structural variations, repeat regions, single nucleotide polymorphisms (SNPs), modified bases, haplotypes, metagenomes, isoforms, indirect variants, and fusions. Each analysis is a separate algorithm running in parallel on the server cluster, so that all analyses finish quickly and a complete analysis report can be generated.
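The idea of running each analysis as an independent job and merging the results into one report can be sketched with `concurrent.futures`. The three analyses here (GC content, longest homopolymer as a crude proxy for repeat regions, and read length) are hypothetical stand-ins for the real variant and modification callers, and a local thread pool stands in for the server cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_content(seq):
    """Fraction of G and C bases."""
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def longest_homopolymer(seq):
    """Length of the longest run of one repeated base (crude repeat indicator)."""
    best = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        best = max(best, run)
        prev = base
    return best


# Hypothetical stand-ins for the separate analysis algorithms.
ANALYSES = {"gc_content": gc_content, "longest_homopolymer": longest_homopolymer, "length": len}


def run_report(seq, analyses=ANALYSES, max_workers=4):
    """Submit every analysis at once and collect the results into one report."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn, seq) for name, fn in analyses.items()}
        return {name: future.result() for name, future in futures.items()}


print(run_report("ACGGGGTTA"))
```

For CPU-bound Python analyses a `ProcessPoolExecutor` or a cluster job scheduler would be the more realistic choice; the thread pool keeps the example small.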
Beneficial effects of the invention: the nanopore-based rapid sequencing method greatly reduces the computational cost of sequencing analysis and the storage cost of sequencing results, lowers the cost of updating and using sequencing software and analysis algorithms, and improves the overall performance of the sequencing workflow. Users face a lower barrier to adopting the related sequencing technology and obtain results faster, which greatly improves the overall efficiency of the industry.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow chart of steps of a method for nanopore-based molecular sequencing according to an embodiment of the present invention;
FIG. 2 is a block diagram of a nanopore based molecular sequencing system according to an embodiment of the present invention;
FIG. 3 is a signal schematic of nanopore-based whole genome sequencing according to an embodiment of the invention;
FIG. 4 is a DNA sequence schematic of nanopore-based whole genome sequencing according to an embodiment of the invention;
Detailed Description of the Preferred Embodiments
In order to make the technical solutions of the present invention better understood by those skilled in the art, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings.
As shown in FIG. 2, in the disclosed nanopore-based rapid sequencing method, a signal is generated at the nanopore chip terminal when the target to be detected passes through the nanopore. The system captures the signal, encodes it into electronic data, and uploads the data to the server through a network transmission module, where sequence recognition is performed and a computational analysis report is produced.
Optionally, the transmission module may include, but is not limited to, a wired network or a wireless network. Wired networks include local area networks, metropolitan area networks, and wide area networks; wireless networks include Bluetooth, Wi-Fi, and other networks capable of wireless communication.
Optionally, in this embodiment, when the signal is encoded into electronic data, some local processing may be performed, including but not limited to data compression, data encryption, data source quality evaluation, and transmission efficiency evaluation.
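The optional local processing before upload (compression plus a simple transmission-efficiency estimate) might look like the following sketch. The int16 packing, the CRC32 checksum that lets the server detect accidental corruption, the dictionary keys, and the bandwidth figure are all illustrative assumptions.

```python
import struct
import zlib


def prepare_upload(signal, bandwidth_mbps):
    """Pack int16 samples, compress, checksum, and estimate the upload time."""
    raw = struct.pack(f"<{len(signal)}h", *signal)
    payload = zlib.compress(raw, level=6)
    return {
        "payload": payload,
        "crc32": zlib.crc32(payload),  # lets the server detect accidental corruption
        "compression_ratio": len(raw) / len(payload),
        "est_upload_s": len(payload) * 8 / (bandwidth_mbps * 1_000_000),
    }


def server_accepts(info):
    """Server-side check that the payload arrived intact."""
    return zlib.crc32(info["payload"]) == info["crc32"]


info = prepare_upload([100 + (i % 5) for i in range(50_000)], bandwidth_mbps=10)
print(round(info["compression_ratio"], 1), server_accepts(info))
```

CRC32 only catches accidental corruption; it gives no protection against deliberate tampering, which is the job of the encryption module described earlier.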
Optionally, in this embodiment, the server-side recognition algorithm includes, but is not limited to, machine-learning-based algorithms and consensus-based algorithms.
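A consensus-based recogniser, in its simplest form, takes several reads of the same region and keeps the majority base at each position. The sketch assumes the reads have already been aligned to equal length with `-` marking gaps; the alignment step, quality weighting, and the name `majority_consensus` are assumptions beyond what the patent specifies.

```python
from collections import Counter


def majority_consensus(aligned_reads, gap="-"):
    """Column-wise majority vote over reads already aligned to equal length.

    Ties go to the symbol seen first in the column; columns whose winner is a
    gap are dropped from the consensus.
    """
    length = len(aligned_reads[0])
    if any(len(r) != length for r in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    consensus = []
    for column in zip(*aligned_reads):
        base, _ = Counter(column).most_common(1)[0]
        if base != gap:
            consensus.append(base)
    return "".join(consensus)


print(majority_consensus(["ACGT-A", "ACGTTA", "AGGT-A"]))  # ACGTA
```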
Two specific examples follow.
Whole-genome sequencing based on the invention:
following the steps of use, a laboratory first collects a biological sample and prepares the sample to be tested, then places the prepared sample in the detection well of the nanopore instrument. As the DNA in the sample passes through the nanopore, the current signal shown in FIG. 3 is generated. After the instrument transmits the signal data to the cloud, the cloud generates the DNA sequence in real time using a recognition algorithm and builds a sequence library of the biological sample in the cloud, as shown in FIG. 4. The laboratory can then use a series of quality-testing and sequencing tools in the cloud to obtain a whole-genome sequencing (WGS) dataset.
Rapid detection of the novel coronavirus based on the invention:
an organization or laboratory develops an identification algorithm from the known novel coronavirus sequence and deploys it to the cloud in advance, where it can be made available to other organizations or laboratories. In use, an organization or laboratory prepares, sequences, and uploads the sample to be tested according to the steps of use; once the signal data reach the cloud, the cloud runs the corresponding novel coronavirus sequence recognition algorithm, builds the library in real time, and generates an analysis report.
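A recognition algorithm for a known target virus could, in its simplest form, screen reads by the fraction of their k-mers that also occur in the reference genome. The sketch below uses a short invented reference rather than the real viral genome, and the k-mer length, the 0.8 threshold, and the function names are hypothetical; practical pipelines would align noisy nanopore reads with a dedicated aligner instead.

```python
def kmers(seq, k):
    """Set of all length-k substrings of `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def target_fraction(read, reference_kmers, k):
    """Fraction of the read's distinct k-mers that also occur in the reference."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0
    return len(read_kmers & reference_kmers) / len(read_kmers)


def screen(reads, reference, k=5, min_fraction=0.8):
    """Count reads that look like the target; returns (hits, total)."""
    ref_kmers = kmers(reference, k)
    hits = sum(target_fraction(r, ref_kmers, k) >= min_fraction for r in reads)
    return hits, len(reads)


# Invented toy reference; NOT a real viral sequence.
reference = "ATGCGTACGTTAGCCTAGGCTA"
reads = [reference[2:14], "TTTTTTTTTTTT"]
print(screen(reads, reference))  # (1, 2)
```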
The novel coronavirus spreads extremely fast and infects very large numbers of people. The detection method of the invention therefore has clear advantages: detection is fast and results can be reported in real time.
By developing the novel coronavirus identification algorithm into an intelligent learning algorithm, the virus can be detected rapidly and viral variant structures can be found quickly, which is of great significance for the prevention and treatment of the novel coronavirus.
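The learning algorithm itself is not disclosed, but its final output, a list of differences between a sample sequence and the reference, can be illustrated. This sketch assumes both sequences are already aligned to the same length and treats `N` as an uncalled base; `find_substitutions` is a hypothetical helper that reports single-base substitutions only and ignores insertions and deletions.

```python
def find_substitutions(reference, sample, uncalled="N"):
    """Return (1-based position, ref_base, sample_base) for each mismatch.

    Both sequences must already be aligned to the same length; positions where
    the sample base is uncalled are skipped. Indels are not handled.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base and alt_base != uncalled
    ]


print(find_substitutions("ACGTACGT", "ACGAACGN"))  # [(4, 'T', 'A')]
```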
The above description is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto. Any person skilled in the art can substitute or change the technical solution of the present invention and its inventive concept within the technical scope of the present invention, and all the equivalents or changes thereof are covered within the protective scope of the present invention.
Claims (10)
1. A rapid nanopore-based sequencing method, characterized by comprising the following steps:
step S1, detecting, with a nanopore chip, the signal change caused by a molecule passing through the nanopore;
step S2, encoding, by an encoding module, the signal change into electronic data;
step S3, uploading the electronic data to a server through a transmission module;
step S4, decoding, by a decoding module on the server, the uploaded electronic data into signal data;
step S5, identifying, by a computational analysis module on the server, the signal data as a molecular sequence.
2. The method of claim 1, wherein the nanopore chip of step S1 is comprised of a nanopore element, a signal capture module, and a signal transmission module.
3. The method of claim 2, wherein the nanopore element is a solid state nanopore.
4. The method as claimed in claim 1, 2 or 3, wherein the signal encoding of step S2 includes an encryption module, a compression module, and a data buffering module.
5. The method of claim 4, wherein the decoding module of step S4 comprises a decryption module, a decompression module, and a data buffering module.
6. The method according to claim 1, 2, 3 or 5, wherein the server side in step S5 is a cloud server cluster, and the computational analysis module is to be run in the cloud server cluster.
7. The method of claim 1, 2, 3, 5 or 6, wherein the computational analysis module of step S5 further comprises performing calibration and data quality assessment on the sequence.
8. The method of claim 1 or 2 or 3 or 5 or 6 or 7, wherein the molecular sequences of step S5 include DNA sequences and RNA sequences, as well as DNA modifications and RNA modifications.
9. The method of claim 8, wherein the computational analysis module of step S5 further comprises further computational analysis or statistical analysis after identifying the sequence.
10. The method of claim 9, wherein the further computational analysis and statistical analysis module can analyze structural variations, repeat regions, Single Nucleotide Polymorphisms (SNPs), modified bases, haplotypes, metagenomes, isoforms, indirect variants, and fusions.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110305615.5A CN113025697A (en) | 2021-03-24 | 2021-03-24 | Rapid sequencing method based on nanopore |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN113025697A | 2021-06-25 |
Family ID: 76472828
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110305615.5A Pending CN113025697A (en) | 2021-03-24 | 2021-03-24 | Rapid sequencing method based on nanopore |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN113025697A (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114615567A (en) * | 2022-03-08 | 2022-06-10 | 东南大学 (Southeast University) | A wireless communication-based method for solid-state nanopore gene sequencing data communication |
| WO2023123344A1 (en) * | 2021-12-31 | 2023-07-06 | 深圳华大生命科学研究院 (BGI Life Science Research Institute, Shenzhen) | Nucleic acid molecule capable of blocking motor protein, and construction method and application thereof |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20210625 |