CN113025697A

CN113025697A - Rapid sequencing method based on nanopore

Info

Publication number: CN113025697A
Application number: CN202110305615.5A
Authority: CN
Inventors: Not disclosed (inventor requested non-publication)
Original assignee: Suzhou Tianqing Intelligent Technology Co ltd
Current assignee: Suzhou Tianqing Intelligent Technology Co ltd
Priority date: 2021-03-24
Filing date: 2021-03-24
Publication date: 2021-06-25

Abstract

The invention discloses a rapid sequencing method based on a nanopore. The method comprises the following steps: step S1, detecting the signal change caused by the passage of a molecule by using a nanopore chip; step S2, encoding the signal change into electronic data by an encoding module; step S3, uploading the electronic data to a server through a transmission module; step S4, decoding the uploaded electronic data into signal data by a decoding module at the server side; step S5, identifying the signal data as a molecular sequence by a computational analysis module at the server side. The invention can be applied in any scenario that requires molecular sequencing.

Description

Rapid sequencing method based on nanopore

Technical Field

The invention relates to the field of molecular sequencing, in particular to a rapid sequencing method based on a nanopore.

Background

Nanopore sequencing is a new generation of sequencing technology that has emerged in recent years. The most widely accepted nanopore sequencing platform on the market today is the biological nanopore sequencer from Oxford Nanopore Technologies (ONT). Compared with second-generation sequencing, nanopore sequencing offers single-molecule sequencing, long read lengths (the longest reported read exceeds 2 Mb), real-time acquisition of sequencing data, no need for amplification, and the ability to recognize nucleic acid modifications. It is becoming irreplaceable in a number of specific application fields, such as metagenome sequencing, genome sequencing of new species, pathogen sequencing, and epigenetic sequencing.

The principle of nanopore sequencing is as follows: a motor protein brings the DNA/RNA to the nanopore protein and unwinds it, and the potential difference across the membrane drives the unwound strand through the nanopore. Because bases differ in structure and charge, they produce different resistances and therefore different electrical signals. Base recognition (base-calling) is finally performed by reading the raw electrical signal.

Because the current is typically sampled at 7-9 times the rate at which bases pass through the nanopore, each base spans several samples and the boundaries between bases must be inferred, which poses a significant technical challenge for base-calling. Compared with second-generation Illumina sequencing data, nanopore data are characterized by longer reads, a higher error rate, and an uneven read-length distribution.
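The ratio between sampling frequency and translocation speed can be made concrete with a short calculation. The operating point used below (a 4 kHz sampling rate and about 450 bases per second) is an illustrative assumption and is not stated in this application.

```python
def samples_per_base(sampling_hz: float, bases_per_second: float) -> float:
    """Average number of current samples recorded while one base is in the pore."""
    if bases_per_second <= 0:
        raise ValueError("bases_per_second must be positive")
    return sampling_hz / bases_per_second


# Assumed, illustrative values (not taken from the application).
ratio = samples_per_base(sampling_hz=4000.0, bases_per_second=450.0)
print(f"{ratio:.1f} samples per base")  # prints "8.9 samples per base"
```

Under these assumed values the ratio falls inside the 7-9 range mentioned above, so the base-caller has to decide where one base ends and the next begins from several noisy samples per base.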

Sequencing generally produces massive amounts of data, on the order of gigabytes or even terabytes, which are difficult to store by conventional means. In addition, the base-calling algorithm and the downstream analysis algorithms are computationally very expensive and place extremely high demands on software and hardware, which makes the technology difficult to use.

Disclosure of Invention

To address the shortcomings of the prior art, the invention provides a rapid sequencing method based on a nanopore.

The technical scheme of the invention is as follows: a rapid sequencing method based on a nanopore.

As shown in fig. 1, the method comprises the steps of:

step S1, drawing the prepared molecules to be detected through the nanopore and detecting the signal change caused by the passage of the molecules;

step S2, after the signal change caused by a molecule passing through the nanopore has been detected, encoding the signal change into electronic data through an encoding module for transmission and storage;

step S3, uploading the electronic data to the server side through the transmission module;

step S4, decoding the uploaded electronic data into the original signal data by a decoding module at the server side;

step S5, identifying the signal data as a molecular sequence by the calculation and analysis module at the server side.
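The five steps above can be sketched end to end as a toy pipeline. Everything in this sketch is an assumption made for illustration: the function names, the JSON-plus-zlib encoding, the in-memory list standing in for the server, and especially the per-base current levels in `LEVELS`. Real base-calling is far more involved than the nearest-level lookup shown here.

```python
import json
import zlib
from typing import Dict, List

# Hypothetical mean current level per base, in arbitrary units.
LEVELS: Dict[str, float] = {"A": 80.0, "C": 90.0, "G": 100.0, "T": 110.0}


def detect_signal() -> List[float]:
    """S1: stand-in for the nanopore chip; returns one noisy sample per base."""
    return [80.5, 89.2, 101.0, 109.7]


def encode(signal: List[float]) -> bytes:
    """S2: encode the signal into electronic data (JSON, then zlib)."""
    return zlib.compress(json.dumps(signal).encode("utf-8"))


def upload(payload: bytes, server_inbox: List[bytes]) -> None:
    """S3: stand-in for the transmission module."""
    server_inbox.append(payload)


def decode(payload: bytes) -> List[float]:
    """S4: server-side decoding back into signal data."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


def identify(signal: List[float]) -> str:
    """S5: toy recognition that picks the base with the nearest level."""
    return "".join(
        min(LEVELS, key=lambda base: abs(LEVELS[base] - sample))
        for sample in signal
    )


inbox: List[bytes] = []
upload(encode(detect_signal()), inbox)
sequence = identify(decode(inbox[0]))
print(sequence)  # prints "ACGT"
```

The point of the sketch is the division of labor: only S1 to S3 run next to the instrument, while S4 and S5 run on the server.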

Further, the nanopore chip described in step S1 consists of a nanopore element, a signal capture module, and a signal transmission module, which are integrated by a chipset.

Further, the nanopore element is a solid-state nanopore consisting of a silicon substrate and a silicon nitride film.

Further, the signal encoding in step S2 also includes an encryption module, a compression module, and a data buffering module. The encryption module can use a symmetric or an asymmetric encryption algorithm. The compression module is a high-efficiency compression algorithm designed specifically for the characteristics of the signal data. The data buffer consists of two cache tiers: the first tier is memory and the second tier is a solid-state disk.
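One way to picture a signal-specific compressor and a two-tier buffer is sketched below. The application does not name an algorithm, so the choice of delta coding followed by zlib, the class name `TwoTierBuffer`, and the use of a temporary directory in place of a solid-state disk are all assumptions. Encryption is left out because the Python standard library has no symmetric cipher; it would be applied to the output of `compress_signal`.

```python
import os
import struct
import tempfile
import zlib
from collections import deque
from typing import List


def compress_signal(samples: List[int]) -> bytes:
    """Delta-encode integer samples, then deflate the packed deltas."""
    deltas = list(samples[:1])
    deltas += [b - a for a, b in zip(samples, samples[1:])]
    raw = struct.pack(f"<{len(deltas)}i", *deltas)
    return zlib.compress(raw, 9)


def decompress_signal(blob: bytes) -> List[int]:
    """Inverse of compress_signal: inflate, unpack, and sum the deltas."""
    raw = zlib.decompress(blob)
    deltas = struct.unpack(f"<{len(raw) // 4}i", raw)
    samples, total = [], 0
    for delta in deltas:
        total += delta
        samples.append(total)
    return samples


class TwoTierBuffer:
    """FIFO buffer: tier 1 is memory, tier 2 is files in spill_dir."""

    def __init__(self, memory_slots: int, spill_dir: str) -> None:
        self.memory = deque()
        self.memory_slots = memory_slots
        self.spill_dir = spill_dir
        self.spilled = deque()  # paths of spilled chunks, oldest first
        self._counter = 0

    def put(self, chunk: bytes) -> None:
        # Use memory only while nothing is on disk, so FIFO order is kept.
        if len(self.memory) < self.memory_slots and not self.spilled:
            self.memory.append(chunk)
            return
        path = os.path.join(self.spill_dir, f"chunk_{self._counter:08d}.bin")
        self._counter += 1
        with open(path, "wb") as handle:
            handle.write(chunk)
        self.spilled.append(path)

    def get(self) -> bytes:
        """Return the oldest chunk; raises IndexError when the buffer is empty."""
        if self.memory:
            return self.memory.popleft()
        path = self.spilled.popleft()
        with open(path, "rb") as handle:
            data = handle.read()
        os.remove(path)
        return data


samples = [512, 514, 515, 515, 513, 498, 497, 499]
with tempfile.TemporaryDirectory() as spill_dir:
    buffer = TwoTierBuffer(memory_slots=1, spill_dir=spill_dir)
    buffer.put(compress_signal(samples))        # stays in memory
    buffer.put(compress_signal(samples[::-1]))  # spills to disk
    first = decompress_signal(buffer.get())
    second = decompress_signal(buffer.get())
```

Delta coding is a natural fit for this kind of data because consecutive current samples tend to be close in value, which leaves small numbers that deflate well.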

Further, the decoding module of step S4 includes a decryption module, a decompression module, and a data caching module. The algorithms of the decryption and decompression modules correspond to those of the encryption and compression modules in step S2. The data cache consists of three tiers: the first tier is a memory cache, the second tier is a solid-state disk cache, and the third tier is a mechanical hard disk.
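A three-tier cache of this kind might be organized as in the sketch below. The class name `ThreeTierCache`, the least-recently-used eviction policy, and the two temporary directories standing in for the solid-state and mechanical disks are assumptions made for illustration. Keys are assumed to be safe file names.

```python
import os
import tempfile
from collections import OrderedDict
from typing import Tuple


class ThreeTierCache:
    """Tier 1: memory (LRU). Tier 2: SSD directory. Tier 3: HDD directory."""

    def __init__(self, memory_items: int, ssd_dir: str, hdd_dir: str) -> None:
        self.memory = OrderedDict()
        self.memory_items = memory_items
        self.ssd_dir = ssd_dir
        self.hdd_dir = hdd_dir

    def _remember(self, key: str, data: bytes) -> None:
        """Place data in memory and demote the least recently used item to SSD."""
        self.memory[key] = data
        self.memory.move_to_end(key)
        while len(self.memory) > self.memory_items:
            old_key, old_data = self.memory.popitem(last=False)
            with open(os.path.join(self.ssd_dir, old_key), "wb") as handle:
                handle.write(old_data)

    def put(self, key: str, data: bytes) -> None:
        """Store durably on the HDD tier and keep a hot copy in memory."""
        with open(os.path.join(self.hdd_dir, key), "wb") as handle:
            handle.write(data)
        self._remember(key, data)

    def get(self, key: str) -> Tuple[bytes, str]:
        """Return (data, tier that served the request)."""
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key], "memory"
        for tier, directory in (("ssd", self.ssd_dir), ("hdd", self.hdd_dir)):
            path = os.path.join(directory, key)
            if os.path.exists(path):
                with open(path, "rb") as handle:
                    data = handle.read()
                self._remember(key, data)  # promote to memory
                return data, tier
        raise KeyError(key)


with tempfile.TemporaryDirectory() as ssd, tempfile.TemporaryDirectory() as hdd:
    cache = ThreeTierCache(memory_items=1, ssd_dir=ssd, hdd_dir=hdd)
    cache.put("read1", b"signal-1")
    cache.put("read2", b"signal-2")      # read1 is demoted to the SSD tier
    hit_recent = cache.get("read2")
    hit_demoted = cache.get("read1")
```

Lookups fall through from the fastest tier to the slowest, and a hit on a slower tier is promoted back into memory.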

Further, the server side in step S5 is a cloud server cluster, and the calculation and analysis module is a parallel algorithm running in the cloud server cluster.

Further, the calculation and analysis module of step S5 also performs calibration and data quality evaluation on the sequence.
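The application does not say how data quality is evaluated. One common convention, assumed here, is the Phred scale, where a quality score Q corresponds to an error probability P through Q = -10 log10(P). A minimal sketch of a per-read quality summary under that assumption:

```python
import math
from typing import List


def mean_read_quality(phred_scores: List[float]) -> float:
    """Average the per-base error probabilities, then convert back to Phred."""
    if not phred_scores:
        raise ValueError("phred_scores must not be empty")
    error_probabilities = [10 ** (-q / 10) for q in phred_scores]
    mean_error = sum(error_probabilities) / len(error_probabilities)
    return -10 * math.log10(mean_error)


quality = mean_read_quality([10, 20, 30])
print(f"{quality:.2f}")
```

Averaging in probability space rather than averaging the scores directly keeps a few low-quality bases from being hidden by many high-quality ones.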

Further, the molecular sequences in step S5 include DNA sequences and RNA sequences, as well as DNA modifications and RNA modifications.

Further, the calculation and analysis module in step S5 also performs additional computational or statistical analysis after the sequence has been identified, and all of this analysis is run in parallel by the cluster on the server side.

Further, the computational and statistical analysis of step S5 includes analyzing structural variations, repeat regions, single nucleotide polymorphisms (SNPs), modified bases, haplotypes, metagenomes, isoforms, indirect variants, and fusions. Each analysis is a separate algorithm that runs in parallel on the server cluster, so that the analyses complete quickly and a full analysis report can be generated.
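The idea of independent analyses running side by side and feeding one report can be sketched as follows. The three analyses below are simple stand-ins chosen for brevity, not the structural-variation or SNP algorithms listed above, and a thread pool on one machine stands in for the server cluster; all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def gc_content(sequence: str) -> float:
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in sequence) / len(sequence)


def base_counts(sequence: str) -> Dict[str, int]:
    """Number of occurrences of each base."""
    return {base: sequence.count(base) for base in "ACGT"}


def longest_homopolymer(sequence: str) -> int:
    """Length of the longest run of one repeated base."""
    best = run = 0
    previous = None
    for base in sequence:
        run = run + 1 if base == previous else 1
        best = max(best, run)
        previous = base
    return best


ANALYSES: Dict[str, Callable[[str], object]] = {
    "gc_content": gc_content,
    "base_counts": base_counts,
    "longest_homopolymer": longest_homopolymer,
}


def build_report(sequence: str) -> Dict[str, object]:
    """Submit every analysis at once and collect the results into one report."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, sequence) for name, fn in ANALYSES.items()}
        return {name: future.result() for name, future in futures.items()}


report = build_report("ACGTTTTGCA")
print(report)
```

Because no analysis depends on another, adding a new one only means adding an entry to `ANALYSES`.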

The beneficial effects of the invention are as follows: the nanopore-based rapid sequencing method greatly reduces the computational cost of sequencing analysis and the cost of storing sequencing results, reduces the cost of updating and using sequencing software and analysis algorithms, and improves the overall performance of the sequencing workflow. The barrier to using the sequencing technology is lowered for users, sequencing is faster, and the overall efficiency of the industry is greatly improved.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a flow chart of steps of a method for nanopore-based molecular sequencing according to an embodiment of the present invention;

FIG. 2 is a block diagram of a nanopore based molecular sequencing system according to an embodiment of the present invention;

FIG. 3 is a signal schematic of nanopore-based whole genome sequencing according to an embodiment of the invention;

FIG. 4 is a DNA sequence schematic of nanopore-based whole genome sequencing according to an embodiment of the invention;

Detailed Description of the Preferred Embodiments

In order to make the technical solutions of the present invention better understood by those skilled in the art, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings.

In the nanopore-based rapid sequencing method, as shown in FIG. 2, a signal is generated at the nanopore chip terminal when the target to be detected passes through the nanopore. The system captures the signal and encodes it into electronic data, the data are uploaded to the server side through the network transmission module, and the server side performs sequence recognition and produces the computational analysis report.

Optionally, the transmission module may include, but is not limited to, a wired network or a wireless network. The wired network includes a local area network, a metropolitan area network, and a wide area network; the wireless network includes Bluetooth, Wi-Fi, and other networks that enable wireless communication.

Optionally, in this embodiment, when the signal is encoded into electronic data, some local processing may be performed, including but not limited to data compression, data encryption, data source quality evaluation, and transmission efficiency evaluation.

Optionally, in this embodiment, the server-side recognition algorithm includes, but is not limited to, machine learning based algorithms and consensus based algorithms.
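As a rough illustration of the consensus idea, the sketch below takes a per-position majority vote over several reads of the same region. It assumes the reads have already been aligned to equal length, which the application does not describe; the function name is hypothetical.

```python
from collections import Counter
from typing import List


def consensus(aligned_reads: List[str]) -> str:
    """Per-column majority vote over pre-aligned reads of equal length."""
    if not aligned_reads:
        raise ValueError("at least one read is required")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be pre-aligned to equal length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


reads = ["ACGTAC", "ACGTTC", "ACCTAC"]  # each read carries at most one error
print(consensus(reads))  # prints "ACGTAC"
```

Independent errors in single reads are outvoted as long as most reads agree at each position. On a tie, this sketch keeps the base that appears first in the column.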

Two specific examples follow.

Whole genome sequencing based on the invention:

Following the steps of use, a laboratory first collects a biological sample, prepares the sample to be tested, and places the fully prepared sample in the detection well of the nanopore instrument. When DNA in the sample passes through the nanopore, a current signal as shown in FIG. 3 is generated. After the instrument transmits the signal data to the cloud, the cloud generates the DNA sequence in real time through the recognition algorithm, as shown in FIG. 4, and builds a sequence library of the biological sample in the cloud (library construction). The laboratory can subsequently use a series of quality-control and sequencing tools in the cloud to obtain a whole genome sequencing (WGS) dataset.

Rapid detection of the novel coronavirus based on the invention:

An organization or laboratory develops a recognition algorithm from the known novel coronavirus sequence and deploys it to the cloud in advance, where it can be made available to other organizations or laboratories. In use, an organization or laboratory prepares the sample to be tested, sequences it, and uploads the data according to the steps of use. After the signal data are transmitted to the cloud, the cloud runs the corresponding novel coronavirus sequence recognition algorithm, builds the library in real time, and generates an analysis report.
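The application does not describe how the recognition algorithm compares reads with the known sequence. One simple possibility, assumed here, is k-mer containment: the fraction of a read's k-mers that also occur in the reference. The `REFERENCE` string below is a made-up placeholder, not a real viral sequence, and the function names are hypothetical.

```python
from typing import Set


def kmers(sequence: str, k: int) -> Set[str]:
    """All distinct substrings of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def containment(read: str, reference_kmers: Set[str], k: int) -> float:
    """Fraction of the read's k-mers that are present in the reference."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0
    return len(read_kmers & reference_kmers) / len(read_kmers)


K = 8
REFERENCE = "ATGGCTAGCTAGGATCCGATCGTTAGC"  # placeholder, not a real genome
reference_index = kmers(REFERENCE, K)       # built once, deployed in advance

matching_read = REFERENCE[5:20]
unrelated_read = "TTTTTTTTTTTTTTT"
score_match = containment(matching_read, reference_index, K)
score_other = containment(unrelated_read, reference_index, K)
print(score_match, score_other)  # prints "1.0 0.0"
```

Building the index once and reusing it for every uploaded read mirrors the idea of deploying the algorithm to the cloud in advance.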

The novel coronavirus spreads extremely fast and extremely widely. The detection method of the invention therefore has clear advantages: detection is fast and results can be reported in real time.

If the novel coronavirus recognition algorithm is developed as an intelligent learning algorithm, the virus can be detected rapidly and variant structures of the virus can be discovered quickly, which is of great significance for the prevention and treatment of the novel coronavirus.

The above is only a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.

Claims

1. A rapid sequencing method based on a nanopore is characterized by comprising the following steps:

step S1, detecting the signal change caused by the passing of the molecule by using the nanopore chip;

step S2, the encoding module encodes the signal change into electronic data;

step S3, uploading the electronic data to a server through a transmission module;

step S4, the decoding module at the server end decodes the uploaded electronic data into signal data;

step S5, the computational analysis module at the server identifies the signal data as a molecular sequence.

2. The method of claim 1, wherein the nanopore chip of step S1 is comprised of a nanopore element, a signal capture module, and a signal transmission module.

3. The method of claim 2, wherein the nanopore element is a solid state nanopore.

4. The method as claimed in claim 1, 2 or 3, wherein the signal encoding of step S2 includes an encryption module, a compression module, and a data buffering module.

5. The method of claim 4, wherein the decoding module of step S4 comprises a decryption module, a decompression module, and a data buffering module.

6. The method according to claim 1, 2, 3 or 5, wherein the server side in step S5 is a cloud server cluster, and the computational analysis module runs in the cloud server cluster.

7. The method of claim 1, 2, 3, 5 or 6, wherein the computational analysis module of step S5 further comprises performing calibration and data quality assessment on the sequence.

8. The method of claim 1, 2, 3, 5, 6 or 7, wherein the molecular sequences of step S5 are DNA sequences and RNA sequences, and DNA modifications and RNA modifications.

9. The method of claim 8, wherein the computational analysis module of step S5 further comprises further computational analysis or statistical analysis after identifying the sequence.

10. The method of claim 9, wherein the further computational analysis and statistical analysis module can analyze structural variations, repeat regions, Single Nucleotide Polymorphisms (SNPs), modified bases, haplotypes, metagenomes, isoforms, indirect variants, and fusions.