CN112575388A

CN112575388A - Single-molecule target gene library building method and kit thereof

Info

Publication number: CN112575388A
Application number: CN202011533112.5A
Authority: CN
Inventors: 崔品 (Cui Pin)
Original assignee: Shenzhen Ruifa Biotechnology Co ltd
Current assignee: Shenzhen Ruifa Biotechnology Co ltd
Priority date: 2020-12-22
Filing date: 2020-12-22
Publication date: 2021-03-30
Also published as: CN114214734A

Abstract

A single-molecule target gene library construction method and a kit thereof. The method comprises: an extension step, in which a template molecule is mixed with a target probe linked in series to a first sequencing adapter, and the target probe binds to a target region of the template molecule and is extended to obtain a target probe extension product; a second sequencing adapter ligation step, in which a second sequencing adapter is added, the second sequencing adapter having a forward strand and a reverse strand that are complementary to each other, the 5' end of the forward strand being ligatable in series to the 3' end of the target probe extension product and the 3' end of the reverse strand being linked in series to a random sequence, and the reaction yields a second sequencing adapter ligation product; a melting step, in which the product is denatured and the single strands carrying the random sequence are removed, yielding a single-stranded product linked in series to the first sequencing adapter; and a double-strand synthesis step, in which a first primer and a second primer are added and reacted to obtain an amplification product. The invention integrates library construction and target gene enrichment into one workflow and is broadly applicable to single-stranded or double-stranded DNA samples of various lengths, at both high and low starting amounts.

Description

Single-molecule target gene library building method and kit thereof

Technical Field

The invention relates to the technical field of gene sequencing, in particular to a single-molecule target gene library building method and a kit thereof.

Background

Existing target sequencing sample preparation for next-generation sequencing (NGS) requires four steps: library construction, pre-capture amplification, hybridization capture, and post-capture amplification. These steps run in series and none can be skipped; the whole process takes about 2 to 3 days, which costs time and money, and linking the steps together is technically demanding. In addition, the input DNA must be pre-processed before fragmentation, and library fragment size selection is performed after library construction. As a result, existing library construction technology handles severely degraded or trace DNA samples poorly (when the DNA input is below about 20 ng, library quality is difficult to guarantee). Furthermore, the two rounds of pre- and post-capture amplification are both exponential, which introduces substantial errors and amplification bias and produces an excessively high technical error background, so that low-frequency gene mutations (below one in a thousand) cannot be detected.

Disclosure of Invention

According to a first aspect, one embodiment provides a single-molecule target gene library construction method, comprising:

an extension step, comprising mixing a template molecule with a target probe linked in series to a first sequencing adapter, wherein the target probe binds in a targeted manner to a target region of the template molecule and undergoes an extension reaction to obtain a target probe extension product;

a second sequencing adapter ligation step, comprising adding a second sequencing adapter to the reaction system obtained in the extension step, wherein the second sequencing adapter contains a forward strand and a reverse strand that are complementary to each other, the 5' end of the forward strand can be ligated in series to the 3' end of the target probe extension product, and the 3' end of the reverse strand is linked in series to a random nucleotide sequence; reacting to obtain a second sequencing adapter ligation product; and then melting the product and removing the single strand carrying the random nucleotide sequence, to obtain a single-stranded product linked in series to the first sequencing adapter;

and a double-strand synthesis step, wherein a first primer and a second primer are added to the single-stranded product linked in series to the first sequencing adapter and reacted to obtain an amplification product for sequencing on the instrument, wherein the first primer contains a sequence complementary to the first sequencing adapter and the second primer contains a sequence complementary to the second sequencing adapter.
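The extension and ligation steps above can be traced on toy sequences. The following Python sketch is a hypothetical in-silico model, not part of the disclosure: all sequences are made up, the function names are illustrative, and it assumes perfect annealing and extension all the way to the 5' end of the template. It shows that the extension product is the reverse complement of the template up to and including the probe-binding site, and that ligating the second adapter's forward strand completes the single-stranded library molecule.

```python
# Hypothetical toy model of the extension and ligation steps (not the patent's oligos).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a 5'->3' DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extend_probe(template: str, probe_core: str, first_adapter: str, umi: str) -> str:
    """Anneal a probe (5'-adapter-UMI-core-3') to a single-stranded template
    and extend its 3' end all the way to the template's 5' end."""
    site = template.find(revcomp(probe_core))
    if site < 0:
        raise ValueError("probe core is not complementary to the template")
    # The probe's 3' end pairs with template[site]; the polymerase then copies
    # template[site-1], template[site-2], ..., template[0].
    copied = revcomp(template[:site])
    return first_adapter + umi + probe_core + copied


def ligate_forward_strand(extension_product: str, forward_strand: str) -> str:
    """Join the 5'-phosphorylated forward strand of the second adapter
    to the 3'-OH end of the extension product."""
    return extension_product + forward_strand


if __name__ == "__main__":
    template = "CCGATTACAGGTCAAGCTT"          # toy single-stranded template
    core = revcomp(template[8:14])            # probe core that binds template[8:14]
    ext = extend_probe(template, core, first_adapter="ACACTC", umi="NNNN")
    lib = ligate_forward_strand(ext, forward_strand="AGATCG")
    print(ext)
    print(lib)
```

Modelling strings in 5'->3' orientation keeps the bookkeeping simple: the reverse complement of everything after the adapter and tag reproduces the template from its 5' end through the probe-binding site.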

According to a second aspect, an embodiment provides a library constructed by the method of the first aspect.

According to a third aspect, one embodiment provides a kit comprising a first sequencing adapter linked in series to a target probe, the target probe being capable of binding to a target region of a template molecule and undergoing an extension reaction, and a second sequencing adapter having a forward strand and a reverse strand that are complementary to each other, wherein the 5' end of the forward strand can be ligated in series to the 3' end of a target probe extension product and the 3' end of the reverse strand is linked in series to a random nucleotide sequence.

In the single-molecule target gene library construction method and kit, the library construction step and the targeted gene enrichment step are combined and no fragmentation is needed, which effectively shortens the time required for library construction and improves its efficiency. The method or kit is suitable for single-stranded or double-stranded DNA samples of various lengths, cDNA reverse-transcribed from RNA, bisulfite-treated DNA (for DNA methylation sequencing), and various severely degraded DNA (such as DNA extracted from formalin-fixed, paraffin-embedded (FFPE) tissues or forensic samples, and cell-free DNA (cfDNA) in body fluids), and it accommodates a wide range of starting amounts (0.1 to 1000 ng).

Drawings

FIG. 1 is a flow diagram of library construction in one embodiment;

FIG. 2 is a schematic diagram of the second sequencing adapter in one embodiment.

Detailed Description

The present invention is described in further detail below with reference to specific embodiments and the accompanying drawings. Similar elements in different embodiments use associated similar reference numbers. In the following description, many details are set forth so that the present application can be better understood. However, those skilled in the art will readily recognize that some of these features may be omitted in different cases, or replaced by other elements, materials, or methods. In some cases, certain operations related to the present application are not shown or described in detail, so that the core of the present application is not obscured by excessive description; those skilled in the art do not need a detailed description of these operations, as they can fully understand them from the specification and general technical knowledge in the field.

In addition, the features, operations, or characteristics described in the specification may be combined in any suitable manner to form various embodiments. The steps or actions in the method descriptions may also be reordered or adjusted in ways apparent to those skilled in the art. Therefore, the orders given in the specification and drawings serve only to describe particular embodiments clearly and do not imply a required order, unless it is stated that a particular order must be followed.

Ordinal terms used herein for components, such as "first" and "second", serve only to distinguish the described objects and carry no sequential or technical meaning.

Herein, DNA library construction for high-throughput sequencing samples is referred to as library construction for short. A conventional library construction workflow comprises blunting the ends of double-stranded DNA molecules through a series of enzymatic reactions, and then ligating a double-stranded first sequencing adapter and a double-stranded second sequencing adapter to the two ends of the DNA molecules, respectively.

As used herein, "μM" denotes the concentration unit μmol/L (micromoles per liter).

Hairpin structure: a specific spatial structure formed when an inverted repeat sequence folds back and pairs with itself.

In the prior art, the preparation of NGS targeted libraries (including those built with newer methods such as single-stranded library construction) generally follows one of two workflows. The mainstream capture-based method requires four necessary steps, namely library construction, pre-capture amplification, hybridization capture, and post-capture amplification, and the whole process generally takes 2 to 3 days. The other common method is amplicon library construction, which generally performs multiplex PCR first and then builds a library from the PCR products; some commercial kits add the adapter sequences of the corresponding NGS platform to the 5' ends of the multiplex PCR primers, merging the two steps into one.

The first technical route strictly separates library construction from hybridization capture; it has many steps and a long turnaround, and it relies on magnetic bead capture based on the streptavidin-biotin interaction, with expensive, largely imported magnetic beads. The second technical route is simpler than the first, but because it is based on multiplex PCR it has the following problems: 1. the initial investment for library construction is high; 2. the number of target sites (plex number) in one reaction system cannot be too large, so gene detection with a large probe panel is difficult to complete in a single-tube reaction and must be split into several single-tube reactions whose products are then pooled, which greatly increases cost and hands-on time, and the limited throughput per single-tube reaction hinders adoption; 3. PCR requires primers pairing at both ends, so structural variants such as unknown gene fusions (novel fusions) and viral insertion sites cannot be detected; 4. the exponential amplification of PCR makes gene copy number variation undetectable; 5. amplification bias in multiplex PCR inevitably causes low uniformity, leaving some regions targeted by the panel poorly covered and others excessively covered.

In contrast, in some embodiments the two steps of library construction and target gene enrichment are integrated into one workflow. Through linear amplification, this approach not only overcomes the many steps and high cost of the mainstream library construction and hybridization capture workflow, but also overcomes the drawbacks inherent in amplicon library construction, such as low single-tube throughput, poor uniformity, and the inability to reliably detect genomic structural variants and gene copy number variation.

In some embodiments, because the P5 end of the constructed library carries molecular tags, PCR and sequencing errors can be effectively corrected, enabling ultra-low-frequency mutation detection.
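Molecular-tag error correction is commonly done by grouping reads that share a tag and a mapping position into a read family and taking a per-base majority vote, so that a PCR or sequencing error present in only a minority of a family's reads is outvoted. The sketch below illustrates that generic approach under simplifying assumptions (reads already aligned and trimmed to a common start, exact tag matching); the function name and data layout are hypothetical, and it is not the analysis pipeline used in this document.

```python
from collections import Counter, defaultdict


def umi_consensus(reads):
    """Collapse reads into one consensus sequence per (UMI, start) family.

    reads: iterable of (umi, start, sequence) tuples, where sequences in a
    family are assumed to be aligned from the same start coordinate.
    A base is called only if a strict majority of the family supports it;
    otherwise 'N' is emitted.
    """
    families = defaultdict(list)
    for umi, start, seq in reads:
        families[(umi, start)].append(seq)

    consensus = {}
    for key, seqs in families.items():
        length = min(len(s) for s in seqs)
        called = []
        for i in range(length):
            base, count = Counter(s[i] for s in seqs).most_common(1)[0]
            called.append(base if 2 * count > len(seqs) else "N")
        consensus[key] = "".join(called)
    return consensus


reads = [
    ("AAGTC", 1000, "ACGTAC"),
    ("AAGTC", 1000, "ACGTAC"),
    ("AAGTC", 1000, "ACTTAC"),  # a single-read error at position 2
    ("CCTAG", 1000, "ACTTAC"),  # a separate molecule (family of one read)
]
print(umi_consensus(reads))
```

In practice, pipelines usually also require a minimum family size before trusting a consensus, since a one-read family such as the second one above cannot correct anything.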

In some embodiments, the present invention is also advantageous for a wide variety of DNA samples that conventional NGS techniques handle poorly, such as severely degraded and trace samples; many samples in clinical testing fall into this category, including DNA extracted from FFPE (formalin-fixed, paraffin-embedded) samples and cell-free DNA from body fluids (plasma, pleural effusion, urine, etc.). Moreover, the method places no requirement on sample fragment length and does not require fragmenting long intact genomic DNA, saving time and cost.

In some embodiments, the starting amount of sample suitable for the present invention is 0.1 to 1000 ng, and the method is especially suitable for low-input samples.

In a first aspect, some embodiments provide a single-molecule target gene library construction method, comprising:

an extension step, comprising mixing a template molecule with a target probe linked in series to a first sequencing adapter, wherein the target probe binds in a targeted manner to a target region of the template molecule and undergoes an extension reaction to obtain a target probe extension product;

In some embodiments, the template molecule is single-stranded DNA and the target probe extension product is also single-stranded DNA.

In some embodiments, the template molecule may be any type of DNA molecule, including but not limited to single-stranded DNA and single-stranded DNA obtained by melting double-stranded DNA. The method applies to a wide range of starting amounts (0.1 to 1000 ng) and to DNA molecules of various lengths, which eliminates the DNA fragmentation step required before conventional library construction (i.e., breaking long DNA molecules into fragments of about 300 bp by physical or chemical methods; without this step, conventional library construction efficiency drops and the long molecules cannot be sequenced by subsequent high-throughput sequencing).

In some embodiments, the template molecule is derived from at least one of bisulfite-treated DNA, DNA from formalin-fixed, paraffin-embedded (FFPE) tissue, DNA extracted from forensic samples, cell-free DNA (cfDNA) in body fluids, and DNA extracted from ancient fossils or biological remains from archaeological excavations. For double-stranded DNA samples, the strands can be dissociated into single strands, typically by heat denaturation, to obtain the template molecule.

In some embodiments, double-stranded DNA may be melted by the same method as the melting in the second sequencing adapter ligation step, typically heat denaturation.

The invention applies to a wide range of DNA sample types, including samples that conventional NGS library construction cannot handle, such as severely degraded and/or trace samples; it places no requirement on the length of the template molecules in the sample and does not require fragmenting long intact genomic DNA.

In some embodiments, the 5' end of the forward strand of the second sequencing linker is modified with a phosphate group.

In some embodiments, a molecular tag is linked in series between the first sequencing adapter and the target probe.

In some embodiments, the molecular tag is a random nucleotide sequence.

In some embodiments, the molecular tag is 4-19 nt long, for example 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, or 19 nt.

In some embodiments, the random nucleotide sequence linked in series to the 3' end of the reverse strand is 5-15 nt long, for example 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 nt.
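The tag length sets how many distinct molecular tags exist (4^n for n random bases, assuming all four bases are equally likely at each position), and therefore how often two different template molecules starting at the same position are expected to receive the same tag by chance. The following back-of-the-envelope sketch uses the standard birthday-problem approximation; the function names and the molecule count in the example are hypothetical.

```python
def tag_space(length: int) -> int:
    """Number of distinct random tags of the given length (4 bases per position)."""
    return 4 ** length


def expected_colliding_pairs(molecules: int, length: int) -> float:
    """Expected number of molecule pairs sharing a tag (birthday-problem estimate)."""
    pairs = molecules * (molecules - 1) / 2
    return pairs / tag_space(length)


for n in (4, 9, 19):
    print(n, tag_space(n), round(expected_colliding_pairs(1000, n), 6))
```

With a 9-nt tag, the length used in the probe structure of Example 1, there are 262,144 possible tags; longer tags reduce chance collisions at the cost of longer probes.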

In some embodiments, the first primer contains a sequence that is complementarily paired with the first sequencing adapter and the second primer contains a sequence that is complementarily paired with the second sequencing adapter.

In some embodiments, the first primer comprises an inner adapter sequence and an outer adapter sequence, the 5' end of the inner adapter sequence being linked in series to the 3' end of the outer adapter sequence, and the inner adapter sequence being able to pair by reverse complementarity with the first sequencing adapter.

In some embodiments, the first primer may or may not contain a first sample tag.

In some embodiments, when the first primer comprises a first sample tag, the first sample tag is located between the inner and outer adaptor sequences.

In some embodiments, the first sample tag is 4-15 nt long, for example 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 nt.

In some embodiments, the second primer comprises an inner adapter sequence and an outer adapter sequence, the 5' end of the inner adapter sequence being linked in series to the 3' end of the outer adapter sequence, and the inner adapter sequence being able to pair by reverse complementarity with the second sequencing adapter.

In some embodiments, the second primer may or may not contain a second sample tag.

In some embodiments, when the second primer comprises a second sample tag, the second sample tag is located between the inner and outer adaptor sequences.

In some embodiments, the second sample tag is 4-15 nt long, for example 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 nt.

In some embodiments, the extension reaction is run for at least 10 amplification cycles. The number of cycles has no fixed upper limit and can be chosen as needed.

In some embodiments, the extension reaction in the extension step is run for 10-500 cycles.

In some embodiments, in the extension step, each cycle is run as follows: 94-98 ℃ for 10-60 seconds; 55-65 ℃ for 10-60 seconds; 68-72 ℃ for 10-60 seconds.
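The primer layout described above (5' outer adapter sequence, optional sample tag, 3' inner adapter sequence) can be expressed as a small helper that also enforces the stated 4-15 nt tag length. The helper name and the outer and inner sequences in the example are hypothetical placeholders, not the sequences from Tables 2 and 3; only the tag TGATAG is taken from P7-index-1 in Example 1.

```python
VALID_BASES = set("ACGT")


def build_primer(outer: str, inner: str, sample_tag: str = "") -> str:
    """Assemble a primer 5'-outer-[sample_tag]-inner-3'.

    The sample tag is optional; if present it must be 4-15 nt of A/C/G/T.
    """
    if sample_tag:
        if not 4 <= len(sample_tag) <= 15:
            raise ValueError("sample tag must be 4-15 nt long")
        if set(sample_tag) - VALID_BASES:
            raise ValueError("sample tag may contain only A, C, G, T")
    return outer + sample_tag + inner


OUTER = "CCCCCCCCCC"  # placeholder outer adapter sequence (hypothetical)
INNER = "GGGGGGGGGG"  # placeholder inner adapter sequence (hypothetical)

print(build_primer(OUTER, INNER, "TGATAG"))  # indexed primer
print(build_primer(OUTER, INNER))            # primer without a sample tag
```

Keeping the tag between the two adapter segments means the same inner sequence still anneals to the library while the tag is carried into the final molecule.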
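Because each probe primes in one direction only, every cycle of the extension step can at most add one new extension product per template molecule, so copies grow linearly with cycle number, whereas PCR with a primer pair can double the amplifiable molecules each cycle. The sketch below contrasts the two under idealized assumptions (100% efficiency, extension products not re-primed) and estimates the hold time of the cycling program from the stated step durations, ignoring ramp time. The function names are hypothetical.

```python
def linear_copies(cycles: int) -> int:
    """Ideal number of extension products per template after linear cycling."""
    return cycles


def exponential_copies(cycles: int) -> int:
    """Ideal number of molecules per starting molecule after PCR doubling."""
    return 2 ** cycles


def hold_time_minutes(cycles: int, denature_s: int, anneal_s: int, extend_s: int) -> float:
    """Total hold time of the cycling program in minutes (ramping excluded)."""
    return cycles * (denature_s + anneal_s + extend_s) / 60


for n in (10, 20, 30):
    print(n, linear_copies(n), exponential_copies(n))

# Shortest and longest per-step holds from the stated 10-60 s ranges, 10 cycles:
print(hold_time_minutes(10, 10, 10, 10), hold_time_minutes(10, 60, 60, 60))
```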

In some embodiments, the extension step further comprises purifying the target probe extension product after it is obtained.

In some embodiments, the extension product is purified with magnetic beads. Magnetic beads for purification are commercially available, for example, but not limited to, from Nanjing Vazyme Biotech Co., Ltd.

In some embodiments, in the second sequencing adapter ligation step, the ligation reaction is performed at 22-40 ℃ for 0.5-2 hours.

In some embodiments, in the second sequencing linker ligation step, the ligase employed includes, but is not limited to, T4 DNA ligase. T4 DNA ligase is commercially available.

In some embodiments, in the second sequencing linker ligation step, the melting process may generally be a denaturation process.

In some embodiments, the denaturing treatment may generally be a heat denaturing treatment.

In some embodiments, the heat denaturation treatment may specifically be heating the target molecule to at least 80 ℃ for at least 1 min.

In some embodiments, heat denaturation may be performed at 80-98 ℃ for 1-30 min to obtain single-stranded molecules.

In some embodiments, the first sequencing adapter includes, but is not limited to, the P5-end sequencing adapter of the Illumina sequencing platform or the P2-end sequencing adapter of the MGI sequencing platform.

In some embodiments, the second sequencing adapter includes, but is not limited to, the P7-end sequencing adapter of the Illumina sequencing platform or the P1-end sequencing adapter of the MGI sequencing platform.

In some embodiments, the structure of the second sequencing adapter is shown in FIG. 2: it has a forward strand and a reverse strand that are complementary to each other; the 5' end of the forward strand is modified with a phosphate group; the 3' end of the reverse strand is linked in series to a random nucleotide sequence, which may be 5-15 nt long; and the 5' end of the reverse strand carries no phosphate modification.
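The role of the random 3' overhang can be checked on toy sequences. In the duplex, the reverse strand reads as the reverse complement of the forward strand followed by the random bases, so those bases protrude beyond the 5'-phosphorylated end of the forward strand. Because N pairs with any base, the overhang can hybridize to the 3' end of any single-stranded extension product and hold it next to the forward strand for ligation (a splint-ligation arrangement). The sketch below assumes that reading of FIG. 2; the forward-strand sequence, class, and helper names are hypothetical.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a 5'->3' DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass
class SecondAdapter:
    forward: str                      # 5' phosphate; ligated to the product's 3' end
    reverse: str                      # no 5' phosphate; 3' end carries random bases
    forward_5p_phosphate: bool = True
    reverse_5p_phosphate: bool = False


def make_second_adapter(forward: str, n_random: int) -> SecondAdapter:
    """Build the duplex: reverse = revcomp(forward) + 'N' * n_random (5'->3')."""
    if not 5 <= n_random <= 15:
        raise ValueError("random overhang must be 5-15 nt")
    return SecondAdapter(forward=forward, reverse=revcomp(forward) + "N" * n_random)


def overhang_pairs_with(adapter: SecondAdapter, product: str) -> bool:
    """True if the reverse strand's 3' overhang can base-pair with the
    3'-terminal bases of a single-stranded extension product."""
    overhang = adapter.reverse[len(adapter.forward):]
    needed = revcomp(product[-len(overhang):])  # sequence the overhang must match
    return all(o == "N" or o == n for o, n in zip(overhang, needed))


adapter = make_second_adapter("GATTACAGATTACA", n_random=7)  # toy forward strand
print(adapter.reverse)
print(overhang_pairs_with(adapter, "TTTTGGGGCCCCAAAATGCA"))
```

A fixed (non-random) overhang would only splint products ending in one specific sequence, which is why the random bases let a single adapter serve every extension product.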

In a second aspect, in some embodiments, there is provided a library constructed using the method of the first aspect.

In some embodiments, combining library construction with targeted gene enrichment effectively shortens the experimental workflow: the separate steps of library construction, amplification, capture, and re-amplification are not required, material consumption is significantly reduced, and experiment time is shortened.

In some embodiments, the second sequencing adapter is a hairpin adapter that carries no biotin, so no streptavidin magnetic bead capture is needed (streptavidin magnetic beads are expensive).

In some embodiments, each single-tube reaction can target from 1 to 1,000 sites, with each site corresponding to a primer that has a specific target gene binding region.

In a third aspect, some embodiments provide a kit comprising a first sequencing adapter linked in series to a target probe, the target probe being capable of binding to a target region of a template molecule and undergoing an extension reaction, and a second sequencing adapter having a forward strand and a reverse strand that are complementary to each other, wherein the 5' end of the forward strand can be ligated in series to the 3' end of a target probe extension product and the 3' end of the reverse strand is linked in series to a random nucleotide sequence.

In some embodiments, a molecular tag is linked in series between the first sequencing adapter and the target probe;

in some embodiments, the molecular tag is 4-19 nt long.

In some embodiments, the random nucleotide sequence linked in series to the 3' end of the reverse strand of the second sequencing adapter is 5-15 nt long.

In some embodiments, the kit further comprises a first primer containing a sequence complementary to the first sequencing adapter and a second primer containing a sequence complementary to the second sequencing adapter.

The following examples use a library for the widely used Illumina sequencing platform, but the library is compatible with other NGS platforms; only the adapters need to be replaced with those of the corresponding sequencing platform.

Example 1

A cell-free DNA (cfDNA) standard with a mutation frequency of 0.03% (three in ten thousand) was prepared, and three equal 30 ng aliquots were used in three independent library construction experiments: the method of this embodiment (Example 1), an existing hybridization capture library construction method (Comparative Example 1), and an existing amplicon library construction method (Comparative Example 2). The target gene regions designed for the three experiments were essentially the same. The libraries were sequenced on two identical high-throughput sequencing platforms with the same data volume, and the same data analysis pipeline was used to examine mutation detection at the same 8 target gene sites, in order to evaluate the performance differences among the three high-throughput target gene library construction methods.

The standard in this example was purchased from Jinglian Genesis Technologies (Shenzhen) Limited, specifically the lung cancer ctDNA standard set GW-OCTM009, which contains a wild-type DNA standard and a ctDNA standard with a mutation frequency of 0.1%. The two were mixed at a ratio of 7:3 (wild-type standard : 0.1% ctDNA standard) to obtain a diluted standard with a mutation frequency of 0.03%.
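The dilution arithmetic can be checked directly. The sketch assumes the 7:3 ratio is by DNA mass and that both standards contain the same number of genome copies per nanogram; the figure of about 3.3 pg per haploid human genome used for the copy-number estimate is a general approximation, not a value from this document.

```python
def mixed_frequency(freq_a: float, parts_a: float, freq_b: float, parts_b: float) -> float:
    """Mutation frequency of a mixture of two standards combined at parts_a:parts_b."""
    return (freq_a * parts_a + freq_b * parts_b) / (parts_a + parts_b)


TOTAL_NG = 30.0
WT_PARTS, MUT_PARTS = 7, 3
freq = mixed_frequency(0.0, WT_PARTS, 0.001, MUT_PARTS)  # wild type : 0.1% standard = 7:3

wt_ng = TOTAL_NG * WT_PARTS / (WT_PARTS + MUT_PARTS)
mut_ng = TOTAL_NG * MUT_PARTS / (WT_PARTS + MUT_PARTS)

PG_PER_HAPLOID_GENOME = 3.3  # approximate value, assumed for illustration
genome_copies = TOTAL_NG * 1000 / PG_PER_HAPLOID_GENOME
mutant_copies = genome_copies * freq

print(f"mixture frequency: {freq:.4%}")
print(f"wild type: {wt_ng:.1f} ng, 0.1% standard: {mut_ng:.1f} ng")
print(f"~{genome_copies:.0f} haploid genome copies, ~{mutant_copies:.1f} mutant copies per locus")
```

Under these assumptions, each target site in the 30 ng input is represented by only a few mutant molecules, which illustrates why single-molecule error correction matters at this frequency.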

The target detection sites are shown in table 1.

TABLE 1

All required oligonucleotides (oligos) are listed in Tables 2 and 3 below (synthesized by Tsingke Biotechnology Co., Ltd., Beijing, with HPLC purification).

TABLE 2

TABLE 3

The structure of the target gene probe carrying the first sequencing adapter (IS1-UMI-GSP) is illustrated as follows:

"ACACTCTTTCCCTACACGACGCTCTTCCGATCT" is the first sequencing adapter, "NNNNNNNNN" is the molecular tag, and "xxxxxxxxxxxxxxxxxx" is the sequence complementary to the target gene region.

In the structure of P7-index-1, the underlined sequence "TGATAG" is the sample tag.

The symbols in Tables 2 and 3 are described as follows: (1) IS2revcomp-sp-Pho and IS2-spline are annealed to form the hairpin adapter, i.e., the second sequencing adapter. The mixture containing the probe pool shown in Table 3 is named the Rui Fa 4 gene panel.

(2) "N" represents a random base.

(3) "X" represents the sequence complementary to the target gene region, 20 nucleotides long; one probe is placed every 10 nucleotides along the target gene region, i.e., 2× tiling coverage.

(4) "Pho" represents a phosphate group.

(5) "" indicates a phosphorothioate bond, which strengthens the linkage between nucleotides and protects the oligonucleotide from degradation.

(6) In the reverse strand of the second sequencing adapter, "spacer C12" denotes a 12-carbon spacer, used to prevent non-specific primer binding.

(7) In the reverse strand of the second sequencing adapter, "AmC6" denotes an amino modification on a six-carbon linker, which blocks the 3' end of the oligonucleotide.
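The 2× tiling described in item (3) can be sketched as follows: 20-nt probes whose start positions advance by 10 nt, so every interior base of a target region is covered by two probes and the first and last 10 bases by one. The region length in the example and the function names are hypothetical, and the sketch handles only the layout; it does not design the probe sequences themselves.

```python
def tile_starts(region_len: int, probe_len: int = 20, step: int = 10) -> list[int]:
    """0-based start positions of probes tiled across a target region."""
    return list(range(0, region_len - probe_len + 1, step))


def tiling_depth(region_len: int, starts: list[int], probe_len: int = 20) -> list[int]:
    """Number of probes covering each base of the region."""
    depth = [0] * region_len
    for s in starts:
        for i in range(s, s + probe_len):
            depth[i] += 1
    return depth


starts = tile_starts(100)
depth = tiling_depth(100, starts)
print(starts)
print(min(depth[10:90]), max(depth))
```

If the region length minus 20 is not a multiple of 10, the last few bases fall outside the final probe, so a real design would typically add one extra probe anchored at the region end.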

The reagents and instruments are as follows:

1. T4 DNA Ligase (Rapid) (Cat. N103-01, Nanjing Vazyme Biotech Co., Ltd.) was used for all ligation reactions.

2. Library amplification was performed with VAHTS HiFi Amplification Mix (Cat. N616-01, Nanjing Vazyme Biotech Co., Ltd.).

3. PCR products were purified with VAHTS DNA Clean Beads (Cat. N411-01, Nanjing Vazyme Biotech Co., Ltd.).

4. The control group used an internationally common methylation library kit (for Illumina) (Swift Biosciences, USA, Catalog No. 30024).

5. Extension of the primer that is reverse-complementary to the single-stranded adapter was performed with DNA Polymerase I, Klenow Fragment (Cat. N104-01, Nanjing Vazyme Biotech Co., Ltd.).

6. The T4 RNA ligase buffer (10X) and FastAP (1 U/μL) required for the DNA sample dephosphorylation reaction were NEB product B0216L and Invitrogen (Shanghai) Trading Co., Ltd. product EF0651, respectively.

7. Streptavidin magnetic beads for binding single-stranded ligation products: Dynabeads™ MyOne™ Streptavidin C1 (Invitrogen (Shanghai) Trading Co., Ltd., Cat. 65001).

8. The ultrapure water used in every experimental step was UltraPure™ DNase/RNase-Free Distilled Water (Invitrogen (Shanghai) Trading Co., Ltd., Cat. 10977023).

9. Instruments: ABI Veriti 96 PCR instrument (Invitrogen (Shanghai) Trading Co., Ltd.), thermostatic mixer (Hangzhou Youning, Cat. HC-100), four-dimensional rotary mixer (BE-1100, Haimen Qilin-Beier Instrument Manufacturing Co., Ltd.), magnetic rack (Wuxi Baige Biotechnology Co., Ltd., Cat. BMB16-1.5-2), Qubit™ 4 Fluorometer with WiFi (Invitrogen (Shanghai) Trading Co., Ltd., Cat. Q33238), Bioptic fully automated multiplex nucleic acid analysis system (Hangzhou Kagazei Biotech Limited, Cat. Qsep-100), and Eppendorf pipettes with 1000 μL, 100 μL, and 10 μL ranges (Eppendorf, Germany).

The TE buffer composition of this example was as follows: 10mmol/L Tris-HCl, 1mmol/L EDTA, pH 8.0.

As shown in fig. 1, the experimental procedure is as follows:

1. The Jinglian lung cancer ctDNA standard set GW-OCTM009 contains a wild-type DNA standard and a ctDNA standard with a mutation frequency of 0.1%. Mix wild-type DNA standard : 0.1% ctDNA standard = 7:3 to form 30 ng of a cfDNA sample with a mutation frequency of 0.03%, and denature it at 95 ℃ for 2 minutes.

2. Preparation of the second sequencing adapter

2.1 Set up the following reaction system in a 200 μL PCR tube:

TABLE 4

2.2 Annealing conditions: 95 ℃ for 10 seconds, then cool to 14 ℃ with the ramp set to 4%, at a rate of 0.1 ℃/s.

2.3 Add TE buffer to the above reaction product (50 μL) so that the final concentration of the second sequencing adapter in the resulting system is 100 μM. The prepared product can be stored at -20 ℃ long term, or at 4 ℃ for 8 hours.

3. Mix the target gene probes in Table 3 that carry the first sequencing adapter at their 5' ends in equimolar amounts; the final concentration of these probes in the mixture is 200 μM.

4. Anneal and extend the mixture of target gene probes carrying the first sequencing adapter at the 5' end, as follows:

For each single-tube reaction, the number of target gene sites to be detected can range from 1 to 10,000, with each site corresponding to a primer that has a specific target gene binding region, so up to 10,000 probes can be mixed in one single-tube reaction. This example has 8 target detection sites, as shown in Table 1.

Set up the following reaction system in a 200 μL PCR tube:

TABLE 5

Component	Volume (μL)
Target gene probe mixture with first sequencing adapter at 5' end	5
cfDNA sample, 0.03% mutation frequency, 30 ng (20 ng/μL)	1.5
Ultrapure water	18.5
VAHTS HiFi Amplification Mix	25
Total volume	50

Vortex to mix, centrifuge briefly, and place in a PCR instrument for the following reaction:

The multiplex target-site primers carrying the Illumina P5 adapter and a molecular tag at their 5' ends are annealed to and extended along the target regions of the genome in a PCR instrument under the following conditions:

TABLE 6

After the reaction, purify the product with VAHTS DNA Clean Beads following the standard bead-based PCR product purification procedure, and in the final step elute the product with 20 μL of ultrapure water.

5. Set up the following reaction system in a 200 μL PCR tube:

TABLE 7

The total reaction volume is 50 μL. Perform the ligation at 37 ℃ for 1 hour in a PCR instrument, then heat at 95 ℃ for 1-10 minutes to denature the ligation product into single strands, thereby removing the N-carrying complementary strand of the hairpin adapter.

6. Set up the following reaction system (Illumina indexing PCR) directly in the 200 μL PCR tube containing the second adapter ligation product:

TABLE 8

Component	Volume (μL)
Second adapter ligation product	48
P7-index-1	0.1
P7-index-2	0.1
P7-index-3	0.1
P7-index-4	0.1
P7-index-5	0.1
P7-index-6	0.1
P7-index-7	0.1
P7-index-8	0.1
P7-index-9	0.1
P7-index-10	0.1
IS4_indPCR.P5	1
VAHTS HiFi Amplification Mix	50
Total volume	100

Prepare the 100 μL reaction system shown in Table 8 and run PCR under the following conditions:

TABLE 9

After the reaction, purify the product with VAHTS DNA Clean Beads following the standard bead-based PCR product purification procedure, and finally elute with 25 μL of ultrapure water to obtain an Illumina target gene library carrying sample tags at the P7 end, ready for sequencing.
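The volumes and ramp settings given in steps 2 to 6 can be cross-checked with simple arithmetic. The sketch assumes that the 200 μM figure in step 3 refers to the probe mixture as pipetted into the Table 5 reaction, and it ignores instrument overshoot and hold times during the annealing ramp; the dictionary keys are shorthand labels, not reagent names from the document.

```python
import math

# Table 5: extension reaction (μL)
table5 = {"probe mixture": 5.0, "cfDNA sample": 1.5, "ultrapure water": 18.5, "HiFi mix": 25.0}
# Table 8: indexing PCR (μL); ten P7 index primers at 0.1 μL each
table8 = {"ligation product": 48.0, "IS4_indPCR.P5": 1.0, "HiFi mix": 50.0}
table8.update({f"P7-index-{i}": 0.1 for i in range(1, 11)})

ext_total = math.fsum(table5.values())
pcr_total = math.fsum(table8.values())
dna_input_ng = table5["cfDNA sample"] * 20.0                 # 20 ng/μL stock
probe_final_uM = 200.0 * table5["probe mixture"] / ext_total  # C1 * V1 / V2
ramp_seconds = (95 - 14) / 0.1                                # step 2.2: 0.1 ℃/s

print(ext_total, pcr_total, dna_input_ng, probe_final_uM, round(ramp_seconds / 60, 1))
```

Together the ten P7 index primers contribute 1 μL, matching the 1 μL of IS4_indPCR.P5, and the annealing ramp from 95 ℃ to 14 ℃ takes roughly 13.5 minutes at the stated rate.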

Comparative example 1

This comparative example provides a hybridization capture control experiment.

The Jingliang Gene lung cancer ctDNA reference standard set (GW-OCTM009), which contains a wild-type DNA standard and a ctDNA standard with a mutation frequency of 0.1%, was used. The wild-type DNA standard and the 0.1% ctDNA standard were mixed at a ratio of 7:3 to form a DNA sample with a mutation frequency of 0.03% and a total mass of 30 ng. With the same probe library as in Example 1 (the Rui Fa 4-gene panel), a sequencing-ready library was constructed using the library construction and hybridization capture kits from Nanjing GenScript Biotech according to the standard operating procedures, including pre-capture amplification, hybridization capture and post-capture amplification.
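The 0.03% target follows from diluting the 0.1% standard with wild-type DNA. A minimal sketch, assuming the 7:3 ratio is by mass, the wild-type standard carries no mutant alleles, and both standards contribute the same number of genome copies per nanogram:

```python
def mixed_maf(parts_wt: float, parts_mut: float, mut_maf_pct: float) -> float:
    """Expected mutation frequency (%) of a blend of wild-type DNA (assumed 0%)
    and a mutant standard, assuming the ratio is by mass."""
    return mut_maf_pct * parts_mut / (parts_wt + parts_mut)


def split_mass(total_ng: float, parts_wt: float, parts_mut: float) -> tuple:
    """Mass (ng) of each standard needed for `total_ng` of the blend."""
    whole = parts_wt + parts_mut
    return total_ng * parts_wt / whole, total_ng * parts_mut / whole


print(f"{mixed_maf(7, 3, 0.1):.3f}%")  # 0.030%
print(split_mass(30, 7, 3))            # (21.0, 9.0)
```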

Comparative example 2

This comparative example provides an amplicon library construction control experiment (based on multiplex PCR).

The Jingliang Gene lung cancer ctDNA reference standard set (GW-OCTM009), which contains a wild-type DNA standard and a ctDNA standard with a mutation frequency of 0.1%, was used: the wild-type DNA standard and the 0.1% ctDNA standard were mixed at a ratio of 7:3 to obtain 30 ng of a DNA sample with a mutation frequency of 0.03%. With the same probe library as in Example 1 (the Rui Fa 4-gene panel), amplicon library construction was performed with the amplicon library construction kit from Nanjing GenScript Biotech according to the standard operating procedure to obtain a sequencing-ready library.

Sequencing

The library products prepared in Example 1, Comparative Example 1 and Comparative Example 2 were quantified with a Qubit 4.0, and 20 ng of each library was sequenced. Sequencing was performed on an Illumina HiSeq 4000 with a PE150 strategy and 1 Gb of data per sample.

Sequencing data quality control and analysis process

Raw data were processed with fastp, aligned to the GRCh38 reference genome (also known as hg38, the internationally used human reference genome sequence) with BWA (Burrows-Wheeler Aligner, BWA-MEM algorithm), and duplicates were marked with sambamba (markdup).
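The analysis chain above can be written down as a small command builder. This is a sketch under stated assumptions: the file names, thread count and the intermediate SAM-to-BAM conversion and sort steps are illustrative choices, not parameters given in the text, and the GRCh38 FASTA is assumed to have been indexed beforehand with `bwa index`. The builder only assembles argument lists; `run` executes them on a machine where fastp, BWA and sambamba are installed.

```python
import subprocess


def build_commands(r1: str, r2: str, ref_fa: str, prefix: str, threads: int = 8) -> list:
    """Return (argv, stdout_path) pairs for QC -> alignment -> duplicate marking.
    stdout_path is None when the tool writes its own output file."""
    t = str(threads)
    c1, c2 = f"{prefix}.clean_R1.fq.gz", f"{prefix}.clean_R2.fq.gz"
    sam, bam = f"{prefix}.sam", f"{prefix}.bam"
    sorted_bam, dedup_bam = f"{prefix}.sorted.bam", f"{prefix}.markdup.bam"
    return [
        # Read trimming/filtering plus HTML and JSON QC reports.
        (["fastp", "-i", r1, "-I", r2, "-o", c1, "-O", c2,
          "-h", f"{prefix}.fastp.html", "-j", f"{prefix}.fastp.json", "-w", t], None),
        # BWA-MEM writes SAM to stdout.
        (["bwa", "mem", "-t", t, ref_fa, c1, c2], sam),
        # SAM -> BAM, coordinate sort, then mark duplicates.
        (["sambamba", "view", "-S", "-f", "bam", "-t", t, "-o", bam, sam], None),
        (["sambamba", "sort", "-t", t, "-o", sorted_bam, bam], None),
        (["sambamba", "markdup", "-t", t, sorted_bam, dedup_bam], None),
    ]


def run(commands: list) -> None:
    """Execute the steps in order, stopping at the first failure."""
    for argv, stdout_path in commands:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)
```

For example, `run(build_commands("S1_R1.fq.gz", "S1_R2.fq.gz", "GRCh38.fa", "S1"))` would process one hypothetical sample named S1.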

The analytical results were as follows:

The sequencing reads obtained from the Example 1 library were split by the 10 indexes as shown in the following table:

TABLE 10

Index No.	Number of reads	Proportion
1	666670	9.99%
2	668281	10.01%
3	666578	9.99%
4	666694	9.99%
5	666646	9.99%
6	666765	9.99%
7	666473	9.98%
8	666657	9.99%
9	669152	10.02%
10	666349	9.98%
Reads not assigned to any index	4673	0.07%
Total reads	6674938	100.00%

As the table shows, reads are distributed evenly across the indexes (each index yields a similar number of reads), and reads that cannot be assigned to any index account for only about seven in ten thousand (0.07%) of the total reads. This indicates that the P7-end indexing amplification system used in Example 1 allows multiple samples to be pooled accurately for target gene library construction and sequencing.
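The proportions in Table 10 can be recomputed from the raw counts. The sketch below transcribes the table; the max/min ratio is simply one illustrative evenness summary added here, not a metric reported in the text.

```python
# Read counts per P7 index, transcribed from Table 10.
INDEX_READS = {1: 666670, 2: 668281, 3: 666578, 4: 666694, 5: 666646,
               6: 666765, 7: 666473, 8: 666657, 9: 669152, 10: 666349}
UNASSIGNED = 4673

total = sum(INDEX_READS.values()) + UNASSIGNED
print(f"total reads: {total}")                  # 6674938, as in Table 10
for idx, n in INDEX_READS.items():
    print(f"index {idx}: {n / total:.2%}")
print(f"unassigned: {UNASSIGNED / total:.2%}")  # 0.07%

# One simple evenness summary: largest / smallest per-index count.
spread = max(INDEX_READS.values()) / min(INDEX_READS.values())
print(f"max/min ratio: {spread:.4f}")           # 1.0042
```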

The mutation detection results are given in the following table:

TABLE 11

In Table 11, raw base refers to the amount of raw sequencing data.

GC content is the proportion of guanine (G) and cytosine (C) bases among all bases.

Q30 is the proportion of bases with a Phred quality score of at least 30, i.e., a base-call accuracy of at least 99.9%.

Depth in target refers to the sequencing depth at the target sites.

ref_reads is the number of reads supporting the human reference genome allele at the site.

alt_reads is the number of reads supporting the mutant (variant) allele.

MAF (mutant allele frequency) is the mutation frequency, specifically the ratio of alt_reads to ref_reads.
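The metrics defined above can be expressed as small functions. This sketch uses a hypothetical read, quality string and read counts, and assumes Phred+33 quality encoding (the usual Illumina FASTQ encoding). MAF is computed both as defined in the text (alt_reads / ref_reads) and with the common alt_reads / (alt_reads + ref_reads) convention; at frequencies around 0.03% the two differ only in the fifth significant digit.

```python
def gc_content(seq: str) -> float:
    """Fraction of G + C among the A/C/G/T bases of a read (N is ignored)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    return sum(b in "GC" for b in bases) / len(bases)


def q30_fraction(qual: str, offset: int = 33) -> float:
    """Fraction of bases with Phred quality >= 30 (error rate <= 0.1%),
    decoding a Phred+33 FASTQ quality string."""
    return sum(ord(c) - offset >= 30 for c in qual) / len(qual)


def maf_as_defined(alt_reads: int, ref_reads: int) -> float:
    """MAF as defined in the text: alt_reads / ref_reads."""
    return alt_reads / ref_reads


def maf_of_depth(alt_reads: int, ref_reads: int) -> float:
    """Common alternative convention: alt_reads / (alt_reads + ref_reads)."""
    return alt_reads / (alt_reads + ref_reads)


# Hypothetical read and counts, for illustration only.
print(gc_content("ACGTGGCCAT"))    # 0.6
print(q30_fraction("IIIII#####"))  # 0.5  ('I' = Q40, '#' = Q2)
print(f"{maf_as_defined(3, 9997):.5%}  {maf_of_depth(3, 9997):.5%}")
```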

As the table shows, the sequencing data from the Example 1 library are of higher quality than those from the libraries built with the two prior-art methods: the Q30 proportion is higher, and the target mutation frequency detected with the Example 1 library is closer to the true value, i.e., the MAF is closer to the preset value of 0.03% (three in ten thousand). The library construction method of Example 1 therefore performs better and is faster for sequencing specific target genes in complex genomes such as the human genome: hybridization-capture library construction (Comparative Example 1) takes 72 to 80 hours and amplicon library construction (Comparative Example 2) takes 24 to 32 hours, whereas the method of Example 1 takes only 10 hours. Example 1 also requires fewer steps, reagents and consumables, so its cost is lower. In conclusion, the library construction method of Example 1 has broad application prospects in clinical testing, medical research and genomics research.

Example 2

In this example, a tumor mutation standard with a mutation frequency of 0.05% (five in ten thousand) was prepared from DNA extracted from formalin-fixed, paraffin-embedded (FFPE) tissue standards (purchased from Jingliang Gene Technology (Shenzhen) Co., Ltd.; specifically, a tumor wild-type FFPE standard and a tumor SNV 5% FFPE standard). Three equal 30 ng aliquots of this DNA standard were used in three independent library construction experiments, in which the method of this example, the hybridization-capture method and the amplicon method, respectively, were used to prepare the sequencing libraries; the target gene regions designed for the three groups were essentially the same. The libraries were then sequenced on two identical high-throughput sequencing platforms to the same data volume, and the same data analysis pipeline was used to examine the same 7 target mutation sites (located in exons of 4 genes, NRAS, KRAS, PIK3CA and EGFR, which are covered by the Rui Fa 4-gene panel), in order to evaluate the performance differences among the three high-throughput target gene library construction technologies (this example, hybridization capture and amplicon).

The DNA standards were purchased from Jingliang Gene Technology (Shenzhen) Co., Ltd., specifically a tumor wild-type FFPE standard (mutation frequency 0, cat. no. GW-OPSM005) and a tumor SNV 5% FFPE standard (cat. no. GW-OPSM003).

DNA was extracted from the FFPE standards using the magnetic-bead paraffin-embedded tissue DNA extraction kit (cat. no. D6323-02B) from Guangzhou Meiji Biotechnology Co., Ltd.

Total FFPE DNA was fragmented (i.e., long fragments of 10 kb or more were cut into short fragments of 200-500 bp) using the KAPA Frag Kit for Enzymatic Fragmentation (cat. no. KK8600) from KAPA Biosystems, USA.

The target detection sites are shown in the table below.

TABLE 12

All oligonucleotides required are listed in Tables 13 and 14 below (synthesized by Tsingtaury Biotechnology Limited, Tokyo; HPLC-purified).

TABLE 13

TABLE 14

The reagents and apparatus were the same as in example 1.

The experimental procedure was as follows:

1. Total DNA was extracted from the tumor wild-type FFPE standard (mutation frequency 0, cat. no. GW-OPSM005) and the tumor SNV 5% FFPE standard (cat. no. GW-OPSM003), both purchased from Jingliang Gene Technology (Shenzhen) Co., Ltd., using the magnetic-bead paraffin-embedded tissue DNA extraction kit (cat. no. D6323-02B) from Guangzhou Meiji Biotechnology Co., Ltd. according to the kit's standard operating procedure, and the DNA extracts were finally eluted in 50 μL.

2. Concentrations were measured with a Qubit 4.0: the wild-type and 5% SNV FFPE DNA were at 15.54 ng/μL and 14.78 ng/μL, respectively, for total amounts of 777 ng and 739 ng. 297 ng of the tumor wild-type FFPE standard DNA and 3 ng of the tumor SNV 5% FFPE standard DNA were mixed (a mass ratio of 99:1) to form 300 ng of FFPE DNA with a mutation frequency of 0.05%, and the mixture was vortexed thoroughly.
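The figures in step 2 follow from the Qubit readings and the 50 μL elution volume. The sketch below reproduces them and also derives the pipetting volumes for the 297 ng and 3 ng portions, which the text does not state; it assumes the 99:1 ratio is by mass and the wild-type standard carries no mutant alleles.

```python
def volume_ul(mass_ng: float, conc_ng_per_ul: float) -> float:
    """Volume (uL) of a DNA solution that contains `mass_ng`."""
    return mass_ng / conc_ng_per_ul


WT_CONC, SNV_CONC = 15.54, 14.78  # ng/uL, Qubit readings from step 2
ELUATE_UL = 50.0                  # elution volume from step 1
WT_NG, SNV_NG = 297.0, 3.0        # 99:1 by mass
SNV_MAF_PCT = 5.0                 # mutation frequency of the SNV standard

print(f"WT total:  {WT_CONC * ELUATE_UL:.0f} ng")           # 777 ng
print(f"SNV total: {SNV_CONC * ELUATE_UL:.0f} ng")          # 739 ng
print(f"WT volume:  {volume_ul(WT_NG, WT_CONC):.2f} uL")    # 19.11 uL
print(f"SNV volume: {volume_ul(SNV_NG, SNV_CONC):.2f} uL")  # 0.20 uL
maf_pct = SNV_MAF_PCT * SNV_NG / (WT_NG + SNV_NG)
print(f"expected MAF: {maf_pct:.2f}%")                      # 0.05%
```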

3. 30 ng of the product from the previous step was transferred to a 200 μL PCR tube and fragmented with the KAPA Frag Kit for Enzymatic Fragmentation (KAPA Biosystems, USA).

4. The product of the previous step (still in the original 200 μL PCR tube) was denatured at 95 ℃ for 2 minutes in a PCR instrument.

5. Preparation of the second sequencing adaptor

5.1 The following reaction system was set up in a 200 μL PCR tube:

TABLE 15

5.2 Annealing conditions: 95 ℃ for 10 seconds, then cooling to 14 ℃ with a ramp setting of 4%, at a rate of 0.1 ℃/s.

5.3 50 μL of TE buffer was added to the reaction product (50 μL), giving a final second sequencing adaptor concentration of 100 μM.
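The annealed adaptor and the numbers in steps 5.2 and 5.3 can be checked in a short sketch. It assumes that sequence-listing entries 1 and 2 are the forward and reverse strands of the second sequencing adaptor (entry 2 carries the 3' random tail described in claim 4); under that assumption the strands pair over 34 bp, leaving a 2-nt 5' extension and a 7-N 3' tail on the reverse strand, and the 7-nt tail falls within the claimed 5-15 nt range. The 5' phosphate on the forward strand (claim 3) is not represented in a plain string.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string; N maps to N."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def describe_duplex(fwd: str, rev: str) -> dict:
    """Locate the region of `rev` that pairs with the whole of `fwd` and report
    the single-stranded overhangs left on `rev`."""
    site = revcomp(fwd)
    start = rev.find(site)
    if start < 0:
        raise ValueError("strands are not fully complementary")
    return {"5p_overhang": rev[:start], "paired_bp": len(site),
            "3p_overhang": rev[start + len(site):]}


FWD = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"               # entry 1 (34 nt)
REV = "AAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT" + "N" * 7  # entry 2 (43 nt)

d = describe_duplex(FWD, REV)
print(d)  # {'5p_overhang': 'AA', 'paired_bp': 34, '3p_overhang': 'NNNNNNN'}

# Step 5.3: 50 uL annealed adaptor + 50 uL TE gives 100 uM, so C1 = C2 * V2 / V1.
stock_um = 100.0 * (50.0 + 50.0) / 50.0  # 200.0 uM before dilution
# Step 5.2: cooling from 95 to 14 degC at 0.1 degC/s.
ramp_s = (95 - 14) / 0.1                 # about 810 s (13.5 min)
print(stock_um, f"{ramp_s:.0f} s")
```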

6. The target gene probes carrying the first sequencing adaptor at their 5' ends (Table 14) were mixed in equimolar amounts to give a mixture in which each probe is at a final concentration of 200 μM.
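Equimolar pooling is a C1V1 = C2V2 calculation per probe. The sketch below assumes all probe stocks share one concentration; the probe count, stock concentration and pool volume in the example are hypothetical. It also makes explicit that, for n probes each at 200 μM in the pool, every stock must be at least n × 200 μM.

```python
def equimolar_pool(n_probes: int, stock_um: float, final_each_um: float,
                   total_ul: float) -> tuple:
    """Per-probe stock volume and diluent volume (uL) for an equimolar pool in
    which every probe ends at `final_each_um` (all stocks at `stock_um`)."""
    per_probe_ul = final_each_um * total_ul / stock_um  # C1*V1 = C2*V2
    diluent_ul = total_ul - n_probes * per_probe_ul
    if diluent_ul < -1e-9:
        max_each = stock_um / n_probes
        raise ValueError(f"stocks too dilute: at most {max_each:g} uM per probe")
    return per_probe_ul, max(diluent_ul, 0.0)


# Hypothetical: 4 probes at 1000 uM each, pooled to 200 uM each in 50 uL.
print(equimolar_pool(4, 1000.0, 200.0, 50.0))  # (10.0, 10.0)
```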

7. Annealing and extension of the target gene probe mixture (probes carrying the first sequencing adaptor at their 5' ends) were performed as follows:

For each single-tube reaction, the number of target sites to be detected can range from 1 to 10,000, and each site corresponds to a primer with a specific target-binding region, so up to 10,000 probes can be mixed in a single-tube reaction. This example has 7 target sites, as shown in Table 12.

The following reaction system was set up in a 200 μL PCR tube:

TABLE 16

TABLE 17

8. The following reaction system was set up in a 200 μL PCR tube:

TABLE 18

Component	Volume (μL)
Target gene probe extension product	19
2× Rapid Ligation Buffer	25
Second sequencing adaptor (100 μM)	2
T4 DNA Ligase (Rapid) (600 U/μL)	4
Total volume	50
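The working concentrations in the Table 18 ligation follow directly from the listed volumes. This sketch only transcribes the table and applies C1V1 = C2V2 (μM × μL gives pmol).

```python
# Volumes (uL) transcribed from Table 18.
TABLE_18 = {
    "target gene probe extension product": 19.0,
    "2x Rapid Ligation Buffer": 25.0,
    "second sequencing adaptor (100 uM)": 2.0,
    "T4 DNA Ligase (Rapid) (600 U/uL)": 4.0,
}

total_ul = sum(TABLE_18.values())                                      # 50.0 uL
adaptor_pmol = 100.0 * TABLE_18["second sequencing adaptor (100 uM)"]  # 200.0 pmol
adaptor_final_um = adaptor_pmol / total_ul                             # 4.0 uM
buffer_final_x = 2.0 * TABLE_18["2x Rapid Ligation Buffer"] / total_ul  # 1.0x
ligase_units = 600.0 * TABLE_18["T4 DNA Ligase (Rapid) (600 U/uL)"]    # 2400.0 U

print(total_ul, adaptor_pmol, adaptor_final_um, buffer_final_x, ligase_units)
```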

9. The Illumina indexing PCR reaction system shown in Table 8 was set up directly in the 200 μL PCR tube containing the second-adaptor ligation product.

A 100 μL reaction system was prepared as shown in Table 8 and subjected to PCR under the conditions shown in Table 9.

Comparative example 3

This comparative example provides a hybridization capture control experiment.

The tumor wild-type FFPE standard and the tumor SNV 5% FFPE standard purchased from Jingliang Gene Technology (Shenzhen) Co., Ltd. were mixed at a ratio of 99:1 (wild-type : SNV 5%) to obtain 30 ng of a DNA sample with a mutation frequency of 0.05%. Using the same probe library as in Example 2, i.e., the Rui Fa 4-gene panel (without IS1-UMI-EGFR V769_D770insASV-1 and IS1-UMI-EGFR V769_D770insASV-2), the library was constructed with the library construction and hybridization capture kits from Nanjing GenScript Biotech according to the standard procedure, including pre-capture amplification, hybridization capture and post-capture amplification, and then sequenced.

Comparative example 4

The tumor wild-type FFPE standard and the tumor SNV 5% FFPE standard purchased from Jingliang Gene Technology (Shenzhen) Co., Ltd. were mixed at a ratio of 99:1 (wild-type : SNV 5%) to obtain 30 ng of a DNA sample with a mutation frequency of 0.05%. Using the same probe library as in Example 2, i.e., the Rui Fa 4-gene panel (without IS1-UMI-EGFR V769_D770insASV-1 and IS1-UMI-EGFR V769_D770insASV-2), amplicon library construction was performed with the amplicon library construction kit purchased from Osrui Biotechnology, Inc. according to the standard procedure, and the library was then sequenced.

Sequencing

The library products prepared in Example 2, Comparative Example 3 and Comparative Example 4 were quantified with a Qubit 4.0, and 20 ng of each library was sequenced. Sequencing was performed on an Illumina HiSeq 4000 with a PE150 strategy and 1 Gb of data per sample.

Sequencing data quality control and analysis process

The analytical results were as follows:

The sequencing reads obtained from the Example 2 library were split by the 10 indexes as shown in the following table:

TABLE 19

As the table shows, reads are distributed evenly across the indexes (each index yields a similar number of reads), and reads that cannot be assigned to any index account for about seven in ten thousand of the total reads. This indicates that the P7-end sample-indexing amplification system used in Example 2 allows multiple samples to be pooled accurately for target gene library construction and sequencing.

The mutation detection results are given in the following table:

TABLE 20

As the table shows, the sequencing data from the Example 2 library are of higher quality than those from the libraries built with the two prior-art methods: the Q30 proportion is higher, and the target mutation frequency detected with the Example 2 library is closer to the true value, i.e., the MAF is closer to the preset value of 0.05% (five in ten thousand). The library construction method of Example 2 therefore performs better and is faster for sequencing specific target genes in complex genomes such as the human genome: hybridization-capture library construction (Comparative Example 3) takes 72 to 80 hours and amplicon library construction (Comparative Example 4) takes 24 to 32 hours, whereas the method of Example 2 takes only 10 hours. Example 2 also requires fewer steps, reagents and consumables, so its cost is lower. In conclusion, the library construction method of Example 2 has broad application prospects in clinical testing, medical research and genomics research.

In some embodiments, the invention is applicable to a wide variety of DNA samples, including severely degraded and trace samples that are not amenable to conventional NGS techniques; it places no requirement on sample length and does not require fragmentation of long, intact genomic DNA.

In some embodiments, the present invention integrates steps that are separate in existing library construction technologies; the workflow is short, taking only about 5 hours, and is easy to operate.

In some embodiments, the present invention uses self-developed reagents and can completely eliminate dependence on imported reagents.

In some embodiments, the starting template for library construction is single-stranded, which makes the method suitable for severely degraded and trace samples.

In some embodiments, because the method relies on linear amplification and direct molecular tagging of target gene molecules, errors and biases caused by exponential amplification are reduced, enabling quantitative detection, ultra-low-frequency mutation detection and detection of genomic structural variants such as copy number variations, fusion genes and viral insertion sequences.

Existing NGS library construction requires fragmenting the DNA sample to 200-500 bp to match the actual read length of NGS sequencing (the most common mode at present is PE150, i.e., paired-end sequencing with 150 bp per read), and involves complex fragment-length selection during library construction (generally a two-step size-selection method).

In some embodiments, the starting amount can be as low as 0.1 ng, truly enabling library construction from ultra-low DNA input.

In some embodiments, the present invention is directly compatible with RNA samples after reverse transcription into cDNA and does not require double-strand (second-strand) synthesis, which saves material and time and avoids the errors and biases associated with random primers in existing double-strand synthesis.

The present invention has been described with specific examples, which are provided to aid understanding of the invention and are not intended to limit it. Those skilled in the art may make several simple deductions, modifications or substitutions based on the idea of the invention.

<110> Organization Name: Shenzhen Ruifa Biotechnology Co., Ltd.

Application Project

-------------------

<120> Title: Single-molecule target gene library building method and kit thereof

<130> AppFileReference : 20I30442

<140> CurrentAppNumber :

<141> CurrentFilingDate : ____-__-__

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

agatcggaag agcacacgtc tgaactccag tcac 34

<212> Type : DNA

<211> Length : 34

SequenceName : 1

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

aagtgactgg agttcagacg tgtgctcttc cgatctnnnn nnn 43

<212> Type : DNA

<211> Length : 43

SequenceName : 2

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

caagcagaag acggcatacg agattgatag gtgactggag ttcagacgtg tgctcttccg 60

<212> Type : DNA

<211> Length : 60

SequenceName : 3

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

caagcagaag acggcatacg agattatacg gtgactggag ttcagacgtg tgctcttccg 60

<212> Type : DNA

<211> Length : 60

SequenceName : 4

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

caagcagaag acggcatacg agatcgatca gtgactggag ttcagacgtg tgctcttccg 60

<212> Type : DNA

<211> Length : 60

SequenceName : 5

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

caagcagaag acggcatacg agatatacac gtgactggag ttcagacgtg tgctcttccg 60

<212> Type : DNA

<211> Length : 60

SequenceName : 6

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

caagcagaag acggcatacg agatatagcg gtgactggag ttcagacgtg tgctcttccg 60

<212> Type : DNA

<211> Length : 60

SequenceName : 7

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

caagcagaag acggcatacg agattgttca gtgactggag ttcagacgtg tgctcttccg 60

<212> Type : DNA

<211> Length : 60

SequenceName : 8

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

caagcagaag acggcatacg agatagatac gtgactggag ttcagacgtg tgctcttccg 60

<212> Type : DNA

<211> Length : 60

SequenceName : 9

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

caagcagaag acggcatacg agattagctg gtgactggag ttcagacgtg tgctcttccg 60

<212> Type : DNA

<211> Length : 60

SequenceName : 10

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

caagcagaag acggcatacg agatgtatgt gtgactggag ttcagacgtg tgctcttccg 60

<212> Type : DNA

<211> Length : 60

SequenceName : 11

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

caagcagaag acggcatacg agatggctca gtgactggag ttcagacgtg tgctcttccg 60

<212> Type : DNA

<211> Length : 60

SequenceName : 12

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

caagcagaag acggcatacg agatcatgct gtgactggag ttcagacgtg tgctcttccg 60

<212> Type : DNA

<211> Length : 60

SequenceName : 13

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

caagcagaag acggcatacg agattcatcg gtgactggag ttcagacgtg tgctcttccg 60

<212> Type : DNA

<211> Length : 60

SequenceName : 14

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

aatgatacgg cgaccaccga gatctacact ctttccctac acgacgctct t 51

<212> Type : DNA

<211> Length : 51

SequenceName : 15

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

acactctttc cctacacgac gctcttccga tctnnnnnnn nntgatttgt agtggagaag 60

ga 62

<212> Type : DNA

<211> Length : 62

SequenceName : 16

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

acactctttc cctacacgac gctcttccga tctnnnnnnn nntggcctgg cttgcttacc 60

tt 62

<212> Type : DNA

<211> Length : 62

SequenceName : 17

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

acactctttc cctacacgac gctcttccga tctnnnnnnn nngcatctgc ctcacctcca 60

cc 62

<212> Type : DNA

<211> Length : 62

SequenceName : 18

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

acactctttc cctacacgac gctcttccga tctnnnnnnn nntccaggag gcagccgaag 60

gg 62

<212> Type : DNA

<211> Length : 62

SequenceName : 19

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

acactctttc cctacacgac gctcttccga tctnnnnnnn nnggaaactg aattcaaaaa 60

ga 62

<212> Type : DNA

<211> Length : 62

SequenceName : 20

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

acactctttc cctacacgac gctcttccga tctnnnnnnn nngaccttac cttatacacc 60

gt 62

<212> Type : DNA

<211> Length : 62

SequenceName : 21

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

acactctttc cctacacgac gctcttccga tctnnnnnnn nngaaataaa tacagatctg 60

tt 62

<212> Type : DNA

<211> Length : 62

SequenceName : 22

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

acactctttc cctacacgac gctcttccga tctnnnnnnn nnaaaaggaa ttccataact 60

tc 62

<212> Type : DNA

<211> Length : 62

SequenceName : 23

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

acactctttc cctacacgac gctcttccga tctnnnnnnn nngacgatac agctaattca 60

ga 62

<212> Type : DNA

<211> Length : 62

SequenceName : 24

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

acactctttc cctacacgac gctcttccga tctnnnnnnn nnacaagttt atattcagtc 60

at 62

<212> Type : DNA

<211> Length : 62

SequenceName : 25

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

acactctttc cctacacgac gctcttccga tctnnnnnnn nntgagagac caatacatga 60

gg 62

<212> Type : DNA

<211> Length : 62

SequenceName : 26

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

acactctttc cctacacgac gctcttccga tctnnnnnnn nntatgtcca acaaacaggt 60

tt 62

<212> Type : DNA

<211> Length : 62

SequenceName : 27

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

acactctttc cctacacgac gctcttccga tctnnnnnnn nnagaaggtg agaaagttaa 60

aa 62

<212> Type : DNA

<211> Length : 62

SequenceName : 28

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

acactctttc cctacacgac gctcttccga tctnnnnnnn nntcacatcg aggatttcct 60

tg 62

<212> Type : DNA

<211> Length : 62

SequenceName : 29

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

acactctttc cctacacgac gctcttccga tctnnnnnnn nnccctccct ccaggaagcc 60

ta 62

<212> Type : DNA

<211> Length : 62

SequenceName : 30

SequenceDescription :

Sequence

--------

<213> OrganismName : Artificial Sequence

<400> PreSequenceString :

acactctttc cctacacgac gctcttccga tctnnnnnnn nnaggcagat gcccagcagg 60

cg 62

<212> Type : DNA

<211> Length : 62

SequenceName : 31

SequenceDescription :
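The entries above share two recurring layouts, which a short parser can make explicit. This is a sketch whose layouts are inferred from the listing rather than stated in it: entries 16 to 31 appear to be probes made of a 33-nt first-adaptor segment, a run of N (the molecular tag) and target-binding bases, and entries 3 to 14 appear to be P7 index primers with a 6-nt index between fixed segments. Under that reading, the 9-nt tag and 6-nt index fall within the lengths recited in claims 3 and 5.

```python
import re

# Layouts inferred from the listing (an assumption of this sketch).
PROBE_RE = re.compile(r"^(ACACTCTTTCCCTACACGACGCTCTTCCGATCT)(N+)([ACGT]+)$")
INDEX_RE = re.compile(
    r"^CAAGCAGAAGACGGCATACGAGAT([ACGT]+)GTGACTGGAGTTCAGACGTGTGCTCTTCCG$")


def listing_to_seq(*lines: str) -> str:
    """Join sequence-listing lines, dropping the spaces between 10-base blocks
    and the trailing running-length number."""
    seq = []
    for line in lines:
        parts = line.split()
        if parts and parts[-1].isdigit():
            parts = parts[:-1]
        seq.extend(parts)
    return "".join(seq).upper()


def split_probe(seq: str) -> tuple:
    """Return (adaptor segment, molecular-tag length, target-binding sequence)."""
    m = PROBE_RE.match(seq)
    if m is None:
        raise ValueError("sequence does not follow the probe layout")
    adaptor, tag, target = m.groups()
    return adaptor, len(tag), target


def p7_index(seq: str) -> str:
    """Return the index bases between the fixed P7 primer segments."""
    m = INDEX_RE.match(seq)
    if m is None:
        raise ValueError("sequence does not follow the P7 index primer layout")
    return m.group(1)


seq16 = listing_to_seq(
    "acactctttc cctacacgac gctcttccga tctnnnnnnn nntgatttgt agtggagaag 60", "ga 62")
seq3 = listing_to_seq(
    "caagcagaag acggcatacg agattgatag gtgactggag ttcagacgtg tgctcttccg 60")

adaptor, tag_len, target = split_probe(seq16)
print(len(seq16), tag_len, target)  # 62 9 TGATTTGTAGTGGAGAAGGA
print(p7_index(seq3))               # TGATAG
```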

Claims

1. A method for single molecule target gene library construction, comprising:

2. The single molecule target gene library construction method of claim 1, wherein the template molecule is single stranded DNA;

and/or the template molecule is selected from at least one of single-stranded DNA obtained after the melting treatment of the double-stranded DNA and cDNA obtained by reverse transcription of RNA;

and/or the template molecule is derived from at least one of bisulfite-treated DNA, DNA from formalin-fixed, paraffin-embedded (FFPE) tissue, DNA extracted from forensic samples, cell-free DNA (cfDNA) contained in body fluids, and DNA extracted from ancient fossils or biological remains from archaeological excavations;

and/or the initial amount of the template molecule is from 0.1 to 1000 nanograms;

and/or, the extension product of the target probe is single-stranded DNA.

3. The method for single-molecule target gene library construction according to claim 1, wherein the 5' end of the forward strand of the second sequencing adaptor is modified with a phosphate group;

and/or a molecular tag is connected in series between the first sequencing adaptor and the target probe;

and/or the molecular tag is 4-19 nt in length.

4. The method for single molecule target gene library construction according to claim 1, wherein the random nucleotide sequence concatenated at the 3' end of the reverse strand of the second sequencing adaptor is 5-15nt in length.

5. The single-molecule target gene library construction method according to claim 1, wherein the first primer contains a sequence that can pair complementarily with the first sequencing adaptor and the second primer contains a sequence that can pair complementarily with the second sequencing adaptor;

and/or, the first primer comprises an inner adaptor sequence and an outer adaptor sequence, the 5' end of the inner adaptor sequence being linked in series to the 3' end of the outer adaptor sequence, and the inner adaptor sequence being able to pair reverse-complementarily with the first sequencing adaptor;

and/or, the first primer contains or does not contain a first sample tag;

and/or, when the first primer comprises a first sample tag, the first sample tag is located between the inner adaptor sequence and the outer adaptor sequence;

and/or, the first sample tag is 4-15nt in length;

and/or, the second primer comprises an inner adaptor sequence and an outer adaptor sequence, the 5' end of the inner adaptor sequence being linked in series to the 3' end of the outer adaptor sequence, and the inner adaptor sequence being able to pair reverse-complementarily with the second sequencing adaptor;

and/or, the second primer contains or does not contain a second sample tag;

and/or, when the second primer contains a second sample tag, the second sample tag is positioned between the inner adaptor sequence and the outer adaptor sequence;

and/or the second sample tag is 4-15 nt in length;

and/or, in the extension step, the number of amplification cycles of the extension reaction is at least 1;

and/or, in the extension step, the number of amplification cycles of the extension reaction is 5-500 cycles;

and/or, in the extension step, each cycle of the reaction is as follows: 94-98 ℃ for 10-60 seconds; 55-65 ℃ for 10-60 seconds; 68-72 ℃ for 10-60 seconds.

6. The method for single-molecule target gene library construction according to claim 1, wherein in the second sequencing linker ligation step, the ligation reaction is carried out at 22-40 ℃ for 0.5-2 hours;

and/or, in the second sequencing linker ligation step, the ligase used is T4 DNA ligase.

7. The single molecule target gene library construction method according to claim 1, wherein in the second sequencing linker ligation step, the melting process is a denaturation process;

and/or the denaturation treatment is thermal denaturation treatment;

and/or, the heat denaturation treatment is specifically heating the target molecule to at least 80 ℃ for at least 1 min;

and/or the first sequencing linker is selected from any one of a P5 terminal sequencing linker of an Illumina sequencing platform and a P2 terminal sequencing linker of an MGI sequencing platform;

and/or the second sequencing linker is selected from any one of a P7 terminal sequencing linker of an Illumina sequencing platform and a P1 terminal sequencing linker of an MGI sequencing platform.

8. A library constructed by the method of any one of claims 1 to 7.

9. A kit comprising a first sequencing adaptor and a second sequencing adaptor, wherein the first sequencing adaptor is connected in series with a target probe, the target probe can bind to a target region of a template molecule and undergo an extension reaction, the second sequencing adaptor comprises a forward strand and a reverse strand that pair complementarily, the 5' end of the forward strand of the second sequencing adaptor can be connected in series to the 3' end of a target probe extension product, and the 3' end of the reverse strand is connected in series with a random nucleotide sequence.

10. The kit of claim 9, wherein the template molecule is single-stranded DNA;

and/or the template molecule is at least one of single-stranded DNA obtained after the melting treatment of the double-stranded DNA and cDNA obtained by reverse transcription of RNA;

and/or, the template molecule is selected from bisulfite treated DNA, various types of severely degraded DNA;

and/or, the types of severely degraded DNA include formalin-fixed, paraffin-embedded (FFPE) tissue DNA, DNA extracted from forensic samples, and cell-free DNA (cfDNA) contained in body fluids;

and/or, the target probe extension product is single-stranded DNA;

and/or the 5' end of the forward strand of the second sequencing linker is modified with a phosphate group;

and/or the molecular tag is 4-19 nt in length;

and/or the length of the random nucleotide sequence connected in series with the 3' end of the reverse strand of the second sequencing linker is 5-15 nt;

and/or, the kit further comprises a first primer comprising a sequence that is complementarily pairable with the first sequencing adaptor, a second primer comprising a sequence that is complementarily pairable with the second sequencing adaptor;

and/or, the first primer contains or does not contain a first sample tag;

and/or, the first sample tag is 4-15nt in length;

and/or, the second primer contains or does not contain a second sample tag;

and/or the second sample tag is 4-15 nt in length;