Background
Next-generation sequencing (NGS) is widely used in many fields, such as noninvasive prenatal screening, detection of neonatal genetic diseases, tumor-associated diagnosis and postoperative monitoring, and detection of pathogenic microorganisms. High-throughput sequencing comprises sequencing technology and library construction technology. Sequencing technology mainly comprises second-generation and third-generation sequencing, and second-generation sequencing is currently the overwhelmingly dominant technology. The classical library construction techniques for these two sequencing platforms are very similar and usually include the following steps: 1) DNA fragmentation; 2) end repair and phosphorylation; 3) addition of an "A" base to the ends of the repaired DNA; 4) adapter ligation; 5) library amplification; 6) library recovery and quantification. Although each of these steps is important, the metric of greatest concern is the library conversion rate, i.e., the fraction of DNA molecules that are ligated to adapters at both ends. Well-known imported library construction kits, such as KAPA Hyper Prep from KAPA Biosystems and NEBNext Ultra II DNA Library Prep from NEB, claim library conversion rates of 10%-50% (the lower the DNA input, the lower the conversion rate). After years of optimization, domestic suppliers of NGS library construction products, such as Yeasen Biotechnology and Vazyme Biotech, report maximum library conversion rates of about 60%.
The classical library construction method adds an A base (commonly called an A tail) to the 3' ends of the DNA and uses an adapter carrying a complementary 3' T overhang. This T-A ligation mode has an efficiency bottleneck, mainly because the A base is added through the terminal transferase activity of Taq polymerase, whose A-tailing efficiency is only about 78% (doi: 10.1016/j.jmb.2006.10.008) and is difficult to improve substantially.
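To see why this matters for conversion, note that a fragment must be A-tailed on both ends to accept adapters on both sides. If the ~78% figure is read as a per-end efficiency $p$ and the two ends are assumed to be tailed independently (both are assumptions of this illustration, not statements from the cited work), the fraction of fragments tailed on both ends is roughly:

```latex
P_{\text{both ends}} = p^{2} = 0.78^{2} \approx 0.61
```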
In addition to T-A adapter ligation, several novel adapter ligation strategies have been reported. Swift Biosciences published a blunt-end ligation protocol (CN 106459968B) in which the DNA fragments are dephosphorylated rather than phosphorylated and no "A" base is added, while the adapter is blunt-ended and dideoxy-modified at one end to prevent adapter self-ligation. Although this approach avoids the A-tailing step that severely limits efficiency, the subsequent ligation requires multiple enzymes, which greatly increases operational complexity and library construction time and hinders large-scale use; moreover, purely blunt-end ligation is inherently inefficient. A library construction patent from IDT (CN 110248675A) also discloses a blunt-end adapter ligation scheme. It requires phosphorylation of the DNA fragments and uses a T4 DNA ligase mutant (K159S) to avoid self-ligation of the DNA fragments. This mutant cannot adenylate substrates with ATP and can only ligate pre-adenylated substrates, so ligating the insert DNA to a pre-adenylated blunt-ended adapter, whose complementary strand is blocked by a dideoxy modification, largely avoids the formation of chimeras. Because this method avoids the inefficient "A"-addition step and only the pre-adenylated adapter can be ligated, library conversion can exceed 90%. However, the advantage holds only when the DNA input is below 10 ng; above 10 ng, its efficiency is no better than that of conventional methods.
Moreover, pre-adenylation and dideoxy modification of the adapters are expensive, which severely limits the commercialization of this approach.
Disclosure of Invention
The invention provides a library construction method that improves the library conversion rate, characterized by comprising the following steps:
(1) Fragmenting a DNA sample and blunting the ends of the fragmented DNA;
(2) Purifying the product, adding terminal transferase (TdT) and a ribonucleotide, and adding bases to the 3' ends of the DNA fragments;
(3) Purifying the product and incubating it with an adapter;
(4) Adding a DNA polymerase, a DNA ligase and dNTPs, and performing nick translation and adapter ligation;
(5) Purifying the product and amplifying the library to obtain a DNA library.
Preferably, the adapter has a stem-loop structure, carries 2-4 identical deoxynucleotides at its 3' end, and contains no barcode sequence.
Preferably, the adapter sequences are as shown in Table 1.
Preferably, the ribonucleotide in step (2) is any one of rATP, rUTP, rGTP and rCTP and is complementary to the deoxynucleotides at the 3' end of the adapter.
Preferably, the ribonucleotide in step (2) is rGTP.
Preferably, the enzyme used for end blunting in step (1) is T4 DNA polymerase, Klenow DNA polymerase, or a combination of both.
Preferably, the DNA polymerase in step (4) is DNA polymerase I or Taq DNA polymerase.
Preferably, the DNA ligase in step (4) is e.
Terminal transferase (TdT) has been reported mainly for adding bases to the 3' ends of single-stranded DNA. The invention uses TdT for the first time to add bases to the ends of double-stranded DNA and applies it to double-stranded DNA adapter ligation.
Current library construction techniques ligate DNA to adapters either through blunt ends or through single-base pairing. The invention uses terminal transferase TdT with a ribonucleotide as the reaction substrate to add 2-4 bases to the ends of blunt-ended DNA fragments, and pairs these tails with a matching stem-loop adapter for ligation. Because 2-4 bases participate in pairing, pairing efficiency is greatly improved. At the same time, the sticky ends (3' overhangs) at both ends of the insert are added by TdT, which is much more efficient than A-tailing with conventional Taq DNA polymerase. In addition, the adapter requires no special modification and the library DNA requires no phosphorylation, which effectively reduces cost. When a library construction kit based on this method is used to build libraries from cell-free DNA (cfDNA) and formalin-fixed paraffin-embedded (FFPE) DNA samples, adapter ligation efficiency is markedly improved, and library complexity, coverage and sensitivity to low-frequency variants are greatly improved.
The library construction method of the invention was compared with the imported commercial KAPA Hyper DNA library construction kit (KAPA Biosystems) in terms of library conversion and deep capture sequencing of FFPE DNA and cfDNA. The library conversion rates of the method were higher than those of the imported reagent at every DNA input tested. High-depth capture sequencing further showed that the method yields higher capture efficiency and greater effective sequencing depth.
Detailed Description
The implementation of the invention is described below with reference to the flowchart drawings:
Table 1. Oligonucleotide sequences used in the examples
* The i5 index and i7 index sequences can be taken from the official barcode sequences of the Illumina sequencing platform.
Example 1: Establishing the library construction workflow and testing reaction conditions
In this example, we established the library construction workflow disclosed in the invention (Fig. 1) and compared the conversion rates obtained with different adapter types (Fig. 4). The procedure was as follows:
1. DNA fragmentation
Human genomic DNA standard NA12878 was purchased from the Coriell Institute, and the DNA was fragmented to 200 bp using a Covaris ultrasonicator.
2. End repair and rG base addition
The sonicated DNA was end-repaired and tailed with several rG bases as follows:
1) The reaction system was prepared as follows:
| Component | Amount |
| --- | --- |
| 10× NEB Buffer 2 (#M0203, NEB) | 5 μl |
| T4 DNA polymerase (#M0203, NEB) | 2 μl |
| Fragmented DNA | 100 ng |
| Sterile water | to 50 μl |
After incubation at 20 ℃ for 10 minutes, the product was purified with 1.8× AMPure XP magnetic beads (Beckman) (see the product manual for the procedure) and eluted in 20 μl.
2) The purified DNA from the previous step was combined with the following components:
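AMPure XP cleanups are conventionally specified as the ratio of bead suspension volume to sample volume, so "1.8×" on a 50 μl reaction means 90 μl of beads. A minimal sketch of that arithmetic, assuming the whole reaction volume is purified (the helper name is hypothetical):

```python
def bead_volume_ul(sample_volume_ul: float, ratio: float = 1.8) -> float:
    """Volume of bead suspension to add for a given bead:sample volume ratio."""
    if sample_volume_ul <= 0 or ratio <= 0:
        raise ValueError("volume and ratio must be positive")
    return sample_volume_ul * ratio

# A 50 μl end-repair reaction cleaned up at 1.8x needs 90 μl of beads.
print(f"{bead_volume_ul(50):.1f} μl")  # 90.0 μl
```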
| Component | Amount |
| --- | --- |
| Purified DNA from the previous step | 20 μl |
| 10× TdT reaction buffer (#M0315, NEB) | 5 μl |
| Terminal transferase TdT (#M0315, NEB) | 2 μl |
| rGTP, rCTP, rTTP or rATP (50 μM) | 1 μl |
| Sterile water | to 50 μl |
* Note: The choice of rGTP, rCTP, rTTP or rATP depends on the adapter used in the subsequent ligation step: if the adapter ends in G bases at its 3' end, use rCTP here; if it ends in C bases, use rGTP; and so on.
Incubate at 37 ℃ for 10 minutes and inactivate at 75 ℃ for 10 minutes. Purify the product with 1.8× AMPure XP magnetic beads (Beckman) (see the product manual for the procedure) and elute in 20 μl.
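The selection rule in the note is a base-complement lookup. The sketch below encodes it under stated assumptions: the adapter 3' overhang is given as a string of 2-4 identical DNA bases, and the helper and table names are hypothetical. The reaction table lists rTTP, whereas the preferred embodiment names rUTP; this sketch follows the latter for an A overhang.

```python
# Ribonucleotide that TdT should add to the insert 3' ends for each adapter
# overhang base (hypothetical lookup; rUTP used as the complement of A).
COMPLEMENTARY_RNTP = {"G": "rCTP", "C": "rGTP", "A": "rUTP", "T": "rATP"}

def tailing_rntp(adapter_overhang: str) -> str:
    """Return the rNTP whose tail pairs with a homopolymer adapter 3' overhang."""
    overhang = adapter_overhang.upper()
    if not 2 <= len(overhang) <= 4:
        raise ValueError("expected a 3' overhang of 2-4 bases")
    if len(set(overhang)) != 1:
        raise ValueError("expected identical bases in the overhang")
    base = overhang[0]
    if base not in COMPLEMENTARY_RNTP:
        raise ValueError(f"unexpected base {base!r}")
    return COMPLEMENTARY_RNTP[base]

print(tailing_rntp("CCC"))  # rGTP, the pairing used with a three-C adapter
```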
3. Adapter ligation
1) Incubate the adapter with the DNA tailed with rG, rC, rT or rA bases:

| Component | Amount |
| --- | --- |
| Reaction product from the previous step | 20 μl |
| Adapter (10 μM) | 5 μl |

* Note: Each adapter in Table 1 was tested in turn, and the experiment was repeated three times for each adapter.
Incubate at 25 ℃ for 5 minutes.
2) Nick translation and adapter ligation
Incubate at 25 ℃ for 15 minutes, inactivate at 75 ℃ for 10 minutes, then add 3 μl USER™ enzyme (M5505, NEB) and incubate at 37 ℃ for 15 minutes. Purify the product with 1.8× AMPure XP magnetic beads (Beckman) (see the product manual for the procedure) and elute in 20 μl.
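Because reduced hands-on complexity and time are among the stated motivations, it can help to total the incubations of Example 1 from fragmented DNA to purified ligation product. The sketch below lists only the incubation temperatures and times stated in this example; bead purifications, reagent additions and handling are excluded, and the step labels and function name are hypothetical.

```python
# (step, temperature in °C, minutes) as stated in Example 1, steps 2-3.
STEPS = [
    ("End repair (T4 DNA polymerase)", 20, 10),
    ("rNTP tailing (TdT)", 37, 10),
    ("TdT inactivation", 75, 10),
    ("Adapter incubation", 25, 5),
    ("Nick translation and ligation", 25, 15),
    ("Heat inactivation", 75, 10),
    ("USER enzyme digestion", 37, 15),
]

def total_incubation_minutes(steps) -> int:
    """Sum of incubation times, ignoring purification and handling."""
    return sum(minutes for _name, _temp_c, minutes in steps)

for name, temp_c, minutes in STEPS:
    print(f"{name:<32}{temp_c:>3} °C {minutes:>3} min")
print("Total incubation:", total_incubation_minutes(STEPS), "min")  # 75 min
```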
4. Evaluation of library conversion
The purified adapter-ligation product can be quantified accurately with a qPCR library quantification kit (#12302) from Saint Netherlands or an equivalent kit, following the manufacturer's instructions. The starting material for library construction was 100 ng of 200 bp DNA, corresponding to 0.76 pmol. The library conversion rate was calculated by measuring the amount of library (pmol) by qPCR and dividing it by 0.76 pmol.
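The 0.76 pmol figure follows from converting mass to moles with an average base-pair mass. A minimal sketch, assuming 660 g/mol per base pair (a common approximation; the source does not state which value it used) and hypothetical function names:

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # assumed average mass of one dsDNA base pair

def dsdna_pmol(mass_ng: float, length_bp: float,
               bp_mass: float = AVG_BP_MASS_G_PER_MOL) -> float:
    """Amount of dsDNA in pmol: ng * 1e-9 g / (bp * g/mol) * 1e12 pmol/mol."""
    if mass_ng < 0 or length_bp <= 0 or bp_mass <= 0:
        raise ValueError("invalid mass, length or base-pair mass")
    return mass_ng * 1e3 / (length_bp * bp_mass)

def conversion_rate(library_pmol: float, input_ng: float, length_bp: float) -> float:
    """Fraction of input molecules recovered as adapter-ligated library."""
    return library_pmol / dsdna_pmol(input_ng, length_bp)

print(round(dsdna_pmol(100, 200), 2))  # 0.76
```

For example, a hypothetical qPCR result of 0.38 pmol from the 100 ng input would correspond to a conversion rate of about 50%.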
5. Library amplification
The adapter-ligated library from step 3 was amplified with KAPA HiFi high-fidelity DNA amplification mix (with the number of cycles set according to the manufacturer's recommendations), using the i5 and i7 amplification primers in Table 1. The product was purified with 1.8× AMPure XP magnetic beads (Beckman) (see the product manual for the procedure) and eluted in 20 μl.
6. Library quantification and hybrid capture
The concentration of the amplified library was determined with the Qubit™ dsDNA quantification kit (#Q32854, Thermo Fisher), and libraries meeting the minimum input requirement for hybrid capture proceeded to capture. Capture was performed with IDT's pan-cancer panel (xGen Pan-Cancer Panel v1.5). The final product was purified with 1.8× AMPure XP magnetic beads (Beckman), quantified again with the Qubit™ dsDNA kit, and sent to Novogene for sequencing.
The results showed that, among the stem-loop adapter combinations tested, Adapter-C3 and Adapter-G3 (i.e., with "rG" or "rC" bases added to the 3' ends of the library DNA) gave the highest library conversion efficiencies, 78.5% and 77.7%, respectively (Figs. 4 and 5).
Example 2: Comparison of library conversion between the patented method and an imported reagent
This example shows that the library construction method of this patent achieves higher library conversion rates than the KAPA Hyper DNA library construction reagents. As described in Example 1, libraries were constructed from 1 ng, 10 ng, 100 ng, 250 ng and 500 ng of sonicated NA12878 gDNA using both the present method and the KAPA Hyper DNA library construction kit (KAPA Biosystems), and qPCR library quantification was performed after adapter ligation to calculate the library construction efficiency at each DNA input. The library conversion rate of the present method was 2-5 times higher than that of the KAPA kit, and the advantage was more pronounced at lower DNA inputs, indicating that the method is particularly suited to low-input library construction.
Example 3: Library construction and deep sequencing of real plasma cell-free DNA (cfDNA) samples
This example shows that cfDNA libraries constructed with the method of this patent and subjected to capture sequencing achieve a significantly higher average coverage depth than libraries built with KAPA Hyper DNA library construction reagents. The real cfDNA, extracted from human plasma and provided by a customer, is structurally more complex than mock cfDNA obtained by fragmenting leukocyte gDNA and therefore reflects the performance of library construction reagents more faithfully. As described in Example 1, libraries were constructed from 1 ng and 10 ng of cfDNA using the method of the invention and the KAPA Hyper DNA library construction kit (KAPA Biosystems), and capture sequencing was performed with IDT's pan-cancer panel (xGen Pan-Cancer Panel v1.5); all experiments were performed in technical triplicate. With real cfDNA, the average coverage depth obtained with the method of the invention was about 2.5 times higher than with the KAPA method.
Example 4: Library construction and deep sequencing of real FFPE DNA samples
This example shows that FFPE DNA libraries of varying quality constructed with the method of this patent and subjected to capture sequencing achieve a significantly higher average coverage depth than libraries built with KAPA Hyper DNA library construction reagents. FFPE samples were provided by a customer, and sample quality was graded as mildly, moderately or severely degraded. As described in Example 1, 100 ng of FFPE DNA of each quality grade was fragmented to about 200 bp with a Covaris ultrasonicator and used as starting material for library construction with the method of the invention and with the KAPA Hyper DNA library construction kit (KAPA Biosystems). Capture sequencing was performed with IDT's pan-cancer panel (xGen Pan-Cancer Panel v1.5), and all experiments were performed in triplicate. Compared with the KAPA method, the average coverage depth of the method of the invention was 2, 1.6 and 1.3 times higher for the mildly, moderately and severely degraded FFPE DNA, respectively.
Example 5: Library construction, sequencing and mutation analysis of FFPE DNA reference standards
FFPE DNA reference standards with known mutation sites and frequencies were used for library construction and capture sequencing with the method of this patent and with the KAPA Hyper DNA library construction method, and the types and frequencies of the detected mutations were tabulated. The FFPE standards Quantitative Multiplex Reference Standard gDNA (#HD701) and Structural Multiplex Reference Standard gDNA (#HD753) were purchased from Horizon Discovery. Each standard carries single-nucleotide variants (SNVs), insertions/deletions (InDels) and copy number variations (CNVs) at defined frequencies at the genome level, which makes them well suited for evaluating library construction kits. Following the protocol of Example 1, 100 ng of each standard was fragmented to about 200 bp with a Covaris ultrasonicator and used as starting material for library construction with the method of the invention and with the KAPA Hyper DNA library construction kit (KAPA Biosystems). Capture sequencing was performed with IDT's pan-cancer panel (xGen Pan-Cancer Panel v1.5), and all experiments were performed in triplicate. The mutation frequencies detected with the method of this patent were closer to the expected values (Tables 2 and 3).
Table 2. Mutation detection results of the two library construction methods on the FFPE DNA standard (#HD701)
Table 3. Mutation detection results of the two library construction methods on the FFPE DNA standard (#HD753)
The library construction procedure of the invention is illustrated in Fig. 1. An adapter with a stem-loop structure is used, whose 3' end carries a 3-base overhang that pairs with the 3' overhang bases added to the insert. According to the literature, terminal transferase TdT also efficiently adds rNTPs to DNA ends: at an rGTP concentration of 5 μM, the addition efficiency can reach 98.6%, although 2, 3 or 4 rGTPs may be added, with different efficiencies (Fig. 2). In some embodiments, different types of stem-loop adapters were designed to account for the variation in adapter matching patterns that may result from the addition of different numbers of rGTPs (Fig. 3) and for differences in library conversion that may result from adding other rNTPs (Fig. 4). qPCR quantification of PCR-free libraries built with these adapters showed that Adapter-C3 performed best, i.e., the combination of rGTP added to the 3' end of the insert and three dC bases at the 3' end of the adapter gave the highest library conversion.
The embodiments of the invention are described above with reference to the drawings, but the invention should not be construed as limited to this description. Those skilled in the art may make simple deductions or substitutions without departing from the spirit of the invention, and all such variations shall be considered to fall within the protection scope of the invention.
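The effect of tail length on matching can be pictured with a simplified model. This model is an assumption of this sketch rather than the analysis behind the figures: homopolymer tails are taken to pair in the register that maximizes base pairs, so a tail of n rG bases against an adapter overhang of m dC bases forms min(n, m) pairs, leaving the remainder unpaired on one side or the other. The function name is hypothetical.

```python
def pairing_pattern(tail_len: int, overhang_len: int = 3) -> dict:
    """Simplified homopolymer pairing between an insert tail and an adapter overhang."""
    if tail_len < 1 or overhang_len < 1:
        raise ValueError("lengths must be positive")
    paired = min(tail_len, overhang_len)
    return {
        "tail": tail_len,
        "paired": paired,
        "unpaired_adapter_bases": overhang_len - paired,  # adapter bases with no tail partner
        "unpaired_tail_bases": tail_len - paired,         # tail bases with no adapter partner
    }

# Tails of 2-4 rG against a three-base adapter overhang, as discussed above.
for n in (2, 3, 4):
    print(pairing_pattern(n))
```

Under this model only a tail matching the overhang length pairs completely, which is one way to read why several adapter designs were tested.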